Code Coverage Conundrum

I once received an email that said:

Please be prepared to explain how you will increase your unit test coverage to target %

A week later on another project I saw something similar:

Code coverage is “green”, 100% test coverage.

So here is the conundrum: test automation is one of the best ways I know to design, build, and create a sustainable system.  So isn’t more better, and wouldn’t 100% be nirvana? Nope.  Remember, this is a conundrum (what a fun word to say: conundrum, conundrum, conundrum).  Here is the conundrum with code coverage:

Goodhart’s Law: Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.  – George Dinwiddie

Now George was stating the above in the context of velocity, but if you read his foreword to the book Software Development Metrics, you will see that he applies that statement more broadly to any metric.

In my experience, individuals who push for increased code coverage are under pressure from people who understand managing risk better than they understand managing software development.  That presents two issues: 1) dealing with the reality of these people (and the reality that you usually can’t ignore them), and 2) figuring out how to measure the effectiveness of your unit testing.

First, let’s explore the space of an objective metric that allows everyone to sleep better at night.  Sorry, there isn’t one; blame Goodhart.  So the cheap answer is that we right-click our project in Eclipse and generate test automation stubs, which immediately turns our code coverage 100% green. #GoTeamGoodhart.  Really, isn’t there a better way?  Sorry, any way you slice or dice it, Goodhart’s law comes into play.  No matter how clever you are in trying to make code coverage a meaningful metric, human nature will kick in.
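To make the gaming concrete, here is a minimal sketch in Python (the function and its bug are hypothetical, standing in for any generated test stub) of how a test can earn 100% line coverage while verifying nothing:

```python
def apply_discount(price, rate):
    """Hypothetical function under test (name and bug are illustrative)."""
    return price - price * rate  # bug: rate is never validated

def test_apply_discount():
    # A stub in the spirit of auto-generated tests: it executes every
    # line of apply_discount, so a coverage tool reports 100%, but it
    # asserts nothing about the results.
    apply_discount(100.0, 0.1)
    apply_discount(100.0, 1.5)  # nonsense rate, negative price, no failure

test_apply_discount()  # "passes" -- and the coverage dashboard goes green
```

The coverage number measures which lines ran, not whether anything was checked, which is exactly the gap Goodhart’s law exploits.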

Warning: Rabbit Hole Ahead (feel free to skip over)

Creating automated tests is a Quality Assurance (QA) activity.  They help the team understand the software problem better.  When you inspect code coverage, you are critiquing the “product”, with the product being the automated tests.  This inspection is a Quality Control (QC) activity.  Perhaps the reason why code coverage fails is that you are using a QC technique on a QA technique.  QC should always be about the product, not the process.

Now let’s move on to something a bit more useful: if we can’t use code coverage to measure the effectiveness of our unit testing, then what can we use?  Let’s try the 5 Whys technique and see where it takes us.

  1. Why do I want 100% unit test coverage?  Because unit tests are good.  They help me create a simple design, understand requirements, form a regression test suite, increase my mpg, give me confidence in my builds, etc.
  2. Why do I want these good things? I want to deploy early, frequently, and embrace change.  Those agile principles are hard (impossible) to achieve without automated tests.
  3. Why can’t I achieve those principles without unit tests?  Well I can, but my chances of success, especially in the long term, increase (almost in lockstep) with the effectiveness of my unit test program.
  4. Ok, maybe I asked the wrong question. Why do I want to deploy more frequently?  Good things happen when I deploy more frequently, most importantly my feedback loops shorten and I accelerate learning.
  5. Hmm, I think I’m getting closer (which is good because I’m at my fifth why).  Why do unit tests accelerate learning?  They accelerate learning both as I build the functionality, by making me understand and think about the software’s behavior, and once it’s built, by providing stable builds on which end users can give me focused feedback.

So now that I better understand my goal, accelerating learning, how do I measure whether learning is occurring?  Measuring learning is much more complicated than clicking a button to view a dashboard widget, but I suggest that is the core problem: doing what is easy (looking at a meaningless dashboard widget) versus what is difficult but meaningful (measuring learning).  How do you measure learning?  That is a highly debated issue, which translates into: no one has figured that out yet.  So while we may not know how to measure something, we at least know what not to measure.