Goading the IT Geek: Code Coverage Metrics & Cyclomatic Complexity

Controversial one this time. How valuable is cyclomatic complexity? How valuable are code coverage metrics?

These two concepts are not entirely unrelated. As it happens I am a fan of both methods, since path coverage calculations ultimately use elements of cyclomatic complexity to calculate the paths through the programme to check each line has been covered. First, a recap:

Code Coverage (aka Test Coverage)

This is the amount of the source code of a programme which is covered by a test suite. There are a few traditional breakdowns of this, including:

Statement coverage - How much of the [executable] source code of a programme is touched in tests
Path Coverage - Perhaps the more interesting metric, the number of paths through the programme which are exercised through tests.

There is one crucial thing to note here. You cannot have path coverage without statement coverage! However, you can certainly have statement coverage without path coverage (for example, a statement could call a method, but not all branches within that method are tested, since statement coverage will get to an IF statement, say, and not go further into the nesting - after all, it's hit the IF statement). If your tool measures code coverage using statement coverage methods, you don't have anywhere near enough confidence that your code doesn't have bugs due to missing tests in the test suite.

So the crucial, real, significant measure of test coverage is Path coverage, not statement coverage. You get it all with Path coverage. A lot of commentators have made the sweeping statement that test coverage is useless because of this, but what they're actually saying is Statement coverage is the weakest form of test coverage. In the maths world, we use the description, necessary but not sufficient. People also wrongly associate quality with statement coverage and one thing it's not, is a measure of quality. More on the difference between path and statement coverage here.

Paths, Paths and More Paths

Consider the following example C# code.

public bool IsLegalDriver(int age, bool hasLicense, DateTime carTaxExpiry, bool hasInsurance)
{
return ( age > 17 ) && hasLicense
&& ( carTaxExpiry >= DateTime.Today ) && hasInsurance;
}

This piece of code has the need for 16 different tests to cover all combinations of:

Age - Aged under or equal to 17 over 17
License - Has and hasn't got a license
Car Tax - Expired or not
Insurance - Driver insured or not

That is 2 to the power of 4 combinatorial possibilities. So full coverage is 16 tests. Statement coverage will register only 1. This isn't sufficient to exercise all possibilities.

Expand the Graph

For those who have written languages and compilers (or at least syntax analysers) in their time, you'll know that statements can effectively be expanded into a syntax tree. In a similar way, the above return statement can be expanded through it's syntax tree and then the introduction of the terminal characters to become a series of subtrees which can be combined into a whole complex tree of possibilities.

To illustrate it, consider the tree from the point of view of the branch (IF) statements, which basically create the following 4 subtrees.

Now, start to combine these as you read the RETURN statement from left to right (bearing in mind the return is based on the AND of these, so the optimised* code resolution path looks like):

optimised tree of the RETURN statement - When one AND is false, entire RETURN statement is false

But tree only has 5 terminal points, right? So why 16 tests? Well, the clue is in the caption. Remember an AND statement only requires one of it's binary inputs to be false for the whole statement to be false. What the above figure of 16 tests gives you is a need to test all unoptimised paths. So let's de-optimise this tree, which gives us the control flow through the programme.

This time, we have the full gamut of all 16 endpoints, one for each test! As you can see, it's a combination of all IF statement resolutions of TRUE and FALSE. After all, it's the terminal states we're interested in (they are the post-conditions/Assertions). It's tests for the positive and negative paths through the system. Does this mean the previous tree with 5 terminal points is useless?

No!

Understanding the Role of Cyclomatic Complexity

You might be asking yourself where this fits in. If not, then you might want to. The cyclomatic complexity of the system is the path of control through the application. The most famous measure of cyclomatic complexity is that of McCabe, developed in 1976 (http://en.wikipedia.org/wiki/Cyclomatic_complexity). This metric in software is mapped to:

C = E - V + 2P

Where:

C = Cyclomatic Complexity
E = Number of branches and lines of a piece of code (control flow)
V = Number of statements
P = Number of programmes, method or functions in the flow. For a single method, this equals 1.

So for the above RETURN statement, expanded as an IF statement, the cyclomatic complexity is:

E = 8 (+1 for the entry point to the method)
V = 6 (all statements + the entry point + exit point RETURN)
P = 1 (a single method)

So C = 9 - 6 + (2 x 1) = 5. Recognise that number? It's the post-conditions (end points) in the middle graph.

Why are they different?

This may sound daft, but materially they're not! If you look at the number of tests we're running, a lot of them are asserting against the same end result. Specifically, the paths that return FALSE all return FALSE for exactly the same reason. They failed one section of the AND return statement. It doesn't mater if one, two or three subconditions evaluated to false, as effectively, they are the same test assertion (i.e. return FALSE)

So what is 100% coverage?

This is where it gets interesting. 100% coverage should be the number of tests required to cover the whole control flow of the programme. However, using the example, people often confuse this with having to cover that return statement with 16 tests and not 5! 16 is the maximum number of tests you'll have to cover. This often matches with exploratory testing techniques, since you have to fill in all combinations of data to determine that there are only 5 relevant execution paths anyway. The 5 is a supremum of the subset of all possible test coverages, that cover the code 100% (or more technically).

Why is that? I'll cover the mathematical treatment of that in a future post, which will also introduce ways to determine the number of tests you actually need. However, in short, it all revolves around the AND statement. Any one of those can allow it to return FALSE, so the internal control flow can just return FALSE without evaluating anything in the AND chain after that point. However, there is only one that allows it to return TRUE. Th is is why you only need to have 5 tests instead of 16.

If you consider all the tests that offer 100% (or above) coverage, you only need to test to the 100% point and that's it (it's the supremum you want, not the maximum). Covering the other evaluations of the AND just duplicate the Assert.IsFalse(...) tests, which is near enough pointless.

Conclusion

I personally find test coverage metrics extremely important. As you sail through the sea of a development programme, they are the robustness of the regression bucket you'll need to bail with when bugs are found in your system. The lower the coverage, the more holes the bucket has, the less water you can bail out of your canoe and the more likely you are to sink. Because it offers a shield against the slings and arrows of outrageous misfortune, you're more likely to find out if shooting Jaws shot a hole in your bucket too.

Coverage metrics are both governance and risk management for a code-base. If someone says to you "Code coverage metrics don't define software quality" I'd agree on the semantics, since it is not software quality, but I'd also argue that indirectly it can very definitely show you where there are holes in your process which are most likely by far to introduce poor quality software into the enterprise. So where systems have value, don't skimp! Cyclomatic complexity should match the number of tests you have for the main control flow (obviously, add more for exceptions as needed). If they don't, then you're either missing some, or you've likely got duplication.

Happy TDD!

Pages

Sunday, 1 February 2015

Code Coverage Metrics & Cyclomatic Complexity