Showing posts with label TDD. Show all posts

Saturday, 25 July 2015

FAIL is not a dirty word!

Short one this one. I'm spending a bit of time on an OSS project, so don't have the time to go into this in more detail.

TL;DR; Failing tests identify where your system is incomplete or inconsistent

I was at the North West Tester Gathering a couple of weeks ago. The theme of the night was Failure. There was a diverse range of speakers, some from the BBC and Sage, and one from SkyBet: Leigh Rathbone (@villabone), who presented a talk entitled "FAIL is not a dirty word". It reminded me of yet another blog post I've been meaning to write for a few months, so I figured I'd get this down on a screen somewhere before I forgot or my time got chewed up yet again.

The thing with failure is it is a central part of doing anything in uncertain environments. Whatever the environment, whether it is Marketing, Lean-Startup, TDD'ing software or anything else with a high degree of uncertainty or variance, it is important to fail for many reasons.

Failure in TDD

"Write a failing test" - This is one of the most crucial mantras that is often espoused by us in the lean/agile world. This actually deals with a number of different problems all at once.

Aside from the orthodox answers, it also addresses two fundamental concepts to all systems thinking. If you're into theoretical computer science or mathematical logic (predicate logic or propositional calculus), these two concepts will be very familiar. I'll introduce the concepts first then name the theorem for those not familiar with them.

  • Developers who start to code a new story start with a test. A failing test shows the boundary of the system relative to its context. When you modify or expand the code to make the test pass, you have made the software more complete. If the test correctly codifies the story and it happens to pass, then the system and your knowledge were more complete than you thought they were. Our knowledge is now more complete, which the code and tests happen to also represent ("code communicates intent" - @datoon83).
  • Bug tests - Those issues resulting in live (or UAT if your team works like that) show you that the software, and our associated knowledge, which satisfied the acceptance tests, isn't consistent. You write a test which exposes the bug, then you fix the system to satisfy the test. Thereby making your system more consistent.

Those familiar with these two ideas will immediately notice Gödel's [1st] incompleteness theorem, which for us IT software/systems folk loosely translates to:

"A system cannot be both complete and consistent at exactly the same time"

So, we have a choice. We can try to code the world, which would make it complete, putting in all the possible use cases that anyone would ever want (and perhaps many they won't) never delivering anything and we'd lose the consistency of the system anyway. Alternatively, we can constrain ourselves and accept a level of incompleteness and go for consistency (low bug count). Software development/engineering naturally lends itself to the latter. This is natural, since the system is complete to the stories that are done, not in the backlog or in-progress. Lean-Startup also introduces the concept of an MVP (Minimum Viable Product) with the aim of solidifying that MVP over time.

Summary

Fixing a bug by starting with a red test which surfaces the bug identifies where your software is inconsistent.

Starting a new scenario with a red test helps you identify the bounds of your system, gain more knowledge about what it should and can do, and naturally makes you extend that system, increasing the sphere of completeness.

It is important to recognise the contribution failure makes to software development. I am often frustrated when I look at code which hasn't been developed that way. It often has far too much coverage in one area, not enough coverage in others and I see the odd Assert.True() thrown in. Crucially, you can go on proving something is true the same way forever.

Code can also 'suffer' from confirmation bias as much as we can as humans. After all, the code is a manifestation of our knowledge of the domain. If we don't have that failing test, that appreciation that we have stretched the code, the system and ourselves past our limit of knowledge, we don't have that ability to fill in any gaps in that context.

So I'd certainly go further than Leigh on this one. Not only is failure not a dirty word, it's absolutely and unequivocally MANDATORY!

Sunday, 1 February 2015

Code Coverage Metrics & Cyclomatic Complexity

Controversial one this time. How valuable is cyclomatic complexity? How valuable are code coverage metrics?

These two concepts are not entirely unrelated. As it happens I am a fan of both methods, since path coverage calculations ultimately use elements of cyclomatic complexity to calculate the paths through the programme to check each line has been covered. First, a recap:

Code Coverage (aka Test Coverage)

This is the amount of the source code of a programme which is covered by a test suite. There are a few traditional breakdowns of this, including:


  • Statement coverage - How much of the [executable] source code of a programme is touched in tests
  • Path Coverage - Perhaps the more interesting metric, the number of paths through the programme which are exercised through tests.

There is one crucial thing to note here. You cannot have path coverage without statement coverage! However, you can certainly have statement coverage without path coverage. For example, a statement could call a method without all the branches within that method being tested: statement coverage registers the IF statement as soon as it is hit and does not care which way the nesting beneath it went. If your tool measures code coverage using statement coverage methods, you don't have anywhere near enough confidence that your code is free of bugs caused by tests missing from the test suite.
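To make the difference concrete, here is a minimal sketch. The `CoverageSketch` class, its `Price` method and the `DistinctPaths` helper are all made up for illustration; they are not from any real code base. Two independent IF statements give four paths, yet a single test can touch every statement.

```csharp
using System.Collections.Generic;

public static class CoverageSketch
{
    // Hypothetical method: two independent IF statements give 2 x 2 = 4 paths.
    public static string Price(bool isMember, bool hasCoupon)
    {
        var price = "full price";
        if (isMember) price = "member price";
        if (hasCoupon) price += " minus coupon";
        return price;
    }

    // Each distinct (isMember, hasCoupon) pair drives a distinct path through Price,
    // so counting distinct inputs counts the paths a set of tests exercises.
    public static int DistinctPaths(params (bool isMember, bool hasCoupon)[] inputs)
    {
        var seen = new HashSet<(bool, bool)>();
        foreach (var input in inputs)
        {
            Price(input.isMember, input.hasCoupon);
            seen.Add((input.isMember, input.hasCoupon));
        }
        return seen.Count;
    }
}
```

Calling `Price(true, true)` once executes every statement in the method, so a statement coverage tool would report it as fully covered, while `DistinctPaths((true, true))` shows that only 1 of the 4 paths has been exercised.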


So the crucial, real, significant measure of test coverage is Path coverage, not statement coverage. You get it all with Path coverage. A lot of commentators have made the sweeping statement that test coverage is useless because of this, but what they're actually saying is that Statement coverage is the weakest form of test coverage. In the maths world, we use the description "necessary but not sufficient". People also wrongly associate quality with statement coverage, and the one thing it is not is a measure of quality. More on the difference between path and statement coverage here.

Paths, Paths and More Paths

Consider the following example C# code.


        public bool IsLegalDriver(int age, bool hasLicense, DateTime carTaxExpiry, bool hasInsurance)
        {
            return ( age > 17 ) && hasLicense 
               && ( carTaxExpiry >= DateTime.Today ) && hasInsurance;
        }


This piece of code has the need for 16 different tests to cover all combinations of: 
  • Age - Aged 17 or under, or over 17
  • License - Has and hasn't got a license
  • Car Tax - Expired or not
  • Insurance - Driver insured or not
That is 2 to the power of 4 combinatorial possibilities. So full coverage is 16 tests. Statement coverage will register only 1. This isn't sufficient to exercise all possibilities.
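As a rough sketch of where the 16 comes from, the snippet below builds one representative input per combination of the four conditions. `IsLegalDriver` is copied from the example above; the `DriverCombinations` class and its `AllCombinations` and `CountLegal` helpers are hypothetical names added for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DriverCombinations
{
    // Same logic as the IsLegalDriver example above.
    public static bool IsLegalDriver(int age, bool hasLicense, DateTime carTaxExpiry, bool hasInsurance)
    {
        return (age > 17) && hasLicense
            && (carTaxExpiry >= DateTime.Today) && hasInsurance;
    }

    // One representative input for every combination of the four conditions.
    public static List<(int age, bool hasLicense, DateTime carTaxExpiry, bool hasInsurance)> AllCombinations()
    {
        var cases = new List<(int age, bool hasLicense, DateTime carTaxExpiry, bool hasInsurance)>();
        foreach (var age in new[] { 17, 18 })                 // 17 or under / over 17
        foreach (var hasLicense in new[] { false, true })
        foreach (var taxOffsetDays in new[] { -1, 1 })        // expired yesterday / expires tomorrow
        foreach (var hasInsurance in new[] { false, true })
            cases.Add((age, hasLicense, DateTime.Today.AddDays(taxOffsetDays), hasInsurance));
        return cases;
    }

    // Counts how many of the combinations describe a legal driver.
    public static int CountLegal()
    {
        var legal = 0;
        foreach (var c in AllCombinations())
            if (IsLegalDriver(c.age, c.hasLicense, c.carTaxExpiry, c.hasInsurance)) legal++;
        return legal;
    }
}
```

`AllCombinations()` yields 2 x 2 x 2 x 2 = 16 inputs, and only one of them (over 17, licensed, taxed and insured) makes `IsLegalDriver` return true.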

Expand the Graph

For those who have written languages and compilers (or at least syntax analysers) in their time, you'll know that statements can effectively be expanded into a syntax tree. In a similar way, the above return statement can be expanded through its syntax tree, with the terminal symbols then introduced, to become a series of subtrees which can be combined into a whole complex tree of possibilities.

To illustrate it, consider the tree from the point of view of the branch (IF) statements, which basically create the following 4 subtrees.


Now, start to combine these as you read the RETURN statement from left to right (bearing in mind the return is based on the AND of these, so the optimised* code resolution path looks like):


optimised tree of the RETURN statement - When one AND is false, entire RETURN statement is false


But the tree only has 5 terminal points, right? So why 16 tests? Well, the clue is in the caption. Remember that an AND statement only requires one of its binary inputs to be false for the whole statement to be false. What the above figure of 16 tests gives you is a need to test all unoptimised paths. So let's de-optimise this tree, which gives us the control flow through the programme.



This time, we have the full gamut of all 16 endpoints, one for each test! As you can see, it's a combination of all IF statement resolutions of TRUE and FALSE. After all, it's the terminal states we're interested in (they are the post-conditions/Assertions). It's tests for the positive and negative paths through the system. Does this mean the previous tree with 5 terminal points is useless?

No!

Understanding the Role of Cyclomatic Complexity

You might be asking yourself where this fits in. If not, then you might want to. The cyclomatic complexity of a system is a count of the linearly independent paths of control through the application. The most famous measure of cyclomatic complexity is that of McCabe, developed in 1976 (http://en.wikipedia.org/wiki/Cyclomatic_complexity). This metric in software is mapped to:

C = E - V + 2P

Where:

C = Cyclomatic Complexity
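E = Number of edges in the control flow graph (the branches and lines of control between statements)
V = Number of nodes in the control flow graph (the statements)
P = Number of connected components, i.e. programmes, methods or functions in the flow. For a single method, this equals 1.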

So for the above RETURN statement, expanded as an IF statement, the cyclomatic complexity is:

E = 8 (+1 for the entry point to the method)
V = 6 (all statements + the entry point + exit point RETURN)
P = 1 (a single method)

So C = 9 - 6 + (2 x 1) = 5. Recognise that number? It's the post-conditions (end points) in the middle graph.
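The arithmetic can be sketched in a couple of lines. The `Cyclomatic` class and its method names are illustrative only; `FromDecisions` uses the well-known shortcut that, for a single structured method with binary decisions, the complexity is the number of decision points plus one.

```csharp
public static class Cyclomatic
{
    // McCabe's formula as used above: C = E - V + 2P.
    public static int Complexity(int edges, int nodes, int components)
    {
        return edges - nodes + 2 * components;
    }

    // Shortcut for a single structured method: binary decision points + 1.
    public static int FromDecisions(int decisionPoints)
    {
        return decisionPoints + 1;
    }
}
```

With the counts used above, `Cyclomatic.Complexity(9, 6, 1)` gives 5, and `Cyclomatic.FromDecisions(4)` agrees, since the RETURN statement contains four conditions.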

Why are they different?

This may sound daft, but materially they're not! If you look at the number of tests we're running, a lot of them are asserting against the same end result. Specifically, the paths that return FALSE all return FALSE for exactly the same reason: they failed one section of the AND return statement. It doesn't matter if one, two or three subconditions evaluated to false, as effectively they are the same test assertion (i.e. return FALSE).

So what is 100% coverage?

This is where it gets interesting. 100% coverage should be the number of tests required to cover the whole control flow of the programme. However, using the example, people often confuse this with having to cover that return statement with 16 tests and not 5! 16 is the maximum number of tests you'll ever have to write. This often matches with exploratory testing techniques, since you have to fill in all combinations of data to determine that there are only 5 relevant execution paths anyway. The 5 is a supremum of the subset of all possible test coverages that cover the code 100% (or more, technically).

Why is that? I'll cover the mathematical treatment of that in a future post, which will also introduce ways to determine the number of tests you actually need. However, in short, it all revolves around the AND statement. Any one of the conditions can make it return FALSE, so the internal control flow can just return FALSE without evaluating anything in the AND chain after that point. However, there is only one combination that allows it to return TRUE. This is why you only need to have 5 tests instead of 16.

If you consider all the tests that offer 100% (or above) coverage, you only need to test to the 100% point and that's it (it's the supremum you want, not the maximum). Covering the other evaluations of the AND just duplicates the Assert.IsFalse(...) tests, which is near enough pointless.
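A small sketch of that short-circuit argument: run all 16 combinations of four conditions through a left-to-right AND chain and record where the evaluation stops. The `ShortCircuitPaths` class and its method names are hypothetical and only model the control flow; they are not a coverage tool.

```csharp
using System.Collections.Generic;

public static class ShortCircuitPaths
{
    // Returns the index (0-based) of the first false condition,
    // or -1 when every condition is true and the AND chain returns TRUE.
    public static int ExitPoint(bool[] conditions)
    {
        for (var i = 0; i < conditions.Length; i++)
            if (!conditions[i]) return i;
        return -1;
    }

    // Tries every combination of true/false inputs and counts the distinct exit points.
    public static int DistinctExitPoints(int conditionCount)
    {
        var exits = new HashSet<int>();
        for (var mask = 0; mask < (1 << conditionCount); mask++)
        {
            var conditions = new bool[conditionCount];
            for (var bit = 0; bit < conditionCount; bit++)
                conditions[bit] = (mask & (1 << bit)) != 0;
            exits.Add(ExitPoint(conditions));
        }
        return exits.Count;
    }
}
```

`DistinctExitPoints(4)` walks all 16 inputs but only ever finds 5 exit points: four places where the chain can bail out with FALSE and one where it returns TRUE.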

Conclusion

I personally find test coverage metrics extremely important. As you sail through the sea of a development programme, they are the robustness of the regression bucket you'll need to bail with when bugs are found in your system. The lower the coverage, the more holes the bucket has, the less water you can bail out of your canoe and the more likely you are to sink. Because it offers a shield against the slings and arrows of outrageous misfortune, you're more likely to find out if shooting Jaws shot a hole in your bucket too.

Coverage metrics are both governance and risk management for a code-base. If someone says to you "Code coverage metrics don't define software quality" I'd agree on the semantics, since it is not software quality, but I'd also argue that indirectly it can very definitely show you where there are holes in your process which are most likely by far to introduce poor quality software into the enterprise. So where systems have value, don't skimp! Cyclomatic complexity should match the number of tests you have for the main control flow (obviously, add more for exceptions as needed). If they don't, then you're either missing some, or you've likely got duplication.

Happy TDD!

Thursday, 16 January 2014

Why do we test?

In my last contract at a digital agency, I was having a discussion with a chap I worked with. He's good at TDD and we were discussing why we write tests.

The benefits of testing at developer level are well known. Writing them first and cleanly, amongst other things, provides:

  • An acceptance framework containing specifications to develop software against, using success criteria.
  • An ability to continually, reliably and consistently test against the same test case 
  • Developer 'documentation', in turn providing an understanding of how the system is supposed to work.
  • Confidence that what you changed hasn't borked anything
  • When combined with automated test and reporting frameworks, they give fast feedback and a degree of progress reporting. 

One thing that came up was that we shouldn't refactor tests. I am personally dead against this idea, since it introduces extra work to change the spec if the functionality needs changing. The give-away is over 100% test path coverage. That means you WILL be editing more tests than you should, which is just wasted time and effort. Note though, if it's a choice of 150% tests or 90% tests, I would choose 150% every time.

However, one thing that wasn't brought up, which I personally think is equally important, especially if the project is high value, high risk or highly sensitive, is governance.

Ewww... Management speak.

Yes, basically it is.

We have to remember that being agile involves being multi-skilled. Part of that is managing risk and whilst tests give you some degree of that, the test coverage gives you the governance of the code and development that you need. If you have 10% coverage in your project, people have not been doing TDD now have they?

Self-sufficient agile development teams are able to govern the development of the system. Some might think governance is used because you don't trust the developers, but really, it's just as much about managing the risks with the code and gaining the biggest bang for the developer buck (effort). As a team of developers, in order to become self-sufficient, we have to be able to govern the development of the software and by proxy, the team. 

QAs and BAs have a pivotal role in this process. They know the real business value in the system and as such, they can guide where to put the testing effort. Developers can also get a sense of the importance of the code because they'll have touched specific code more than once. 

Cyclomatic complexity can also be key to all this, since the greater the cyclomatic complexity, the greater the number of tests required. If the cyclomatic complexity is high, the number of tests is high simply because of the combinatorial nature of the tests required to cover this cyclomatic complexity metric.

For example, we are human beings and we are not faultless. Anyone who claims otherwise is deluded. So if you have a piece of code which is touched by several developers, or even the same developers multiple times, it is more likely to contain bugs over time than a piece of code written the same way once and not touched since. The purpose of the tests, amongst other things, is to make sure you don't break anything when the next person touches it or you next touch it. They give you the confidence to refactor it too. Without automated tests, and the quick feedback they bring, refactoring becomes a nightmare. After all, logical bugs don't go through the sections of code covered by automated tests; they fall through the holes where the coverage isn't (whether due to the lack of acceptance criteria or missed path coverage).

Q: Isn't cyclomatic complexity useless?

Nope. I do often wonder why people say that. The explanations I keep seeing or hearing show they actually understand none of it. You also get criticism from developers that QAs insist on a metric such as cyclomatic complexity less than 10 and we end up coding too much 'crap'. However, let's look at why we have it.

Let's use a variation of the one shown on the MSDN website. This code is deliberately rubbish, without full statement coverage but could still be created using TDD (characterisation tests first) and before refactoring anything.

Using FxCop, dotCover and the Code Analysis Powertools in VS2010, you can analyse the solution and get the following:

dotCover statement coverage and the FxCop equivalent code metrics.

The key metric we're focussing on is the cyclomatic complexity (CC) of the method named 'Method'. This is high for the method, but what does this actually mean?

Well, the path test coverage on this would require the developer to write a test for each of those cyclomatic paths through the system. In this case, 15 of them. Filling in the rest as proof:


dotCover and FxCop analysis of code

Now, the interesting thing is we have 15 tests for a cyclomatic complexity of 15. Let's understand what this means:

  • There are only 10 lines of code in one method, and we have had to write 15 tests for it - In itself, not a problem
  • We can see some clear areas for refactoring. Again, a good thing (see below as I go through it)

However, let's compare this with a wrapper if statement (which tests for whether you care or not). Then we can see that we have to change more code than we otherwise would, and this costs the business more.

I've refactored it to use a surrounding 'if' wrapper. The tests still pass, but the cyclomatic complexity has reduced to 9.
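The refactored code itself was only shown as a screenshot, so what follows is a guess at its shape rather than the original. It assumes a custom `DayOfWeek` enum (the tests use `DayOfWeek.DunnoDay`, which `System.DayOfWeek` does not have), takes the message strings from the test fixture, and invents the message for the default case.

```csharp
namespace WhatCanIDo
{
    // Assumed custom enum; DunnoDay is taken from the test fixture.
    public enum DayOfWeek
    {
        Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday, DunnoDay
    }

    public static class DayOfWeekConverter
    {
        public static string Method(DayOfWeek day, bool care)
        {
            // The wrapper check: decide "don't care" once, before looking at the day.
            if (!care) return "You don't care!";

            switch (day)
            {
                case DayOfWeek.Monday: return "Today is Monday!";
                case DayOfWeek.Tuesday: return "Today is Tuesday!";
                case DayOfWeek.Wednesday: return "Today is Wednesday!";
                case DayOfWeek.Thursday: return "Today is Thursday!";
                case DayOfWeek.Friday: return "Today is Friday!";
                case DayOfWeek.Saturday: return "Today is Saturday!";
                case DayOfWeek.Sunday: return "Today is Sunday!";
                default: return "Dunno what day it is!"; // invented message, not from the post
            }
        }
    }
}
```

In this shape there is one IF plus seven explicit cases, which is 8 decision points and therefore a cyclomatic complexity of 9. A version that checks `care` inside every case would have 7 cases plus 7 IFs plus one, which is where a figure of 15 would come from.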

dotCover and FxCop analysis after refactor

However, we still have 15 tests, so our coverage now sits at about 167% (15 tests against a complexity of 9). That is 67 percentage points more test code that another developer (or even ourselves) would have to change if we needed to change the class to do something. This is also only one level of nesting. Adding this method in full to another method which is also covered 166% means the total combinatorial effect is 276% coverage (1.66 x 1.66). If the combined TDD development time of a module is 5 days at, say, 3:2 dev to test during the TDD iterations, then the tests should only account for about 0.72 of a day (2 days / 2.76). With development taking the same amount of time (3 days), the other 1.28 days is just sheer waste. That's 26% of the development time.

Scale this (best case) to each time a block of code changes and suddenly, if the code is changed 10 times across stories in an iteration, you suddenly have 12.8 days that just disappeared out of a 50 day project! That's a lot of effort, time and money wasted. Development teams should respect the people that trust them to deliver, who also pay their wages.
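The sums behind those figures can be sketched as below. The `TestWaste` class and its method names are made up for illustration; the inputs (5 days at 3:2, coverage of 166% compounded over two levels, 10 changes) are the ones used in the text, and the results only match the quoted figures after rounding.

```csharp
public static class TestWaste
{
    // Coverage ratios multiply when an over-covered method sits inside another one.
    public static double CompoundedCoverage(double outer, double inner)
    {
        return outer * inner;
    }

    // Test effort that would have been enough at exactly 100% coverage.
    public static double NeededTestDays(double testDays, double coverageRatio)
    {
        return testDays / coverageRatio;
    }

    // Test effort spent on the coverage above 100%.
    public static double WastedDays(double testDays, double coverageRatio)
    {
        return testDays - NeededTestDays(testDays, coverageRatio);
    }
}
```

For example, `CompoundedCoverage(1.66, 1.66)` is about 2.76, `NeededTestDays(2, 2.76)` is about 0.72 of a day, and `WastedDays(2, 2.76)` is about 1.28 days, which is roughly a quarter of the 5 day module. Multiplying that waste by 10 changes gives the 12.8 or so days quoted above.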

The following is the NUnit TestFixture:

using NUnit.Framework;
 
namespace WhatCanIDo.Test
{
    [TestFixture]
    public class DayOfTheWeekTest
    {
        [Test]
        public void WhenItsMondayAndYouCareThenSayItsMonday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Monday, true), Is.EqualTo("Today is Monday!"));
        }
 
        [Test]
        public void WhenItsMondayAndIDontCareThenSayYouDontCare()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Monday, false), Is.EqualTo("You don't care!"));
        }
 
        [Test]
        public void WhenItsTuesdayAndYouCareThenSayItsTuesday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Tuesday, true), Is.EqualTo("Today is Tuesday!"));
        }
 
        [Test]
        public void WhenItsTuesdayAndIDontCareThenSayYouDontCare()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Tuesday, false), Is.EqualTo("You don't care!"));
        }
 
        [Test]
        public void WhenItsWednesdayAndYouCareThenSayItsWednesday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Wednesday, true), Is.EqualTo("Today is Wednesday!"));
        }
 
        [Test]
        public void WhenItsWednesdayAndIDontCareThenSayYouDontCare()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Wednesday, false), Is.EqualTo("You don't care!"));
        }
 
        [Test]
        public void WhenItsThursdayAndYouCareThenSayItsThursday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Thursday, true), Is.EqualTo("Today is Thursday!"));
        }
 
        [Test]
        public void WhenItsThursdayAndIDontCareThenSayYouDontCare()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Thursday, false), Is.EqualTo("You don't care!"));
        }
 
        [Test]
        public void WhenItsFridayAndYouCareThenSayItsFriday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Friday, true), Is.EqualTo("Today is Friday!"));
        }
 
        [Test]
        public void WhenItsFridayAndIDontCareThenSayYouDontCare()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Friday, false), Is.EqualTo("You don't care!"));
        }
 
        [Test]
        public void WhenItsSaturdayAndYouCareThenSayItsSaturday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Saturday, true), Is.EqualTo("Today is Saturday!"));
        }
 
        [Test]
        public void WhenItsSaturdayAndIDontCareThenSayYouDontCare()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Saturday, false), Is.EqualTo("You don't care!"));
        }
 
        [Test]
        public void WhenItsSundayAndYouCareThenSayItsSunday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Sunday, true), Is.EqualTo("Today is Sunday!"));
        }
 
        [Test]
        public void WhenItsSundayAndIDontCareThenSayYouDontCare()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Sunday, false), Is.EqualTo("You don't care!"));
        }
 
        [Test]
        public void WhenItsDunnoDayAndIDontCareThenSayYouDontCare()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.DunnoDay, false), Is.EqualTo("You don't care!"));
        }
    }
}

You can see that as we went along, we 'coded' the tests for each of the "don't care what day it is" scenarios right into the code. The excess ones now are not needed at all, since whatever we do, the don't care is actually decided separately from each check on the day. So these can be removed, which will remove 7 test cases.


Important Notes
If you do refactor tests, and I personally think you should, then:

  1. NEVER refactor tests if the code is red! 
  2. Use the tests to green the code and the code to green the tests but never change both at once!

Removing the 7 extraneous tests then makes the test class look like:


    [TestFixture]
    public class DayOfTheWeekTest
    {
        [Test]
        public void WhenItsMondayAndYouCareThenSayItsMonday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Monday, true), Is.EqualTo("Today is Monday!"));
        }
 
        [Test]
        public void WhenItsTuesdayAndYouCareThenSayItsTuesday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Tuesday, true), Is.EqualTo("Today is Tuesday!"));
        }
 
        [Test]
        public void WhenItsWednesdayAndYouCareThenSayItsWednesday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Wednesday, true), Is.EqualTo("Today is Wednesday!"));
        }
 
 
        [Test]
        public void WhenItsThursdayAndYouCareThenSayItsThursday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Thursday, true), Is.EqualTo("Today is Thursday!"));
        }
 
 
        [Test]
        public void WhenItsFridayAndYouCareThenSayItsFriday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Friday, true), Is.EqualTo("Today is Friday!"));
        }
 
        [Test]
        public void WhenItsSaturdayAndYouCareThenSayItsSaturday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Saturday, true), Is.EqualTo("Today is Saturday!"));
        }
 
        [Test]
        public void WhenItsSundayAndYouCareThenSayItsSunday()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.Sunday, true), Is.EqualTo("Today is Sunday!"));
        }
 
        [Test]
        public void WhenItsDunnoDayAndIDontCareThenSayYouDontCare()
        {
            Assert.That(DayOfWeekConverter.Method(DayOfWeek.DunnoDay, false), Is.EqualTo("You don't care!"));
        } 

    } 

This gives us 8 remaining tests and, sure enough:

dotCover results after removal of 7 extraneous tests

Summary

In short, cyclomatic complexity is a brilliant metric for governing how many tests are needed in your solution. There are two main useful comparisons in isolation and teams should take heed:
  • If cyclomatic complexity is greater than the number of tests, then you're missing test scenarios and risk introducing bugs into the system. If there is a hole in a bucket, Henry can't carry as much water. Most Agilists should aim to get here as a bare minimum in today's industry.
  • If the cyclomatic complexity of a method is less than the number of unit tests around it, then you have over 100% coverage and you introduce waste into your process. This comes when you step agile up to the lean plate!
So to be lean, 100% should be the norm. Anything else is suboptimal or worse.