Thursday, 4 June 2015

SAMPLE: Azure v AWS - Judging Trade-Offs.

Judging cloud platforms is one of the things I find myself doing a lot these days. Since I work mainly, but not exclusively, on the Microsoft stack, this generally boils down to two main options: AWS or Azure.

Now, personally, I go for AWS by default. However, for various reasons, I refuse to tie myself to any one vendor. Plus, it allows an effective, vendor-neutral position to be taken and, for those that know me, it cuts straight through sales cr*p to see if what vendors are saying actually matches their promises (in the main, a lot don't). I tend to do this in conjunction with the procuring organisation, since it's important to check not just that vendor systems work, but that they work in context. This is especially the case when organisations are aiming to become more agile, since they will have a much closer working relationship with vendors than most vendors may feel comfortable with. So it is another tool in the toolbox to help evaluate how the line of business as a whole (business, data, application, technology, support and security) works.

How Do I Evaluate the Difference for Stories?

Trade-off analysis doesn't start with this question, but with a prior one: "What is it I want to achieve?" This then leads to the all-important question, "What question(s) do I need to ask to evaluate vendors?", and there may be several you need answers for.

Throughout this short blog, let's use the example goal:

"Given I have to host a new room booking platform,
 I want the highest on-demand available infrastructure for the lowest monthly cost 
 So that I can extend the application in the most cost effective way"

Once we've understood the value of terms like 'cost effective', we can now look at what the availability needs are.

Let's use Microsoft Azure's own infrastructure diagrams for this. Attached is a snip of a Microsoft Blueprint for Azure hosted infrastructure.

fig 1 - Microsoft Blueprint

Comparing on-demand costs is simply a matter of adding up all the costs for the components of network, data-store and VM for similarly matched specifications. Comparing the market price of Azure and AWS components, we see:

fig 2 - AWS v Azure Platform On-Demand Pricing


So that's the price... and AWS is cheaper... for 'bigger' hardware (same pricing tier, though did the story contain anything about application hardware specs?). Still, it's one of the two variables you need to determine cost-effectiveness. The other variable is availability guarantee.

Measuring Availability

Using the same techniques found here, it looks like it gets worse for Microsoft when looking at systemic availability. 

Azure: 99.9996%
AWS: 99.9999%

Note, systemic availability is actually the important thing in every platform. The availability of individual components is next to no use to you as an enterprise. It only takes one component to fail irrevocably and your platform is done. 

Heard the old adage "You're only as strong as your weakest link"? When thinking systemically, such effects are a lot worse than your weakest link, since you can never make up for one weakness without impacting other elements. This is one of the reasons we host on two different, load-balanced servers. For one single application-services-data stack on 3 VMs, each with 99% availability, we can only have an expectation of 0.99 x 0.99 x 0.99 = 97.03% uptime in total.
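The arithmetic behind this can be sketched in a few lines of Python. This is a minimal sketch, assuming component failures are independent and that the load balancer itself never fails; the function names are mine, not from any vendor tooling. Components that must all be up multiply together (0.99 x 0.99 = 98.01% for two in series, roughly 97.03% for three), whereas redundant copies only fail when every copy fails.

```python
from math import prod

def series_availability(availabilities):
    """Availability of components that must ALL be up (a chain)."""
    return prod(availabilities)

def parallel_availability(availabilities):
    """Availability of redundant components where ANY one being up is enough."""
    return 1 - prod(1 - a for a in availabilities)

# One application-services-data stack: three VMs in series at 99% each.
single_stack = series_availability([0.99, 0.99, 0.99])

# Two such stacks behind a load balancer. ASSUMPTION: the balancer is
# perfect, which a real design would not get for free.
two_stacks = parallel_availability([single_stack, single_stack])

print(f"single stack: {single_stack:.4%}")
print(f"two stacks:   {two_stacks:.4%}")
```

Under those assumptions the second stack lifts the expectation from about 97.03% to about 99.91%, which is why the redundancy is worth paying for.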


Summary and Future Posts

I started writing up comparisons for Reserved and Up-front pricing on AWS and Azure and felt the original post getting too long, even for me. So I've split it into a couple of posts to launch bit-by-bit.

The crux of all of these is to always know the question you're trying to answer. It's not a matter of boiling the ocean on day-1. After all, that's the promise of cloud. You can scale the frying pan later. 

Also, don't forget that you have a number of other options to bring these costs down. MSDN subscriptions and BizSpark give you varying levels of Azure Cloud credits, and AWS gives you 'free-tier' infrastructure for 12 months, which might cover your needs entirely. So you have to consider a more holistic approach to understanding your options and constraints, since the latter is your job, not the vendor's.

Sunday, 19 April 2015

Lowering Chances, Mitigating Risks or Both?

I was talking at Lean-Agile Manchester this week. It was a chock-full event which necessitated the adoption of extra chairs.

A number of the XP Manchester folk were in, which is always entertaining, since the two groups have overlapping common interests but as with many agile vs lean schools, we don't necessarily come to an agreement on the best way forward for things.

There were some great questions through the night! Including the ones from the hecklers. It centred around data from some graphs I showed from a previous blog post, the maths of which I tried not to go into due to the typical spread of the audience. So I offered to take it offline so as not to bore the audience, but there wasn't the appetite from the questioner, so smackdown happened and they then agreed to take it offline but never got back to me, darn it! (#invitestillopen)

Background

What's the reason for the graphs?

Several years ago, I was working in a company which was on the proverbial agile journey. They were still thinking in very big-design ways and were managing programmes of work through standard programme and project management methods. The company's attempts to have conversations around agile programming were not really working, and the second attempt at them (i.e. just do the work and they will come) didn't reach far enough for anyone in positions of enough power to take the effort seriously. This resulted in a somewhat disconnected hybrid method which saw lower levels doing the work with upper levels of management and EA imposing design on the teams, with PMs backing up the EAs as authority on that work.

In addition to that, teams spent the vast majority of retrospective time generating new ideas for working together (good, bad, change) including grouping tasks, voting and setting options for the next iteration. However, no retrospective ever came back to check that these did indeed improve the process and that any overhead we introduced as part of each task was actually worth it. Further actions just built on top of these actions and you gradually built up greater overhead in each iteration.

The team had successfully implemented WIP limits (though that started off quite painfully) and were measuring cycle time and throughput since this was easy for them to visualise in a JIRA Dashboard. We saw a burn down but it wasn't clear whether our flow was any good and indeed, whether we were improving at all.

Add to this the need from classical project management to get an idea of the length of time things would take, as well as the need from programme management to align the streams of work, and we had to know something about whether we could actually hit the hard deadline. Those that know me know I think aligning work the SAFe way or classical PERT way introduces inherent risks, but the environment was what it was and each change begins with a small step, not a 'Big-Destroy Enterprise Programme'. After all, as a dev, you're an easy replacement anyway to that style of culture (not that you necessarily have to worry about it in the IT game but it's an important consideration).

Who wanted it?

The graph/points estimation wasn't necessarily to get the team to improve delivery per se. That was not the purpose of the exercise. It was to give confidence that when we were challenged to produce an estimate, we could do so reliably and provide some confidence to the supporting classical thinking personnel we're talking to that we can and have delivered x features in t. It was to lower the variation and give confidence to those who wanted to support us that we could deliver and were improving. This was a tool to help them do that and get the buy in they needed, which took half an hour a week for someone to do (indeed, I did it - but any scrum-master or tech lead can do it in an enterprise context).

Why should you care?

The answer depends on the context you work in. In an agile-sympathetic environment, this isn't really necessary at all. After all, everyone is confident and comfortable with change. However, where a hybrid exists or companies are transitioning, sometimes these conversations are necessary. Later on, they may not be relevant any more. Enterprises can evolve as much as people do.

The Follow-up Questions

During the talk, some questions were asked and I agreed to produce some follow-up graphs from the data. In order to understand some parts of this, I'd suggest you go back and read the method presented in that blog post, as this will explain what look like 2-pt and 5-pt story 'anomalies' as we shifted our understanding of story sizes.

Cone of Uncertainty - Variation Over-time

Specifically, taking the variation between our expectation and actual delivery, plotting it and calculating the Coefficient of Variation to standardise the scales of the graphs, we can plot the change in the coefficient over time. What we see for each story size (in points) is this:


Story point variation (CV) and polynomial trend line

To keep things simple(r), I've added a cubic polynomial trend line to illustrate a smoothed variation. I haven't done anything else to the trend line and Excel has chosen the shape that minimises the sum of squares. We can relate actual uncertainty to the variation in story point figures. The same downward trend on variation is seen in linear and logarithmic trend lines. As you can see, most trends show the reduction in uncertainty as we recalibrate our positions.
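The calculation behind such a chart can be sketched as follows. This is a rough sketch only: the durations are hypothetical placeholders rather than the team's data, it works from actual durations rather than the estimate-versus-actual differences, and it fits a straight line by least squares instead of Excel's cubic, since the same downward trend is what we are looking for.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (dimensionless)."""
    return pstdev(values) / mean(values)

def linear_trend_slope(ys):
    """Least-squares slope of ys plotted against their index 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator

# HYPOTHETICAL actual durations (days) for one story size, one list per iteration.
iterations = [
    [3, 9, 5, 12, 6],
    [4, 8, 5, 9, 6],
    [5, 7, 6, 8, 6],
    [6, 7, 6, 7, 6],
]

cv_per_iteration = [coefficient_of_variation(it) for it in iterations]
slope = linear_trend_slope(cv_per_iteration)

print([round(cv, 3) for cv in cv_per_iteration], round(slope, 3))
```

A negative slope is the code equivalent of the narrowing seen in the chart: the spread relative to the mean shrinks as the team recalibrates.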

Limitations

The only exception to the general trends is the 8-pt story size, which curves slightly upwards (not significantly enough over linear to be concerned about). Additionally, due to the team rightly reducing larger 13-point stories into smaller stories, there are only a few 13-point stories in the dataset. I argued there were not enough to come to a conclusion or indeed worry about going forward, especially as most became 8-point stories as a natural part of story splitting and recalibration (again, read the previous blog post).

Conclusion

As I explained in the talk the other day, estimation such as this isn't an end goal. This is a technique in the repertoire to provide confidence for those who can support us to become more agile. After all, working in the Enterprise Architecture space necessitates communicating in many different companies, with many different types of stakeholder, including non-technical personnel/those without a software development background. Not every EA problem is a software development problem. Indeed, to approach it from that perspective is to architect before it's necessary, if it's needed at all!

Digression

As an example, consider walking skeletons, which can be just as problematic in code, since they make explicit choices on the technology stack way before a decision is needed on the suitability or otherwise of the tech, but they are useful tools to experiment when you have a tech stack already and gain certainty. However, employing just a walking skeleton is like having Maslow's Hammer. It risks introducing technology into a non-existent current stack when the basics of what people want are unknown. In this case, you don't need a skeleton per se. Just throw together a UI mock-up and deploy that to a static environment (even a file system) to get people using it to input data that never gets stored. This can be done in a few minutes, compared to creating a walking skeleton, which can take a couple of hours to get the same amount of feedback, can be potentially constrained by infrastructure problems and will require some prerequisite work. So bang for buck, if the question is trying to find out if Henry Ford's customers wanted faster horses, this would be cheaper to do than a walking skeleton and yields just as much value. The second meeting can fill this out with a skeleton if you want, since by this point you have more information to base choices on.

Risk and Sensitivity

You have two non-mutually exclusive choices to deal with risk. The first is to reduce the chance of it occurring, which this technique fits into. The other is to mitigate the impact should the risk occur, which this doesn't address and isn't intended to. So this can only be one of many tools in the team's arsenal in dealing with tracking, recalibration and risk reduction and, as we can see, there are specific scenarios this addresses really well. The question is, what other techniques exist to address the same problem?

Further Updates

I will answer some of the other questions in time and post them as updates to this blog.

Wednesday, 1 April 2015

Lean-Agile Metrics: Like it or Not, Stats Rules!

I've been wanting to write this blog for the best part of 4 years (I have a few of these I've been meaning to write up to be fair). I've only just got round to finally doing the necessary mathematical proof...

Wait, where are you going? Come back!!

*sigh*

If you don't listen to this one, you likely aren't data driving your retros, aren't effectively self-managing and could stall your agile transformations. It's not just about the coding you know! You can't embrace change if you don't know what is changing around you!

What prompted this?

Ignoring the shoes I'm wearing... wait, you mean me to write it up?... Ah, yeah. That...

*shifty look*

It was a LinkedIn group discussion, as it yet again has become abundantly clear that we're missing some understanding around lean in the software world.

Tell you what, I'll make it simple. I'll use terms you're used to before you freak out. I'll use the context of software development, since this is an arena I'm intimately familiar with. The key bit to concentrate on is the cycle-time.

Cycle-Time isn't quite what you think

Cycle-time as we know it is the average time taken to process a thing. From the point of view of software, let's consider a #NoEstimate or single size story-point ticket (I prefer to move beyond that, but for now, this will do) on a super simple Kanban board of 'Doing' and 'Done'. However, this generalises to any type of flow.

Single stage Kanban board


Each item's individual lead time in days say, can be modelled as shown just under the stage box. This states that cycle time t for any individual ticket is the average cycle time (t 'bar') of ALL tickets through this stage plus a variation (delta-t) around it. For example, if the average cycle-time is 5 days and this task takes 6 days, the variation is 6 - 5 = 1. This can also be rewritten as 6 = 5 + 1 which describes that the cycle-time for a task is the average cycle-time for that stage plus the variation.

However, we can't make a decision on one data point. That is like flipping a coin, getting heads and stating it will always be heads. So we run it again and again, which happens naturally in an iteration as you deliver tickets and, ideally, you'll deliver at least 25 tickets, which gives us a good level of certainty in any results we draw at the retro... you are data-driving your retros aren't you? ;-)  If you are not delivering 25, then this may be an opportunity to recalibrate by resizing the stories you have so that you can get enough data points, which naturally makes the variance on each story smaller anyway. I'll be giving a talk on this soon (shameless plug), so if you're in Manchester in April, pop in to Lean-Agile Manchester and I'll try to explain it in a slightly friendlier way... but not much. It's just the way I roll.

For the sake of illustration, I've used just 5 samples so you can see how it fits together. You get an average from this, which comes out as 31 in the example, and the standard deviation, which is 2.828 (2 x square root of 2 on the right). The coefficient of variation is simply the standard deviation divided by the average, which in this case is 9.1% of the average. Pretty small.
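To make the arithmetic concrete, here is a small Python sketch. The five cycle times are hypothetical: the original samples live in the figure, so I have picked values purely so that the average (31) and the population standard deviation (2 x square root of 2) match the numbers quoted above.

```python
from statistics import mean, pstdev

# HYPOTHETICAL cycle times in days, chosen only so that the summary
# statistics match the figures quoted in the text.
cycle_times = [27, 29, 31, 33, 35]

average = mean(cycle_times)                       # t-bar
deviations = [t - average for t in cycle_times]   # delta-t for each ticket
std_dev = pstdev(cycle_times)                     # population standard deviation
cv = std_dev / average                            # coefficient of variation

print(average, deviations, round(std_dev, 3), f"{cv:.1%}")
```

Each ticket's cycle time is the average plus its own delta-t, and the coefficient of variation comes out at roughly 9.1%. Note the use of the population standard deviation; the sample version (dividing by n - 1) would give a slightly larger figure.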

Kanban: Cycle-Time for Multiple Stages

This small deviation isn't the same for larger exercises. If we chain a series of these together, say into a 3 stage Kanban board (Elaborate, Doing, QA, Done) we get

3 stage Kanban

Again, we can determine the variation as before, but this time, the total variation is influenced by the earliest finish time of the first task, to latest finish time of the final task. The proof is above, and the numbers tell the story. 8.7 / 83.2 = 10.46%, which is a relative increase in the coefficient of variation of 14.9% for this Kanban configuration and the cycle-times through each stage. You'll note I deliberately didn't compare means, since there is nearly 3 times as much 'work' going on, and I didn't directly compare variances with each other, since we know the variance is the earliest start time to the latest finish time on a longer chain.

The coefficient of variation basically normalises the standard deviation relative to the size of the tasks at hand. Hence, this is the best comparator and is something that can be used between teams to compare team certainty if you feel like being dark and monitoring at programme level.
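As a quick check of that comparison, the sketch below recomputes both coefficients from the standard deviations and means quoted in this post (2.828 and 31 for the single stage, 8.7 and 83.2 for the three-stage board). Nothing here is new data; only the variable names are mine.

```python
# Figures quoted in the text: (standard deviation, mean), both in days.
single_stage = (2.828, 31.0)
three_stage = (8.7, 83.2)

def cv(std_dev, average):
    """Coefficient of variation: spread relative to the size of the work."""
    return std_dev / average

cv_single = cv(*single_stage)
cv_three = cv(*three_stage)
relative_increase = cv_three / cv_single - 1

print(f"{cv_single:.2%} -> {cv_three:.2%}, relative increase {relative_increase:.1%}")
```

Working from the rounded percentages (9.1% and 10.46%) gives the 14.9% quoted above; carrying the unrounded values through gives a figure nearer 14.6%. Either way the direction is the same: the longer chain is relatively less certain.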

Real World Applications

The beauty of this is that it scales 'fractally'. The maths can apply to a person, a stage in a board, a team, a business vertical/systemic flow of multiple teams, a programme etc. Both classical and modern agile groups have been guilty of just concentrating on the 'average throughput' and 'average cycle time' when there comes a point where this doesn't wash any more and consistency becomes key. Hence, understanding and controlling for the variation allows you to gain a level of predictability you otherwise wouldn't achieve.

Basically, the lower the coefficient of variation relative to the cost-benefit of getting there, the better! This is another reason why I agree with a number of commentators who propose that we include [business] value in stories, since this hard-to-say measure is in there from the start.

Sweet Spot

This very much depends on a host of factors, including the organisation's appetite for risk, the value they hope to achieve, when they go live to achieve it, any contingency budget and of course, how well the team recalibrate along the way. Indeed, I'd even go so far as to say it's a range of values.

Hence, in software, there are practices such as Continuous Deployment and deploying MVPs which are better suited to this than most: value increments are zero before delivery, and A/B-testing new changes should aim to improve the delivered value relative to the uncertainty. So anything I'd say here would be a conjecture without a theoretical base, but I'll give you one conjecture.

This is really a link between the expected path you could take and the amount of variance to the point that the variance breaches a series of control limits. In older, larger batch flows, with long lead times, this compounding variance causes a very wide variation by the end of a project. This is the cone of uncertainty. I've covered this before in the face of the #NoEstimates movement last year.

To understand how the cone of uncertainty applies here, let's put ourselves at the origin and look toward the deadline date. This is the solid red line in the bottom graph. The further forward we look, the more uncertain our future looks.



The above shows two graphs aligned to each other. The top is the usual J-Curve and the lines around it, green or red, show the uncertainty as it would be defined by the coefficient of variation, since that's the measure of dispersion as tickets and value accumulate.

In classical environments (top graph, red dotted line), those limits are wide as the uncertainty is wide and are still regularly breached. By contrast, the green dotted lines show the coefficient after each iteration has completed and we reassess the coefficients during each retrospective. To understand that uncertainty curve, look at the bottom graph. As we progress into projects, each iteration we deliver is not uncertain any more, since we've delivered it. It's out! The only uncertainty that remains is the rest of the project, which often 'resets' the uncertainty to the levels now understood from the actual delivered functions. This is the same as the 10 coin toss post from last year.

This naturally means the control limits move. Hence, overlaying this on the J-curve like we did with the red dotted line, we can see how the range progressively narrows as each iteration delivers. The key part to this though is that you can only get this narrowing of uncertainty if you are measuring and acting on something! Some would argue waterfall measured, which it did, but it rarely acted, as it more often required a huge movement and, if that was attempted slower than the market changed, it was set for a huge crash.

The sweet spot range is that coefficient of variation at each stage in the project life-cycle, at i = 0, 1, 2, 3, ..., n. The more frequently you sample, the less likely it is that the coefficient of variation being tracked will fall outside those limits, which again, are value [at risk] dependent. Indeed, if you look at this from the point of view of a dynamical system, Lyapunov exponents relating the actual delivery to the coefficient of variation are likely to give you a nice threshold measure, but that's my one conjecture :)


Conclusion 

This is a heavy topic for most to grasp, but one that once you have the fundamentals, can massively transform the way you think about constraints and systems, especially people ones. It's only appropriate for the most advanced lean-teams. I appreciate that a lot of people will find this very scary, so you're welcome to get in touch via email at ethar [at] axelisys.co.uk with specific questions, with your value measure and I'll see what I can do to help.



E

Tuesday, 17 February 2015

Tackling Complexity in Projects & Enterprises

There is an analogy doing the rounds at the moment which I think is pretty good as a necessary intro, but not at all sufficient to understand some of the complexity enterprise programmes and transformations face. It's the comparison between simple pendulums and double or compound pendulums. For those of you who don't know what the difference is, the first part of this video shows the simple pendulum, the second part, after the unbolting (turning the single rod into two parts  at 0:10) shows the double pendulum.

YouTube vid stevenbtroy

Now, simple enough, predict the motion of the pendulum in each case.

Dependency

The key point to note here is that when the fixed point is freed to become a compound double pendulum, there is a single variable dependency between the two parts of the compound pendulum. That's where they couple.

Coupling and dependency certainly aren't new. They exist in code, in relationships between components, people, tasks, cars, concepts etc. The term 'connascence' is making the rounds, which for maths folk is simply a dependency.

These dependencies exist all over every type of system you can think of. Indeed, it's impossible for there not to be in a sustainable way (even mythical perpetual motion machines are dependent on something such as gravity). An enterprise and indeed, a project isn't any different.

Consider the following example Gantt chart:



The key things to note are the arrows between the tasks, as these define the dependencies. The main variable of note is time (hinted at by the calendar along the top). The on-time deliverability [I don't think that's a word] of tasks on the right is wholly dependent on the delivery of the tasks to the left of it in the dependency chain. Each link is like the pivot on the video at the start of this blog. So in essence, even the red stream alone of this simple Gantt chart has 4 dependencies (tasks 2, 4, 6, 8) and each of these dependencies is a pivot in a pendulum. How successful were you in predicting the motion of the compound pendulum ahead of time? Now quadruple the number of pivots and imagine predicting that. Note, in enterprise projects, 7 tasks across two teams isn't exactly considered a large project. So why do we subject ourselves to such unpredictability and risk?
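A crude way to put a number on that risk is sketched below. It is deliberately simplistic and rests on assumptions of mine, not on anything in the Gantt chart: every link in the chain is treated as independent, and each is given the same hypothetical 90% chance of landing on time. Real dependencies interact, which, as the pendulum shows, only makes matters worse.

```python
def chance_chain_on_time(p_link_on_time, links):
    """Probability that every link in a dependency chain lands on time,
    ASSUMING links are independent and equally likely to be on time."""
    return p_link_on_time ** links

# HYPOTHETICAL 90% on-time chance per link.
for links in (1, 2, 4, 8, 16):
    print(links, f"{chance_chain_on_time(0.9, links):.1%}")
```

Even with every individual task at 90%, four chained dependencies leave the last task at roughly a two-in-three chance of being on time, and the odds keep falling with each extra link.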


Fixed Points

In August of last year, I wrote a post about the similarities of enterprise agility and astrophysics. I explain the n-body problem and how it relates to enterprises. I'm only going to expand a little more on this by asking you to imagine that as a CxO, you pay for everything in the path the base of the pendulum takes and get a return when a pendulum hits its maximum position on the other extreme (in the video, it is swung from the right, so you get a return every time it reaches the maximum horizontal distance on the left hand side).

To make this next bit easy, I've run an experiment for 10 seconds for you using the pendulum simulator at http://labs.minutelabs.io/Chaotic-Pendulum/

Simple Pendulum



In 10 seconds, the simple pendulum reaches its maximum horizontal distance twice. The amount of 'money' (which is simply the length of the arc) is about the same each time and is represented by the path traced out by the furthest point away from the pivot.

Compound Pendulum

Contrast this with the path covered by the double pendulum in the same 10 second period. Remember, the path covered is the direction all your projects are going in with the money you have invested in their success. I have set the pendulum to be about the same length, which I show you how to do, and I run it for 10 seconds. Whilst watching this, answer the following:

  1. How many times did the goal of hitting the maximum left hand side happen? 
  2. Where did the money go? 
  3. What value or return did the investment yield?




Additionally, remembering that entropy always increases, this sort of disorder just gets worse the more energy that's put into it.

Side-by-Side

Consider how long the arcs are and where they are relative to the direction of travel. Which looks more chaotic and unpredictable? 

Simple versus Compound paths


Conclusion

This analogous post highlights the importance of understanding complexity in enterprises. Whilst a basic understanding of trigonometry will see you through the simple harmonic motion of the simple pendulum, it isn't sufficient to understand the full chaotic motion of the compound system, which is both non-linear and sensitively dependent anyway (so calculus of variations as a base skill comes in handy, though you're not taught that at A-level or in pretty much any undergraduate or postgraduate computing course. You have to go to subjects like physics or mathematics for that. If you're from business, sales or marketing backgrounds, you'll really struggle). In the same way, simple IT-only or business-only thinking will see you through localised problems in individual architecture domains, but isn't sufficient for the enterprise as a whole. The trouble comes in the relationships (aka dependencies) within the organisational system: the relationships between functions, disciplines, people, teams etc. This is where architecture lives. Between the things.

Enterprise projects are still regarded as somewhat simple ideas when they are anything but. There are so many factors to consider. As I tried to illustrate at the top, the likelihood of delivering bigger projects, with higher numbers of dependencies, gets slimmer by the link. It's the whole chain that matters. This equally well applies to activities. Enterprise functions are also always changing. Whether they should be if it's not aligned to the overall enterprise goal is another matter. So why do we still persist in managing projects like this? As you can see, you're just wasting money!


E

P.S. The other edge to this double edged sword is how complexity can arise out of simple behaviour. Shhh... I'll tell you a secret. I will come to this another day ;)

Sunday, 1 February 2015

Code Coverage Metrics & Cyclomatic Complexity

Controversial one this time. How valuable is cyclomatic complexity? How valuable are code coverage metrics?

These two concepts are not entirely unrelated. As it happens I am a fan of both methods, since path coverage calculations ultimately use elements of cyclomatic complexity to calculate the paths through the programme to check each line has been covered. First, a recap:

Code Coverage (aka Test Coverage)

This is the amount of the source code of a programme which is covered by a test suite. There are a few traditional breakdowns of this, including:


  • Statement coverage - How much of the [executable] source code of a programme is touched in tests
  • Path Coverage - Perhaps the more interesting metric, the number of paths through the programme which are exercised through tests.
There is one crucial thing to note here. You cannot have path coverage without statement coverage! However, you can certainly have statement coverage without path coverage (for example, a statement could call a method, but not all branches within that method are tested, since statement coverage will get to an IF statement, say, and not go further into the nesting - after all, it's hit the IF statement). If your tool measures code coverage using statement coverage methods, you don't have anywhere near enough confidence that your code doesn't have bugs due to missing tests in the test suite.
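To see the gap between the two measures, consider the sketch below. It is in Python rather than C# purely for brevity, and the function is a hypothetical one of mine with two independent decisions, which gives four paths through it.

```python
def shipping_cost(weight_kg, express):
    """HYPOTHETICAL function with two independent decisions (4 paths)."""
    cost = 5.0
    if weight_kg > 10:
        cost += 2.0
    if express:
        cost *= 2
    return cost

# This single test executes every statement in the function ...
assert shipping_cost(12, True) == 14.0

# ... yet three of the four paths were never exercised by it:
assert shipping_cost(12, False) == 7.0   # heavy, standard
assert shipping_cost(5, True) == 10.0    # light, express
assert shipping_cost(5, False) == 5.0    # light, standard
```

A tool counting statements would report 100% after the first assertion alone, even though a bug on any of the other three paths would go unnoticed.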


So the crucial, real, significant measure of test coverage is Path coverage, not statement coverage. You get it all with Path coverage. A lot of commentators have made the sweeping statement that test coverage is useless because of this, but what they're actually saying is Statement coverage is the weakest form of test coverage. In the maths world, we use the description, necessary but not sufficient. People also wrongly associate quality with statement coverage and one thing it's not, is a measure of quality.  More on the difference between path and statement coverage here.

Paths, Paths and More Paths

Consider the following example C# code.


        public bool IsLegalDriver(int age, bool hasLicense, DateTime carTaxExpiry, bool hasInsurance)
        {
            return ( age > 17 ) && hasLicense 
               && ( carTaxExpiry >= DateTime.Today ) && hasInsurance;
        }


This piece of code has the need for 16 different tests to cover all combinations of: 
  • Age - Aged 17 or under, or over 17
  • License - Has and hasn't got a license
  • Car Tax - Expired or not
  • Insurance - Driver insured or not
That is 2 to the power of 4 combinatorial possibilities. So full coverage is 16 tests. Statement coverage will register only 1. This isn't sufficient to exercise all possibilities.
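The 16 combinations can be generated rather than written out by hand. The sketch below is a Python transliteration of the C# method (my own rendering, not the original code), using one representative value either side of each boundary.

```python
from datetime import date, timedelta
from itertools import product

def is_legal_driver(age, has_license, car_tax_expiry, has_insurance):
    """Python transliteration of the C# IsLegalDriver method above."""
    return (age > 17 and has_license
            and car_tax_expiry >= date.today() and has_insurance)

today = date.today()
ages = (17, 18)                        # not over 17 / over 17
licences = (False, True)
tax_expiries = (today - timedelta(days=1), today + timedelta(days=1))
insurances = (False, True)

# Cartesian product of the four two-valued inputs: 2 x 2 x 2 x 2 cases.
cases = list(product(ages, licences, tax_expiries, insurances))
results = [is_legal_driver(*case) for case in cases]

print(len(cases), "combinations,", sum(results), "legal")
```

Only one of the 16 combinations yields a legal driver; the other 15 all return false, which is worth keeping in mind when deciding how many of them are materially different tests.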

Expand the Graph

For those who have written languages and compilers (or at least syntax analysers) in their time, you'll know that statements can effectively be expanded into a syntax tree. In a similar way, the above return statement can be expanded through its syntax tree and then the introduction of the terminal characters to become a series of subtrees which can be combined into a whole complex tree of possibilities.

To illustrate it, consider the tree from the point of view of the branch (IF) statements, which basically create the following 4 subtrees.


Now, start to combine these as you read the RETURN statement from left to right (bearing in mind the return is based on the AND of these, so the optimised* code resolution path looks like):


optimised tree of the RETURN statement - When one AND is false, entire RETURN statement is false


But the tree only has 5 terminal points, right? So why 16 tests? Well, the clue is in the caption. Remember an AND statement only requires one of its binary inputs to be false for the whole statement to be false. What the above figure of 16 tests gives you is a need to test all unoptimised paths.  So let's de-optimise this tree, which gives us the control flow through the programme.



This time, we have the full gamut of all 16 endpoints, one for each test! As you can see, it's a combination of all IF statement resolutions of TRUE and FALSE. After all, it's the terminal states we're interested in (they are the post-conditions/Assertions). It's tests for the positive and negative paths through the system. Does this mean the previous tree with 5 terminal points is useless?

No!

Understanding the Role of Cyclomatic Complexity

You might be asking yourself where this fits in. If not, then you might want to. The cyclomatic complexity of a piece of code measures the number of linearly independent paths of control through it. The most famous measure of cyclomatic complexity is that of McCabe, developed in 1976 (http://en.wikipedia.org/wiki/Cyclomatic_complexity). This metric in software is mapped to:

C = E - V + 2P

Where:

C = Cyclomatic Complexity
E = Number of edges in the control flow graph (the branches and lines of flow between statements)
V = Number of nodes in the control flow graph (the statements)
P = Number of programmes, method or functions in the flow. For a single method, this equals 1.

So for the above RETURN statement, expanded as an IF statement, the cyclomatic complexity is:

E = 9 (8, +1 for the entry point to the method)
V = 6 (all statements + the entry point + exit point RETURN)
P = 1 (a single method)

So C = 9 - 6 + (2 x 1) = 5. Recognise that number? It's the number of post-conditions (end points) in the middle graph, the optimised tree.
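The arithmetic is simple enough to do in your head, but as a small sketch, assuming a hypothetical helper called Complexity that takes the E, V and P counted above:

```csharp
using System;

public static class Cyclomatic
{
    // C = E - V + 2P, using the symbols defined in the text.
    public static int Complexity(int e, int v, int p) => e - v + 2 * p;

    public static void Main()
    {
        // E = 9, V = 6, P = 1 for the expanded RETURN statement.
        Console.WriteLine(Complexity(9, 6, 1)); // prints 5
    }
}
```

The counting of E and V is still a manual job here; the helper only applies the formula.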

Why are they different?

This may sound daft, but materially they're not! If you look at the number of tests we're running, a lot of them are asserting against the same end result. Specifically, the paths that return FALSE all return FALSE for exactly the same reason: they failed one section of the AND return statement. It doesn't matter if one, two or three subconditions evaluated to false, as effectively they are the same test assertion (i.e. return FALSE).

So what is 100% coverage?

This is where it gets interesting. 100% coverage should be the number of tests required to cover the whole control flow of the programme. However, using the example, people often confuse this with having to cover that return statement with 16 tests and not 5! 16 is the maximum number of tests you'll ever have to write. This often matches exploratory testing techniques, since you have to fill in all combinations of data to determine that there are only 5 relevant execution paths anyway. The 5 is a supremum of the subset of all possible test coverages that cover the code 100% (or more, technically).

Why is that? I'll cover the mathematical treatment of that in a future post, which will also introduce ways to determine the number of tests you actually need. However, in short, it all revolves around the AND statement. Any one of its conditions can make it return FALSE, so the internal control flow can just return FALSE without evaluating anything in the AND chain after that point. However, there is only one combination that allows it to return TRUE. This is why you only need to have 5 tests instead of 16.

If you consider all the tests that offer 100% (or above) coverage, you only need to test to the 100% point and that's it (it's the supremum you want, not the maximum). Covering the other evaluations of the AND just duplicates the Assert.IsFalse(...) tests, which is near enough pointless.
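As a sketch of what those 5 tests might look like, here they are written as plain checks so the example runs without a test framework. The class name FivePathTests, the Check helper and the sample values are assumptions of mine; in a real suite each check would be an Assert.IsFalse(...) or Assert.IsTrue(...) in your framework of choice.

```csharp
using System;

public static class FivePathTests
{
    // The method under test, repeated so the sketch is self-contained.
    public static bool IsLegalDriver(int age, bool hasLicense, DateTime carTaxExpiry, bool hasInsurance)
    {
        return (age > 17) && hasLicense
            && (carTaxExpiry >= DateTime.Today) && hasInsurance;
    }

    // Stand-in for a framework assertion: throws if the result is not what we expect.
    public static void Check(bool expected, bool actual, string name)
    {
        if (expected != actual) throw new Exception("Failed: " + name);
    }

    // Runs one test per execution path and returns how many ran.
    public static int RunAll()
    {
        DateTime valid = DateTime.Today.AddDays(30);
        DateTime expired = DateTime.Today.AddDays(-30);

        Check(false, IsLegalDriver(17, true, valid, true), "fails on age");
        Check(false, IsLegalDriver(18, false, valid, true), "fails on license");
        Check(false, IsLegalDriver(18, true, expired, true), "fails on car tax");
        Check(false, IsLegalDriver(18, true, valid, false), "fails on insurance");
        Check(true, IsLegalDriver(18, true, valid, true), "legal driver");
        return 5;
    }

    public static void Main()
    {
        Console.WriteLine(RunAll() + " path tests passed");
    }
}
```

Each FALSE test fails exactly one condition and passes everything before it, so every early exit is hit once and only once.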

Conclusion

I personally find test coverage metrics extremely important. As you sail through the sea of a development programme, they are the robustness of the regression bucket you'll need to bail with when bugs are found in your system. The lower the coverage, the more holes the bucket has, the less water you can bail out of your canoe and the more likely you are to sink. Because it offers a shield against the slings and arrows of outrageous misfortune, you're more likely to find out if shooting Jaws shot a hole in your bucket too.

Coverage metrics are both governance and risk management for a code-base. If someone says to you "Code coverage metrics don't define software quality", I'd agree on the semantics, since coverage is not software quality. But I'd also argue that, indirectly, it can very definitely show you where there are holes in your process, which are by far the most likely places to introduce poor quality software into the enterprise. So where systems have value, don't skimp! Cyclomatic complexity should match the number of tests you have for the main control flow (obviously, add more for exceptions as needed). If it doesn't, then you're either missing some, or you've likely got duplication.

Happy TDD!

Thursday, 11 December 2014

What's wrong with a Little predictability?

I was asked recently about Little's law. For the uninitiated, it is a fundamental, but elegant result in queuing theory. It's akin to the simplicity of Einstein's 'E' equals mc squared as it reduces a whole heap of complexity into a few simple variables. It is now finally being applied to software Kanban having existed way before the field of software engineering ever existed.

In software, it's pretty simple and relates the average number of cards in play (between the backlog and done) to the average cycle time and arrival rate. If your arrival rate is the same as your service rate, which in Scrum you would expect it to be if you're delivering all your cards in that Sprint's time period, you end up with a pretty good link.
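For anyone who prefers to see it written down, Little's law is usually stated as L = λW: the average number of items in the system equals the average arrival rate multiplied by the average time an item spends in the system. A minimal sketch follows, where the class name, the method names and the numbers (5 cards a week, 2 week cycle time) are purely hypothetical:

```csharp
using System;

public static class LittlesLaw
{
    // L = lambda * W: average cards in play = arrival rate * average cycle time.
    public static double AverageCardsInPlay(double arrivalRatePerWeek, double cycleTimeWeeks)
        => arrivalRatePerWeek * cycleTimeWeeks;

    // Rearranged: W = L / lambda.
    public static double AverageCycleTimeWeeks(double cardsInPlay, double arrivalRatePerWeek)
        => cardsInPlay / arrivalRatePerWeek;

    public static void Main()
    {
        // Hypothetical team: 5 cards arrive per week, each takes 2 weeks on average.
        Console.WriteLine(AverageCardsInPlay(5, 2));      // 10 cards in play on average
        Console.WriteLine(AverageCycleTimeWeeks(10, 5));  // 2 weeks
    }
}
```

Note these are all long-run averages, which is exactly where the trouble starts.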

So what's the problem?


The issue is (again) that people miss a crucial detail. It's how KISS differs from Occam's razor and how folk abuse the agile manifesto. Remember the items on the right? Now do you remember the last statement that references them? ("Whilst there is value in the items on the right, we value the items on the left more").

With Little's law, it is that the team has to attain predictability. That predictability is the team consistently delivering the same number of points every sprint and/or having a consistent cycle time. Little's law doesn't technically have a stochastic component, so obviously needs stability to attain a zero variance. The problem you have, especially at the beginning of each 'project' [*grumble* *humbug* need #NoProjects] is that you do not have that stability. Teams can under or over-perform, so there isn't stability. That said, a team that is also improving and delivering 'more', which is always desirable, then has the disadvantage that they're not naturally stable! They are delivering more, so naturally the average changes.

But isn't improving a good thing?

Totally! It's the best thing you can do! However, if you are hoping to use Little's law to project/forecast in an environment which is improving, you can't do it because of this. At least, you can't do it without the introduction of a stochastic component, or comparing against the desired burn-up. Believe it or not, improving is instability which naturally increases the variance of the delivery as a whole. That's your trade off! Continual improvement means you cannot gain the stability needed to use Little's law!

*shock horror*

Are you sure?

Yep, very!

Consider the following graph. It shows a team's data where they do not improve their delivery and are running late from the start. If projecting forwards, their variance is very narrow. They're going to be very late, but you can be pretty sure they are going to be late. If you plot the projection of the end of the 'project' through the average burn-up as you accumulate ACTUAL data, you'll see where it's likely to be:


Team who do not improve


Little's Law could be used here to project where they're going to be and if you look at the range of possible outcomes in the time allotted or the time variance needed to complete the scope (remembering the golden triangle) you'll see this is much narrower than the team who improve below!

Team practising continuous improvement

Here Little's Law is pretty much no use! Indeed, in most teams, you can't get enough of a data set for each improvement to measure the average and deviation reliably.

Conclusion: What to do?

At the end of the day, you're just trying to give yourself the best chance. It’s not intuitive to applaud greater variance, since that’s normally greater risk, but because the variance needs to ‘cross’ a value (the average, which in this case is the original burn-up, i.e. 'fixed' at the outset if scope is fixed), it’s the points you deliver ‘above the thick red line’ that count. If it swings wildly with the majority of the mass below the red line, you’re scr3wed. If it’s above, you're rocking! This is why I prefer to get the average of teams to be on or above the red line and then reduce the variance, since this gives you greater certainty about the burn-up rate.

So, in short, there are a million and one tools out there to help folk with software development and predictability. Teams have to be careful they don't pick a tool and misapply it, and it's a tool's limits that often tell us whether it is appropriate to use it or not. The situations where it doesn't work may outnumber the ones where it does. We're not all hammering nails, after all.

Friday, 21 November 2014

#LeanConf 2014: 4 Fave Presenters

Short one this one, as it has been an exhausting week!

I was at #LeanConf in Manchester this week and of the amazing and inspiring speakers, there were a few that stood out. My top 4 were:

Ton Wesseling 

twitter handle: @tonw

Hands down my personal favourite presenter there! Being a bit of a data geek myself, I loved the data and educational elements of his presentation. Whilst not new to me, he's the sort of guy in the industry who can help organisations close the learning loop by allowing you to truly understand your data, improvements, A/B-test results, what to focus on and what to ignore. When you are the only guy in pretty much every single company you go into who walks and talks agile metrics, performance, statistics, learning, data, data and more data, it can get to be a very lonely place until you find another person in the world who shares the same passion, knows what's just enough, and understands both its importance and pitfalls.

I took the time to speak to Ton after his presentation, specifically about how to get statistical thinking into some teams, as this often requires bridging a huge skills gap, and his answer was pretty simple. Employ psychologists! I have long thought that psychologists have a place in organisations, but I have yet to be convinced that I could justify suggesting a formal psychologist role at team level, so I steered clear of suggesting them. Psychologists bring both human psychodynamics AND statistics to the table, since they have to study both. So having this suggestion come from someone who's done it does add some validity to the idea, and I look forward to trying it out.

Janice Fraser 

twitter handle: @clevergirl

My favourite presentation from an entertainment point of view. It was awesome to see her present and she had me and the rest of the audience in fits of laughter! My stomach was aching the whole day after as if I'd had a session at the gym... and I do go to the gym! Her presentation covered Gabe Zichermann's new educational system and how its use of games and puzzles to educate helped promote curiosity and traditional skills in education. I have to vouch for this, as whilst I was classically educated, it was the stuff I did outside school that put it into practice and hence allowed me to score highly in school/college/uni yet not have to do a single day's worth of revision, because these were skills I used all the time. Definitely think there is something in this.

Tristan Kromer 

twitter handle: @TriKro

My best memory award goes to Tristan. His slides didn't work unfortunately, but he blasted through the whole presentation, by heart, without missing a step. Awesome professionalism!

This isn't to say that other presenters weren't good, as it was a tough choice. Everyone will have a different favourite 4. For example, Barry O'Reilly from ThoughtWorks provided an informative talk on a classical Enterprise Agile problem, optical illusions and plenty of Watermelons :)

Ash Maurya

twitter handle: @ashmaurya

The author of Running Lean spoke about how companies are basically customer factories. They produce happy customers. He also talked about testing the market and the crucial feedback loop that allows the factory to respond to market opinion and change. He's certainly well aware of the need to consider the data when deciding how much to invest and work with.

Enjoyed #LeanConf! Especially since I won a copy of Ash Maurya's book, Running Lean, for asking a question at the right time. Looking forward to next year! :)