Wednesday 1 April 2015

Lean-Agile Metrics: Like it or Not, Stats Rules!

I've been wanting to write this blog for the best part of 4 years (I have a few of these I've been meaning to write up, to be fair). I've only just got round to finally doing the necessary mathematical proof...

Wait, where are you going? Come back!!


If you don't listen to this one, you likely aren't data-driving your retros, aren't effectively self-managing and could stall your agile transformations. It's not just about the coding, you know! You can't embrace change if you don't know what is changing around you!

What prompted this?

Ignoring the shoes I'm wearing... wait, you mean me to write it up?... Ah, yeah. That...

*shifty look*

It was a LinkedIn group discussion, in which it yet again became abundantly clear that we're missing some understanding of lean in the software world.

Tell you what, I'll make it simple. I'll use terms you're used to before you freak out. I'll use the context of software development, since this is an arena I'm intimately familiar with. The key bit to concentrate on is the cycle-time.

Cycle-Time isn't quite what you think

Cycle-time as we know it is the average time taken to process a thing. From the point of view of software, let's consider a #NoEstimates or single-size story-point ticket (I prefer to move beyond that, but for now, this will do) on a super simple Kanban board of 'Doing' and 'Done'. However, this generalises to any type of flow.

Single stage Kanban board

Each item's individual cycle time (in days, say) can be modelled as shown just under the stage box. This states that the cycle time t for any individual ticket is the average cycle time (t 'bar') of ALL tickets through this stage plus a variation (delta-t) around it: t = t̄ + Δt. For example, if the average cycle time is 5 days and this task takes 6 days, the variation is 6 - 5 = 1 day. Rearranged, that is 6 = 5 + 1: the cycle time for a task is the average cycle time for that stage plus its variation.
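To make the decomposition concrete, here is a minimal Python sketch that splits each ticket's cycle time into the stage average plus its own variation, t = t̄ + Δt. The ticket durations are hypothetical, chosen so that the average is 5 days and one ticket takes 6, as in the example above; `decompose` is just an illustrative name.

```python
from statistics import mean

def decompose(cycle_times):
    """Split each ticket's cycle time t into the stage average t_bar plus a variation delta_t."""
    t_bar = mean(cycle_times)
    return t_bar, [t - t_bar for t in cycle_times]

# Hypothetical cycle times (days) for five tickets through a single 'Doing' stage.
times = [4, 5, 6, 5, 5]
t_bar, deltas = decompose(times)
print(t_bar)   # the stage average: 5 days
print(deltas)  # each ticket's variation around that average

# Every ticket satisfies t = t_bar + delta_t, e.g. 6 = 5 + 1.
assert all(t == t_bar + d for t, d in zip(times, deltas))
```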

However, we can't make a decision on one data point. That is like flipping a coin, getting heads and stating it will always be heads. So we run it again and again, which happens naturally in an iteration as you deliver tickets. Ideally, you'll deliver at least 25 tickets, which gives us a good level of certainty in any results we draw at the retro... you are data-driving your retros, aren't you? ;-) If you are not delivering 25, this may be an opportunity to recalibrate by resizing your stories so that you get enough data points, which naturally makes the variance on each story smaller anyway. I'll be giving a talk on this soon (shameless plug), so if you're in Manchester in April, pop in to Lean-Agile Manchester and I'll try to explain it in a slightly friendlier way... but not much. It's just the way I roll.
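One standard way to see why more delivered tickets buy more certainty (my framing, not something spelled out here) is the standard error of the mean, s / √n: the uncertainty in the measured average cycle time shrinks with the square root of the number of tickets. A minimal sketch, assuming a hypothetical standard deviation of 2 days:

```python
import math

def standard_error(sample_sd, n):
    """Standard error of the mean: roughly how far the measured average may sit from the true one."""
    return sample_sd / math.sqrt(n)

# Hypothetical: ticket cycle times with a standard deviation of 2 days.
for n in (5, 10, 25, 50):
    print(n, round(standard_error(2.0, n), 3))
```

Going from 5 to 25 tickets cuts that uncertainty by a factor of √5 ≈ 2.24. The 25-ticket threshold itself is a rule of thumb; the formula only says that more data points keep helping, with diminishing returns.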

For the sake of illustration, I've used just 5 samples so you can see how it fits together. The average comes out as 31 in the example, and the standard deviation is 2.828 (2 x square root of 2, on the right). The coefficient of variation is simply the standard deviation divided by the average, which in this case is 2.828 / 31 ≈ 9.1%. Pretty small.
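The arithmetic for the five-sample example can be reproduced in a few lines of Python. The original sample values live in the figure and aren't recoverable here, so the list below is hypothetical, picked only because it has the same mean (31) and population standard deviation (2√2 ≈ 2.828) as quoted.

```python
from statistics import mean, pstdev

def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean: a unitless ratio."""
    return pstdev(samples) / mean(samples)

# Hypothetical samples with mean 31 and population standard deviation 2*sqrt(2).
samples = [27, 29, 31, 33, 35]
print(mean(samples))                               # 31
print(round(pstdev(samples), 3))                   # 2.828
print(f"{coefficient_of_variation(samples):.1%}")  # 9.1%
```

The sketch uses the population standard deviation (`pstdev`) because that is what reproduces 2.828; the sample version (`statistics.stdev`, dividing by n - 1) would give √10 ≈ 3.162 for the same list.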

Kanban: Cycle-Time for Multiple Stages

This small deviation doesn't hold for larger exercises. If we chain a series of these together, say into a 3-stage Kanban board (Elaborate, Doing, QA, Done), we get:

3 stage Kanban

Again, we can determine the variation as before, but this time the total variation is influenced by the spread from the earliest finish time of the first task to the latest finish time of the final task. The proof is above, and the numbers tell the story: 8.7 / 83.2 = 10.46%, which is an increase in the coefficient of variation of 14.9% for this Kanban configuration and the cycle-times through each stage. You'll note I deliberately didn't compare means, since there is nearly 3 times as much 'work' going on. Nor did I directly compare the variances with each other, since we know the variance on a longer chain runs from the earliest start time to the latest finish time.
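The comparison is just a ratio of two ratios, so it is easy to check. This sketch uses only the figures quoted in the post (2√2 / 31 for the single stage, 8.7 / 83.2 for three stages). Note that with the unrounded single-stage value the increase comes out nearer 14.6%; the 14.9% figure follows from using the rounded 9.1%.

```python
import math

# Figures quoted in the post: single stage (sd 2*sqrt(2), mean 31)
# and three stages (sd 8.7, mean 83.2).
cv_single = 2 * math.sqrt(2) / 31
cv_three = 8.7 / 83.2

relative_increase = cv_three / cv_single - 1
print(f"single-stage CV: {cv_single:.2%}")          # 9.12%
print(f"three-stage CV:  {cv_three:.2%}")           # 10.46%
print(f"increase:        {relative_increase:.1%}")  # 14.6%

# With the rounded percentages (10.46% against 9.1%) the increase is 14.9%.
rounded_increase = 10.46 / 9.1 - 1
print(f"increase (rounded inputs): {rounded_increase:.1%}")  # 14.9%
```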

The coefficient of variation basically normalises the standard deviation relative to the size of the tasks at hand. Hence, this is the best comparator and is something that can be used between teams to compare team certainty if you feel like being dark and monitoring at programme level.
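To see why the coefficient of variation is the fairer comparator between teams, consider two hypothetical teams whose tickets differ in size by a factor of ten but have proportionally the same spread. Their raw standard deviations differ tenfold, yet their coefficients of variation match. The data below is invented purely for illustration.

```python
from statistics import mean, pstdev

def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean, so tasks of different sizes are comparable."""
    return pstdev(samples) / mean(samples)

# Hypothetical cycle times (days) for two teams with very different ticket sizes.
team_small_tickets = [4, 5, 6]
team_large_tickets = [40, 50, 60]

# Raw spreads differ by a factor of 10...
print(pstdev(team_small_tickets), pstdev(team_large_tickets))
# ...but relative to ticket size, both teams are equally predictable.
print(coefficient_of_variation(team_small_tickets))
print(coefficient_of_variation(team_large_tickets))
```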

Real World Applications

The beauty of this is that it scales 'fractally'. The maths can apply to a person, a stage in a board, a team, a business vertical/systemic flow of multiple teams, a programme, etc. Both classical and modern agile groups have been guilty of concentrating solely on 'average throughput' and 'average cycle time', but there comes a point where this doesn't wash any more and consistency becomes key. Understanding and controlling for the variation allows you to gain a level of predictability you otherwise wouldn't achieve.

Basically, the lower the coefficient of variation relative to the cost-benefit of getting there, the better! This is another reason why I agree with a number of commentators who propose that we include [business] value in stories, since this hard-to-quantify measure is then in there from the start.

Sweet Spot

This very much depends on a host of factors, including the organisation's appetite for risk, the value they hope to achieve, when they go live to achieve it, any contingency budget and, of course, how well the team recalibrates along the way. Indeed, I'd even go so far as to say it's a range of values.

Hence, in software, practices such as Continuous Deployment and deploying MVPs are better suited to this than most: value increments are zero before delivery, and A/B-testing new changes should aim to improve the delivered value relative to the uncertainty. So anything I'd say here would be conjecture without a theoretical base, but I'll give you one conjecture.

This is really about the link between the expected path you could take and how far the variance can grow before it breaches a series of control limits. In older, larger-batch flows with long lead times, this compounding variance causes a very wide variation by the end of a project. This is the cone of uncertainty, which I covered last year in the face of the #NoEstimates movement.

To understand how the cone of uncertainty applies here, let's put ourselves at the origin and look toward the deadline date. This is the solid red line in the bottom graph. The further forward we look, the more uncertain our future looks.

The above shows two graphs aligned to each other. The top one is the usual J-curve, and the lines around it, green or red, show the uncertainty as it would be defined by the coefficient of variation, since that's the measure of dispersion as tickets and value accumulate.

In classical environments (top graph, red dotted line), those limits are wide because the uncertainty is wide, and they are still regularly breached. By contrast, the green dotted lines show the coefficient after each iteration has completed and we reassess the coefficients during each retrospective. To understand that uncertainty curve, look at the bottom graph. As we progress into a project, each iteration we deliver is no longer uncertain, since we've delivered it. It's out! The only uncertainty that remains is in the rest of the project, which often 'resets' the uncertainty to the levels now understood from the actually delivered functions. This is the same idea as in my 10 coin toss post from last year.

This naturally means the control limits move. Hence, overlaying this on the J-curve as we did with the red dotted line, we can see how the range progressively narrows as each iteration delivers. The key point, though, is that you only get this narrowing of uncertainty if you are measuring and acting on something! Some would argue that waterfall measured, which it did, but it rarely acted, since acting usually required a huge course correction, and if that correction was slower than the market changed, the project was set for a huge crash.
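The narrowing band can be sketched numerically. This is a minimal illustration under my own simplifying assumptions, not a model taken from the graphs: the per-iteration ticket counts are hypothetical, delivered work is treated as certain, and after each iteration the mean and coefficient of variation are recomputed from what has actually been delivered (as at a retro), with a band of ± CV × (expected remaining work) around the forecast total. `forecast_band` is a made-up name for this example.

```python
from statistics import mean, pstdev

def forecast_band(actuals, total_iterations):
    """Hypothetical helper: after each iteration k (from the 2nd on), yield
    (k, low, high) bounds for the total delivered by the final iteration.

    Delivered work counts as certain; only the remaining iterations carry
    uncertainty, scaled by the coefficient of variation observed so far.
    """
    for k in range(2, total_iterations + 1):
        seen = actuals[:k]                    # iterations already delivered
        delivered = sum(seen)
        avg = mean(seen)
        cv = pstdev(seen) / avg               # recalibrated at each retro
        expected_remaining = (total_iterations - k) * avg
        half_width = cv * expected_remaining  # shrinks as remaining work shrinks
        centre = delivered + expected_remaining
        yield k, centre - half_width, centre + half_width

# Hypothetical tickets delivered in each of six iterations.
tickets_per_iteration = [24, 27, 25, 26, 23, 25]
for k, low, high in forecast_band(tickets_per_iteration, total_iterations=6):
    print(f"after iteration {k}: {low:.1f} to {high:.1f} tickets")
```

With these numbers the band collapses from about ±6 tickets after iteration 2 to zero after iteration 6. Scaling the band linearly with the remaining work is a deliberately simple choice; other ways of combining per-iteration spread are possible.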

The sweet spot range is the coefficient of variation at each stage in the project life-cycle, at i = 0, 1, 2, 3, ..., n. The more frequently you sample, the less likely it is that the coefficient of variation being tracked will fall outside those limits, which, again, are value [at risk] dependent. Indeed, if you look at this from the point of view of a dynamical system, Lyapunov exponents relating the actual delivery to the coefficient of variation are likely to give you a nice threshold measure, but that's my one conjecture :)


This is a heavy topic for most to grasp, but once you have the fundamentals, it can massively transform the way you think about constraints and systems, especially people ones. It's only appropriate for the most advanced lean teams. I appreciate that a lot of people will find this very scary, so you're welcome to get in touch via email at ethar [at] with specific questions and your value measure, and I'll see what I can do to help.


