Sunday, 19 April 2015

Lowering Chances, Mitigating Risks or Both?

I was talking at Lean-Agile Manchester this week. It was a chock-full event which necessitated the adoption of extra chairs.

A number of the XP Manchester folk were in, which is always entertaining, since the two groups have overlapping interests but, as with many agile vs lean schools, we don't necessarily come to an agreement on the best way forward for things.

There were some great questions through the night, including the ones from the hecklers! Most centred around data from some graphs I showed from a previous blog post, the maths of which I tried not to go into due to the typical spread of the audience. I offered to take it offline so as not to bore the audience, but there wasn't the appetite from the questioner, so a smackdown happened. They then agreed to take it offline but never got back to me, darn it! (#invitestillopen)

Background

What's the reason for the graphs?

Several years ago, I was working in a company which was on the proverbial agile journey. They were still thinking in very big-design ways and were managing programmes of work through standard programme and project management methods. The company's attempts to have conversations around agile programming were not really working, and the second attempt at them (i.e. just do the work and they will come) didn't reach far enough for anyone in positions of enough power to take the effort seriously. This resulted in a somewhat disconnected hybrid method which saw lower levels doing the work with upper levels of management and EA imposing design on the teams, with PMs backing up the EAs as authority on that work.

In addition to that, teams spent the vast majority of retrospective time generating new ideas for working together (good, bad, change) including grouping tasks, voting and setting options for the next iteration. However, no retrospective ever came back to check that these did indeed improve the process and that any overhead we introduced as part of each task was actually worth it. Further actions just built on top of these actions and you gradually built up greater overhead in each iteration.

The team had successfully implemented WIP limits (though that started off quite painfully) and were measuring cycle time and throughput since this was easy for them to visualise in a JIRA Dashboard. We saw a burn down but it wasn't clear whether our flow was any good and indeed, whether we were improving at all.

Add to this the need from classical project management to get an idea of the length of time things would take, as well as the need from programme management to align the streams of work, and we had to get to know something about whether we could actually hit the hard deadline. Those that know me know I think aligning work the SAFe way or classical PERT way introduces inherent risks, but the environment was what it was and each change begins with a small step, not a 'Big-Destroy Enterprise Programme'. After all, as a dev, you're an easy replacement anyway to that style of culture (not that you necessarily have to worry about it in the IT game but it's an important consideration).

Who wanted it?

The graph/points estimation wasn't necessarily to get the team to improve delivery per se. That was not the purpose of the exercise. It was to give confidence that when we were challenged to produce an estimate, we could do so reliably and provide some confidence to the supporting classical thinking personnel we're talking to that we can and have delivered x features in t. It was to lower the variation and give confidence to those who wanted to support us that we could deliver and were improving. This was a tool to help them do that and get the buy in they needed, which took half an hour a week for someone to do (indeed, I did it - but any scrum-master or tech lead can do it in an enterprise context).

Why should you care?

The answer depends on the context you work in. In an agile-sympathetic environment, this isn't really necessary at all. After all, everyone is confident and comfortable with change. However, where a hybrid exists or companies are transitioning, sometimes these conversations are necessary. Later on, they may not be relevant any more. Enterprises can evolve as much as people do.

The Follow-up Questions

During the talk, some questions were asked and I agreed to produce some follow-up graphs from the data. In order to understand some parts of this, I'd suggest you go back and read the method presented in that blog post, as this will explain what look like 2-pt and 5-pt story 'anomalies' as we shifted our understanding of story sizes.

Cone of Uncertainty - Variation Over-time

Specifically, taking the variation between our expectation and actual delivery, plotting it and calculating the Coefficient of Variation to standardise the scales of the graphs, we can plot the change in the coefficient over time. What we see for each story size (in points) is this:


Story point variation (CV) and polynomial trend line

To keep things simple(r), I've added a cubic polynomial trend line to illustrate a smoothed variation. I haven't done anything else to the trend line and Excel has chosen the shape that minimises the sum of squares. We can relate actual uncertainty to the variation in story point figures. The same downward trend on variation is seen in linear and logarithmic trend lines. As you can see, most trends show the reduction in uncertainty as we recalibrate our positions.
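For anyone who would rather script this than use Excel, here is a minimal sketch of the same idea in Python. The cycle times are invented purely for illustration (they are not the data behind the graph), and it fits a straight-line trend rather than the cubic to keep the code short. The helper names `cv` and `linear_trend` are my own, not from any tool.

```python
from statistics import mean, pstdev

def cv(samples):
    """Coefficient of variation: standard deviation divided by the mean."""
    return pstdev(samples) / mean(samples)

def linear_trend(ys):
    """Least-squares slope and intercept of ys plotted against 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical cycle times (days) for one story size, one list per iteration.
iterations = [
    [2, 6, 3, 8],
    [3, 6, 4, 7],
    [4, 6, 4, 6],
    [5, 6, 5, 5],
]

cvs = [cv(batch) for batch in iterations]
slope, intercept = linear_trend(cvs)
print([round(c, 3) for c in cvs], "slope per iteration:", round(slope, 3))
```

A negative slope is the scripted equivalent of the downward trend lines in the graph: the variation relative to the mean is shrinking from one iteration to the next.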

Limitations

The only exception to the general trend is the 8-pt story size, which curves slightly upwards (not significantly enough over linear to be concerned about). Additionally, due to the team rightly reducing larger 13-point stories into smaller stories, there are only a few 13-point stories in the dataset. I argued there were not enough to come to a conclusion or indeed worry about going forward, especially as most became 8-point stories as a natural part of story splitting and recalibration (again, read the previous blog post).

Conclusion

As I explained in the talk the other day, estimation such as this isn't an end goal. This is a technique in the repertoire to provide confidence for those who can support us to become more agile. After all, working in the Enterprise Architecture space necessitates communicating in many different companies, with many different types of stakeholder, including non-technical personnel and those without a software development background. Not every EA problem is a software development problem. Indeed, to approach it from that perspective is to architect before it's necessary, if it's needed at all!

Digression

As an example, consider walking skeletons, which can be just as problematic in code, since they make explicit choices on the technology stack way before a decision is needed on the suitability or otherwise of the tech, but they are useful tools to experiment with when you have a tech stack already and want to gain certainty. However, employing just a walking skeleton is like having Maslow's Hammer. It risks introducing technology into a non-existent current stack when the basics of what people want are unknown. In this case, you don't need a skeleton per se. Just throw together a UI mock-up and deploy that to a static environment (even a file system) to get people using it to input data that never gets stored. This can be done in a few minutes, compared to creating a walking skeleton, which can take a couple of hours to get the same amount of feedback, can potentially be constrained by infrastructure problems and will require some prerequisite work. So bang for buck, if the question is trying to find out if Henry Ford's customers wanted faster horses, this would be cheaper to do than a walking skeleton and yields just as much value. The second meeting can fill this out with a skeleton if you want, since by this point you have more information to base choices on.

Risk and Sensitivity

You have two non-mutually exclusive choices to deal with risk. The first is to reduce the chance of it occurring, which this technique fits into. The other is to mitigate the impact should the risk occur, which this doesn't address and isn't intended to. So this can only be one of many tools in the team's arsenal for dealing with tracking, recalibration and risk reduction and, as we can see, there are specific scenarios this addresses really well. The question is, what other techniques exist to address the same problem?

Further Updates

I will answer some of the other questions in time and post them as updates to this blog.

Wednesday, 1 April 2015

Lean-Agile Metrics: Like it or Not, Stats Rules!

I've been wanting to write this blog for the best part of 4 years (I have a few of these I've been meaning to write up to be fair). I've only just got round to finally doing the necessary mathematical proof...

Wait, where are you going? Come back!!

*sigh*

If you don't listen to this one, you likely aren't data driving your retros, aren't effectively self-managing and could stall your agile transformations. It's not just about the coding you know! You can't embrace change if you don't know what is changing around you!

What prompted this?

Ignoring the shoes I'm wearing... wait, you mean me to write it up?... Ah, yeah. That...

*shifty look*

It was a LinkedIn group discussion, as it yet again has become abundantly clear that we're missing some understanding around lean in the software world.

Tell you what, I'll make it simple. I'll use terms you're used to before you freak out. I'll use the context of software development, since this is an arena I'm intimately familiar with. The key bit to concentrate on is the cycle-time.

Cycle-Time isn't quite what you think

Cycle-time as we know it is the average time taken to process a thing. From the point of view of software, let's consider a #NoEstimate or single size story-point ticket (I prefer to move beyond that, but for now, this will do) on a super simple Kanban board of 'Doing' and 'Done'. However, this generalises to any type of flow.

Single stage Kanban board


Each item's individual lead time in days say, can be modelled as shown just under the stage box. This states that cycle time t for any individual ticket is the average cycle time (t 'bar') of ALL tickets through this stage plus a variation (delta-t) around it. For example, if the average cycle-time is 5 days and this task takes 6 days, the variation is 6 - 5 = 1. This can also be rewritten as 6 = 5 + 1 which describes that the cycle-time for a task is the average cycle-time for that stage plus the variation.

However, we can't make a decision on one data point. That is like flipping a coin, getting heads and stating it will always be heads. So we run it again and again, which happens naturally in an iteration as you deliver tickets and ideally, you'll deliver at least 25 tickets, which gives us a good level of certainty in any results we draw at the retro... you are data-driving your retros aren't you? ;-) If you are not delivering 25, then this may be an opportunity to recalibrate by resizing the stories you have so that you can get enough data points, which naturally makes the variance on each story smaller anyway. I'll be giving a talk on this soon (shameless plug), so if you're in Manchester in April, pop in to Lean-Agile Manchester and I'll try to explain it in a slightly friendlier way... but not much. It's just the way I roll.

For the sake of illustration, I've used just 5 samples so you can see how it fits together. You get an average from this, which comes out as 31 in the example, and the standard deviation, which is 2.828 (2 x square root of 2 on the right). The coefficient of variation is simply the standard deviation divided by the average, which in this case is 9.1% of the average. Pretty small.
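If you want to check the arithmetic yourself, here is a small Python sketch. The five cycle times are hypothetical: I have picked values that reproduce the quoted average of 31 and standard deviation of 2.828, so they may not be the exact samples from the diagram. It also assumes the population form of the standard deviation (dividing by n), since that is what gives 2 x square root of 2 for these values.

```python
from statistics import mean, pstdev

# Hypothetical cycle times in days, chosen only to match the quoted summary figures.
cycle_times = [27, 29, 31, 33, 35]

t_bar = mean(cycle_times)                   # average cycle time (t 'bar')
deltas = [t - t_bar for t in cycle_times]   # each ticket's variation (delta-t) around the average
sigma = pstdev(cycle_times)                 # population standard deviation
cv = sigma / t_bar                          # coefficient of variation

print(f"mean={t_bar}, deviations={deltas}")
print(f"std dev={sigma:.3f}, CV={cv:.1%}")
# prints:
# mean=31, deviations=[-4, -2, 0, 2, 4]
# std dev=2.828, CV=9.1%
```

Swapping `pstdev` for `stdev` (the sample form, dividing by n - 1) would give a slightly larger figure, which matters more with 5 tickets than with 25.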

Kanban: Cycle-Time for Multiple Stages

This small deviation isn't the same for larger exercises. If we chain a series of these together, say into a 3 stage Kanban board (Elaborate, Doing, QA, Done) we get

3 stage Kanban

Again, we can determine the variation as before, but this time, the total variation is influenced by the earliest finish time of the first task, to the latest finish time of the final task. The proof is above, and the numbers tell the story. 8.7 / 83.2 = 10.46%, which is an increase in the coefficient of variation of 14.9% for this Kanban configuration and the cycle times through each stage. You'll note I deliberately didn't compare means, since there is nearly 3 times as much 'work' going on, and I didn't directly compare variances with each other, since we know the variance is the earliest start time to the latest finish time on a longer chain.
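The comparison itself is just two divisions and a ratio, sketched below in Python using only the summary figures quoted in this post (it does not reproduce the proof in the diagram). Note that working from the unrounded ratios gives an increase nearer 14.6%; the 14.9% quoted comes from dividing the rounded percentages, 10.46 by 9.1.

```python
def cv(std_dev, average):
    """Coefficient of variation: standard deviation relative to the average."""
    return std_dev / average

single_stage = cv(2.828, 31)   # figures quoted for the single stage board
three_stage = cv(8.7, 83.2)    # figures quoted for the 3 stage board

relative_increase = three_stage / single_stage - 1
rounded_increase = 10.46 / 9.1 - 1   # same comparison using the rounded percentages

print(f"{single_stage:.2%} -> {three_stage:.2%}")
print(f"increase: {relative_increase:.1%} (unrounded), {rounded_increase:.1%} (rounded inputs)")
```

Either way the point stands: the longer chain carries more variation relative to its size than the single stage does.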

The coefficient of variation basically normalises the standard deviation relative to the size of the tasks at hand. Hence, this is the best comparator and is something that can be used between teams to compare team certainty if you feel like being dark and monitoring at programme level.

Real World Applications

The beauty of this is that it scales 'fractally'. The maths can apply to a person, a stage in a board, a team, a business vertical/systemic flow of multiple teams, a programme etc. Both classical and modern agile groups have been guilty of just concentrating on the 'average throughput' and 'average cycle time' when there comes a point where this doesn't wash any more and consistency becomes key. Hence, understanding and controlling for the variation allows you to gain a level of predictability you otherwise wouldn't achieve.

Basically, the lower the coefficient of variation relative to the cost-benefit of getting there, the better! This is another reason why I agree with a number of commentators who propose that we include [business] value in stories, since this hard-to-say measure is in there from the start.

Sweet Spot

This very much depends on a host of factors, including the organisation's appetite for risk, the value they hope to achieve, when they go live to achieve it, any contingency budget and of course, how well the team recalibrate along the way. Indeed, I'd even go so far as to say it's a range of values.

Hence, in software, there are practices such as Continuous Deployment and deploying MVPs which are better suited to this than most. Value increments are zero before delivery, and A/B-testing new changes should aim to improve the delivered value relative to the uncertainty. So anything I'd say here would be a conjecture without a theoretical base, but I'll give you one conjecture.

This is really a link between the expected path you could take and the amount of variance, to the point that the variance breaches a series of control limits. In older, larger batch flows, with long lead times, this compounding variance causes a very wide variation by the end of a project. This is the cone of uncertainty. I've covered this before, in the face of the #NoEstimates movement, last year.

To understand how the cone of uncertainty applies here, let's put ourselves at the origin and look toward the deadline date. This is the solid red line in the bottom graph. The further forward we look, the more uncertain our future looks.



The above shows two graphs aligned to each other. The top is the usual J-Curve and the lines around it, green or red, show the uncertainty as it would be defined by the coefficient of variation, since that's the measure of dispersion as tickets and value accumulate.

In classical environments (top graph, red dotted line), those limits are wide as the uncertainty is wide and are still regularly breached. By contrast, the green dotted lines show the coefficient after each iteration has completed and we reassess the coefficients during each retrospective. To understand that uncertainty curve, look at the bottom graph. As we progress into projects, each iteration we deliver is not uncertain any more, since we've delivered it. It's out! The only uncertainty that remains is the rest of the project, which often 'resets' the uncertainty to the levels now understood from the actual delivered functions. This is the same as the 10 coin toss post from last year.

This naturally means the control limits move. Hence, overlaying this on the J-curve like we did with the red dotted line, we can see how the range progressively narrows as each iteration delivers. The key part to this though is that you can only get this narrowing of uncertainty if you are measuring and acting on something! Some would argue waterfall measured, which it did, but it rarely acted, as that more often required a huge movement, and if that was attempted slower than the market changed it was set for a huge crash.

The sweet spot range is that coefficient of variation at each stage in the project life-cycle, at i = 0, 1, 2, 3, ..., n. The more frequently you sample, the less likely the coefficient of variation that is being tracked will fall outside those limits, which again, are value [at risk] dependent. Indeed, if you look at this from the point of view of a dynamical system, Lyapunov exponents relating the actual delivery to the coefficient of variation are likely to give you a nice threshold measure, but that's my one conjecture :)
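To make the moving limits concrete, here is a rough Python sketch. It rests on assumptions of my own for illustration: the cycle times and the total of 30 tickets are invented, and the band around the remaining work is taken simply as the forecast plus or minus one coefficient of variation. `remaining_band` is a hypothetical helper, not a standard control-chart formula.

```python
from statistics import mean, pstdev

def remaining_band(delivered_cycle_times, tickets_left):
    """Forecast for the undelivered work, with a band of one CV either side.

    Assumption for illustration only: band = forecast * (1 +/- CV).
    """
    t_bar = mean(delivered_cycle_times)
    cv = pstdev(delivered_cycle_times) / t_bar
    forecast = tickets_left * t_bar
    return forecast * (1 - cv), forecast, forecast * (1 + cv)

# Hypothetical cycle times (days) for the 5 tickets delivered in each iteration.
iterations = [[3, 9, 5, 7, 6], [4, 8, 5, 7, 6], [5, 7, 6, 6, 6]]
total_tickets = 30

delivered = []
for i, batch in enumerate(iterations, start=1):
    delivered.extend(batch)  # delivered work is no longer uncertain
    low, mid, high = remaining_band(delivered, total_tickets - len(delivered))
    print(f"after iteration {i}: {low:.1f} to {high:.1f} ticket-days left "
          f"(width {high - low:.1f})")
```

The band narrows for two reasons at once: there is less work left to be uncertain about, and each retrospective re-measures the coefficient from what was actually delivered.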


Conclusion 

This is a heavy topic for most to grasp, but one that, once you have the fundamentals, can massively transform the way you think about constraints and systems, especially people ones. It's only appropriate for the most advanced lean teams. I appreciate that a lot of people will find this very scary, so you're welcome to get in touch via email at ethar [at] axelisys.co.uk with specific questions and your value measure, and I'll see what I can do to help.



E