Goading the IT Geek: risk

Wednesday, 29 October 2014

Cone Head!!

A topic that seems to come up time and time again that folk seem to either take to or not, is the idea of an 'uncertainty cone'. I briefly touched on this in a previous post where I was violently disagreeing with Woody Zuil and Nick Killick, not on their principle of #NoEstimates, since the method has definite merit, but on the specifics of the merit that it has.

I'll take this time to explain a little more about the cone of uncertainty for those who are not familiar with it, or who would like to see a more practical example of what it is. To do so, let's consider 10 flips of a coin as the example. There are 2 to the power of 10 possible combinations of 10 head or tale results.

Before rolling the die a first time, I want you to guess what the final total may be. How many heads do you think you'll get?

Well, if you think about all the combination (0 heads, 1 head, 2 head...) and thus build a histogram of all results, you get this:

number of heads when flipping a coin 10 times - University of North Carolina (via Google images)

I'll come back to this later, but you should have a number in your head. Let's now consider the range of all possible numbers of heads at the end from this point. i.e. before the first flip. You can either get a minimum of 0 heads, or a maximum of 10 heads, or of course, anything in between right? Cool.

1st Flip

When you flip the coin the first time, it comes up say, tails. This does two crucial things:

It gives you an actual result to work with, so you now have 9 uncertain results and 1 actual result.
Now that you have flipped a tail, you cannot get 10 heads. Given that in the above histogram which applies all the time, there is only one scenario, that scenario is now out! The best you can hope for is 9 heads, given you've flipped 1 tail.

Drawing up the table of min and max heads after the first flip we can see:

2nd Flip

Flipping the coin the second time, it comes up say, heads. This also does two crucial things:

It gives you an actual result to work with, so you now have 8 uncertain results and 2 actual results.
Now that you have flipped a head, you cannot get 0 heads, because you have at least 1. Given that in the above histogram which applies all the time, there is only one scenario with 0 heads, that scenario is now out! Your rage is now 1 head to 9 heads.

Put this in the table and flip again. Follow the rule that if you flip a head, you increment the minimum by one, otherwise you have flipped a tail so decrement the maximum heads by one (because you now don't have enough flips to get the previous maximum).

<<Fast Forward>>

10-Flips

10-flips completed

So we've completed the whole 10 flips, incrementing the minimum if we get a head and decrementing the maximum if we get a tail. Surprise surprise, by the end, we have two ends that meet in the middle (which is correct, because by that point, we have 10 actual results and thus, no uncertainty at all). You can double check this by counting the number of heads you got, which is 4 in this case, against the meeting point of the maximum and minimum, which is 4. If you don't, then you've banjaxed your counting, so you might want to ask a 3 or 4 year old for help next time.

Making the Cone

From this table, we simply have to plot the flip number against the minimum and maximum number of heads. So let's do that. I've also included the trend lines, in black, which show the trajectory of the minimum and maximum numbers. The gap in the middle is the level of uncertainty or variance:

cone of uncertainty

Let's recap what happened. At the beginning, we had no idea where we were going to end up [with how many heads], aside from the range of 0 to 10 heads. As we progressed, we reduced the size of the range of possible 'options' or heads we could get and by the end, we were where we were.

Map this to typical IT projects. At the beginning, we have no idea where we're going to finish. As we progress and choices are made (which honestly do sometimes seem random), we reduce the total number of potential options that we have (which isn't always a bad thing, especially if we discount the highest waste or risk options) and eventually, we come to rest somewhere. Also, despite everything, we always know where we're are starting. We're starting 'here'. The end of the last cone (or part thereof).

And the First Histogram?

Returning to the histogram, which is built up from a knowledge of all possible combination of coin flip (it is a closed probability space mind you, which isn't always the case in software), you can see straight from this that the best options for your guess is 5 closely followed by 4 and 6 heads. The curve is a bell curve, aka Normal Distribution, and in this case it is fine.

Epilogue

The only real difference with development is the probabilities in software development are somewhat conditional, since the decisions we make are not random, but somewhat stochastic, or at least Bayesian, since we ourselves learn and make better decisions or become more productive, which help us descend the cone faster. It's good enough, so should still be used, but if you're a masochist, then I best at least tell you that something has recently come to my attention in the field of theoretical statistics which may be useful for the part that is currently quantified normally. That something is the Tracy-Widom distributions, which appear ever to slightly skewed to the right. It's not something I've used [yet] and it is somewhat advanced, but I am excited to see where this field goes.

Monday, 16 June 2014

Network Analysis: The 2nd Coming

Many moons ago, when I was young and you were even younger... that's not strictly true, I am probably younger than a lot of you even if the face in my mirror doesn't show it... techniques such as PERT and network analysis were fairly mainstream. Indeed, process improvement methods such as Kaizen and Six-Sigma still use network analysis as a mainstream tool to flesh out some of the flow in much the same way as modern systemic flow diagrams are used to track flow of working software from business ownership to production.

A fellow HiveMind expert practitioner, Ian Carroll uses presents systemic flow mapping in his website, which is well worth a read if you're not familiar with the concepts. He also expands on the evolution of this through different stages or 'mindsets', each of which brings benefit and adds to process maturity.

There are many benefits to mapping your process this way. Flow is one thing, as following that chain back to front you can find the bottlenecks in your system. However, as with a lot of agile techniques, a lesser known benefit is that it allows you to understand risk very well. Systemic flow gives these classic techniques of applied mathematics a new lease of life, especially when considering it as part of base-lining business architecture or business process during a transformation programme and using systems thinking approach.

The Problem

Having mapped a systemic flow, or when creating a classic PERT chart in ye olde world, you often find a series of dependent tasks. That in itself is cool and systemic flow mapping doesn't add much that's new in that regard. In PERT, each stage of a chain had a probability of hitting its expected date m (the 50% threshold) and a standard deviation of s. I'll save the details of that for another day, but the important thing to note is there is a level of risk around this and this risk would propagate through a chain.

Now substitute the term 'statistical dependence for 'risk' in my previous sentence and read it to yourself. I hope you can see how this more general concept can be applied to any chain of any type and can help you understand trade-offs as well, such as parallel tasks versus risk.

To illustrate this, consider the two chains below. The GO LIVE is specified for a particular date in the future, whether a hard deadline through a compliance or regulatory reporting requirement, competitive advantage in seasonal industries or other such reason.

Sequential Chains

Sequential task processing

Here there are 3 tasks that need to be completed. The percentages show the probability that a task will complete 'on time' (either in waterfall projects, or indeed, delivering those tasks in a sprint). The tasks are effectively mutually exclusive, given they don't occupy the same probability space, but they do have a dependency, which means that the probability of their success is dependent on the task before. For those conversant with statistics, you'll recognise this as:

This level of uncertainty is normally gleaned from previous performance and 'experience'. For example, manufacturing processes or system design activities. I previously covered how to reduce these risks by chopping up tasks into thin slices, as improves the variance and hence certainty.

So considering the chance of Going live on time, all tasks have to complete on time. So the probability of completing on time is simply the multiple of all the success probabilities, given these conditions:

Sequential Conditional Probabilties

So a 28% chance of completing on time!

Shock horror!

Parallel Tasks

"Cha-HAAAA! We'll just run the tasks in parallel!"...

...I hear you cry. OK, maybe not quite like that (stop with the mock Kung-fu already!). The point is, contrary to popular believe, this only improves the probability of success if running those tasks in parallel then gives each task a greater chance of completing before the GO LIVE date! After all, they all still have to complete:

parallel version of the same tasks

In this scenario, the go live can only happen when all individual components come in at the same time. Hence, we can model this with non-conditional probabilities and yet again, the chance of hitting the deadline is:

However, most of the time this does result in some improvement in probability of success, but not usually as much as you think, as workload expands to fill the time available for it (Parkinsons Law). It's what project crashing was in the original PRINCE method, but because of this darn law, it never changed the risk profile (aka probability density function) and because the tasks were so big, the uncertainty around them was extremely high anyway.

I have come across parallel tasks like this several times, where say, Task 1 is the hardware platform, Task 2 is the code and Task 3 is a data migration. This is risky!

The Solution

As per vertical slicing, the key is to segment the tasks so that each can be deployed as a separate piece of work, able to deliver value to the organisation even if the rest of the project doesn't make it, is canned, or is late. It's about breaking the dependencies all the way along the chain, so that the statistical fluctuations of ToC are removed (so if the statistical fluctuations do happen, and they will, who cares?) for those of you familiar with theory of constraints or queuing theory. Looking at how this would works:

Three separate deployments to live

This time, tasks 1, 2 and 3 all deploy functional projects into production with the same risks as before. Looking at the individual risks, they are 50%, 80% and 70% respectively. Given the overall success rate of both the previous methods was 28%, this is a significant improvement, without even considering the real life benefits of greater certainty.

You can apply this thinking to much more complex streams of work. I'll leave the following exercise for you readers out there. Take note that the conditional probability 'carried over' to the next task has obviously got to be the same for each successive task. For example, Task 1 has a 60% chance of coming in on time and hence Tasks 2 and 3 both have the same probability coming in. I know how keen you are to give this a go ;)

Give this a go!

Conclusion

As you can see, where an organisation hasn't made it to the "Mature Synergistic Mindset" that Ian Carroll introduces in his blog (i.e. vertical slices) the structuring of projects and programmes can rely very heavily on this sort of process to find where stuff goes. Risk is only one aspect to this. You can use the same technique, where the arrows map waste time (i.e. time spent in inventory) and then use say, IPFP or linear programming with appropriate constraints to find an optimum point.

However, be careful that this is an in-between technique, not the goal. The goal isn't to have the analysis, it is to make your process more efficient by reducing dependency, and the impact of statistical fluctuation on your project.

Pages