Saturday, 22 August 2015

...And the battle rages on?

It's 8am here in the UK and I am still simmering over a Twitter storm from about 3am my time. I made the mistake of looking at my phone after going to the bathroom (I washed my hands) and noticed more of the #NoEstimates conversation.

It all centred on the heated #NoEstimates discussion the other day, except this time it got personal: a few members of the discussion chose to do the tabloid-headline thing of taking part of my material out of context and then making libellous inferences. I don't mind a heated debate at all, as long as it stays fair, but I was somewhat disgusted by the actions of a few folk, especially since they purport to work with probability and statistics, which, as folk who know me well know, is exactly my area of specialism in this domain. If you want to read the full article on my LinkedIn blog and see how it was taken out of context, it's here, as opposed to the tabloid rubbish. They obviously didn't read it (TL;DR) or were out for a vendetta rather than admitting where they were wrong. Too much riding on it, I guess.

Needless to say, Twitter is a really poor forum for these sorts of discussion (which is pretty much the only thing @WoodyZuill and I agree on). So I figured I'd explain it here in a bit more detail and then post it back. Those folk are hell-bent on not admitting their lack of understanding, and on fighting with people on 'their side' of the debate, and explaining the gaps needs a lot more than 140 characters. However, before we get into how this fits within the discussion of estimates, we need to bridge some gaps and answer some criticisms.

Buzzwords

Now, I hate 'buzzwords' as much as the next guy. However, we in IT are probably more guilty of creating and using them than any other industry. Indeed, communities of practice create buzzwords that only those in those communities understand, so they become a kind of 'private joke'. However, here's the rub: you can't get away from them. They are necessary to communicate a concept succinctly. 'eXtreme Programming', 'design patterns', 'TDD' and 'refactor' are all examples of words used to communicate concepts in our domain. They mean nothing to anyone not connected to it, so those people see them as 'buzzwords'. Is that their problem or ours?

Similarly, because we in software development are often in no way connected to accountancy and finance, when we see words like 'NPV', 'IRR' or 'ROR', most of us don't picture the concepts behind them. Hence, we see them as buzzwords. Their problem or ours?

The moment of violent agreement

So, hopefully we should now be on the same page around 'buzzwords'. Cool?

No? Do we not like hearing that?

Grow up!

Estimates (or None)

When working in an organisation, you're always going to have to justify your work's existence (sometimes even your salary/fee). It's how businesses work. Why are we doing this project? What is the business case? How much is it going to cost? What benefit am I getting out of it? The answers to all these questions are estimates. Yes, we hate them because we are often held to them. However, being held to them is a people problem, not a problem with estimates. Businesses are held to estimates all the time!

Estimate Risk

Estimates are naturally probabilistic. What is worse is that the further out you look, the more uncertain that probability becomes. To expand on a previous post, using a deliberately tiny data set as an example: imagine you have to deliver one small task, you estimate it to take 2 days and it takes 3 days. You have one data point, with one variation of 1 day (50% of its expected duration, so an average absolute variation of 1 day). You can't make a decision on one data point. If you then get another task, estimate it to be the same size and it takes 1 day, you have a range of total variation from -1 day (delivered early) to +1 day (delivered late), which is 2 days in total.

The average absolute deviation, which is the average across the two, is 2/2 = 1 day. That's just standard statistics. Nothing special there. You can relate that to standard deviation really easily: the sample variance is the sum of the squared residuals divided by n - 1. The mean of 3 days and 1 day is 2 days, the residuals are +1 and -1, their squares sum to 2, and dividing by n - 1 = 1 gives a variance of 2 (days squared). Standard deviation is the square root of variance, ergo √2, or about 1.41 days.
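A minimal sketch of the arithmetic above, using Python's standard `statistics` module. Note that the figure of 2 is the sample variance (dividing by n - 1); dividing by n (the population variance) gives 1 instead, so it's worth being explicit about which one you're quoting.

```python
import statistics

# The two tasks from the example: both estimated at 2 days.
estimates = [2, 2]
actuals = [3, 1]

# Deviation of each actual from its estimate: late is positive, early is negative.
deviations = [a - e for a, e in zip(actuals, estimates)]      # [1, -1]
mean_abs_dev = statistics.mean(abs(d) for d in deviations)    # 1 day

# Spread of the actual durations around their mean of 2 days.
sample_var = statistics.variance(actuals)       # squared residuals / (n - 1) = 2
population_var = statistics.pvariance(actuals)  # squared residuals / n = 1
sample_sd = statistics.stdev(actuals)           # sqrt(2), roughly 1.41 days
```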

Now, let's suppose you classically estimate ten such elements (I'm deliberately staying away from 'story', as to me a story is an end-to-end, stand-alone piece, so shouldn't have a classical dependency per se) in a dependency chain on a critical path, and you don't improve your process to attain consistency. The total absolute variation now ranges from all of the tasks being delivered early to all of them being delivered late. Around the mean (2 x 10 = 20 days), this becomes a range of -10 days (1 day early for each task) to +10 days (1 day late for each task): a total absolute deviation for the whole project of 20 days on a 20-day expectation, even though the individual tasks still have an average absolute deviation of 1 day!

Let's now imagine we've actually delivered stuff, and look at the variation of the tasks remaining after the first 2 tasks on the board have been delivered with the variation stated previously. Those are no longer uncertain. They have been delivered. There is no future uncertainty about those tasks and, of course, no need to estimate them further. The only variation now exists in the remaining 8 tasks on the board. Again, a 1-day average absolute variation means the 8 remaining tasks have a total systemic (i.e. whole-project) variation of -8 to +8 days (16 days). So you can see the variation reduce as you deliver stuff.

Its reduction is what makes that darn cone look the way it does! You're now 4 days into the project (3 days for the first task plus 1 day for the second), so you can plot that on a graph. The first point of uncertainty was +10 and -10 on day zero; 4 days in, this has reduced to +8 and -8. Keep going along the time axis (x-axis) as you deliver stuff and the range always narrows to a single final point. After all, once you have delivered everything, you have no more variation to contend with. Zero, zilch, nada!
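The worst-case cone described above can be tabulated in a few lines. This is a sketch under the same assumptions as the example (ten tasks in a chain, each up to 1 day early or late, delivered tasks carrying no further uncertainty); the function names are hypothetical, and the x-axis here is "tasks delivered" rather than calendar days.

```python
def worst_case_range(total_tasks, per_task_deviation, delivered):
    """Worst-case spread of the remaining work once `delivered` tasks are done.

    Delivered tasks carry no further uncertainty; each remaining task could
    still be up to `per_task_deviation` days early or late.
    """
    remaining = total_tasks - delivered
    spread = remaining * per_task_deviation
    return -spread, spread


def cone_of_uncertainty(total_tasks=10, per_task_deviation=1):
    """(tasks delivered, earliest, latest) at each point as work is delivered."""
    return [(k, *worst_case_range(total_tasks, per_task_deviation, k))
            for k in range(total_tasks + 1)]


for delivered, early, late in cone_of_uncertainty():
    print(f"{delivered:2d} delivered: {early:+d} to {late:+d} days")
```

Plotting the earliest and latest columns against the first gives the narrowing cone, closing to zero once everything is delivered.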

example of a cone of uncertainty (src. wikipedia)

There is no getting away from this fact. It's as much of a fact as the law of gravity. Doing anything that goes against it without understanding it is like this. Whilst that may be fun and harmless (some might consider it 'pioneering'), it killed people when flight was first invented and, in any case, spends money pointlessly, which is waste. We are in a position where we know better, so why reinvent the wheel?

What does this have to do with Estimates?

Right, here is where we get back to the meat of the matter: 'How do estimates help me deliver better software?'

In short, as far as software development alone is concerned, they don't. However, and this is the bit that irked me because people just didn't seem to want to hear it, software development by itself is useless. We use software to solve problems. Without the problem, there is no need for software (indeed, there is no need for any solution). However, don't forget that organisations themselves solve client problems, and those clients may themselves solve problems for other clients! So software development doesn't exist in isolation. If you think it does, then you exist in the very silo mentality that you purport to want to break down. Do you not see the hypocrisy in this? I am sure many of the business readers do!

Again, grow up!

Teams should aim to use the closeness of their estimates to actual delivery performance as an informal, internal indicator of their level of understanding of the codebase and their performance with it. No more. Businesses should not use the estimate to hold the team to account, as there is a high level of variance around any numbers, and the bigger the system being built, especially if it has a number of components in a chain, the worse the variance will be.

Improving?

The way to improve estimates depends entirely on the way the team itself works. Let's assume the team carries out retrospectives. This is their chance to employ practices that improve the way they work, the quality of the work and/or the pace at which they develop software. As a rule, the team can't go faster than it can go, but the quality of the code and the alignment of the team naturally affect the flow of tasks through to 'done' (production, live, whatever).

Blockers and bugs naturally affect the flow of work through the team. Reducing them improves that flow, as the contention for the team's 'story time', which is a constrained resource, goes away. If you don't track bugs and blockers, you are likely losing time (and money, if you're not working for free) as well as opportunity costs or potential income (probabilistic) for the business by delaying deployment into 'done', and you'll have no idea whether that applies or not. If it does, the business is getting hit on two fronts:
  1. Delivering value later because you are fixing bugs in earlier processes
  2. Costing more money to deliver a feature because you are using 'story time' to fix bugs in earlier releases
The combined effect of the first and second hits your NPV and hence directly affects your IRR, and also your ROR and ROI (buzzword alert). However, most developers are too far away from finance to understand this, and many who purport to understand it don't.
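To see how both fronts show up in NPV, here is a minimal sketch. The cash flows and the 10% per-period discount rate are purely hypothetical; the point is only the direction of the effect: the same value delivered one period later is worth less today, and spending extra 'story time' on rework lowers it further.

```python
def npv(rate, cashflows):
    """Net present value; cashflows[t] is the net cash flow at the end of period t (t = 0 is now)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))


rate = 0.10  # hypothetical discount rate per period

on_time = [-100, 60, 60]                # build cost now, value in periods 1 and 2
late = [-100, 0, 60, 60]                # same value, arriving one period later
late_with_rework = [-100, -10, 60, 60]  # late, plus extra cost spent fixing bugs

for name, flows in [("on time", on_time), ("late", late), ("late + rework", late_with_rework)]:
    print(f"{name:>14}: NPV = {npv(rate, flows):8.2f}")
```

Since IRR is the rate at which NPV is zero, anything that pushes the NPV curve down also pulls the IRR down.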

How can methods like Kanban and ToC help?

OK, so it's no secret that the IT world, the one I inhabit, has an extremely poor understanding of flow and indeed does kanban 'wrong' relative to the way lean really happens in manufacturing and the TPS (Toyota Production System). Kanban ultimately aims to optimise the flow of X: stories, tickets, manufactured items, cars, whatever.

My scribbles on the importance of understanding variance, from previous posts

The process is stochastic in nature, so there is no certainty around it, but what most folk don't understand is that kanban inherently has waste in the process. Movement of items accounts for two of the seven recognised types of muda (waste):

- Unnecessary transport and handling of goods
- Unnecessary motion of employees

Transportation of goods (read: stories) is the movement of an item from one stage to another, often from a development context to a QA one, or into live. There is a change of 'mental model' at that point, from one mindset, say development, to another, say QA. That is a form of context switch, just not a time-sliced one. This shouldn't be a new idea: context switching happens on CPUs when multi-threading (save the stack frame of one thread, restore the frame of another), and, just like all context switching, it is never free.

In addition, as per ToC (buzzword alert), there is inventory: the 'wait time' between stages, where the item is ready to be pulled on demand, can be considered an implied 'inventory' stage. This introduces another cost, usually that of not yet having the software in a production environment, where it starts to yield knowledge or, indeed, its value.

Run a dojo and try this. Take one developer and have them code and QA one scenario. Time how long it takes to deploy that one thing into a production environment. Then take another developer and a tester, and have the developer code one scenario and the tester QA it, in sequence. Time how long that takes. The dev-plus-QA pair will never be faster for that single item: the cost of switching the task between them naturally elongates the cycle time of delivering that one task. If you did 10 tasks like this in an iteration, all sequentially, with the dev not picking up another one until the QA signed the previous one off for live, then delivering them all would take 10 x the cycle time, a throughput of just one task per cycle time.
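The dojo arithmetic can be written down directly. This sketch assumes hypothetical durations (2 days to code, 1 day to QA) and models the context switch between developer and tester as a fixed 0.5-day handoff cost; `sequential_elapsed` is an illustrative helper, not a standard formula.

```python
def sequential_elapsed(n_tasks, dev_days, qa_days, handoff_days=0.0):
    """Elapsed days when each task is fully finished before the next one starts."""
    return n_tasks * (dev_days + handoff_days + qa_days)


# One person codes and tests each scenario: no handoff between people.
solo = sequential_elapsed(10, dev_days=2, qa_days=1)

# A developer hands each scenario to a tester and waits for sign-off.
handed_off = sequential_elapsed(10, dev_days=2, qa_days=1, handoff_days=0.5)

print(f"solo: {solo} days, dev then QA: {handed_off} days")
```

With these assumed numbers the handed-off version is strictly slower, because the handoff adds to every task's cycle time and nothing overlaps.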

In short, introducing a kanban stage has introduced waste! You'd lose time/money as a business.

What's the benefit for this cost? What's the trade-off?

To answer @PeterKretzman's retort

Still think so now it's been explained?

The systemic trade-off is pipelining tasks to make delivery by the team as a whole faster. Each stage can pick up a 'ready' task from the previous stage once it has finished its main involvement in its stage/phase of the story's flow through the pipeline.

Run the same experiment with 10 scenarios, but this time the dev can pick up a new task whilst the QA is testing the previous one. Suddenly this makes much more sense: your throughput, whilst still related to cycle time, is no longer wholly dependent on it, so you deliver the 10 scenarios much faster than you would sequentially. After all, CPUs use pipelining as a standard optimisation mechanism. This is why we do what we do, in the way we do it, in software, lean manufacturing, lean construction or anything else.
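A small simulation makes the pipelining gain visible. It assumes the same hypothetical figures as the sequential dojo (2 days dev, 1 day QA, a 0.5-day handoff before QA can start) and that the developer starts the next scenario as soon as the previous one is handed over; `pipelined_elapsed` is an illustrative name.

```python
def pipelined_elapsed(n_tasks, dev_days, qa_days, handoff_days=0.0):
    """Elapsed days for a two-stage dev -> QA pipeline.

    The developer starts the next task immediately; QA starts a task once it
    has been handed over AND QA has finished the previous task.
    """
    dev_done = 0.0  # when the developer finished the current task
    qa_done = 0.0   # when QA finished the previous task
    for _ in range(n_tasks):
        dev_done += dev_days
        qa_start = max(dev_done + handoff_days, qa_done)
        qa_done = qa_start + qa_days
    return qa_done


print(pipelined_elapsed(10, dev_days=2, qa_days=1, handoff_days=0.5))
```

With these numbers the 10 scenarios finish after 21.5 days instead of the 10 x 3.5 = 35 days of the fully sequential version: the handoff still costs time on each task's cycle time, but the stages overlap, so throughput is set by the slowest stage (dev, at one task every 2 days) rather than by the whole cycle.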

Can you get too small?

As I demonstrated in a talk I gave last year, the short answer is yes. If you keep adding columns to the point where they don't add value, i.e. they aren't a step in the value chain (buzzword alert), then all you are introducing is the cost of the context switch out of that stage with no value add, which costs both time and money. Indeed, if you can run tasks in wholly parallel pipelines, it's much faster than kanban, but it requires resources to be allocated accordingly.

To see this in the previous example, introduce a post-QA stage called 'stage', where all the person does is sign a piece of paper and then run a manual deployment. There is no value add in that process, since there is no other contention for the 'stage' process in the organisation as it is at that moment in time. However, you're paying a post-QA staff member to stage it.
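Generalising the pipeline to any number of columns shows the effect of the extra 'stage' column. This is a minimal flow-shop sketch, assuming fixed per-stage durations, unlimited waiting space between columns and a fixed handoff delay between consecutive stages; `flow_shop_makespan` and all figures (2 days dev, 1 day QA, 0.5 days staging, 0.5-day handoffs) are hypothetical.

```python
def flow_shop_makespan(n_tasks, stage_durations, handoff_days):
    """Elapsed days for n identical tasks flowing through sequential stages.

    Each stage works on one task at a time, in order. A task can enter a stage
    once it has left the previous stage (plus a handoff delay) and the stage
    has finished the task before it.
    """
    finish = [0.0] * len(stage_durations)  # finish time of the last task seen, per stage
    for _ in range(n_tasks):
        ready = 0.0  # when this task left the previous stage
        for s, duration in enumerate(stage_durations):
            arrive = ready if s == 0 else ready + handoff_days
            start = max(arrive, finish[s])
            finish[s] = start + duration
            ready = finish[s]
    return finish[-1]


without_stage = flow_shop_makespan(10, [2, 1], handoff_days=0.5)       # dev, QA
with_stage = flow_shop_makespan(10, [2, 1, 0.5], handoff_days=0.5)     # dev, QA, 'stage'
print(without_stage, with_stage)
```

Under these assumptions the extra column delivers nothing new, yet it pushes completion out by its handoff plus its own work (21.5 to 22.5 days) and adds 10 x 0.5 = 5 person-days of paid staging effort.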


Conclusion

I hope folk can now see where I am coming from. However, make no mistake, I am extremely disappointed in the quality of understanding around this, the hypocrisy that exists in the field, and the low-down, dirty, tabloid-style tricks that some folk will stoop to just because they've never come across such a scenario, as if they know everything about every organisation everywhere. The #NoEstimates movement is sadly littered with such folk, who frankly seem to show a distinct lack of understanding of anything related to the field. Many show a distinct unwillingness to engage, and take inherently overly political standpoints to avoid having to admit a failing, limited success or limited understanding. After all, the only people who'd want to sell #NoEstimates if it doesn't mean anything are the #NoEstimates movement. It's a real shame, as it's something I think needs to be discussed with a wider audience and, as I have said previously, it has massive potential, but it is being taken down a black hole of pointless discussion and constant justification across the board.

After all, if we can't constantly be responsibly critical of our field and our means of operation, how can we ever improve what we do?


E

Tuesday, 18 August 2015

Story Points: Another tool, Not a Hammer!

*bang head on desk*

Nope.

*bangs head on desk again*

Nope. Still can't knock that alleged sense into me.

Today has been one of those days that started off OK, then I saw a conversation on Twitter which got me all het up (not necessarily in a bad way). It seems I'm returning yet again to the issue of story points and the #NoEstimates #BeyondEstimates movement. I've covered so many topics in this space it's frankly getting tedious to repeat myself. If you're interested in the kettle boiling, see:




What Ignited the Blue Touch Paper

I'm not all that bothered about story points. I use them a lot, as they were intended: for relatively sizing tasks. I also often find myself using T-shirt sizing or, occasionally, Size, Complexity and Wooliness. They all have their merits, depending on what the teams I work with decide they wish to use. The biggest problem is when I find proponents of various methods, including of course Scrum, XP, RUP, Waterfall etc., trying to impose their way of thinking as the right way of thinking. We're just as guilty of this in the agile world as the 'waterfall' managers we often criticise.

Truth be told, with estimates, I don't care a jot which we use. If you believe every situation is different, then you should expect that the tools used may well be different and that's OK.

The problem we have is that many folk are critical of story points because they are used as a stick to beat developers with. If you've ever worked in business, or perhaps even run a charity, you'd know that this is only one possible consequence of estimating, and only a small part of why estimates are important. It's just that developers seem to take offence at the idea more than most. Also, bear in mind that the maturity of a team creates or negates the need for precise estimates. Indeed, if a DevOps team is mature enough to deliver through an MVP (lean-thinking/startup), then adhering to 'hard' estimates is much less important, as the cost of a miss is simply the value in the missed version of the software, not the value overall, since the client already has something they can work with. However, I digress...

Story Points to Reality: Parametric Equations

Many proponents standing against story points seem to fail to realise that story points link to the real world whether we like it or not. A story takes time to do. You don't have negative time and you can't carry out zero-duration tasks. It also doesn't cost zero, because the developers' wages or rates are being paid (yes, you are costing the business money; sorry, but it's true. Even if you work for free, being late costs the company an opportunity cost). That is just as much a reality as the law of gravity. Just like gravity, your mind has to escape to outer space to escape that reality. The value a story delivers can also be quantified and analysed statistically. All of these re-quantifications have units of measure which can legitimately be attached to the parameter.

To recap, in A-level maths (senior high for those in the US, and a heck of a lot younger in many other countries), most people should have come across the concept of a parametric equation. It usually includes a parameter which itself has no units, to simplify the process of reasoning about the model at hand. Consequently, it allows much more complex structures and concepts to be expressed in an easier-to-use form. In a tenuous way, it's the mathematical equivalent of using terms such as SOLID, IoC, TDD, BDD etc., since just using these words helps communicate ideas where communication is the goal. Just like in the software world, there is often a transformation into and out of the real-world context of parametric equations (read: parameters). This is a normal, analytical approach to many problems in many more industries than software development or engineering. The only difference is that these parametric equations contain a stochastic component when working with the flow of tasks across a board. That doesn't often change the approach needed, just the skill of the person using them (which may or may not be desirable). But guess what? So do story points.
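The transformation out of the unitless parameter and back into real-world units can be sketched as a pair of conversions. Everything here is hypothetical for illustration: a team that has observed about 20 points per 10-day iteration and a blended team cost of 1500 per day. Velocity is itself a noisy, observed figure, so treat the outputs as expectations rather than commitments.

```python
def points_to_days(points, velocity_points, iteration_days):
    """Convert unitless story points to elapsed days via the team's observed velocity."""
    return points * iteration_days / velocity_points


def days_to_cost(days, team_day_rate):
    """Convert elapsed days to money via a blended team day rate."""
    return days * team_day_rate


# Hypothetical parameters: 20 points per 10-day iteration, 1500 per team-day.
days = points_to_days(8, velocity_points=20, iteration_days=10)
cost = days_to_cost(days, team_day_rate=1500)
print(f"8 points ~ {days} days ~ {cost} in cost")
```

The points stay unitless while you reason about relative size; the units of time and money only attach when you pass through the (measured, variable) parameters.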

Crucially, and this is the bit that gets me wound up, just because people choose to play with the numbers incorrectly, which many project managers, scrum masters and product owners do, doesn't invalidate the analytical position, nor does it invalidate the statistics around these numbers. It also winds me up because it is very often the same folk making these statements who never followed process when more formal methods of software development were used. They just want to code. Lots of great noises, but when it's time to walk the walk...

*breathe*

Story points are just a tool. A tool like any other. If you misuse a tool, who is at fault?

Now, #NoEstimates vs #BeyondEstimates. I'd love for us to drop the NoEstimates term. It has put the dev world at the top of the Gartner hype curve for absolutely no reason. #BeyondEstimates is a much better term for selling it, sure, but it also communicates the intent much, much better. It's a term Woody Zuill came up with himself, and I think it perfectly positions and communicates the goal of the movement. NoEstimates isn't about not estimating; it's about always looking to improve on estimates, so '#NoEstimates' is one of the worst phrases you can use to describe it. Plus, just like any tool, I suspect its misuse will leave you in no better position than the standard evolving estimation processes, just with less understanding of where it all went wrong.

That said, overly precise estimates will leave you in worse positions than you'd otherwise be in. Get good at deciding how much effort needs to go into estimating things.

All Forecasts are Wrong

Yes, but what do you mean by 'wrong'? Wrong as in you'll never hit it? Yes. However, what's an acceptable deviation?

For example, do you get out and measure your parking space at work before renting a forklift truck to lift your car and spending 8 hours positioning it perfectly in the space with millimetre precision, only to have to get into it at the end of that day to go straight home? No, I suspect not. You estimate the position of the car in the space, sample the space to make sure you can get out or are in the spot, and there we go. Job done. 15 seconds.

The amount of waste is the amount of unusable extra space around your car, and even that definition depends on who you are. Statistically, most people are likely to get into that space on their first try. A second or third try covers almost everyone. However, nobody attempts to just crash their car into that spot. That is good enough. Is it 'wrong' if measured by the deviation from the very centre of the space? It certainly is! Is it good enough for the job? Yes, it certainly is.

Is this your #NoEstimates approach?
In reality, the #BeyondEstimates movement is right to question the role of estimation in software development projects and beyond (pun intended). What I don't want to see, though, is people blaming estimation methods or, worse, maths for the failings of people. That was agile from around 2000 onwards, when most folk adopted the wrong ideas around agility, and I can't stand to see another 10 years lost to needless bad practice.

This all means that teams have to get better at managing variation. Product owners have to get better at managing their own 'expectation' around that variation, and both have to keep track of the scope of their deliverables and how likely they are to hit the commitments they make. Overall, the culture has to support pivots and backtracking and encourage the raising of issues, and the organisation must be able to support changes of direction. This is a much bigger problem than either 'party' can solve alone.

</rant>

Monday, 3 August 2015

Fail: AWS EFS (Preview) on AWS EC2 Windows Instance

Gah! :( Poor show this morning.

Was hoping to write up a disk performance comparison of AWS EFS against EBS on Windows. Alas it wasn't to be. I am still writing this failure up as it may be useful to someone, but this story doesn't have a happy ending.

TL;DR; AWS EFS and Windows Server 2012+ are incompatible due to Windows' lack of NFSv4.0 support

I was aware of the limited NFS version support in AWS EFS (i.e. only v4.0, whilst Windows only supports 2, 3 and 4.1). Yet I wanted to see what could be done. Plus, it all works fine in Linux. Let's walk through it.

AWS Elastic File System

After the AWS Summit this year, which I was disappointed I couldn't attend, I was lucky enough to attend a replay at Amazon's offices in London at the beginning of June. During it, I got to hear about the Elastic File System preview. It was billed as the missing link in the AWS cloud storage offering, allowing flexible storage to be delivered and charged for without having to re-provision storage or manually add another NAS drive.

Creating and mounting the EFS storage instance in Linux is easy enough. It's still in preview in the US-West region (Oregon).

1. Walk through the EFS wizard to create your EFS file system. Amazon recommend creating its mount points across all the availability zones in a region. You certainly don't have to, especially as EFS storage is charged at 30 cents (roughly 18.75p) per GB-month.

AWS Elastic File System (EFS)
2. Set the security groups on mount points. It's important to note that the security groups will manage the connection between the EFS mount point and the EC2 instance you'll create.

Adding Mount point security groups




3. Spin the shizzle! Note, the full creation process can take about 5 minutes before finally marking the EFS volume as available.
When created, it's still spinning up

Creating the mount points

4. Add the NFS port (2049) to the security group in the usual manner.

5. It was at the point of spinning up and logging in to a Windows EC2 instance that the process went wrong. You could reach the mount point through its IP address, and you could also use the 'net use' command to access the NFS volume via the IP address. However, despite the picture below, you couldn't write anything to it. Hence, I couldn't really test it, let alone run the performance tests on it.


All the utilities, excited and ready to run :)




6. To check whether it was me, I decided to spin up a Linux instance and access the NFS volume from there, so that I could create a file on it and then try to read it on the Windows server. The first part of that went without a hitch.

  1. Spin up the EC2 instance
  2. Add the EC2 Instance's security group to the EFS mount points' security group (The screen is the same as shown above) 
  3. Create a private key using puttyGen from 
  4. Access it as the AWS Linux ec2-user via PuTTY
  5. Install the NFSv4 client package into the AWS Linux instance using yum via an elevated command > "sudo yum install nfs-utils"
  6. Create directory to mount it to
  7. Mount the file system onto that directory, again as an elevated command, using "sudo mount -t nfs4 <mount>:/ /<directory>" 
Voila! Perfect!

connecting to AWS Linux instance via Putty


NFS4 installed using Yum

The only other thing I did was chmod the directory to get access (you can take ownership of it as well) and I touched a file into the directory and sure enough...

Windows fail! :(

So, side by side, you can see that Linux and Windows don't have the same access. Refreshing or reconnecting makes no difference in Windows. I changed access to the file to everyone and still had no luck. However, Linux had absolutely no issues. Just for fun, I removed the security group access and tested the mounting again, and it correctly blocked access.

When you've not set security group on EFS mount points
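The side-by-side test boils down to two questions: can this instance reach the NFS port, and can it actually create and read back a file on the share? A minimal Python sketch of both checks, assuming Python is available on each instance and the share is already mounted; `nfs_port_reachable` and `directory_is_writable` are hypothetical helpers, not part of any AWS tooling. The port check only proves the security groups let TCP 2049 through, which is exactly why the Windows instance could see the share yet still fail the write check.

```python
import os
import socket
import uuid


def nfs_port_reachable(host, port=2049, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def directory_is_writable(path):
    """True if we can create, read back and delete a small file in `path`."""
    probe = os.path.join(path, f".write-probe-{uuid.uuid4().hex}")
    try:
        with open(probe, "w") as f:
            f.write("probe")
        with open(probe) as f:
            ok = f.read() == "probe"
        os.remove(probe)
        return ok
    except OSError:  # missing path, permission denied, read-only share, ...
        return False
```

On the Linux instance you would pass the directory the volume was mounted onto; on Windows, the path of the mapped share. The mount point IP is whatever your EFS console shows.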

Conclusion

At the time of writing (08/2015), this issue arises from the annoying lack of [free] support from Microsoft for NFSv4.0 and the choice by AWS to support only NFSv4.0 (the only version Windows doesn't support). Here I have to be honest: I don't know which is worse. Whilst the volume could potentially be shared on via a Samba connection, you've then got to run an EC2 instance just to do that, which will increase latency. I'd definitely say that this product isn't ready to take share from Microsoft in this arena in the enterprise, though of course it works perfectly well in Linux-based environments.

The product will continue to mature I'm sure. However, I'll have to put my Windows performance investigation on hold for now sadly.

Saturday, 25 July 2015

FAIL is not a dirty word!

Short one this one. I'm spending a bit of time on an OSS project, so don't have the time to go into this in more detail.

TL;DR; Failing tests identify where your system is incomplete or inconsistent

I was at the North West Tester Gathering a couple of weeks ago. The theme of the night was Failure. There was a diverse variety of speakers, from the BBC, Sage and SkyBet. Leigh Rathbone (@villabone) from SkyBet presented a talk entitled "FAIL is not a dirty word". It reminded me of yet another blog post I've been meaning to write for a few months, so I figured I'd get this down on a screen somewhere before I forgot or my time got chewed up yet again.

The thing with failure is it is a central part of doing anything in uncertain environments. Whatever the environment, whether it is Marketing, Lean-Startup, TDD'ing software or anything else with a high degree of uncertainty or variance, it is important to fail for many reasons.

Failure in TDD

"Write a failing test" - This is one of the most crucial mantras that is often espoused by us in the lean/agile world. This actually deals with a number of different problems all at once.

Aside from the orthodox answers, it also addresses two concepts fundamental to all systems thinking. If you're into theoretical computer science or mathematical logic (predicate logic or propositional calculus), these two concepts will be very familiar. I'll introduce the concepts first, then name the theorem for those not familiar with them.

  • Developers who start to code a new story start with a test. A failing test shows the boundary of the system relative to its context. When you modify or expand the code to make the test pass, you have made the software more complete. If the test correctly codifies the story and it happens to pass, then the system and your knowledge were more complete than you thought. Either way, our knowledge is now more complete, which the code and tests also represent ("code communicates intent" - @datoon83).
  • Bug tests - Those issues resulting in live (or UAT if your team works like that) show you that the software, and our associated knowledge, which satisfied the acceptance tests, isn't consistent. You write a test which exposes the bug, then you fix the system to satisfy the test. Thereby making your system more consistent.

Those familiar with these two ideas will immediately notice Gödel's [1st] incompleteness theorem, which for us IT software/systems folk basically translates to:

"A system cannot be both complete and consistent at exactly the same time"

So, we have a choice. We can try to code the world, which would make it complete, putting in all the possible use cases that anyone would ever want (and perhaps many they won't) never delivering anything and we'd lose the consistency of the system anyway. Alternatively, we can constrain ourselves and accept a level of incompleteness and go for consistency (low bug count). Software development/engineering naturally lends itself to the latter. This is natural, since the system is complete to the stories that are done, not in the backlog or in-progress. Lean-Startup also introduces the concept of an MVP (Minimum Viable Product) with the aim of solidifying that MVP over time.
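To make the red-test-first idea concrete, here is a minimal sketch using Python's `unittest`. The booking function and its bug are hypothetical: an earlier version used `<=` in the comparison, which treated back-to-back bookings as a clash. The regression test was written first and failed (red) against that version, surfacing the inconsistency; fixing the comparison turned it green without breaking the existing tests.

```python
import unittest


def slots_overlap(start_a, end_a, start_b, end_b):
    """True if half-open booking slots [start_a, end_a) and [start_b, end_b) overlap."""
    # A hypothetical earlier version used `<=` here, so a slot ending at 10:00
    # wrongly clashed with one starting at 10:00.
    return start_a < end_b and start_b < end_a


class SlotsOverlapTests(unittest.TestCase):
    def test_overlapping_slots_clash(self):
        self.assertTrue(slots_overlap(9, 11, 10, 12))

    def test_separate_slots_do_not_clash(self):
        self.assertFalse(slots_overlap(9, 10, 11, 12))

    def test_back_to_back_slots_do_not_clash(self):
        # Regression test: red against the `<=` version, green after the fix.
        self.assertFalse(slots_overlap(9, 10, 10, 11))


if __name__ == "__main__":
    unittest.main()
```

Note that every assertion here can fail; a test that can only ever pass tells you nothing about where the system's boundaries or inconsistencies are.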

Summary

Fixing a bug by starting with a red test that surfaces it identifies where your software is inconsistent.

Starting a new scenario with a red test helps you identify the bounds of your system and gain more knowledge about what it should and can do, and naturally makes you extend that system, increasing the sphere of completeness.

It is important to recognise the contribution failure makes to software development. I am often frustrated when I look at code which hasn't been developed that way. It often has far too much coverage in one area and not enough in others, and I see the odd Assert.True() thrown in. Crucially, you can go on proving something is true the same way forever.

Code can also 'suffer' from confirmation bias as much as we can as humans. After all, the code is a manifestation of our knowledge of the domain. If we don't have that failing test, that appreciation that we have stretched the code, the system and ourselves past our limit of knowledge, we don't have that ability to fill in any gaps in that context.

So I'd certainly go further than Leigh on this one. Not only is failure not a dirty word, it's absolutely and unequivocally MANDATORY!

Thursday, 4 June 2015

SAMPLE: Azure v AWS - Judging Trade-Offs.

Judging cloud platforms is one of the things I find myself doing a lot these days. Working mainly, but not exclusively, on the Microsoft stack, this generally boils down to two main options: AWS or Azure.

Now, personally, I go for AWS by default. However, for various reasons, I refuse to tie myself to any one vendor. Plus, it allows an effective, vendor-neutral position to be taken and, for those that know me, it cuts straight through the sales cr*p to see whether what vendors deliver actually matches their promises (in the main, a lot don't). I tend to do this in conjunction with the procuring organisation, since it's important to check not just that vendor systems work, but that they work in context. This is especially the case when organisations are aiming to become more agile, since they will have a much closer working relationship with vendors than most vendors may feel comfortable with. So it is another tool in the toolbox to help evaluate how the line of business as a whole (business, data, application, technology, support and security) works.

How Do I Evaluate the Difference for Stories?

Trade-off analysis doesn't start with this question, but with an earlier one, "What is it I want to achieve?", since this then leads to the all-important question "What question(s) do I need to ask to evaluate vendors?", and there may be several you need answers for.

Throughout this short blog, let's use the example goal:

"Given I have to host a new room booking platform,
 I want the highest on-demand available infrastructure for the lowest monthly cost 
 So that I can extend the application in the most cost effective way"

Once we've understood the value of terms like 'cost effective', we can now look at what the availability needs are.

Let's use Microsoft Azure's own infrastructure diagrams for this. Attached is a snip of a Microsoft Blueprint for Azure hosted infrastructure.

fig 1 - Microsoft Blueprint

Comparing on-demand costs is simply a matter of adding up all the costs for the network, data-store and VM components of similarly matched specifications. Comparing the market prices of Azure and AWS components, we see:
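As a rough sketch of that adding-up, assuming three tiers with two VMs each plus a load balancer, the Python below totals hourly on-demand prices over a 730-hour month. The prices, the component list and the names `platform_a`/`platform_b` are placeholders for illustration, not real AWS or Azure figures.

```python
# Hypothetical hourly on-demand prices (USD), for illustration only.
HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12


def monthly_cost(components):
    """Sum the monthly on-demand cost of (name, hourly_price, count) components."""
    return sum(price * count * HOURS_PER_MONTH for _, price, count in components)


platform_a = [("web VM", 0.10, 2), ("app VM", 0.10, 2),
              ("db VM", 0.20, 2), ("load balancer", 0.025, 1)]
platform_b = [("web VM", 0.12, 2), ("app VM", 0.12, 2),
              ("db VM", 0.18, 2), ("load balancer", 0.03, 1)]

for name, components in [("platform_a", platform_a), ("platform_b", platform_b)]:
    print(f"{name}: {monthly_cost(components):.2f} per month")
```

Swap in each vendor's published prices for similarly matched specifications before drawing any conclusion; the structure stays the same.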

fig 2 - AWS v Azure Platform On-Demand Pricing


So that's the price... and AWS is cheaper... for 'bigger' hardware (same pricing tier, though did the story contain anything about application hardware specs?). Still, it's one of the two variables you need to determine cost-effectiveness. The other variable is availability guarantee.

Measuring Availability

Using the same techniques found here, it looks like it gets worse for Microsoft when looking at systemic availability. 

Azure: 99.9996%
AWS:   99.9999%

Note, systemic availability is actually the important thing in every platform. The availability of individual components is next to no use to you as an enterprise. It only takes one component to fail irrevocably and your platform is done. 

Heard the old adage "You're only as strong as your weakest link"? When thinking systemically, such effects are a lot worse than your weakest link, since you can never make up for one weakness without impacting other elements. This is one of the reasons we host on two different, load-balanced servers: for one single application-services-data stack on 3 VMs, each with 99% availability, we can only expect 0.99 × 0.99 × 0.99 ≈ 97.03% uptime in total.
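A minimal Python sketch of that arithmetic, assuming independent failures and a perfectly reliable load balancer (both simplifications): components in series multiply their availabilities, while redundant copies in parallel only fail together. Three 99% VMs in series give about 97.03% (98.01% is what two such components give), and a second, load-balanced copy of the whole stack lifts that back above 99.9%.

```python
from math import prod


def series(*availabilities):
    """Every component must be up: multiply the availabilities."""
    return prod(availabilities)


def parallel(*availabilities):
    """At least one redundant copy must be up: 1 - product of failure probabilities."""
    return 1 - prod(1 - a for a in availabilities)


stack = series(0.99, 0.99, 0.99)  # application, services and data VMs
print(f"single stack: {stack:.4%}")                    # single stack: 97.0299%
print(f"two stacks:   {parallel(stack, stack):.4%}")   # two stacks:   99.9118%
```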


Summary and Future Posts

I started writing up comparisons for Reserved and Up-front pricing on AWS and Azure and found the original post getting too long, even for me. So I've split it into a couple of posts to launch bit by bit.

The crux of all of these is to always know the question you're trying to answer. It's not a matter of boiling the ocean on day-1. After all, that's the promise of cloud. You can scale the frying pan later. 

Also, don't forget that you have a number of other options to bring these costs down. MSDN subscriptions and BizSpark give you varying levels of Azure Cloud credits, and AWS gives you 'free-tier' infrastructure for 12 months, which might cover your needs entirely. So you have to take a more holistic approach to understanding your options and constraints, since the latter is your job, not the vendor's.

Sunday, 19 April 2015

Lowering Chances, Mitigating Risks or Both?

I was talking at Lean-Agile Manchester this week. It was a chock-full event, which necessitated the adoption of extra chairs.

A number of the XP Manchester folk were in, which is always entertaining, since the two groups have overlapping common interests but as with many agile vs lean schools, we don't necessarily come to an agreement on the best way forward for things.

There were some great questions through the night, including the ones from the hecklers! They centred around data from some graphs I showed from a previous blog post, whose maths I tried not to go into given the typical spread of the audience. So I offered to take it offline so as not to bore the audience, but there wasn't the appetite from the questioner, so a smackdown happened; they then agreed to take it offline but never got back to me, darn it! (#invitestillopen)

Background

What's the reason for the graphs?

Several years ago, I was working in a company which was on the proverbial agile journey. They were still thinking in very big-design ways and were managing programmes of work through standard programme and project management methods. The company's attempts to have conversations around agile programming were not really working, and the second attempt at them (i.e. just do the work and they will come) didn't reach far enough for anyone in a position of enough power to take the effort seriously. This resulted in a somewhat disconnected hybrid method, which saw lower levels doing the work while upper levels of management and EA imposed design on the teams, with PMs backing up the EAs as the authority on that work.

In addition to that, teams spent the vast majority of retrospective time generating new ideas for working together (good, bad, change), including grouping tasks, voting and setting options for the next iteration. However, no retrospective ever came back to check that these did indeed improve the process, or that any overhead we introduced as part of each task was actually worth it. Further actions just built on top of these actions, and overhead gradually accumulated with each iteration.

The team had successfully implemented WIP limits (though that started off quite painfully) and were measuring cycle time and throughput, since these were easy for them to visualise in a JIRA dashboard. We saw a burn-down, but it wasn't clear whether our flow was any good or, indeed, whether we were improving at all.

Add to this the need from classical project management to get an idea of how long things would take, as well as programme management's need to align the streams of work, and we had to know something about whether we could actually hit the hard deadline. Those that know me know I think aligning work the SAFe way or the classical PERT way introduces inherent risks, but the environment was what it was, and each change begins with a small step, not a 'Big-Destroy Enterprise Programme'. After all, as a dev, you're easily replaceable in that style of culture (not that you necessarily have to worry about it in the IT game, but it's an important consideration).

Who wanted it?

The graph/points estimation wasn't necessarily to get the team to improve delivery per se. That was not the purpose of the exercise. It was to give confidence that, when we were challenged to produce an estimate, we could do so reliably, and to show the supporting, classically-thinking personnel we were talking to that we could deliver, and had delivered, x features in time t. It was to lower the variation and give confidence to those who wanted to support us that we could deliver and were improving. This was a tool to help them do that and get the buy-in they needed, and it took half an hour a week for someone to do (indeed, I did it, but any scrum master or tech lead can do it in an enterprise context).

Why should you care?

The answer depends on the context you work in. In an agile-sympathetic environment, this isn't really necessary at all. After all, everyone is confident and comfortable with change. However, where a hybrid exists or companies are transitioning, sometimes these conversations are necessary. Later on, they may not be relevant any more. Enterprises can evolve as much as people do.

The Follow-up Questions

During the talk, some questions were asked and I agreed to produce some follow-up graphs from the data. In order to understand some parts of this, I'd suggest you go back and read the method presented in that blog post, as this will explain what look like 2-pt and 5-pt story 'anomalies' as we shifted our understanding of story sizes.

Cone of Uncertainty - Variation Over-time

Specifically, taking the variation between our expectation and actual delivery, plotting it and calculating the Coefficient of Variation to standardise the scales of the graphs, we can plot the change in the coefficient over time. What we see for each story size (in points) is this:
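One hedged way to compute this in Python: for a single story size, take the actual cycle times delivered in each iteration, compute the coefficient of variation (population standard deviation over the mean) and track it iteration by iteration. The cycle-time data below is made up purely to show the mechanics, and `coefficient_of_variation` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Standard deviation divided by the mean: a scale-free measure of dispersion."""
    return pstdev(samples) / mean(samples)


# Hypothetical actual cycle times (days) for 3-point stories, one list per iteration.
iterations = [
    [2, 6, 3, 5, 4],
    [3, 5, 4, 4, 4],
    [3.5, 4.5, 4, 4, 4],
]
cv_over_time = [coefficient_of_variation(it) for it in iterations]
for i, cv in enumerate(cv_over_time, start=1):
    print(f"iteration {i}: CV = {cv:.3f}")
```

Because the CV is dimensionless, the same series can be computed for each story size and plotted on a common scale.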


Story point variation (CV) and polynomial trend line

To keep things simple(r), I've added a cubic polynomial trend line to illustrate a smoothed variation. I haven't done anything else to the trend line and Excel has chosen the shape that minimises the sum of squares. We can relate actual uncertainty to the variation in story point figures. The same downward trend on variation is seen in linear and logarithmic trend lines. As you can see, most trends show the reduction in uncertainty as we recalibrate our positions.

Limitations

The only exception to the general trends is the 8-pt story size, which curves slightly upwards (not significantly enough over linear to be concerned about). Additionally, because the team rightly split larger 13-point stories into smaller ones, there are only a few 13-point stories in the dataset. I argued there were not enough to come to a conclusion, or indeed to worry about going forward, especially as most became 8-point stories as a natural part of story splitting and recalibration (again, read the previous blog post).

Conclusion

As I explained in the talk the other day, estimation such as this isn't an end goal. It is a technique in the repertoire to provide confidence for those who can support us to become more agile. After all, working in the Enterprise Architecture space necessitates communicating across many different companies, with many different types of stakeholder, including non-technical personnel and those without a software development background. Not every EA problem is a software development problem. Indeed, approaching it from that perspective architects a solution before it's necessary, if one is needed at all!

Digression

As an example, consider walking skeletons. They can be just as problematic in code, since they make explicit choices on the technology stack way before a decision is needed on the suitability or otherwise of the tech, but they are useful tools for experimenting and gaining certainty when you already have a tech stack. However, employing just a walking skeleton is like having Maslow's Hammer. It risks introducing technology into a non-existent current stack when the basics of what people want are unknown. In this case, you don't need a skeleton per se. Just throw together a UI mock-up and deploy it to a static environment (even a file system) to get people using it to input data that never gets stored. This can be done in a few minutes, compared to a walking skeleton, which can take a couple of hours to get the same amount of feedback, can be constrained by infrastructure problems and will require some prerequisite work. So, bang for buck, if the question is whether Henry Ford's customers wanted faster horses, this is cheaper to do than a walking skeleton and yields just as much value. The second meeting can fill this out with a skeleton if you want, since by this point you have more information to base choices on.

Risk and Sensitivity

You have two non-mutually-exclusive choices for dealing with risk. The first is to reduce the chance of it occurring, which is where this technique fits in. The other is to mitigate the impact should the risk occur, which this doesn't address and isn't intended to. So this can only be one of many tools in the team's arsenal for tracking, recalibration and risk reduction, and as we can see, there are specific scenarios it addresses really well. The question is, what other techniques exist to address the same problem?

Further Updates

I will answer some of the other questions in time and post them as updates to this blog.

Wednesday, 1 April 2015

Lean-Agile Metrics: Like it or Not, Stats Rules!

I've been wanting to write this blog for the best part of 4 years (I have a few of these I've been meaning to write up, to be fair). I've only just got round to finally doing the necessary mathematical proof...

Wait, where are you going? Come back!!

*sigh*

If you don't listen to this one, you likely aren't data-driving your retros, aren't effectively self-managing and could stall your agile transformation. It's not just about the coding, you know! You can't embrace change if you don't know what is changing around you!

What prompted this?

Ignoring the shoes I'm wearing... wait, you mean me to write it up?... Ah, yeah. That...

*shifty look*

It was a LinkedIn group discussion, in which it yet again became abundantly clear that we're missing some understanding around lean in the software world.

Tell you what, I'll make it simple. I'll use terms you're used to before you freak out. I'll use the context of software development, since this is an arena I'm intimately familiar with. The key bit to concentrate on is the cycle-time.

Cycle-Time isn't quite what you think

Cycle-time as we know it is the average time taken to process a thing. From the point of view of software, let's consider a #NoEstimates or single-size story-point ticket (I prefer to move beyond that, but for now, this will do) on a super simple Kanban board of 'Doing' and 'Done'. However, this generalises to any type of flow.

Single stage Kanban board


Each item's individual lead time, in days say, can be modelled as shown just under the stage box: t = t̄ + Δt. This states that the cycle time t for any individual ticket is the average cycle time (t 'bar') of ALL tickets through this stage plus a variation (delta-t) around it. For example, if the average cycle-time is 5 days and this task takes 6 days, the variation is 6 - 5 = 1 day. This can also be rewritten as 6 = 5 + 1, which says that the cycle-time for a task is the average cycle-time for that stage plus the variation.
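A minimal Python sketch of that decomposition, using five hypothetical cycle times (in days) for a single stage: each ticket's time splits into the stage average t̄ plus its own Δt, and the Δt values sum to zero by construction. The first ticket reproduces the 6 = 5 + 1 example.

```python
from statistics import mean

cycle_times = [6, 4, 5, 7, 3]   # hypothetical cycle times in days for one stage
t_bar = mean(cycle_times)       # average cycle time for the stage
variations = [t - t_bar for t in cycle_times]  # delta-t for each ticket

for t, dt in zip(cycle_times, variations):
    print(f"{t} = {t_bar} + ({dt})")
```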

However, we can't make a decision on one data point. That is like flipping a coin, getting heads and stating it will always be heads. So we run it again and again, which happens naturally in an iteration as you deliver tickets, and ideally you'll deliver at least 25 tickets, which gives us a good level of certainty in any results we draw at the retro... you are data-driving your retros, aren't you? ;-) If you are not delivering 25, then this may be an opportunity to recalibrate by resizing the stories you have so that you get enough data points, which naturally makes the variance on each story smaller anyway. I'll be giving a talk on this soon (shameless plug), so if you're in Manchester in April, pop in to Lean-Agile Manchester and I'll try to explain it in a slightly friendlier way... but not much. It's just the way I roll.

For the sake of illustration, I've used just 5 samples so you can see how it fits together. You get an average from this, which comes out as 31 in the example, and the standard deviation, which is 2.828 (2 x the square root of 2, on the right). The coefficient of variation is simply the standard deviation divided by the average, which in this case is 2.828 / 31 = 9.1%. Pretty small.
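The diagram with the five samples isn't reproduced here, so the Python below uses five hypothetical cycle times chosen to match the quoted figures (mean 31, standard deviation 2.828). It assumes the population standard deviation (`statistics.pstdev`, dividing by n); with `statistics.stdev` (dividing by n - 1) the same samples would give about 3.162.

```python
from statistics import mean, pstdev

# Hypothetical samples chosen to reproduce the figures quoted above.
samples = [27, 29, 31, 33, 35]
avg = mean(samples)   # 31
sd = pstdev(samples)  # sqrt(8) = 2 * sqrt(2), about 2.828
cv = sd / avg         # about 0.0912, i.e. 9.1%

print(f"mean = {avg}, sd = {sd:.3f}, CV = {cv:.1%}")
```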

Kanban: Cycle-Time for Multiple Stages

This small deviation doesn't hold for larger exercises. If we chain a series of these together, say into a 3-stage Kanban board (Elaborate, Doing, QA, Done), we get:

3 stage Kanban

Again, we can determine the variation as before, but this time the total variation is influenced by the span from the earliest finish time of the first task to the latest finish time of the final task. The proof is above, and the numbers tell the story: 8.7 / 83.2 = 10.46%, which is an increase in the coefficient of variation of 14.9% for this Kanban configuration and the cycle times through each stage. You'll note I deliberately didn't compare means, since there is nearly 3 times as much 'work' going on, and I didn't directly compare variances with each other, since we know the variance runs from the earliest start time to the latest finish time on a longer chain.
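The comparison is plain arithmetic, so a quick Python check shows how the relative increase is formed. Working from the unrounded single-stage figure (2√2 / 31) gives about 14.6%; dividing the rounded percentages (10.46 / 9.1) gives the 14.9% quoted, so the difference is purely rounding.

```python
single_stage_cv = 2 * 2 ** 0.5 / 31   # about 0.0912
three_stage_cv = 8.7 / 83.2           # about 0.1046
relative_increase = three_stage_cv / single_stage_cv - 1
print(f"{relative_increase:.1%}")     # 14.6%

rounded_increase = 10.46 / 9.1 - 1    # using the rounded percentages
print(f"{rounded_increase:.1%}")      # 14.9%
```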

The coefficient of variation basically normalises the standard deviation relative to the size of the tasks at hand. Hence, this is the best comparator and is something that can be used between teams to compare team certainty if you feel like being dark and monitoring at programme level.

Real World Applications

The beauty of this is that it scales 'fractally'. The maths can apply to a person, a stage in a board, a team, a business vertical/systemic flow of multiple teams, a programme etc. Both classical and modern agile groups have been guilty of concentrating only on the 'average throughput' and 'average cycle time', when there comes a point where this doesn't wash any more and consistency becomes key. Hence, understanding and controlling for the variation gives you a level of predictability you otherwise wouldn't achieve.

Basically, the lower the coefficient of variation relative to the cost-benefit of getting there, the better! This is another reason why I agree with a number of commentators who propose that we include [business] value in stories, since this hard-to-say measure is in there from the start.

Sweet Spot

This very much depends on a host of factors, including the organisation's appetite for risk, the value they hope to achieve, when they go live to achieve it, any contingency budget and, of course, how well the team recalibrates along the way. Indeed, I'd even go so far as to say it's a range of values.

Hence, in software, there are practices such as Continuous Deployment and deploying MVPs which are better suited to this than most; value increments are zero before delivery, and A/B-testing new changes should aim to improve the delivered value relative to the uncertainty. So anything I'd say here would be conjecture without a theoretical base, but I'll give you one conjecture.

This is really a link between the expected path you could take and the amount of variance, up to the point that the variance breaches a series of control limits. In older, larger-batch flows with long lead times, this compounding variance causes a very wide variation by the end of a project. This is the cone of uncertainty. I covered this before in the face of the #NoEstimates movement last year.

To understand how the cone of uncertainty applies here, let's put ourselves at the origin and look toward the deadline date. This is the solid red line in the bottom graph. The further forward we look, the more uncertain our future looks.



The above shows two graphs aligned to each other. The top is the usual J-Curve and the lines around it, green or red, show the uncertainty as it would be defined by the coefficient of variation, since that's the measure of dispersion as tickets and value accumulate.

In classical environments (top graph, red dotted line), those limits are wide, as the uncertainty is wide, and they are still regularly breached. By contrast, the green dotted lines show the coefficient after each iteration has completed and we reassess the coefficients during each retrospective. To understand that uncertainty curve, look at the bottom graph. As we progress into projects, each iteration we deliver is no longer uncertain, since we've delivered it. It's out! The only uncertainty that remains is in the rest of the project, which often 'resets' the uncertainty to the levels now understood from the actual delivered functions. This is the same as the 10-coin-toss post from last year.
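A hedged Python sketch of why the cone narrows, under a deliberately simple model that is an assumption, not the method behind the graphs: each remaining iteration varies independently with the same standard deviation, and delivered iterations carry no uncertainty at all. The spread of the final total is then the per-iteration spread times the square root of the iterations still to go, shrinking to zero at the deadline. `remaining_spread` and the parameter values are hypothetical.

```python
from math import sqrt


def remaining_spread(n_iterations, completed, sd_per_iteration):
    """Standard deviation of the final total once `completed` iterations are delivered.

    Assumes iterations vary independently with the same standard deviation;
    delivered iterations are certain, so only the remaining ones contribute.
    """
    return sd_per_iteration * sqrt(n_iterations - completed)


n, sd = 10, 2.0   # hypothetical: 10 iterations, 2 points of spread per iteration
cone = [remaining_spread(n, i, sd) for i in range(n + 1)]
for i, spread in enumerate(cone):
    print(f"after iteration {i:2d}: +/- {spread:.2f}")
```

In practice, `sd_per_iteration` would itself be re-estimated at each retrospective, which is the 'reset' described above.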

This naturally means the control limits move. Hence, overlaying this on the J-curve like we did with the red dotted line, we can see how the range progressively narrows as each iteration delivers. The key part to this, though, is that you can only get this narrowing of uncertainty if you are measuring and acting on something! Some would argue waterfall measured, which it did, but it rarely acted, as acting more often required a huge movement, and if that movement was slower than the market changed, it was set for a huge crash.

The sweet spot range is that coefficient of variation at each stage in the project life-cycle, at i = 0, 1, 2, 3, ..., n, and the more frequently you sample, the less likely the coefficient of variation being tracked will fall outside those limits, which, again, are value [at risk] dependent. Indeed, if you look at this from the point of view of a dynamical system, Lyapunov exponents relating the actual delivery to the coefficient of variation are likely to give you a nice threshold measure, but that's my one conjecture :)


Conclusion 

This is a heavy topic for most to grasp, but one that once you have the fundamentals, can massively transform the way you think about constraints and systems, especially people ones. It's only appropriate for the most advanced lean-teams. I appreciate that a lot of people will find this very scary, so you're welcome to get in touch via email at ethar [at] axelisys.co.uk with specific questions, with your value measure and I'll see what I can do to help.



E