Monday, 11 November 2013

Checklist: Agile Estimates. Use Vertical Slices.

Sorry I'm late with this one. I've been busy closing off a contract, starting my contributions to the Math.Net Numerics OSS project, delivering some technical facilities and strategy for a voluntary organisation I'm associated with, and dealing with some personal matters, so the blog has had to take a back seat of late.

A few months ago, I started a series of blog posts on evolving agile estimation and last time, I covered the NoEstimates movement. A few days ago, Radical Geek Mark Jones, a very capable ex-colleague I have a lot of time for, posted a few links on Facebook to articles about agile estimation and started a conversation about the topic.

I responded with one of my usual long-winded explanations, which stretched the Facebook Android app to its limit before I decided that I needed to blog about this and so cut it short. The original article that sparked the conversation was a Microsoft white paper on estimation published on MSDN.

As usual, I didn't agree with everything. My [paraphrased] response to Mark was:

The things I think are missing are the intrinsic links between estimation, monitoring the efficacy of estimates, and continually improving estimates (by data-driving them through retrospectives). After all, the cone of uncertainty is not constant throughout the project's lifetime, and once your uncertainty drops, your safety margin should drop with it. Otherwise you're not improving, or you're carrying an unnecessary safety factor which gives the team too much slack, and that wastes money.

So to go leaner, you have to understand how far off the estimates actually were. For the BAs and QAs, this involves finding metrics for both the estimated and the actual delivery.

I've worked with story points (in planning poker), hours, T-shirt sizes (both the single-value and the size/complexity/wooliness variants), card numbers etc. Fig 1 below is an extract from a company that used story points, but chose not to monitor or improve any of their processes. Each story was worked on by one person (not paired). With relative sizing, you'd expect the 2-point tasks to be about double the effort of the 1-pointers, and hence to take around double the time.

As you can see, 1- and 2-point stories take about the same sort of timescale, 3- and 8-point stories take less than 1-point stories, and so on. So in reality, the whole idea of relative sizing is a myth at the beginning, and it'll stay that way if there is no improvement. A priori knowledge is basically gut feeling. [As time goes on, you'd hope that a priori knowledge improves (and thus allows you to take advantage of lower variability as you make your way along the cone of uncertainty), so that as you get through the project, you have a better understanding of sprint/iteration backlog estimates.]
[But that's not the whole story. There are a number of good practice elements which give you a better chance of providing more accurate estimates.] What needs to happen is to move to a situation where, whatever is estimated:

1) [Make sure the] distribution of the delivery metric for the stories actually delivered matches the distribution of the estimates, WHATEVER THAT IS, as much as possible. The point is to get predictability in the shape of the distributions (ideally so that both are normally distributed), and then, once you've got that, to reduce the standard deviation of that distribution.

2) Take E2E vertical slices. You can't groom or reprioritise the backlog if there is a high level of interdependence between stories. [Vertical slices are meant to have very little dependence on other features, which reduces the need for tasks in the same sprint to complete before other features can be started. Note that such a dependency is a form of contention point and, just like any contention point, it causes blockers. In this case, it blocks tasks before they even start.]

3) Don't be afraid to resize stories still in the product backlog (note, not the sprint backlog) based upon new 'validated' knowledge about similar, already-delivered stories. Never resize stories in play or done - controversial, this one. The aim is to get better at making backlog estimates match the actual delivery.

4) Automate the measurement of those important metrics and use them, alongside other automated metrics from your development tools, to data-drive retrospective improvements in estimation. [So when entering a retro, go in with these metrics to hand and discuss any improvements to them. A sketch of the kind of check I mean follows fig 1 below.]

fig 1 - Actual tasks delivered
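As a minimal sketch of points 1 and 4, assuming your tracking tool can export per-story estimates and actuals (the sample data and structure here are invented for illustration, not taken from the fig 1 company), something like this is enough to walk into a retro with:

from collections import defaultdict
from statistics import mean, stdev

# Invented sample data: (estimated story points, actual hours to deliver).
# In practice, pull these automatically from your tracking tool.
stories = [(1, 12), (1, 14), (2, 13), (2, 11), (3, 9), (8, 10)]

by_points = defaultdict(list)
for points, hours in stories:
    by_points[points].append(hours)

# If relative sizing holds, hours per point should be roughly constant
# across buckets, and the spread per bucket should shrink sprint on sprint.
for points in sorted(by_points):
    hours = by_points[points]
    spread = stdev(hours) if len(hours) > 1 else 0.0
    print(f"{points}-pointers: mean {mean(hours):.1f}h, "
          f"sd {spread:.1f}h, {mean(hours) / points:.1f}h per point")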

In the previous blog posts in this series, I got into the fundamentals of why my checklist is important. However, it's worth reiterating a crucial point.

Vertical Slices

For agile projects, non-vertical slices (tasks that depend on the completion of other tasks) are suicide. They introduce a contention point into the delivery of the software and implicitly build a blocker into stories.

As an example, consider the following backlog for a retail analysis system:
  1. As a sales director, I want a reporting system to show me sales levels (points = 13)
  2. As a purchasing director, I want to see a report of sales by month, so I know how much to order for this year (points = 5)
  3. As a CEO, I want to see how my sales trends looked in the last 4 quarters, so that I can decide if I need to reduce costs or increase resources (points = 8)
Supposing you have 3 pairs of developers. Implicitly, stories 2 and 3 both depend on story 1. There are a few problems with this (there's a small sketch of the blocking problem after this list):
  • Suppose pair 1 picks up story 1. Neither pair 2 nor pair 3 can start stories 2 or 3. They are blocked. The company is paying for the developers' time and delivering zero value. So they move on to slack work, which at the beginning of the project involves CI setup etc. That's valuable in terms of cutting development costs, but it remains only a potential saving until realised at the point of deploying the first few things.
  • Story 1 in itself doesn't deliver business value to the sales director. Sales levels are a vanity metric anyway, but even so, 'sales levels' is a particularly vague description. Deliver this and you are effectively not delivering value, and thus you can only be delivering waste.
  • Even supposing they are not blocked, stories 2 and 3 actually incorporate using the reporting system from story 1 (they are dependent, after all). So the true sizes of these stories are more like 18 and 21 points respectively. As it stands, given the dependence on story 1, stories 2 and 3 are not full stories and as such are already underestimated.
  • You cannot reprioritise story 1 in the backlog - you are committed to delivering story 1 before either or both of stories 2 and 3. They are not functionally independent stories.
  • You certainly cannot remove story 1 without the whole of it being incorporated into either or both of stories 2 and 3.
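Here's that first bullet as a minimal sketch (the story numbers match the backlog above; the data structure is invented for illustration):

backlog = {1: [], 2: [1], 3: [1]}  # story -> stories it depends on
done = set()                       # nothing delivered yet

# Only stories whose dependencies are all done can be started.
startable = [s for s, deps in backlog.items() if all(d in done for d in deps)]
print(startable)  # [1] - one pair can pull work; the other two pairs are blocked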

Let's concentrate on that last point, as it requires some explanation. Even in the ideal scenario where story 1 is delivered and then stories 2 and 3 are delivered in parallel, there is still a problem. Let's look at the variability in the stories:

Estimated sizes and initial ordering
Story 1 - 13 points
Story 2 - 5 points
Story 3 - 8 points

Actual Order
Story 1 - 13 points
Story 2 - 5 points
Story 3 - 8 points

Actual mean effort: (13 + 5 + 8) / 3 = 8.67 points per story
Variance of actual against estimate: ((13 - 13)^2 + (5 - 5)^2 + (8 - 8)^2) / 3 = 0

Cool. So it works when everything runs to plan ('plan', the word which existed in waterfall, V-model and RUP days - how successful was that? ;) )

Now let's assume that you decide to reprioritise to deliver the valuable items 2 and 3 first, as they are less woolly.

Actual Reprioritised Delivery
Story 2 - 5 points (+13 points = 18 points)
Story 3 - 8 points (+13 points = 21 points)
Story 1 - 13 points (= 0 because we delivered it under 2 and 3)

Looking again at the statistics:

Actual mean effort: (18 + 21 + 0) / 3 = 13 points per story
Variance of actual against estimate: ((18 - 5)^2 + (21 - 8)^2 + (0 - 13)^2) / 3 = 169

Woah!! ;-)
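For anyone who wants to replicate the arithmetic, here's a sketch of that calculation, treating 'variance' as the mean squared deviation of actual size from estimated size, as above:

def estimate_variance(pairs):
    """Mean squared deviation of actual points from estimated points."""
    return sum((actual - est) ** 2 for est, actual in pairs) / len(pairs)

in_order = [(13, 13), (5, 5), (8, 8)]        # everything delivered as estimated
reprioritised = [(5, 18), (8, 21), (13, 0)]  # stories 2 and 3 absorb story 1

print(estimate_variance(in_order))       # 0.0
print(estimate_variance(reprioritised))  # 169.0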

So what does this look like?

fig 2 - Comparison of distributions. The red line (emphasised) shows the ungroomed backlog; the blue area shows the groomed, reprioritised backlog, with higher variance.

The red line (which I've highlighted to show its location) shows what happens when the ideal scenario is achieved. Remember, though, that this, like other poor project estimation techniques, relies on everything being perfectly delivered, which we all know is rubbish. Just changing time to story points doesn't make this any less true. The blue area is, of course, the distribution delivered by the second, reprioritised backlog.

You'll note that both methods delivered the same number of stories, but the estimates are far less exact in the second case. Additionally, you cannot easily groom away story 1 and leave 2 and 3 without including story 1's tasks in either or both of them. It's essential to complete story 1 before they can be started.

So a vertical slice?

By comparison, starting with vertically sliced stories, you would include all the effort required to deliver each story, even the dependent tasks. So story 1 would be consumed by stories 2 and 3, and the estimates adjusted accordingly.

Thus:

Estimated sizes and initial ordering
Story 2 - 18 points (5 + 13)
Story 3 - 21 points (8 + 13)

Now, regardless of which order the two stories are tackled in, they can both start and finish independently and can run in parallel. Thus, assuming everything runs to time again, it takes no more than 21 points of elapsed effort to deliver that functionality, using 2 pairs of developers instead of 3. So you've saved two developers' wages for that time, and reprioritising never changes anything (because story 1 is subsumed into the two stories independently).

Actual Reprioritised Order
Story 2 - 18 points
Story 3 - 21 points

This allows you to groom the backlog, reprioritising items out of the backlog and into other sprints.
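Running the same numbers as before, the estimates now match the actuals whichever order you pick:

Actual mean effort: (18 + 21) / 2 = 19.5 points per story
Variance of actual against estimate: ((18 - 18)^2 + (21 - 21)^2) / 2 = 0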

Won't this violate DRY?

Well, yes. But that's what refactoring is for. Refactoring will allow you to push the common structures back into a shared reporting engine and, as such, evolve a separate story 1 out of the more concrete stories 2 and 3.
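As a hypothetical illustration (all function names here are invented, not from any real codebase), the duplicated plumbing in stories 2 and 3 might refactor out like this:

# The common reporting engine, extracted from the duplicated code in
# stories 2 and 3 - in effect, the old story 1 emerges via refactoring.
def run_report(sales, group_key, aggregate):
    groups = {}
    for sale in sales:
        groups.setdefault(group_key(sale), []).append(sale)
    return {key: aggregate(rows) for key, rows in groups.items()}

# Story 2: sales by month, for the purchasing director.
def sales_by_month(sales):
    return run_report(sales, lambda s: s["month"],
                      lambda rows: sum(r["value"] for r in rows))

# Story 3: sales by quarter, for the CEO's trend view.
def sales_by_quarter(sales):
    return run_report(sales, lambda s: (s["month"] - 1) // 3 + 1,
                      lambda rows: sum(r["value"] for r in rows))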

Additionally, if you spot the refactoring points early enough to reuse story 2's code in story 3 (or vice versa), then doing so buys you some slack to pick up other tasks that prepare for the rest of the development.

Conclusion

The moral of this story, kids, is: always structure FULL vertical slice stories. It gives you the greatest opportunity to pivot through backlog grooming and, as such, greater agility. It also reduces the variance and increases predictability. Keep data-driving this factor in your retrospectives, so you know if and where your grooming needs to happen and how well it has worked so far.

Even Enterprise Architecture is focussed on delivering business capabilities (and hence value) and keeping it 'real'. So if they can do it, what stops us?

2 comments:

  1. Thanks for the mention! I must admit I have been guilty of not vertically slicing my tasks, and I have felt the pain because of it. In many ways my tasks and planning have been one of my weakest areas. Don't get me wrong, I have got good at describing user stories in gherkin, matching the use cases up with wireframes, and working through the story one case at a time in an ATDD cycle. So my tasks are my use cases for the most part. So by nature of the fact that they are all part of the same story, they are interdependent. A common example is a feature that describes a new form. The first use case is that the form is successful given the correct input. And the second is that the form gives a particular error given invalid input. These obviously both require the form to be created. I have been unsure for a while on the best method of estimating this, and have up until now sized the "happy path" task bigger than the remaining tasks in the feature (or use cases in the user story). You have certainly clarified things somewhat.

    I totally agree with the continual improvement, I have been constantly experimenting with process for some time now, and in doing so feel my story point estimates in the context of my one-man-band work are pretty good now. However, I now find myself in a position where I must estimate in hours on an unfamiliar project in an unfamiliar company. I know that anything I do at the outset is going to be completely off, but I hope by approaching with an idea of what I need to learn from it I can learn it quickly and improve my estimating. I guess the metrics should work just as well regardless of the estimation unit?

    Replies
    1. No probs.

Yes, the estimation process uses relative units of measure. Something to say this 1-point blob is a third the size of this 3-point blob. Also, be aware that a 1-point story in ABC company will be different to a 1-point story in DEF company, or even in ABC company team 2. So a direct comparison of velocity/throughput simply isn't possible and indeed, would be meaningless.

The use of time is one factor in the story point. I used time because it is often measured that way, but value is also a function of the cost of development, amongst other things (thinking lean). Two teams delivering the same size of point, but structured differently, may have different costs. For example, a team of 4 developers, 2 of whom are junior (being paid 2/3 the level of the 2 remaining seniors), will have a different cost per point to a second team made up of 4 staff, one of whom is an architect (being paid twice as much as the previous devs) and the rest senior engineers. As a result, you are almost guaranteed to be totally off the first few times you try it with a new team. But as you say, as long as you know what you're looking to improve on: hypothesise, test, measure, pivot if necessary, rinse and repeat.
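To put made-up numbers on that: if a senior costs 300/day and a junior 200/day (two-thirds), the first team costs 2 x 300 + 2 x 200 = 1,000/day, while the second (one architect at 600/day plus three seniors) costs 600 + 3 x 300 = 1,500/day. Same velocity in points, 50% higher cost per point.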

