Sunday, 22 September 2013

#NoEstimates

Once in a while I come across 'fads' which actually have something to them, but are sold as something completely different, often for what I consider to be the wrong reasons or with a focus on the wrong things. This is like Viagra, which was created as something completely different, but has become synonymous with sex, become the butt of jokes and the epitome of junk mail amongst a host of other things. Indeed, agile was in the same position back in the day, before people understood it, as lean software development is today. Consider it the same as tech following Gartner's hype cycle.

This time round, it is the turn of the 'No Estimates' school.

No estimates is a movement which seems to have its source in the non-committal Kanban world, and which people assume to mean that no estimates are given for tasks. This is not actually true. The aim of the group is to move away from the concept of estimation as we know it. This means moving away from sizing tasks by story points and concentrating on counting cards instead. ThoughtWorks released an e-Book in 2009 about using story cards as a measure of velocity and throughput. I personally take this one step further and prefer to break tasks down into the smallest logical unit with the lowest variance. What I mean by this is that I prefer to play to the human strength of being better able to measure small things than large (in terms of variance of the actual metric from the expected metric).

This means that I personally much prefer to size things in single point items/stories. Larger tasks are then composed of these smaller subtasks, like Kanban in manufacturing composes larger parts from smaller ones. The lower variance means lower delivery risk and lower safety (read inventory) and pushes the team closer to the predictability afforded by Little's law as the safety factor tends to zero.

Why Smaller?

Consider a burn down chart of tasks. The burn down never actually follows the burn down path exactly. The nature of story sizes means that you will have an 8 point task move across the board and completing it will decrement the burn down by a discrete 'block' of points (8 in this case). So the best you can get is a stepped pattern, which in itself makes the variance larger than it needs to be if the burn-down rate is taken as the 'ideal' baseline (note, a burn down chart is the 'ideal' model of how the work will decompose).

Why do you care? Because this stepped pattern introduces a variation of its own. This means that sometimes you will have slack and at other times you'll be rushing, all during the same project. This is all without the introduction of a variance on the size of the task at hand (as shown by a previous blog post on evolutionary estimation, often points don't actually reflect the relative effort in stories), which in itself introduces a variance on this variance. The fabricated image below shows the variance on a burn down due to the step. When you consider the variation in the size of one point tasks, bracketed in the time periods at the bottom, this is the second variance, due to the timings being out.

fig 1 - Burn down of the variation of both the 'steps' and the delivery timing for different sized stories. The idealised burn down is shown in red (typical of tools like JIRA Agile).


Note, the blue line shows the top and bottom variance of the actual delivered timing (i.e. the green step function), not against the red burn down line. If the average were plotted on the above, the burn down 'trajectory' would sit above the red line, passing half way through the variation. So at any given moment, the project would look like it is running late, but may not be. It's harder to tell with the combination of the variance of task size and time per task.

Reducing the size of stories to one point stories gets you closer and closer to the burn down line and gives you the consistent performance of the team, which will have a much narrower variance simply because of the use of a smaller unit of work per unit of time. The following example, which is the same data as in fig 1, just burning down by one point, shows that for this data, the variation is reduced, simply by making the story points a consistent size.

fig 2 - 1-point burn down chart showing shorter variation


The reduction in variation is 12 percent, which by proxy, increases the certainty, simply by sizing the tasks per epic differently. This reduction in variation reduces the variance around the throughput (which is story points per unit sprint/iteration). The only 'variable' you then have to worry about is the time a story point takes, which then simply becomes your now relatively predictable cycle time. 
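
To see the 'step' effect on its own, here is a minimal sketch. It assumes an idealised team that burns exactly one point per day, so the only deviation from the ideal line comes from the story size. The function names and the 24-point example are made up for illustration and are not the data behind the figures.

```python
def remaining_points(total_points, story_size, day, points_per_day=1):
    """Points left on the chart at the end of `day`.

    A story only comes off the burn down once the whole story is done.
    """
    worked = min(day * points_per_day, total_points)
    finished_stories = worked // story_size
    return total_points - finished_stories * story_size


def worst_gap_from_ideal(total_points, story_size, points_per_day=1):
    """Largest gap (in points) between the stepped burn down and the ideal line."""
    days = total_points // points_per_day
    gaps = []
    for day in range(days + 1):
        ideal = total_points - day * points_per_day
        stepped = remaining_points(total_points, story_size, day, points_per_day)
        gaps.append(stepped - ideal)
    return max(gaps)


print(worst_gap_from_ideal(24, 8))  # three 8-point stories
print(worst_gap_from_ideal(24, 1))  # twenty-four 1-point stories
```

With 8-point stories the chart sits up to 7 points above the ideal line even though the team is working at a perfectly steady rate. With 1-point stories the gap is zero, so any remaining deviation comes from the time each point actually takes.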

The key with No Estimates, as should be apparent by now, is that it is an absolute misnomer.  They do estimate, but not as a forecast with many variables.

Why does this work?

There is a paper and pen game I play when explaining variance to people. I draw two lines on a piece of paper, one short and one long, and ask Joe/Jane Bloggs to estimate the length of each. I then ask them to estimate how many of the shorter line fit into the longer one, by eye only. After all three steps are complete, I get a ruler and measure the lines. Usually, the estimates for the longer line and the combination are significantly off, even if the estimate of the short line is fairly good. Please do try this at home.


fig 3 - Estimate the size of the smaller and larger lines, then estimate how many of the smaller go into the larger.


As humans, we're rubbish...

...at estimating. Sometimes we're also rubbish at being humans, but that's another story. 

The problem arises because there are three variances to worry about. The first is how far out you are with the shorter line. When playing this game, most people are actually quite good at estimating the shorter line. For say, a 20mm line, most will go between 18mm and 21mm. The total variation is 3mm. That's 15 percent of the length of the line. 

With a longer line of 200mm say, most people are between 140mm and 240mm. A total variation of 100mm which is 50% of the line length. 

When the combination of these errors occurs, it is very rare that they are cancelled out altogether. However, the total error when performing the 20mm into the 200mm line effectively multiplies the error by at least 10 (as you take your smaller line measure by eye and apply it one after the other to measure the longer line, the error adds up) and on top of that, you have the error in estimating the big line, which means the total effect of the variances is a factor of the multiplication of the variance of the smaller line with the larger and not the addition. It's non-linear.
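
The arithmetic above can be sketched as follows. The function names are mine, and the 2mm 'mental ruler' error is an assumed figure for illustration only.

```python
def relative_spread(low_mm, high_mm, actual_mm):
    """Total spread of the guesses as a fraction of the true length."""
    return (high_mm - low_mm) / actual_mm


def stepped_error(unit_error_mm, actual_unit_mm, actual_long_mm):
    """Error built up by stepping a misjudged short line along the long one."""
    steps = actual_long_mm / actual_unit_mm
    return unit_error_mm * steps


print(relative_spread(18, 21, 20))     # short line: 3mm spread on 20mm
print(relative_spread(140, 240, 200))  # long line: 100mm spread on 200mm
print(stepped_error(2, 20, 200))       # a 2mm misjudgement applied ten times
```

A misjudgement of only 2mm on the short line is already 20mm once it has been stepped along the long line ten times, and that is before the error in judging the long line itself is taken into account.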

Note, the important thing isn't the actual size of the line. You first draw the line and you don't care how big it is. It's the deviation of the estimate from the actual size of the line that's important.

What's the point?

OK, granted, that joke's getting old. From my previous evolutionary estimation blog post, you can see that estimation is neither a super-fast nor a simple matter when trying to apply it to retrospective data. Indeed the vast majority of developers don't have the statistical background to be able to analyse the improvements they make to their estimation processes. By contrast, No Estimates aims to do away with the problem altogether by fixing the size of a story to one size, for example what would have been a three point story in the old(er) world. In a way that's a good thing and intuitively relates better to the concept of a kanban container size, which holds a certain number of stories. In the software world this maps to the idea of an epic, or story with subtasks.

Conclusion: is what you said previously 'pointless'?

Nope! Definitely not. Makes a good joke heading though.

The previous techniques I have used still apply, as the aim is to match the distribution in exactly the same way, just with one story size as opposed to the many that you have in other estimation techniques. Anything falling outside a normally distributed task could get 'chopped' into several story sized objects, or future pieces of work resized so each subtask is a story.

Just to reiterate, as I think it is worth mentioning again: projects have never failed because of the estimates. They failed because of the difference between estimated and actual delivery times. That's your variation/variance. Reduce the variation, you increase predictability. Once you increase predictability, speed up and monitor that predictability. Then 'fix it' if it gets wide again. This is a continuous process, hence 'continuous improvement'.

Thursday, 29 August 2013

Evolutionary v. Emergent Architecture: ThoughtWorks Geek-Night

I was at a ThoughtWorks Geek Night presentation on the principles and techniques of evolutionary architecture given by Dr Rebecca Parsons, who works as ThoughtWorks CTO. She was giving a talk about evolutionary architecture and in fairness, everything she said was sensible. Technically I couldn't argue with any of it. Now, I have a lot of time for ThoughtWorks. Granted, they're not about to convert me to any sort of permanent role, but at the same time, the top level members (including Martin Fowler) do have what I consider to be one of the best stances on agile adoption, evolutionary systems and people driven approaches in the market. They're not perfect, but in a world where perfection is governed by how well the solution fits the problem, and that every problem is at least subtly different, I'm willing to live with that.

The one thing I did take issue with was her stance on the distinction between emergent architecture and evolutionary architecture. Dr Parsons regarded this distinction as being one of guidance. My viewpoint at the time was that everything is evolutionary and that there was no distinction. I still stand by this and the more I think about it, the more I think it's true. However, I'd redefine emergence as what happens as the result of these evolutionary processes where the guiding influence isn't immediately obvious.

So what are you whinging about now?

The problem I have with Dr Parsons' definition is that there is nothing that we ever do in our professional (and personal) lives that isn't guided in any way. Even those people who love being spontaneous love it for a reason. Indeed, the whole premise of lean and agile is based on the principle of [quick] feedback, which allows us to experiment and change tack or resume course. Also, as humans in any field, we rarely do stuff for the heck of it. We choose particular tech for a reason, apply design patterns for a reason, write in imperative languages mostly for a reason and we do all of those things regardless of correctness and often of systematic optimality (i.e. we'd do this to make our lives easier, but potentially at the expense of systems as a whole).

She mentioned a term I've used for at least 4 years, and that's a 'fitness function'. For those who have a good grasp of genetics or evolutionary systems such as genetic algorithms (which used to be a favourite topic of mine at the turn of the millennium) or champion-challenger systems, this term isn't unfamiliar. It's a measure of distance of some actual operational result from the target or expected result. In the deterministic world of development, this may be percentage test coverage from the ideal test coverage for that project; in insurance risks it's the distribution of actual payout versus the expected payout and indeed in my previous post on evolutionary estimation, the R-squared was used to measure the distance between the distribution of done tasks and the normalised version of the same data.

What's the guidance?

Now, you can take guidance from any source and it doesn't have to be direct. Some people choose their favourite tech, some people choose familiar patterns and practices, some people prefer to pair-programme, some people prefer to make decisions at different points to others (the latest possible moment versus reversible decision) etc. Also, you can't always see what's guiding you, or know for sure what the 'guiding influence' wants. For example, Lean Start-up is an attempt to validate hypotheses about what the market wants (which is the guiding influence in that case) and the fitness functions are often ratios which divert the focus from vanity metrics (how many sales per 100 enquiries, how much it cost to sell this product - which are all standard accounting measures btw).

Additionally, emergent design often comes about through solving development detail problems through some mechanism of feedback, which again guides the development choices, for example because something causes the developers a lot of pain or because they are guided by the needs of the client. Just as human beings evolve their behaviour based on sensory stimulus (I won't put my finger in the fire again because it hurt), development 'pain points' guide the use of, say, a DI framework, which the architect isn't always aware of. That 'guidance' Dr Parsons refers to can come from outside the immediate environment altogether.

For example, for my sins, I'm a systems architect. I'm a much better architect than I am a developer but I am often guided into development roles because of the lack of architecture consultancy jobs relative to software development contracts (the ratio as of 28/08/2013 is 79:48:1 for dev:SA:EA roles, which is a significant improvement over the 300:10:1 in 2011 - Note, these are contract only). It's a business decision at the end of the day (no work, no eat and I eat a lot). By that same token, I guide other decisions based upon my role as a consultant. For example, I derisk by having low outgoings, building financial buffers to smooth impacts, diversifying my income stream as much as possible etc. It also prevents me from being bored too easily and makes it easy for me to up and leave to another town when a role arises. All these non-work decisions are guided by factors from inside the work itself and there are factors which work the opposite way (for example, I am certainly not about to take on a permanent role as it is not something that suits my character and personality). Hence, a domain which is in a different spot in the 'system' [that is my life] has influences on those other facets.

In tech, this means the use of frameworks because they make life easier for the developer. There is a problem that needs a solution, and the fitness function is how well the solution solves that particular problem over a period of time. For devs, that means it has to be shiny, has to garner positive results quickly, has to be 'fun' and has to allow devs to 'think' of the solution themselves (i.e. it can't be imposed dictatorially - this doesn't mean it can't be guided organically though). Systematic productivity improvements, which are often the domain of the architect and/or PMOs, aren't high on most devs' lists. You can see this by the way the vast majority of retrospectives are conducted.

In Summary: So no emergence?

Not exactly, rather that it isn't something that has an easily discernible influence from all perspectives. Taking a leaf out of chemistry and cosmology, we are all ultimately made up of atoms. Charge was the guiding influence that pushed protons together to make larger atoms and eventually molecules. Those molecules eventually connected and made cellular structures, which then went on to form life as we know it (missing lots of steps and several hundred million years in the process). We don't see the atomic operations day-to-day but now we do know they are there and they are influenced by the environment they exist in and in turn, change the environment they exist in, which in turn changes them for the next generation. We didn't know the structure of DNA before Watson and Crick, just as we didn't know for sure of the atomic level of matter until 19th century science validated the philosophical hypotheses of the ancient Greeks (darn long feedback loop if you ask me).

What worries me is that where the pattern isn't obvious, in general society this leads to conjectures, superstition and other perspectives (some of them untestable) because the individuals cannot tangibly see the guiding influence. So in an attempt to gain cognitive consonance from that dissonance, people come up with weird and wonderful explanations, most of which don't stand up to any form of feedback (or in some cases, are never scientifically tested). Science attempts to validate those hypotheses or conjectures and that's what makes it different. Over time, the science helped organically guide society into this emerged world (so far) where we have the medical advances we have or the computer systems we have. Not just that, but it is validated from many different angles, levels of abstraction, perspectives and details and none of them are wrong per se, they just use effective theories which may mean the guiding influences for the existence of some phenomenon are missing (i.e. at a lower level of detail than is studied - think neurological guiding influences on cognitive psychology).

The same is true of software systems and development teams. Without the scientific measurement step, and hence the guiding metrics/influence, it's conjecture and nigh on pointless. Something may emerge, but it won't be guided by any factors of use to the client. It may be guided by the developers alone, who are not always aligned with the needs of the client. Remember, it is the client who ultimately sets the fitness function and hence the alignment the team should have.

Sunday, 18 August 2013

Evolutionary Estimation

This is a topic that I've started but had to park numerous times, as timing has simply not been on my side when I've had it on the go. I started to think about the mathematics of Kanban a couple of years ago as I got frustrated by various companies being unable to get the continuous improvement process right and not improving their process at all. The retrospectives would often descend into a whinging shop, sometimes even driven by me when I finally got frustrated with it all.

In my mind, cycle-time and throughput are very high level aggregate value indicators which are often measured in the world of the client by a monetary sum (income or expenditure), target market size or some risk indicator. To throw out the use of analytical processes and indeed mathematics as traditional process driven 'management' concepts is fatal to agile projects, since you are removing the very tools you need to measure alignment with the value stream that underpins the definition of agile value, not to mention violating a core principle in agile software development by losing the focus on value delivery to the customer.

I won't be covering the basics of continuous improvement, as that is covered by many others elsewhere. Suffice to say that it is not a new concept at all, having existed in the world of manufacturing for over 40 years, in PRINCE 2 since the mid-to-late 90s and in process methods such as Six-sigma, maturity models such as CMMI, JIT manufacturing (TPS we all know about) etc.

In software, it is really about improving one or both of the dependent variables of cycle-time and throughput (aka velocity) and often takes place in the realms of retrospectives. I am not a fan of the flavour of the month of just gathering up and grouping cards for good-bad-change or start-stop-continue methods, as there is often no explicit view of improvement. It affords the ability to introduce 'shiny things' into the development process which are fun, but have a learning lag time which can be catastrophic as you head into a deadline, as the introduction of a new technology introduces short-term risk and sensitivity into the project. If you are still within that short-term risk period, you've basically failed the project at that point of introduction, since you are unproductive with the new tool, but have not continued at full productivity on the old tool. Plus, simply put, if you want to step up to work lean, you will have to drive the retrospective with data, even if it is just tracking the throughput and cycle-times and not the factors on which they depend (blockers, bug rates, WiP limits, team structures etc.).

I have written quite a bit of stuff down over the last couple of years and so I am going to present these as a series of blogs. The first of them, here, covers improved estimation.

Let Me Guess...

Yes, that's the right answer! :-) Starting from the beginning, especially if like me, you work as a consultant and are often starting new teams, you will have no idea how long something is going to take. You can have a 'gut feeling' or draw on previous experience of developing similar or not so similar things, but ultimately, you have no certainty nor confidence in how long you think a task is going to take.

The mathematical treatment of Kanban in software circles is often fundamentally modelled using Little's Law, which is a lemma from the mathematical and statistical world of queuing theory. In its basic form, it states that the average number of WiP items (Q) is the arrival rate of items into the backlog (W, which when stable is also the rate at which items move into 'Done' - aka throughput in unit time) multiplied by the average time the ticket, story point or whatever (as long as it is consistent with the unit of throughput) spends in the pipeline, aka its cycle-time (l).

Q = lW
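
As a minimal sketch of the relationship, using the symbols as defined above (W as throughput per unit time, l as cycle-time). The function names and the example numbers are assumptions for illustration only.

```python
def average_wip(cycle_time_days, throughput_per_day):
    """Q = l * W: average number of items in progress."""
    return cycle_time_days * throughput_per_day


def cycle_time(average_wip_items, throughput_per_day):
    """The same law rearranged: l = Q / W."""
    return average_wip_items / throughput_per_day


# A team finishing 2 tickets a day, each spending 3 days in the pipeline
print(average_wip(3, 2))  # average tickets in progress
print(cycle_time(6, 2))   # days per ticket if 6 are in progress on average
```

The units have to agree: if throughput is counted in tickets per day, cycle-time has to be in days per ticket.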

Little's Law can be applied to each column on the board and/or the system as a whole. However, here's the crux. The system has to be stable and have close to zero variance for Little's law to apply effectively! Any error and the 'predictive strength' of the estimate, which most clients unfortunately tend to want to know, goes out of the window. After all, no project has ever failed because of the estimate, it is the variance from the estimate that kills it. Reduce the variance, you reduce the probabilistic risk of failure. A variance is simply:

V = | A - E |

Which is the absolute difference (don't care about negatives) between the actual and estimated points total or hours taken. You have some choices to reduce the variance and bring the two into line. Improve your estimates, deliver more consistently or indeed both.
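
A minimal sketch of that calculation over a handful of tickets. The ticket figures are made up for illustration.

```python
def variance_from_estimate(actual, estimated):
    """V = |A - E|: over-runs and under-runs both count as variance."""
    return abs(actual - estimated)


# (actual_days, estimated_days) for three hypothetical tickets
tickets = [(3, 2), (5, 8), (1, 1)]

per_ticket = [variance_from_estimate(a, e) for a, e in tickets]
print(per_ticket)       # variance on each ticket
print(sum(per_ticket))  # total variance to drive down over time
```

Note that the second ticket finished early, yet it still contributes the most variance. Finishing early is as much a sign of an unreliable estimate as finishing late.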

However, Kanban has been modelled to follow a slightly more general model, where a safety factor is included in the equation. In manufacturing and in software, safety is very often (but not always) associated with waste. The equation basically adds a safety factor to Little's law, thus allowing for variance in the system. So it looks more like:

Q = lW + s

Among many other things, Kanban helps to introduce lean principles into the process and eventually aims to reduce the safety factor, making the process reliable enough to be modelled by Little's law, where the mental arithmetic is not as taxing :-)
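
One way to watch the safety factor shrink is to rearrange the equation to s = Q - lW and track it per iteration. This is a sketch under that assumption, with made-up iteration figures.

```python
def safety_factor(observed_wip, cycle_time_days, throughput_per_day):
    """s = Q - l * W: the WiP held over and above what Little's law predicts."""
    return observed_wip - cycle_time_days * throughput_per_day


# (observed average WiP, cycle-time in days, throughput per day) per iteration
iterations = [(10, 3, 2), (8, 3, 2), (6, 3, 2)]

safety_by_iteration = [safety_factor(q, l, w) for q, l, w in iterations]
print(safety_by_iteration)
```

In this example the safety held in the system falls from 4 items to 0 across three iterations, at which point the plain form of Little's law describes the board.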

Part of doing this in software, is reducing the need to have slack in the schedule, which in turn is dependent on the variance in the system. Getting better at reducing the variation and eventually the variance, improves the understanding, accuracy and reliability of the estimates and this is the part I'll cover today.

What's the point?

I have never really been a fan of story points for the reasons that have been given by the practising agile community. The difficulty is that unlike the use of hours, as inaccurate as they are, they don't have an intuitive counterpart in the mind of the client and are simply too abstract for developers, let alone customers, to get their head around, without delivering a corresponding traded-off benefit for that loss. Effectively, a story point also introduces another mathematical parameter. This is fine for maths bods, and I certainly have no issue with that, but there isn't actually a need to measure story points at all. Story points violate the KISS principle (or for true engineers, Occam's Razor) and inherently make the estimation and improvement process more complex again, without a corresponding increase in value apart from maybe bamboozling management. What doesn't ever come out is how bamboozled the development team also are :-)

It's no great secret that despite including the use of story points in the first edition of XP Explained, Kent Beck moved away from the use of story points and back to hours in his second edition, much to the dismay of the purists. In my mind, he simply matured and continuously improved XP to use a better practice (which has its roots in a previous practice) and so personally lives the XP method. He gained a lot of respect from me for doing that. That said, points aren't 'point-less' but if you wish to use points, you need to get to the... erm... point of having some form of consistency in your results... OK, I'll stop the puns :-)

For those experienced in the lean start-up method, there is a potential solution to the metrics which removes some of the unknowns. Following on from the above discussion around variance, consider one of the team's Kanban metrics to be measurable by the width of the standard deviation. The metric would be to repoint/reestimate tasks based upon the validated knowledge of what you find from the continual experiments with the estimation->adjustment cycle, until you achieve normally distributed (or t-distributed if the number of data points is below about 25) 1-point, 2-point, 3-point, 5-point,... data. That will then allow you some leeway before then evolving to make the distribution as narrow as possible.

For example, the A/B-test for the devs would be to set the hypothesis that taking some action on estimation, such as re-estimating some tasks higher and lower, given what they have learned about somewhat similar delivered stories will yield a narrower variance, hence a better flow, reduce risk and improve consistency (especially to the point where the variance from Little's law becomes acceptably small). This would take place in the retro for each iteration, driven by the data in the process.

In the spirit of closing a gap in a conversation and hence improving the quality of that conversation, for a product owner, manager or someone versed in methods such as PRINCE 2, PERT, Six-sigma, Lean or Kaizen, this will be very familiar territory and is the way a lot of them would understand risk (which in their world, has a very definite value, most obviously where there is a financial consequence to breaching a risk threshold). As time goes on, you can incorporate factor analysis into the process to determine what factors in the process actually influence the aggregate metrics of throughput and cycle time.

Show me the money!...

No, because it varies on a number of factors, not least the salaries of the employees. To keep the discussion simple, I'll attach this to time. You can then map that to the salaries of the employees at your company and decide what the genuine costs and savings would be.

Imagine the following data after some sprints. This is fabricated point data from 2 sprints, but is still very typical of what I see in many organisations.

table 1 - initial 2 x 4-week sprints/iterations worth of data 

From this you see next to nothing. Nothing stands out. However, let's do some basic analysis on it. There are two key stages to this and they are:

  1. Determine the desired 'shape' of the distribution from the mean and standard deviation in the current data
  2. Map this to the actual distribution of the data, which you will see is often very different - This will give you an indication of what to do to move towards a consistent process.
You'll note that I deliberately emphasised the word 'current'. As with any statistic, its power doesn't come from predictability per se, it comes from its descriptive strength. In order to describe anything, it has to have already happened. Lean Start-up takes full advantage of this by developing statistical metrics without using the term, as it may scare some people :-)
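
The two stages above can be sketched as follows for a single point band. The data is made up, and the use of the sample standard deviation and of whole-day bins are my assumptions for this sketch rather than anything fixed by the method.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev

# Hypothetical days taken by tickets that were all estimated at 2 points
two_point_days = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 7, 8]


def expected_counts(days_taken, day_range):
    """Stage 1: the tickets per whole day that a normal curve with the
    current mean and standard deviation would give."""
    target = NormalDist(mean(days_taken), stdev(days_taken))
    n = len(days_taken)
    return {
        day: n * (target.cdf(day + 0.5) - target.cdf(day - 0.5))
        for day in day_range
    }


# Stage 2: put the actual counts next to the desired shape
actual = Counter(two_point_days)
expected = expected_counts(two_point_days, range(1, 9))
for day in range(1, 9):
    print(day, actual[day], round(expected[day], 2))
```

Days where the actual count sits well above the expected count, particularly out in the tails, are the candidates for moving to a different point band.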

So, from the above data we can see that we have more than 25 data points, so we can use the normal distribution to determine the shape of the distribution we would like to get to. The following graph shows an amalgamation of the normal distribution of time taken for each 1 to 8 pointed ticket up to the last sprint in the data set (if you work on the premise that 13 points is too big and should be broken down, then you don't need to go much further than 8 points, but that threshold depends on your project, team, and of course, how long things actually take). The overlaps are important, but I will come back to why, later.

fig 1 - Points distribution in iteration 1

Having got this, we then plot the actual distribution of the data and see how well it matches our normals.

IMPORTANT ASIDE
As well as showing that the overlap of the normals means that a task of 4 days could have been a one point or an 8 point task, causing unpredictability, the distribution above also shows a very interesting phenomenon for the points themselves, and that is the informal ratio of the height against width of each peak. The distributions may well even have the same number of data points (you get that by integrating the areas under the distributions or of course, using normal distribution tables or cumulative normal functions in Excel), but the ratio intuitively gives you a sense of the variance of the estimation. The narrower the better and it shows our ability to estimate smaller things better than larger things.

I often illustrate this by drawing two lines, one small (close to 2cm) and one much larger (close to 12cm), and asking someone to estimate the lengths of the lines. The vast majority of people come within 10% of the actual length of the small line and 25 - 30% of the bigger line. It's rare that estimations are the same for both sizes. This is why taking on smaller jobs and estimating them also works to reduce risk, because you reduce the likelihood of variance in the number of points you deliver. Smaller and smaller chunks.


Anyway, back to the distributions. Using the original table, do the following look anything like normal?

fig 2 - Actual distributions

If you said yes, then..., ponder the difference in weight between a kilogramme of feathers and a kilogramme of bricks.

OK, I'm being a bit harsh. In some of the distributions we're almost there. It's easier to see the differences when you take into account the outliers, and in these distributions it is pretty obvious when you consider the kurtosis, i.e. the 'spikiness' of the corresponding curves (approximating these discrete distributions) against the normal distribution for that data. It's easier to see this on a plot, again using Excel.

fig 3 - first generation estimates

As expected, we're pretty close with the 1 point stories, partly because of the reasons mentioned in the previous aside. The 2, 5 and 8 point estimations, whilst quite unpredictable, show something very interesting. The kurtosis/spikiness in the curves is the result of peaks on either side of the mean. These are outliers relative to the main distribution. These are what should be targeted to move into other point categories. The 4, 5 and 6 day tasks which resulted from the 5-point estimates are actually more likely to be 3 point tasks (read the frequencies on the days in each graph). The same is true for the 1, 2 and 3-day, 2-point tasks as these are much more likely to be 1 point tasks. This is also the case when looking for data to push to higher points.
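
If you want to put a number on the 'spikiness' rather than read it off a plot, a minimal sketch is below. It uses the plain population formula for excess kurtosis (zero for a normal curve); Excel's KURT function applies a small-sample correction, so its figures will differ slightly. The example data is made up.

```python
def excess_kurtosis(values):
    """Fourth moment over squared second moment, minus 3.

    Zero for a normal distribution, positive when a sharp central peak
    sits alongside outlying tails, negative when the data is flat or
    split into separate humps.
    """
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


# Mostly 4-day tickets with one early and one late outlier
print(excess_kurtosis([4, 4, 4, 4, 4, 4, 1, 7]))

# Tickets spread evenly across 1 to 5 days
print(excess_kurtosis([1, 2, 3, 4, 5]))
```

The first set comes out positive and the second negative, which matches what the eye sees on the plots: a spike with outliers against a flat spread.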

What are you getting at?

Estimation is a process we get better at. We as human beings learn and one of the things we need to do is learn the right things, otherwise as we search for cognitive consonance to make sense of any dissonance we experience, we may settle on an intuitive understanding, or something that 'feels right' which may be totally unrelated to where right actually is, or a position which is somewhat suboptimal. In everyday life, this leads to things like superstition. Not all such thoughts are incorrect, but in all cases, we need to validate those experiences, akin to how hypotheses are validated in lean start-up.

In this case, when we push the right items, the right way, we then get a truly relative measure of the size of tasks. At the moment, if we are asked "how big a task is a 2 point task?" we can only answer "It might be one day, or it might be 8 days, or anything in between". Apart from being rubbish, it has the bigger problem that if we are charging by point, we have no certainty in how much we are going to make or lose. As a business, this is something that's very important to know and we need to get better at. Those who work as permanent staff take a salary for the predictability and surety, and a business is no different.

The statistical way to assess how good we have become at estimating is to use goodness of fit indicators. These are particularly useful in hypothesis testing (again, very applicable to Lean Start-up). The most famous is the r-squared test, most often used for linear regression but usable for normal distributions, along with the chi-squared tests, which can be applied to determine if the distributions are normal. We can go further by using any L-norm we want. For those that have worked with approximation theory, this is fairly standard stuff, though I appreciate it isn't for everyone and is a step further than I will go to here. The crux is that the better our estimates and actuals fit, the better the estimating accuracy and the better the certainty.
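As a rough illustration of the r-squared idea, the sketch below compares a set of observed frequencies with the frequencies a fitted curve would predict. The function name rSquared and the sample numbers are hypothetical and are not taken from the data behind the figures in this post.

```javascript
// Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
// `observed` and `expected` are equal-length arrays (e.g. ticket counts per day).
function rSquared(observed, expected) {
    if (observed.length !== expected.length || observed.length === 0) {
        throw new Error("Both arrays must be non-empty and the same length");
    }
    var mean = observed.reduce(function (sum, y) { return sum + y; }, 0) / observed.length;
    var ssRes = 0; // squared distance between observed and fitted values
    var ssTot = 0; // squared distance between observed values and their mean
    for (var i = 0; i < observed.length; i++) {
        ssRes += Math.pow(observed[i] - expected[i], 2);
        ssTot += Math.pow(observed[i] - mean, 2);
    }
    // If the observed data is constant, the measure is undefined.
    return ssTot === 0 ? NaN : 1 - ssRes / ssTot;
}

// Hypothetical frequencies, for illustration only.
var observedCounts = [1, 4, 9, 4, 1];
var fittedCounts = [1, 5, 8, 5, 1];
console.log(rSquared(observedCounts, fittedCounts));
```

A value of exactly 1 means the fitted values reproduce the observations perfectly; the further below 1, the worse the fit.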

OK, I push the items out, what now?

Cool, so we're back on track. You can choose how you wish to change point values, but what I often do is start from the smallest point results and push these lower outliers to lower point totals, doing this for increasing sized tickets, then starting from the high valued tickets and working backwards, push the upper outliers on to higher valued tickets.

All this gives you a framework to estimate the immediate-future work (and no more) based on what we now collectively know of these past ticket estimates and actuals. So in this data, if we had a 2 point task that took 1-day, the likelihood is actually that it is a 1-point task, given the outlier. So we start to estimate those tasks as one point tasks. The same applies to the 6 and 7-day 2-point tasks as they are most likely 3-point tasks. If you are not sure, then just push it to the next point band, as if it's bigger it will shift out again to the next band along in the next iteration or if it is smaller, as we get better at estimating, it may come back.
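One possible way to mechanise this pushing of outliers is sketched below. It is an assumption of this sketch, not the method used for the figures (which was done by reading the histograms), that a ticket counts as an outlier when its actual days sit more than a chosen number of standard deviations from the mean of its point band. The names SCALE, suggestBands and the sample data are hypothetical.

```javascript
// Point scale assumed for this sketch.
var SCALE = [1, 2, 3, 5, 8, 13];

function mean(values) {
    return values.reduce(function (sum, v) { return sum + v; }, 0) / values.length;
}

function stdDev(values) {
    var m = mean(values);
    return Math.sqrt(mean(values.map(function (v) { return Math.pow(v - m, 2); })));
}

// tickets: array of { points: <estimate>, days: <actual> }.
// threshold: how many standard deviations from the band mean counts as an outlier.
function suggestBands(tickets, threshold) {
    // Group the actual days by estimated point band.
    var daysByBand = {};
    tickets.forEach(function (t) {
        (daysByBand[t.points] = daysByBand[t.points] || []).push(t.days);
    });

    return tickets.map(function (t) {
        var days = daysByBand[t.points];
        var centre = mean(days);
        var limit = threshold * stdDev(days);
        var index = SCALE.indexOf(t.points);
        var suggested = t.points;

        if (index > 0 && t.days < centre - limit) {
            suggested = SCALE[index - 1]; // lower outlier: push down a band
        } else if (index !== -1 && index < SCALE.length - 1 && t.days > centre + limit) {
            suggested = SCALE[index + 1]; // upper outlier: push up a band
        }
        return { points: t.points, days: t.days, suggestedPoints: suggested };
    });
}

// Hypothetical 2-point tickets and the days they actually took.
var sample = [1, 4, 4, 5, 4, 7].map(function (d) { return { points: 2, days: d }; });
var suggestions = suggestBands(sample, 1).map(function (t) { return t.suggestedPoints; });
console.log(suggestions);
```

With this sample, the 1-day ticket is suggested as a 1-point task and the 7-day ticket as a 3-point task, while the tickets near the middle stay where they are.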

Assuming we get a similar distribution of tasks, we can draw up the graphs using the same process and we get graphs looking like:

fig 4 - Second generation estimates, brought about by better estimations decided at retros.

As we can see, things are getting much smoother and closer to the normal we need. However, it is also important to note that the distribution of the old expected from the now actual has shifted and so has the normalised variance and mean of the distributions themselves (i.e. the normal distribution curves in blue have themselves shifted). This is easier to illustrate by looking at the combined normals again. So compare the following to figure 1.

fig 5 - Second generation normally distributed data

So our normals are spacing out. Cool. Ultimately, what we want is to rid ourselves of the overlap as well as get normally distributed data. This is exactly the automatic shift in estimation accuracy we are looking for, which is touted by so many agile practitioners but is never realised in practice. The lack of improvement happens because retrospectives are almost never conducted or driven by quality data. Validated knowledge of our estimates, together with the data to target estimation changes, is the step that takes a team from agile to lean, and it is the bit that all retrospectives I have ever been to when starting at a company miss out. As we can see here, it allows us to adjust our expectation (hypothesis) to match what we now know, which in turn adjusts the delivery certainty.

OK, fluke!...

Nope. Check out generation 3. This also illustrates what to do when you simply run out of samples in particular points.

fig 6 - Iteration 3, all data. Note, 2-points and 5-point values

The interesting thing with this 3rd generation data is that it shows nothing in the 2-point list. Now, for the intuitivists that start shouting "That's so rubbish!! We get lots of 2-point tasks", I must remind you that the feathers and bricks are not important when asking about the weight of a kilogramme of each. Go back here... and think about what it means.

All this means is that you never had any truly relative 2-point tickets before. Your 2-point ticket is just where the three point ticket is, your 3 is your 5, 5 is your 8 and 8 is your 13. It's the evolutionary equivalent of the "rename your smelly method call" clean code jobby.
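The rename can be sketched as a simple relabelling. This is a minimal sketch, assuming the point scale below and that every band above the empty one slides down by one step; the names SCALE and relabel are hypothetical.

```javascript
// Point scale assumed for this sketch.
var SCALE = [1, 2, 3, 5, 8, 13];

// Returns the new label for `points` once `emptyBand` has been vacated:
// bands above the empty one each move down one step on the scale.
function relabel(points, emptyBand) {
    var index = SCALE.indexOf(points);
    var emptyIndex = SCALE.indexOf(emptyBand);
    if (index === -1 || emptyIndex === -1 || index <= emptyIndex) {
        return points; // unknown bands, and bands at or below the empty one, are unchanged
    }
    return SCALE[index - 1];
}

// With nothing left in the 2-point band: 3 -> 2, 5 -> 3, 8 -> 5, 13 -> 8.
var renamed = [1, 3, 5, 8, 13].map(function (p) { return relabel(p, 2); });
console.log(renamed);
```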

Note the state of the 5 point ticket. Given it's a value on its own, but is covered by other story amounts, it's basically a free standing 'outlier' (for want of a better term).

Iteration 4

After the recalibration and rename of the points (I've also pulled in the 13-point values as the new 8-point tickets), we deal with the outlying 5-point deliveries (which are now categorised as three point tickets) by shifting them to the 5-point class in the normal way. This means the data now looks like:

fig 7 - 4th generation estimation. Note empty 3-point categories.

Iteration 6

Skipping a couple of iterations:

fig 8 - 6th generation estimates.

By iteration 6, we're pretty sure we can identify the likely mean positions of 1, 2, 3, 5 and 8-point tickets at 2.72, 5.43, 6.58, 9.30 and 13 days respectively. The estimates are also looking very good. The following table puts it more formally, using the r-squared test to show how closely the distributions now match. 'Before' is after iteration 1, and 'After' is after iteration 6. The closer the number is to 1, the better the fit. As expected, the 1-point tasks didn't improve massively, but the higher pointed tasks shifted into position a lot more and provided greater estimation accuracy.

table 2 - Goodness of fit r-squared measure

So when do we stop?

Technically, never! Lean, ToC and six-sigma all believe in the existence of improvements that can be made (for those familiar with ToC, it changes the position of constraints in a system). Plus, teams change (split, merge or grow) and this can change the quality of the estimations each time, especially with new people who don't know the process. However, if the team and work remain static (ha! A likely story! Agile remember), you can change focus when the difference between the expected and actual estimates reduces past an acceptable threshold. This threshold can be determined by the r-squared test used above, as part of a bigger ANOVA operation. Once it has dropped below a significance threshold, then there is a good chance that the changes you are seeing are due to nothing more than a fluke, as opposed to anything you do deliberately, so you hit the diminishing return a la the Pareto principle.

Conclusion

I've introduced a method of evolving estimates that has taken us from being quite far out in estimation to much closer to where we expect to be. As 'complicated' as some people may find this, we've gotten pretty close to differentiated normals in each case. Indeed now, all tickets are looking pretty good. We can see this in the r-squared tests above. Having completed the variational optimisation, you can then turn your attention to making the variance smaller, so the system as a whole gets closer to the average estimate. If you're still in the corner, it's home time, but don't forget to do your homework.

Future Evolutions: Evolving Better Estimates (aka Guesses)

Ironically, it was only last week I was in conversation with someone about something else, and this next idea occurred to me.

What I normally do is keep track of the estimates per sprint and the variance from those estimations and develop a distribution which more often than not tends to normal. As a result, the standard deviation becomes the square root of the usual mean of the squared residual differences. As time goes on in a Kanban process, the aim is to reduce the variance (and thus standard deviation by proxy) and hence increase the predictability of the system such that Little's law can then take over and you can play to its strengths with a good degree of certainty, especially when identifying how long the effort of a 'point' actually takes to deliver. This has served me pretty well either in story point form or man-hours.
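As a minimal sketch of those two calculations: the first function below takes the root of the mean squared difference between estimates and actuals (treating the estimate as the expected value, which is an assumption of this sketch), and the second applies Little's law in its usual form, average lead time = average work in progress / average throughput. The function names and figures are hypothetical.

```javascript
// Spread of actuals around their estimates: square root of the mean squared residual.
function residualStdDev(estimates, actuals) {
    var squared = estimates.map(function (estimate, i) {
        return Math.pow(actuals[i] - estimate, 2);
    });
    var meanSquare = squared.reduce(function (sum, v) { return sum + v; }, 0) / squared.length;
    return Math.sqrt(meanSquare);
}

// Little's law: average lead time = average work in progress / average throughput.
function littlesLawLeadTime(averageWip, throughputPerDay) {
    return averageWip / throughputPerDay;
}

// Hypothetical figures: three tickets estimated at 2 days each, and a board
// holding 12 items on average while 3 items are finished per day.
console.log(residualStdDev([2, 2, 2], [1, 2, 3]));
console.log(littlesLawLeadTime(12, 3));
```

The smaller the first number becomes over successive sprints, the more the lead time from the second calculation can be trusted as a forecast.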

However, after yesterday's discussion, it set me thinking about a different way to model it and that is using Bayesian Statistics. They are sometimes used in the big data and AI world as a means to evolve better Heuristics and facilitate machine learning. This is for another day though, you've got plenty to digest now :-)

Sunday, 21 July 2013

JavaScript Testing Overload!!

Aren't you a lucky bunch this month? Following on from yesterday's blog post, I continued my foray into mocha by using different assertion libraries today.

I started with Chai, which is an assertion library that follows on from the assert node module by introducing its own assert syntax, but it also introduces the 'expect' and 'should' BDD syntax into the mix. Before delving into that, a little recap of assert may be in order.

Node: Assert

Straightforward assertion library which can be nicely tied in to Jasmine syntax. The basic assertions for the calculator app took the form of:

var assert = require("assert");

var Calculator = require("../Calculator.js").Calculator;
describe("A calculator", function () {
    describe("adding 3 and 4 together", function () {
        it("should return 7", function () {
            var result = new Calculator().AddNumbers(3, 4);
            assert.equal(7, result, "But the number " + result + " was returned instead");
        });
    });

    describe("subtracting 3 from 5", function () {
        it("should return 2", function () {
            var result = new Calculator().SubtractNumbers(5, 3);
            assert.equal(2, result, "But the number " + result + " was returned instead");
        });
    });

    describe("multiplying 7 and 8", function () {
        it("should return 56", function () {
            var result = new Calculator().MultiplyNumbers(7, 8);
            assert.equal(56, result, "But the number " + result + " was returned instead");
        });
    });

    describe("dividing", function () {
        it("8 by 4 should return 2", function () {
            var result = new Calculator().DivideNumbers(8, 4);
            assert.equal(2, result, "But the number " + result + " was returned instead");
        });

        it("7 by 2 should return 3.5", function () {
            var result = new Calculator().DivideNumbers(7, 2);
            assert.equal(3.5, result, "But the number " + result + " was returned instead");
        });

        it("5 by 0 should throw an exception", function () {
            assert.throws(function () { new Calculator().DivideNumbers(5, 0); }, Error, "This did not throw the expected error");
        });
    });
});

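The key is to note that each assertion in these tests roughly takes the form of:

assert.<function>(<expected>,<actual>[,<message>]);

Node's own documentation names the parameters the other way round, as (actual, expected[, message]). For equal this does not change whether a test passes, but it is worth knowing when reading the failure output.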

The main deviations from this format include when using exceptions, failing, checking for existence etc.

When using other libraries, you find that there are some subtle or significant differences.

Chai: Assert

To run the chai examples, you need to install the chai node module, which can be pulled from npmjs.org by issuing the usual:

npm install chai

Being a lazy so and so, I wanted to change as little code as possible. When using the chai assert syntax for the calculator tests, you'll notice that it is somewhat the same. The main difference is catching exceptions which is done with a slightly different prototype.

// assert = require("assert");
// Note that I left the old assert require commented out above to show how it differs.
var assert = require("chai").assert;
var Calculator = require("../Calculator.js").Calculator;
describe("A calculator", function () {
    describe("adding 3 and 4 together", function () {
        it("should return 7", function () {
            var result = new Calculator().AddNumbers(3, 4);
            assert.equal(7, result, "But the number " + result + " was returned instead");
        });
    });

    describe("subtracting 3 from 5", function () {
        it("should return 2", function () {
            var result = new Calculator().SubtractNumbers(5, 3);
            assert.equal(2, result, "But the number " + result + " was returned instead");
        });
    });

    describe("multiplying 7 and 8", function () {
        it("should return 56", function () {
            var result = new Calculator().MultiplyNumbers(7, 8);
            assert.equal(56, result, "But the number " + result + " was returned instead");
        });
    });

    describe("dividing 8 by 4", function () {
        it("should return 2", function () {
            var result = new Calculator().DivideNumbers(8, 4);
            assert.equal(2, result, "But the number " + result + " was returned instead");
        });
    });


    // Note the exception expectation is slightly different in Chai to node assert.
    describe("dividing 5 by 0", function () {
        it("should throw an exception", function () {
            // The throw in the next line in particular operates very differently to Node's assert
            /* It doesn't process the infinity call the same as assert */
            assert.throws(function () { new Calculator().DivideNumbers(5, 0); }, Error, "Attempt to divide by zero!");
        });
    });
});

Not much different, so learning the first method covered off the second. BDD-like syntax is somewhat different.

Chai: Should

If you have not done so already, install the chai module (if you have followed assert above, then you will already have should in your node_modules directory in your local folder).

Being a BDD template, should takes the form of

<ItemUnderTest>.should.<comparator>(<value>)[.<otherchainfunction>]

So for the AddNumbers test, it looks like:

 result.should.equal(7);

Note that both Should and expect use a chainable language to evaluate their tests, which means that the functions can run multiple evaluations in one statement chain. For example, testing for an exception and testing that it is the right string. I have modified this version of the tests to show this in action. I have also called the should() function after requiring it.

var should = require("chai").should();

var Calculator = require("../Calculator.js").Calculator;
describe("A calculator", function () {
    describe("adding 3 and 4 together", function () {
        it("should return 7", function () {
            var result = new Calculator().AddNumbers(3, 4);
            result.should.equal(7);
        });
    });

    describe("subtracting 3 from 5", function () {
        it("should return 2", function () {
            var result = new Calculator().SubtractNumbers(5, 3);
            result.should.equal(2);
        });
    });

    describe("multiplying 7 and 8", function () {
        it("should return 56", function () {
            var result = new Calculator().MultiplyNumbers(7, 8);
            result.should.equal(56);
        });
    });

    describe("dividing 8 by 4", function () {
        it("should return 2", function () {
            var result = new Calculator().DivideNumbers(8, 4);
            result.should.equal(2);
        });
    });


    describe("dividing 5 by 0", function () {
        it("should throw an exception", function () {
            // The throw in the next line in particular operates very differently to Node's assert
            var refFn = function () { new Calculator().DivideNumbers(5, 0); };

            // Note the following chained calls to determine the error class AND the message call. It evaluates them one by one.

            refFn.should.throw(Error).and.throw('Attempt to divide by zero!');
        });
    });
});


Conclusion

This short intro shows some examples of other JS assertion libraries. There are subtle differences between them and what you choose will depend on what you hope to accomplish and how familiar your developers are with the syntax. If I was introducing this into a team lacking BDD skills and no intent to use them, with a lack of familiarity with node, I would introduce the default assert libraries first, since this would require the least time to become productive from where they are.

Should you wish to eventually introduce BDD into the team, then Chai offers a good alternative and has the greatest scope for expansion without having to relearn any new assertion libraries. The team can begin by using the chai assert style and migrate to using should or expect through retrospectives.

If the team are already familiar with BDD syntax and processes and have no problem understanding chaining, then I would introduce chai from the start as the learning curve is not as steep (or to be exact, they are further up it).

In all cases though, if you can't find learning resources for it, then flag this up as a major debt issue, since you will need to repay that to allow the team to scale effectively. For example, by team members keeping internal blogs or wikis with the validated learning that has taken place. Otherwise the team risks stumbling when it finds it needs to scale, introducing wasted time and risk into the flow.

Saturday, 20 July 2013

Script: Never drink too much Java!!

In my trawls across the web, I come across a lot of OSS which always has the promise to deliver lots, but certain things let it down really really badly. This time round, it is the turn of some of the Javascript testing frameworks.

Unfortunately, there are far far too many and they are split across TDD and BDD and a large proportion of them have appalling documentation that is either inaccurate, non-existent or incomplete. So the provisions are awful, no two ways about it.

...Be warned, I may rant a lot in this post.

Javascript unit-testing?

Yes, Javascript testing. Why not? You unit test everything else (or should be) and even if you test the output generated in the HTML in some form of test, you don't automatically test the Javascript and it certainly won't be at the unit level if you're testing it through Selenium. This leaves both a massive hole in your test coverage and leaves the testing of Javascript far too late into the development cycle. So take note QAs, does your team cover their Javascript code?

OK, So What Do We Do?

The lesson I learned the hard way is it depends what you want to do with it and how you run it. If you want the tests to form part of a CI testing process, which integrates with Jenkins, TFS or "Insert your favourite task runner here" then you will likely end up working with mocha, which is a NodeJS application. I am going to be covering this here. If you want to build it into a web-page, to tack on to something like Fitnesse (remember that?) then you'll need something else, as whilst mocha contains that ability, it is rubbish out of the box, despite protests. Also, despite the tagline of "simple, flexible, fun" it is anything but if you are not experienced in Node, do not have an understanding of RequireJS and JavaScript AMDs, and are used to a packaged solution, because Mocha is anything but that. Read:
  • Simple - Can't be bothered writing decent documentation, you work it out
  • Flexible - Needs to be amalgamated with other tools just to do the simplest of things
  • Fun - ...If you like hunting for holy-grails  
The docs are the bit that gets my goat a bit. With my architecture hat on, and a lean one at that, I have to be very aware how fast I can scale teams up or down. The requirement to have new team members get productive quickly is a massive factor, as this can make a huge mess of the velocity/throughput of a team if they have to go digging in a framework or, worse, pull time from others on the team who are working at close to capacity.

On top of that, you have to be aware of how the addition of frameworks affects the CI build process. Adding tech not already in the stack has to be justifiable via a kind of 'cost-benefit'. For example, adding a DI/IoC container is very useful for the development of modular, decoupled software, but the cost or trade-off is it couples your application to the framework and hence your ALM/CI/CD process has to account for it (even if it is only one line) and account for upgrades/modifications to it.

So all-in-all, the introduction of a new framework into a system is not something that I take lightly unless there is an obvious benefit and it significantly outweighs the cost of its introduction. For example, if a dev decides that they wanted to put a DI framework into a project which didn't need the more advanced features of it yet (such as profiles, configurations or something else), I tell them to shove it, since this simple dev level introduction can cause headaches for DevOps and tech services guys, since they may have to open connections in restricted environments, maintain a package environment of their own, or something else, just to save the one dev from writing what can be tantamount to a key-value dictionary. Plus, it may delay the project because of it by introducing a blocker into the flow if the environment takes time to process change requests for example.

In short, mocha's distinct lack of coherent documentation (note, there are docs, but none of it flows well) means that it would not be something I would normally introduce, since it would hit flow really really badly and require new team developers to spend an age getting to grips with it. However, given it's free and downloadable and it is based on Node, I figured I'd give it a go.

What does JS Unit-testing on Mocha Look Like?

Woah horsey! First you have to install node, then install mocha, then set it up with an assertion library and then write the test, as the assertion library will dictate somewhat the structure of your tests and of course, if you pick a BDD like language, that will look different again.

A Shot of mocha

Mocha is based on NodeJS. Indeed, it is deployed as a Node Packaged Module (npm), which is the Node equivalent of NuGet. However, first you have to install NodeJS. You can get Node from here.

Unlike mocha, Node's site is actually very good and the documentation is quite extensive and they make good use of examples.

Once you have installed Node, you can install mocha by opening a command prompt As Administrator and typing:

npm install -g mocha

This globally installs mocha so you can run it from anywhere in your directory tree. Hence once the installation completes, you can literally run mocha by typing mocha from the command line (which doesn't have to be opened in Administrator mode).

first run of an empty mocha 
Nothing special. What this has done is run mocha on a directory, it has found no unit tests (as there is no directory with unit tests in it) and has exited with a green "0 passing" message. It is very important to note that mocha produces this green message even when it has found no tests.

Do I need an assertion library... What is an assertion library?
An assertion library is a file containing assertion modules to perform TDD or BDD style test validation. For the .NET chaps, it is the equivalent of the library behind Assert.AreEqual in MSTest, and similarly for NUnit. If you've ever downloaded Fluent Assertions separately, this is also a form of assertion library, and pulling one in separately is what we have to do with mocha, via Node's 'require' function (Node's built-in CommonJS-style module loader, rather than RequireJS).
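To make the idea concrete, here is a toy assertion helper. It is only a sketch of what such a library does at its core (compare, and throw if the comparison fails); assertEqual is a hypothetical name and this is not how Node's assert or chai are implemented.

```javascript
// A toy assertion: throw an Error when the check fails, do nothing when it passes.
function assertEqual(actual, expected, message) {
    if (actual !== expected) {
        throw new Error(message || ("Expected " + expected + " but got " + actual));
    }
}

assertEqual(3 + 4, 7); // passes silently

var failure = null;
try {
    assertEqual(3 + 5, 7); // fails, so an Error is thrown
} catch (e) {
    failure = e.message;
}
console.log(failure); // prints "Expected 7 but got 8"
```

A test runner such as mocha treats a test as passed when no error escapes from it, which is why the assertion library and the runner can be chosen independently.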

Mocha Directories

One thing that the documentation doesn't make easy to find is that, out of the box, mocha requires that a 'test\' directory exist relative to where you run it from. You can recurse into other directories from the command line by including --recursive in the command-line switches.

So let's try to set up to run some unit tests. The first stage is to create the directory structure. I created the following:

c:\spikes\mocha\             <-- Main subdirectory holding the classes/functions under test.
c:\spikes\mocha\test   <-- Which holds the unit-tests

To test this out, I intend to TDD a calculator class, which will just perform the four main arithmetic operations and

Creating Tests

Assertion library: assert

Within the test sub-directory, create a file called CalculatorTest.js and at the top of that file, put:

var assert = require("assert");
var Calculator = require("../Calculator.js").Calculator;

The first line defines the requirement for assert, which is the vanilla assertion library available from node. The second line specifies a requirement for the calculator in the main subdirectory (i.e. the test directory parent) and after the dot (.) also specifies the actual class we want to use. For those with a .NET background, this is akin to aliasing in C# with:

using Calculator = Calculator.Calculator;

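When you run mocha, Node resolves each require call. assert is one of Node's core modules, so it ships with Node itself and needs no separate download; modules that are neither built in nor referenced by path are looked up in the node_modules directories. Calculator is then located via the relative path, in the parent of the test directory.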

Run mocha and you will see your blank test file has produced the same 0 passing message as before. So be aware that mocha will run whatever it finds in the test directory, but also will return green tests if there are no tests. Not too problematic so far.

So let's build our first Calculator test. By default, using the assert library with mocha means you are going to be using the Jasmine style of assertion. For the calculator, this basically takes the form of:

describe("A calculator", function () {
    describe("adding 3 and 4 together", function () {
        it("should return 7", function () {
            var result = new Calculator().AddNumbers(3, 4);
            assert.equal(7, result);
        });
    });
});

In general, you will put a describe option to describe the test harness and methods/scenarios, which is just a string that appears in the summary view, so can be anything you like. I prefer the usual BDD 'given... when... then...' syntax, but when reading the test in code, it was surprisingly strange to split it across this syntax so that the description on screen and the partial-fluent-like test read the same.

Additionally, the mocha documentation seemed to confuse me a little by prefixing hashes to method names, which initially made me think there was some form of JQuery, but then I came to my senses and figured I'd test it with just a string and a failing test and it worked fine.

The describe syntax just groups the tests into convenient units. In the above, I am effectively blocking up the test into a class-level calculator test, with a method test for adding 3 and 4. You will note that the supplied anonymous function is a callback, which then runs the expectation. Indeed, you can have many descriptions and many 'it' statements within them.
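The callback structure is easier to see with toy versions of the two functions. The sketch below is not mocha's implementation; miniDescribe and miniIt are hypothetical names chosen so they do not clash with mocha's globals, and they only show how a title plus a callback is enough to group and run tests.

```javascript
var results = [];      // one entry per test that has run
var currentPath = [];  // titles of the groups we are currently inside

function miniDescribe(title, callback) {
    currentPath.push(title);
    callback();        // running the body registers nested groups and tests
    currentPath.pop();
}

function miniIt(title, callback) {
    var name = currentPath.concat(title).join(" ");
    try {
        callback();    // a test passes if its callback does not throw
        results.push({ name: name, passed: true });
    } catch (e) {
        results.push({ name: name, passed: false, error: e.message });
    }
}

miniDescribe("A calculator", function () {
    miniDescribe("adding 3 and 4 together", function () {
        miniIt("should return 7", function () {
            if (3 + 4 !== 7) { throw new Error("wrong sum"); }
        });
        miniIt("fails on purpose", function () {
            throw new Error("deliberate failure");
        });
    });
});

console.log(results);
```

Each group can hold as many tests and nested groups as you like, because all the grouping function does is remember its title while its callback runs.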

without a calculator class defined, you get this

So running this test, without a calculator class shows a thrown ReferenceError, with calculator not defined. Fine, that's OK, we can learn from that. Despite the dynamic nature of JavaScript, it conceptually makes sense anyway, as you'd expect the C# compiler to throw an error if there was no such class defined.

So defining a Calculator class in a file in the main subdirectory, which we can do using our usual standard OO JavaScript knowledge, we get:

function Calculator() {
}
 
Calculator.prototype.AddNumbers = function (p1, p2) {
    return 0;
}


This basically creates our class, with a public function AddNumbers, defined to return a zero instead of the sum of two numbers, so that the test fails. Remember, Red-Green-Refactor.

Our directory structure should now look something like:

c:\spikes\mocha\Calculator.js
c:\spikes\mocha\test\CalculatorTest.js

Cool. So running mocha shows us:

mocha run with calculator (without module defined)
At this point I was totally scratching my head. Under normal circumstances, this should have worked without issue. What else did I need to do?

After a bit of trawling, I managed to figure that under Node's require-based module system (CommonJS-style modules, rather than RequireJS proper), I needed to export the module. That initially made me a little uncomfortable, but actually, given the modular nature of the code, and thinking about how you'd want to use this, I figured that I wouldn't really have much of an issue with the idea of exporting a module, since this is a better loading system than you'd otherwise have (for example, loading on the command line or worse, static loading in a web page).

So I added the following line to the bottom of the Calculator.js file:

module.exports.Calculator = Calculator;
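To see why that one line matters, the sketch below writes two throwaway modules to a temporary directory and requires both. The file names and the temp directory are illustrative only; it assumes a Node version that provides fs.mkdtempSync.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Create a scratch directory and two tiny modules inside it.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), "exports-demo-"));
var source = "function Calculator() {}\n";

var withoutExport = path.join(dir, "NoExport.js");
var withExport = path.join(dir, "WithExport.js");
fs.writeFileSync(withoutExport, source);
fs.writeFileSync(withExport, source + "module.exports.Calculator = Calculator;\n");

// Without an export, require returns an empty object, so .Calculator is undefined.
var missing = require(withoutExport).Calculator;
// With the export line, the constructor is visible to the caller.
var Exported = require(withExport).Calculator;

console.log(typeof missing);   // prints "undefined"
console.log(typeof Exported);  // prints "function"

// Tidy up the scratch files.
fs.unlinkSync(withoutExport);
fs.unlinkSync(withExport);
fs.rmdirSync(dir);
```

Calling new on the undefined value from the first module is what produces the confusing failure before the export is added.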

Run mocha again and you get the first legitimate failure.

Woohoo! First test failure!! :)
The error message is somewhat rubbish, so I figured I'd change it in the CalculatorTest.js file to include a message at the end of the assertion:

describe("A calculator", function () {
    describe("adding 3 and 4 together", function () {
        it("should return 7", function () {
            var result = new Calculator().AddNumbers(3, 4);
            assert.equal(7, result, "But the number " + result + " was returned instead");
        });
    });
});


Which then output the following message (highlighted region shows the new message):

Meaningful error message
So let's make the test pass. Change the AddNumbers function in Calculator.js to return the sum of both parameters and run the test. You should get:

First green test pass :)
Note the number of full stops output. This shows the number of tests run and of course, the green text shows the number of tests passed.

The same process is used for the rest of the TDD. You can find references to the assertion options for your chosen library at their website, but it might be rubbish, so be warned.

Fast forwarding through those to an exception scenario: the divide by zero error. I have added this at the bottom of the CalculatorTest.js file:

//-- CalculatorTest.js
var assert = require("assert");
var Calculator = require("../Calculator.js").Calculator;
describe("A calculator", function () {
    describe("adding 3 and 4 together", function () {
        it("should return 7", function () {
            var result = new Calculator().AddNumbers(3, 4);
            assert.equal(7, result, "But the number " + result + " was returned instead");
        });
    });

    describe("subtracting 3 from 5", function () {
        it("should return 2", function () {
            var result = new Calculator().SubtractNumbers(5, 3);
            assert.equal(2, result, "But the number " + result + " was returned instead");
        });
    });

    describe("multiplying 7 and 8", function () {
        it("should return 56", function () {
            var result = new Calculator().MultiplyNumbers(7, 8);
            assert.equal(56, result, "But the number " + result + " was returned instead");
        });
    });

    describe("dividing 8 by 4", function () {
        it("should return 2", function () {
            var result = new Calculator().DivideNumbers(8, 4);
            assert.equal(2, result, "But the number " + result + " was returned instead");
        });
    });


    // Catching exceptions
    describe("dividing 5 by 0", function () {
        it("should throw an exception", function () {
            // The throw in the next line is caught by the assert and hence registers as a pass.
            assert.throws(function () { new Calculator().DivideNumbers(5, 0); }, Error, "This did not throw the expected error");
        });
    });
});

Javascript throws exceptions just like every other language. The selection of assertion library will determine what syntax is used to capture those exceptions. In the case of assert, it simply uses assert's "throws" method, which basically runs the supplied function inside a try...catch. However, note one very important feature, which left me scratching my head for 5 minutes until this StackOverflow post:

http://stackoverflow.com/questions/6645559/is-nodes-assert-throws-completely-broken

assert.throws doesn't take parameters for the function under test (FUT); it simply calls whatever function you hand it. So you have to wrap the call in a function to test it.

To satisfy the 'A calculator' description, I deliberately throw an exception inside DivideNumbers() when the second parameter is zero.

//-- Calculator.js
function Calculator() {
}

Calculator.prototype.AddNumbers = function (p1, p2) {
    return p1 + p2;
};

Calculator.prototype.SubtractNumbers = function (p1, p2) {
    return p1 - p2;
};

Calculator.prototype.MultiplyNumbers = function (p1, p2) {
    return p1 * p2;
};

Calculator.prototype.DivideNumbers = function (p1, p2) {
    // Throw an Error object (not a bare string) so that the test's
    // assert.throws(..., Error, ...) check can match it.
    if (p2 === 0)
        throw new Error("Attempt to divide by zero!");

    return p1 / p2;
};

module.exports.Calculator = Calculator;

Additionally, when I looked at the test code and back at Jasmine, I saw the possibility of using multiple 'it' statements, which makes sense when describing multiple tests (such as positive and negative tests). The DivideNumbers() method call is ideal for this, as it has a number of scenarios which you need to test. So I refactored the test code into:

//-- CalculatorTest.js
var assert = require("assert");
var Calculator = require("../Calculator.js").Calculator;

describe("A calculator", function () {
    describe("adding 3 and 4 together", function () {
        it("should return 7", function () {
            var result = new Calculator().AddNumbers(3, 4);
            assert.equal(7, result, "But the number " + result + " was returned instead");
        });
    });

    describe("subtracting 3 from 5", function () {
        it("should return 2", function () {
            var result = new Calculator().SubtractNumbers(5, 3);
            assert.equal(2, result, "But the number " + result + " was returned instead");
        });
    });

    describe("multiplying 7 and 8", function () {
        it("should return 56", function () {
            var result = new Calculator().MultiplyNumbers(7, 8);
            assert.equal(56, result, "But the number " + result + " was returned instead");
        });
    });

    // One describe, several scenarios: positive and negative tests side by side.
    describe("dividing", function () {
        it("8 by 4 should return 2", function () {
            var result = new Calculator().DivideNumbers(8, 4);
            assert.equal(2, result, "But the number " + result + " was returned instead");
        });

        it("7 by 2 should return 3.5", function () {
            var result = new Calculator().DivideNumbers(7, 2);
            assert.equal(3.5, result, "But the number " + result + " was returned instead");
        });

        it("5 by 0 should throw an exception", function () {
            assert.throws(function () { new Calculator().DivideNumbers(5, 0); }, Error, "This did not throw the expected error");
        });
    });
});

...plus I added a real number test as well, running the tests against the unmodified code all the time. Remember, when everything is green and you refactor only your tests, you are effectively using your code as the mould your tests fit into. If you have to, red out the tests one at a time by changing only the test code, refactor the test code, and make it green again by fixing the test code ONLY, in an exploratory testing kind of way. All six tests now pass:

6 passing, refactored tests


Conclusion

JavaScript unit testing has the potential to be great and it is much needed. However, the state of the documentation varies greatly between frameworks, and you really need to pick one where you can get up to speed quickly. In this case, I had to bear in mind that this is a combination of assert, require (and AMDs), mocha, and node.

This needs further coverage, so I will look at the combination of mocha with expectJS and chai (in TDD and BDD mode - note the documentation looks nicer for chai, but it is still pretty thin on the ground when it comes to examples). However, just getting this far was a day's work. This is a substantial learning curve, which means that bringing new staff up to speed, however smart they are, will take longer than necessary.

The main thing for me from this experience is that if the frameworks the OSS community wish to supply to the commercial world have poor documentation or learning resources, the introduction of such frameworks into agile teams will greatly slow those teams down. That means teams can't scale upwards as fast as they need to in order to keep up the velocity/throughput, especially if there is no slack in the development process. The lack of slack means there is no time to learn, and that increases the risks around adopting new frameworks and onboarding new team members. This applies across developers, analysts, QAs, DevOps and tech services. The introduction of frameworks is something that needs thinking about as a fairly typical trade-off analysis.