Monday 28 September 2015

Genetic Algorithms and BarCamps :)

It's Monday. I didn't have a Saturday or Sunday, since I was enjoying myself at BarCamp Manchester 2015. If you're not familiar with what a BarCamp is (which I wasn't really until his weekend), it's a form of 'unconference' with no scheduled speakers. The attendees are the speakers :) The rules of Manchester BarCamp included that anyone who hasn't spoken before MUST speak.

So in a panic, I hastily put together one presentation on the Saturday to get out of the way. I also didn't know what I should be pitching, despite the fact I could have picked any subject. I have a lot of interest in so many things it makes it really hard to choose sometimes. So figured I might try two at a push. The first on Saturday was entitled... well, "Sh*t that moves" which I learnt led to people thinking it was about robotics (one of those "Oh yeah moments" - *bomb*) the other on Sunday I decided to make much more specific and less click-bait like. I decided to focus on something I hadn't really picked up in 15 years and that topic was Genetic Algorithms.

I tore through a quick piece of JavaScript code after I put my card up on the wall that morning which I used to illustrate a really simple version of the concepts. I wrote it in JavaScript and visualised it in D3 JS, which this gave me an excuse to try out.

By comparison, this talk seemed to go down better *phew* and I'd like to give my thanks to those who have provided that feedback. Much appreciated. Many folk also asked if I have a blog (yes) and also asked me to post my code on JSBin/JSFiddle, so I did yesterday and am now following this up with this post to explain the concepts and go through the code in a little more detail.

Guided by Nature

Genetic Algorithms are considered by many a branch of artificial intelligence. Personally, I don't think that's entirely correct, as it's more in the realms of distributed multi-agent systems and the software side of BIMPA. The intelligence in such systems is contained in the ability of the agents to solve a problem or to search a space as a population. A rough comparison is it's like 'wisdom of the crowds' but operating through genetic or memetic transfer of information. These concepts can also be applied to programming in such a way that programs built from scratch solve problems more effectively than humans have done coding traditional algorithms. As is often the case, the complex behaviour of the population is non-linear even though an individual member's rules of reproduction are inherently simple.


As hinted, a genetic algorithm, just like monte-carlo approaches, uses multiple agents or samples to determine solutions to the problem you're trying to solve. Each agent or sample contains an encoding of the information you wish to find, akin to a genetic sequence in biological systems. They start very often from a random sequence and through a process of one or more cross-over of a subsequence of that genetic code with a fit member of its cohort and potential mutation of one or more characters of the encoding string, it produces children (usually two children per cohort-pair) who share ideally, the best characteristics of each parent, perhaps with a 'genetic mutation' which takes in the population, or dies out over time.

a single pass through a genetic reproduction operation between two fit parents

Often, when the entire cohort completes its reproduction, that is considered a generation. Members of the preceding generation may remain or die out depending on the approach to the problem you're trying to solve.

Measuring Fitness

In order to determine how close you are to something, you have to know two things. Firstly, what or where it is you want to get to. Secondly, where you are now. If you don't have either of those two things, you can't measure how fit a solution is for purpose. Any measure of displacement is conducted through a distance metric, usually comparison what I call the 'utility' of something with the point you're trying to achieve, which in genetic algorithms is very often used to determine how fit a solution is for the purpose it was evolved for.

Code Walk Through:

Opening up the trivial JSBin example here, you'll see the scrap code I put together (I keep reiteratiing the shame that it's not productionisable - especially since I break Single Responsibility in it. Oops!). All it aims to do is evolve towards an ideal string, in this case provided by a user in the text box labelled "Fitness string:". I slowed it down to make it easier to see the operation via a visualisation of all agents below it. The visualisation is shown on the rightmost JSBin frame. A population of 100 agents is shown in the grid of circles.

In order to understand what the circles mean, we need to look at the distance metric code contained in the 'fitness' function:

Here the fitness of an agent is decided using a hamming distance between the ideal (targetFitness) word and the agent's genecode. A Hamming distance is just the number of positions where two strings differ. For example, John and Jean have a hamming distance of 2, Jen and Jon have a hamming distance of 1. Jim and Jim have a Hamming distance of zero.

Since we're trying to find the closest match, we are looking to define the shortest hamming distance as the fittest organism. However, this can equally apply to longest, biggest, fastest, least number of steps, richest etc. etc. depending on the problem you're looking to solve.

If you are familiar with TDD/BDD, this may seem strangely familiar. The goal is your acceptance test criteria, the initial condition is where the agents start and the actions, how it gets from pre to post-conditions you don't care about :)  Kind of how you'd hope managers/architects leave devs to fill in the gaps.


In this code, all I am doing is taking the top 33% when sorted by fitness and breeding them.There is no reason they can't breed across other parts of the population, but this may take longer to converge on the solution.

This happens in two steps. The first is to sort the population by fitness (in order of increasing distance) and and then breed them with any other in that space. This then runs into the crossover function, which takes the genecodes of each  members, pairs them and mutates them (for illustration, I took out the random occurrence of mutations and maintained the random mutation point). You can see this as part of the Baby() class, which represents each cohort member:

When the button is clicked, the population breeds. I count the number of generations for convenience.

So the circles?

Yes, the circles. The darker the circle, the shorter the distance from the goal. A fully black circle, (hex rgb = #000000), hits it straight on (I circle the winning members in green when they achieve it). The lighter the circle, the further away it is from the fitness goal. Note how the cohort responds to longer strings of characters, how long it takes to get to a solution etc. also run exactly the same test twice and see how the different start configurations change the number of generations required to reach the goal. Are there any you've found which don't ever seem to get closer?

...And the sumDiff comment?

Yes, that was deliberate. This was an extra function I put in which simply counts the alphabetic distance between the characters in the strings and adds them. This then becomes the distance metric.

Evolutionary Strategies

There are crucially many ways (and combinations of such ways) you can use to change how the population evolves.

  • Change the distance function/metric
  • Number of fit members (I chose 33% but this can vary)
  • Increase number of cross-over points
  • Increase the number in the cohort
  • Apply attrition/mortality rate
  • Unconstrain population growth
  • Run large populations in parallel
  • Randomise weather a mutation occurs or not 
  • ....

Play around with any or all of these. It would be interesting to see what you come up with. If you have any questions, drop them in the comments and if pertinent, I'll update this post to answer them.


Genetic algorithms are not new. They've been around for over 20 years and was heavily into them at the turn of the millennium. I was spurred to give this talk after Rosie Campbell's talk on the "Unexpected Consequences of Teaching Computers to See" (which was hilarious! I have to say! So I nicked the concepts on one slide to illustrate problems with mutations - but she played a vid from 20 years ago which showed evolving virtual lock animals). There are places where Genetic Algorithms work well. Evolving solutions or programs. However, there are some major drawbacks, including the fact it needs to be 'trained' (either supervised or unsupervised) and hence, have very little productive usefulness for a long time. As I mentioned in the talk, sometimes 'champion-challenger' algorithms perform better since the approach can be used with production data 'out of the box' and the fittest algorithm evolves as it goes.

Overall, BarCamp Manchester was an awesome experience! I had a whale of a time! Really really enjoyed it. Well worth going to one if you're looking to improve your public speaking and given not everyone's talk

Thursday 10 September 2015

Lean Enterprise

I attended the Lean Enterprise session last night at ThoughtWorks Manchester. Speaking were Bary O'Reilly and Joanne Molesky, who coauthored the upcoming Lean Enterprise book with Jez Humble.

I happen to like Barry O'Reilly's work. As a lean practitioner, I don't think I've ever disagreed with anything he's said (at least, not to any significant degree - believe me, I try :). Whilst I came into the venue and fumbled my way to a seating position with folded pizza slices in hand, they had just started speaking (thank you Manchester City Centre for having so many roadworks going on at the same time that I had to U-turn on myself 3 times to get in).

I am always interested in looking at how companies close the feedback loop. i.e. how they learn from the work they've done. Not just learn technologically, but also about their process, about themselves, their culture and how they work. I'm a great advocate of data driving retrospectives. Hence, I always find myself wanting CFDs, bug and blocker numbers and a generally greater understanding of how we're developing in a particular direction.

With this in mind, I asked a question about the hypothesis driven stories (which are a really great idea that Barry has shared with the community before). The format of the story is akin to:

" We believe that <doing this action>
  Will result in <this thing happening>
  Which benefits us <By this amount>"

What I asked was around how he gets people to come up with that measurable number. There's always a nervousness in the community when I ask about this sort of thing. I don't mean to cause it, it just happens :)

Why asked it?

When working in build-measure-learn environments, including those in lean environments, the learning process aims to become more scientific about change. If the result is positive, that's fine, since every organisation wishes positive change. However, if it's negative, that's also fine, since you've learned your context doesn't suit that idea. Spending a little money to learn a negative result is just as valuable, since you didn't spend a fortune on it. The only real waste when learning is spending money on inconclusive results. Hence, if you design an experiment which is likely to yield and inconclusive result, then you are designing to spend money generating waste.

What's inconclusive?

For those who use TDD, you might be familiar with this term. If you run unit tests, you might see the odd yellow dot when the test doesn't have an assertion (async JS programmers who use mocha may see it go green, oddly). This is a useful analogy, but not wholly correct. It isn't just that you're not measuring anything, which is bad enough, since most companies don't measure enough of the right stuff (hence, most of the results of expenditure are inconclusive in that regard), it's also concluding an improvement or failure under the necessary significance threshold.

Say what? Significance Threshold?!

The significance threshold is the point at which the probability of false results, the false positive or false negative is negligibly small and you can accept your hypothesis as proven for that scenario. Statisticians in frequentist environments, those which work off discrete samples (these map to tickets on a board tickets), are very familiar with this toolkit, but the vast majority of folk in IT and indeed, businesses aren't, sadly. This causes some concern, since money is often spent and worse, acted on (spending more money), when the results are inconclusive, there is not just no uplift. Sometimes it crashes royally!

Here's an example I've used previously. Imaging if you have 10 coins and flip them all. Each fli i a data point. What is the probability of heads or tails? Each one is 50%, but the probability of getting a certain number of heads is normally distributed. This may perhaps be counter-intuitive to folk:

So you can be quite comfortable that you're going to get 5 heads in any ten flips with any ten fair coins. However, if you look at zero heads or all heads after all the flips, the outliers, these are not very likely. Indeed, if you get your first head, the probability of getting zero heads in 10 after the remaining 9 have been flipped as well is obviously zero (since you already have one).

Now let's suppose we run the experiment again with the same number of coins. An A/A-test if you like. Let's suppose we get 4 heads. Is that significantly different? Not really, no. Indeed, many good researchers would consider a significant difference to fall at either 0 or 10 in the above before they call a change significant. Indeed, an unfair coin, one which has only a head or a tail on both sides will give you exactly that outlier (all tails or all heads). Anything before this is regarded as an insignificant change. Something that you already have knowledge for or can be explained by the existing context, not the new one the team delivers, or 'noise'.

Why is this important in lean enterprises?

In business, you spend money to get value. It's as simple as that. The biggest bang for your buck if you will. Positive and negative results, those that yield information, are worth paying for. Your team will be paid for 2 weeks to deliver knowledge. If there are 5 people in the team, each paid for two weeks at £52,000 a year each (gross, including PAYE, employers NIC, pension, holidays, benefits etc.) that is £10,000.

If the team comes out with knowledge that improves the business value by 3% and the required significance level is a 7% uplift, this value addition is insignificant. Rolling this out across the whole enterprise will cost you significant amounts of money, for a result which would likely happen anyway if you left the enterprise alone. At the end, you'll be down. Lots of consultancies which have delivered positive results have actually seen this sadly. However, as Joanne rightly said in the meetup, it's often just as easy to do the opposite, and miss opportunities because you didn't understand the data. The false negative.

Teams have to be aware of that level of significance. That depends very much on sample size. You need to get a big enough sample for the 'thing' you're trying to improve. Significance levels also generally depend on that the degrees of freedom (how many possible categories each sample can fall into - heads or tails) and the probability of false positives and negatives.

If you have a pot of £1 million, each costing £10,000 you can run 100 experiments. You need them to be conclusive. So select your hypothetical number for acceptable value, the threshold beyond which a credible change can be deemed to have occurred, before you spend the money running the experiment.

Otherwise you won't likely just lose the money to gain zero knowledge (pressure to declare a result conclusive which isn't, is just another form of the sunk cost fallacy), you may end up spending more money on a result that isn't credible and it will most likely bomb (check out Bayesian stats for why), or also miss opportunities for growth adding value or something else. As a result, I'd argue that you need to know the hypothetical sample size requirement (there are tools out there to do that), but also remember to stop when you reach that sample size, not before (since you'll destroy the credibility of the experiment) and not too long after (since you're getting no significant increase in extra knowledge, but you are still spending money).

Keep it lean, keep it balanced! :)