Monday 28 September 2015

Genetic Algorithms and BarCamps :)

It's Monday. I didn't have a Saturday or Sunday, since I was enjoying myself at BarCamp Manchester 2015. If you're not familiar with what a BarCamp is (which I wasn't really until this weekend), it's a form of 'unconference' with no scheduled speakers. The attendees are the speakers :) The rules of BarCamp Manchester included that anyone who hadn't spoken at one before MUST speak.

So, in a panic, I hastily put together one presentation on the Saturday to get it out of the way. I also didn't know what I should be pitching; despite the fact I could have picked any subject, I have an interest in so many things that it makes it really hard to choose sometimes. So I figured I might try two at a push. The first, on Saturday, was entitled... well, "Sh*t that moves", which I learnt led people to think it was about robotics (one of those "Oh yeah" moments - *bomb*). The other, on Sunday, I decided to make much more specific and less click-bait-like. I focused on something I hadn't really picked up in 15 years, and that topic was Genetic Algorithms.

After I put my card up on the wall that morning, I tore through a quick piece of JavaScript code to illustrate a really simple version of the concepts, and visualised it in D3.js, which gave me an excuse to try that library out.

By comparison, this talk seemed to go down better *phew* and I'd like to thank those who provided that feedback. Much appreciated. Many folk also asked if I have a blog (yes) and asked me to post my code on JSBin/JSFiddle, so I did that yesterday and am now following up with this post to explain the concepts and go through the code in a little more detail.

Guided by Nature

Genetic Algorithms are considered by many to be a branch of artificial intelligence. Personally, I don't think that's entirely correct, as they sit more in the realms of distributed multi-agent systems and the software side of BIMPA. The intelligence in such systems lies in the ability of the agents to solve a problem, or search a space, as a population. A rough comparison is 'wisdom of the crowds', but operating through genetic or memetic transfer of information. These concepts can also be applied to programming in such a way that programs evolved from scratch solve problems more effectively than humans have done by coding traditional algorithms. As is often the case, the complex behaviour of the population is non-linear, even though an individual member's rules of reproduction are inherently simple.

Characteristics 

As hinted, a genetic algorithm, just like Monte Carlo approaches, uses multiple agents or samples to determine solutions to the problem you're trying to solve. Each agent or sample contains an encoding of the information you wish to find, akin to a genetic sequence in biological systems. Agents very often start from a random sequence. Then, through one or more cross-overs of a subsequence of that genetic code with a fit member of the cohort, plus the potential mutation of one or more characters of the encoding string, each pairing produces children (usually two per cohort-pair) who ideally share the best characteristics of each parent, perhaps with a 'genetic mutation' which either takes hold in the population or dies out over time.


A single pass through a genetic reproduction operation between two fit parents

Often, when the entire cohort completes its reproduction, that is considered a generation. Members of the preceding generation may remain or die out depending on the approach to the problem you're trying to solve.
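The cycle above can be sketched in JavaScript. This is not the code from the talk; the `fitness`, `crossover` and `mutate` functions are placeholders for the ones discussed later, and the "fittest third breeds, the rest dies out" scheme is just one of the approaches mentioned:

```javascript
// Sketch of one generation, assuming a population of string "genecodes".
// fitness(), crossover() and mutate() are stand-ins for the functions
// described later in the post.
function generation(population, fitness, crossover, mutate) {
  // Rank the cohort: fittest (smallest distance) first.
  var ranked = population.slice().sort(function (a, b) {
    return fitness(a) - fitness(b);
  });
  // Breed only the fittest third; each pairing yields two children.
  var parents = ranked.slice(0, Math.ceil(ranked.length / 3));
  var children = [];
  while (children.length < population.length) {
    var mum = parents[Math.floor(Math.random() * parents.length)];
    var dad = parents[Math.floor(Math.random() * parents.length)];
    var pair = crossover(mum, dad); // two children per cohort-pair
    children.push(mutate(pair[0]), mutate(pair[1]));
  }
  // Here the old generation dies out entirely; other schemes keep survivors.
  return children.slice(0, population.length);
}
```

Swapping the last line for something that merges `ranked` and `children` gives you the variant where members of the preceding generation stick around.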

Measuring Fitness

In order to determine how close you are to something, you have to know two things: firstly, what or where it is you want to get to; secondly, where you are now. If you don't know either of those two things, you can't measure how fit a solution is for purpose. Any measure of displacement is conducted through a distance metric, usually comparing what I call the 'utility' of something with the point you're trying to reach. In genetic algorithms this is very often used to determine how fit a solution is for the purpose it was evolved for.

Code Walk Through:

Opening up the trivial JSBin example here, you'll see the scrap code I put together (I keep reiterating, to my shame, that it's not productionisable - especially since I break Single Responsibility in it. Oops!). All it aims to do is evolve towards an ideal string, in this case provided by the user in the text box labelled "Fitness string:". I slowed it down to make the operation easier to see via a visualisation of all the agents below it. The visualisation is shown in the rightmost JSBin frame, where a population of 100 agents is shown in the grid of circles.

In order to understand what the circles mean, we need to look at the distance metric code contained in the 'fitness' function:
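The original snippet lives in the JSBin; a minimal sketch of what such a fitness function looks like follows. The names `targetFitness` and `genecode` come from the post itself, but the exact signature and body here are an assumption:

```javascript
// Sketch of the fitness function: the Hamming distance between the ideal
// word (targetFitness) and an agent's genecode.
function fitness(genecode, targetFitness) {
  var distance = 0;
  for (var i = 0; i < targetFitness.length; i++) {
    // Count every position at which the two strings differ.
    if (genecode.charAt(i) !== targetFitness.charAt(i)) {
      distance++;
    }
  }
  return distance; // 0 means a perfect match
}
```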




Here the fitness of an agent is decided using a Hamming distance between the ideal (targetFitness) word and the agent's genecode. A Hamming distance is just the number of positions at which two strings differ. For example, John and Jean have a Hamming distance of 2, Jen and Jon have a Hamming distance of 1, and Jim and Jim have a Hamming distance of zero.



Since we're trying to find the closest match, we define the shortest Hamming distance as the fittest organism. However, this could equally be the longest, biggest, fastest, least number of steps, richest etc. etc., depending on the problem you're looking to solve.

If you are familiar with TDD/BDD, this may seem strangely familiar. The goal is your acceptance test criteria, the initial condition is where the agents start, and you don't care about the actions - how it gets from pre- to post-conditions :)  Kind of how you'd hope managers/architects leave devs to fill in the gaps.

Breeding

In this code, all I am doing is taking the top 33% when sorted by fitness and breeding them. There is no reason they can't breed with other parts of the population, but this may take longer to converge on the solution.


This happens in two steps. The first is to sort the population by fitness (in order of increasing distance) and then breed each member with any other in that space. This then runs into the crossover function, which takes the genecodes of each pair of members, crosses them over and mutates them (for illustration, I took out the random occurrence of mutations and maintained the random mutation point). You can see this as part of the Baby() class, which represents each cohort member:
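A sketch of what that crossover and mutation pair might look like, following the description above (a single crossover point, and a mutation that always fires but at a random position). The real Baby() class isn't reproduced here, and `ALPHABET` is an assumed helper:

```javascript
// Assumed set of characters a mutation can pick from.
var ALPHABET = "abcdefghijklmnopqrstuvwxyz ";

function crossover(mumCode, dadCode) {
  // Pick a random crossover point and swap the tails of the two genecodes.
  var point = Math.floor(Math.random() * mumCode.length);
  var childOne = mumCode.slice(0, point) + dadCode.slice(point);
  var childTwo = dadCode.slice(0, point) + mumCode.slice(point);
  // Two children per cohort-pair, each mutated on the way out.
  return [mutate(childOne), mutate(childTwo)];
}

function mutate(genecode) {
  // As in the demo, the mutation always happens but its position is random.
  var point = Math.floor(Math.random() * genecode.length);
  var letter = ALPHABET.charAt(Math.floor(Math.random() * ALPHABET.length));
  return genecode.slice(0, point) + letter + genecode.slice(point + 1);
}
```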


When the button is clicked, the population breeds. I count the number of generations for convenience.

So the circles?

Yes, the circles. The darker the circle, the shorter the distance from the goal. A fully black circle (hex RGB #000000) hits it straight on (I circle the winning members in green when they achieve it). The lighter the circle, the further it is from the fitness goal. Note how the cohort responds to longer strings of characters, how long it takes to get to a solution etc. Also, run exactly the same test twice and see how the different start configurations change the number of generations required to reach the goal. Are there any you've found which don't ever seem to get closer?
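The shading itself is just a mapping from distance to a grey level. The original D3 code isn't reproduced here, so this particular mapping is an assumption, but it gives the same effect (black at distance zero, fading to white):

```javascript
// Map a fitness distance onto a grey hex colour: darker = fitter.
function shade(distance, maxDistance) {
  var level = Math.round(255 * Math.min(distance / maxDistance, 1));
  var hex = ("0" + level.toString(16)).slice(-2); // zero-pad to two digits
  return "#" + hex + hex + hex; // "#000000" when distance is zero
}
```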

...And the sumDiff comment?

Yes, that was deliberate. It's an extra function I put in which simply counts the alphabetic distance between the characters in the two strings and adds them up. This then becomes the distance metric.
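A sketch of that alternative metric, again a reconstruction rather than the exact commented-out code: instead of a match/mismatch count, it sums the absolute character-code differences at each position.

```javascript
// sumDiff: sum of absolute alphabetic distances between the characters of
// the agent's genecode and the target string.
function sumDiff(genecode, targetFitness) {
  var total = 0;
  for (var i = 0; i < targetFitness.length; i++) {
    total += Math.abs(genecode.charCodeAt(i) - targetFitness.charCodeAt(i));
  }
  return total;
}
```

The nice property of this metric is that it's graded: "b" is closer to "a" than "z" is, whereas a Hamming distance treats both as equally wrong.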

Evolutionary Strategies

There are, crucially, many ways (and combinations of ways) to change how the population evolves.


  • Change the distance function/metric
  • Change the number of fit members (I chose the top 33%, but this can vary)
  • Increase the number of cross-over points
  • Increase the number in the cohort
  • Apply an attrition/mortality rate
  • Unconstrain population growth
  • Run large populations in parallel
  • Randomise whether a mutation occurs or not
  • ....
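As one example, the last of those strategies is a one-line change to the demo's always-mutate behaviour. `mutationRate` is an assumed parameter name, not one from the original code:

```javascript
// Only apply the supplied mutate() function with probability mutationRate,
// instead of mutating every child unconditionally.
function maybeMutate(genecode, mutationRate, mutate) {
  return Math.random() < mutationRate ? mutate(genecode) : genecode;
}
```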

Play around with any or all of these. It would be interesting to see what you come up with. If you have any questions, drop them in the comments and if pertinent, I'll update this post to answer them.

Summary

Genetic algorithms are not new. They've been around for over 20 years, and I was heavily into them at the turn of the millennium. I was spurred to give this talk by Rosie Campbell's talk on the "Unexpected Consequences of Teaching Computers to See" (which was hilarious, I have to say! So I nicked the concepts on one slide to illustrate problems with mutations - she also played a video from 20 years ago showing evolving virtual block creatures). There are places where Genetic Algorithms work well, such as evolving solutions or programs. However, there are some major drawbacks, including the fact they need to be 'trained' (whether supervised or unsupervised) and hence have very little productive usefulness for a long time. As I mentioned in the talk, sometimes 'champion-challenger' algorithms perform better, since that approach can be used with production data 'out of the box' and the fittest algorithm evolves as it goes.

Overall, BarCamp Manchester was an awesome experience! I had a whale of a time! Really, really enjoyed it. Well worth going to one if you're looking to improve your public speaking - given not everyone's talk has to be polished, it's a friendly place to start.
