Thursday, 11 December 2014

What's wrong with a Little predictability?

I was asked recently about Little's law. For the uninitiated, it is a fundamental yet elegant result in queuing theory. It's akin to the simplicity of Einstein's 'E' equals mc squared, as it reduces a whole heap of complexity into a few simple variables. It is now finally being applied to software Kanban, despite having existed long before the field of software engineering did.

In software, it's pretty simple: it relates the average number of cards in play (between the backlog and done) to the average cycle time and the arrival rate (cards in play = arrival rate × cycle time). If your arrival rate is the same as your service rate, which in Scrum you would expect it to be if you're delivering all your cards in that Sprint's time period, you end up with a pretty good link.
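As a quick illustration, here is a minimal Python sketch of that relationship, assuming the usual statement of Little's law (average cards in play = arrival rate × average cycle time). The function names and the numbers are hypothetical and only there to show the arithmetic.

```python
def average_wip(arrival_rate, avg_cycle_time):
    """Little's law: average cards in play = arrival rate x average cycle time."""
    return arrival_rate * avg_cycle_time


def average_cycle_time(avg_wip, throughput):
    """Rearranged form; only meaningful when arrival rate equals service rate."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / throughput


# Hypothetical team: 5 cards arrive per week and each takes 2 weeks on average.
print(average_wip(arrival_rate=5, avg_cycle_time=2))

# Hypothetical team: 12 cards in play on average, 4 cards finished per week.
print(average_cycle_time(avg_wip=12, throughput=4))
```

Both functions work on long-run averages, which is why the stability discussed in the rest of this post matters so much.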

So what's the problem?


The issue is (again) that people miss a crucial detail. It's how KISS differs from Occam's razor and how folk abuse the agile manifesto. Remember the items on the right? Now do you remember the last statement that references them? ("Whilst there is value in the items on the right, we value the items on the left more").

With Little's law, it is that the team has to attain predictability. That predictability is the team consistently delivering the same number of points every sprint and/or having a consistent cycle time. Little's law doesn't technically have a stochastic component, so obviously needs stability to attain a zero variance. The problem you have, especially at the beginning of each 'project' [*grumble* *humbug* need #NoProjects] is that you do not have that stability. Teams can under or over-perform, so there isn't stability. That said, a team that is also improving and delivering 'more', which is always desirable, then has the disadvantage that they're not naturally stable! They are delivering more, so naturally the average changes.

But isn't improving a good thing?

Totally! It's the best thing you can do! However, if you are hoping to use Little's law to project/forecast in an environment which is improving, you can't do it because of this. At least, you can't do it without the introduction of a stochastic component, or comparing against the desired burn-up. Believe it or not, improving is instability which naturally increases the variance of the delivery as a whole. That's your trade off! Continual improvement means you cannot gain the stability needed to use Little's law!

*shock horror*

Are you sure?

Yep, very!

Consider the following graph. It shows a team's data where they do not improve their delivery and are running late from the start. If projecting forwards, their variance is very narrow. They're going to be very late, but you can be pretty sure that they are going to be late. If you plot the projection of the end of the 'project' through the average burn-up as you accumulate ACTUAL data, you'll see where it's likely to be:


Team who do not improve


Little's Law could be used here to project where they're going to be and if you look at the range of possible outcomes in the time allotted or the time variance needed to complete the scope (remembering the golden triangle) you'll see this is much narrower than the team who improve below!

Team practising continuous improvement
Here Little's Law is pretty much no use! Indeed, in most teams, you can't get enough of a data set for each improvement to measure the average and deviation reliably.
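To put rough numbers on the comparison between the two teams, here is a minimal Python sketch. The velocity figures are invented purely for illustration (they are not the data behind the graphs), and the one-sigma band assumes every sprint is an independent draw from the same distribution, which is exactly the assumption an improving team breaks.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical points delivered per sprint (illustrative numbers only).
stable_team = [20, 21, 19, 20, 20, 21, 19, 20]
improving_team = [14, 16, 19, 21, 24, 26, 29, 31]


def projection_band(velocities, sprints_left):
    """Crude one-sigma band for the points delivered over the remaining sprints.

    Assumes sprints are independent and identically distributed, so the
    spread of the total grows with the square root of the sprint count.
    """
    expected = mean(velocities) * sprints_left
    spread = stdev(velocities) * sqrt(sprints_left)
    return expected - spread, expected + spread


for name, data in (("stable", stable_team), ("improving", improving_team)):
    low, high = projection_band(data, sprints_left=4)
    print(f"{name:9s} mean={mean(data):5.1f} sd={stdev(data):4.1f} "
          f"band=({low:.1f}, {high:.1f})")
```

The improving team has the higher average but a far wider band, and that band is built on an average that is already out of date by the time you compute it.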

Conclusion: What to do?

At the end of the day, you're just trying to give yourself the best chance. It’s not intuitive to applaud greater variance, since that’s normally greater risk, but because the variance needs to ‘cross’ a value (the average, which in this case is the original burn-up. i.e. 'fixed' at the outset if scope is fixed), it’s the more points you deliver ‘above the thick red line’ that count. If it swings wildly with the majority of the mass below the red line, you’re scr3wed. If it’s above, you're rocking! This is why I prefer to get the average of teams to be on or above the red line and then reduce the variance, since this gives you greater certainty about the burn-up rate.

So, in short, there are a million and one tools out there to help folk with software development and predictability. Teams have to be careful they don't pick a tool and misapply it, and it's a tool's limits that often tell us whether or not it is appropriate to use. The situations where it doesn't work may outnumber the ones where it does. We're not all hammering nails, after all.

Friday, 21 November 2014

#LeanConf 2014: 4 Fave Presenters

Short one this one, as it has been an exhausting week!

I was at #LeanConf in Manchester this week and of the amazing and inspiring speakers, there were a few that stood out. My top 4 were:

Ton Wesseling 

twitter handle: @tonw

Hands down my personal favourite presenter there! Being a bit of a data geek myself, I loved the data and educational elements of his presentation. Whilst not new to me, he's the sort of guy in the industry who can help organisations close the learning loop by allowing you to truly understand your data, improvements, A/B-test results, what to focus on and what to ignore. When you are the only guy in pretty much every single company you go into who walks and talks agile metrics, performance, statistics, learning, data, data and more data, it can get to be a very lonely place until you find another person in the world who shares the same passion, knows what's just enough, and both its importance and pitfalls.

I took the time to speak to Ton after his presentation, specifically about how to get statistical thinking into some teams, as this often requires bridging a huge skills gap, and his answer was pretty simple. Employ psychologists! I have long thought that psychologists have a place in organisations, but I have yet to be convinced that I could justify suggesting a formal psychologist role at team level, so I steered clear of suggesting them. Psychologists bring both human psychodynamics AND statistics to the table, since they have to study it. So having this suggestion come from someone who's done it does add some validity to the idea, and I look forward to trying it out.

Janice Fraser 

twitter handle: @clevergirl

My favourite presentation from an entertainment point of view. It was awesome to see her present and she had me and the rest of the audience in fits of laughter! My stomach was aching the whole day after as if I'd had a session at the gym... and I do go to the gym! Her presentation about Gabe Zichermann's new educational system and use of games and puzzles to educate helped promote curiosity and traditional skills in education. I have to vouch for this, as whilst I was classically educated, it was the stuff I did outside school that put it into practice and hence allowed me to score highly in school/college/uni yet not have to do a single day's worth of revision, because these were skills I used all the time. Definitely think there is something in this.

Tristan Kromer 

twitter handle: @TriKro

My best memory award goes to Tristan. His slides didn't work unfortunately, but he blasted through the whole presentation, by heart, without missing a step. Awesome professionalism!

This isn't to say that other presenters weren't good, as it was a tough choice. Everyone will have a different favourite 3. For example, Barry O'Reilly from ThoughtWorks provided an informative talk on a classical Enterprise Agile problem, optical illusions and plenty of Watermelons :)

Ash Maurya

twitter handle: @ashmaurya

The author of Running Lean spoke about how companies are basically customer factories. They produce happy customers. He also talked about testing the market and the crucial feedback loop that allows the factory to respond to market opinion and change. He's certainly well aware of the need to consider the data when deciding how much to invest and work with.

Enjoyed #LeanConf! Especially since I won a copy of Ash Maurya's book, Running Lean, for asking a question at the right time. Looking forward to next year! :)

Wednesday, 29 October 2014

Cone Head!!

A topic that seems to come up time and time again, and that folk seem to either take to or not, is the idea of an 'uncertainty cone'. I briefly touched on this in a previous post where I was violently disagreeing with Woody Zuill and Neil Killick, not on their principle of #NoEstimates, since the method has definite merit, but on the specifics of the merit that it has.

I'll take this time to explain a little more about the cone of uncertainty for those who are not familiar with it, or who would like to see a more practical example of what it is. To do so, let's consider 10 flips of a coin as the example. There are 2 to the power of 10 (1,024) possible sequences of 10 head or tail results.

Before flipping the coin the first time, I want you to guess what the final total may be. How many heads do you think you'll get?

Well, if you think about all the combinations (0 heads, 1 head, 2 heads...) and thus build a histogram of all results, you get this:

number of heads when flipping a coin 10 times - University of North Carolina (via Google images)
I'll come back to this later, but you should have a number in your head. Let's now consider the range of all possible numbers of heads at the end from this point. i.e. before the first flip. You can either get a minimum of 0 heads, or a maximum of 10 heads, or of course, anything in between right? Cool.

1st Flip

When you flip the coin the first time, it comes up say, tails. This does two crucial things:


  1. It gives you an actual result to work with, so you now have 9 uncertain results and 1 actual result.
  2. Now that you have flipped a tail, you cannot get 10 heads. Given that in the above histogram, which applies all the time, there is only one scenario with 10 heads, that scenario is now out! The best you can hope for is 9 heads, given you've flipped 1 tail.
Drawing up the table of min and max heads after the first flip we can see:

2nd Flip

Flipping the coin the second time, it comes up say, heads. This also does two crucial things:


  1. It gives you an actual result to work with, so you now have 8 uncertain results and 2 actual results.
  2. Now that you have flipped a head, you cannot get 0 heads, because you have at least 1. Given that in the above histogram, which applies all the time, there is only one scenario with 0 heads, that scenario is now out! Your range is now 1 head to 9 heads.




Put this in the table and flip again. Follow the rule that if you flip a head, you increment the minimum by one, otherwise you have flipped a tail so decrement the maximum heads by one (because you now don't have enough flips to get the previous maximum).

<<Fast Forward>>

10-Flips



10-flips completed

So we've completed the whole 10 flips, incrementing the minimum if we get a head and decrementing the maximum if we get a tail. Surprise surprise, by the end, we have two ends that meet in the middle (which is correct, because by that point, we have 10 actual results and thus, no uncertainty at all). You can double check this by counting the number of heads you got, which is 4 in this case, against the meeting point of the maximum and minimum, which is 4. If they don't match, then you've banjaxed your counting, so you might want to ask a 3 or 4 year old for help next time.
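The increment/decrement rule is easy to sketch in Python. This is a minimal illustration only: the function name is made up, and the flip sequence is an assumption that merely matches what the text states (tail first, head second, 4 heads in total).

```python
def cone_of_uncertainty(flips, total_flips=10):
    """Return [(flip_number, min_heads, max_heads), ...], starting before flip 1.

    `flips` is a string of 'H' and 'T' characters.
    """
    minimum, maximum = 0, total_flips
    rows = [(0, minimum, maximum)]
    for number, flip in enumerate(flips, start=1):
        if flip == "H":
            minimum += 1   # a head raises the floor
        elif flip == "T":
            maximum -= 1   # a tail lowers the ceiling
        else:
            raise ValueError(f"unexpected flip {flip!r}")
        rows.append((number, minimum, maximum))
    return rows


# Assumed sequence: tail, then head, with 4 heads overall.
for flip_number, low, high in cone_of_uncertainty("THTHTTHTHT"):
    print(f"after flip {flip_number:2d}: {low} to {high} heads")
```

Plotting the minimum and maximum columns against the flip number gives the two edges of the cone.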

Making the Cone

From this table, we simply have to plot the flip number against the minimum and maximum number of heads. So let's do that. I've also included the trend lines, in black, which show the trajectory of the minimum and maximum numbers. The gap in the middle is the level of uncertainty or variance:


cone of uncertainty

Let's recap what happened. At the beginning, we had no idea where we were going to end up [with how many heads], aside from the range of 0 to 10 heads. As we progressed, we reduced the size of the range of possible 'options' or heads we could get and by the end, we were where we were.

Map this to typical IT projects. At the beginning, we have no idea where we're going to finish. As we progress and choices are made (which honestly do sometimes seem random), we reduce the total number of potential options that we have (which isn't always a bad thing, especially if we discount the highest waste or risk options) and eventually, we come to rest somewhere. Also, despite everything, we always know where we are starting. We're starting 'here'. The end of the last cone (or part thereof).

And the First Histogram?

Returning to the histogram, which is built up from a knowledge of all possible combinations of coin flips (it is a closed probability space mind you, which isn't always the case in software), you can see straight from this that the best option for your guess is 5, closely followed by 4 and 6 heads. The curve is a bell curve (strictly a binomial distribution, which the Normal Distribution approximates closely), and in this case it is fine.
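The histogram can be rebuilt from first principles by counting how many of the 1,024 sequences give each number of heads. A minimal Python sketch, using the standard library's binomial coefficient (Python 3.8 or later):

```python
from math import comb

FLIPS = 10

# Number of distinct flip sequences that contain exactly `heads` heads.
counts = [comb(FLIPS, heads) for heads in range(FLIPS + 1)]
total = sum(counts)

for heads, count in enumerate(counts):
    bar = "#" * (count // 10)
    print(f"{heads:2d} heads: {count:3d} of {total} ({count / total:6.1%}) {bar}")
```

The counts are symmetric around 5 heads, which is why 4 and 6 tie for second-best guess.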

Epilogue

The only real difference with development is that the probabilities in software development are somewhat conditional, since the decisions we make are not random, but somewhat stochastic, or at least Bayesian, since we ourselves learn and make better decisions or become more productive, which helps us descend the cone faster. It's good enough, so should still be used, but if you're a masochist, then I had best at least tell you that something has recently come to my attention in the field of theoretical statistics which may be useful for the part that is currently quantified normally. That something is the Tracy-Widom distributions, which appear ever so slightly skewed to the right. It's not something I've used [yet] and it is somewhat advanced, but I am excited to see where this field goes.

Sunday, 12 October 2014

Evolving & Emerging Architecture: Agile-EA

Going into companies is always an interesting experience. You get to view the way they work and in agile organisations using physical boards, you can walk the floor and view how the work is progressing. That is well known in the agile project management arena, product owner roles etc. but it's also an extremely useful technique in the Agile Enterprise Architecture world (Agile-EA). Before going into why, we need to recap a couple of definitions.

1. Conway's Law

In 1968, Melvin Conway provided this now tried and tested gem.

"organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations"

This is a huge revelation for some companies, but it got hidden by process oriented techniques, which masked the problem for decades.  

In the agile space, it's a very different story. The reliance on teams to develop their own processes, with the greatest expressive power within teams as opposed to across them, has resulted in this being more applicable than ever. You can turn this to your advantage: keeping the elements that make up the greatest proportion of a workflow within a single team really improves the rate of delivery of work, as I have covered in previous posts, and is a much better facilitator of outcome than having a distributed development team. This also means that cross-team communication encourages the development of a system through that communication channel.

2. Consumer Driven Contracts

A consumer driven contract can be considered a Kanban pull signal. This pulls a feature from another team and in software development spaces, this also includes a sub-suite of tests from the consuming team. When a feature provided by another team is needed, a cross-team card, often of a different colour, is placed on the board of the team who are to provide that functionality (at Laterooms.com for example, they used rainbow coloured cards at one point). That team then pick the card up and code against the tests to make them pass.

Consumer driven contracts are pretty much the most central concept in true just in time, cross team software delivery. They often have a subtask link to the source board and effectively provide an architectural link between application components or business features.

This provides opportunities for architects in both the enterprise and/or solution sphere to see how their estate is evolving by simply going to all stand-up meetings, and building a picture of the emerging architecture or estate by following the chain of tickets. Once such a ticket appears, this introduces an architectural dependency and over time, more and more tickets appear which fill the space between things that architecture lives in.

Walking the Floor

Architects live between things. As architects we should aim to get to every stand-up within our sphere. When we do, we should look out for these cross team tickets, or features being played which we are aware touch something else or perhaps could leverage a feature elsewhere in our estate. 

Imagine we go to the stand-ups and see the following boards after iteration/sprint 10.


Sprint 10 board states



Sprint 11 sees a purple ticket appear on team board 1, required by team board 2 (who are pulling it). The purple ticket represents a cross-team 'pull' (indicated in team 2's 'X' ticket for convenience) since team 2 requires a feature from team 1. The dotted line now represents the relationship between the two teams, which remember by Conway's law represents a link between the system or business features:

Sprint 11 emerges an architectural link

This continues to be played. Suppose that by the end of Sprint 11 it and its compatriot pull are both done. This ticket is now complete and the architectural link has been delivered. So the architecture of the estate now looks like the following (arrowed lines represent a dependency):

Sprint 11 Architecture - Note, no Team 4 features live, so architect doesn't care

Sprint 12 and two purple tickets appear. One from team 3 to 1 and one 4 to 2. The solid line represents the delivered link from Sprint 11:

Sprint 12 Architecture
When this is delivered, the architecture then looks like:

Architecture after Sprint 12
And so the cycle continues. 
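The bookkeeping an architect does while walking the boards can be sketched in a few lines of Python. Everything here is hypothetical: the `PullTicket` type and its fields are invented for illustration, and I have assumed that a ticket "from team 3 to 1" means team 3 is the consumer pulling a feature from team 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullTicket:
    sprint: int
    consumer: str   # team pulling the feature
    provider: str   # team whose board carries the cross-team card
    done: bool


def architecture(tickets):
    """Split pull tickets into delivered links (solid) and in-flight links (dotted).

    Each link is a (consumer, provider) pair, read as 'consumer depends on provider'.
    """
    delivered = {(t.consumer, t.provider) for t in tickets if t.done}
    in_flight = {(t.consumer, t.provider) for t in tickets if not t.done} - delivered
    return delivered, in_flight


# Board states part-way through Sprint 12, as described above.
tickets = [
    PullTicket(sprint=11, consumer="team 2", provider="team 1", done=True),
    PullTicket(sprint=12, consumer="team 3", provider="team 1", done=False),
    PullTicket(sprint=12, consumer="team 4", provider="team 2", done=False),
]

delivered, in_flight = architecture(tickets)
print("solid: ", sorted(delivered))
print("dotted:", sorted(in_flight))
```

Marking the two Sprint 12 tickets as done and re-running moves them from the dotted set to the solid set, which mirrors the architecture after Sprint 12.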

Conclusion

Each one of the dotted lines represents a communication flow between two teams and brings them closer for that phase of the development. When it's done, the link still exists in the system (solid line), but they can separate and form relationships with other teams, as team 1 did with team 3, after team 1 finished with team 2. Of course, there are times when the platform will need to delete those links, removing them from the estate, but the communication still happens regardless. 

As an architect, attending the stand-ups is a great way to see how the system evolves. You can walk the boards looking for these special cards and draw out the architecture as you proceed. Indeed, if you're an active contributor (and you should be) you can and should help the teams make decisions which are systemically optimal. Forming a link where a team would otherwise expend a lot of effort saves them and the company time and money. As the organisation matures and the role of software architect embeds in the teams (as roles over job descriptions), it is these individuals who will go to other team stand-ups and hence facilitate the creation, updating or destruction of these architectural links.

Wednesday, 10 September 2014

Going from Acceptance Tests to Code

When working with a dev hat on, I keep getting overruled in discussions by programmers who either don't seem to get what I am saying or don't seem to want to get what I am saying. One theme that comes up time and time again is the issue of acceptance testing and how to test. There is often the view that it is the QA's/BA's jobs to specify the acceptance test and that unit testing is the preserve of the developer, who only gets them when the QA's have finished, or just as bad, develops their code in parallel.

Now, this obviously creates the very silos that agilists always claim were 'bad' in traditional methods. I am not a great believer in 'owning' code and also believe that a generalist skill-set is better than a specialist one, as otherwise you get blockers when people are off on annual leave or sick, or you get massive contention on the times of these people. So this is yet another thing that narks me.

Hence, in the spirit of breaking these barriers, I am going to spend this blog post writing code.
In order to do that, we need a story card. So, let's have the following:

"As a tutor, I want to be able to recall the marks of my top 3 students at the end of the year to be able to put them forward for end-of-year awards."

Simple enough. So using SpecFlow, MSTest, C# and SQL Server Express Edition, how do I turn this into code?

There are many ways to make this happen, including starting with a walking skeleton and pushing the acceptance criteria down through the layers until it exists in the database or by elaborating on the criteria enough to develop the example using design by contract.

Acceptance Criteria 

There are many ways to make this happen, but working with the tutor and using agile methods, we pick this ticket up, elaborate on it with the tutor and we can come up with something fairly reasonable using Gherkin syntax which represents this.

My personal favourite is using specification by example. This allows the dev, QA and BA to engage the product owner/customer in a role play session, defining at each stage an example, say, message passing, website form, document, customer service agent etc. that can tease out example scenarios with example data for each feature developers are being asked to deliver.

For example, the end result of this after interacting with this tutor may be:


Feature: TopFlight Students
 As a teacher, 
 In order to find the top 3 performing students,
 I need to retrieve the average student marks for the year 
 And pick the top 3

Scenario: Pick the top 3 performing students by average score
Given I have the following student marks:
 | ID | Surname | Forename | Score |
 | 1  | Joe     | Bloggs   | 55    |
 | 1  | Joe     | Bloggs   | 73    |
 | 2  | Freb    | Barnes   | 61    |
 | 3  | Jane    | Jonas    | 83    |
 | 4  | James   | Jonas    | 85    |
When I press retrieve
Then the result should be as follows:
 | ID | Surname | Forename | AverageScore |
 | 4  | James   | Jonas    | 85           |
 | 3  | Jane    | Jonas    | 83           |
 | 1  | Joe     | Bloggs   | 64           |





Having established the acceptance criteria here, we can develop the steps and use stub objects to return the expected values, which become something like the following skeletal SpecFlow steps file which tests the TopFlightEngine static class:



using System;
using System.Collections.Generic;
using TechTalk.SpecFlow;
using TutorMarks.TopFlight;
using Microsoft.VisualStudio.TestTools.UnitTesting;
 
namespace TutorMarks.TopFlight.Test.Feature
{
    [Binding]
    public class TopFlightStudents
    {
        // Added the student scores
        private IList<StudentRecord> studentRecords;
 
        [Given(@"I have the following student marks:")]
        public void GivenIHaveTheFollowingStudentMarks(
                IList<StudentRecord> studentScores
            )
        {
            studentRecords = studentScores;
        }
 
        [When(@"I press retrieve")]
        public void WhenIPressRetrieve()
        {
            // Press retrieve
        }
 
        [Then(@"the result should be as follows:")]
        public void ThenTheResultShouldBeAsFollows(
                IList<StudentRecord> expectedTopFlightScores
            )
        {
            IList<StudentRecord> actualResults = TopFlightEngine.RetrieveTopFlightStudents( 
                    studentRecords 
                );
 
            // Compare the expected row count against what the engine actually returned.
            Assert.AreEqual( expectedTopFlightScores.Count, actualResults.Count, 
                @"The expected number of results were not returned." );
 
            for ( int index = 0; index < actualResults.Count; index++ )
            {
                AssertPropertyEquality("ID", index, 
                    expectedTopFlightScores[ index ].Id, 
                    actualResults[ index ].Id);
 
                AssertPropertyEquality("Surname", index, 
                    expectedTopFlightScores[ index ].Surname, 
                    actualResults[ index ].Surname);
 
                AssertPropertyEquality("Forename", index, 
                    expectedTopFlightScores[ index ].Forename, 
                    actualResults[ index ].Forename);
 
                AssertPropertyEquality("Score", index, 
                    expectedTopFlightScores[ index ].Score, 
                    actualResults[ index ].Score);
            }
        }
 
        private static void AssertPropertyEquality(
            string fieldName, 
            int index, 
            object expectedElement, 
            object actualElement)
        {
            string CONST_DIFFERENT_RESULTS = 
                @"The property {0} for record {1} is unexpected. Expected {2}, Actual {3}";
 
            Assert.AreEqual(
                expectedElement,
                actualElement,
                String.Format(
                    CONST_DIFFERENT_RESULTS, new object[]
                            {
                                fieldName,
                                index,
                                expectedElement, 
                                actualElement
                            }
                    )
                );
        }
 
        [StepArgumentTransformation]
        private IList<StudentRecord> MapTableToStudentRecords(
                Table source
            )
        {
            List<StudentRecord> result = new List<StudentRecord>();
 
            foreach (TableRow row in source.Rows)
            {
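                // The 'Then' table labels its score column AverageScore, so accept either header.
                string scoreColumn = source.Header.Contains("AverageScore")
                    ? "AverageScore"
                    : "Score";
 
                result.Add(new StudentRecord()
                {
                    Id = int.Parse(row["ID"]),
                    Surname = row["Surname"],
                    Forename = row["Forename"],
                    Score = float.Parse(row[scoreColumn])
                });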
            }
 
            return result;
        }
    }
}



Now, remember: this is for the sake of the illustration and for learning why arbitrary test criteria in TDD are a bad thing, so look at the bigger picture.

After eventually making the tests go green, the following can be seen (I should have been a poet):

using System.Collections.Generic;
 
namespace TutorMarks.TopFlight
{
    public class TopFlightEngine
    {
        public static IList<StudentRecord> RetrieveTopFlightStudents(IList<StudentRecord> studentRecords)
        {
            return new List<StudentRecord>
            {
                new StudentRecord() { Id = 4, Surname = "James", Forename = "Jonas", Score = 85 },
                new StudentRecord() { Id = 3, Surname = "Jane", Forename = "Jonas", Score = 83 },
                new StudentRecord() { Id = 1, Surname = "Joe", Forename = "Bloggs", Score = 64 }
            };
        }
    }
    // ... Located in another file
    public class StudentRecord
    {
        public int Id { get; set; }
 
        public string Surname { get; set; }
 
        public string Forename { get; set; }
 
        public float Score { get; set; }
    }
}


What pertinent things do you notice? Correct! It only returns the EXACT averages as the tutor expects to see them. This is your 'dumb' wireframe/pretotype. One thing it is NOT is a walking skeleton, as that implies a piece of functionality that manifests through all the connected components of an architecture as a tiny implementation (basically, not actually having any substance to it. Akin to testing with arbitrary data and making sure "...the web bone's connected to the service bone. The service bone's connected to the data bone...", jehee... see what I did there?). This precedes even that! It allows you to get feedback quickly to yourself and builds from the acceptance criteria to the code from the very beginning of a project.

This SAME pretotype can then be elaborated even further with the tutor by adding more example scenarios. For example, it can be established that the 'mean average' (as opposed to modal or median) is the calculation that brings about the expected results.

For each part of the whole (let us call this part a 'unit'), when playing out the scenario with the customer, using these examples, their view would be as follows:
  1. Take the scores for each student (e.g. "Joe Bloggs"), who is identified by a single ID (the number 1)
  2. Add their scores up (55 + 73 = 128), keeping track of the number of scores they have (2 scores)
  3. Divide the total Sum of the scores by the number of scores they have ( 128 / 2 = 64 )
  4. This is your average score (so average score = 64 )
So, can you see what we have here? Correct! You have a series of test scenarios that you can use to substantiate the pretotype, which relate to the ORIGINAL acceptance criteria! As a result, everyone can see how the unit level code delivers the acceptance criteria all the way through the process.
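The four steps above, plus the final "pick the top 3", can be sketched as follows. This is in Python purely for brevity rather than the C# used elsewhere in this post, and the function name is hypothetical. The data is the example table from the acceptance criteria, column order included.

```python
from collections import defaultdict

# (ID, Surname, Forename, Score) rows, copied from the Gherkin example table.
marks = [
    (1, "Joe", "Bloggs", 55),
    (1, "Joe", "Bloggs", 73),
    (2, "Freb", "Barnes", 61),
    (3, "Jane", "Jonas", 83),
    (4, "James", "Jonas", 85),
]


def top_students(marks, how_many=3):
    """Return the `how_many` students with the highest mean score, best first."""
    totals = defaultdict(lambda: [0, 0])   # id -> [sum of scores, number of scores]
    names = {}
    for student_id, surname, forename, score in marks:   # step 1: group by ID
        totals[student_id][0] += score                   # step 2: add the scores up...
        totals[student_id][1] += 1                       # ...and count them
        names[student_id] = (surname, forename)
    averages = [
        (student_id, *names[student_id], total / count)  # step 3: sum / count
        for student_id, (total, count) in totals.items()
    ]
    averages.sort(key=lambda row: row[3], reverse=True)  # step 4 gives the average; rank by it
    return averages[:how_many]


for row in top_students(marks):
    print(row)
```

Each numbered step maps onto a line you could write a unit test for, using the same numbers the tutor agreed to.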

Taking each step in turn, note that we can then deliver the individual steps by developing units which use the examples in the steps as acceptance criteria. We then go on to deliver a unit testing class which tests for the correct number of results, then the average results, and so on.

Benefits, Warnings, Tips

One thing that has consistently been a problem in the past is how to align acceptance and unit tests. If you don't have full code coverage at acceptance test level, you run the risk of allowing development too much leeway in creating examples which are not aligned in other components of your architecture. That said, given acceptance tests tend to be slow in nature, you could trade off some acceptance specifics that are low value or risk, such as exception cases, for unit test coverage in that domain, since the 'unit' is typically where such exceptions originate.

Developing unit tests back from the acceptance tests with examples usually gives you the highest value cases and secondary scenarios, which automatically gives you unit test alignment with the value that the stakeholder wants. Build on those to then fill out the unit with exception tests, say, potentially mocking them out if you need to.


Tip 1: Take care that acceptance tests and unit tests are 'joined up'.
This should automatically happen, but there is many a case where it doesn't. This is why working back from the acceptance criteria examples is the best thing to do. If the examples are coarser grained than you need at the unit level, discuss it with the stakeholder.


Tip 2: Know Your Context and Watch your coverage!
This is a contentious one. I am of the view that things should be covered 100% in some form. Acceptance tests really help cover at least 80% of the value. However, there are often edge cases and bugs which come about which resulted from unforeseen scenarios. Adding a test for the bug is great, as this fills the gaps, but be aware that it's possible to have acceptance tests cover 80% say, unit tests to cover 100% of the code, but you still have integration problems which result from the 'missing' 20% acceptance tests or bugs resulting from data you/the stakeholder hadn't thought of. This is why integration tests came about, but if you have 100% acceptance test coverage, you don't need integration tests at all, because that's already in the acceptance tests anyway.

This won't always be possible. For example, interfacing with some cloud providers. So know the context of the system and work with that to deliver right up to the cloud service boundary.

Also, don't forget that overusing mocks is a bad thing and unit testing just the edges around the value covered by acceptance tests (if they're not 100%) doesn't prove anything credible either, since you can't isolate an error in acceptance tests by the unit tests that way. I've illustrated the problem areas below, since they will need special attention. Perhaps another discussion with the business owner. Note, green includes unit test coverage and has been removed for clarity.

With mocks. Acceptance test in green, unit test coverage. Red indicates potential areas for bugs, so pay special attention to them.


Without mocks (or only at the extremities). Again, pay particular attention to the red areas.

One of the difficulties is that the red areas are points where the developers don't necessarily have enough information to go on, i.e. they can't set up their tests without potentially fabricating some test data, and the overall behaviour of the system may or may not be consistent with the information used in the other red area(s). Hence, your test suite has the potential to use different entities, with partial overlaps in fields, to get different results in these two different areas of the system.

So make sure that the combination of examples you use makes sense. This might necessitate checking what's already in the tests to make sure you don't create an absurd scenario, such as using random digits for a credit card number in a section of the site whose unit tests you're not interested in, only for someone else to develop a Luhn validation algorithm which then breaks in all these cases. Not nice to leave cleaning up that sort of mess to your colleagues!
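To make the credit card example concrete, here is a minimal Python sketch of a Luhn check and a helper that builds test numbers which will survive such a check. The function names (`luhn_sum`, `luhn_valid`, `with_check_digit`) are my own illustrations, not from any particular codebase.

```python
def luhn_sum(number: str) -> int:
    """Luhn checksum total for a string of digits."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total


def luhn_valid(number: str) -> bool:
    """True if the digit string passes the Luhn check."""
    return number.isdigit() and luhn_sum(number) % 10 == 0


def with_check_digit(payload: str) -> str:
    """Append the check digit that makes the payload digits Luhn-valid."""
    total = luhn_sum(payload + "0")
    return payload + str((10 - total % 10) % 10)
```

A string of random digits only passes this check about one time in ten, so test data built with something like `with_check_digit` is far less likely to break when a colleague adds validation later.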

Thursday, 28 August 2014

Drawback of Shared Service: Part 2, Improving on Shared Services

A month or so ago I wrote about some of the biggest drawbacks of shared services in today's market. There are folk out there who make their living delivering these shared services, so I was approached and asked why I felt such a need to denigrate them. That wasn't really the point of the blog and perhaps I could have phrased the somewhat 'tabloid' headline better, especially when I cited them as Evil (which, in an accelerated delivery sense, they are, but they are so nice to reason with, even in purely SOA circles). However, it also became clear that mapping the flow of work through the business hasn't really been done by some in that camp before, and hence they didn't have visibility of the actual flow of work through the system.

Kanban and visual management really help make explicit the work that is going on. Plus, shared services are only evil in agile and optimised worlds, where the fitness of a company is ingrained in its need to adapt and deliver at a fast pace. They form bottlenecks and hence constraints in traditional systems, even if they themselves deliver things quickly (i.e. they are suboptimal). The focus of the last blog was on these systemic issues, not on the individual services, and I assumed that the shared services were in themselves optimised. Don't forget, constraints are a natural and expected concept in Systems Thinking and indeed form a critical concept in the Theory of Constraints. When the systemic problems they bring are solved, I and anyone else working in business optimisation should look around to see where the constraint has moved to, because it will.

Revisiting Shared Services

Shared services are a stand alone service in a company. They may or may not have budgetary and reporting functions, may have all the elements to deliver a service end-to-end in their arena or perhaps most optimally, deliver business features to other services. They are often characterised as having one accounting cost centre, even if they cut across multiple skill-sets.

For example, in one company, an IT shared service, which may not have budgetary and HR control, may consist of:

  • Distinct Tech Support teams - With management
  • Testing teams - With management
  • Software Development teams - With management
  • System Operations/Technical/Network services teams - With management
  • Overall departmental/service management


In another company, their IT service may consist of:
  • Distinct Tech Support teams - With management
  • Software Development teams consisting of DevOps, BAs, QAs/testers - With team leads
  • System Operations/Technical/Network services teams - With management
And management of each team has budgetary responsibility etc.

And indeed, it could be fairly mature and have teams that deliver end-to-end and support the application.

Getting teams to be more effective inside the bounds of a shared service is a noble goal, but the problem is the constraint then shifts to being a systemic problem, which is what I illustrated last time.

Is Optimising Shared Services Useless?

Not really. If you are moving from a traditionally hierarchical organisation to a flatter, leaner, more agile one, it can be a very useful first step and indeed almost always is. Just getting everyone who is responsible for delivery in that shared service into the same team and visualising the work they are each contended for is an incredibly useful way to see how they are being pulled from pillar to post.

However, further down the line, this ceases to yield any significant improvements in value delivery, simply because of the contention on the service as a whole.

If I had to provide my top-4 tips on how to transition from traditional shared services to lean, multifunctional teams, they would have to be:

1. Pick a Stakeholder's Departmental Concern 

Start with the needs of a stakeholder and map the flow of their end-to-end tasks through the entire organisation, noting the departments and functions it touches. 

This often manifests as perhaps a customer entity which starts as a form on a web page, then becomes a record in the DB, then becomes a task for an engineer to come out and do and a conversation that a customer has with a call centre representative when they register an account once the work is completed etc.

To illustrate this through a well known medium, in IT, this can often manifest as the ALM process, for example:

Mapping a sample software ALM to departmental functions

Once you have that list of departments for each individual journey, get everyone in that journey into one team. That way they are all aligned to that one value chain. Note the responsibility numbers above and the team members below:

Collating members of the value chain into one team

2. Map & Optimise Implicit factors & Visualise EVERYTHING!

These are often 'invisible' supporting functions, such as internal technical support, network services, software licensing, recruitment of team members, capital expenditure for servers etc. These have an impact on the performance of the team, especially in delivery. 

Perhaps the most famous of these is the move to DevOps from SysOps, especially when capital expenditure for servers has traditionally taken a long period of time. First the server is specified, then it is requested by tech services, finance have to approve it, tech services have to build it, SysOps have to provision it on the network, then the system is deployed on to it before going live. Each of those context switches (which is effectively what it is for the server being switched) takes a significant period of time. 

Changing CapEx to OpEx (e.g. by using PAYG cloud services), especially where the spend comes in under delegated departmental financial authority (and especially with the team accountant now being on board), removes the need for the finance context switch and authorisation, and reduces the number of items in the finance 'to do' list at the same time. SysOps/Tech services can then provision services without finance authorisation, which is a significant saving in itself as it reduces the lead time. If the SysOps staff members are also brought into the development team, the team can take the technical parts of a feature from inception to live without having to go outside the team, reducing the number of blockers they can't solve themselves.

This can also be applied to HR, facilities, engineering etc. as long as the value chain makes sense and the majority of their individual contribution is to this value chain and not some other.

3. The Value Chain is your alignment!

Overlay the journey in tip 1 onto your value chain. Make sure they match and identify and integrate where they don't. The actual journey is what you deliver, not the slide deck the value chain is present in, so that is your starting point and takes precedence. This gives you an aligned enterprise architecture baseline.

After that, look to transition to what your value chain 'vision' looks like on the slide deck, because I bet they don't match :) If you are lucky enough that they already do, or you've done the transition, make sure you're delivering the best value you can. That means revisiting the value metrics and seeing if there is a way to improve them. Chances are just aligning everyone will deliver improvements in itself, but there is always room for more :) 

For example, delivering faster improves financial metrics such as ROR, IRR and NPV which also improves ROI indirectly. Delivering predictably, reliably and with high quality reduces the need for contingency and some BAU processes. 
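As a small worked illustration of the NPV point, the sketch below uses the standard definition, NPV = sum of CF_t / (1 + r)^t, with entirely hypothetical figures: the same cost and the same return, with the return simply arriving earlier or later. The `npv` helper and the numbers are assumptions for illustration only.

```python
from typing import Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value; cash_flows[t] arrives at the end of period t (t = 0 is today)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical figures: spend 100 today, receive 150 later.
rate = 0.10
fast = npv(rate, [-100, 0, 150])        # return arrives after 2 periods
slow = npv(rate, [-100, 0, 0, 0, 150])  # same return arrives after 4 periods
print(f"fast delivery NPV: {fast:.2f}, slow delivery NPV: {slow:.2f}")
```

With any positive discount rate, the earlier return is discounted less, so the faster delivery comes out with the higher NPV even though nothing else has changed.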

4. Munge Carefully, one change at a time!

When absorbing functions into teams, make the changes gradually. As usual, smaller changes are easier to integrate than larger ones and this goes for people too. Smashing two tribes together only ever causes fights, so it is useful to be mindful of the psychology of folk. Indeed, in a lot of cases, most people take to the idea of being a team's authority really very well, even if they are reluctant to leave the department they started in.

Conclusion

Creating shared services which are fully self-contained and aligned to business value is the first step in what could be a long journey for some companies. It also has a very short shelf life, since they will get split and amalgamated into thinner verticals. As you can see from the diagrammed example, we didn't map every single type of task each department had to do, just the ones that started a vertical, and then overlaid the extraneous tasks over the top through implicit tasks which appeared as we looked at each journey.

There are many more tools and techniques which can be used to decipher the actual value chains, including some that can apply here. However, for brevity, hopefully this provides a reasonable start. Also, look into mapping the systemic flow of tasks, but do so for all tasks in the system. If you are looking for a reasonable primer on Systemic Flows, see Ian Carroll's blog for a really good start in synergistic fluency.

Sunday, 17 August 2014

Q: What Do Agility & Astrophysics Have In Common?

There is an agile coaching game called the static points game. It's an extremely useful illustration of how complexity evolves from really simple rules which, in the enterprise world, shows how businesses always change under multiple forces, which should be familiar to those in the change management space. I've played it twice: the first was an introduction by Ash Moran whilst I was at Laterooms.com and more recently with Ian Carroll. It's pretty simple to play:

  1. Get everyone to stand up and move to the edge of the room (or form a circle if the room is too big) 
  2. Tell each person to pick two other folk from the group 
  3. The simple rule is to stay equidistant (the same distance away) from both of them. 
  4. Then let them go.

What you'll see is the group organise and shift about, jostling as the distances come to an equilibrium, then eventually settle. You can play it again, telling people to keep the original two people, and see where they settle this time. Chances are they settle differently from where they did previously (a digital camera and a high angle might come in handy for this variation of the game :).

Reset the game, and with everyone keeping their two folk, fix any one person from the group where they are, perhaps using a chair and send everyone else back to the edge of the room and play it again. They settle quicker. Do it again with that one person fixed, and photograph. You can keep fixing more and more folk and the organisation comes to settle much quicker, with much less movement.


What does this Illustrate?

As an abstract systems game, it naturally covers a multitude of arenas!
  • How departmental level business changes relative to other departments as politics plays a part in how departmental heads compete for work or pass blame. Imagine the people in that game are working in an organisation and trying to balance the needs of two sets of stakeholders.
  • How uncoordinated systems work with one another as they evolve (which I believe is what you think this refers to here). Imagine the people in that game are subsystems talking with interfaces to two other systems.
  • How uncoordinated work-streams work with one another as they evolve (in agile environments, this is what I think happens with shared systems. They pull against the shared systems). Imagine the people in that game are subsystems communicating with interfaces to two other systems.
  • It shows how complexity can manifest from a really simple rule-set. This one is self-explanatory. Intelligent agents (i.e. people, bees etc.) using a really simple rule can still produce a significant amount of very complex behaviour. This, like all the rest, is called [mathematical] chaos.
  • It shows that relationships are easily as important as the entities themselves
  • How organisms relate to one another
  • It is a manifestation of planets being influenced by each other’s gravity
Mathematically speaking, they are ALL a manifestation of what is called the n-body problem. The planetary example that ends that list above is where this originated.

A Lesson in Planetary motion

We orbit the sun because our mass is significantly less than it. The effect we have on the sun is near negligible. This is like a big CEO of a company, keeping things in order by imposing forces upon the lower weighted levels, who can’t respond in any meaningful or significant way. However, the behaviour of the system is predictable and has been like that for billions of years. With the big, overarching, autocratic CEO (or C-Suite) in it, and with the absence of any other influential factors, the environment rarely changes, so there is no need for it to change. That is one sun and several planets and moons and their orbits (statics and dynamics) that systemically stay the same for millions if not billions of years, even if what goes on on the surface changes. As far as systems are concerned, architecturally, these are all static points. That’s a high level block diagram!

However, in agile environments, you are empowering folk, quite rightly. Hence, the gravity they have and are allowed to have, relative to the system, is much higher. Returning to the n-body system, if you have two equally weighted planets orbiting around each other, they will pull each other’s orbits. If you have three orbiting each other, the behaviour of the unconstrained system is known to be near unpredictable outside of very restricted contexts (akin to how often waterfall delivered on time and on budget, which some would argue has the same probability that our solar system came into existence the way it has :). Plus, because the class of problem is the same for all of the above, this chaos or ‘randomness’ applies to all of the above examples too.
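To get a feel for that unpredictability, here is a rough Python sketch of a gravitational n-body integrator. It assumes G = 1, two dimensions, a simple leapfrog (kick-drift-kick) update and a small softening term to stop close encounters blowing up; the starting positions, velocities and masses are made-up values, not a model of any real system.

```python
import math


def accelerations(pos, masses, softening=0.05):
    """Gravitational acceleration on each body (G = 1), softened at close range."""
    acc = [[0.0, 0.0] for _ in pos]
    for i, (xi, yi) in enumerate(pos):
        for j, (xj, yj) in enumerate(pos):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            inv = (dx * dx + dy * dy + softening ** 2) ** -1.5
            acc[i][0] += masses[j] * dx * inv
            acc[i][1] += masses[j] * dy * inv
    return acc


def simulate(pos, vel, masses, dt=0.002, steps=5000):
    """Advance the bodies with leapfrog steps and return their final positions."""
    pos = [list(p) for p in pos]
    vel = [list(v) for v in vel]
    acc = accelerations(pos, masses)
    for _ in range(steps):
        for p, v, a in zip(pos, vel, acc):
            v[0] += 0.5 * dt * a[0]  # half kick
            v[1] += 0.5 * dt * a[1]
            p[0] += dt * v[0]        # drift
            p[1] += dt * v[1]
        acc = accelerations(pos, masses)
        for v, a in zip(vel, acc):
            v[0] += 0.5 * dt * a[0]  # second half kick
            v[1] += 0.5 * dt * a[1]
    return pos


# Three equally weighted bodies; the second run nudges one body by a tiny amount.
masses = [1.0, 1.0, 1.0]
velocities = [(0.0, -0.3), (0.0, 0.3), (0.2, 0.0)]
run_a = simulate([(-1.0, 0.0), (1.0, 0.0), (0.0, 0.5)], velocities, masses)
run_b = simulate([(-1.0, 0.0), (1.0, 0.0), (0.0, 0.5 + 1e-6)], velocities, masses)
print("gap between the two runs for body 0:", math.dist(run_a[0], run_b[0]))
```

Try increasing `steps` and watching how the gap between the two runs changes compared with the size of the original nudge. That comparison is the part worth exploring; the specific numbers depend entirely on the assumed starting conditions.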

OK, how can you test if the result is random?

Firstly, this is a complexity problem. You can't necessarily test the system internals [aka business] as a whole if the result isn't predictable or consistent. However, what you can test, is that the unit which is the individual person, does manifest the rules correctly! i.e. they keep the same distance from their two folk at all points of change, including all points of jostling! After all, most software systems test that rules manifest correctly. For example, you can't open a bank account if you don't have ID or you can't board a plane for an international flight without a passport. Each and every one of the individual entities, as well as the whole in deterministic systems, is defined by: 

  • A pre-condition, which includes the initial state of the system - The position they are in in the room 
  • An action - Someone moves
  • A post-condition - They have to remain equidistant

With an invariant that they have picked two constant people to apply the rule with.

Which in BDD/Gherkin syntax is akin to:

Background: each person has two different folk to focus on
##...

Scenario: Staying equidistant
Given the person is in the room
When someone moves
Then the person has to be the same distance from their left-person and right-person
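The same pre-condition, action and post-condition can also be checked in code for one individual at a time. The sketch below is an illustration only: it assumes you have recorded positions as (x, y) pairs at each point of change, and the names `keeps_rule` and `check_at_every_change`, along with the tolerance, are hypothetical.

```python
import math


def keeps_rule(person, left, right, tolerance=0.05):
    """Post-condition: the person is the same distance from both of their picks."""
    return abs(math.dist(person, left) - math.dist(person, right)) <= tolerance


def check_at_every_change(snapshots, person, left, right, tolerance=0.05):
    """Apply the rule to one person at every recorded point of change.

    snapshots is a list of dicts mapping a name to an (x, y) position,
    one dict per moment at which somebody moved.
    """
    return [keeps_rule(s[person], s[left], s[right], tolerance) for s in snapshots]


# Two recorded moments: in the second, 'me' has drifted off the equidistant line.
snapshots = [
    {"me": (0.0, 1.0), "left": (-1.0, 0.0), "right": (1.0, 0.0)},
    {"me": (0.5, 1.0), "left": (-1.0, 0.0), "right": (1.0, 0.0)},
]
print(check_at_every_change(snapshots, "me", "left", "right"))
```

Note that this only tests the unit, the individual following their rule. It says nothing about where the group as a whole ends up.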

Remember, a static point is not just the structure or entity, it is also the behaviour it exhibits. Hence, the best way to make a change and make it testable, with the minimum of risk, is to nail all but one of the folk (read systems, which include the entities and how they behave – that is the aggregate of business, application, data and technology for each feature), make that small change, test they manifest the rules, then release everything and make sure the system didn't disintegrate (i.e. all the other elements correctly adhere to their own rules, which may be the old ones). Then for the next change, nail all but a different one, make a change, release etc. The key is to test that the one body itself adheres to the simple rule you want of it! This provides parallels in systems?

For example, in the static points game above, suppose we took just one person (which is a business unit or capability), called them 'delta', made the rule that they stay an arm's length away from everyone else and can pull the folk together if they are not, then played the game again. They may get jostled about a bit by the rest of the business going about its [old] business, which is still testable, whilst simultaneously pulling the two folk to their arm's length distance. Note, through pulling folk, the rest of the system, meaning folk who have picked one or other or even both of the people pulled by delta, and their 2nd, 3rd and nth degree of separation, will also change. So this new rule influences the system as a whole, but crucially, the rest of the system adheres to the old 'equidistance' rule, which like the new one, you can still measure individually (as mentioned at the top of this section). You measure that delta is doing their job by ensuring they keep their two within 'punch distance' at all points of change, i.e. jostling.

Conclusion

Trust me, this is a very difficult concept for some folk to get and it requires some 'micro-thinking'. Indeed, it always provides a learning point in communication to me. Whilst there is almost nothing anybody in the IT world can tell me about non-linear system dynamics, there is a lot I have to learn about communicating the concept across in language people understand. That is why I attend community events, such as TechNights, Lean Agile Manchester etc.: to learn to effectively communicate the ideas to people who do not have that bridge or background. I've been through this sort of discussion a couple of dozen (or more) times and I still communicate this wrong now, because there may be levels of knowledge or skill between the person I am trying to communicate this to and the understanding of non-linear dynamics that would help illustrate the benefits of the knowledge. I'd be interested to see how others communicate this to analytical and non-analytical audiences alike.

So the best I can do for now is probably illustrate it with video. In the external links below, take a look and see if the systems look the same as each other after they've been running for a while. Indeed, you can run the YouTube vids side by side if your broadband is up to it.

External Links
n-Body Gravity Simulation like folk at the edge of a room - https://www.youtube.com/watch?v=XAlzniN6L94
n-Body simulation with 50 million entities - https://www.youtube.com/watch?v=OJaE9J39A8s

View these two 3 body problems side by side:
https://www.youtube.com/watch?v=VX9IdCnNWJI
http://vimeo.com/11993047 (from 24 seconds in - This also has multiple runs with different starting points)