Thursday, 23 May 2013

Deploying from SVN through Jenkins to AWS Elastic Beanstalk (Part 1)

Following on from the AWS Summit last month, I figured I'd do a technical one this time round. This will fall into two parts as they are fairly detailed, but I am not going to go through too much .NET stuff and shall assume that if you follow my blog, you can already write .NET web apps.

Part 1: AWS Toolkit and Deploying to the Cloud

Whilst working on my current project, which is going to be deployed on AWS, the time came to build the release deployment process to send it to AWS. I had a choice of deployment mechanisms, including AWS CloudFormation, but for the simplicity of this environment, I plumped for deploying directly to EB.

I am running the following setup:
  • VS2010 SP1
  • Visual SVN (with Tortoise SVN client)
  • Jenkins 1.514

Having had a little bit of experience using Go, Jenkins and TFS, I chose Jenkins for the deployment and decided to deploy to AWS Elastic Beanstalk (EB). I will assume you know how to set up and use Jenkins and have an AWS account.

What's this Beanstalk?

Elastic Beanstalk is an autoscaling platform for deploying applications into the AWS cloud. It is a PaaS layer built on top of EC2 and allows you to deploy your software without the headache of maintaining your own Elastic Load Balancer (ELB), EC2 servers, Relational Database Service (RDS) instances etc.

How many cows do I need to sell for the magic beans to use it with Visual Studio?

None, the AWS Toolkit is free. If you use t1.micro environments for everything, you also get 750 hours of AWS use per month for the EC2 and RDS instances run by EB for free. 750 hours is enough to run a single instance for an entire month for nothing. Note, the minimum charging unit is one hour. Details of the free usage tier are on Amazon's free usage tier page.


This lot is way above what Microsoft Azure offers customers. Not surprising given how much more mature Amazon's platform is compared to other platforms.

One thing I would advise is that you consider how your account will actually use this. If you start on the free usage tier and it then kicks up a second instance, you may end up paying for the computing time of that other instance (or using your monthly free allowance up faster). There are ways around this, such as using Reserved Instances for servers that run continuously, or capping the environment so it doesn't scale past one instance, but I won't be covering them here.

Installing the AWS for MSFT VS toolkit

In order to use the AWS platform with Visual Studio, you have to download and install the AWS Toolkit for Visual Studio. The installation provides a combination of Visual Studio add-ins and command line interface (CLI) tools, including the AWS Explorer shown below.

fig 1 - AWS Explorer in VS2010
It also installs context menus to build the deployment package. However, as part of the Jenkins CI process, we will be deploying partly through the CLI.

Security and Using AWS Access Keys

In order to deploy to AWS, there are a number of steps you have to follow to get past the security of the AWS cloud.
  1. Sign up for an AWS account (I shall assume you've done that)
  2. Using the web based account management portion of the site, obtain the secret key and Access Key ID for the platform (in this case EB uses REST/Query API) - You will need these for the automated deployment.
  3. Sign in to the AWS Management Console and authorise your server to deploy the website to AWS.
The Secret Key and Access Key are used to permit the uploading of packages to the cloud. So yes, that means you need to create a package :-) And here is where the AWS Toolkit's CLI tools actually come in.

fig 2 - Secret and Access Key IDs

What does AWS Explorer in VS have to do with this?

The AWS Explorer allows you to deploy applications straight from VS2010/12. It also contains the project templates which you need to use to deploy different types of project on the AWS platform. For this demo, I used the "AWS Web Project" selection, which is pretty much like a normal MS Web Project, but contains the AWSSDK.dll file. 

In the image below, you can see 5 sub-items. These are almost all sample applications to help you learn how to use AWS, or to use as templates for AWS development of your own versions of this class of project.

fig 3 - AWS Toolkit installs new project types
So select the AWS Web Project option and build a web project like you normally would any other .NET application. In my experience, there is nothing a normal ASP.NET/WCF app can do that an AWS Web Project can't.

Let's assume I've done that, what next?

Your easiest (maybe best) bet is to first deploy the application you have created straight from VS2010. This fulfils a number of functions:
  • Allows you to create a deployment file which AWSDeploy.exe uses to deploy your application to EB
  • Tests your CI server (or even PC) can connect to the AWS environment
  • Makes sure your access keys are correct
fig 4 - Items added to the project context menu after AWS Toolkit is installed
After installing the AWS toolkit, you'll note that the context menu elements to deploy a project are only available at the project level, not the solution level, unlike the standard VS2010 publish options. That means you can really only deploy one project at a time from here. I've highlighted the differences in the image above. Republishing is just the publishing step without having to enter the publishing settings again.

Publishing to the cloud eh?

Configuring the publishing process from inside VS2010 is actually pretty simple. It is a matter of using the following dialog box.

fig 5 - VS2010's AWS Deploy dialog box
The steps are pretty straightforward.

  • Click the 'Add [another] account' icon next to the accounts dropdown, or select an existing account to use for deployment. The popup that appears is where the Secret Key and Access Key ID from the previous stages are used. When you have completed this box, click OK.
    fig 6 - Creating a new AWS account record (your existing AWS account)
  • Select the region to deploy to. In this case, I am using the EU West region, which is based in Ireland.
  • For new deployments, select the "AWS Elastic Beanstalk" item from the list box. It is far too easy to attempt to click 'Next' without doing this step, especially when the item appears to be highlighted, so make sure it is selected. Click 'Next' and you are taken to the Application details screen.
fig 7 - Application details screen
  • This screen allows you to specify the name of the application you wish to deploy to EB. Take note of the bottom checkbox: it tells the AWS toolkit to set up a local Git repository from which to push changes to AWS.
  • Clicking 'Next' then takes you to the Environment details screen. Here you choose the name of the entire EB environment.
fig 8 - EB environment details



Important Note:  
This EB environment name is not the same as the name of any EC2 instance, any RDS instance or anything else within that EB environment. EB automatically creates the names of the internal EC2, S3, RDS etc. instances when it creates the environment. For example, the EB environment could be called AProjectSample1234 and within that, we may have an EC2 instance which it names axjkljehyhsdsds-1 and an RDS instance which it names abbdhjwiukskjkmn-1. Remember, you are deploying to a managed environment which is designed to autoscale. You will have automatically been assigned an ELB and at least one EC2 instance, which can scale if your processing needs scale. Hence, the underlying IaaS mechanism is largely transparent to you, though you can examine the instances from the usual AWS Management Console or the AWS Explorer in VS.


  • Select a URL to use as the main environment address. Note, this URL sub-domain is linked to your ELB, not your EC2 instance, which is what you configure on the next screen in the wizard.
  • The AWS Options screen is where you configure your Amazon Machine Image (AMI). A container is the image that you use to host your application - in this case, a web app. Given it is Windows, you have the choice of 64-bit Windows Server 2012 or 2008 R2 OS platforms, with IIS 8 or 7.5 respectively. The free tier is the Micro instance size (which corresponds to t1.micro in AWS). If you have configured an AMI of your own in Amazon's cloud, you can select it here too.

fig 9 - Selecting the EC2 config
  • The next screen after that is the Application Options screen. This is where you can configure the .NET Framework version the app pool uses, a health-check page (think heartbeat monitor in IIS for ASP.NET) and the security credentials to log in with (I definitely recommend you set some :-)
fig 10 - Setting up the type of container.
  • If you have an RDS instance associated with your project, the screen in fig 11 allows you to configure the security groups associated with it. Note that to communicate between the EC2 and RDS instances, they both have to exist in the same security group. This saves you having to deal with Windows Authentication, especially if the servers are going to scale and be load balanced, since the same authentication accounts would not necessarily be available. If you have no RDS servers, then the checkboxes can be left alone and you can click 'Next'. You can always change this later and add an RDS instance from the AWS Management Console > EB or the AWS Explorer in VS.
fig 11 - Security group config for RDS
  • As far as we are concerned for this blog post, all those steps were really to get to the following review screen. The review screen has a critical checkbox at the bottom labelled "Generate AWSDeploy configuration". For the Jenkins deployment, this is the key screen. You need to tick the box and enter the location where you want to save your AWS configuration file. I definitely recommend you generate it in your SVN working copy, since you will have to commit it to SVN for Jenkins to then check out into its workspace and subsequently run AWSDeploy with this configuration.
fig 12 - AWS configuration options highlighted in the review wizard panel

When you're done, hit the 'Deploy' button. The wizard will then attempt to deploy your application to AWS EB, using the supplied account credentials and .NET Framework, onto the elasticbeanstalk.com sub-domain you specified. This takes some time, but with any luck, you should find it deploys successfully. If you go to the sub-domain you selected in a few minutes (for example http://AProjectSample1234.elasticbeanstalk.com), you should find your site at that address. You can monitor the site through the AWS Management Console or through the AWS Explorer in Visual Studio.

What's in the AWS Deploy configuration file?

When the file has been created, opening it shows us the following structure (yes I have changed my keys, you don't think I am showing you that do you? :-)

# For detailed explanation of how these config files should be used and created please see the developer guide here:
# http://docs.amazonwebservices.com/AWSToolkitVS/latest/UserGuide/tkv-deployment-tool.html

# Edit the parameter line below to set the path to the deployment archive or use
# /DDeploymentPackage=value
# on the awsdeploy.exe command line for more flexibility.
# DeploymentPackage = <-- path to web deployment archive -->

# Instead of embedding the AWSAccessKey and AWSSecretKey to be used to deploy
# artifacts we recommend that you consider using the /DAWSAccessKey and
# /DAWSSecretKey command line parameter overrides.

AWSAccessKey = XXXXXXXXXX11223XXXXX
AWSSecretKey = alkHd1PO3lakjsdka/asdlas/1+djasdjl/zu/lkjaslkdas
Region = eu-west-1
SolutionStack = 64bit Windows Server 2008 R2 running IIS 7.5
Template = ElasticBeanstalk

aws:elasticbeanstalk:application:environment.AWS_ACCESS_KEY_ID = XXXXXXXXXX11223XXXXX
aws:elasticbeanstalk:application:environment.AWS_SECRET_KEY = alkHd1PO3lakjsdka/asdlas/1+djasdjl/zu/lkjaslkdas

Application.Description = Sandbox
Application.Name = AProject

Container.Enable32BitApplications = False
Container.InstanceType = t1.micro
Container.TargetRuntime = 4.0

Environment.CNAME = AProject
Environment.Description = Sandbox environment
Environment.Name = AProject

All the details you specified in the wizard are stored here. This file is passed to AWSDeploy.exe on the command line or, in our case, through Jenkins. The comments give you some very useful tips. Decisions on the keys will be especially important. If you are deploying to different environments or platforms, such as a client's private cloud (over VPC for example), you wouldn't want to use your dev keys to do that. So you would pass them in on the command line, but always take care to store them appropriately (including thinking about an encryption step in the build process).
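As a rough sketch of what passing the keys in might look like from a build step (the paths, package name and environment variable names here are made up for illustration; only the /D overrides come from the file's own comments):

    REM Hypothetical example only: deploy using the generated config file,
    REM overriding the keys and package path instead of storing them in the file.
    awsdeploy.exe ^
        /DAWSAccessKey=%AWS_ACCESS_KEY% ^
        /DAWSSecretKey=%AWS_SECRET_KEY% ^
        /DDeploymentPackage=C:\Jenkins\workspace\AProject\AProject.zip ^
        C:\Jenkins\workspace\AProject\AProject.txt

The same idea drops straight into a Jenkins build step in the next post.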

What next?

The next blog will show you how to link this to Jenkins and build and deploy the project to AWS. It will focus on configuring the Jenkins job to use MSBuild to build the package and AWSDeploy to deploy it to the Elastic Beanstalk URL we've already specified here.

Tuesday, 23 April 2013

AWS Summit 2013 - London

AWS Cloud Summit


At the beginning of the day I fell out of my bed into the cloud summit. Seeing as I was less than a stone's throw away, it was one of the easiest journeys to a conference I had managed.

We registered with the bar-codes from the original e-mail, though this proved to be somewhat of a problem, since the bar-codes didn't seem to appear in everyone's e-mail. This caused a bit of a technical glitch which delayed some poor souls, but mine worked fine so I was in fairly uneventfully.

The geographical layout was available a day prior to the summit via the Guidebook app, which I have to admit is one of the most solid platforms I have encountered for this sort of thing and beat the Azure-backed Eventboard app from last year hands down. AWS obviously put some work in to get this right.

What was interesting about the layout was the location of a labs section near the entrance, with the keynote area cordoned off with high cloth walls which hid a stage where Werner Vogels, CTO of AWS, would introduce the summit. The labs area was where we got to try out some of the infrastructure, including the just-released (as of today) S3 and RedShift storage platforms for the European market. This area also contained large yellow beanbags for the more bohemian amongst us to sit, write, blog, experiment etc. on. I didn't try them out because, knowing me, I'd not be able to get back up again because of a) sleeping or b) imitating a turtle on its back.

The infamous beanbags! My nemesis :-)

Upstairs, near the black-cloaked keynote area, sponsoring organisations and the supporting acts were plying their trade and demonstrating their wares. In the centre of that area was the AWS exhibit itself, manned by a number of architects of the platform, who were showcasing some of their logical reference architectures. I had a useful conversation with Andreas about availability, since this is an area I am interested in and have blogged about before. I pointed to various potential single points of failure and Andreas managed to explain the points of availability and how they have solved some of their high-availability problems in the reference architecture, which I could read up on and use as case studies. All very interesting stuff.

The chaps make last minute checks for our arrival

I was also lured, stomach first, to the Smart421 exhibit by the temptation of boxes of Smarties stacked pyramid-like on their desk. What was interesting about this chat was that they reminded me that, unlike the Azure platform, Amazon has a very robust mechanism for PCI-DSS compliance. After all, they do this every day for their own business. Indeed, Stephen Schmidt went on to introduce the security model under AWS and in particular the very inspiring and competent VPC (Virtual Private Cloud) offering, which uses CloudHSM as a hardware security management option where the client holds the keys, not Amazon. Indeed, should anything happen to the physical media, such as tampering or accidental damage, the keys are wiped immediately. This was something I heard neither hide nor hair of at the Azure conference, and PCI-DSS, to the best of my knowledge, hasn't been addressed to the same level by Microsoft and certainly isn't as mature.

A little behind schedule, Dr Werner Vogels opened the summit with a comprehensive (read: long) keynote on AWS now and in the future, including some of the new services that have arrived recently (53 in the first quarter of 2013 alone). He introduced a number of customers who have used the AWS platform, including some very large names in the UK, such as Shell, as well as smaller start-ups such as Shutl.

Big Werner Vogels on a very big stage

Werner outlined the strategy for the company and gave us little insights into the internal workings of Amazon - in particular, how Amazon believes in and practises lean principles as an organisation. The insight into the cloud-based operations of other organisations seemed, in part, to back up some of the stereotypes associated with the governance of data, such as not putting the most sensitive data online. This wasn't a concern at all points, and was certainly not one imposed by any technical limitation, but it did show that some organisations do still have some concerns surrounding the non-colocation of their data.

Steve Schmidt then took over and introduced a number of key information security objectives. He had a lot to say and there was a lot of detail about the AWS security processes and policies. Very impressive as it happens, but given they have DoD clients, this is hardly surprising. One of the things both Steve and Werner introduced implicitly, in their own and in the case-study examples, was the need for good governance. Indeed, Shell were very explicit about their data governance, and as such, cloud should never be considered something that allows your governance processes to become lax. Indeed, the opposite is true in cloud environments. An important lesson there, especially in VPC environments where the clients manage their own keys.

After lunch, which was very popular (so I started to write this blog post until the queue died down), the breakout sessions began. I stuck mostly to the bootstrap sessions, but wanted to head into the Architecting for High Availability thread. My phone's battery was being a bit 'negative' and I worried that I would run out of juice from all the pictures I was taking, and sure enough, by the time it got to the HA presentation by Ianni Vanvadelis, I had no more juice to take any pics of relevant slides. As it happens, for the customer case study, presented by Dan Richardson, Director of Engineering at Just-Eat, the slides were not there anyway. It was unfortunate, but he did a brilliant job of winging it and still managed to get his message across.

Prior to that, however, I attended the bootstrap tracks "Your first week with Amazon EC2" and "Agility and Cost Savings: Achieving the IT 'X-factor'", and the keynote track on AWS OpsWorks, a deep dive into the OpsWorks environment presented by Thomas Metschke, which was a very technical example of how to combine Chef, Ruby, Git and Jenkins with OpsWorks to deploy to different CM configs. OpsWorks is an excellent platform for this, I have to say.

Thomas takes us through an OpsWorks deep dive. I really like this tbh!

I went to the X-factor track and my phone died completely. It was tough going from that point onwards, but obviously it means I should have known how long my battery would last. Then again, I'm not big on photography anyway (with a face like mine, it isn't something that I do much of ;-). Comparing the costings of cloud versus on-premise platforms is something I tend to do a lot of anyway, so there wasn't really much new there. However, there is a TCO tool that has been created by AWS for just this purpose. Also, the different costing models were introduced and made very clear. What was interesting was the 'spot pricing' model, which I finally figured out a use for, especially in non-mission-critical/off-line work. Reserved Instances are also something I am going to be looking into more. However, it is good to know that I value things the same way AWS do. Having grown up in a house full of economists and financiers hasn't gone to waste ;-)

AWS Pricing Models presented to us by Dan Roger

Dan shows us the break even with the On-Demand services

What was interesting about what Dan Roger told us was what he didn't tell us. In that session he drew a comparison of On-Demand against Reserved Instances at various levels. You can see the slide above, but what he also showed is the break-even point of Heavy, Medium and Light Utilisation Reserved Instances against each other. In this example, at 8 months it makes no difference which option you choose. This is on top of the very obvious comparison with one or two months of On-Demand services.

I spent a lot of time playing piggy in the middle between the AWS guys and the vendor space, getting a lot of valuable information about their platforms. In particular, the AWS reference architectures will provide a lot of very useful information - not necessarily in how I would do it, but in providing patterns of best practice from which to work.

As an architect, I have certainly not had anywhere near as valuable a day as I've had today. So I am quite a happy bunny. Plus, having played around with AWS EC2 already, I am much happier than when I was playing around with Azure. There are a couple of blog posts that I can see spinning off this, especially in the area of highly available systems.

All-in-all, a very productive day :-)

Saturday, 9 February 2013

Agile software development under TOGAF

Phases G & H: the Architecture Governance Iteration

In a blog post last year, I covered how Agile and TOGAF are perfectly compatible and outlined the similarities that they have.

The title of the post won't resonate much with agilists out there, as the term 'governance' is something most think of as 'management bureaucracy'. However, the truth of the matter is that if you look at Kent Beck's seminal work on XP and consider the principles and practices section, you'll see that these are, in effect, principles and governance guidelines. What I mean by this is probably best illustrated via an example.

Imagine you are an agile QA. In order to judge whether or not a piece of software has been developed solidly, agile QAs often check code coverage and complexity measures and monitor that coverage against metrics defined in their testing strategy. This can be automated, so they instantly see a governance measure that they can check against what are effectively their organisational unit's (the software development department's, say) principles of TDD/BDD etc.

For software developers and architects out there, remember that TOGAF is a much broader and more abstract framework to develop enterprises under (I steer clear of applying it just to software, since it isn't technically a software development methodology; it is an EA framework, or a systems development methodology if you look at it through those particular eyes). It covers business architecture, data architecture and technical architecture as well as application architecture, and at the highest level it does this without recourse to any technological concepts at all in those stages (so there are no explicit, detailed references to Visual Studio 2012, SQL Server 2012, MySQL, PHP etc.). The aim is to consider the organisation as a whole as a system.

Statics and Dynamics

Any system, in whatever form, only requires two high-level concepts to fully define its operation. These are:

  • Statics - which in the case of business are things like the structures, entities, artefacts, roles, people, systems etc. etc. 
  • Dynamics - How these static elements interact and indeed how the organisation as a whole behaves.
In the world of software, we have seen this many times. For example, the GoF book was split into patterns of structure and those of behaviour, and we also have languages to express those interactions, such as UML class and collaboration diagrams. These days, systems thinking is finally making inroads into the software world, despite those of us already familiar with the concepts having used it for nearly 15 years (some of you out there will have used it longer than I have).

Just as with software, a business has different levels of abstraction which define the organisation. You can define the system dynamics at varying levels of detail and, just as you have different levels of UML diagram (use cases to compartmentalise the operation of a software unit, with activity diagrams defining the functions within it; those functions can in turn be use cases on another use-case diagram, in the case of, say, a component or package), the same is true of business. BPMN's ability to nest sub-tasks is another example of this. However, I don't want to get hung up on the language for expressing a business, as this is different from the business itself.

fig 1 - TOGAF ADM with different sub iterations (from Mike Walker's blog on MSDN, copyright The Open Group)
And just like a business, TOGAF allows different levels of EA for each level of the architecture partition or indeed organisational unit.

fig 2 - TOGAF ADM applied at different levels of partitioned architecture (copyright The Open Group)

How does this fit?

Well, a business is a system. Software is a system. Software development is a system. Everything in this context is just a system. Apologies to people who really hold on to the psychological element of agile methods, but you are being kept a happy cog in a big wheel :-) This is not to say it shouldn't happen, but you are there to develop software at the end of the day. Indeed, remember that you and your team use elements like Kanban to 'manufacture' software just like they do in a factory.

AGILIST: "OK, stop flaming us! Really, how does it fit?"

Joking aside, during a software development project you will often use BDD/TDD to develop acceptance criteria, and the amount of coverage of these scenarios is your governance metric. Additionally, some agile teams are aware of risks and issues that they have to monitor. They also use metrics to define 'productivity', such as throughput and cycle time, and they run retrospectives to continuously improve. Software dev peeps should pay particular attention to phase G in TOGAF, which gives you all of that when defining a system representing your organisation. Software, business, data and tech architects may be involved in earlier phases, maybe even A to E, depending on the model of TOGAF used at each level of architecture partition. But for devs, phase G of the ADM gives you:
  • Project name, description and objectives [Project name/Description/Epics]
  • Scope, deliverables and constraints
  • Efficacy metrics [Throughput/Cycle time/Business Value]
  • Acceptance criteria [BDD]
  • Risks and issues [Risk Log]
I have linked the Agile concepts in square brackets. As you can see, we have everything required for the initiation of an agile development project right there. 

Remember, when going through an iterative enterprise architecture process such as TOGAF, there is no BDUF exercise. Each transition goes through an iteration of the ADM, which creates an incremental change to the organisational system as a whole. Within each iteration, at phase B, businesses are architected or rearchitected and IT systems are developed to facilitate that business operation. Same as always. 

Some developers see themselves as coming into play quite late in the process (phases G and H - last in the cycle). It seems to them that there is a BDUF exercise going on, but really the business is defining itself at the beginning of the cycle. After all, you can't have acceptance criteria defined by the business if there is no business :-)

By contrast, when you get a BDD spec for an existing, well-understood business, the business system (with its statics and dynamics) is already defined for the business owners. They have lived it and worked it for years. The business owner knows what they want and works with the 'three-headed monster' (BA, QA, dev) to spec that out in Gherkin, for example. In this case, there is no need to re-engineer the business before submitting acceptance criteria to the devs. In TOGAF, it is exactly like going through the Architecture Definition Iteration (cycles of ADM phases B to F) with the existing formal baselines already in place but with no EA change requirement. So you can get through those in very little time and go straight to phase G, which is where these BDD criteria form the acceptance criteria for the IT systems (I appreciate using TOGAF here is overkill, but the point is to illustrate the pattern).

Remember this...

Whilst TOGAF is an EA framework and, say, Scrum is an agile software development practice, they both work to define systems. Both are iterative and incremental in nature. They just work at different levels and, to some degree, can have very different cycle times. After all, TOGAF also concerns itself with the definition of an organisation's services, the people required to carry out enterprise functions within those services, and the information (not 'data') they require to do their job, and it covers services which may or may not use IT at all (think shop assistants, call centre operatives, cleaners, warehouse staff etc.), whose roles have to be at least understood, if not engineered, into the business system as a whole. This is something IT personnel take for granted when defining software. Even RUP assumes actors already exist, but I would ask: how do they think those actors came to exist? TOGAF is a way of making that happen.

Friday, 11 January 2013

SOLID Understanding

I am actually quite surprised that SOLID principles get explicitly called out as part of interview processes for both development and architecture roles, as if they were a benchmark for capability. What interests me is why this is anything more than the absolute basics of software development. Indeed, I don't even have a prominent place for it in my CV. However, having often been on both sides of the interview table, I can definitely attest to the difficulty the typical industry candidate at any level has with robust OO principles.

In my anecdotal experience, 85% of developers do not apply SOLID principles correctly, often preferring to look at shiny toys and tech from other organisations before they get their own house in order. This can often lead to badly used frameworks, poor NFRs etc. However, in my book, this should be so basic it shouldn't even be mentioned on a CV. To an OO programmer, this should be akin to typing: the industry assumes that because you work with computers, you can type. If you sit in front of me at interview and claim to be an OOP/AD expert, I am going to assume you know the SOLID principles. Indeed, when I test you on your design patterns, your evaluation of which patterns to use when, and which anti-patterns you know of or can spot, it will be very evident if you don't know the SOLID principles. Given 85% of software developers can't do the top-end roles, I will be looking for any excuse not to employ you, but equally I will be pushing to see how far you can go if you do know them. Hey, it's the way I do things.

So What's SOLID About SOLID Principles?...

Absolutely nothing in my book! SOLID principles have been around for a lot longer than Uncle Bob's coined acronym. I took my leap into the OO style of development in the days of OWL (the Object Windows Library) and MFC. In those days we were more concerned with doing the role than shouting about doing the role, and we were generally required to be more capable in areas not directly associated with programming, such as memory management and optimisation, which modern developers either don't need to worry about - having abstracted them away with the .NET or Java VMs and dynamic languages - or have a different focus on. So to 'know OO' was passé. We would say "So what?"

SOLID stands for a set of five principles of good OO design. They generally follow good systems thinking practices anyway. They are:

Single Responsibility

Any class should not have more than one responsibility within one context. For example, a shopping basket class (ShoppingBasket) stores items to purchase from a store. The shopping basket itself doesn't calculate the subtotal of the items within it.

You can spot SRP violations by looking for code smells. For example, the name ShoppingBasketInvoice makes you ask the question: what is it? Is it shopping? A basket? Or an invoice? If you see this happen, refactor it!

This principle also applies to the methods within the class. Methods such as:


        // Code smell: summing and displaying are two responsibilities in one method.
        public string SumAndDisplaySubtotal()
        {
            //...
        }


The above should first be refactored into Sum() and Display(decimal value), but this then brings the containing class into focus and makes you ask "Why is this class summing and displaying the information?"

If working code-first, you can note that they don't belong in the same place, and so you would extract the Sum() method into a TillAccumulator class, say, and Display() into a TillScreen class. Then you can ask the question "Why do I have TillAccumulator and TillScreen classes?" and so extract out Accumulator and Screen classes as objects which the Till is then composed of.

If working design- and contract-first/DbC, think about how the world of your shopping works in real life: the Till would sum the shopping basket items using an accumulating total in an Accumulator and display this on a Screen. So you can get there that way too.

All these give the benefit that if you need to change something, you can do it in one place, in one context.
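As a rough sketch of where that refactoring might land (purely illustrative - the exact class shapes are my assumption, not a prescription):

    using System;
    using System.Collections.Generic;

    public class ShoppingBasket
    {
        // The basket only stores items; it does no summing or displaying.
        private readonly List<decimal> _itemPrices = new List<decimal>();

        public void Add(decimal price) { _itemPrices.Add(price); }

        public IEnumerable<decimal> ItemPrices { get { return _itemPrices; } }
    }

    public class TillAccumulator
    {
        // Single responsibility: accumulate a running total.
        private decimal _total;

        public void Add(decimal price) { _total += price; }

        public decimal Sum() { return _total; }
    }

    public class TillScreen
    {
        // Single responsibility: display a value.
        public void Display(decimal value)
        {
            Console.WriteLine("Subtotal: {0:C}", value);
        }
    }

    public class Till
    {
        // The till is composed of the parts; it co-ordinates rather than does the work.
        private readonly TillAccumulator _accumulator = new TillAccumulator();
        private readonly TillScreen _screen = new TillScreen();

        public void ShowSubtotal(ShoppingBasket basket)
        {
            foreach (var price in basket.ItemPrices)
            {
                _accumulator.Add(price);
            }
            _screen.Display(_accumulator.Sum());
        }
    }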

Open-Closed 

Classes should be open for extension but closed for modification. This is why we have encapsulation in OO languages. There should be no direct access to the fields within a class, and good use of polymorphism and virtual/abstract classes should get what you want out of it (leaving out the discussion of inheritance versus composition - if devs can't get SOLID, that discussion won't make sense). A non-exhaustive list of ways to make this happen includes:

  • Use abstract classes to define the public and protected methods you wish to allow extensions to.
  • Make everything else private.
  • Definitely don't make all your methods public if you are only ever going to use them internally or in derived classes.

The litmus test is how many classes have to change if you make one change to your program. This is also the same litmus test you can use for discovering coupling: the higher the coupling, the greater the issue.
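A minimal sketch of the idea (illustrative names only; the discount domain is invented for this example):

    public abstract class DiscountPolicy
    {
        // Closed for modification: callers only ever go through this method.
        public decimal Apply(decimal subtotal)
        {
            return subtotal - DiscountFor(subtotal);
        }

        // Open for extension: new policies override this without touching
        // the base class or any existing caller.
        protected abstract decimal DiscountFor(decimal subtotal);
    }

    public class NoDiscount : DiscountPolicy
    {
        protected override decimal DiscountFor(decimal subtotal) { return 0m; }
    }

    public class LoyaltyDiscount : DiscountPolicy
    {
        protected override decimal DiscountFor(decimal subtotal) { return subtotal * 0.05m; }
    }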

Liskov Substitution

In any design, a class should be substitutable for its base class or any class derived from that base class (including derivations of its children, ad infinitum). This is a simple one, as it just involves using abstract or base classes (or indeed interfaces) to allow any derivation/implementation of a class/interface to be put in place of the original abstract entity.

This makes sense mainly because otherwise you violate Open-Closed (as you have to modify classes inside other classes if you have not injected them) and also create a lot of work, as you end up dealing with very specific cases all the time, exploding the amount of coupling involved. Learning to use encapsulation, inheritance, composition, abstraction and polymorphic techniques will save you a lot of code and time in the long run.
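Continuing the DiscountPolicy sketch from the previous section (again purely illustrative), substitution looks like this:

    public class Checkout
    {
        // Written against the abstraction, so NoDiscount, LoyaltyDiscount or any
        // future derivation can be substituted without this code changing or misbehaving.
        public decimal TotalFor(decimal subtotal, DiscountPolicy policy)
        {
            return policy.Apply(subtotal);
        }
    }

    // Usage: new Checkout().TotalFor(100m, new LoyaltyDiscount()); // 95.00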

Interface Segregation

This is an interesting one for me. Interfaces are a bit weird. Those without any understanding of interfaces never use them, so this principle never gets applied to interfaces (though it can equally well be applied to pure abstract classes in languages that support multiple inheritance). Those who understand interfaces, and why Fowler's 'header interfaces' are a bad thing, are most likely already using this principle. The ones to watch most are those who do know what interfaces are and use them, but do not understand header interfaces! They will use interfaces as if they were just abstract classes, which is a huge mistake - definitely a case of a little knowledge being dangerous.

Unfortunately, the way developers use dependency injection and mocking frameworks tends to encourage (if not force) them to create these header interfaces, and the results can be pointless at best or catastrophic at worst.

Interface segregation aims to classify a class's functionality into role-specific 'contract structures' (I refuse to use the word 'contract', as interfaces only solve half the problem, i.e. the statics, and give no indication of the dynamics of the system).

Consider it this way. You are a person - a class, if you will. You perform many roles: you are a child, a worker, a parent maybe, a friend to someone (hopefully), a spouse etc., and you interact with the different people in your life in different ways in those different roles. Each of your faces is the way you communicate inter-personally (see what I did there with the mnemonic? Ha-haaa! I still got it! :o) ).

So look at your objects in context and determine whether you really want to be using that header interface just to mock something and thereby "design by tool" - constraining the design by what the framework can support makes you a tool!... (two from two)
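A small, made-up sketch of role-specific interfaces versus one header interface:

    // Role-specific interfaces: each collaborator sees only the face it needs.
    public interface IPriceScanner
    {
        void Scan(string barcode);
    }

    public interface IReceiptPrinter
    {
        void PrintReceipt();
    }

    // One class can happily play several roles...
    public class PointOfSale : IPriceScanner, IReceiptPrinter
    {
        public void Scan(string barcode) { /* look up the price */ }
        public void PrintReceipt() { /* print */ }
    }

    // ...but the scanning station depends only on the scanning role, not on a
    // header interface that mirrors everything PointOfSale can do.
    public class ScanningStation
    {
        private readonly IPriceScanner _scanner;

        public ScanningStation(IPriceScanner scanner) { _scanner = scanner; }

        public void Handle(string barcode) { _scanner.Scan(barcode); }
    }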

Dependency Inversion

This ties in nicely with Liskov Substitution, since you should never depend on the concrete classes you create. Always make the link to the abstract class, or better still the interface, rather than the concrete version of the code. This will massively reduce coupling and also allow Liskov Substitution to take place much more easily.
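A minimal sketch of the inversion (the repository names are invented for illustration):

    public class Order { /* ... */ }

    // The high-level policy depends on this abstraction...
    public interface IOrderRepository
    {
        void Save(Order order);
    }

    // ...and the concrete detail depends on the abstraction too.
    public class SqlOrderRepository : IOrderRepository
    {
        public void Save(Order order) { /* talk to the database */ }
    }

    public class OrderService
    {
        private readonly IOrderRepository _repository;

        // The concrete repository is supplied from outside (constructor injection),
        // so any IOrderRepository - SQL, in-memory, a test double - will do.
        public OrderService(IOrderRepository repository)
        {
            _repository = repository;
        }

        public void Place(Order order)
        {
            _repository.Save(order);
        }
    }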

Closing Remarks

Back in the day, this was bread and butter stuff. So know it!! It will help you understand design patterns, develop your own usage of patterns, find anti-patterns, apply code smells, understand why coupling is important etc. etc. etc. It's the basics! 

What surprises me is that even with the introduction of fully OO languages into the mainstream, this has not actually improved OO code that much. Indeed, in the case of the mid-level developer, the introduction of OO languages that force the use of OO seems to have made the standard worse. I would hope that people actually start to learn these again as part of self-tutorship, degree programmes, programming courses etc., but it isn't fun and sexy (though more now than ever), so it doesn't get the recognition and understanding that it deserves.

Friday, 9 November 2012

Windows 8 Pro Release...

...has a great uninstall application, as despite the Windows 8 Upgrade Assistant saying everything is hunky-dory, I spent the last 5 hours downloading and installing Windows 8 just to get the 2 screens shown below.

These appeared after the 'usual' Win8 ':( something has gone wrong' screen that I also got in the CTP on a VirtualBox VM, whatever I tried:

fig 1 - First error screen in the sequence


fig 2 - Second error screen in the sequence

It would be interesting to hear from people who managed to install Win8 Pro on hardware that's a year or two old, as I don't know anyone who hasn't had any problems at all.

Once installed, most people report a good experience (or at least some good experiences), but it is now 1am, I have wasted an unbelievable amount of time (and the cost of the software) that I won't get back, and I am not in the mood to try to fix this now.

If Microsoft wants to compete in the tablet and phone markets with the likes of Apple, things have got to just work! With the diversity of hardware platforms they will typically have to support in that arena, this isn't easy at the best of times, but this certainly isn't the way to do it on a desktop platform they have dominated for a few decades.

I will have to maybe try it on a different box tomorrow. It depends if I can transfer that license across. Otherwise, dummy out of the pram, I'm sulking!

The Working Update

UPDATE: I finally managed to get it installed and working the day after. However, I had to reinstall all my applications (the majority of which were not on the Upgrade Assistant's list) and have still got some to do.

I had to choose to save only my files, not my apps. The BSOD error that was happening previously was coded as 0xC000021A. A quick Google suggested too many possible causes at that point, one being a problem with winlogon.exe (which seems to have happened a lot in the history of Windows, including XP dying by itself), so I just thought blue word thoughts and installed the thing, saving only my files.

Once installed, I am actually quite happy with it. It is very fast compared to Win7 on this box! Though I don't know if this is because I am not running some services which I used to. Apart from that, it is very responsive on my SSD-based 3.6GHz quad-core AMD Phenom II X4 975X Black Edition.

The lack of a Start menu was confusing, especially when I instinctively hit the Windows key on the keyboard. The Metro interface does seem very simplified, and closing windows in Metro would be extremely long-winded if I didn't know Alt+F4 existed. It requires a mouse to pick up and drag a window to the bottom of the screen (think of sending it to the grave), or you can move the mouse to the top left-hand corner of the screen to bring up the running apps bar, right-click and select 'Close' (akin to right-clicking an icon in the taskbar on Win7 and selecting 'Close window').

The same is true of shutting Windows down. If you are on the desktop, Alt+F4 brings up the usual Windows shut-down dialog box. Otherwise it is Win+C, or move the mouse to the top right, then select the 'Settings' cogwheel, then the power button, then 'Shut down' from the resulting context menu. Then breathe!

I will continue to play and see where it takes me. There are a couple of annoying elements about Metro so far, but I hope this old dog will learn new tricks with time.

Sunday, 4 November 2012

Chaining risks, the Markov way (Part 1)

This one is a bit of a musing, as it isn't currently an established norm in the world of software development.

I started this blog post with the aim of going end-to-end on translating risk information stored in a log or an FMEA into a Markov chain, which would be modelled through an adjacency table that could then be reasoned with - for example, finding the shortest path through the risks and impacts to find the least risky way through the development or operational stages. However, this proved to be a much longer task when running through it step by step.

So I have decided to split this down into a couple of blog posts. The first will deal with modelling and visualising the risk log as a directed graph. The second will then build an adjacency table to reason with computationally and so deal with the optimising of those risks.

Risk? What risk?

During the development of a piece of software, there are a number of risks which can cause the project to fail. These can be broadly categorised into development and operational risks.

Remembering that the lifetime of a piece of software actually includes the time after its deployment into the production environment, we shouldn't neglect the risks posed in running the code. On average this is generally 85% of the total lifetime of a project, yet we often pay the risks in that period only lip service.

In either case, we have to be aware of these risks and how they will impact the software at all stages. Most 'new age' companies essentially are their software products, and therefore the risks associated with those products inherently put the company as a whole at significant risk.

I was thinking the other day about the use of FMEAs and their role in the communication process. I tailed off my use of FMEA-like processes years ago, but picked them up again in 2011 after a contract in Nottingham. The process is pretty easy and harks back to the yesteryear of House of Quality (HoQ) analyses, which I used a lot and still use to some degree in the form of weighted-factor models or multivariate analyses. People familiar with statistical segmentation or quants will know this from their work with balanced scorecards.

What struck me about the FMEA, even in its renaissance in my world, is that its presentation, just as with any form of risk log, is inherently tabular in nature. Whilst it is easy to read, this doesn't adequately highlight the knock-on effects those risks will have.

FMEAs and Risk Logs

An FMEA (Failure Mode and Effects Analysis) is a technique which expands a standard risk log to include quantitative numbers and allows you to prioritise mitigating the risks not just on the probability of occurrence and its impact, but also on the effect of its mitigation (i.e. how acceptable the residual risk is).

Now, risks often don't stand alone. One risk, once it becomes an issue, can kick off a whole set of other causes (in themselves carrying risks) and these will have effects, and so on.

Consider, for example, a situation where a key technical person (bus factor of 1) responsible for the technical storage solution leaves a company, and an enterprise system's disk storage array then fails or loses connectivity. This will cause errors across the entire enterprise application catalogue wherever data storage is a critical part of the system, which then costs customer service agents the ability to handle customer data and, consequently, costs the company money, both in terms of lost earnings and in reputation and the further opportunity costs caused by such damage to the brand.

A risk log, or even an FMEA, will list these as different rows in a table. This is inadequate for visualising the risks. Indeed, many side effects of this form of categorisation exist: if the log is updated with the above risks at different times, these items may not sit near each other when sorted by effect or by entry time, so the connection between them is not immediately obvious.

What are you thinking, Markov?

I started thinking about better ways to visualise these related risks in a sea of other project risks. One way that came to mind was to use a probability lattice/tree to expand the risks, but then it dawned on me that risks can split earlier in a chain and converge again later on.

OK, easy enough to cover off. I will use a directed graph. No problem. But then this felt a bit like deja vu.

The deja vu was because this is effectively what a Markov chain is.

A Markov chain is effectively a directed graph (specifically a state chart) where the edges define the probability of the system's state moving from risk to risk.

This was a particularly important result. The reason is that any directed graph can be represented as an adjacency matrix and, as such, it can be reasoned about computationally. For example, a travelling-salesman-style algorithm can then be used to find the shortest path through this adjacency table and thus through these risks.

I have deliberately used the words 'cause' and 'effect' to better illustrate how the risk log could be linked to the Markov chain. Let's consider the risk log elements defined in the following table for the purpose of illustration:

Risk log (Cause → Effect, with unmitigated and residual risk given as Risk,Impact):
  • Risk 1 - Cause: DB disk failure. Effect: Data cannot be retrieved or persisted. Unmitigated risk: L,H. Mitigation: Introduce SAN cluster. Residual risk: L,L
  • Risk 2 - Cause: DB disk full without notification. Effect: Data cannot be persisted. Unmitigated risk: M,L. Mitigation: Set up instrumentation alerts. Residual risk: L,L
  • Risk 3 - Cause: Cannot retrieve customer data. Effect: Customer purchases cannot be completed automatically. Unmitigated risk: M,H. Mitigation: Set up hot standby systems to fail over onto. Residual risk: L,L
  • Risk 4 - Cause: Cannot process payments through PCI-DSS payment processor. Effect: Customer purchases cannot be completed automatically. Unmitigated risk: M,H. Mitigation: Have a secondary connection to the payment gateway to fail over onto. Residual risk: L,L
  • Risk 5 - Cause: Customer purchases cannot be completed automatically. Effect: Net revenue is down at a rate of £1 million a day. Unmitigated risk: M,H. Mitigation: Have a manual BAU process. Residual risk: L,M


I have not included monitoring tasks in this, and note this is an example of an operational risk profile. However, if you look carefully, you'll notice that the risks feed into one another. In particular, there are many ways to end up at the 'Customer purchases cannot be completed automatically' or 'Data cannot be persisted' effects. Yet it is not immediately obvious from the table that these risks are related.

We can model risks as the bi-variable (r, s), where r is the probability of the issue occurring and s is the impact if the risk occurs (i.e. the sensitivity to that risk).

The values of these bi-variables are the L, M, H ratings of risk and impact in a risk log or FMEA (in the latter case, it is possible to use the RPN - Risk Priority Number - to define the weighting of the edge, which simplifies the process somewhat).

Taking the risk component alone, this will eventually be used for the elements of an adjacency table. But first, an introduction to the Markov chain. Obviously, if you are familiar with Markov chains, you can skip to the next section.

Markov Chains. Linking Risks.

Markov chains are a graphical representation of the probability of events occurring, with each node/vertex representing a state and the edges the probability of moving between states. For each node in the chain, the sum of the probabilities on all edges leaving that node must equal one. Consider it the same as a state transition diagram, where the edges are the probabilities of events occurring.

Because every node's outgoing probabilities have to total 1, you have to show the transitions which do not result in a change of state, where applicable. If a probability is not shown in the risk log, then it is not a failure transition (thus there is no issue), so you include it as 1 minus the sum of all outgoing failure transition probabilities - effectively letting any 'success' at a node loop back on itself.

If we set low risk to be 0.25, medium 0.5 and high 0.75, with critical risks at anything from 0.76 to 1.00, then the following diagram shows the above risk log modelled as a Markov chain:
Fig 1 - Markov Chain of above risk log

To explain what is going on here, you need to understand what a Markov chain is, and a little bit of time reading the wiki link would be useful. Basically, by combining all the state effects together, we have built a chain which shows the way these effects interplay. With each effect, there is a further chance of something happening which then leads to the next potential effect. From the above network, it is immediately clear that some risks interplay. Often, the risks which have the most edges coming into them need to be mitigated first, as any of those incoming edges could cause that state to be entered.

The results can be analysed straight from this. Given that each risk is an event independent of any other, the probabilities can simply be multiplied along the chain to the target. We can ask questions such as:

Q: What is the chance we lose 1 million GBP or more?
A: This particular chain only contains nodes which have at most two types of event emanating from them, so we can deduce that the loss can be reached from any of the working states through the chain. There are two ways to work this out: the long-winded way, which is to follow all the chains through and sum the paths; or the short-winded way, which is to take the probability that everything keeps working and subtract it from 1, giving us the chance of losing £1 million a day. Because I am lazy, I prefer the latter.
The latter way also copes with more than two exits from each node, which is particularly important when there may be 3 or more risks that could fire at each node in the chain.


Q: What is the effect of a failure on the DB disk?
A: By following the chain through and expanding a probability tree (Wikipedia really needs someone to expand that entry, it's rubbish!), and assuming the disk has failed, we get:

chance of missing customer data = 100%
chance of lost purchases = 50%
chance of loss of £1 million or more = 25%

The reason for the latter is that, given the disk has failed, the customer data is missing with certainty (1.0), the chance that the missing data loses purchases is 0.5, and the chance that lost purchases add up to £1 million or more is 0.5 - so the chained probability is 1.0 × 0.5 × 0.5 = 0.25.
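As a taster for the adjacency-table treatment promised for part 2, here is a minimal sketch of that same calculation done computationally (the node names and edge values are just the ones from the worked example above; the 'success' self-loops are left out as they don't affect the path product):

    using System;

    class RiskChainSketch
    {
        static void Main()
        {
            string[] nodes =
            {
                "DB disk failure",
                "Customer data missing",
                "Purchases cannot complete",
                "Lose >= GBP 1m per day"
            };

            // matrix[i, j] = probability of moving from risk state i to risk state j
            double[,] matrix =
            {
                { 0.0, 1.0, 0.0, 0.0 },
                { 0.0, 0.0, 0.5, 0.0 },
                { 0.0, 0.0, 0.0, 0.5 },
                { 0.0, 0.0, 0.0, 0.0 }
            };

            // Risks are treated as independent events, so a path's probability
            // is just the product of the edge probabilities along it.
            int[] path = { 0, 1, 2, 3 };
            double probability = 1.0;
            for (int step = 0; step < path.Length - 1; step++)
            {
                probability *= matrix[path[step], path[step + 1]];
            }

            // Prints 0.25, matching the 25% chance worked out above.
            Console.WriteLine("P({0} -> {1}) = {2}", nodes[0], nodes[3], probability);
        }
    }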

Summary

Although I have not used these in earnest, I am keen to look at the use of Markov chains, and in the next blog entry I will be exploring transforming them into adjacency tables for computational purposes using linear algebra.

Markov chains are widely used in analytics/operations research circles, so it would be useful to see how they apply here. But already from this you can immediately see how the effects interplay and what sort of reasoning can be accomplished using them. This shouldn't be too new to those who have studied PERT, Six Sigma and network analysis techniques on project management/process optimisation courses, as they are effectively practical applications of this very same technique. Indeed, a blog I did a while back on availability is a practical example of this at the system level.

To be continued :-)

Thursday, 13 September 2012

Which came first, contract or code?

Another recurring theme that I keep seeing time and time again on my travels is the debate between communities about which should come first: contract or code. I see a place for both in service development and decided to explore the reasons why.

Code-First Development

Code-first development means delivering a working system, then refactoring and splitting the code into a separate subsystem along a 'natural' boundary. At this point, the team itself can be split into two, one for either side of the service contract.
Consider a fabricated shopping basket example. A simple shopping basket is developed as one monolithic, end-to-end function which sums the prices of the items in it, calculates the tax on the order and renders itself on screen.
A step in refactoring may introduce an MVC pattern and split out the tax calculation using a strategy pattern (a rough sketch of this seam follows below).
Once split, the taxation interface may be considered a separate domain that is natural to split on. The team itself then splits too, with some members going on to work on the taxation service whilst others remain on the web component.

fig 1 - Code first development
This is the method normally advocated by XP and SCRUM proponents.
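For illustration, the tax seam from the refactoring step above might look something like this (a sketch only; the interface and strategy names are invented):

    public interface ITaxStrategy
    {
        decimal TaxOn(decimal subtotal);
    }

    public class UkVatStrategy : ITaxStrategy
    {
        public decimal TaxOn(decimal subtotal) { return subtotal * 0.20m; }
    }

    public class OrderCalculator
    {
        private readonly ITaxStrategy _taxStrategy;

        public OrderCalculator(ITaxStrategy taxStrategy)
        {
            _taxStrategy = taxStrategy;
        }

        public decimal TotalFor(decimal subtotal)
        {
            return subtotal + _taxStrategy.TaxOn(subtotal);
        }
    }

Once that seam exists, ITaxStrategy is the natural boundary: the implementation behind it can become a separate taxation service owned by a separate team.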

pros:

  • Knowledge of the contract doesn't have to be agreed up front - the contract is defined by refactoring towards it and letting the design emerge over time.
  • Team size can be small, then increase until a natural fracture point in the architecture necessitates the split of the code and the team. This maintains Conway's law.
  • Delivery is assessed by acceptance criteria associated with end-to-end business processes.
  • Very useful for delivering software where the business process is not fully known.
  • Delivers more optimal results inside departmental systems.
cons:
  • The risk of breaking changes is higher where split teams do not communicate effectively.
  • Where the end-to-end business value is outside the scope of the development team, or they do not have full control/visibility of the end result (such as when interacting with COTS systems) and the success criteria don't account for the integration work, this can be difficult to get right and the service contracts may not match actual expectations.
  • The services do not evolve to represent the business value until later in the process - message passing between departments does not necessarily evolve from the microscopic view of the role of the technical contracts.
  • Cross-team communication is essential, so the split teams will have to sit near each other to communicate effectively. As the service catalogue grows, this becomes a much more difficult task.
  • Can miss the wider optimisations as the bigger picture is never addressed during 'emergent design'. The resulting optimisations are effectively sub-optimal at the organisation level.

Contract-First Development

By contrast, contract-first development starts from the identification of messages flowing between business functions. It utilises techniques such as design-by-contract to define message and service contracts between the departments and their systems, which then become the acceptance criteria for the code developed independently by the teams on either side. Automated validation is performed on both sides against the contract using service stubs and a standard interface definition (such as an XSD).
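As a rough illustration of what "both sides code against the contract" can look like in .NET (WCF chosen purely as an example; the service and member names are invented):

    using System.Runtime.Serialization;
    using System.ServiceModel;

    // The agreed contract: both teams work against this (or the equivalent
    // WSDL/XSD) before either implementation exists.
    [ServiceContract]
    public interface ITaxationService
    {
        [OperationContract]
        TaxResponse CalculateTax(TaxRequest request);
    }

    [DataContract]
    public class TaxRequest
    {
        [DataMember]
        public decimal Subtotal { get; set; }

        [DataMember]
        public string CountryCode { get; set; }
    }

    [DataContract]
    public class TaxResponse
    {
        [DataMember]
        public decimal TaxAmount { get; set; }
    }

    // A throwaway stub lets the consuming team build and automatically test
    // against the contract while the providing team builds the real thing.
    public class StubTaxationService : ITaxationService
    {
        public TaxResponse CalculateTax(TaxRequest request)
        {
            return new TaxResponse { TaxAmount = request.Subtotal * 0.20m };
        }
    }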

An example process might be:

fig 2 - Contract first definition
pros:
  • Architecture and management, who know the bigger picture of the organisation (if one exists), can help define the detail of the business process and hence the contract obligations.
  • Both teams can work independently at an earlier stage and get to a standard contract definition much quicker.
  • Very useful in static, well defined companies with well defined departments and divisions.
  • Much easier to apply Conway's law.
  • Better placed to provide global, enterprise-level optimisations, since similar messages and interactions can be identified much more easily when people are looking at the whole.
  • Contracts provide very well defined, technical acceptance criteria which can be applied to automated testing very easily.
  • Non-development managers and senior managers in structured companies can identify with this method much more easily.
cons:
  • Requires a joint design activity up-front to establish the form of the contract and solution.
  • Requires enough big picture thinking to effectively establish the inter-departmental contracts.
  • Not well understood or appreciated by the majority of purist agile developers, who are often not concerned with the bigger picture.
  • Less scope to evolve the contracts, so worse for more fluid organisations where the business process is not known up-front.

Summary

The above non-exhaustive lists of pros and cons should help guide development and/or architecture teams on when to use which method. One day I hope to add metrics to this set, such as using a multivariate model to evaluate companies against it. It would be interesting to see if the community at large already has a similar way to define this, so drop me a comment if you do.