As it happens, I was at an AWS Bootstrap workshop yesterday, and I am trying to hunt down a solution architect who can answer a question I had from it that Sabastian, the trainer, couldn't answer at the time. Best go find such a target...
Keynote
I was worried that I wouldn't be able to sit through the keynote, as it was scheduled to run for two hours. However, Steve Schmidt's presentation was actually quite informative.

Carlos from Just Eat showing how AWS has facilitated their agility
Carlos from Just Eat was here again, and this time Channel 4 also put in an appearance, explaining how data drives their advertising decisions, which in turn has led to an eight-fold increase in benefit to customers.
The current AWS estate
The top-level AWS service catalogue doesn't seem to have changed much since last time. That said, the number of services, including additions to existing service offerings (which I would have liked to hear more about), has increased dramatically. So much so that they didn't fit on a standard linear graph, which could have had a much stronger impact. Oh well, they're techies :o)
Schmidt showing us the mis-scaled pace of innovation ;)
The one thing we heard much more about was governance. Whilst AWS provides enough tools to deliver a reasonable level of governance, there was a much stronger emphasis on how it is being used in the enterprise sphere.
There has recently been a lot of noise about hybrid-cloud solutions. Steve Schmidt spent a little time on the interoperability of cloud and on-premises solutions. In my experience, there is a lot more to it.
My epic fail attempt to win a Kindle
What's the answer to my question?
Yesterday I was at an AWS workshop. In it, Sabastian, the trainer, mentioned that records written to an AWS availability zone, with a slave availability zone, are committed synchronously on the master and the slave: the change isn't committed on the master until it has been committed on the slave. This gave me some cause for concern.

The aim is that if the master availability zone goes down, the slave survives, which is sensible from the point of view of the master. However, if the slave availability zone goes down and the master won't commit unless the slave also commits, then unless RDS has a way to fall back to committing on the master alone, you have an issue. Indeed, this situation could technically result in lower availability than running a single instance of the DB.
Why?
Firstly, I am pretty sure this is independent of the DB engine: it applies to any synchronous process that requires two activities to complete successfully. However, I have not seen or heard evidence to suggest this is a two-phase commit process. So, to illustrate the issue, an example might be useful.

Suppose the two cloned platforms across the two availability zones each have 99.95% availability. In a master-slave configuration where the master's commit depends on the slave, this introduces a dependency chain: the platform as a whole is only up when both services are up. The result is that availability drops to about 99.90% (the probability of both systems being up simultaneously). This is lower than either single server and certainly lower than systems running independently in parallel.
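To make the arithmetic concrete, here is a minimal sketch. The 99.95% figure is the illustrative assumption from above, not an AWS SLA:

```python
# Illustrative availability arithmetic; 99.95% per AZ is an assumption, not an SLA.
az_availability = 0.9995

# Dependency chain: a commit needs BOTH master and slave to be up.
chained = az_availability * az_availability

# Independent parallel instances: the data is available if EITHER is up.
parallel = 1 - (1 - az_availability) ** 2

print(f"chained (both up):    {chained:.6f}")   # ~0.999000
print(f"parallel (either up): {parallel:.6f}")  # 0.99999975
```

The dependency chain multiplies the availabilities together, while independent redundancy multiplies the *unavailabilities*, which is why the two architectures end up on opposite sides of the single-server figure.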
This doesn't mean it is necessarily a problem. After all, you can architect around the risk and hence increase the availability of the data sources as a whole. However, when I put this to our trainer yesterday, he said he'd go away and ask, so I didn't receive an answer at the time.
I spoke to a solution architect this morning and he didn't have an answer either. So it would be good to get one. I am not too bothered whether it is positive or negative, but it would dictate the complexity of a system design and also provide a theoretical constraint to conduct trade-offs around. Known-unknowns can be troublesome, especially when you've only just discovered one that was an unknown-unknown before. I must get round to chasing this up.
**** UPDATE: After chasing this up, it appears that there isn't currently any documentation to corroborate the assertion from a few other SAs that the platform would prevent the saving of data in the event of an AZ failure. However, this also doesn't tell me that it wouldn't. I've had my details taken but no sticker given ;) ****
400 - EBS and EC2 Optimisation
This was a 400 track, and there were some extremely useful slides in it. AWS opened with an intro explaining that EBS is basically a storage mechanism with a queue attached and is not like a normal disk. I still think it kind of is, once you include the buffers and caches. Both standard EBS and EBS PIOPS (Provisioned IOPS) were introduced, and in the latter case we briefly touched on the configuration of the IOPS provision.

Importantly for me, the existence of a formal queue creates a specific need to understand the block size per IOP, as this can significantly affect the throughput of the system. The bigger the EC2 instance, the faster you can write the data; the bigger the EBS queue, the faster it can write the standard 16K blocks.
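As a back-of-envelope check on why the bytes-per-IOP figure matters, throughput is simply IOPS multiplied by the I/O size. The 16 KiB block size is the figure quoted in the session; the IOPS values below are purely illustrative:

```python
# Throughput = IOPS x block size, so undersized writes waste provisioned IOPS.
# 16 KiB is the block size quoted in the session; the IOPS values are illustrative.
BLOCK = 16 * 1024  # bytes per IOP

def throughput_mib_s(iops, bytes_per_iop=BLOCK):
    """Sustained throughput in MiB/s for a given IOPS rate and I/O size."""
    return iops * bytes_per_iop / (1024 * 1024)

for iops in (100, 1000, 4000):
    print(f"{iops:>5} IOPS -> {throughput_mib_s(iops):7.2f} MiB/s")
```

The same provision at a 4 KiB I/O size would deliver a quarter of the throughput, which is why matching write sizes to the block size pays off.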
This suggests that the best way to write data to disk is to chunk it up into 16K blocks (or multiples thereof) and write the chunks in parallel, as was suggested in yesterday's workshop.
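A minimal sketch of that idea in Python - splitting a payload into 16 KiB chunks and writing them concurrently at fixed offsets. The file name, thread count, and payload size are arbitrary choices of mine, not from the talk:

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 16 * 1024  # 16 KiB, the block size mentioned in the session

def write_chunk(fd, index, payload):
    # os.pwrite writes at an absolute offset, so threads don't fight over seek()
    os.pwrite(fd, payload, index * CHUNK)

data = os.urandom(CHUNK * 8)  # example payload: eight 16 KiB chunks
chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

fd = os.open("output.bin", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
try:
    with ThreadPoolExecutor(max_workers=4) as pool:
        for i, chunk in enumerate(chunks):
            pool.submit(write_chunk, fd, i, chunk)
        # leaving the with-block waits for every write to finish
finally:
    os.close(fd)
```

Because each chunk lands at its own offset, the writes can be issued in any order without corrupting the file, which is what lets the queue stay full.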
200 - Hybrid Environments with AWS
This was an interesting track, although most of it is pretty standard; indeed, some of my clients have been doing this for a while. I have a much greater appreciation of it through some of the security group work I've done since last year, and I like the way that hybrid and cloud solutions can work together. There didn't seem to be too much that was new, though.

Hybrid Environments - Yes, I was a bit late to this one :-S
300 - Building for availability and cost
Fitz, a solution architect at Amazon, presented Here.com's autoscaling solution. Throughout all the autoscaling demos at this conference, the mantra "scale up fast, scale down slow" was repeated. This is because it takes little time to stop an AWS EC2 instance from receiving traffic, but it can take an age for a new one to get into a position to receive traffic. So that makes sense.

End of Day
Not a bad summit. I don't think I will take away as much from this as I did last year... aside from the fact that my weight isn't appropriate for perspex chairs (sorry, Amazon). Amazon always put on a very good show. I'm sat here with a beer whilst I prep to tackle the tube-strike-struck TfL public transport system before getting my train to Manchester. There is a lot to take away, and I'll have to let it ferment as much as the beer before brewing up a new vat of ideas for the future of my architecture work with the new tools AWS provide. I am still to be convinced of some of them, such as the need for schedule-based autoscaling, which I see as a way to circumvent the 15- or 20-minute spin-up of a new platform. However, they do solve some problems, so they are not at all without purpose, especially in warming up environments for immediate use.
Additionally, the EBS optimisation session has set off a few ideas around using queuing theory to try to explain some of the numbers Amazon have found in their testing. One thing that appeared time and time again was the experience of other speakers, a large proportion of whom spent a lot of time and effort creating PoC platforms to prove the viability of AWS.
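As a first stab at that queuing-theory angle, even the textbook M/M/1 result shows the shape I'd expect in the EBS numbers: mean response time blows up as the queue's utilisation approaches 1. A sketch, with a purely illustrative service rate rather than a measured EBS figure:

```python
# M/M/1 mean response time: W = 1 / (mu - lambda), valid only while lambda < mu.
# The 1000 IOPS service rate is illustrative, not a measured EBS figure.
def mm1_response_time(arrival_rate, service_rate):
    """Mean time a request spends in the system (queueing + service), in seconds."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrivals outpace service")
    return 1.0 / (service_rate - arrival_rate)

service = 1000.0  # IOPS the volume can service (assumed)
for utilisation in (0.5, 0.9, 0.99):
    w_ms = mm1_response_time(utilisation * service, service) * 1000
    print(f"utilisation {utilisation:.2f} -> mean response {w_ms:6.2f} ms")
```

Going from 50% to 99% utilisation multiplies the mean response time fifty-fold here, which would go some way to explaining why keeping the EBS queue appropriately sized matters so much in Amazon's test results.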