This is particularly relevant in the infrastructure domain. With the growth of data centres over the years and the increased importance of IT in large enterprises, I felt that I should cover some of the fundamentals of why we use redundant systems. This is especially important for companies that deliver PaaS infrastructure, and it was touched upon only very lightly at the fairly recent Microsoft cloud day in London (Scott Guthrie didn't do any maths himself to prove the point).

### Mean-Time to Failure and Uptime

Every hardware system has a mean time to failure (MTTF). This is calculated from a series of test runs, typically a couple of dozen, in which the time taken for a component to break or error is recorded. The mean of those times is the MTTF.

System vendors then use these figures to set a warranty that minimises the work they do for free, while giving them enough confidence to offer that service under SLAs or to comply with legislative frameworks (given that the risk of a failure increases the nearer you get to the MTTF).

In the case of data centre/server room infrastructure, these mean times to failure, when apportioned by year, month or whatever period you choose, can indicate the uptime of the system component. SLAs for uptime are then set as adjustments of that figure.

For example, if a router has a mean time to failure of 364 days of always-on use (which is realistic in a lot of cases) then the downtime is a day in every year, which gives (100 * 364)/365 = 99.726% uptime. You can statistically model this as the probability of the system being up at any point in a year.
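The arithmetic above can be sketched in a few lines of Python. The function name `uptime_percent` is hypothetical, and the sketch follows the simplification used here of treating the days up in a period as the uptime fraction, rather than the fuller MTTF / (MTTF + MTTR) form.

```python
def uptime_percent(days_up: float, period_days: float = 365.0) -> float:
    """Percentage of the period for which the component is expected to be up."""
    return 100.0 * days_up / period_days


# The router from the example: 364 days up out of every 365.
router_uptime = uptime_percent(364)
print(f"{router_uptime:.3f}%")  # prints 99.726%
```

Dividing the result by 100 gives the probability of the component being up, which is the form used in the rest of the calculations.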

When you combine a number of these components together, you have to be aware of the uptimes for all components and also be very aware of how those components interact. In order to understand the uptime of the whole system, you have to look at the single points of failure which connect these systems to the outside world.

### How many 9s?

It has always been touted that if you increase the availability of a system by a '9', you increase its cost ten-fold. Whilst correct as a heuristic, there are things you can look at to try to improve availability on the infrastructure you already have, without necessarily spending money on extra hardware. We will investigate total costs and what this means for cloud providers or data centre operators at a later date, but for now, let's look at an example.

Imagine a network structured like the following:

*fig 1 - Sample Network*

*'l'* represents the levels at which the uptime can be calculated. We can state that the uptime of the system can be determined by the intersection of the uptime of all relevant components at each level.

*eq. 1 - Probability the system is up*

*eq. 2 - Current level availabilities are not affected by higher or lower level availabilities*
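One way to write these two statements is sketched below. The notation is assumed for this sketch and is not taken from the original equations: $U_l$ is the event that level $l$ is up, $L$ is the number of levels, $n_l$ is the number of redundant components at level $l$, and $a_{l,i}$ is the availability of component $i$ at that level. The redundancy term assumes a level is up whenever at least one of its components is up.

```latex
\begin{align}
P(\text{system up}) &= P\Big(\bigcap_{l=1}^{L} U_l\Big) \\
P\Big(\bigcap_{l=1}^{L} U_l\Big) &= \prod_{l=1}^{L} P(U_l),
\qquad P(U_l) = 1 - \prod_{i=1}^{n_l} \big(1 - a_{l,i}\big)
\end{align}
```

The step from the intersection to the product only holds if the levels fail independently of one another.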

__Technical Note:__ **This assumption is not true with power supplies: by Lenz's law, the equal and opposite reaction to a power supply switch/trip is a surge spike back into the parent supply, and potentially into the same supply as the other components. However, to keep this example simple, we are concentrating on network availability only.**

Assume the following component availabilities:

- Backbone router 99.9%
- Subnet Router 99% (each)
- Rack 95% (each)
- Backplate 95% (each)
- Server 90% (each)

**1. Single Server availability**

*eq. 3 - Single Server Availability*
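As a worked sketch using the availabilities listed above, and assuming the single server sits behind exactly one backplate, one rack, one subnet router and the backbone router, all in series, the product of the five levels is:

```latex
\begin{align}
P(\text{system up}) &= 0.90 \times 0.95 \times 0.95 \times 0.99 \times 0.999 \\
&\approx 0.8033
\end{align}
```

That is roughly 80.3% availability, lower than that of any single component in the chain.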

**2. Triple Server Availability**

*fig 2 - Triplicated site, same backplate component.*

*eq. 4 - Triplicate the web application, level 1*

*eq. 5 - Total availability*
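A worked sketch of this case, assuming the three servers are identical and independent, that level 1 is up as long as at least one of them is up, and that every other level still has a single component:

```latex
\begin{align}
P(U_1) &= 1 - (1 - 0.90)^3 = 0.999 \\
P(\text{system up}) &= 0.999 \times 0.95 \times 0.95 \times 0.99 \times 0.999 \\
&\approx 0.8917
\end{align}
```

Under these assumptions, triplicating the cheapest component lifts the system from roughly 80.3% to roughly 89.2%, and the shared backplate and rack become the weakest levels.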

**3. Different Backplate Routers**

*fig 3 - Triplicated site, different backplate router components.*

*eq. 6 - Level 2 availability and system availability*

**4. Across Two Racks**

*fig 4 - Different Rack Clusters (3 different backplate routers)*

*eq. 7 - Level 2 & 3 availability and system availability*

**5. Two Subnets**

*fig 5 - Different Subnets (3 different racks)*

*eq. 8 - Level 2, 3 & 4 availability and system availability*
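The five scenarios can be compared side by side with a short Python sketch. It assumes the level-wise model described earlier (each level is up if at least one of its components is up, and levels are independent). The component counts per scenario are hypothetical values read off the headings and figure captions, and the names `level_availability`, `system_availability` and `SCENARIOS` are illustrative only.

```python
from math import prod

# Component availabilities quoted earlier, from level 1 (server) upwards.
AVAILABILITY = {
    "server": 0.90,
    "backplate": 0.95,
    "rack": 0.95,
    "subnet_router": 0.99,
    "backbone_router": 0.999,
}


def level_availability(a: float, n: int) -> float:
    """Probability that at least one of n identical, independent components is up."""
    return 1.0 - (1.0 - a) ** n


def system_availability(counts: dict[str, int]) -> float:
    """Product of the level availabilities (levels assumed independent)."""
    return prod(level_availability(AVAILABILITY[name], n) for name, n in counts.items())


# Assumed number of components at each level, per scenario.
SCENARIOS = {
    "1. single server": dict(server=1, backplate=1, rack=1, subnet_router=1, backbone_router=1),
    "2. triple server": dict(server=3, backplate=1, rack=1, subnet_router=1, backbone_router=1),
    "3. different backplates": dict(server=3, backplate=3, rack=1, subnet_router=1, backbone_router=1),
    "4. across two racks": dict(server=3, backplate=3, rack=2, subnet_router=1, backbone_router=1),
    "5. two subnets": dict(server=3, backplate=3, rack=3, subnet_router=2, backbone_router=1),
}

for name, counts in SCENARIOS.items():
    print(f"{name}: {system_availability(counts):.2%}")
```

A strict tree topology, where each server is reachable only through its own backplate, would need the series and parallel terms combined per branch and would give somewhat different figures for scenarios 3 to 5, so treat these numbers as indicative of the trend rather than exact.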

### Summary

*"An engineer is someone who can do for a penny what any old fool can do for a pound"*