Imagine this situation: you are the new guy on a new project. Maybe you were just hired by the company, or it is a side project that you just sold (great!), or it is simply a new project at the organization where you currently work.
And then somebody says those scary words: “we need this application to be highly available”. Now imagine that you have no clue what the hell they are talking about.
Before you start crying or pretend to pass out, let’s go through some practical steps that may help you understand what your customer is asking for.
1-First of all, ask your customer what “highly available” means to him/her
This is key to knowing what you should do with your solution. You could get answers like:
- “The system should be available from 08:00 to 17:00, from Monday to Friday”;
- “The system should be available 24 hours a day, from Monday to Saturday”;
- “The system should be available 24 x 7 (24 hours a day, 7 days a week)”.
Of course you could get many other answers. The point is that each answer above would require a totally different kind of solution: both the architecture and the development would have to be different.
2-Ask what the exceptions to the question (and answer) above are
His/Her first answer will surely be “no exceptions at all!”. But trust me… insist! There are many situations, both predictable and unpredictable, that may make the application unavailable:
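To see how different these answers are, it can help to count how many hours per week each one actually commits you to, and how many are left over for things like deployments and maintenance. Here is a minimal sketch; the `WINDOWS` table and the `weekly_coverage` function are hypothetical, just a way to put numbers on the three example answers:

```python
# Hypothetical model of the three example answers: (hours per day, days per week).
WINDOWS = {
    "08:00-17:00, Mon-Fri": (9, 5),
    "24h, Mon-Sat": (24, 6),
    "24 x 7": (24, 7),
}

HOURS_PER_WEEK = 24 * 7


def weekly_coverage(hours_per_day: int, days_per_week: int) -> tuple[int, int]:
    """Return (hours the system must be up, hours left free for maintenance)."""
    required = hours_per_day * days_per_week
    return required, HOURS_PER_WEEK - required


for name, (hours, days) in WINDOWS.items():
    required, free = weekly_coverage(hours, days)
    print(f"{name}: {required} h required, {free} h free for maintenance")
```

The business-hours answer leaves 123 hours a week outside the window, while 24 x 7 leaves none, so the same deployment that is trivial in the first case needs careful planning in the last.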
- Blackout
- Network failure
- Deployment
- OS failure
- etc.
Of course you can deal with each of these items, and that’s why you need to know what the exceptions are. This can be easy to handle if the system only needs to be available from 08:00 to 17:00, from Monday to Friday (deployments can simply happen outside that window)… but things get tricky when you need to design a 24 x 7 solution.
In any scenario it would be nice to have some index to help manage the customer’s expectations. For example, within the defined availability window, the system will be available 99.5% of the time, or 95%… or 99.99%!
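Those percentages are much easier to negotiate once they are turned into actual downtime. A minimal sketch, assuming availability is measured only inside the agreed window, a 365-day year, and 52 working weeks for the business-hours case (holidays ignored); `allowed_downtime_minutes` is a hypothetical helper:

```python
def allowed_downtime_minutes(availability_pct: float, window_hours: float) -> float:
    """Minutes of downtime allowed inside a window of `window_hours` hours."""
    return (1 - availability_pct / 100) * window_hours * 60


HOURS_PER_YEAR = 24 * 365  # 24 x 7, non-leap year
BUSINESS_HOURS_PER_YEAR = 9 * 5 * 52  # 08:00-17:00, Mon-Fri, holidays ignored

for pct in (95.0, 99.5, 99.99):
    yearly = allowed_downtime_minutes(pct, HOURS_PER_YEAR)
    business = allowed_downtime_minutes(pct, BUSINESS_HOURS_PER_YEAR)
    print(f"{pct}%: {yearly / 60:.2f} h/year (24 x 7), "
          f"{business / 60:.2f} h/year (business hours)")
```

For a 24 x 7 system, 99.5% still allows about 43.8 hours of downtime per year, while 99.99% leaves only about 53 minutes, which is not much room for a single bad deployment.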
And then comes the third question…
3-What is the budget available? Is it highly available?
(Of course you shouldn’t ask the second part of the question… or should you? Whatever…)
Depending on the available budget, the answer to the previous question may need to be reviewed. It will be hard to build a solution that is available 24 x 7, 99.99% of the time, with just one server… no cluster, no replication, no UPS (Uninterruptible Power Supply)… no contingency at all!
So if your customer insisted on giving no exceptions at all in the second question, maybe he/she will review that decision at this moment!
Now that you have an overview of what your customer wants, what the exceptions are and what the budget is, you can do a better job of planning your solution (both architecture and development). And you can even brag a little about having known all along what this “high availability” thing was!
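Part of the reason a single server struggles to reach 99.99% is that redundancy multiplies your chances. A common back-of-the-envelope model, assuming the replicas fail independently and that any one of them can handle the traffic alone, is 1 - (1 - a)^n, where a is the availability of one server and n the number of servers. The 99.5% per-server figure below is only an illustrative assumption, and `parallel_availability` is a hypothetical helper:

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` redundant servers where any one is enough.

    Assumes failures are independent, which shared power, shared network
    or a shared deployment process often violate in practice.
    """
    return 1 - (1 - single) ** replicas


for n in (1, 2, 3):
    pct = parallel_availability(0.995, n) * 100
    print(f"{n} server(s) at 99.5% each: {pct:.5f}%")
```

Under these assumptions two servers at 99.5% each already reach 99.9975%. The independence assumption is the catch: a blackout without a UPS takes down every server in the same room at once, which is why the budget has to cover more than just extra machines.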