Here is a tale of a small airplane and the kind of damage it can do. No, I am not going back to September 11 and the WTC. This is more along the lines of February 17th, 2010, 07:50 in Palo Alto, California.
A play-by-play of the events showed that because a small plane crashed after hitting power lines (unfortunately killing 3 people), Palo Alto went dark. Well, not really dark, because it was early morning, but the power went out. COMPLETELY!!
Now I guess most of us can imagine what that means in our personal lives: no lights, no TV, no VoIP, no computers, no air conditioning, no hot espresso from the espresso machine…
You see where I am going.
But let's look at this from a different angle: your datacenter.
The power goes out. If you are well organized, you will have backup power for a certain amount of time, depending on your generator.
SMS alerts start flying into your mobile phone like crazy.
You call the office … no-one answers.
You manage to reach someone at work on his mobile and try to find out what is going on, but he is as confused as you are.
You try to turn on your TV to see what is going on. Whoops, forgot: no electricity, which means no internet connection either.
Chaos… Confusion… This is what it can be like.
I could go on and on, but I think you get my drift.
So if you are lucky, you manage to power everything down properly before your backup power runs out of juice.
OK, now comes the analysis phase: WHAT IS DOWN?
Let us go back to VMware, whose headquarters is in Palo Alto. No email. No phones.
Post a notification of service availability to the world: Twitter.
Open a channel of communication as soon as possible: Twitter.
Restore services as soon as possible.
(In this case, it took almost 10 hours until full functionality was back.)
I am not going to go into VMware's BCP (business continuity plan), because I have absolutely no knowledge of what was down and what was still up and working.
For your BCP plan you will need (among other things):
Recovery time objective (RTO): how long until I am back up?
Recovery point objective (RPO): how much data loss, measured in time, is acceptable?
Which critical services do I need back online, in what order, and how soon?
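To make those questions a bit more concrete, here is a minimal Python sketch of a restore plan. Everything in it is hypothetical: the service names, the recovery time objective (RTO), recovery point objective (RPO) and restore-time numbers, and the assumption that one team restores services one at a time in dependency order. It is not VMware's plan, just a way to check on paper whether the order you have in mind can actually meet your RTOs. It needs Python 3.9+ for `graphlib`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Service:
    name: str
    rto_hours: float        # Recovery time objective: max acceptable downtime
    rpo_hours: float        # Recovery point objective: max acceptable data loss, in time
    restore_hours: float    # Hypothetical estimate of how long one restore takes
    depends_on: list[str] = field(default_factory=list)


def restore_order(services: list[Service]) -> list[Service]:
    """Put dependencies first; among services that become ready at the same
    time, restore the one with the tightest RTO first (ties broken by name)."""
    by_name = {s.name: s for s in services}
    sorter = TopologicalSorter({s.name: s.depends_on for s in services})
    sorter.prepare()
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda n: (by_name[n].rto_hours, n))
        for name in ready:
            order.append(by_name[name])
            sorter.done(name)
    return order


def missed_rto(services: list[Service]) -> list[tuple[str, float]]:
    """Assume (hypothetically) one team restores services one at a time, in
    restore_order. Return (name, finish_hour) for each service that misses its RTO."""
    clock = 0.0
    misses = []
    for s in restore_order(services):
        clock += s.restore_hours
        if clock > s.rto_hours:
            misses.append((s.name, clock))
    return misses


# Hypothetical example plan; all numbers are illustrative only.
plan = [
    Service("power", rto_hours=0.5, rpo_hours=0, restore_hours=0.5),
    Service("network", rto_hours=1, rpo_hours=0, restore_hours=0.5, depends_on=["power"]),
    Service("dns", rto_hours=2, rpo_hours=0, restore_hours=0.5, depends_on=["network"]),
    Service("phones", rto_hours=2, rpo_hours=0, restore_hours=1, depends_on=["network"]),
    Service("email", rto_hours=4, rpo_hours=1, restore_hours=2, depends_on=["dns"]),
]

for s in restore_order(plan):
    print(f"{s.name}: RTO {s.rto_hours}h, RPO {s.rpo_hours}h")
print(missed_rto(plan))  # [('phones', 2.5), ('email', 4.5)]
```

With these made-up numbers, phones and email come back later than their RTOs, which is exactly the kind of gap you want to find while reviewing the plan, not in the middle of the outage.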
In this particular case, I can only assume that VMware's BCP plan for Palo Alto before February 17th and the plan amended after it will not be the same.
I am sure that a little flagship product called Site Recovery Manager can be used for this purpose, and hopefully VMware will come out with a better BCP plan.
So, have you reviewed your datacenter's BCP plan lately? If not…
DO IT!!!!!! NOW!!!!!!!!!!
P.S. I hope all of you in Palo Alto did not have to throw out too much food from your refrigerators. I mean, 10 hours is a long….. time.