Amazon EC2 Outage Explained and Lessons Learned

By Letrasic

Amazon will provide a “10-day credit equal to 100% of your usage of EBS volumes, EC2 instances, and RDS DB instances that were running in the affected Availability Zone” to all affected customers. The report ends with apologies to AWS customers.

The media has been inundated with reports related to the first cloud outage. Jason Bloomberg draws a number of lessons:

  • There is no 100% reliability. In fact, there is nothing 100% about IT: no code is 100% bug-free, no system is 100% fail-safe, and no security is 100% impenetrable. Just because Amazon came up snake eyes on this roll of the dice doesn’t mean that public clouds are any less trustworthy than they were before the crisis. Whether it’s investing in the stock market or building a highly available IT infrastructure, the best way to reduce risk is to diversify. Got eggs? The more baskets, the better.
  • This particular crisis is unlikely to happen again. We can safely assume that Amazon has some very smart cloud experts, and that they had already built a cloud architecture that could withstand most challenges. Suffice it to say, therefore, that the recent crisis had an unusual and complex set of causes. It is also clear that those experts are working feverishly to eliminate those causes, so that this particular set of circumstances does not happen again.
  • Unknown unknowns are by definition unpredictable. Although the particular sequence of events that led to the current crisis is unlikely to happen again, it is relatively likely that other completely unpredictable problems will emerge in the future. But such problems could affect private, hybrid, or community clouds just as easily as they could affect public clouds again. In other words, abandoning public clouds to take refuge in the supposedly safer arena of private clouds is an exercise in futility.
  • The most important lesson Amazon should learn is about visibility rather than reliability. The weakest part of Amazon’s cloud offerings is the lack of visibility they provide to their customers. This “pay no attention to the man behind the curtain” attitude is part of how Amazon supports the cloud abstraction that I discussed in the previous ZapFlash. But now it is working against both Amazon and its customers. For Amazon to build on its success, it needs to open the kimono a bit and provide its customers with a level of management visibility into its internal infrastructure that it has not been comfortable with until now.

Read the entire article here.