High Scalability -

Top

Recommend The Azure Outage: Time Is a SPOF, Leap Day Doubly So (Email)

This action will generate an email recommending this article to the recipient of your choice. Note that your email address and your recipient's email address are not logged by this system.

Email Article Link

The email sent will contain a link to this article, the article title, and an article excerpt (if available). For security reasons, your IP address will also be included in the sent email.

Article Excerpt:

This is a guest post by Steve Newman, co-founder of Writely (Google Docs), tech lead on the Paxos-based synchronous replication in Megastore, and founder of cloud service provider Scalyr.com.

Microsoft’s Azure service suffered a widely publicized outage on February 28th / 29th. Microsoft recently published an excellent postmortem. For anyone trying to run a high-availability service, this incident can teach several important lessons.

The central lesson is that, no matter how much work you put into redundancy, problems will arise. Murphy is strong and, I might say, creative; things go wrong. So preventative measures are important, but how you react to problems is just as important. It’s interesting to review the Azure incident in this light...

Article Link:

Your Name:

Your Email:

Recipient Email:

Message: