Entries by HighScalability Team (1576)

Monday
May 2, 2011

The Updated Big List of Articles on the Amazon Outage

Since The Big List Of Articles On The Amazon Outage was published we've had a few updates that people might not have seen. Amazon of course released their Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region. Netflix shared their Lessons Learned from the AWS Outage, as did Heroku (How Heroku Survived the Amazon Outage), SmugMug (How SmugMug survived the Amazonpocalypse), and SimpleGeo (How SimpleGeo Stayed Up During the AWS Downtime).

The curious thing from my perspective is the general lack of response to Amazon's explanation. I expected more discussion. There's been almost none that I've seen. My guess is that very few people understand Amazon's explanation well enough to comment, whereas almost everyone feels qualified to talk about the event itself.

Lesson for crisis handlers: deep dive post-mortems that are timely, long, honestish, and highly technical are the most effective means of staunching the downward spiral of media attention. 

Amazon's Explanation of What Happened

Click to read more ...

Wednesday
Apr 27, 2011

Heroku Emergency Strategy: Incident Command System and 8 Hour Ops Rotations for Fresh Minds

In Resolved: Widespread Application Outage, Heroku tells their story of how they dealt with the Amazon outage. While taking 100% responsibility for the downtime, they also shared a number of the strategies they used to bring their service back to full working order.

One of Heroku's most interesting strategies wasn't a technical hack at all, but how they consciously went about deploying their Ops personnel in response to the emergency. An outline of their strategy is:

Click to read more ...

Monday
Apr 25, 2011

The Big List of Articles on the Amazon Outage

Updated on Friday, April 29, 2011 at 8:27AM by HighScalability Team

So many great articles have been written on the Amazon Outage. Some aim at being helpful, some chastise developers for being so stupid, some chastise Amazon for being so incompetent, some talk about the pain they and their companies have experienced, and some even predict the downfall of the cloud. Still others say we have seen a sea change in the future of the cloud, a prediction that's hard to disagree with, though the shape of the change remains...cloudy.

I'll try to keep this list updated as more information comes out. There will be a lot for developers to consider going forward. If there's a resource you think should be added, just let me know.

Amazon's Explanation of What Happened

Click to read more ...

Friday
Apr 22, 2011

Stuff The Internet Says On Scalability For April 22, 2011

Submitted for your reading pleasure on the day, deep breath, before Dr. Who invades the USA...

Click to read more ...

Wednesday
Apr 20, 2011

Packet Pushers: How to Build a Low Cost Data Center

The main thrust of the Packet Pushers Show 41 episode was to reveal and ruminate over the horrors of a successful attack on RSA, which puts the whole world security complex at risk. Near the end, at about 46 minutes in, there was an excellent section on how to go about building out a low cost datacenter.

Who cares? Well, someone emailed me this exact same question a while back and I had a pretty useless response. So here's making up for that by summarizing the recommendations from the elite Packet Pushers cabal:

Click to read more ...

Monday
Apr 18, 2011

6 Ways Not to Scale that Will Make You Hip, Popular and Loved By VCs

This is a hilarious presentation by Josh Berkus, called Scale Fail, given at O'Reilly MySQL CE 2011. Josh is entertaining, well spoken, and cleverly hides insight inside chaos. And he makes some dang good points along the way.

Josh has a problem, you see: he has learned how to make sites that are both scalable and reliable. So he's puzzled why companies "whose downtime interfaces (Twitter) are more well known than their uptime interfaces" get all the attention, respect, and money for being failures. Just doing your job doesn't make you a hero. You need these self-inflicted wounds in order to have the war stories to share at conferences. They get the attention. Just doing your job is boring. This is so unfair, in that way life can be.

So if you want to turn the tables and take the low road to fame and fortune, here's Josh's program for learning how not to scale:

Click to read more ...

Friday
Apr 15, 2011

Stuff The Internet Says On Scalability For April 15, 2011

Submitted for your reading pleasure...

Luxury is an ancient notion.  There was once a Chinese mandarin who had himself wakened three times every morning simply for the pleasure of being told it was not yet time to get up.  ~Argosy

  • We have a Quotable Quote machine for you today:
    • @kevinweil: Twitter monthly signups have increased more than 50% since December, and we're now doing well over 150 million Tweets per day.
    • @ChrisShain: Prediction: Black art of query optimization will become black art of #nosql data modeling, for same reasons. Minimize IOs, query time.
    • @ui_matters: Infrastructure as a Service = no hardware headaches. Platform as a Svc = no scalability headaches. SaaS = common dev platform #amchamtech
    • @plcstpierre: Thinking about high scalability stuff... I never thought database stuff can be interesting...
    To read more of what the Internet is saying on scalability please read below...

    Click to read more ...

Thursday
Apr 14, 2011

Strategy: Cache Application Start State to Reduce Spin-up Times

Using this strategy, Valyala, a commenter on Are Long VM Instance Spin-Up Times In The Cloud Costing You Money?, was able to reduce their GAE application start-up times from 15 seconds down to 1.5 seconds:

Click to read more ...

Wednesday
Apr 13, 2011

Paper: NoSQL Databases - NoSQL Introduction and Overview

Christof Strauch, from Stuttgart Media University, has written an incredible 120+ page paper titled NoSQL Databases as an introduction and overview to NoSQL databases. The paper was written between 2010-06 and 2011-02, so it may be a bit out of date, but if you are looking to take in the NoSQL world in one big gulp, this is your chance. I asked Christof to give us a short taste of what he was trying to accomplish in his paper:

Click to read more ...

Tuesday
Apr 12, 2011

Sponsored Post: Gazillion, Edmunds, OPOWER, ClearStone, deviantART, ScaleOut, aiCache, WAPT, Karmasphere, Kabam, Newrelic, Cloudkick, Membase, Joyent, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

  • Gazillion Entertainment is looking for a Web Developer Generalist to work on massively multiplayer online games. Please apply here
  • Edmunds.com helps people find the car that meets their every need.  We’re currently hiring talented Java Developers in the Los Angeles area.
  • OPOWER motivates millions to become more energy efficient, and we're hiring!
  • deviantART is looking for Infrastructure and Database Operations Engineers! 
  • Kabam is looking for a Quantitative Analyst and a Senior Data Engineer to join the Business Intelligence group at our social gaming startup.

Fun and Informative Events

Cool Products and Services

  • APM (Application Performance Management) for NOSQL, Java and More - Try ClearStone 5.0. Download ClearStone 5.0 today!  http://www.evidentsoftware.com/download/
  • ScaleOut StateServer - Scale Out Your Server Farm Applications!
  • aiCache creates a better user experience by increasing the speed, scale, and stability of your website.
  • WAPT is a load, stress and performance testing tool for websites and web-based applications.
  • Karmasphere is bringing Apache Hadoop power to developers and analysts. Download your Free Community Edition today!
  • Newrelic - What are you doing to ensure the performance of your apps?
  • Cloudkick - monitor & manage your servers better with a FREE Cloudkick developer account.
  • Learn how two game developers prepared for rapid user growth in this recorded Joyent webinar: http://bit.ly/hzBoib.
  • CloudSigma. Instantly scalable European cloud servers.
  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com: Monitor End User Experience from a global monitoring network.
To read more on each sponsor please click below...

Click to read more ...