Entries by HighScalability Team (1576)

Wednesday
Mar 14, 2012

The Azure Outage: Time Is a SPOF, Leap Day Doubly So

This is a guest post by Steve Newman, co-founder of Writely (Google Docs), tech lead on the Paxos-based synchronous replication in Megastore, and founder of cloud service provider Scalyr.com.

Microsoft’s Azure service suffered a widely publicized outage on February 28th / 29th. Microsoft recently published an excellent postmortem. For anyone trying to run a high-availability service, this incident can teach several important lessons.

The central lesson is that, no matter how much work you put into redundancy, problems will arise. Murphy is strong and, I might say, creative; things go wrong. So preventative measures are important, but how you react to problems is just as important. It’s interesting to review the Azure incident in this light...

Click to read more ...

Tuesday
Mar 13, 2012

Sponsored Post: Nokia, Oracle, Percona Live, AiCache, ElasticHosts, Logic Monitor, Attribution Modeling, New Relic, AppDynamics, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

  • Nokia's Cloud Computing, Operations and Development Group is hiring! Check out: http://devops.nokia.com. CCOD designs, builds, manages and scales Nokia’s cloud computing. 
  • ConnecTV is a startup looking for a DevOps & System Administration Leader to help build a revolutionary new social network to enrich the experience of watching TV. Apply here.

Fun and Informative Events

  • The Percona Live MySQL Conference & Expo features 60+ speakers, 72 breakout sessions, and keynotes from HP, Facebook, Box, Eucalyptus Systems, and more. April 10-12 in Santa Clara.
  • Sign up for this free 30-minute webinar, which explores how new technology can determine which ads users have actually seen and discusses the C3 Metrics Labs analysis of over 2 billion impressions.

Cool Products and Services

  • Join the MySQL experts from Oracle to learn Oracle's strategy for MySQL as well as the latest product development and features. Learn more and register now!
  • aiCache creates a better user experience by increasing the speed, scale, and stability of your website. Test aiCache acceleration for free. No sign-up required. http://deploy.aicache.com
  • ElasticHosts' award-winning cloud server hosting launches across North America, adding data centers in Los Angeles and Toronto. Free trial.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • New Relic - Real user monitoring: optimize for humans, not bots. Live application stats, SQL/NoSQL performance, web transactions, proactive notifications. Take 2 minutes to sign up for a free trial.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • CloudSigma. Utility-style, high-performance cloud servers in the US and Europe, delivered on all-10GigE networking. Run any OS, take advantage of SSD storage and tailored infrastructure options.
  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com: Monitor End User Experience from a global monitoring network.

For a longer description of each sponsor, please read more below...

Click to read more ...

Monday
Mar 12, 2012

Google: Taming the Long Latency Tail - When More Machines Equals Worse Results

Likewise the current belief that, in the case of artificial machines the very large and the very small are equally feasible and lasting is a manifest error. Thus, for example, a small obelisk or column or other solid figure can certainly be laid down or set up without danger of breaking, while the large ones will go to pieces under the slightest provocation, and that purely on account of their own weight. -- Galileo

Galileo observed how things broke if they were naively scaled up. Interestingly, Google noticed a similar pattern when building larger software systems using the same techniques used to build smaller systems. 

Luiz André Barroso, Distinguished Engineer at Google, talks about this fundamental property of scaling systems in his fascinating talk, Warehouse-Scale Computing: Entering the Teenage Decade. Google found that the larger the scale, the greater the impact of latency variability. When a request is implemented by work done in parallel, as is common with today's service-oriented systems, the overall response time is dominated by the long-tail distribution of the parallel operations. Every response must have a consistent and low latency or the overall operation response time will be tragically slow. The implication: high performance equals high tolerances, which means your entire system must be designed to exacting standards.
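
To make the fan-out effect concrete, here is a minimal Python sketch. It assumes each parallel operation is independently "slow" with some fixed probability; the function name and the 1% figure are illustrative assumptions, not numbers taken from the talk.

```python
def prob_request_is_slow(fanout, p_slow):
    """Probability that at least one of `fanout` independent parallel
    operations is slow, when each is slow with probability `p_slow`.

    A request that waits for all of its parallel operations is slow
    whenever any single one of them is slow.
    """
    return 1.0 - (1.0 - p_slow) ** fanout


if __name__ == "__main__":
    # Assumed for illustration: 1 operation in 100 lands in the latency tail.
    for fanout in (1, 10, 100, 1000):
        share = prob_request_is_slow(fanout, 0.01)
        print(f"fan-out {fanout:>4}: {share:.1%} of requests wait on a slow operation")
```

Under these assumptions, a request that fans out to 100 operations hits at least one slow operation roughly 63% of the time, even though each individual operation is slow only 1% of the time. That is why a good average response time is not enough once work is done in parallel.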

What is forcing a deeper look into latency variability is the advent of interactive real-time computing. Responsiveness becomes key. Good average response times aren't good enough. You simply can't naively scale up techniques to build larger systems. The reason is surprising and has deep implications for how we design service-dominated systems:

Click to read more ...

Friday
Mar 09, 2012

Stuff The Internet Says On Scalability For March 9, 2012

You've Got Questions We've Got HighScalability:

  • 1 trillion bits per second: IBM’s Holey Optochip; Scale of the Universe: 2; Infinite wireless: Vortex radio waves; 105,000 Servers: Akamai.
  • Quotable quotes:
    • @CodingFabian: IaaS = Ops without Hardware; PaaS = Devs without Ops; SaaS = Business without Devs
    • @dthume: "Fault tolerance implies scalability" - Joe Armstrong
    • @jessiekeck: Looks like my local bar takes the same approach to scalability with their paper towels as I do w/ software. http://pic.twitter.com/DTL2W1eC
    • @neil_conway: Weird: network locality is no longer important within a DC and yet communication predicted to dominate computation cost in manycore CPUs
    • @coda: You don't "beat the CAP theorem". You "build distributed systems that don't suck miserably". At best.
  • Before you complain too much about Apple's store being down on iPad day, remember scaling a store is much harder than scaling a website. Just ask Amazon and eBay. You know you are going to buy one anyway, so why should they spend on handling a once-a-year peak load?
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Tuesday
Mar 06, 2012

Ask For Forgiveness Programming - Or How We'll Program 1000 Cores

The argument for a massively multicore future is now familiar: while clock speeds have leveled off, device density is increasing, so the future is cheap chips with hundreds and thousands of cores. That’s the inexorable logic behind our multicore future.

The unsolved question that lurks deep in the dark part of a programmer’s mind is: how on earth are we to program these things? For problems that aren’t embarrassingly parallel, we really have no idea. IBM Research’s David Ungar has an idea. And it’s radical in the extreme...

Click to read more ...

Friday
Mar 02, 2012

Stuff The Internet Says On Scalability For March 2, 2012

Please don't squeeze the HighScalability:

  • Quotable quotes:
    • @karmafile: "Scalability" is a much more evil word than we make it out to be
    • @ostaquet: More hardware won't solve #SQL resp. time issues; proper indexing does.
    • @datachick: All computing technology is the rearrangement of data. Data is the center of the universe
    • @jamesurquhart: "Complexity is a characteristic of the system, not of the parts in it."
  • Peter Burns talks computer nanosecond time scales as a human might experience them. Your memory == computer registers, L1 cache == papers kept close by, L2 cache == books, RAM == the library down the street, and going to disk is a 3-year odyssey for data.
  • Fault Tolerance in a High Volume Distributed System at Netflix (slidedeck). Ben Christensen with another deep dive on Netflix tech. This time it's on how to support an extreme service architecture by: isolating failures, shedding load, and being resilient to failures. Their solution is multi-pronged: network timeouts and retries, separate threads on per-dependency thread pools, semaphores (via a tryAcquire, not a blocking call), and circuit breakers.
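
Two of the techniques named in the Netflix item, a non-blocking semaphore and a circuit breaker, can be sketched in a few lines of Python. This is an illustrative sketch only, not Netflix's implementation: the class name, thresholds, and the simplified "close again after a cool-down" rule are all assumptions made here.

```python
import threading
import time


class Rejected(Exception):
    """Raised when a call is shed instead of being attempted."""


class GuardedDependency:
    """Guards calls to one dependency with a semaphore and a circuit breaker."""

    def __init__(self, max_concurrent=2, failure_threshold=3,
                 reset_after=5.0, clock=time.monotonic):
        self._permits = threading.Semaphore(max_concurrent)
        self._failure_threshold = failure_threshold
        self._reset_after = reset_after  # seconds the circuit stays open
        self._clock = clock              # injectable so tests can control time
        self._lock = threading.Lock()
        self._failures = 0               # consecutive failures seen so far
        self._opened_at = None           # None means the circuit is closed

    def _circuit_open(self):
        with self._lock:
            if self._opened_at is None:
                return False
            if self._clock() - self._opened_at >= self._reset_after:
                # Cool-down elapsed: close the circuit and allow calls again.
                self._opened_at = None
                self._failures = 0
                return False
            return True

    def call(self, fn, *args):
        if self._circuit_open():
            raise Rejected("circuit open")
        # Non-blocking acquire: shed load rather than queue behind a slow dependency.
        if not self._permits.acquire(blocking=False):
            raise Rejected("too many concurrent calls")
        try:
            result = fn(*args)
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self._failure_threshold:
                    self._opened_at = self._clock()  # trip the breaker
            raise
        else:
            with self._lock:
                self._failures = 0  # a success resets the failure streak
            return result
        finally:
            self._permits.release()
```

The key design choice is that both guards fail fast: a caller either gets a permit immediately or is rejected, so a struggling dependency cannot tie up every caller's thread. A production breaker would usually add a half-open state that lets a single trial call through before fully closing; that is omitted here for brevity.
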
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Thursday
Mar 01, 2012

Grace Hopper to Programmers: Mind Your Nanoseconds!

Computing pioneer Grace Hopper, inventor of the compiler, searched for a concrete way to create an intuitive understanding of just how short a nanosecond is: a billionth of a second, which was the speed of their new computer circuits. As an illustration she settled on a length of wire as long as the distance light can travel in one nanosecond. The length is a very portable 11.8 inches. A microsecond's worth of wire is a still portable, but much bulkier, 984 feet. In one millisecond light travels 186 miles, which only Hercules could carry. In today's terms, at a 3.06 GHz clock speed, there's about .33 nanoseconds between ticks, or roughly 3.9 inches of light travel.
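
The wire lengths follow directly from the speed of light, and a short Python sketch can reproduce them. The function names here are made up for illustration; the constants are the defined value of the speed of light in a vacuum and the exact inch-to-metre conversion.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METERS_PER_INCH = 0.0254              # exact
INCHES_PER_FOOT = 12
FEET_PER_MILE = 5280


def light_travel_inches(seconds):
    """Distance light travels in a vacuum in `seconds`, in inches."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / METERS_PER_INCH


def clock_period_seconds(ghz):
    """Time between ticks for a clock running at `ghz` gigahertz."""
    return 1.0 / (ghz * 1e9)


if __name__ == "__main__":
    nanosecond = light_travel_inches(1e-9)
    microsecond = light_travel_inches(1e-6) / INCHES_PER_FOOT
    millisecond = light_travel_inches(1e-3) / INCHES_PER_FOOT / FEET_PER_MILE
    print(f"1 ns: {nanosecond:.1f} inches")   # about 11.8 inches
    print(f"1 us: {microsecond:.0f} feet")    # about 984 feet
    print(f"1 ms: {millisecond:.0f} miles")   # about 186 miles
    # Light travel per tick for any clock speed you care to plug in.
    tick = clock_period_seconds(3.06)
    print(f"3.06 GHz tick: {tick * 1e9:.2f} ns, {light_travel_inches(tick):.2f} inches")
```

Note that signals in real wire travel slower than light in a vacuum, so these figures are upper bounds on how far a signal gets in the given time.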

Understanding the profligate ways of programmers, she suggests that every programmer wear a necklace of a microsecond's worth of wire so they know what they are wasting when they throw away microseconds. And if a General is busting your chops about satellite messages taking too long to send, you can bust out your piece of wire and explain there are a lot of nanoseconds between here and there.

Here's a short, witty, and wise video of her famous nanosecond demonstration. An amazing lady, great innovator, an engaging speaker, and an inspiring teacher.

Click to read more ...

Wednesday
Feb 29, 2012

Strategy: Put Mobile Video Into Cold Storage After 30 Days

Limelight says 95% of mobile video views take place in the first 90 days, and 88.8% of views take place in the first 30 days.

A lot of people are working with video, which is expensive to store and serve. The implication: there's little need to keep your video close to the user or on a CDN after 30 days.
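
As a rough illustration of the strategy, an age-based tiering rule might look like the Python sketch below. The function name and tier labels are hypothetical; the 30-day threshold comes from the post's title and should be tuned against your own view data.

```python
from datetime import date

HOT_DAYS = 30  # assumed cut-off, taken from the post's title


def storage_tier(published, today, hot_days=HOT_DAYS):
    """Return "cdn" while a video is within its hot window, else "cold"."""
    age_days = (today - published).days
    return "cdn" if age_days <= hot_days else "cold"


if __name__ == "__main__":
    today = date(2012, 2, 29)
    for published in (date(2012, 2, 20), date(2012, 1, 1)):
        print(published, "->", storage_tier(published, today))
```

A real policy would likely also look at recent view counts, so that an old video that suddenly becomes popular can be promoted back to the hot tier.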


Tuesday
Feb 28, 2012

Sponsored Post: Oracle, Percona Live, AiCache, ElasticHosts, Red 5 Studios, Logic Monitor, New Relic, AppDynamics, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

  • Red 5 Studios. Wanted: DBAs and Programmers interested in MySQL scalability and replication. If interested, please see us here.

Fun and Informative Events

  • The Percona Live MySQL Conference & Expo features 60+ speakers, 72 breakout sessions, and keynotes from HP, Facebook, Box, Eucalyptus Systems, and more. April 10-12 in Santa Clara.

Cool Products and Services

  • Join the MySQL experts from Oracle to learn Oracle's strategy for MySQL as well as the latest product development and features. Learn more and register now!
  • aiCache creates a better user experience by increasing the speed, scale, and stability of your website.
  • ElasticHosts' award-winning cloud server hosting launches across North America, adding data centers in Los Angeles and Toronto. Free trial.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • New Relic - Real user monitoring: optimize for humans, not bots. Live application stats, SQL/NoSQL performance, web transactions, proactive notifications. Take 2 minutes to sign up for a free trial.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • CloudSigma. Utility-style, high-performance cloud servers in the US and Europe, delivered on all-10GigE networking. Run any OS, take advantage of SSD storage and tailored infrastructure options.
  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com: Monitor End User Experience from a global monitoring network.

For a longer description of each sponsor, please read more below...

Click to read more ...

Monday
Feb 27, 2012

Zen and the Art of Scaling - A Koan and Epigram Approach

This is a guest post derived from an email conversation with Russell Sullivan, a computer architect and creator of Alchemy Database, A Hybrid RDBMS/NOSQL-Datastore.

Russell (AKA Jak Sprats) has been pondering, considering, and implementing distributed databases for many years. In a recent email conversation he shared 44 of the lessons he has learned from developing the infrastructure for high performance / highly scalable systems. Some are well known, some are debatable, and some obviously result from a deep experience that is worth learning from:

  1. There are maybe 20 classic bottlenecks (CPU, NIC overload, memory fragmentation, disk seeks, swap, thread deadlock, packet loss, etc.). Have a basic understanding of them, because each is a dark tunnel, and you need a specialised flashlight for each.

Click to read more ...