We Finally Cracked the 10K Problem - This Time for Managing Servers with 2000x Servers Managed Per Sysadmin
Tuesday, November 19, 2013 at 8:54AM
HighScalability Team in DevOps

In 1999 Dan Kegel issued a big hairy audacious challenge to web servers:

It's time for web servers to handle ten thousand clients simultaneously, don't you think? After all, the web is a big place now.

This became known as the C10K problem. Engineers solved the C10K scalability problems by fixing OS kernels and moving away from threaded servers like Apache to event-driven servers like Nginx and Node.

Today we are considering an even bigger goal, how to support 10 Million Concurrent Connections, which requires even more radical techniques.

No similar challenge was issued for managing servers in a datacenter, but according to Dave Neary from Red Hat, in a recent FLOSS Weekly episode, we have passed the 10K barrier for server management with 10,000 or more servers managed per sysadmin.

Should we let this milestone pass without mention?

Absolutely not! It’s a stunning accomplishment with 200x-2000x increases in productivity. Dave said he remembered in the 1990s it took one sysadmin to manage 4 or 5 Windows servers. A Linux sysadmin could manage 50 to 60 servers.

Now companies are managing over 10,000 servers per sysadmin. This huge change is rooted both in IaaS, treating a datacenter as an elastic programmable resource, divorcing operations from infrastructure deployment, and in the DevOps revolution, with its emphasis on tools, culture, automation, metrics, sharing of resources, and infrastructure as code.

What will it take to manage 10 million servers per sysadmin?  

Who might know? Google of course.

As James Hamilton says, Counting Servers is Hard, but Microsoft says they have 1 million servers, and Google is planning for 10 million servers, so it may take a while before we can get to 10 million servers per sysadmin.

But when it does happen the base will be built on:

At a high level the approach of scaling to 10 million connections per server and scaling 10 million machines per sysadmin are the same: scalability is specialization.

But at lower level they differ completely. Scaling to 10 million connections is about removing layers and doing the work yourself. Scaling to 10 million servers is all about putting the intelligence into smarter and smarter layers. A lot like how human body utilizes trillions of individual components mediated by many autonomous systems all directed by a parallelized and decentralized brain.

Related Articles

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.