6 Ways Not to Scale that Will Make You Hip, Popular and Loved By VCs

This is a hilarious presentation by Josh Berkus, called Scale Fail, given at O'Reilly MySQL CE 2011. Josh is entertaining, well spoken, and cleverly hides insight inside chaos. And he makes some dang good points along the way.
Josh has a problem, you see. Josh has learned how to make sites that are both scalable and reliable. So he's puzzled why companies "whose downtime interfaces (Twitter) are more well known than their uptime interfaces" get all the attention, respect, and money for being failures. Just doing your job doesn't make you a hero. You need these self-inflicted wounds in order to have the war stories to share at conferences. They get the attention. Just doing your job is boring. This is unfair in the way that life can be.
So if you want to turn the tables and take the low road to fame and fortune, here's Josh's program for learning how not to scale:
- Be trendy. Use the tool that has the most buzz: NoSQL, Cloud, MapReduce, Rails, RabbitMQ. It helps you not scale and the VCs like it. Use Reddit to decide what tool to use. Whatever is getting the most points this week is what you should use.
- Troubleshoot after the barn door has closed. Math is not sexy. Statistics are not sexy. Forget resource monitoring, performance testing, traffic monitoring, load testing, tuning analysis. They are all boring. Be more intuitive. Let history be your guide. Whatever problems you had on your last job are the ones you'll have on this job.
- Don't worry about it. Parallel programming is not sexy. Erlang can parallel program a 1,000-node cluster, but it's not sexy. Be hot and shoot from the hip. Ignore details about memory and management. Don't worry about it. Use single-threaded programming and lots of locks, ignore scope and memory contexts, have frequently updated single-row tables, and have a single master queue that controls everything. Blocking threads are your friend.
- Hit the database with every operation. Caching is not your friend. Every single query should go directly to the database. Ignore caches completely.
- Scale the impossible things. Scaling easy things is for wimps. There's no hotness there. There are no speaking engagements in scaling web servers, caches, shared-nothing hosts and simple app servers. Scaling the impossible things is where the hotness is: transactions, queues, shared file systems, web frameworks. This is how you can have the long nights and weekends and the war stories that will get you up on stage.
- Create single points of failure. No matter how large your software is, you must have a couple of places where a single point of failure will bring down your entire infrastructure. Like a single load balancer, or a single queue, or load balancers that run at 100% capacity.
Josh says that after following this program you'll have learned how not to scale and become the big macho guy on stage.
If you've worked as a programmer for any period of time, this analysis strikes home. What we know about human nature is that to be a hero you must overcome great odds. If your code always works and never stops a release with a sev one bug, then paradoxically, you will not be an organizational hero. You will just be reliable Good Old Joe. The hero is the programmer who stayed up 7 days straight, mainlining Red Bull, looking all bleary-eyed, to get that last check-in in just before the deadline. All this to fix a problem they had in fact created from the start.
I may add that here at HighScalability.com we would love your story on how you keep your site up. Tell us your Hero story.
Reader Comments (9)
I would have found this funnier if it were a little less true! At one of the leading Silicon Valley virtualization companies where I worked, I experienced all this first-hand. My clueless manager was fixated on the number of bugs fixed by each developer (s/he never understood the causal relationship between bug creation and bug fixing), and the even more clueless manager's manager went around repeating that number as proof of how well his team was performing. I agree: war stories are all too sexy and important, while solid and dependable performance is dull and not worth anything.
Whatever you do, don't wory about copy editing or spell checking. It's not sext.
Old problem. Sardonic new twist. I've spent the last ten years tuning major n-tier and other complex integrated platforms and networks. I wish there was a cure for stupidity, but usually people need to lose jobs, money, prestige or a lot of sleep, blood, sweat and tears.
I've even run into the same people pushing the same crappy, unreliable and non-scalable solutions to major e-commerce companies over the years. You may think it's funny, but there is nothing funny about 14 transactions per second on $10,000,000 platforms.
This has a large element of 'won't someone please pat me on the back, my arm is getting tired!' to it. But leaving that aside, it also makes assumptions about companies that are struggling with scalability that are incorrect in the vast majority of cases, and that show a really interesting bias on the part of the author.
What are these assumptions? Well, there are several, but they all broadly boil down to one thing: 'of COURSE everyone with scaling problems has an unlimited money supply!' When in fact most companies working on these problems have maybe five or six engineers, of whom approximately zero are experienced in serious scaling issues, and cannot afford to spend the requisite amount to recruit someone away from Google to join their team.
I mean, really. 'Erlang can scale to 1000 nodes!' Sure. You just toss the typical engineer with 7 years' experience in Java, C++, or (God help us) Perl into the Erlang pit and see what comes out the other end... and how long before it does. And in the meantime the original code base has one fewer programmer to patch things together in the short term.
The job I had that had scaling problems, we had six developers, two of whom were fresh out of college and only one of whom had more than ten years' development experience. No one was expecting 100 hits per second, let alone what we got. Perhaps we should have started interviewing experienced scalability experts at that point? Or maybe the engineers should have started reading up on Erlang in their copious free time?
Or consider Reddit, victim of one of those little snide remarks in here. At one point, their traffic doubled in the course of a few weeks. If I recall correctly, they had at that time a total of three engineers to maintain and expand a web site which is probably among the 100 (certainly 200) busiest web sites in the world, and probably the only one on that list unable to use normal caching for any but the most trivial uses (since vote and comment counts must be kept up to date with at least reasonable (certainly 'within a few seconds') timeliness, and every logged-in user's front page looks different (due to subscriptions)).
This is ivory tower crap at its finest: 'always solve the problem the right way' is all well and good for companies with exactly the right expertise, and all the money and time they need. For the other 99%, well, we will continue to do what we can, as we can. But if you want to come rearchitect our systems for us for free, great! We'll use the time to brush up on our Erlang, we promise.
And whatever you do, let low-grade sarcasm be your guide. Everyone loves that.
Sounds like the last application I worked on
Nice attitude Fnordy... sounds like this might have hit a bit close to home, eh? Time to take responsibility for your own failures maybe? LOL. :-)
How is RabbitMQ hip and trendy but not Erlang (which RabbitMQ runs on)?
The Erlang part is referring to a certain website, famous among us. They didn't like Erlang for some reason and started rebuilding their website in another new language.
Not Reddit btw.
Another item for the program: blame your scalability problems on your languages, and not on your crappy code.