Stuff The Internet Says On Scalability For May 25, 2012
Friday, May 25, 2012 at 9:15AM
It's HighScalability Time:
- 15 million 64 GB iPods every day: IBM on the Big Bang; Billions of msgs/day, 75Bn ops/day, 1.5M ops/sec peak. 250TB new data/mo and growing: Facebook HBase stats; 72 hours of video uploaded every minute: YouTube
- Quotable Quotes:
- @ammeep: Great discussion on data scalability and caching approaches at #yellowcamp "a cache is like your best mate you can never trust"
- @adrianco: Why do OpenStack talks I go to always focus on the story of how they did it rather than how it works and what it does for me? #Gluecon.
- Cliff Moon: Incuriosity killed the infrastructure.
- @seldo2: "NoSQL is a land grab against the DBAs that prevent you from doing shit" - @monkchips #gluecon
- @mjasay: Netflix's @atseitlin: Build for fail, not for scale. Focus on service-level metrics, not individual servers, when monitoring system health
- @peakscale: I liked the part about "graph DBs are a niche data model... that only covers 95% of situations" #gluecon
- joshfraser: I personally prefer using the 95th or 99th percentiles because they force you to keep the mindset that you need to be fast for everyone
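joshfraser's point can be made concrete with a small sketch (the numbers are hypothetical): a handful of slow requests barely moves the mean, but shows up immediately at the 99th percentile.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p% of n) in sorted order."""
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[k]

# 95 fast requests (100 ms) and 5 slow ones (2000 ms)
latencies = [100] * 95 + [2000] * 5

mean = sum(latencies) / len(latencies)  # 195 ms -- looks fine
p95 = percentile(latencies, 95)         # 100 ms -- still hides the slow tail
p99 = percentile(latencies, 99)         # 2000 ms -- the 5% of users having a bad day
```

The mean says the service is healthy; the p99 says one user in twenty is waiting two seconds.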
- In the best laid plans of mice and men department: Nasdaq's Facebook Glitch Came From Race Conditions. Seems as if all the bidding action starved the price calculator of enough time to do the actual calculation.
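For readers who haven't hit one, a race condition is an unsynchronized read-modify-write interleaving. This tiny sketch (not Nasdaq's actual code) simulates the classic "lost update" deterministically, with the two threads' steps written out in order:

```python
# Simulate two threads incrementing a shared counter without synchronization.
# Each "thread" does read -> compute -> write; interleaving loses an update.
counter = 0

t1_read = counter       # thread 1 reads 0
t2_read = counter       # thread 2 reads 0 (before thread 1 writes back)
counter = t1_read + 1   # thread 1 writes 1
counter = t2_read + 1   # thread 2 writes 1, clobbering thread 1's update

assert counter == 1     # two increments ran, but the result is 1, not 2
```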
- How We Serve Faster Maps from MapBox. Will White with an excellent post on how their custom mapping platform scales quickly using Node.js, Backbone.js, Nginx, Puppet, Jekyll, EC2, CouchDB, CloudWatch, Loggly, PagerDuty, EBS, GitHub, and Chargify.
- Obviously adding on compute on demand is a lot easier than adding database capacity on demand. So don't wait until the last minute. When your shards approach X% full then add capacity. That's what Pinterest does.
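A minimal sketch of that policy: watch shard fullness and flag shards for new capacity before any of them hits its limit. The 80% threshold below is an illustrative assumption, not Pinterest's actual figure.

```python
def shards_over_threshold(shard_fill, threshold=0.80):
    """Return shards at or above the fill threshold.

    shard_fill maps shard name -> fraction of capacity used (0.0 to 1.0).
    threshold is a hypothetical default; tune it to how long adding
    capacity actually takes in your environment.
    """
    return sorted(s for s, fill in shard_fill.items() if fill >= threshold)

needs_capacity = shards_over_threshold(
    {"db001": 0.45, "db002": 0.83, "db003": 0.91}
)
# -> ["db002", "db003"]
```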
- Yes, a lot of companies use Python in their projects. Need that soothing feeling of confirmation? Here's a list.
- Event-Sourced Architectures by Martin Thompson at QConLondon 2012. Matthew Skelton with a great summary of a talk given by Martin Thompson on event-sourced architectures, and why event-driven, state-machine designs are the way forward for complex, multi-path software systems. I couldn't agree more.
- Good Google Groups thread on bulk uploading data from S3 into HBase. Storing the data in HBase from the start seems the best idea. Or use Hive over S3. Also, HBase Schema Design.
- Caching in Search. Hugh Williams explains that caching works really well in search because we pretty much all ask the same questions. Storing the results for the top 100,000 queries saves the search backend from processing 34% of all queries.
- OpenFlow @ Google: Brilliant, but not revolutionary. Ivan Pepelnjak with a stirring work of fiction telling the tale of how Google is building their own networking infrastructure out of software and off-the-shelf chips. Keep in mind that for Google the datacenter is the computer, so building custom everything has a huge payback in reducing latency variance.
- Multithreading Problems In Game Design. Erik McClure advises: don't try to multithread your games. It isn't worth it. Separately parallelize each individual component instead and write your game engine single-threaded; only use additional threads for asynchronous activities like resource loading. Good discussion on Reddit. Threads, well, they are tricky.
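McClure's recommended pattern, a single-threaded game loop that hands resource loading to one background thread via queues, can be sketched roughly like this (the file name and loader body are illustrative):

```python
import threading
import queue

# Requests flow in one direction, results flow back; the game loop
# itself stays single-threaded and never blocks on loading.
load_requests = queue.Queue()
loaded_resources = queue.Queue()

def loader():
    """Background thread: turn resource names into loaded data."""
    while True:
        name = load_requests.get()
        if name is None:          # sentinel: shut the loader down
            break
        data = f"<contents of {name}>"  # stand-in for slow disk/network I/O
        loaded_resources.put((name, data))

loader_thread = threading.Thread(target=loader, daemon=True)
loader_thread.start()

load_requests.put("level1.map")   # game loop fires off an async load...
# ...and keeps simulating/rendering while the loader works.
name, data = loaded_resources.get()  # pick up the result when ready

load_requests.put(None)           # tell the loader to exit
loader_thread.join()
```

All game state stays owned by one thread; the queues are the only shared surface, which sidesteps most of the trickiness the post warns about.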
- Judging by all the #gluecon tweets they all must have had a good time in the great state of Colorado. An under dressed Graeme Thickins has the live blog.
- Book recommendations from @jamesurquhart: Diversity and Complexity (Primers in Complex Systems) and Complexity: A Guided Tour
- What’s hot in APIs? John Musser with an energizing list of 10 of the hottest trends in open APIs today: VC funding, API growth rate, REST, JSON, API billionaires and trillionaires, API as product, hackathons, monetizing APIs, invisible mashups.
- Phillip Tellis with a deep dive on The Statistics of Web Performance. Covers bandwidth, latency, page load times, and the statistics used to quantify performance.
- Scale and scalability: Scalability is the ability to grow your datacenter without growing the amount of work, time, or money needed to support it. I mentioned some of the main factors here as a teaser. I challenge readers to look at their own systems with a critical eye: what does it cost you as you add more IT?
- An interesting explanation of how to create a Redundant Array of Independent Clouds.
- Peeking Under the Hood. John Sloan with a broad history of synchronization, test-and-set, CAS, and the troubles with achieving certainty in a multicore world. Really good stuff.
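For readers new to CAS, the primitive is "set the value to `new` only if it still equals `expected`, atomically." Python has no user-level atomic CAS instruction, so in this sketch a lock emulates the hardware atomicity; the retry loop is the part that matters, since it's the backbone of lock-free algorithms.

```python
import threading

class AtomicInt:
    """An integer with compare-and-swap; a lock stands in for hardware CAS."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Atomically: if value == expected, set it to new. Return success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def increment(counter, times):
    for _ in range(times):
        while True:                    # classic CAS retry loop
            old = counter.get()
            if counter.compare_and_swap(old, old + 1):
                break                  # our update won; otherwise retry

counter = AtomicInt()
threads = [threading.Thread(target=increment, args=(counter, 1000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Four threads, 1000 increments each: no update is lost.
```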
- I don't think this is an April fools joke: Resistance is futile? Memristor RAM now cheap as chips - UCL breakthrough after team toyed with LEDs
- Great writeup on server scalability by Scott Miller: CPUs, Cores and Threads: How Many Processors Do I Have? Knowing how many processors you actually have is not so easy. Excellent explanation of the terminology and the concepts behind microprocessor marketing.
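You can see part of the ambiguity from a script: the OS reports logical CPUs (hardware threads), so a 4-core chip with Hyper-Threading typically shows 8. Counting physical cores or packages needs platform-specific tools (e.g. `/proc/cpuinfo` on Linux), which this sketch deliberately avoids.

```python
import os

# os.cpu_count() returns the number of logical CPUs the OS exposes:
# hardware threads, not physical cores and not chips ("packages").
logical = os.cpu_count()
print(f"logical CPUs (hardware threads): {logical}")
```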
- Urs Holzle @ Open Networking Summit 2012. James Hamilton with an exquisitely detailed summary of the happenings at the Open Networking Summit. The key observations behind SDN: 1) if the entire system is under single administrative control, central routing control is possible, 2) at the scale of a single administrative domain, central control of network routing decisions is practical, and 3) central routing control brings many advantages, including faster convergence on failure, priority-based routing decisions when resource constrained, and application-aware routing, and it lets the same software system that manages application deployment manage network configuration.
- Cloud computing Walmart effect: Call it the "Walmart effect" of the cloud. Gone are your system admins, routing professionals, and DBAs, replaced by cloud compute and cloud databases. All that talent you could have on your team is deemed marginal and useless with phrases like "cost of sysadmin"; instead you rely on white papers written by people with self-serving motives as your source of truth. Inferior products, like Amazon's high-latency network, can push out smaller, better, faster network providers. Why even write innovative code? You can't compete with Amazon DynamoDB, or whatever Amazon's white papers say is the solution to your problem.
Reader Comments (2)
Thanks again for the shout out. I write the stuff I write because it interests me, and I routinely use my own blog as a reference. That's really what my blog and all the software projects on my web site are about: references and reference implementations primarily for my own use. I'm always a little pleasantly surprised when others find my stuff interesting or useful.
Thanks for the pointer to Scott Miller's write up on "How many processors do I have?". My only nit is that as an old guy, I find his definition of "processor" -- which is basically the physical chip -- a little bothersome. He's spot on in that's how the typical consumer at the local computer store thinks of it. But us old folks (I'm basically a brain in a jar) tend to think "processor" and "CPU" are the same thing, or maybe "processor" is the CPU with its I/O surround. I think I'd use the term "package" instead. Miller's remarks on the difficulty of figuring out what the heck you really have are also spot on. You almost need to look at the chip schematic and data sheet to figure out what actual functional units you have that may work in parallel (e.g. floating point units, arithmetic units, etc.).
Thanks again.
"Nasdaq characterized the problem as a race condition. A race condition occurs when two or more parts of a program that rely on each other get locked in an infinite loop, halting forward progress of the program as a whole." ... isn't that actually rather a livelock?