Wednesday, October 7, 2009

How to Avoid the Top 5 Scale-Out Pitfalls

Scale-out means adding servers incrementally as needed, rather than buying larger servers. Here's the MySQL idea of what a scale-out architecture looks like:


This MySQL article lists 5 problems to avoid when scaling out:
  1. Don't Think Synchronously. Introduce asynchronous communication, parallelization, and strategies to deal with approximate or slightly outdated data.
  2. Don't Think Vertically. Scaling up with bigger machines won't work. Plan for horizontal scaling and asynchronous architectures from the start, which makes it easy to add capacity on demand.
  3. Don't Mix Transactions with Business Intelligence. Transactions and analytics are inherently different. Separate out different types of data onto different databases.
  4. Avoid Mixing Hot and Cold Data. Static and fast-changing data are inherently different. Separate out different types of data onto different databases.
  5. Don't Forget the Power of Memory.  Make data accessible in RAM by smartly partitioning data across servers.
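
As a rough illustration of point 5, here is a minimal sketch of partitioning keys across several in-memory stores so that each server only has to hold a slice of the data in RAM. Nothing here comes from the MySQL article: the `PartitionedCache` class, the server names, and the hash-modulo placement rule are all assumptions made for the example, and plain dicts stand in for real cache servers.

```python
import hashlib


class PartitionedCache:
    """Toy in-memory store that spreads keys across several 'servers'.

    Hypothetical example: each server is just a dict in this process.
    """

    def __init__(self, server_names):
        self.server_names = list(server_names)
        self.stores = {name: {} for name in self.server_names}

    def server_for(self, key):
        # Hash the key so placement is stable across runs and processes,
        # then map the hash onto one of the servers with a modulo.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.server_names[int(digest, 16) % len(self.server_names)]

    def put(self, key, value):
        self.stores[self.server_for(key)][key] = value

    def get(self, key, default=None):
        return self.stores[self.server_for(key)].get(key, default)


# Assumed server names, for illustration only.
cache = PartitionedCache(["cache-a", "cache-b", "cache-c"])
for user_id in range(1000):
    cache.put(f"user:{user_id}", {"id": user_id})

# Each server ends up holding only part of the data set.
sizes = {name: len(store) for name, store in cache.stores.items()}
print(sizes)
```

One caveat on the design: with simple modulo placement, changing the number of servers remaps most keys, which is why consistent hashing is a common alternative when servers are added on demand.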

More information at Scale-Out & Replication Best Practices for High-Growth Businesses.

Reader Comments (3)

They left out a couple of other important pitfalls:

(1) *No* single-master solution is adequate for real scale.

(2) Strong consistency conflicts with high scale.

(3) Cross-product operations (e.g. SQL joins) require exponential computation and/or communication, and don't scale.

In high-scale applications, preference should be given to fully distributed solutions where failure of a component (even with failover) impacts only some of the data or some of the users (e.g. sharding), where requests aren't serialized by questionably-necessary consistency maintenance (e.g. non-ACID databases), and where any exponential-time operations are carefully avoided.

Of course, it's entirely predictable that Oracle/Sun/MySQL would omit these, given the nature of their non-solutions. For people who used to say "the network is the computer" they sure don't Get It when it comes to the modern grid/cloud/whatever.

October 9, 2009 | Unregistered Commenter Jeff Darcy

May I also suggest one problem to solve when scaling out--though it's one that's often overlooked by us as developers.

Some of the administrative tasks that must be undertaken for each machine in an application stack (such as regular system patching) grow linearly as the stack scales out. However, sysadmin resources don't often grow at the same rate, so it takes longer to maintain system consistency across the stack as it scales out.

October 10, 2009 | Unregistered Commenter Ted Walpole

"Don't think vertically?"

Really? There are many thousands of OLTP applications that need scaling upwards. Scaling is not just about Web2.0/3.0 trends. There are more applications that require scaling that aren't DB-sharded or async-processed, with codebases, customers, and customizations so large and complex that one cannot simply change them "just like that."

Scaling vertically is a challenge unto itself. I led a cluster of 112+ nodes against a grid'd DB that pulled local flat data, processed it, then pushed the results back up. There were many billions of records/events to process, requiring lots of CPU in aggregate. We bought tons of blade servers and never took the network or the management overhead into consideration, so the CPUs were pegged/idle/pegged/idle and average utilization was not that good.

After tuning down to get 24 hours of data processed in 26 hours, we bit the bullet and purchased a big IBM pSeries, threw Oracle Enterprise on it and went at it the old-fashioned way. 24 hours of data processed in 6. Sometimes we overthink this stuff. Sure, that AIX and Oracle license and the pSeries cost a pretty $, but the 100s of man-months spent doing it the other way cost even more.

October 13, 2009 | Unregistered Commenter Xailor
