Five Ways to Stop Framework Fixation from Crashing Your Scaling Strategy


Customer:
- Name
- Country

Product:
- Code
- Name
- Description

Purchases:
- Reference to Product Entity
- Reference to Customer Entity
- Date of Order

Anyone from a relational background would look at this schema and give it a big thumbs up. With a little effort we can imagine the original physical purchase order that has now been normalized into three different tables. To recreate the original purchase order, a join on purchases, product, and customer is needed. Read speed is not optimized; safety is optimized. Here's what the same schema looks like optimized for reading:
Purchase:
- Customer Name
- Customer Country
- Product Code
- Product Name
- Purchase Order Number
- Date of Order

The three original tables have been folded into one entity. Now a purchase order can be read in one get operation; no join is necessary. Notice how the entity looks more like an original purchase order. It is also what would probably be cached, and what our in-memory model would probably look like. But what if you want to update a product name or a customer name? Those attributes are duplicated across all entities. Here's where the protection offered by the relational model comes in: only one entity needs updating in a normalized model. In BigTable you have to remember everywhere a customer name and product name appear and change every instance to the new values. It's not a simple, safe, or reliable approach, but it does optimize for read speed and scalability. For an application with a high proportion of updates to reads this approach wouldn't make sense. But on the web reads usually dominate. How often do you really change a customer name or a product name? Seldom. How often do you read them? All the time. Designing to scale for reads and taking the pain on writes takes some getting used to. It's a massive change from standard relational tactics, but this is what it takes to scale web applications, even if it feels a little strange at first.
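A minimal sketch of the trade-off, assuming a hypothetical key-value store with simple get/put/query operations (the API and names here are illustrative, not from any real BigTable client):

```python
# Hypothetical key-value store API: get(key), put(key, value),
# query(field, value) -> iterable of (key, entity) pairs.

# Reading a purchase is a single get -- no joins, fast and cache-friendly.
def read_purchase(datastore, order_number):
    return datastore.get(("Purchase", order_number))

# Renaming a customer is the painful part: every duplicated copy of the
# name must be found and rewritten (a "fan-out" update across entities).
def rename_customer(datastore, old_name, new_name):
    for key, purchase in datastore.query("customer_name", old_name):
        purchase["customer_name"] = new_name
        datastore.put(key, purchase)
```

The asymmetry is the whole point: the rare operation (a rename) does many writes so that the common operation (a read) does exactly one.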
Om proposes one solution to the Twitter Problem: limit followers to three square meals a day. The reasonable idea being that lower limits should mean fewer scaling problems. And as a kicker, raising those limits is a good way to raise much needed revenue. Scoble thinks users should consume without limit and will drive to another buffet if all-you-can-eat privileges are revoked. The reasonable idea being that if an internet service can't solve internet-scale problems then there's not much use for it. Dave says comp power users a top floor suite and shower them with free passes to the buffet. Let the good times roll! The reasonable idea being that power users help create popular restaurants, er, services in the first place; limiting them starves users, and starved users won't come back.

So, should web services like Twitter be a buffet, a fixed eight-course fine dining experience, a small plate restaurant, a family style joint, or a vending machine? Or something else entirely?

In a distant, barely remembered past I actually worked at an all-you-can-eat buffet. The food was very good and most customers didn't overindulge. If most did, the place wouldn't stay in business long. But some customers did. They were called stackers, so named because a large stack of plates would pile up on their table throughout the meal. Stackers followed a power law distribution. Few customers at any one time were stackers, but their effect could be devastating. How devastating depended on their favorite foods...

A stacker who loved potato salad was manageable. We had plenty of potato salad and it was cheap and quick to make. No problem. Stacking itself was not frowned upon and never discouraged. It's an all-you-can-eat buffet, after all! But if a stacker's favorite food was roast beef, that was trouble. Not only is roast beef expensive, it comes in a limited supply because it has to be prepared ahead of time. Once you ran out there was no more roast beef for the rest of the night. Good roast beef takes hours to prepare; it must be planned for. Management's job was to carefully balance projected demand against waste. The goal was to prepare enough meat to meet demand, yet not have a lot of leftovers. Stackers blow apart the finely balanced calculation of how much roast beef to make, and the carving station is left trying to push the ham while apologizing for an embarrassing lack of roast beef. An ugly, ugly scene. As a carver you are armed with a long scary-looking knife and shielded by a Medieval-looking chain-mail glove, but hungry customers are mean and fast. You never see it coming.

Unfortunately the distribution of stackers on any given night is unpredictable. You can't always cook the maximum amount of meat or you'll go broke. And if you make too little everyone is unhappy. It needs to be just right. As a person with serious stacker tendencies I try to remember the cost of things and keep a reasonable balance. The only way to make Goldilocks happy and have just the right balance is to place limits. Eventually the restaurant had to limit the number of trips to the roast beef station to three a meal. Enough that you get value for your dollar, but not so much that the restaurant goes under. Everyone happy? Of course not. The world doesn't work like that. It's all-you-can-eat, some would say, so I should be able to eat all I can eat! But there are always limits. Would it be fair to back a truck up to the restaurant and start loading up because that's part of your meal? No.
Is it fair to stuff your backpack with food on the way out? No. So there are always limits. The question is: what are fair limits?

It has been said FriendFeed has no problems handling 10,000 friends, so neither should Twitter. Now, let's imagine I spun up 1000 EC2 servers whose only task was to add more friends to FriendFeed. Would FriendFeed limit me then? Of course. It's basic web site self-defense, a right guaranteed under the constitution and long recognized by the courts in certain situations. But still, what are fair limits? How much roast beef should you be able to eat? Limit setting is a strategy we've talked about many times as a way of protecting sites from complete devastation. My favorite example is Mailinator, whose prime directive is surviving attacks; they've deployed many clever practices in their own defense. And most every large web site on earth is busy watching your every move so they can bounce you at the first sign of DDOS Armageddon. Limits aren't inherently bad. But limits don't make you scale; they simply stop you from unscaling. An adequate scalable infrastructure must still be put in place.

In the end I agree with Scoble in that the power of the internet is having interesting conversations with interesting people about interesting topics. For interesting conversations to happen you must be able to freely create relationships. If you or they have to pay for relationships they simply won't form. Would Google's PageRank algorithm work so well if it could only analyze paid relationships? A web formed under a paid relationship model would look totally different and be decidedly less valuable. Similarly, a social network that can't grow naturally through preferential attachment would have much less value. Scaling relationships is a core social network competency. Relationships should be subject to DDOS-type limits, but not limits artificially out of proportion with a user's internet audience. I doubt Twitter would disagree, but they are going through a tough time right now.

I also agree with Om. The Freemium model is a great idea, and linking it to site-protecting prophylactics is even better. But limiting a core competency may not be the right target. Fotolog is an example of a service that puts Freemium ideas to good use. They charge extra for adding more photos a day, more comments a day, custom profile abilities, and social status add-ons. What is the equivalent in Twitter? I don't know, but I would try to treat relationships more like potato salad than roast beef.

And I also agree with Dave. It's hard to get noticed on the web. Those who help you storm the attention barrier shouldn't be punished. They should be rewarded with a tasty, appropriately sized meal.
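A common way to implement the "DDOS-type limits" argued for above is a token bucket: generous to normal users, throttling only sustained abuse. A minimal sketch, with illustrative rate and burst numbers (nothing here is Twitter's or FriendFeed's actual policy):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens/second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# e.g. at most 5 new relationships/second sustained, bursts of 100:
follow_limiter = TokenBucket(rate=5, capacity=100)
```

A power user adding friends by hand never notices the limit; 1000 EC2 servers hammering the API drain the bucket in seconds.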
High Performance Multithreaded Access to Amazon SimpleDB is a great follow-up to the idea in How SimpleDB Differs from a RDBMS that more programming is the price paid for performance in SimpleDB. It shows how much work and infrastructure is required to wring better performance out of SimpleDB. Remember, in SimpleDB queries return keys to records, so if you want all the fields for those records you need to make separate requests. Since SimpleDB isn't exactly a speed demon, the obvious strategy is to parallelize. Even if a job takes 100 msecs you can get a lot done in a little time if you can execute enough jobs in parallel. Parallelization is the approach taken by Haakon@AWS in his Java code example of how to get the most out of SimpleDB. You can find the code at Indexing and Querying Amazon S3 Metadata with Amazon SimpleDB. We'll also consider how a back-end service architecture built on Erlang may be a better fit with cloud computing.

Two general mechanisms of parallelism are available: threads and boxes. To get the most bang out of a single machine you need threads (events, etc). To scale beyond the load handled by a single machine you need multiple boxes. The example code uses the Executor thread pool for parallelism within a program; thread pools are a pretty common idiom by now. Amazon's queue service SQS was used to distribute work amongst boxes. Work was queued to SQS in batches of 1000 work items. The items were pulled by the thread pool and processed. Why 1000? The idea is to balance processing overhead with work overhead. You don't want popping items off SQS to dominate your processing time, so you have to do enough work in each pass to make it worth the investment.

The architecture uses two thread pools: one to run queries and one to get record values. Applications must carefully tune the number of threads in each pool so the queries don't overwhelm the gets. Using a query thread pool with 2 threads and a get thread pool with 32 threads, it was possible to perform 300 TPS on a small EC2 instance.

Theoretically the advantage of this architecture is that it will scale to any size you need. SQS is your work distribution backbone and you just spin up the number of thread pool instances you need. The disadvantage is that this is a lot of programmer effort. Granted, if you had to do some serious processing on each record you would need something like this approach anyway to scale out the processing. But for simple aggregation operations it's total overkill, which is why more time needs to be spent on the write side of the equation in SimpleDB/BigTable than on the read side we are used to optimizing with a RDBMS.

What's the best way to go parallel? On the front-end life is simple: go shared nothing and compose your pages from scalable back-end services. This is how Amazon does it and it's how Google AppEngine does it. GAE completely punts on the back-end service layer architecture. Unfortunately we still need to create a back-end architecture for more complex applications. Thread pools and SQS is one parallelization approach. Instead of thread pools, something like Java's fork/join framework could be used. Initially I thought piling on more low-level primitive threading facilities into Java was the wrong way to go.
Yes, it is a "'multicore-friendly lightweight parallel framework' that supports a style of parallel programming where problems are recursively split into smaller fragments, solved in parallel and recombined," but it's also a style of programming that is very difficult to get right. If cloud architectures will rely on these primitives for efficiency then I think we have regressed. The Erlang-style architecture described by Luke Hoersten in Scalable Web Apps: Erlang + Python is a simpler, more reliable programming model. An event-driven, actor-based approach is much harder to screw up than closely cooperating threads in a shared memory space. Erlang originally ran in embedded systems where the requirement was to reliably squeeze the most work possible out of limited CPU and other compute resources. Oddly enough, the embedded node of old closely parallels your basic cloud VM. Start your workhorse Erlang (or other similar system) instances and let them efficiently chew through your workloads. Erlang's scheduling model fits perfectly with a service-centric job engine cloud instance. It will get more work done than your typical thread-based system ever would.
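The AWS example is Java, but the two-pool pattern is easy to sketch in Python with concurrent.futures. Everything here is illustrative: the in-process queue stands in for SQS, and the query/get functions are placeholders for real SimpleDB calls, not an actual client API.

```python
from concurrent.futures import ThreadPoolExecutor
import queue

sqs = queue.Queue()                               # stand-in for SQS
query_pool = ThreadPoolExecutor(max_workers=2)    # small pool: queries -> record keys
get_pool = ThreadPoolExecutor(max_workers=32)     # large pool: fetches full records

def get_record(key):
    return {"key": key}                           # placeholder for a GetAttributes call

def run_query(q):
    keys = [f"{q}-{i}" for i in range(25)]        # placeholder for a Query call
    # Fan each returned key out to the much larger get pool.
    return [get_pool.submit(get_record, k) for k in keys]

def process_batch(batch):
    query_futures = [query_pool.submit(run_query, q) for q in batch]
    for qf in query_futures:
        for gf in qf.result():
            record = gf.result()                  # full record, ready to process

sqs.put([f"query-{n}" for n in range(1000)])      # one 1000-item SQS batch
process_batch(sqs.get())
```

The 2/32 split encodes the tuning point from the article: each query fans out many gets, so the get pool must be far larger or queries will pile up work faster than it can be fetched.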
Update 2: A HSCALE benchmark finds HSCALE "adds a maximum overhead of about 0.24 ms per query (against a partitioned table)." Future releases promise much improved results. Update: A new presentation at An Introduction to HSCALE.

After writing Skype Plans for PostgreSQL to Scale to 1 Billion Users, which shows how Skype smartly uses a proxy architecture for scaling, I'm now seeing MySQL Proxy articles all over the place. It's like those "get rich quick" books that say all you have to do is visualize a giraffe with a big yellow dot superimposed over it and by sympathetic magic giraffes will suddenly stampede into your life. Without realizing it I must have visualized transparent proxies smothered in yellow dots. One of the brightest images is a wonderful series of articles by Peter Romianowski describing the evolution of their proxy architecture. Their application is an OLTP system executing 200 million transactions per month, with tables of more than 1.5 billion rows and a 600 GB total database size. They ran into a wall buying bigger boxes and wanted to move to a sharded architecture. The question for them was: how do you implement sharding? In the first article four approaches to sharding were identified:
Update: Arjen links to the video Supporting Scalable Online Statistical Processing, which shows "rather than doing complete aggregates, use statistical sampling to provide a reasonable estimate (unbiased guess) of the result."

When you have a lot of data, sampling allows you to draw conclusions from a much smaller amount of data. That's why sampling is a scalability solution: if you don't have to process all your data to get the information you need, you've made the problem smaller, you'll need fewer resources, and you'll get more timely results. Sampling is not useful when you need a complete list matching specific criteria. If you need to know the exact set of people who bought a car in the last week then sampling won't help. But if you want to know how many people bought a car, you can take a sample and create an estimate for the full data set. The difference is you won't really know the exact car count; instead you'll have a confidence interval saying how confident you are in your estimate. We generally like exact numbers, but if running a report takes an entire day because the data set is so large, taking a sample is an excellent way to scale.
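To make the estimate-plus-confidence-interval idea concrete, here is a minimal sketch using the standard normal approximation for a proportion. It assumes an in-memory sequence and ignores the finite-population correction; the car-buying predicate is just an example:

```python
import math
import random

def estimate_count(records, predicate, sample_size=10_000):
    """Estimate how many records match `predicate` from a random sample.
    Returns (estimate, margin) where margin is a rough 95% interval."""
    sample = random.sample(records, min(sample_size, len(records)))
    hits = sum(1 for r in sample if predicate(r))
    p = hits / len(sample)                         # sample proportion
    estimate = p * len(records)                    # scale up to the population
    # Normal-approximation margin of error, scaled to a count.
    margin = 1.96 * math.sqrt(p * (1 - p) / len(sample)) * len(records)
    return estimate, margin

# e.g. estimate, margin = estimate_count(purchases,
#          lambda r: r["item"] == "car" and r["days_ago"] <= 7)
# -> "about 12,400 cars, plus or minus 600" instead of a day-long full scan
```

Note how the work depends on the sample size, not the data size: the 10,000-record sample costs the same whether the table holds a million rows or a billion.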
It's a sad fact of life, but processes die. I know, it's horrible. You start them, send them out into process space, and hope for the best. Yet sometimes, despite your best coding, they core dump, seg fault, or some other calamity befalls them. Unlike our messy biological world so cruelly ruled by entropy, in the digital world processes can be given another chance. They can be restarted. A greater destiny awaits. And hopefully this time the random lottery of unforeseen killing factors will be avoided and a long productive life will be had by all. This is fun code to write because it's a lot more complicated than you might think. And restarting processes is a highly effective high availability strategy. Most faults are transient, caused by an unexpected series of events. Rather than taking drastic action, like taking a node out of production or failing over, transients can be effectively masked by simply restarting failed processes. Though complexity makes it a fun problem, it's also why you may want to "buy" rather than build. If you are in the market, Supervisor looks worth a visit. Adapted from their website: Supervisor is a Python program that allows you to start, stop, and restart other programs on UNIX systems. It can restart crashed processes.
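The complications alluded to above (crash loops, backoff, deciding when to give up) show up even in a minimal sketch. This is not Supervisor's implementation, just an illustrative restart loop; the thresholds are arbitrary:

```python
import subprocess
import time

def supervise(cmd, max_restarts=5, backoff=1.0):
    """Run `cmd`, restarting it when it dies. Gives up after repeated
    rapid failures rather than spinning in a restart loop forever."""
    restarts = 0
    while restarts <= max_restarts:
        started = time.monotonic()
        proc = subprocess.Popen(cmd)
        proc.wait()                        # block until the process exits
        if proc.returncode == 0:
            return                         # clean exit: nothing to mask
        if time.monotonic() - started > 60:
            restarts = 0                   # ran a while before dying: likely transient
        restarts += 1
        time.sleep(backoff * restarts)     # back off so a crash loop can't eat the CPU
    raise RuntimeError(f"{cmd!r} keeps crashing; escalating to a human")

# supervise(["python", "worker.py"])       # hypothetical worker process
```

Even this toy version has to answer the hard questions: was the exit clean or a crash, is the fault transient or permanent, and when does restarting stop being a fix and start hiding a real problem? That is exactly why "buy" is tempting here.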
Update: Nice explanation in The importance of bandwidth versus latency of how long latencies cause cascading delays in resource loading. Doloto tries to optimize how resources are loaded. Twenty new rules have been added to the original 14 rules for sizzling web performance. Part of scalability is worrying about performance too. The front-end is where 80-90% of end-user response time is spent and following these best practices improved the performance of Yahoo! properties by 25-50%. The rules are divided into server, content, cookie, JavaScript, CSS, images, and mobile categories. The new rules are:
Cal Henderson writes at thinkvitamin.com: "With our so-called 'Web 2.0' applications and their rich content and interaction, we expect our applications to increasingly make use of CSS and JavaScript. To make sure these applications are nice and snappy to use, we need to optimize the size and nature of content required to render the page, making sure we're delivering the optimum experience. In practice, this means a combination of making our content as small and fast to download as possible, while avoiding unnecessarily refetching unmodified resources." A lot of good comments too.
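One concrete way to "avoid unnecessarily refetching unmodified resources" is to serve static assets with far-future cache headers and put a version in the URL, so a changed file gets a new URL rather than a stale cache hit. A hedged sketch using Python's standard-library HTTP server (the path scheme and max-age are illustrative):

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CachingHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Versioned static assets (e.g. /static/app.v42.js) never change,
        # so browsers may cache them for a year without revalidating.
        if self.path.startswith("/static/"):
            self.send_header("Cache-Control", "public, max-age=31536000")
        super().end_headers()

# HTTPServer(("", 8000), CachingHandler).serve_forever()
```

Deploying a new app.js then means publishing app.v43.js and updating the page that references it; unchanged assets are never refetched.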
A thoughtful reader recently suggested creating a series of posts based on real-life problems people have experienced and the solutions they've created to slay the little beasties. It's a great idea. Often we learn best from great trials and tribulations. I'll start off the new "Problem Report" feature with a diabolical little problem I dubbed the "Mobbing the Least Used Resource Error." Please post your own. And if you know someone with an interesting problem report, please tag them too. It could be a lot of fun. Of course, feel free to scrub your posts of all embarrassing details, but be sure to keep the heroic parts in :-)