Thursday, August 16, 2007
Scaling Secret #2: Denormalizing Your Way to Speed and Profit

Alan Watts once observed that after we accepted Descartes' separation of the mind and body, we've been trying to smash them back together again ever since, when really they were never separate to begin with.
The database normalization-denormalization dualism has the same Möbius-shaped reverberations as Descartes' error. We separate data into a million jagged little pieces and then spend all our time stooping over, picking them up, and joining them back together again.
Normalization has been standard practice now for decades. But times are changing. Many mega-website architects are concluding Watts was right: the data was never separate to begin with. And even more radically, we may even need to store multiple copies of data.
We normalize data to prevent anomalies. Anomalies are bad things like forgetting to update someone's address in all the places it's been stored when they move. This anomaly happens because the address has been duplicated. So to prevent the anomaly we don't duplicate data. We split everything up so it is stored once and exactly once. Bad things are far less likely to happen if we follow this strategy. And that's a good thing.
The process of getting rid of all potential bad things is called normalization and we have a bunch of rules to follow to normalize our data. The price of normalization is that when we want a person's address we have to go find the person and their address in separate operations and bring the data together again. This is called a join.
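To make that price concrete, here's a minimal sketch in Python with sqlite3; the person and address tables are hypothetical, not taken from any of the sites discussed below.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE address (person_id INTEGER REFERENCES person(id),
                              street TEXT, city TEXT);
    """)
    db.execute("INSERT INTO person  VALUES (1, 'Alice')")
    db.execute("INSERT INTO address VALUES (1, '123 Main St', 'Springfield')")

    # The address is stored exactly once -- but reading it back requires a join.
    row = db.execute("""
        SELECT p.name, a.street, a.city
        FROM person p JOIN address a ON a.person_id = p.id
        WHERE p.id = 1
    """).fetchone()
    print(row)  # ('Alice', '123 Main St', 'Springfield')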
The problem is joins are relatively slow, especially over very large data sets, and if they are slow your website is slow. It takes a long time to get all those separate bits of information off disk and put them all together again. Flickr decided to denormalize because their workload ran 13 Selects for every Insert, Delete, or Update.
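For contrast, a sketch of the denormalized version of the same hypothetical data: the read is a single-row lookup with no join, but when Alice moves, every copy of her address has to be updated -- the very anomaly normalization exists to prevent.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE person_denorm
                  (id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT)""")
    db.execute("INSERT INTO person_denorm VALUES (1, 'Alice', '123 Main St', 'Springfield')")

    # One lookup, one contiguous row, no join.
    row = db.execute("SELECT name, street, city FROM person_denorm WHERE id = 1").fetchone()

    # The cost: a move must touch every table that carries a copy of the address.
    db.execute("UPDATE person_denorm SET street = ?, city = ? WHERE id = 1",
               ("742 Evergreen Terrace", "Springfield"))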
If you say your database is the bottleneck then the finger is pointed back at you and you are asked what you are doing wrong. Have you created proper indexes? Is your schema design good? Is your database efficient? Are you tuning your queries? Have you cached in the database? Have you used views? Have you cached complicated queries in memcached? Can you get more parallel IO out of your database?
And all these are valid and good questions. For your typical transactional database these would be your normal paths of attack. But we aren't talking about your normal database. We are talking about web scale services that have to process loads higher than any database can scale to. At some point you need a different approach.
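To make one of those conventional attacks concrete, here is a rough sketch of the memcached route from the list above; it assumes the pymemcache client library, a memcached server on localhost, and a made-up photos query.

    import json
    from pymemcache.client.base import Client

    cache = Client(("localhost", 11211))

    def recent_photos(db, user_id):
        key = f"recent_photos:{user_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)      # served from memory: no join, no disk
        rows = db.execute(
            "SELECT id, title FROM photos WHERE owner_id = ? "
            "ORDER BY created_at DESC LIMIT 20",
            (user_id,),
        ).fetchall()
        cache.set(key, json.dumps(rows), expire=300)  # absorb reads for 5 minutes
        return rows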
What many mega-scale websites with billions of records, petabytes of data, many thousands of simultaneous users, and millions of queries a day are doing is using a sharding scheme, and some are even advocating denormalization as the best strategy for architecting the data tier.
We see this with eBay, who moved all significant functionality out of the database and into applications. Flickr shards and replicates their data to reach high performance levels. For Flickr this moves transaction logic back into their application layer, but the win is higher scalability.
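A minimal sketch of the sharding idea, with made-up shard hostnames: hash the user id to pick a database, so everything about one user lives together on one machine. Real schemes (Flickr's included) also deal with rebalancing and replicas.

    import hashlib

    SHARDS = ["db-shard-0.internal", "db-shard-1.internal",
              "db-shard-2.internal", "db-shard-3.internal"]

    def shard_for(user_id):
        # A stable hash of the user id picks the shard that owns all of this
        # user's rows: photos, comments, favorites, the lot.
        digest = hashlib.md5(str(user_id).encode()).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    print(shard_for(42))   # every query about user 42 goes to this one host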
Joe Gregorio has identified some common themes across these new mega-data systems: they are distributed, joinless, de-normalized, and transactionless.
It's the web model pushed to the data tier. Ironically, it may take a web model on the back-end to support a web model on the front-end.
The Great Data Ownership Wars: The Database vs. The Application
A not so subtle clue as to who won the data wars is to look at the words used. Data that are split up are considered "normal." Those who keep their data whole are considered "de-normal." All right, that's not what those words mean, but it was too good to pass up. :-)
Traditionally the database owns the data. Referential integrity, triggers, stored procedures, and everything else that keeps the data safe and whole is in the database. Applications are prevented from screwing up the data.
And this makes sense until you scale. Centralizing all behavior in the database won't mega-scale as the web does, which is why eBay went completely the other way.
eBay maintains data integrity through a service layer that encapsulates all data access. The service layer handles referential integrity, managing replicated copies, doing joins, and so on. It's more error prone than having the database do all this work, but you are able to scale past what even the highest-end databases can handle.
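A rough sketch of what a join looks like when the service layer does it instead of the database. The three in-memory dictionaries stand in for data services fronting different shards; every name here is a hypothetical placeholder, not eBay's actual API.

    # In-memory stand-ins for data services that front different shards.
    PHOTOS   = {7: {"id": 7, "title": "Sunset", "owner_id": 1}}
    COMMENTS = {7: [{"photo_id": 7, "user_id": 2, "text": "Nice!"}]}
    USERS    = {1: {"name": "Alice"}, 2: {"name": "Bob"}}

    def photo_page(photo_id):
        photo = PHOTOS[photo_id]                       # photo service call
        comments = list(COMMENTS.get(photo_id, []))    # comment service call
        user_ids = {c["user_id"] for c in comments}
        users = {uid: USERS[uid] for uid in user_ids}  # batched user service call
        for c in comments:                             # the "join", done by hand
            c["user_name"] = users[c["user_id"]]["name"]
        return {"photo": photo, "comments": comments}

    print(photo_page(7))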
All this sharding and denormalization and duplication on one level feels so wrong because it's so different from what we were all taught. And unless you are a really large website you probably don't need to worry about this level of complexity. But it's a really fascinating and unexpected evolution in design. Scaling to handle the world wide web requires techniques and strategies that are often at odds with our years of experience. It will be fun to see where it all leads.
Reader Comments (3)
You mention that one of the reasons we normalize is to "prevent anomalies." Yes, this may be true. But don't forget that normalization is an optimization for a write-mostly database.
You're correct to point out that this is all in an effort to reduce I/O. In traditional OLTP applications, the database experiences a high write demand. In a normalized environment, the amount of writing is minimized, because each atom of information is as small as it can get.
In read-mostly databases, or traditional OLAP environments, normalization, as you point out, is a performance drain. In read-mostly databases, I/O is precious, but in the opposite way from OLTP environments.
So before you go running to denormalize everything, remember what the needs of your application are. Scaling applications which are read-mostly? Absolutely denormalize. But I haven't seen the literature or research that tells me that denormalizing is the solution to scaling a write-mostly application.
Depends on what you have to write.
If you're always doing a single update/insert/delete, that's fine and dandy. But if you end up having to write a chunk of data across several normalized portions, you're fetching multiple blocks from different parts of the hard drive. As opposed to fetching one big chunk from a denormalized table, updating several fields from that one contiguous block, and moving on.
The row data from a single table is stored "nearby". Row data written across various tables could end up stored anywhere. More lookups, more IO, more delay.
I'm not quite sure what you mean when you say transactionless. In a denormalized storage system, you will often have to update two or more tables to complete a logical transaction within the system. If you don't use some form of transaction model, then there is no way you can recover (redo or undo) if you fail midway through making the multiple updates for this logical transaction. Basically, you will end up with logical corruption in your system. Maybe you can fix this by building background scanning servers that try to detect and fix this logical corruption? What are your thoughts on this?
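One hedged sketch of what such a background scanner might look like, with made-up table and column names: walk the denormalized copies, compare them to the authoritative rows, and repair anything that has drifted.

    import sqlite3

    def repair_user_names(db):
        # Find comments whose denormalized user_name no longer matches the
        # authoritative users table, then rewrite them from the source of truth.
        stale = db.execute("""
            SELECT c.rowid, u.name
            FROM comments c JOIN users u ON u.id = c.user_id
            WHERE c.user_name <> u.name
        """).fetchall()
        for rowid, name in stale:
            db.execute("UPDATE comments SET user_name = ? WHERE rowid = ?",
                       (name, rowid))
        db.commit()
        return len(stale)   # how much drift this pass found and fixed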