Against all the odds
Friday, July 17, 2009 at 3:18AM
HFadeel in databases, key-value store

This article not about Mariah Carey, or its song. It's about Storing System, Database.

First let's describe what means by odds: In my social network, I found 93% of the mainstream developers sanctify the database, or at least consider it in any data persistence challenge as the ultimate, superhero, and undefeatable solution.

I think this problem come from the education, personally, and some companies also I think it's involved in this.

To start to fix this bad thinking, we all should agree in the following points:

  1. The End of an Architectural Era (It's Time for a Complete Rewrite)

  2. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

I hope if you agreed with me in the previous points. So the question do we really need Database in every application?

There are many scenario shouldn't use Database resisters, such as: Web search engine, Caching, File sharing system, DNS system, etc. In the other hand there many of scenarios should use Database, such as: Customer database, Address book, ERP, etc.

Tiny URL services for example, shouldn't use Database at all because it's require very simple needs, just map a small/tiny URL to the real/big URL. If you start agreed with me, you likely want ask: But what we can use beside or instead of Databases?

There are a lot of tools that fallowing CAP, BASE model, instead of ACID model. But first let's describe ACID:

  1. The problem with ACID is that it gives you too much; it trips you up when you are trying to scale a system across multiple nodes.

  2. Down time is unacceptable. So your system needs to be reliable. Reliability requires multiple nodes to handle machine failures.

  3. To make scalable systems that can handle lots and lots of reads and writes you need many more nodes.

  4. Once you try to scale ACID across many machines you hit problems with network failures and delays. The algorithms don't work in a distributed environment at any acceptable speed.

In other hand CAP model is about:

  1. CAP is easy to scale, distribute. CAP is scalable by nature.

  2. Everyone who builds big applications builds them on CAP. Who use CAP: Google, Yahoo, Facebook, Kngine, Amazon, eBay, etc.

For example in any in-memory or in-disk caching system you will never need all the Database features. You just need CAP like system. Today there are a lot of: column oriented, and key-value oriented systems. But first let's describe Column oriented:

A column-oriented is a database management system (DBMS) which stores its content by column rather than by row. This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items. This approach is contrasted with row-oriented databases and with correlation databases, which use a value-based storage structure. For more information check Wikipedia page.

Distributed key-value stores:

Distributed column stores (Bigtable-like systems):

Something a little different:

Resource:

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.