« A Yes for a NoSQL Taxonomy | Main | Squeeze more performance from Parallelism »
Wednesday
Nov042009

Damn, Which Database do I Use Now?

With so many database options available these days, like for the rest of life, it's natural to wonder how it all fits together. Amazon complicated, or rather expanded the available options by introducing RDS, their relational database service. RDS is MySQL safely cocooned as a manageable cloud element, resting boldly within an energy providing elastic CPU pool, supported by a virtually infinite supply of very capable virtualized storage .

MySQL in AWS is now easy to start, stop, monitor, backup, snapshot, expand, and effortlessly move up and down the instance hierarchy. What it's not, contrary to what you might expect, is a scale-out solution, it's a scale-up solution. You get more by buying a bigger instance, not by horizontally adding more instances. There's a limit. Admittedly a larger limit now with Amazon's new high memory instances.

That's OK, well maybe not for people who helped grow Amazon's ecosystem by offering a similar product, but so many projects use MySQL that this is a big win for a lot of people. It makes life easier even if the promise of infinite relational database storage has yet to be realized.

If one of the reasons you were considering using a Platform as a Service is to knock the database item off your worry list, RDS is one more reason to consider playing your own general contractor and orchestrating all the elements together yourself. As more services become packaged into cloud capable components this is likely how many systems will be bolted together in the future.

But we are left wondering, how RDS fits together with SimpleDB and all the other database options?

This is where James Hamilton comes in with his sense making One Size Does Not Fit All posting. As an organizing principle James came up with this taxonomy for databases based on what the customer wants to do, their intent: 

  1. features-first - This is the relation database bucket where the database provides a lot of high level features that make the programmer's job a lot easier. The downside is scaling is hard or very expensive. Examples: Oracle, SQL Server, DB2, MySQL, PostgreSQL, RDS.
  2. scale-first - These are systems that must scale from the start. They generally lack features and put a lot more responsibility back on the programmer. Examples: Voldemort, Ringo, SimpleDB, Ringo, Kai, Dynomite, PNUTS, ThruDB, Hypertable, CouchDB, Cassandra, MemcacheDB.
  3. simple structure storage - These are key-value stores where the emphasis is on storing and getting values of arbitrary structure. They generally don't have the features or scalability of other systems. Examples: file systems, Cassandra, BerkelyDB, SimpleDB.
  4. purpose-optimized stores - These are databases systems built to be good at one thing, like data warehousing. Examples: StreamBase, Vertica, VoltDB, Aster Data, Netezza, Greenplum.

I'm not sure where a graph database like Neo4j would fit in the list, but James has created a useful way to look at things and of course he provides a lot more detail in his post. So all you have to do is figure out your intention, find out where you are belong in the list, and make a choice :-) I know, you want it all, you deserve it all, but that doesn't seem to be happening. Sites for a very long time will be forged from a mix of technologies and we are the master mixoligists. 

Related Article

  • Windows Azure
  • NoSQL is a Horsless Carriage by Steve Yen. A nice NoSQL taxonomy: key‐value‐cache, key‐value‐store, eventually‐consistent key‐value‐store, ordered‐key‐value‐store, data‐structures server, tuple‐store, object database, document store, wide columnar store.

     

 

Reader Comments (5)

The new memory instances are nice, but you're still likely going to be killed in the long run by the crappy performance of their EBS volumes. A database that requires as much memory as one of those larger instances provide will also require a lot more consistent IOPs than you are a likely to get from a single EBS volume. You can raid/10 a few EBS to squeeze a little extra juice out of them, but you sacrifice a lot of their flexibility and utility by going down that path.

It's still better to partition your data across many smaller instances if you can manage it.

November 4, 2009 | Unregistered CommenterBryan Murphy

You have Cassandra in opposing camps. On one hand Casandra is "scale-first" and on the other Cassandra is among database systems that "generally don't have the features or scalability of other systems". Which one is it?

November 4, 2009 | Unregistered CommenterBrian

isn't the main design goal behind Cassandra horizontal scalability and availability.

November 4, 2009 | Unregistered CommenterJA

From what I read on their site, they do plan to offer replicated services soon, which will give you that scale out ability. Well, for reads anyway...

November 4, 2009 | Unregistered CommenterRobert

Where would you put a DB like InterSystems Caché or GT.M?

June 20, 2010 | Unregistered CommenterIan

PostPost a New Comment

Enter your information below to add a new comment.
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>