DynamoDB Talk Notes and the SSD Hot S3 Cold Pattern
Monday, May 14, 2012 at 9:15AM
HighScalability Team in Strategy

My impression of DynamoDB before attending an Amazon DynamoDB for Developers talk was that it’s the usual quality service produced by Amazon: simple, fast, scalable, geographically redundant, expensive enough to make you think twice about using it, and delightfully NoOp.

After the talk my impression has become more nuanced. The quality impression still stands. Look at the forums and you’ll see the typical issues every product has, but no real surprises. And as a SimpleDB++, DynamoDB seems to have avoided second system syndrome and produced a more elegant design.

What was surprising is how un-cloudy DynamoDB appears to be. The cloud pillars of pay for what you use and quick elastic response to bursty traffic have been abandoned, for some understandable reasons, but the result is you really have to consider your use cases before making DynamoDB the default choice.

Here are some of my impressions from the talk...

Store Hot Data in DynamoDB, Cold Data in S3, and use Hadoop/Hive to Make them Look the Same

One of the most interesting ideas of the talk is how a new Hadoop/Hive ecosystem is being used as a unifying bridge between DynamoDB and S3. The idea is that data stored in DynamoDB costs 10 times as much as data in S3, so what you want to do is move historical or cold data to S3 as soon as possible and just keep the hot data in DynamoDB. For example, time series data is often stored by day, week, or month, so rather than keep all that historical time series data in DynamoDB, move it to S3 and save some money.
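To make the hot/cold split concrete, here is a minimal Python sketch. It is not from the talk: the 30-day window, the record shape (a day plus a payload), and the function names are all assumptions for illustration, and the only number taken from the talk is the rough 10x storage cost ratio.

```python
from datetime import date, timedelta

# Hypothetical retention window: how many days data counts as "hot".
HOT_DAYS = 30


def split_hot_cold(records, today, hot_days=HOT_DAYS):
    """Partition (day, payload) records into hot and cold lists by age."""
    cutoff = today - timedelta(days=hot_days)
    hot = [r for r in records if r[0] >= cutoff]   # stays in DynamoDB
    cold = [r for r in records if r[0] < cutoff]   # candidate for S3
    return hot, cold


def relative_cost(hot_gb, cold_gb, dynamo_multiplier=10):
    """Storage cost in units of "one GB in S3", using the talk's 10x ratio."""
    return hot_gb * dynamo_multiplier + cold_gb


if __name__ == "__main__":
    today = date(2012, 5, 14)
    records = [(date(2012, 5, 10), "recent"), (date(2012, 1, 1), "old")]
    hot, cold = split_hot_cold(records, today)
    print(len(hot), "hot record(s),", len(cold), "cold record(s)")
    # Keeping 100 GB all in DynamoDB vs. only 10 GB of it hot.
    print(relative_cost(100, 0), "vs", relative_cost(10, 90))
```

Under these assumptions, keeping only a tenth of the data hot cuts the relative storage bill from 1000 units to 190, which is the whole motivation for the pattern.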

The problem is now you have two very different ways to access data. DynamoDB is purely programmatic access via tables and S3 is via files. How do you bridge that gap without writing a lot of code?

Using EMR and Hive, sophisticated queries can be run against data in both DynamoDB and S3, allowing a common data access layer against the cheapest storage option. It’s a good example of how well all these tools work together to provide a powerful ecosystem.
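In the talk this bridging role is played by Hive on EMR. The sketch below is not Hive and does not call any AWS API. It only illustrates, in plain Python, what a common data access layer over a hot tier and a cold tier means. The class name TieredStore and the dict-based stores are hypothetical stand-ins for a DynamoDB table and a set of S3 files.

```python
from datetime import date


class TieredStore:
    """Toy common access layer over a hot store and a cold store.

    Both stores are plain dicts keyed by day. They are stand-ins for
    illustration, not real DynamoDB or S3 clients.
    """

    def __init__(self, hot, cold):
        self.hot = hot
        self.cold = cold

    def get(self, day):
        # Check the hot tier first, then fall back to the cold tier.
        if day in self.hot:
            return self.hot[day]
        return self.cold.get(day)

    def scan(self, start, end):
        """Yield (day, value) pairs from both tiers in date order."""
        merged = {**self.cold, **self.hot}  # hot tier wins on duplicates
        for day in sorted(merged):
            if start <= day <= end:
                yield day, merged[day]


if __name__ == "__main__":
    store = TieredStore(
        hot={date(2012, 5, 13): 7},
        cold={date(2012, 1, 1): 3},
    )
    # The caller does not need to know which tier holds which day.
    for day, value in store.scan(date(2012, 1, 1), date(2012, 12, 31)):
        print(day.isoformat(), value)
```

The point of the design is that callers ask for a date range and never name a tier, which is the same property the Hive layer gives you at query level.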

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.