Thursday, October 29, 2009

Paper: No Relation: The Mixed Blessings of Non-Relational Databases

This excellent survey of the field was written by Ian Thomas Varley as part of his Master of Science in Engineering program.

The aim of this paper is to explore the conceptual design space of non-relational databases as compared to traditional relational databases. It is clear that the design needs of the two paradigms are different, but how fundamental are the differences, and what strategies can we use to transition our conceptual designs from one to the other?
There are a few things to like about this paper. A running example shows the different ways to model data depending on which type of solution you are targeting. In particular, it covers how many-to-many relationships are modeled, how data integrity is maintained, and how optional attributes are supported. There's also a brief survey of some of the major systems.
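To make the many-to-many point concrete, here is a minimal sketch of one common approach. It is our own illustration, not the paper's running example. A relational schema would keep the relationship in a join table. In a key/value store, it might instead be written as two denormalized index entries, one for each direction. The `KVStore` class is a hypothetical in-memory stand-in for a real store, and key names such as `student:<id>:courses` are illustrative assumptions.

```python
from collections import defaultdict


class KVStore:
    """Hypothetical in-memory stand-in for a key/value store (key -> set)."""

    def __init__(self):
        self._data = defaultdict(set)

    def add(self, key, value):
        self._data[key].add(value)

    def remove(self, key, value):
        self._data[key].discard(value)

    def get(self, key):
        # Return a sorted copy so callers cannot mutate internal state.
        return sorted(self._data.get(key, ()))


def enroll(store, student_id, course_id):
    # Relational version: INSERT INTO enrollment (student_id, course_id) ...
    # Key/value version: write the relationship in BOTH directions, because
    # there is no join available to compute the reverse lookup at query time.
    store.add(f"student:{student_id}:courses", course_id)
    store.add(f"course:{course_id}:students", student_id)


def drop(store, student_id, course_id):
    # Both index entries must change together. Without a multi-key
    # transaction, the application is responsible for keeping them in sync.
    store.remove(f"student:{student_id}:courses", course_id)
    store.remove(f"course:{course_id}:students", student_id)
```

For example, after `enroll(store, "s1", "db101")` and `enroll(store, "s2", "db101")`, `store.get("course:db101:students")` returns `['s1', 's2']` without any join. The cost is that every relationship change becomes two writes. That cost is why the question of which entities participate in transactions together matters.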
The most interesting section of the report is where it tackles the problem of design for non-relational systems. The approach has two different phases: design questions and design strategies.
The questions you should ask yourself about your problem are:
  1. What degree of normalization is sensible?
  2. Which entities participate in transactions together?
  3. Where are areas of high contention?
  4. What are the history requirements of the application?
  5. Is Eventual Consistency an option?
  6. Does a Hash Table already model your problem?
  7. Is the Entity/Attribute/Value pattern inherent in the data?
  8. Are there hierarchical or recursive relationships in the data?
  9. Are there natural functional boundaries to partition along?
  10. Are there compounding factors that might influence your design?
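Questions 6 and 7 are easier to picture with a small example. The sketch below is an illustration under our own assumptions, not code from the paper. It stores entities in the Entity/Attribute/Value shape, where each entity is simply a hash table from attribute name to value. Optional attributes are then just absent, rather than stored as NULL columns. The helper names and attribute names are hypothetical.

```python
from typing import Any, Dict, List

# entity id -> {attribute name -> value}: an EAV layout held in plain dicts.
Entity = Dict[str, Any]


def put_attribute(store: Dict[str, Entity], entity_id: str, attr: str, value: Any) -> None:
    # Adding a brand-new attribute needs no schema change (no ALTER TABLE).
    store.setdefault(entity_id, {})[attr] = value


def find_by_attribute(store: Dict[str, Entity], attr: str, value: Any) -> List[str]:
    # Without a secondary index this is a full scan; a real store would need
    # an application-maintained index to make this query cheap.
    return sorted(
        eid for eid, attrs in store.items() if attr in attrs and attrs[attr] == value
    )
```

With `put_attribute(store, "p2", "color", "blue")`, entity `p2` gains a `color` attribute that no other entity has to carry. The flip side is that lookups by attribute value become scans unless you maintain your own index.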

With a hefty amount of self-reflection behind you, now it's time to follow a few strategies:

  1. Logical Model First
  2. Consider Several Physical Approaches
  3. Keep It Simple
  4. Play It Safe
  5. Show Your True Consistency
  6. Stick To The Map (Reduce)
  7. Evolve Gracefully
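As a rough illustration of the "Stick To The Map (Reduce)" strategy, here is a minimal single-process map/reduce sketch. It is our own example, not the paper's. It computes what a relational `SELECT course, COUNT(*) ... GROUP BY course` would. The `map_fn`/`reduce_fn` names and the record shape are assumptions. A real system would run the map and reduce phases in parallel across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(record: Dict[str, str]) -> Iterator[Tuple[str, int]]:
    # Emit one (key, value) pair per input record.
    yield record["course"], 1


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values that share a key.
    return key, sum(values)


def map_reduce(records: Iterable[Dict[str, str]]) -> Dict[str, int]:
    # Shuffle phase: group mapped values by key before reducing.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())
```

Because `map_fn` looks at one record at a time and `reduce_fn` looks at one key at a time, both can be partitioned across machines. That property is what makes the style attractive for stores without a query planner.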

The summary ends on a good note, I think. Key/value systems may be just one feature of a larger database management system rather than a standalone product:

This author would advocate, therefore, that the developments exemplified by nonrelational databases should not remain an outside challenger to the legacy of relational databases, but should instead be researched, understood, and eventually, incorporated into a unified model. There's nothing to say that implementation as a key/value store shouldn't be part of the suite of implementation choices for a database whose data is structured relationally; likewise, there is room in the world of relational databases for the conceptual data design advantages offered by non-relational databases; the option to use optimistic concurrency control, to keep multiple versions of a cell per the columnar database model, to accept and support semi-structured (or run-time structured) data efficiently, to maintain multiple simultaneous values for a cell, and to scale across a cluster using some sort of ancestry or grouping relationship—these would all be conceptually coherent additions to the relational database world, provided the mathematical model for their incorporation is sound, and the configuration of the options is transparent and cohesive.
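Two of the features the quote mentions, optimistic concurrency control and keeping multiple versions of a cell, fit together naturally. Below is a minimal in-memory sketch under our own assumptions. The `VersionedCell` class and its `max_versions` parameter are hypothetical, not the API of any particular database. Each write carries the version the writer last read, and the write is rejected if another writer got there first. A bounded number of older values is kept for history.

```python
from typing import Any, List, Tuple


class ConflictError(Exception):
    """Raised when a write is based on a stale version."""


class VersionedCell:
    def __init__(self, value: Any, max_versions: int = 3):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.max_versions = max_versions
        # Oldest first, newest last: list of (version number, value).
        self._history: List[Tuple[int, Any]] = [(1, value)]

    def read(self) -> Tuple[int, Any]:
        # Return the current version number together with the value.
        return self._history[-1]

    def write(self, expected_version: int, value: Any) -> int:
        current_version, _ = self._history[-1]
        # Optimistic check: no lock is held between read and write; we only
        # verify at write time that nobody else has written in between.
        if expected_version != current_version:
            raise ConflictError(
                f"expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._history.append((new_version, value))
        # Retain only the newest max_versions entries (loosely similar to a
        # per-column version limit in Bigtable-style stores).
        del self._history[: -self.max_versions]
        return new_version

    def history(self) -> List[Tuple[int, Any]]:
        return list(self._history)
```

A caller that gets `ConflictError` would typically re-read the cell and retry. That is the trade the optimistic approach makes: no locks, but occasional retries under contention.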


Reader Comments (5)

Is there a URL or at least a reference to where the paper can be found?

October 29, 2009 | Unregistered Commenter Glen Campbell

Sorry, it takes a while to overcome my Drupal muscle memory where the link is included as part of the post.

October 29, 2009 | Registered Commenter HighScalability Team

This was an amazing read!

October 29, 2009 | Unregistered Commenter djoog

Really amazing and useful read! Thanks a lot.

October 31, 2009 | Unregistered Commenter zihotki

No mention of the hierarchical databases of yesteryear on mainframe hardware.
The mainframe technical problem domain was very similar to the emerging cloud architecture, and merits some investigation.

The mainframe solution space developed into massively parallel systems, with simply-indexed fixed-length records whose structure was application-defined.
Modern I/O hardware wouldn't especially benefit from fixed-length records, but all the rest pretty much applies to the cloud.

j

March 12, 2010 | Unregistered Commenter jmullee
