« Product: Facebook's Cassandra - A Massive Distributed Store | Main | Podcast about Facebook's Cassandra Project and the New Wave of Distributed Databases »
Thursday
Jul022009

Product: Project Voldemort - A Distributed Database

Update: Presentation from the NoSQL conference: slides, video 1, video 2.

Project Voldemort is an open source implementation of the basic parts of Dynamo (Amazon’s Highly Available Key-value Store) distributed key-value storage system. LinkedIn is using it in their production environment for "certain high-scalability storage problems where simple functional partitioning is not sufficient."

From their website:

  • Data is automatically replicated over multiple servers.
  • Data is automatically partitioned so each server contains only a subset of the total data
  • Server failure is handled transparently
  • Pluggable serialization is supported to allow rich keys and values including lists and tuples with named fields, as well as to integrate with common serialization frameworks like Protocol Buffers, Thrift, and Java Serialization
  • Data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system
  • Each node is independent of other nodes with no central point of failure or coordination
  • Good single node performance: you can expect 10-20k operations per second depending on the machines, the network, and the replication factor
  • Support for pluggable data placement strategies to support things like distribution across data centers that are geographical far apart.

    They also have a nice design page going over some of their architectural choices: key-value store only, no complex queries or joins; consistent hashing is used to assign data to nodes; JSON is used for schema definition; versioning and read-repair for distributed consistency; a strict layered architecture with put, get, and delete as the interface between layers.

    Just a hint when naming a project: don't name it after one of the most popular key words in muggledom. The only way someone will find your genius via search is with a dark spell. As I am a Good Witch I couldn't find much on Voldemort in the real world. But the idea is great and is very much in line with current thinking on scalable database design. Worth a look.

    Related Articles

  • The CouchDB Project
  • Reader Comments (7)

    See also Cliff Moon's open source version of Dynamo - Dynomite, written in Erlang:

    http://github.com/cliffmoon/dynomite/tree/master

    December 31, 1999 | Unregistered CommenterAnonymous

    Interesting... having just read up on CouchDB, a couple obvious differences jump out:

    o Voldemort doesn't seem to mention anything akin to CouchDB's incrementally-updated views. Presumably you could roll your own, where the objects that make up the view are simply more objects with their own identifying keys.

    o CouchDB doesn't have the features that actually manage a distributed system, such as allocating keys to servers and handling failure scenarios. Presumably you could roll your own, where the CouchDB instances act as the storage mechanism in a similar way to how BerkeleyDB or MySQL can act as Dynamo's storage mechanism. If Voldemort is similar to the description of the Dynamo paper, then it presumably does have that.

    Since Voldemort and Dynomite apparently both aim to be an open source implementation of Dynamo, that naturally leads to a wish to compare those two. Looks like Voldemort is written in Java and hosted in SVN on Google Code, while Dynomite is written in Erlang and hosted on Github. Sounds like Dynomite gets all the cool points :)

    On the other hand, Dynamo is already in production use at LinkedIn and has some nice illustrated documentation, while I haven't seen anything indicating how mature or widely used (or not) Dynomite is....

    I'm also curious about the idea of combining Dynomite (or something like it) with CouchDB, since they appear to target slightly different levels of the stack. Dynamo and its kin don't actually take on the job of storage, after all. Is there something to be gained by combining its highly available architecture with CouchDB's highly available node, and incorporating CouchDB's incremental mapreduce views?

    December 31, 1999 | Unregistered CommenterCharlie

    Hi Todd, I am the primary author of this system. It is used in production at LinkedIn for a few applications that have high scalability requirements. We actually only just released the code and haven't done anything to promote the project yet (but we will!), so I am quite surprised you found it. We hope to become a little more findable when we do some self promotion. :-)

    As a someone who has long enjoyed this blog, I would welcome anyone who is interested to take a look. Feel free to contact me with any question, comments, bugs, or patches (jay.kreps at gmail).

    December 31, 1999 | Unregistered CommenterJay Kreps

    Having just read up on nonrelation database stuff, including CouchDB and dynamo, a couple obvious differences jump out:

    o Voldemort doesn't seem to mention anything akin to CouchDB's incrementally-updated views. Presumably you could roll your own, where the objects that make up the view are simply more objects with their own identifying keys.

    o CouchDB doesn't have the features that actually manage a distributed system, such as allocating keys to servers and handling failure scenarios. Presumably you could roll your own, where the CouchDB instances act as the storage mechanism in a similar way to how BerkeleyDB or MySQL can act as Dynamo's storage mechanism. If Voldemort is similar to the description of the Dynamo paper, then it presumably does have that.

    Since Voldemort and Dynomite apparently both aim to be an open source implementation of Dynamo, that naturally leads to a wish to compare those two. Looks like Voldemort is written in Java and hosted in SVN on Google Code, while Dynomite is written in Erlang and hosted on Github. Sounds like Dynomite gets all the cool points :)

    On the other hand, Dynamo is already in production use at LinkedIn and has some nice illustrated documentation, while I haven't seen anything indicating how mature or widely used (or not) Dynomite is....

    I'm also curious about the idea of combining Dynomite (or something like it) with CouchDB, since they appear to target slightly different levels of the stack. Dynamo and its kin don't actually take on the job of storage, after all. Is there something to be gained by combining its highly available architecture with CouchDB's highly available node, and incorporating CouchDB's incremental mapreduce views?

    December 31, 1999 | Unregistered CommenterCharlie

    Hi Jay. Thanks for making it available and taking the time to provide documentation. I knew about Voldemort because someone was kind enough to email about it and I was happy to post about it.

    December 31, 1999 | Unregistered CommenterTodd Hoff

    I think that 'read-repair' as an alternative to a two phase commit works for a shopping cart update, but fails for 'bid on an auction'. Amazon has an obvious requirement for the former, and read-repair IS more efficient. However, many apps (like games, for instance) would have a problem using it.

    December 31, 1999 | Unregistered Commenterawootton

    does this have anything to do with harry potter?

    December 31, 1999 | Unregistered CommenterAnonymous

    PostPost a New Comment

    Enter your information below to add a new comment.
    Author Email (optional):
    Author URL (optional):
    Post:
     
    Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>