Thursday
Jul022009
Product: Project Voldemort - A Distributed Database
Thursday, July 2, 2009 at 12:02AM
Update: Presentation from the NoSQL conference: slides, video 1, video 2.
Project Voldemort is an open source implementation of the basic parts of Dynamo (Amazon’s Highly Available Key-value Store) distributed key-value storage system. LinkedIn is using it in their production environment for "certain high-scalability storage problems where simple functional partitioning is not sufficient."
From their website:
They also have a nice design page going over some of their architectural choices: key-value store only, no complex queries or joins; consistent hashing is used to assign data to nodes; JSON is used for schema definition; versioning and read-repair for distributed consistency; a strict layered architecture with put, get, and delete as the interface between layers.
Just a hint when naming a project: don't name it after one of the most popular key words in muggledom. The only way someone will find your genius via search is with a dark spell. As I am a Good Witch I couldn't find much on Voldemort in the real world. But the idea is great and is very much in line with current thinking on scalable database design. Worth a look.
Reader Comments (7)
See also Cliff Moon's open source version of Dynamo - Dynomite, written in Erlang:
http://github.com/cliffmoon/dynomite/tree/master
Interesting... having just read up on CouchDB, a couple obvious differences jump out:
o Voldemort doesn't seem to mention anything akin to CouchDB's incrementally-updated views. Presumably you could roll your own, where the objects that make up the view are simply more objects with their own identifying keys.
o CouchDB doesn't have the features that actually manage a distributed system, such as allocating keys to servers and handling failure scenarios. Presumably you could roll your own, where the CouchDB instances act as the storage mechanism in a similar way to how BerkeleyDB or MySQL can act as Dynamo's storage mechanism. If Voldemort is similar to the description of the Dynamo paper, then it presumably does have that.
Since Voldemort and Dynomite apparently both aim to be an open source implementation of Dynamo, that naturally leads to a wish to compare those two. Looks like Voldemort is written in Java and hosted in SVN on Google Code, while Dynomite is written in Erlang and hosted on Github. Sounds like Dynomite gets all the cool points :)
On the other hand, Dynamo is already in production use at LinkedIn and has some nice illustrated documentation, while I haven't seen anything indicating how mature or widely used (or not) Dynomite is....
I'm also curious about the idea of combining Dynomite (or something like it) with CouchDB, since they appear to target slightly different levels of the stack. Dynamo and its kin don't actually take on the job of storage, after all. Is there something to be gained by combining its highly available architecture with CouchDB's highly available node, and incorporating CouchDB's incremental mapreduce views?
Hi Todd, I am the primary author of this system. It is used in production at LinkedIn for a few applications that have high scalability requirements. We actually only just released the code and haven't done anything to promote the project yet (but we will!), so I am quite surprised you found it. We hope to become a little more findable when we do some self promotion. :-)
As a someone who has long enjoyed this blog, I would welcome anyone who is interested to take a look. Feel free to contact me with any question, comments, bugs, or patches (jay.kreps at gmail).
Having just read up on nonrelation database stuff, including CouchDB and dynamo, a couple obvious differences jump out:
o Voldemort doesn't seem to mention anything akin to CouchDB's incrementally-updated views. Presumably you could roll your own, where the objects that make up the view are simply more objects with their own identifying keys.
o CouchDB doesn't have the features that actually manage a distributed system, such as allocating keys to servers and handling failure scenarios. Presumably you could roll your own, where the CouchDB instances act as the storage mechanism in a similar way to how BerkeleyDB or MySQL can act as Dynamo's storage mechanism. If Voldemort is similar to the description of the Dynamo paper, then it presumably does have that.
Since Voldemort and Dynomite apparently both aim to be an open source implementation of Dynamo, that naturally leads to a wish to compare those two. Looks like Voldemort is written in Java and hosted in SVN on Google Code, while Dynomite is written in Erlang and hosted on Github. Sounds like Dynomite gets all the cool points :)
On the other hand, Dynamo is already in production use at LinkedIn and has some nice illustrated documentation, while I haven't seen anything indicating how mature or widely used (or not) Dynomite is....
I'm also curious about the idea of combining Dynomite (or something like it) with CouchDB, since they appear to target slightly different levels of the stack. Dynamo and its kin don't actually take on the job of storage, after all. Is there something to be gained by combining its highly available architecture with CouchDB's highly available node, and incorporating CouchDB's incremental mapreduce views?
Hi Jay. Thanks for making it available and taking the time to provide documentation. I knew about Voldemort because someone was kind enough to email about it and I was happy to post about it.
I think that 'read-repair' as an alternative to a two phase commit works for a shopping cart update, but fails for 'bid on an auction'. Amazon has an obvious requirement for the former, and read-repair IS more efficient. However, many apps (like games, for instance) would have a problem using it.
does this have anything to do with harry potter?