Monday, February 20, 2012

Berkeley DB Architecture - NoSQL Before NoSQL was Cool

After the filesystem and simple library packages like dbm, Berkeley DB was the original luxury embedded database, widely used by applications as their core database engine. NoSQL before NoSQL was cool. The hidden secret making complex applications sing. If you want to dispense with all the network overhead of a server-based system, it's still a good choice.
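To make the "embedded, no network overhead" idea concrete, here is a minimal sketch. It does not use Berkeley DB itself; it assumes Python's standard-library `dbm.dumb` module as a stand-in for the dbm-style packages mentioned above, and the function name `demo_embedded_store` and the keys are made up for illustration. The point is only that the store is opened with a library call inside the application's own process, with no server or socket involved.

```python
import dbm.dumb
import os
import tempfile


def demo_embedded_store(path):
    """Write a few records, close the store, then reopen it and read them back.

    `path` is a hypothetical file prefix; dbm.dumb creates its own files next to it.
    """
    # Open (or create) the store in-process: a plain library call, no server.
    with dbm.dumb.open(path, "c") as db:
        db[b"user:1"] = b"alice"
        db[b"user:2"] = b"bob"

    # Reopening the same path shows that the records were persisted to disk.
    with dbm.dumb.open(path, "r") as db:
        return {key: db[key] for key in db.keys()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        records = demo_embedded_store(os.path.join(tmp, "demo_store"))
        for key, value in sorted(records.items()):
            print(key.decode(), "=", value.decode())
```

A real Berkeley DB deployment layers transactions, locking, and logging on top of this basic key/value access pattern, which is what the chapter discussed below goes into.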

There's a great writeup of the architecture behind Berkeley DB in the book The Architecture of Open Source Applications. If you want to understand more about how a database works, or if you are pondering how to build your own, it's rich in detail, explanations, and lessons. Here's the Berkeley DB chapter from the book. It covers topics like: Architectural Overview; The Access Methods: Btree, Hash, Recno, Queue; The Library Interface Layer; The Buffer Manager: Mpool; Write-ahead Logging; The Lock Manager: Lock; The Log Manager: Log; The Transaction Manager: Txn.

Reader Comments (3)

Absolutely, BerkeleyDB is amazingly fast if your scenario is a non-sharded single web server. I've used BerkeleyDB on BookMooch.com for 6 years, and real-world query speeds of 2 million queries per second, for a single web page, are typical. Those aren't simulated speeds: that's real world, after locking, overhead, etc. That kind of speed lets me use BerkeleyDB in place of in-memory arrays, and then you get automatic persistence (much like Perl does).

-john

February 20, 2012 | Unregistered Commenter John Buckman

Let me say first that MarkLogic came up with the "NoSQL Before NoSQL was Cool" moniker quite some time ago (I have the shirt to prove it). That being said, MarkLogic is the de facto XML database when it comes to speed and scalability. MarkLogic does not require horizontal sharding, because it was built for clustering and coordination of thousands of nodes and petabytes of data. MarkLogic has installations in some of the largest companies with massively complex content/data and can perform subsecond queries against any node or document. I think Berkeley is a great tool, but it is novel at best. I would be interested in who in the enterprise is using it and at what scale. As someone who has worked intimately with and for MarkLogic, I know it scales and solves a lot of informational problems, whether you have 100 GB or 100 TB of content.

-Gary Vidal

February 25, 2012 | Unregistered Commenter Gary Vidal

Please check out Bangdb. Currently the embedded version is released at www.iqlect.com. There is an interesting perf comparison document with BerkeleyDB and LevelDB. Please check it out when you can.
Thanks

September 16, 2012 | Unregistered Commenter sachin
