Berkeley DB Architecture - NoSQL Before NoSQL was Cool
After the filesystem and simple library packages like dbm, Berkeley DB was the original luxury embedded database widely used by applications as their core database engine. NoSQL before NoSQL was cool. The hidden secret making complex applications sing. If you want to dispense with all the network overhead of a server-based system, it's still a good choice.
There's a great writeup for the architecture behind Berkeley DB in the book The Architecture of Open Source Applications. If you want to understand more about how a database works or if you are pondering how to build your own, it's rich in detail, explanations, and lessons. Here's the Berkeley DB chapter from the book. It covers topics like: Architectural Overview; The Access Methods: Btree, Hash, Recno, Queue; The Library Interface Layer; The Buffer Manager: Mpool; Write-ahead Logging; The Lock Manager: Lock; The Log Manager: Log; The Transaction Manager: Txn.
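To make the "embedded, no network overhead" idea concrete, here is a minimal sketch of an in-process key-value store. It does not use Berkeley DB or its API; it uses Python's standard-library dbm.dumb module purely as a stand-in, and the function name, file name, and keys are made up for illustration. It shows only the basic shape (open a file-backed store inside the application, write, reopen, read) and none of the locking, logging, or transaction machinery the chapter describes.

```python
import dbm.dumb
import os
import tempfile


def store_and_reload(path):
    """Write two records to an on-disk key-value store, then reopen and read them.

    Illustrative only: dbm.dumb is a simple stdlib store used as a stand-in,
    not Berkeley DB. Keys and values here are hypothetical examples.
    """
    # "c" opens the store for reading and writing, creating it if needed.
    # Everything happens inside this process; no server is contacted.
    with dbm.dumb.open(path, "c") as db:
        db[b"book:1"] = b"The Architecture of Open Source Applications"
        db[b"book:2"] = b"Berkeley DB chapter"

    # Reopen read-only to show the data survived closing the store.
    with dbm.dumb.open(path, "r") as db:
        return {key: db[key] for key in db.keys()}


workdir = tempfile.mkdtemp()
result = store_and_reload(os.path.join(workdir, "books"))
print(result)
```

The store behaves like a dictionary that persists to disk, which is the property that makes an embedded database attractive when a separate server would be overkill.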
Reader Comments (3)
Absolutely, BerkeleyDB is amazingly fast if your scenario is a non-sharded single web server. I've used BerkeleyDB on BookMooch.com for 6 years, and real-world query speeds of 2 million queries per second, for a single web page, are typical. Those aren't simulated speeds: that's real world, after locking, overhead, etc... That kind of speed lets me use BerkeleyDB in place of in-memory arrays, and then you get automatic persistence (much like Perl does).
-john
Let me say first that MarkLogic came up with the "NoSQL Before NoSQL was Cool" moniker quite some time ago (I have the shirt to prove it). That being said, MarkLogic is the de facto XML database when it comes to speed and scalability. MarkLogic does not require horizontal sharding, because it was built for clustering and coordination of thousands of nodes and petabytes of data. MarkLogic has installations in some of the largest companies with massively complex content/data and can perform subsecond queries against any node or document. I think Berkeley DB is a great tool, but novel at best; I would be interested in who in the enterprise is using it and at what scale. As someone who has worked closely with and for MarkLogic, I know it scales and solves a lot of informational problems, whether you have 100 GB or 100 TB of content.
-Gary Vidal
Please check out Bangdb. Currently the embedded version is released at www.iqlect.com. There is an interesting perf comparison document with BerkeleyDB and LevelDB. Please check it out when you can.
Thanks