Paper: NoSQL Databases - NoSQL Introduction and Overview
Christof Strauch, from Stuttgart Media University, has written an incredible 120+ page paper titled NoSQL Databases, an introduction to and overview of NoSQL databases. The paper was written between 2010-06 and 2011-02, so it may be a bit out of date, but if you are looking to take in the NoSQL world in one big gulp, this is your chance. I asked Christof to give us a short taste of what he was trying to accomplish in his paper:
The paper aims at giving a systematic and thorough introduction and overview of the NoSQL field by assembling information dispersed among blogs, wikis and scientific papers. It first discusses reasons, rationales and motives for the development and usage of nonrelational database systems. These can be summarized as the need for high scalability, the processing of large amounts of data, the ability to distribute data among many (often commodity) servers, and consequently a distribution-aware design of DBMSs.
The paper then introduces fundamental concepts, techniques and patterns that are commonly used by NoSQL databases to address consistency, partitioning, storage layout, querying, and distributed data processing. Important concepts like eventual consistency and ACID vs. BASE transaction characteristics are discussed along with a number of notable techniques such as multi-version storage, vector clocks, state vs. operational transfer models, consistent hashing, MapReduce, and row-based vs. columnar vs. log-structured merge tree persistence.
As a first class of NoSQL databases, key-value stores are examined by looking at the proprietary, fully distributed, eventually consistent Amazon Dynamo store as well as popular open-source key-value stores like Project Voldemort, Tokyo Cabinet/Tyrant and Redis. Next, document stores are examined by reviewing CouchDB and MongoDB as the two major representatives of this class of NoSQL databases. Lastly, the paper takes a look at column stores by discussing Google’s Bigtable, Hypertable and HBase, as well as Apache Cassandra, which integrates the full distribution and eventual consistency of Amazon’s Dynamo with the data model of Google’s Bigtable.
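To give a flavor of one of the techniques named in the summary, here is a minimal sketch of consistent hashing in Python. It is not taken from the paper or from any of the systems mentioned; the `HashRing` class, the use of SHA-256, the node names and the default of 100 virtual nodes per server are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a point on the ring (a large integer)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Illustrative consistent-hash ring with virtual nodes (hypothetical class)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per physical node (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place `replicas` virtual nodes for `node` on the ring."""
        for i in range(self.replicas):
            point = ring_position(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        """Remove all virtual nodes that belong to `node`."""
        for i in range(self.replicas):
            point = ring_position(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def get_node(self, key: str) -> str:
        """Return the owner of the first ring point clockwise from the key."""
        if not self._points:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[idx % len(self._points)]]


# Usage: adding a server only moves keys onto the new server.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d:",
      all(after[k] == "node-d" for k in moved))
```

The point of the design is that growing or shrinking the cluster relocates only the keys adjacent to the affected ring positions, rather than rehashing everything as a plain `hash(key) % number_of_servers` scheme would.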
Related Articles
- Ultra-large-scale Sites - a collection of papers written by students at Stuttgart Media University.
Reader Comments (4)
With respect to the paper, NoSQL is not non-relational, it always implements a subset of a conventional relational algebra (as do "relational" database implementations). NoSQL, as the name implies, is about not implementing SQL and optimizing for a narrower set of use cases than a conventional relational database.
Very interesting. During the same time I wrote my master's thesis "Storing and Analyzing Social Data". I know how hard it is to not publish something that is already outdated. At the moment the development in this area is so fast that publishing with a delay of 2-3 months already means the paper or report is outdated.
@Nicolas: Thanks for the link to your interesting thesis. I fully agree on the pace and momentum in the NoSQL field (at least it was hard for me to keep up with it, as I wrote the paper alongside my master's thesis and subsequent day job). Therefore, I think the most valuable and "durable" part of my paper is chapter 3 on the fundamental building blocks of scalable and available datastores; the subsequent chapters then show how these are realized and can be arranged to build real-life systems.
My big gratitude to Christof Strauch. The paper was extremely useful as an overview and was helpful in writing the chapter on HBase and other NoSQL databases in my book on Hadoop.