A Yes for a NoSQL Taxonomy

In his highly entertaining NoSQL is a Horseless Carriage presentation, NorthScale's Steven Yen has come up with a NoSQL taxonomy that, thankfully, focuses a little more on what NoSQL is than on what it isn't:
- key-value-cache: memcached, repcached, coherence, infinispan, eXtreme scale, jboss cache, velocity, terracotta
- key-value-store: keyspace, flare, schema-free, RAMCloud
- eventually-consistent key-value-store: dynamo, voldemort, Dynomite, SubRecord, MotionDb, Dovetaildb
- ordered-key-value-store: tokyo tyrant, lightcloud, NMDB, luxio, memcachedb, actord
- data-structures server: redis
- tuple-store: gigaspaces, coord, apache river
- object database: ZopeDB, db4o, Shoal
- document store: CouchDB, Mongo, Jackrabbit, XML Databases, ThruDB, CloudKit, Persevere, Riak Basho, Scalaris
- wide columnar store: BigTable, HBase, Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI
"Who will win?" Steven asks. He answers: the most approachable API with enough power will win. Steven predicts that the contender with the most devastating knockout punch will be document stores, because "everyone groks documents." The thought, though, is that there will be just a few winners and that products will converge in functionality.
Steven is banking on the "worse is better" model of dominance, which is hard to argue with, as it has been such a successful adoption pattern in our field. The convergence idea is something I also agree with. What we have now are a lot of features masquerading as products. Over time they will merge together to become more full-featured offerings.
The key question, though, is how much power is enough to win. Just getting a value back for a key won't be enough. Who are you putting your money on?
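To make the gap between two neighbouring categories in the taxonomy concrete, here is a minimal Python sketch (not taken from the slides or from any listed product) of a plain key-value store, which only answers exact-key lookups, next to an ordered key-value store, which also keeps its keys sorted so it can serve range scans. The class and method names are hypothetical, and a real ordered store would use a B-tree, skip list, or LSM tree rather than a sorted Python list, whose inserts cost O(n).

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class HashKVStore:
    """Plain key-value store: answers exact-key lookups only."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class OrderedKVStore(HashKVStore):
    """Ordered key-value store: also keeps keys sorted, so range scans are possible."""

    def __init__(self) -> None:
        super().__init__()
        self._keys: List[str] = []  # every key, kept in sorted order

    def put(self, key: str, value: str) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)  # O(n) here; real stores use trees
        super().put(key, value)

    def scan(self, start: str, end: str) -> Iterator[Tuple[str, str]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            key = self._keys[i]
            yield key, self._data[key]
            i += 1


store = OrderedKVStore()
for k in ["user:3", "user:1", "order:9", "user:2"]:
    store.put(k, k.upper())

# Prefix scan: ';' sorts right after ':', so this covers every "user:" key.
print(list(store.scan("user:", "user;")))
# [('user:1', 'USER:1'), ('user:2', 'USER:2'), ('user:3', 'USER:3')]
```

A hash-based store can only offer the `get` half of this; the sorted key order is what lets the ordered category answer "give me every user" without knowing the keys in advance.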
Related Articles
- NoSQL is a horseless carriage blog post by Steven Yen.
- Damn, Which Database do I Use Now? for another take on organizing the database landscape.
- NoSQL Meetup Report by Andraz Tori
Reader Comments (13)
Sure, a lot of these data stores don't yet have "approachable" APIs, but my feeling is that we would all be better served by one single API that can use any one of these as a back-end.
Think of an alternative QL (to SQL), and an alternative ORM (OHM? representing objects as hashmaps, rather than relational tables).
There should be a programming model that is common to all of these datastores, even if the implementation (of the drivers) is more difficult in some implementations than others.
Using secondary indexes and reverse indexes behind the scenes, it should be possible to develop a feature-rich API that supports most of these back ends through an approachable, common object API.
Anyone working on such a thing?
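As a rough illustration of the commenter's idea, and assuming nothing about any existing project, here is a hypothetical Python sketch: a minimal `Backend` contract (get/put of strings) that any store could implement, plus an "OHM"-style mapper that stores objects as JSON hashmaps and maintains a secondary index behind the scenes, so that lookups by a field work even on a plain key-value back end. All names (`Backend`, `DictBackend`, `ObjectHashMapper`, the `idx:` key layout) are made up for illustration; the sketch ignores concurrency and atomicity, and does not clean up stale index entries.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class Backend(ABC):
    """Hypothetical lowest-common-denominator contract: get/put of string values."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...


class DictBackend(Backend):
    """In-memory stand-in; a real driver would wrap a cache, KV or document store."""

    def __init__(self) -> None:
        self._d: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._d.get(key)

    def put(self, key: str, value: str) -> None:
        self._d[key] = value


class ObjectHashMapper:
    """Stores objects as hashmaps (JSON dicts) and maintains a secondary index
    on one field, so lookups by that field work on any Backend."""

    def __init__(self, backend: Backend, kind: str, indexed_field: str) -> None:
        self.backend = backend
        self.kind = kind
        self.field = indexed_field

    def _index_key(self, value: Any) -> str:
        return f"idx:{self.kind}:{self.field}:{value}"

    def save(self, obj_id: str, obj: Dict[str, Any]) -> None:
        self.backend.put(f"{self.kind}:{obj_id}", json.dumps(obj))
        # Secondary index entry: field value -> list of object ids.
        ikey = self._index_key(obj[self.field])
        ids = json.loads(self.backend.get(ikey) or "[]")
        if obj_id not in ids:
            ids.append(obj_id)
            self.backend.put(ikey, json.dumps(ids))

    def load(self, obj_id: str) -> Optional[Dict[str, Any]]:
        raw = self.backend.get(f"{self.kind}:{obj_id}")
        return json.loads(raw) if raw is not None else None

    def find_by(self, value: Any) -> List[Dict[str, Any]]:
        ids = json.loads(self.backend.get(self._index_key(value)) or "[]")
        objs = (self.load(i) for i in ids)
        # Re-check the field, since stale index entries are never removed here.
        return [o for o in objs if o is not None and o.get(self.field) == value]


users = ObjectHashMapper(DictBackend(), "user", "city")
users.save("1", {"name": "Ann", "city": "Oslo"})
users.save("2", {"name": "Bob", "city": "Lima"})
users.save("3", {"name": "Cy", "city": "Oslo"})
print([u["name"] for u in users.find_by("Oslo")])  # ['Ann', 'Cy']
```

Swapping `DictBackend` for a driver over a different store would leave `ObjectHashMapper` untouched, which is the point of a common programming model; how hard each driver is to write would vary, as the comment says.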
The feature that differentiates Keyspace is actually replication. More precisely, so-called consistent replication.
I also like defining things in the positive as opposed to horseless carriage or wireless phone (a la SQL-less database). As someone who's CEO of an XML database company and who's worked in databases for over 20 years, I have two thoughts on the taxonomy.
[1] I'd say XML databases should be their own category, split from document stores. First, not all XML databases are document-oriented -- while many are, some are more mid-tier persistent stores for XML-wrapped data (e.g., EII). Second, MarkLogic is in a group with a bunch of stuff I've never heard of which tends to, uh, beg questions about the taxonomy.
[2] Most of the entries are specific projects/products (e.g., BigTable), and by listing XML databases as a category within a category, the specifics get lost.
While I certainly have my angle to grind, MarkLogic is most certainly all about NoSQL (we are pure XQuery) and challenges the notion that RDBMSs can do anything and that the RDBMS is the end-point in database evolution.
CouchDB... What they have now is already useful. The API has been amazingly easy to use for several versions. I've yet to use any of the others on the landscape so I can't draw any comparisons.
You forgot to mention Neo4j (http://neo4j.org/).
lightcloud is not an ordered-key-value-store.
It's just two consistent hash rings; on each ring you can store redis, tokyotyrant, or memcached objects. It's built for distribution: you can easily get more capacity by adding more computers.
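For readers unfamiliar with consistent hashing, here is a minimal Python sketch of a single hash ring with virtual nodes. It is not LightCloud's code (the commenter describes LightCloud as layering two such rings); the node names, the 100 points per node, and the use of MD5 are assumptions for illustration. The usage at the end checks the property the comment alludes to: adding a machine only moves the keys that now land on the new machine, instead of reshuffling everything.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_hash(s: str) -> int:
    # Stable across processes, unlike Python's salted built-in hash().
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes (points) for smoother balance."""

    def __init__(self, nodes: List[str], points_per_node: int = 100) -> None:
        self.points_per_node = points_per_node
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First point at or clockwise after the key's hash, wrapping at the end.
        i = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[i % len(self._ring)][1]


keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]

# Only some keys move, and every one of them moves to the new node.
print(len(moved) < len(keys), all(ring.node_for(k) == "node-d" for k in moved))
# True True
```

Because a key's owner changes only when a new point lands between the key and its old owner, only roughly the new node's share of keys has to be migrated when capacity is added.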
You missed two k-v / k-v cache systems: redis and TSnosql.
You could also mention Neo4j, which is a "graph database". They made the headlines recently when they announced some funding.
Also, InfoGrid. http://infogrid.org/
It has a federated graph database accessed right from the web browser.
Not sure which of your categories it fits, but MckoiDDB probably warrants inclusion.
http://www.mckoi.com/
'He' didn't miss anything: the above list is a table copied from the [highly entertaining and insightful] slides linked in the article, for a talk that was already given, so there's no point forking the list just for the sake of completeness. Also, redis is in there, under the more accurately titled 'data structures server'.
Ah, the approachable API... a noble goal indeed.
My team produced a fully-replicated (yet using some clever tricks to hide replication lag!) DB that sits somewhere in the "ordered key-value" and "document store" camps, for an internal project...
...and found that even internal developer adoption was hindered by it having its own API. Even though the API was (if I may say so myself) as easy as it could get.
So, not to be outdone, we wrote a MySQL storage engine that backs onto our database (which doesn't really use the full flexibility of the schemaless document store, but hey, it's still there for direct API access if you want it), so our tables can be accessed as MySQL tables.
So there you have it: with a little work, people can access a NoSQL database through a very familiar API, MySQL :-), while still gaining the raw throughput and flexible-schema advantages of going to the native API beneath when they want to.
While you go into detail here, IMHO I would simply put it this way: NoSQL databases are disguised object databases, with relaxed constraints!
Here is my post:
http://www.jroller.com/dmdevito/entry/thinking_about_nosql_databases_classification