NoSQL Took Away the Relational Model and Gave Nothing Back
Update: Benjamin Black said he was the source of the quote and also said I was wrong about what he meant. His real point: The meaning of the statement was that NoSQL systems (really the various map-reduce systems) are lacking a standard model for describing and querying and that developing one should be a high priority task for them.
At A NoSQL Evening in Palo Alto, an audience member (sorry, I couldn't tell who) said something I found really interesting: NoSQL took away the relational model and gave nothing back.
The idea being that NoSQL has focused on ease of use, scalability, performance, etc., but it has lost the idea of how data relates to other data. True to its name, the relational model is very good at capturing and managing relationships. With NoSQL all relationships have been pushed back onto the poor programmer to implement in code rather than being managed by the database. We've sacrificed usability. NoSQL is about concurrency, latency, and scalability, but it's not about data.
My ears perked up because I said something similar a while back while commenting on VoltDB's criticism of the NoSQL transaction model: I agree completely that moving the repair logic to the programmer is a recipe for disaster. Having programmers worry about read repair, vector clocks, the commutativity of transactions, how to design compensatory transactions to make up for previous failed transactions, and the other very careful bits of design, is asking for a very fragile system. ACID transactions are clean and understandable and that's why people like them.
Relationships are very much in the same spirit. Managing relationships without explicit support or multiple object transaction support puts a huge burden on the programmer. At one level, key-value systems are far simpler to use. That's great. But for more complex data, all that work comes back and falls on the overburdened shoulders of the programmer. What I liked about this comment during the event is that it put an emphasis on making the programmer's life easier across a wider variety of use cases, which is always a good thing, and it was worth surfacing.
Related Articles
- Interesting Hacker News Thread, especially the commentary on state of the art IO subsystems.
Reader Comments (24)
My feelings exactly. I was at a SpringSource conference and there was a topic on NoSQL databases. Quite a few big names were thrown around when it came to listing companies that were using it.
But then when it came to demonstrating a simple join between two tables with a couple of where clauses, the suggestion given was: Change your NoSQL database design!
I mean come on! Really?! So my schema has to change for every query I come up with? Relational DBs do this in their sleep.
My memory's a bit foggy as to which implementation of NoSQL they were discussing at the time, but it was a popular one.
I can't think of anything beyond a simple app that I can dare to use NoSQL for. Maybe it's a matter of learning to accept change....but I am really skeptical.
It's a great little phrase. FWIW, my own bias is towards trying to build better features on top of an RDBMS core. Shard. Make schema changes easier by letting one "kind" point to more than one table. (Or allow tacking entity/attribute/value data onto a conventional table.) Duplicate a little data to make sharding easier. Build a layer on the client to emulate more normal SQL aggregate queries and joins when writing SQL isn't possible.
From the RDBMS core, you get compact storage and old low-level code whose flaws are known. On top of that you can get past some of the pain points of MySQL these days, like schema changes that lock everything and inability to scale out. It's one thing to describe an idea like that and another to come up with a detailed design, build it, and get the bugs out, but it seems like a promising approach -- scaling the database and making it flexible, without reinventing it.
It's an interesting thought. Though I'm left wondering what of dbs like Mongo where you can do it completely relational, document based, or both. Initially many people going to a NoSQL solution will shy away from relational bits - but that's not a good idea. Many times you want to manage that data in a relational way (and other times you get great things out of looking at it as a document).
So while I can't speak to other nosql solutions (as my experience is limited there) I don't feel like mongo did take the relational model away and give nothing back. In a lot of ways it gave a lot of flexibility and more accurate ways to model our applications relationship with the data. There are certainly trade offs, but I don't think that's one of them across the board.
Seems like an awfully broad, and incorrect generalization.
i agree 101%.
if your *data* is important, then i simply cannot understand how a nosql solution would fit into the picture. eventual consistency means that there is never a real moment in time that you can be guaranteed that your data is consistent. besides logging (i.e. throw away data), i don't see how that artifact could ever be useful.
I think the problem with NoSQL is that it's currently in its hype stage, where it's the cool new thing to use even when an RDBMS is most likely better suited to the problem.
Lots of websites would be better suited for NoSQL. Blogs, for instance, are a prime example of where you don't really benefit from having normalized data.
But try to tailor NoSQL to a website that requires complex searching or data mining and you're using the wrong tool for the job.
My opinion anyway.
The "Relation" in the relational model is what we currently commonly call a table. Attributes (columns) in a relation (table) are related to a candidate key (and a primary key is usually identified from the candidates) and relations can be normalized according to the standard normal forms so that you get all the usual benefits of normal form. References between tables are usually enforced by the database system with foreign key constraints, but these "relationships" are not the "relations" referred to in relational databases. SQL is not required to restrict joins to only columns which are constrained with foreign key constraints, i.e. the joins and functionality of SQL are not generally restricted to the declared data model.
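To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables and rows (author, post, venue) are made up for illustration and do not come from the comment. It shows a foreign key constraint rejecting an orphaned row, and then a join on two columns that have no declared constraint between them at all.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        title TEXT
    );
    CREATE TABLE venue (name TEXT, city TEXT);
    INSERT INTO author VALUES (1, 'Ada', 'Palo Alto');
    INSERT INTO post VALUES (10, 1, 'On joins');
    INSERT INTO venue VALUES ('Meetup Hall', 'Palo Alto');
""")

def insert_orphan_post():
    """Try to insert a post whose author does not exist; report whether it was accepted."""
    try:
        conn.execute("INSERT INTO post VALUES (11, 999, 'Orphan')")
    except sqlite3.IntegrityError:
        return False  # the declared "relationship" was enforced by the database
    return True

def authors_near_venues():
    """Join on city, a column pair with no foreign key between them."""
    return conn.execute(
        "SELECT a.name, v.name FROM author a JOIN venue v ON a.city = v.city"
    ).fetchall()
```

The constraint guards the declared reference, while the join works on whatever columns happen to be comparable, which is the point being made about SQL not being limited to the declared model.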
Because one normally would assume a relational database to have some normalization in addition to the standard model, one gets the benefits of the mature traditional RDBMS. It is certainly possible to also have a high performance denormalized model in a relational database (for instance the Kimball dimensional model in the data warehousing arena) with a mature and well-tested methodology.
RDBMS systems are very large abstractions. Getting them to scale and perform is not simple, but the standard model is there - building normalized and denormalized databases is well understood, as is indexing and tuning the various engines. Obviously, these general purpose engines have overhead which many people feel they can do without, thus the large efforts to build new databases which leave out various features of RDBMS systems.
Unfortunately, NoSQL is an umbrella for such a variety of systems that there really is no benchmark or reference model to describe it as one architecture. Although SQL Server and Oracle certainly have very different architectures, when you compare them to any two particular NoSQL systems, it is clear that the name NoSQL really just means "not a relational database", but that could be said about the file system. Allowing NoSQL systems to call themselves databases is also rather disingenuous, like people who call their Excel membership lists a database.
I'm not going to dispute that NoSQL systems have a place, but the umbrella name is useless, because it only tells you that a particular architecture has been rejected, not one which has been embraced.
That was me (as a follow-up to excellent comments from Scott Waterman). The point being made bears no relation (hah!) to the comments on this post. The meaning of the statement was that NoSQL systems (really the various map-reduce systems) are lacking a standard model for describing and querying and that developing one should be a high priority task for them. It was most definitely _not_ meant to support these ill-informed opinions on NoSQL vs SQL.
NoSQL took away the Relational Model...
...and I don't think I'll be missing it.
What is ill-informed Benjamin?
You could argue that RDF could be the standard NoSQL API. Though of course you can also implement a triplestore on top of a relational database...
The relational model is approaching 40 years old and has never been implemented.
Every mention of an RDBMS in these comments is about an SQL (or some variant) database, not a relational database. Anyone who studies relational theory can point out the difference in a heartbeat.
Read Chris Date's endless, sad squawkings, "Let's implement the relational model!"
Of course, no one ever will; an RDBMS would be immediately obsolete if it were ever built: SQL databases are far, far more powerful.
Todd,
Your post only misinterpreted what I meant. I think Scott and I agree, but he will have to comment on that. The question was not whether the NoSQL systems should exist since they eschew the relational model, but what the future holds for a new model that is as useful. I know I am not alone in expecting late binding of schema and explicit parallelism, exemplified by systems like Hive, Pig, Cascading, and Cascalog, to be central to whatever emerges as the 'standard' model and language for the map-reduce NoSQL systems. Will it have some similarities in syntax with SQL? Probably, if only because it is familiar.
The "ill-informed" applied to the comments on the post. Sorry for the further confusion.
b
Yes, really. The whole point of the NoSQL solutions is you give up your fancy ad hoc queries. In return, you get better scalability and faster throughput. So yes, your schema has to change for every query that you come up with. All that means is you have to understand your data and how it relates to your application before you build anything. The old SQLish method of just building out the data into a normalized data set and adding indexes later on doesn't work.
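As a rough illustration of "the schema changes with every query", the sketch below uses plain Python dicts as stand-ins for key-value collections. All names here (posts_by_id, post_ids_by_tag, save_post and so on) are hypothetical. Each query the application needs gets its own lookup structure, and the write path has to keep every one of them up to date.

```python
from collections import defaultdict

# Hypothetical key-value "collections"; each exists to serve exactly one query.
posts_by_id = {}
post_ids_by_author = defaultdict(list)  # serves "posts by author"
post_ids_by_tag = defaultdict(list)     # added later to serve "posts by tag"

def save_post(post_id, author, tags, title):
    """Write the post and update every query-specific structure by hand."""
    posts_by_id[post_id] = {"author": author, "tags": list(tags), "title": title}
    post_ids_by_author[author].append(post_id)
    for tag in tags:
        post_ids_by_tag[tag].append(post_id)

def titles_by_author(author):
    return [posts_by_id[i]["title"] for i in post_ids_by_author.get(author, [])]

def titles_by_tag(tag):
    return [posts_by_id[i]["title"] for i in post_ids_by_tag.get(tag, [])]
```

A new kind of query means a new structure plus a change to save_post, which is the trade being described: the access paths are designed up front instead of being left to an optimizer.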
The real technology, behind all the other technology, is language
Every sentence in every language is either a declaration, a query or a command. (That last sentence is a declaration :)
Every sentence has a subject, predicate and object.
As Henri mentioned, it's trivial to use an SQL DB as a NoSQL system, as well as creating relational data using NoSQL.
A DB is simply a collection of declarations. Declare the relationships, and you have relational data. SQL or not.
ie:
Tweety is a bird.
Bird is an animal.
Tweety is owned by Granny.
Tweety nemesis is Sylvester.
Sylvester is owned by Granny.
Sylvester is a cat
Cat is an animal.
With those declarations i can find all animals, all birds owned by Granny etc.
The DB method is mostly irrelevant.
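The Tweety declarations above can be sketched as subject/predicate/object triples in a few lines of Python. This is only an illustration under assumptions: the predicate names (is_a, owned_by, nemesis) and the helper functions are invented here, and treating anything that never appears as the object of is_a as an individual is a simplification of mine, not something the comment specifies.

```python
# Hypothetical in-memory triple store built from the declarations above.
TRIPLES = {
    ("Tweety", "is_a", "bird"),
    ("bird", "is_a", "animal"),
    ("Tweety", "owned_by", "Granny"),
    ("Tweety", "nemesis", "Sylvester"),
    ("Sylvester", "owned_by", "Granny"),
    ("Sylvester", "is_a", "cat"),
    ("cat", "is_a", "animal"),
}

def kinds_of(thing, triples=TRIPLES):
    """Return every kind reachable from `thing` by following is_a links."""
    found = set()
    frontier = [thing]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "is_a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def individuals_of(kind, triples=TRIPLES):
    """Return subjects of `kind` that are not themselves used as a kind."""
    subjects = {s for s, p, _ in triples if p == "is_a"}
    used_as_kind = {o for _, p, o in triples if p == "is_a"}
    return {s for s in subjects - used_as_kind if kind in kinds_of(s, triples)}

def owned_by(owner, kind, triples=TRIPLES):
    """Return things of `kind` that `owner` owns."""
    owned = {s for s, p, o in triples if p == "owned_by" and o == owner}
    return {s for s in owned if kind in kinds_of(s, triples)}
```

With these helpers, individuals_of("animal") answers "find all animals" and owned_by("Granny", "bird") answers "all birds owned by Granny", without any particular storage engine underneath.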
I don't think this criticism is valid for graph databases. They provide a high-level abstraction to relate things arguably substantially more powerful than a relational database does. (They are not a panacea for every problem, of course)
So from my point of view, graph databases (like InfoGrid) took away SQL and put something more interesting there.
Todd seems to be just stirring a storm in a cup by articles such as this.
Read here: There are 2 types of fools. One who think it is good because it is old and established. Another type who thinks it is better because it is new.
There are the Googles and Facebooks of tomorrow out there that are technology agnostic and, depending on the use case, will either use SQL or NoSQL or roll something new out. The rest of us will take stands.
I knew I wasn't going to agree with the article when I saw the title, but this quote may be the most incorrect thing said on the Internet today :-)
Relationships are the most unsupported and mishandled part of relational databases. The few bits that are there are buried as constraints or even worse, as text in a view that you're never going to be able to work with directly!
The entire relational database discipline is built on the massively flawed concept that some database "architect" somewhere completely understands every possible attribute about every possible entity IN ADVANCE in order to build the schemas. Then it claims the second most important thing in data management is the ability for anyone that can log in to do ad-hoc queries, which drives the need for SQL to be crammed in between your business logic and your data.
I've been making my living off of Oracle for the past 20+ years, and I still can't believe anyone thinks this is the RIGHT way to work with data :-)
Schema-less, SQL-less databases like MongoDB are designed to store the relationships as first class citizens in the world of data. And they don't make it easy for any admin assistant with an account and a DB GUI to kick off 20 cartesian joins across your biggest tables in the middle of the afternoon. And they don't rely on some buggy optimizer to randomly pick the path it's going to glue your data together this time. And they don't go through a dozen layers of buffers and abstractions to get the data from the store into your app and back again. And they don't require ORMs and other insane-beyond-belief approaches to solve the problems caused by all the "benefits" of RDBMS technology.
There are a lot of things broken in computing today, and relational databases are high on that list. The NoSQL (I hate that term, too) solutions are by no means perfect, but they are a big step in the right direction.
AJ, do you really think that's my style?
Terry, I didn't really have ad-hoc queries and joins in mind. More the case that when you have many-to-many relationships that span records, without some mechanism, keeping references consistent is impossible for a programmer. Pretty basic stuff really.
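A small sketch of what that looks like in practice, with a Python dict standing in for a key-value store that has no multi-record transactions. The key layout (user:..., group:...) and the function names are assumptions made for this example. A many-to-many membership is stored on both sides, and a failure between the two writes leaves a reference that only one side knows about.

```python
def make_store():
    """Hypothetical store: the membership is recorded on both records."""
    return {
        "user:ann": {"groups": ["g1"]},
        "group:g1": {"members": ["ann"]},
    }

def remove_member(store, user, group, fail_between_writes=False):
    """Two independent writes; nothing ties them together atomically."""
    store[f"user:{user}"]["groups"].remove(group)
    if fail_between_writes:
        raise RuntimeError("simulated crash between the two writes")
    store[f"group:{group}"]["members"].remove(user)

def find_one_sided_links(store):
    """Return (holder, target) pairs where only the holder records the link."""
    problems = []
    for key, value in store.items():
        kind, _, name = key.partition(":")
        if kind == "user":
            for g in value["groups"]:
                if name not in store.get(f"group:{g}", {}).get("members", []):
                    problems.append((key, f"group:{g}"))
        elif kind == "group":
            for u in value["members"]:
                if name not in store.get(f"user:{u}", {}).get("groups", []):
                    problems.append((key, f"user:{u}"))
    return problems
```

Detecting and repairing the leftover link is application code here, which is the burden a transaction or a constraint would otherwise carry.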
In this context, I think "NoSQL took away the relational model, and gave us higher concurrency".
One relationship, fine. Two great. Three hmmm... things are taking a bit longer... four relationships the DBAs are complaining... five relationships the site is starting to crawl... six relationships find a new job.
My site can get something like 28 million hits per hour, and I model a lot of relationships. So for pity's sake do not use an RDBMS. Store things in a distributed, replicated, persistent K/V store by 'PK'. Store other related things in other K/V stores by the same PK (aka an FK). Do the join at the client or middleware tier (using scatter / gather).
It can and does work. Yes, someone has to code the join (scatter / gather) the first time.
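For readers who haven't seen the pattern, a minimal scatter / gather join might look like the sketch below. It is an assumption-laden toy: plain dicts stand in for the separate K/V stores, the store names and sample data are invented, and a thread pool stands in for whatever parallel client the real system would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for separate key-value stores sharing one primary key.
STORES = {
    "user": {"u1": {"name": "Ada"}, "u2": {"name": "Lin"}},
    "profile": {"u1": {"city": "Paris"}},
    "orders": {"u1": ["o-17", "o-42"], "u2": ["o-99"]},
}

def fetch(store_name, key):
    """One lookup against one store; the value is None when the key is absent."""
    return store_name, STORES[store_name].get(key)

def gather(key):
    """Scatter one lookup per store in parallel, then gather them into one record."""
    with ThreadPoolExecutor(max_workers=len(STORES)) as pool:
        futures = [pool.submit(fetch, name, key) for name in STORES]
        record = dict(f.result() for f in futures)
    record["pk"] = key
    return record
```

Calling gather("u1") returns the user, profile and orders entries side by side. A missing entry comes back as None, so the caller decides what an outer-join style gap means.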
28 million hits per hour? I call BS. Twitter does 50M a day
http://searchengineland.com/by-the-numbers-twitter-vs-facebook-vs-google-buzz-36709
IT guys love to design for those 1 in a million websites like Twitter or FB, but the reality is, any DB is sufficient for 99%+ of all websites.
"IT guys love to design for those 1 in a million websites like Twitter or FB, but the reality is, any DB is sufficient for 99%+ of all websites."
Agreed - and if DB is not good enough, you can always throw memcache.
Actually I talked to an FB engineer a few weeks ago and Cassandra is actually being phased out, simply because FB had enough people who really know how to develop/use/manage MySQL that transitioning completely to Cassandra was tough. To that end they have modified memcache to make it a write-through cache to MySQL.
So the FB beast releases Cassandra to the world, but they themselves are not using it.
Most people who are screaming against NOSQL just don't get and don't understand NOSQL.
First, NOSQL is not here to replace RDBMS. If something works fine with RDBMS, most likely it will still be fine without RDBMS. A banking DB doesn't need to be done NOSQL. Most likely it really needs the relationship with all the tables.
Second, NOSQL has its own purpose. Most of the time, a NOSQL DB will fit best for web applications.
Third, NOSQL (MongoDB at least) leaves the hard work to the programmer, while it only stores the data.
Just because they are all DBs doesn't mean they are all good for the same job.
In my opinion the expressiveness of the SQL query syntax has no alternative in the NoSQL world. An alternative does exist, though, that aligns more closely with the NoSQL world. SPARQL is a W3C spec and addresses models as triples, unlike the key-value pairs in NoSQL. Triple stores are known to run on key-value stores (see the Jena TDB Voldemort project, although experimental).
The trade-off for any new language will be expressivity, and SQL does a reasonably good job. I am no expert in grammar, but SPARQL does the same. The new query language must balance this expressivity with the performance of current key-value store implementations. This is a tall order given the head start the SQL and SPARQL worlds have.