Thursday, April 7, 2011

Paper: A Co-Relational Model of Data for Large Shared Data Banks

Let's play a quick game of truth or sacrilege: are SQL and NoSQL really just two sides of the same coin? That's what Erik Meijer and Gavin Bierman would have us believe in their "we can all get along and make a lot of money" article in the Communications of the ACM, A Co-Relational Model of Data for Large Shared Data Banks. You don't believe it? It's math, so it must be true :-) Some key points:

In this article we present a mathematical data model for the most common noSQL databases—namely, key/value relationships—and demonstrate that this data model is the mathematical dual of SQL's relational data model of foreign-/primary-key relationships
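To make the duality claim a little more concrete, here is a minimal Python sketch of how we read it, not code from the paper. In the SQL model each child row carries a foreign key pointing back at its parent; in the key/value (coSQL) model the parent value holds the keys of its children, so the references point the opposite way. Flipping the arrows converts one representation into the other and back. The products/keywords data, the `product:`/`kw:` key scheme and the helpers `to_cosql` and `to_sql` are all hypothetical.

```python
# Hypothetical example data and key scheme, for illustration only.

# SQL-style: primary key -> row, and child rows hold a foreign key to the parent.
products = {1: {"title": "Book"}, 2: {"title": "Lamp"}}
keywords = [
    {"product_id": 1, "word": "paper"},    # child -> parent
    {"product_id": 1, "word": "reading"},
    {"product_id": 2, "word": "light"},
]

def to_cosql(products, keywords):
    """Flip the arrows: the parent now stores the keys of its children."""
    store = {}
    # Every child gets its own key in the key/value store.
    for i, kw in enumerate(keywords):
        store[f"kw:{i}"] = {"word": kw["word"]}
    for pid, row in products.items():
        store[f"product:{pid}"] = {
            "title": row["title"],
            # parent -> children: a list of keys instead of a foreign key per child
            "keywords": [f"kw:{i}" for i, kw in enumerate(keywords)
                         if kw["product_id"] == pid],
        }
    return store

def to_sql(store):
    """Flip the arrows back: each child row records which parent refers to it."""
    products, keywords = {}, []
    for key, value in store.items():
        if key.startswith("product:"):
            pid = int(key.split(":")[1])
            products[pid] = {"title": value["title"]}
            for kw_key in value["keywords"]:
                keywords.append({"product_id": pid, "word": store[kw_key]["word"]})
    return products, keywords

cosql = to_cosql(products, keywords)
print(cosql["product:1"]["keywords"])  # ['kw:0', 'kw:1']
```

The round trip `to_sql(to_cosql(products, keywords))` gives back the original tables, which is the informal sense in which the two models carry the same information with the references reversed.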

...we believe that our categorical data-model formalization and monadic query language will allow the same economic growth to occur for coSQL key-value stores.

...In contrast to common belief, the question of big versus small data is orthogonal to the question of SQL versus coSQL. While the coSQL model naturally supports extreme sharding, the fact that it does not require strong typing and normalization makes it attractive for "small" data as well. On the other hand, it is possible to scale SQL databases by careful partitioning.
What this all means is that coSQL and SQL are not in conflict, like good and evil. Instead they are two opposites that coexist in harmony and can transmute into each other like yin and yang. Because of the common query language based on monads, both can be implemented using the same principles.
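The "common query language based on monads" can be illustrated with a small sketch, assuming Python lists and generators stand in for both stores. `unit` and `bind` below are the list monad's return and flatMap (what LINQ calls SelectMany); the data and names are hypothetical. The same two operators express one query over both representations; only the direction in which the references are followed changes.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def unit(x: A) -> List[A]:
    """Wrap a single value (the monad's 'return')."""
    return [x]

def bind(xs: Iterable[A], f: Callable[[A], Iterable[B]]) -> Iterator[B]:
    """Sequence-monad bind, a.k.a. flatMap or LINQ's SelectMany."""
    for x in xs:
        yield from f(x)

# SQL side (hypothetical data): child rows point at parents through a foreign key.
products = [{"id": 1, "title": "Book"}, {"id": 2, "title": "Lamp"}]
keywords = [{"product_id": 1, "word": "paper"}, {"product_id": 2, "word": "light"}]

# coSQL side (hypothetical data): parents hold the keys of their children.
store = {
    "kw:0": {"word": "paper"},
    "kw:1": {"word": "light"},
    "product:1": {"title": "Book", "keywords": ["kw:0"]},
    "product:2": {"title": "Lamp", "keywords": ["kw:1"]},
}

# "Titles of products tagged 'paper'" against the relational data (a join).
sql_result = list(bind(products, lambda p:
    bind(keywords, lambda k:
        unit(p["title"]) if k["product_id"] == p["id"] and k["word"] == "paper" else [])))

# The same query against the key/value data: same bind/unit, references followed
# from parent to child instead of from child to parent.
product_values = [v for k, v in store.items() if k.startswith("product:")]
cosql_result = list(bind(product_values, lambda p:
    bind(p["keywords"], lambda key:
        unit(p["title"]) if store[key]["word"] == "paper" else [])))

print(sql_result, cosql_result)  # ['Book'] ['Book']
```

Python's own comprehension syntax, e.g. `[p["title"] for p in products for k in keywords if ...]`, is sugar for essentially this nesting of `bind` and `unit`.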

I'm certainly in no position to judge this work, or what it means at some deep level. After reading a thousand treatments on monads I still have no idea what they are. But, like the Standard Model in physics, it would be satisfying if some unifying principles underlay all this stuff. Would we all get along? That's a completely different question...

Reader Comments (3)

We'll have to wait until some implementation proves it =)

April 7, 2011 | Unregistered Commenter maop

There was a good citation at the bottom about the duality between IEnumerator (pull) and IObservable (push). Duality is an invitation to refactor, and a "proof" that refactoring will yield similar semantics.

Overall, the pitch is of the "one ring to bind them all" sort: that there can be one language, resting on monad comprehensions, in which to write higher-order programs. Use that language, list some constraints, and a configuration or strategy would choose whether the implementation of your higher-order program is deterministic or parallel, push or pull, SQL or co-SQL.

What would one call the language --- pig_or_SQL?

April 10, 2011 | Unregistered Commenter Tony
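The push/pull duality mentioned in the comment above can be sketched in a few lines of Python. In .NET, IEnumerator is the pull side (the consumer calls MoveNext to ask for the next element) and IObservable is the push side (the producer calls OnNext on a subscribed observer). The Python names `ListObserver`, `pull` and `push` below are hypothetical analogues, not a real library API; the point is only that the same sequence can be consumed either way, with the flow of control reversed.

```python
from typing import Iterable, List

class ListObserver:
    """Push side: the producer calls on_next/on_completed (cf. .NET IObserver<T>)."""

    def __init__(self) -> None:
        self.received: List[int] = []
        self.done = False

    def on_next(self, value: int) -> None:
        self.received.append(value)

    def on_completed(self) -> None:
        self.done = True

def pull(source: Iterable[int]) -> List[int]:
    """Pull side: the consumer asks for each element (cf. IEnumerator<T>.MoveNext)."""
    it = iter(source)
    out = []
    while True:
        try:
            out.append(next(it))
        except StopIteration:
            return out

def push(source: Iterable[int], observer: ListObserver) -> None:
    """Turn a pull source into a push source: the producer now drives the loop
    and hands each element to the observer, reversing who is in control."""
    for value in source:
        observer.on_next(value)
    observer.on_completed()

data = [1, 2, 3]
obs = ListObserver()
push(data, obs)
print(pull(data), obs.received, obs.done)  # [1, 2, 3] [1, 2, 3] True
```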

Firstly, I respect Erik very much. But in this case I am not quite sure whether two authors from Microsoft are completely free of a conflict of interest. After all, SQL Server is one of their flagships and neither Hadoop nor noSQL can be commercialized by them. I think it looks just too "convenient" to prove that in the end this is all the same.

October 5, 2012 | Unregistered Commenter Caravaggio
