Michael Stonebraker sure knows how to stir up a storm. Unlike for others, that doesn't make him a troll in my mind, he's way too accomplished in the field to be that, but he does have a bit of Barnum & Bailey in him, which serves to get the discussion flowing, and that's a good thing. A lot of previously hidden wisdom and passion unlocks, which we'll try to capture here.
This disturbance in the force is over OldSQL vs NoSQL vs NewSQL. Warning, these are not crisp categories, there's leakage all over the place, watch your step:
OK, got it? Then you might be the only one...
The disturbance first started with this article by Derrick Harris, which gets a lot of mileage out of a few quotes by Stonebraker. The short of it is:
That prompted a doozy of a response post by Facebook Database Engineer Domas Mituzas, who feels MySQL suits their purposes just fine, given Facebook actually knows what their purposes are:
Then the stream split in this post by Bob Warfield, which riffs on the idea of NoSQL being premature optimization:
Jeremy Zawodny did not care for this take, and a memo to some of his commenters, if anyone has the chops to have an opinion on all of this, it's Jeremy. The short of it:
In a parallel subject continuum, I think this comment by Slappy198s in Lady Gaga's solo piano performance of "The Edge Of Glory" on Howard Stern, nicely summarizes the zeitgeist:
You know, there's a difference between not liking someone's music and not recognizing their talent. If you can't recognize the fact that Lady GaGa is, in fact, extremely talented in many ways, then you may want to try to look at her with less of a bias. There's plenty of artists I can't stand, but still respect their talent.
Even if you don't like Lada Gaga's schtick, that is a great performance. I get the feeling a lot SQL people don't recognize the talent of NoSQL, whereas NoSQL people are generally use the best tool for the job types who have no problem with you using SQL if that works for you. Which leads to...
Jeremy does a good job saying why NoSQL isn't a case of premature optimization. It's not just about scale, it's about solving a problem in a way that is sensible for your job.
There's a deeper fundamental problem I have with the characterization of NoSQL as premature optimization, and it's the same one I have when this phrase is used by programmers. Usually it's just code for I don't like your design choices; I would do it another way so whatever you are doing is wrong. This is a common tactic in in the rough and tumble world of design reviews. It's dismissive. The problem is the person making the assertion probably doesn't have the same experiences as the person doing the implementation and they probably haven't thought through their problem as deeply, so it's often arrogance to say you know what is premature optimization for another's problem.
The arrogance here is seeing a SQL product as the default choice with any deviation requiring a justification. This is not the case. There is no default choice. You don't have to make excuses for choosing something different. You just have solve your particular problem. That's it.
cogman10 has a good sense of it:
Picking Tech A over Tech B is NOT a premature optimization. Would the author claim that "Using InnoDB is a premature optimization because MySQL is better supported!" It is called planning, you do that whenever you write a new application.
Use the database that best matches your data. If some non-relational database is a perfect match for the data you want to store, by all means use it. Don't give two shits about people like the author that think SQL is the one and only query language. (hell, I wish that SQL would die in flames, but it is heavily built into current business models. Not because it is the best, but because it is common.)
Insisting on the wrong tech is not premature optimization. It is stupidity.
Errr also makes some good points:
NoSQL isn’t just something that you replace your SQL database with when it reaches a certain scale. If that was the only case where NoSQL was useful then it would be just an optimization, but in a lot of cases it’s easier, cheaper and more efficient to implement it as an NoSQL database right away.
You’re hardly an expert if you’re going to devote more time, effort and money to fit something into a SQL database when all that can be achieved much easier using a NoSQL database.
I'll bet on Domas Mituzas knowing what he's doing, so it seems highly unlikely that Facebook is screwed. He's looking from the long view, the perspective of someone who provides a working service for 700+ million people, accessing uncountable bytes of old and new data, that will ideally last forever. That's a completely different perspective from someone coming from a place of building relatively simple and specialized transactional systems.
Facebook can still do what they need doing, so there's no reason to switch, but it's reasonable to ask if Facebook would be in a better place if they started today with a different technology? Would they prefer to use their new HBase stack, for example? It may not be as clear cut as you think. Take a moment to browse MySQL at Facebook, Facebook's own page on how they use MySQL. What's clear is they are MySQL experts and they deal with other stone cold experts like Percona. They know MySQL backwards and forwards and can make it do what they want. Would you change something that's working to use a new technology that you don't know?
At Facebook it's not a matter of MySQL or else either. With a policy of using the best technology for the job and deep experience in other stacks, Facebook could switch stacks if they wanted. Not every product at Facebook uses MySQL. Take Facebook's New Real-Time Messaging System which uses HBase, for example.
Don't believe the switch would be too hard and that they are locked in. Facebook uses APIs that would allow incrementally moving data over to another backend datastore over time. That they aren't doing this means they are satisfied enough with their highly tuned memcached + highly tuned MySQL + highly automated operations approach to stand pat.
As Mituzas hints at, when operating at this scale, all the programs created to manage the system completely dwarf all the other code, regardless of the starting point of the technology, and that's what Facebook has mastered, so the reports of Facebook's impending infrastructure death may be premature.