Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
Monday, October 22, 2012 at 9:15AM
HighScalability Team

A lot of people seem to passionately dislike the term NewSQL, or pretty much any newly coined term for that matter, but after watching Alex Lloyd, Senior Staff Software Engineer at Google, give a great talk on Building Spanner, that’s the term that fits Spanner best.

Spanner wraps the SQL + transaction model of OldSQL around the reworked bones of a globally distributed NoSQL system. That seems NewSQL to me.

As Spanner is a not-so-distant cousin of BigTable, the NoSQL component should be no surprise. Spanner is charged with spanning millions of machines across any number of geographically distributed datacenters. What is surprising is how fully OldSQL has been embraced. In an earlier 2011 talk given by Alex at the HotStorage conference, the reason for embracing OldSQL was the desire to make it easier and faster for programmers to build applications. The main ideas will seem quite familiar:

Spanner did not start out with the goal of becoming a NewSQL star. Spanner started as a BigTable clone, with a distributed file system metaphor. Then Spanner evolved into a global Protocol Buffers container. Eventually Spanner was pushed by internal Google customers to become more relational and more friendly to application programmers.

Apparently the use of Dremel inside Google had shown developers it was possible to have OLAP with SQL at scale, and they wanted that same ease of use and time to market for their OLTP apps. It seems Google has a lot of applications to get out the door, and programmers didn’t like dealing with the real-world complexities of producing reliable products on top of an eventually consistent system.

The trick was in figuring out how to make SQL work at truly huge scales. As an indicator of how deep we still are in the empirical phase of programming, that process has taken even Google over five years of development effort. Alex said the real work has actually been in building a complex, reliable distributed system. That’s the hard part to get correct.

With all the talk about atomic clocks, etc., you might get the impression that there’s magic in the system, that you can make huge cross-table, cross-datacenter transactions on millions of records with no penalty. That is not true. Spanner is an OLTP system. It uses two-phase commit, so long and large updates will still lock and block; programmers are still on the hook to get in and get out. The idea is that these restrictions are worth the programmer productivity they buy, and any bottlenecks that do arise can be dealt with on a case-by-case basis. From the talk I get the feeling that, over time, specialized application domains like pub-sub will be brought within Spanner’s domain. While the transaction side may be conventional, except for all the global repartitioning magic happening transparently under the covers, their timestamp approach to transactions does have a lot of cool capabilities on the read path.
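To make that read path concrete: the Spanner paper describes TrueTime, a clock API that returns an uncertainty interval rather than a single instant, and a "commit wait" that holds locks until the chosen commit timestamp is safely in the past on every clock. Here is a minimal toy sketch of those two ideas in Python; every name in it (the TrueTime class, EPSILON, commit, snapshot_read, the replica methods) is illustrative, not Spanner's actual interface.

```python
import time

EPSILON = 0.007  # assumed ~7 ms worst-case clock uncertainty (illustrative)

class TrueTime:
    """Toy stand-in for TrueTime: now() returns an (earliest, latest)
    interval guaranteed to contain the true current time."""
    def now(self):
        t = time.time()
        return (t - EPSILON, t + EPSILON)

tt = TrueTime()

def commit(apply_writes):
    """Apply a transaction's writes at a chosen timestamp, then commit-wait."""
    # Pick a timestamp no clock in the system can have reached yet.
    commit_ts = tt.now()[1]
    apply_writes(commit_ts)
    # Commit wait: hold locks until commit_ts is guaranteed to be in the
    # past on every clock, so timestamp order matches real-time order.
    while tt.now()[0] <= commit_ts:
        time.sleep(EPSILON / 2)
    return commit_ts

def snapshot_read(replica, read_ts):
    """Lock-free read: any replica caught up past read_ts can serve it.
    replica.safe_time and replica.state_at are hypothetical methods."""
    assert replica.safe_time >= read_ts
    return replica.state_at(read_ts)
```

The price of this design is that every read-write transaction waits out the clock uncertainty before acknowledging, which is why Google invests in atomic clocks and GPS receivers to keep that uncertainty small; the payoff is that consistent reads at a timestamp need no locks at all.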

As an illustration of the difficulties of scaling to a large number of replicas per Paxos group, Alex turned to a hydrology metaphor:

You could use a Spanner partition as a strongly ordered pub-sub scheme, where you have read-only replicas of some partition all over the place and you are trying to use it to distribute some data in an ordered way to a lot of different datacenters. This creates different challenges. What if you are out of bandwidth to some subset of those datacenters? You don’t want data buffered in the leader too long. If you spill it to disk, you don’t want to incur the seek penalty when bandwidth becomes available. It becomes like hydrology. You have all this data going to different places at different times, and you want to keep all the flows moving smoothly under changing conditions. Smoothly means fewer server restarts, means better latency tail, means better programming model.

This was perhaps my favorite part of the talk. I just love the image of data flowing like water drops through millions of machines and networks, temporarily pooling in caverns of memory and disk, always splitting, always recombining, always in flux, always making progress, part of a great always flowing data cycle that never loses a drop. Just wonderful.
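The engineering under that metaphor is ordinary flow control: bound what the leader buffers in memory per slow replica, spill the overflow to a sequential log, and drain in order when bandwidth returns. Here is a minimal sketch of that buffering discipline; ReplicaStream, MEM_LIMIT, publish, and drain are hypothetical names of my own, not anything from Spanner.

```python
import collections
import pickle
import tempfile

class ReplicaStream:
    """Per-replica outbound buffer: bounded memory, ordered spill to disk."""
    MEM_LIMIT = 1024  # max in-memory entries (illustrative)

    def __init__(self):
        self.mem = collections.deque()
        self.spill = tempfile.TemporaryFile()  # append-only spill log
        self.read_pos = 0   # next byte to replay from the spill log
        self.spilled = 0    # entries currently sitting on disk

    def publish(self, entry):
        # Once we start spilling, everything new must also spill,
        # or delivery order would break on drain.
        if self.spilled or len(self.mem) >= self.MEM_LIMIT:
            self.spill.seek(0, 2)           # jump to end of log to append
            pickle.dump(entry, self.spill)
            self.spilled += 1
        else:
            self.mem.append(entry)

    def drain(self, n):
        """Called when bandwidth to the replica frees up: memory holds the
        oldest entries, then the spill log replays sequentially."""
        out = []
        while n > 0 and self.mem:
            out.append(self.mem.popleft())
            n -= 1
        while n > 0 and self.spilled:
            self.spill.seek(self.read_pos)
            out.append(pickle.load(self.spill))
            self.read_pos = self.spill.tell()
            self.spilled -= 1
            n -= 1
        return out
```

Keeping the spill file append-only and replaying it from a single advancing read position is what avoids the seek penalty the quote worries about.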

If you have an opportunity to watch the video I highly recommend that you do; it is really good, with very little fluff at all. The section on the use of clocks in distributed transactions is particularly well done. But, in case you are short on time, here’s a gloss of the talk:

Read Path

What are they doing now?

Q&A

