Paper: Netflix’s Transition to High-Availability Storage Systems
In an audacious move for such an established property, Netflix is moving their website out of the comfort of their own datacenter and into the wilds of the Amazon cloud. This paper by Netflix's Siddharth “Sid” Anand, Netflix’s Transition to High-Availability Storage Systems, gives a detailed look at this transition and does a deep dive on SimpleDB best practices, focusing especially on techniques useful to those who are making the move from an RDBMS.
Sid is going to give a talk at QCon based on this paper and he would appreciate your feedback. So if you have any comments or thoughts, please comment here, email Sid at r39132@hotmail.com, or reach him on Twitter at @r39132. Here's the introduction from the paper:
Circa late 2008, Netflix had a single data center. This single data center raised a few concerns. As a single-point-of-failure (a.k.a. SPOF), it represented a liability – data center outages meant interruptions to service and negative customer impact. Additionally, with growth in both streaming adoption and subscription levels, Netflix would soon outgrow this data center -- we foresaw an immediate need for more power, better cooling, more space, and more hardware.
One option was to build more data centers. Aside from high upfront costs, this endeavor would likely tie up key engineering resources in data center scale out activities, making them unavailable for new product initiatives. Additionally, we recognized the management of multiple data centers to be a complex task. Building out and managing multiple data centers seemed a risky distraction.
Rather than embarking on this path, we chose a more radical one. We decided to leverage one of the leading IAAS (a.k.a. Infrastructure-As-A-Service) offerings at the time, Amazon Web Services (a.k.a. AWS). With multiple, large data centers already in operation and multiple levels of redundancy in various web services (e.g. S3 and SimpleDB), AWS promised better availability and scalability in a relatively short amount of time.
By migrating various network and back-end operations to a 3rd party cloud provider, Netflix chose to focus on its core competency: to deliver movies and TV shows.
Here are some of the questions I had for Sid, accompanied by Sid's responses:
Q. The bidirectional syncing between Oracle and the cloud is an
interesting transition strategy, given all the work involved. Though
it seems logical that a big bang move wouldn't be a good idea and that
many of your systems will stay in the datacenter for a while. How are
you handling consistency between the two silos? It seems like they
could easily get out of sync.
A. Currently, most of our data syncing use-cases are uni-directional. The few that are bidirectional are done on a best-effort basis and use a fixer job. I have an idea of how to implement a bi-directional eventually consistent replication scheme -- the scheme leverages SimpleDB's Consistent Put/Delete API (i.e. SimpleDB's Optimistic Concurrency Control), but I haven't had the time. So, I just do best effort currently.
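To make the optimistic concurrency idea in Sid's answer concrete, here is a minimal sketch of a conditional put guarded by a version number. It uses a hypothetical in-memory store, not the SimpleDB API; the names InMemoryStore, conditional_put, and update_with_retry are assumptions made for illustration, and this is not the replication scheme Sid has in mind.

```python
class ConditionFailed(Exception):
    """Raised when the expected version does not match the stored one."""


class InMemoryStore:
    """Hypothetical stand-in for a store with conditional puts (not SimpleDB)."""

    def __init__(self):
        self._items = {}  # key -> (version, value)

    def get(self, key):
        # A missing key reads as version 0 with no value.
        return self._items.get(key, (0, None))

    def conditional_put(self, key, value, expected_version):
        # Write only if nobody else has written since we read.
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise ConditionFailed(key)
        self._items[key] = (current_version + 1, value)
        return current_version + 1


def update_with_retry(store, key, transform, max_attempts=5):
    """Read, transform, and write back; retry if another writer got there first."""
    for _ in range(max_attempts):
        version, value = store.get(key)
        try:
            return store.conditional_put(key, transform(value), version)
        except ConditionFailed:
            continue  # stale read: re-read and try again
    raise RuntimeError(f"gave up on {key!r} after {max_attempts} attempts")
```

A writer that loses the race sees ConditionFailed instead of silently overwriting the other side's change, which is the property a two-way sync would rely on.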
Q. Could you go into a little bit of detail why Oracle is being
dropped? That part of the video was hard to hear so I didn't really
get the key reasons. It's a lot of work to switch over.
A. Oracle does not have a simple, self-managed solution for a distributed, replicated database that works in AWS. We could run Oracle on an EC2 instance and associate an EBS volume with that instance to store the Oracle database files. However, if that instance or the availability zone were to either go down or experience networking issues, we would need to manually intervene. That's not practical. Also, at this juncture in our growth, we were ready to forego strong consistency in favor of high availability (i.e. loosely-speaking, we were willing to trade off C for AP in the CAP theorem parlance). In other words, we need availability and are ok with some low level of inconsistencies -- especially if we can fix those inconsistencies via read repair or through a fixer job.
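Read repair, mentioned in the answer above, can be sketched roughly as follows. This assumes each replica is a plain dict mapping a key to a (timestamp, value) pair and that the newest timestamp wins; both are assumptions for illustration, since the interview does not say how Netflix resolves conflicts.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and copy it to stale replicas.

    replicas: list of dicts mapping key -> (timestamp, value).
    Conflict resolution here is last-write-wins by timestamp (an assumption).
    """
    entries = [replica[key] for replica in replicas if key in replica]
    if not entries:
        return None
    newest = max(entries, key=lambda entry: entry[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]
```

A fixer job would apply the same comparison in bulk, scanning keys in the background instead of waiting for a read to touch them.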
Q. How are you handling consistency and denormalization? At the end
you say maybe something like RDS would be needed for complex
relationships and you also say there should be no relationships
between domains. Were you really able to do this? It seems hard to
believe there were no many-many relationships. If there were, how do
you handle consistency? This seems to be a major issue all around. How
to keep relationships consistent and how to keep the different
databases consistent with each other.
A. I was worried about these things at the outset. It turned out that most of our data is looked up either by movie or by user. This is essentially a key or secondary index lookup, both of which translate well to key-value stores like SimpleDB and S3. Some of our data sets do have complex relationships. We either use our own custom MySQL solution for those cases or ferry the query back to our data center today. We are looking more deeply into moving this data to the cloud. We now have additional options in RDS or other technologies.
Q. Now that you've explored your application domain and at least one
NoSQL product in detail, do you have any thoughts on what your ideal
product would look like? What products are you looking at and why?
SimpleDB was certainly the best choice when you started, but there are
a lot more options now.
A. Simple key-value lookups can be handled by a key-value store, though not all key-value stores are the same. Currently, we use S3 and SimpleDB because they each offer different trade-offs. Both, however, suffer from performance (i.e. response time) issues, so we are investigating Cassandra. However, Cassandra has its own trade-offs. We are still in the early stages of evaluating it. Hence, we might use a mix of key-value store technologies in the foreseeable future depending on the fine-granularity needs of our use-cases. For more complex relational use-cases, we will look into RDS, as it becomes more feature rich.
Reader Comments (3)
Todd, in your Q/A you are talking about some video Sid did? can you link to it?
That linked PDF is absolutely awesome and really gave me an insight to help with my upcoming projects.
Thank you.
It was actually a talk at the Cloud Computing Meetup that I was going to post on later. Here's the link: http://blip.tv/file/4252897#
About code swap, scala/jvm has OSGI which is much better in my opinion.