Netflix: Developing, Deploying, and Supporting Software According to the Way of the Cloud
Monday, December 12, 2011 at 9:05AM
HighScalability Team in Strategy, netflix

At a Cloud Computing Meetup, Siddharth "Sid" Anand of Netflix, backed by a merry band of Netflixians, gave an interesting talk: Keeping Movies Running Amid Thunderstorms. While the talk gave a good overview of their move to the cloud, issues with capacity planning, thundering herds, latency problems, and simian armageddon, I found myself most taken with how they handle software deployment in the cloud.

I've worked on half a dozen or more build and deployment systems, some small, some quite large, but never for a large organization like Netflix in the cloud. The cloud has this amazing capability that has never existed before that enables a novel approach to fault-tolerant software deployments: the ability to spin up huge numbers of instances to completely run a new release while running the old release at the same time.

The process goes something like: 

Previously:

The cloud makes both of these concerns old school. The elasticity of the cloud (blah blah) makes spinning up enough instances for a hosting a new release both affordable and easy. The load balancer infrastructure in the cloud both has an API and is usually under programmer control, so programmers can mess everything up with abandon.

You might at this point raise your hand and say: what about that database? Schema migration issues are always a pain, so this approach does not apply to the database. The database service layer still uses a rolling upgrade path, but it is hidden behind an API and load balancer so the process is somewhat less painful to clients.

How Netflix's Team Organization Implies a Software Architecture and Policies

Netflix has some attributes about their software team infrastructure and software architecture that makes this approach a good fit for them:

Use Load Balancers for Isolation

Their architecture is divided into layers: an Internet facing Edge layer; Middle layer serving the edge layer; Backend layer. All this rests on Amazon services. Each layer is fronted by a load balancer and each layer has the ultimate goal of being auto scaling. 

The load balancer allows the isolation of components. It's possible to spin up a parallel cluster and route requests behind the load balancer without any other service layer having an idea that it is being done.

For the Internet facing services they use Amazon's load balancers because they have public IP addresses. For the other layers they use their own service discovery service which handles load balancing internally.

 

Function follows form. We conform to and innovate on the capabilities of the underlying system and our forms eventually come to reflect that function. It will be curious to see how experience in the cloud will cause changes in long held development practices. That's the Way of the Cloud.

Related Articles

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.