Snakes in a Facebook Datacenter
Tuesday, September 22, 2020 at 10:10AM
HighScalability Team in facebook

 

What do you do when you find a snake in your datacenter? You might say this. (NSFW) 

Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of some of its components. You might think Facebook solved all of its fault tolerance problems long ago, but when a serpent enters the Edenic datacenter realm, even Facebook must consult the Tree of Knowledge.

In this case, it's not good or evil we'll learn about, but Workload Placement, which is a method of optimally placing work over a set of failure domains. 

Here's my gloss of the Fault Tolerance through Optimal Workload Placement talk:

Related Articles

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.