Entries in facebook (29)

Tuesday
Sep 22, 2020

Snakes in a Facebook Datacenter

 

What do you do when you find a snake in your datacenter? You might say this. (NSFW)

 

Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of some of its components. You might think Facebook solved all of its fault tolerance problems long ago, but when a serpent enters the Edenic datacenter realm, even Facebook must consult the Tree of Knowledge.

In this case, it's not good or evil we'll learn about, but Workload Placement, which is a method of optimally placing work over a set of failure domains. 
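The core idea is simple enough to sketch. Here's a minimal illustration of spread placement (my own toy example, not from the talk): greedily assign each replica to the failure domain with the fewest replicas so far, so that losing any single domain takes out as little capacity as possible.

    from collections import Counter

    def place_replicas(num_replicas, domains):
        """Greedy spread placement: each replica lands in the failure
        domain currently holding the fewest replicas."""
        load = Counter({d: 0 for d in domains})
        for _ in range(num_replicas):
            target = min(domains, key=lambda d: load[d])
            load[target] += 1
        return load

    # 10 replicas spread over 4 failure domains (racks, power domains, regions...)
    load = place_replicas(10, ["dom-a", "dom-b", "dom-c", "dom-d"])
    print(load)                # balanced 3, 3, 2, 2 spread
    print(max(load.values()))  # worst case: one domain failure costs 3 of 10 replicas

The real problem is much harder (capacity constraints, correlated failures, live migration), but the objective is the same: bound what any single failure can take away.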

Here's my gloss of the Fault Tolerance through Optimal Workload Placement talk:

Click to read more ...

Monday
Jan 23, 2012

Facebook Timeline: Brought to You by the Power of Denormalization

Facebook Timeline is audacious in scope. It wants to compile a complete scrollable version of your life story from photos, locations, videos, status updates, and everything you do. That could be many decades of data (hopefully) that must be stored and made quickly available at any point in time. A huge technical challenge, even for Facebook, which we know is expert in handling big data. And they built it all in 6 months.

Facebook's Ryan Mack shares quite a bit of Timeline's own implementation story in his excellent article: Building Timeline: Scaling up to hold your life story
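To get a flavor of what denormalization buys, here's a toy sketch (my illustration, not Mack's actual schema): instead of joining activity, photo, and location tables at render time, each story is stored as one row carrying everything needed to display it, keyed by owner and time.

    import json, sqlite3

    db = sqlite3.connect(":memory:")
    # Denormalized: one render-ready row per story, keyed by (owner, time).
    db.execute("""CREATE TABLE timeline (
        owner_id   INTEGER,
        story_time INTEGER,
        payload    TEXT,      -- everything needed to render the story
        PRIMARY KEY (owner_id, story_time))""")

    story = {"type": "photo", "caption": "Graduation day", "thumb": "..."}
    db.execute("INSERT INTO timeline VALUES (?, ?, ?)",
               (42, 1325635200, json.dumps(story)))

    # Rendering a slice of a life is now a single range scan, no joins.
    rows = db.execute("""SELECT payload FROM timeline
                         WHERE owner_id = ? AND story_time <= ?
                         ORDER BY story_time DESC LIMIT 50""", (42, 1325635200))
    for (payload,) in rows:
        print(json.loads(payload)["caption"])

The trade-off is the classic one: writes fan out and data is duplicated, but reads, which dominate a timeline workload, become cheap range scans.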

Five big takeaways from the article are:

Click to read more ...

Wednesday
Jan 4, 2012

How Facebook Handled the New Year's Eve Onslaught

How does Facebook handle the massive New Year's Eve traffic spike? Thanks to Mike Swift, in Facebook gets ready for New Year's Eve, we get a little insight into their method for the madness. Nothing really detailed, but still interesting.

Problem Setup

Click to read more ...

Friday
Sep 23, 2011

The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…

It’s how much load that really generates and how it scales to meet the challenge.

There's some debate over whether Facebook really crossed the one trillion page views per month threshold. While one report says it did, another respected firm says it did not, putting the figure at a mere 467 billion page views per month.

In the big scheme of things, the discrepancy is somewhat irrelevant, as neither number shows the true load on Facebook's infrastructure, which is a far more impressive set of numbers than the externally measured "page view" metric.
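A quick back-of-envelope shows why. Even the lower estimate is a huge sustained rate, and every page view fans out into many internal requests; the 50x multiplier below is purely my illustrative guess, not a Facebook number.

    SECONDS_PER_MONTH = 30 * 24 * 3600   # ~2.6 million

    for label, pages in [("1 trillion", 10**12), ("467 billion", 467 * 10**9)]:
        pages_per_sec = pages / SECONDS_PER_MONTH
        # Assume ~50 internal requests (cache hits, DB queries, service
        # calls) behind each page view. Illustrative guess only.
        backend_per_sec = pages_per_sec * 50
        print(f"{label}: {pages_per_sec:,.0f} pages/s, "
              f"~{backend_per_sec:,.0f} internal requests/s")

    # 1 trillion: 385,802 pages/s, ~19,290,123 internal requests/s
    # 467 billion: 180,170 pages/s, ~9,008,488 internal requests/s

Either way, the externally measured page view is only the tip of the internal request iceberg.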

Click to read more ...

Tuesday
May 17, 2011

Facebook: An Example Canonical Architecture for Scaling Billions of Messages

What should the architecture of your scalable, real-time, highly available service look like? There are as many options as there are developers, but if you are looking for a general template, this architecture as described by Prashant Malik, Facebook's lead for the Messages back end team, in Scaling the Messages Application Back End, is a very good example to consider. 
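As a rough sketch of the shape such a template takes (my generic rendering of the pattern, not Malik's actual design): partition users into cells, keep a small discovery layer that maps a user to their cell, and let each cell own its users' messages end to end.

    import hashlib

    CELLS = ["cell-1", "cell-2", "cell-3"]   # each cell: app servers plus their own store

    def cell_for_user(user_id: str) -> str:
        """Discovery layer: map a user to a cell. A real system would use
        a directory service here so users can be rebalanced or moved."""
        digest = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
        return CELLS[digest % len(CELLS)]

    def deliver(sender: str, recipient: str, body: str) -> None:
        # All of a recipient's messages land in one cell, so reads,
        # writes, and failures stay local to that cell.
        print(f"route {sender} -> {recipient} via {cell_for_user(recipient)}")

    deliver("alice", "bob", "hi")   # prints the cell bob's messages route to

Cells bound the blast radius of any failure and let the system grow by adding cells rather than re-architecting.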

Because Messages is tasked with handling 135+ billion messages a month, from email, IM, SMS, text messages, and Facebook messages, you may think this is an example of BigArchitecture that doesn't apply to smaller sites. Not so. It's a good, well-thought-out example of a non-cloud architecture exhibiting many qualities any mom would be proud of:

Click to read more ...

Tuesday
Mar 22, 2011

Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day

Facebook did it again. They've built another system capable of doing something useful with ginormous streams of realtime data. Last time we saw Facebook release their New Real-Time Messaging System: HBase To Store 135+ Billion Messages A Month. This time it's a realtime analytics system handling over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds.

Alex Himel, Engineering Manager at Facebook, explains what they've built (video) and the scale required:

Social plugins have become an important and growing source of traffic for millions of websites over the past year. We released a new version of Insights for Websites last week to give site owners better analytics on how people interact with their content and to help them optimize their websites in real time. To accomplish this, we had to engineer a system that could process over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds. 
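Before digging in, the shape of the solution is worth sketching: at 200,000 events per second you can't write one row per event, so you pre-aggregate the stream into time-bucketed counters in memory and flush batched increments every few seconds, which keeps the lag well under 30 seconds. A toy version (my sketch of the general technique, not Facebook's actual pipeline):

    import time
    from collections import Counter

    BUCKET_SECS = 60       # count per (domain, metric, minute)
    pending = Counter()    # in-memory pre-aggregation

    def record(domain: str, metric: str, ts: float) -> None:
        # 200k events/s collapse into a handful of distinct counter keys.
        bucket = int(ts // BUCKET_SECS) * BUCKET_SECS
        pending[(domain, metric, bucket)] += 1

    def flush(store: dict) -> None:
        # One batched increment per key instead of one write per event.
        for key, n in pending.items():
            store[key] = store.get(key, 0) + n
        pending.clear()

    store = {}
    for _ in range(10_000):                 # simulate a burst of "like" events
        record("example.com", "like", time.time())
    flush(store)                            # run every few seconds in practice
    print(store)                            # e.g. {('example.com', 'like', <minute>): 10000}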

Alex does an excellent job with the presentation. Highly recommended. But let's take a little deeper look at what's going on...

Click to read more ...

Friday
Dec 31, 2010

Facebook in 20 Minutes: 2.7M Photos, 10.2M Comments, 4.6M Messages

To celebrate the new year Facebook has shared the results of a little end-of-year introspection. It has been a fecund year for Facebook:

  • 43,869,800 changed their status to single
  • 3,025,791 changed their status to "it's complicated"
  • 28,460,516 changed their status to in a relationship
  • 5,974,574 changed their status to engaged
  • 36,774,801 changed their status to married

If these numbers are simply too large to grasp, it doesn't get any better when you look at what happens in a mere 20 minutes.
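As a warm-up, the 20-minute figures in the title are easier to feel as per-second rates; this is pure arithmetic on the title's numbers:

    WINDOW_SECS = 20 * 60   # seconds in 20 minutes

    for label, count in [("photos uploaded", 2_700_000),
                         ("comments posted", 10_200_000),
                         ("messages sent", 4_600_000)]:
        print(f"{label}: {count / WINDOW_SECS:,.0f} per second")

    # photos uploaded: 2,250 per second
    # comments posted: 8,500 per second
    # messages sent: 3,833 per second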

Click to read more ...

Tuesday
Nov 16, 2010

Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month

You may have read somewhere that Facebook has introduced a new Social Inbox integrating email, IM, SMS, text messages, and on-site Facebook messages. All in all, they need to store over 135 billion messages a month. Where do they store all that stuff? Facebook's Kannan Muthukkaruppan gives the surprise answer in The Underlying Technology of Messages: HBase. HBase beat out MySQL, Cassandra, and a few others.

Why a surprise? Facebook created Cassandra and it was purpose-built for an inbox-type application, but they found Cassandra's eventual consistency model wasn't a good match for their new real-time Messages product. Facebook also has an extensive MySQL infrastructure, but they found performance suffered as data sets and indexes grew larger. And they could have built their own, but they chose HBase.

HBase is a scale-out table store supporting very high rates of row-level updates over massive amounts of data. Exactly what is needed for a Messaging system. HBase is also a column-based key-value store built on the BigTable model. It's good at fetching rows by key or scanning ranges of rows and filtering. Also what is needed for a Messaging system. Complex queries are not supported, however. Queries are generally given over to an analytics tool like Hive, which Facebook created to make sense of their multi-petabyte data warehouse. Hive is based on Hadoop's file system, HDFS, which is also used by HBase.

Facebook chose HBase because they monitored their usage and figured out what they really needed. What they needed was a system that could handle two types of data patterns:

  1. A short set of temporal data that tends to be volatile
  2. An ever-growing set of data that rarely gets accessed

Makes sense. You read what's current in your inbox once and then rarely if ever take a look at it again. These are so different one might expect two different systems to be used, but apparently HBase works well enough for both.
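One classic HBase idiom fits this pattern nicely: design row keys so a user's newest messages sort first, making the hot "recent inbox" read a short prefix scan while the cold archive sits adjacent but untouched. Here's a sketch of that key design (an illustration of the common idiom, not Facebook's actual schema), with a plain sorted dict standing in for HBase's lexicographically ordered table:

    import time

    MAX_TS = 2**63 - 1

    def row_key(user_id: int, ts_millis: int) -> str:
        # Reversed timestamp: newer messages sort FIRST under the user's
        # prefix, so "fetch my recent inbox" is a short scan from the top.
        return f"{user_id:020d}:{MAX_TS - ts_millis:020d}"

    table = {}   # stand-in for HBase's sorted row space
    now = int(time.time() * 1000)
    for age_secs, body in [(2, "oldest"), (1, "older"), (0, "newest")]:
        table[row_key(42, now - age_secs * 1000)] = body

    prefix = f"{42:020d}:"
    recent = [v for k, v in sorted(table.items()) if k.startswith(prefix)][:2]
    print(recent)   # ['newest', 'older'], the cold rows are never touched

With keys like this, the volatile recent set and the rarely read archive live in one table; the scan simply stops early.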

Some key aspects of their system:

Click to read more ...

Tuesday
Nov 9, 2010

Facebook Uses Non-Stored Procedures to Update Social Graphs

Facebook's Ryan Mack gave a MySQL Tech Talk where he talked about using what he called Non-stored Procedures for adding edges to Facebook's social graph. The question is: how can edges quickly be added to the social graph? The answer is ultimately one of deciding where logic should be executed, especially when locks are kept open during network hops.
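The trade-off is easy to see in miniature. Hold a transaction open across client round trips and row locks sit idle for entire network RTTs; ship the logic to the server in one batch and locks live only as long as the actual work. A schematic contrast (my illustration of the idea; the table and column names are made up):

    # Approach 1: logic on the client. Two statements, two network round
    # trips, with row locks held across the wire in between.
    CLIENT_SIDE = [
        "BEGIN",
        "SELECT next_pos FROM graph_meta WHERE src_id = 42 FOR UPDATE",   # trip 1
        # ... client computes the new edge's position ...
        "INSERT INTO edges (src_id, dst_id, pos) VALUES (42, 99, 7)",     # trip 2
        "COMMIT",
    ]

    # Approach 2: a "non-stored procedure". The same logic is shipped to
    # the server as one batch, so locks are held for microseconds of
    # server-side work rather than milliseconds of network latency.
    SERVER_SIDE = """
    BEGIN;
    INSERT INTO edges (src_id, dst_id, pos)
        SELECT 42, 99, next_pos FROM graph_meta WHERE src_id = 42;
    UPDATE graph_meta SET next_pos = next_pos + 1 WHERE src_id = 42;
    COMMIT;
    """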

Ryan explained that a key element of the Facebook data model is the connections between people, the things they've liked, and the places they've checked in. A lot of their writes are adding edges to the social graph.

Currently this is a two step process, run inside a transaction:

Click to read more ...

Thursday
Nov 4, 2010

Facebook at 13 Million Queries Per Second Recommends: Minimize Request Variance

Facebook gave a MySQL Tech Talk where they talked about many things MySQL, but one of the more subtle and interesting points was their focus on controlling the variance of request response times and not just worrying about maximizing queries per second.

But first, the scalability porn. Facebook's OLTP performance numbers were, as usual, quite dramatic:

  • Query response times: 4ms reads, 5ms writes
  • Rows read per second: 450M peak
  • Network bytes per second: 38GB peak
  • Queries per second: 13M peak
  • Rows changed per second: 3.5M peak
  • InnoDB disk ops per second: 5.2M peak
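
Why obsess over variance rather than raw throughput? Because a page fans out to many queries and waits for the slowest one, so even a rare slow query becomes a common slow page. A two-line illustration:

    # If 1% of queries are slow and a page fans out to N independent
    # queries, the odds that the page hits at least one slow query
    # grow quickly with N.
    p_fast = 0.99
    for n in (1, 10, 100):
        print(f"fan-out {n:>3}: P(at least one slow query) = {1 - p_fast**n:.1%}")

    # fan-out   1: P(at least one slow query) = 1.0%
    # fan-out  10: P(at least one slow query) = 9.6%
    # fan-out 100: P(at least one slow query) = 63.4%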

Some thoughts on creating quality, not quantity:

Click to read more ...