Entries in facebook (29)

Thursday
Sep302010

Facebook and Site Failures Caused by Complex, Weakly Interacting, Layered Systems

Facebook has been so reliable that when a site outage does occur it's a definite learning opportunity. Fortunately for us we can learn something because in More Details on Today's Outage, Facebook's Robert Johnson gave a pretty candid explanation of what caused a rare 2.5 hour period of down time for Facebook. It wasn't a simple problem. The root causes were feedback loops and transient spikes caused ultimately by the complexity of weakly interacting layers in modern systems. You know, the kind everyone is building these days. Problems like this are notoriously hard to fix and finding a real solution may send Facebook back to the whiteboard. There's a technical debt that must be paid. 

The outline and my interpretation (reading between the lines) of what happened is:

Click to read more ...

Tuesday
Jun222010

Exploring the software behind Facebook, the world’s largest site

Peter Alguacil at Pingdom wrote a HighScalability worthy article on Facebook's architecture: Exploring the software behind Facebook, the world’s largest site. It covers the challenges Facebook faces, the software Facebook uses, and the techniques Facebook uses to keep on scaling. Definitely worth a look.

 

Thursday
Jun102010

The Four Meta Secrets of Scaling at Facebook

Aditya Agarwal, Director of Engineering at Facebook, gave an excellent Scale at Facebook talk that covers their architecture, but the talk is really more about how to scale an organization by preserving the best parts of its culture. The key take home of the talk is: 

You can get the code right, you can get the products right, but you need to get the culture right first. If you don't get the culture right then your company won't scale.

This leads into the four meta secrets of scaling at Facebook:

  1. Scaling takes Iteration
  2. Don't Over Design
  3. Choose the right tool for the job, but realize that your choice comes with overhead.
  4. Get the culture right. Move Fast - break things. Huge Impact - small teams. Be bold - innovate.

Click to read more ...

Monday
Feb082010

How FarmVille Scales to Harvest 75 Million Players a Month

Several readers had follow-up questions in response to this article. Luke's responses can be found in How FarmVille Scales - The Follow-up.

If real farming was as comforting as it is in Zynga's mega-hit Farmville then my family would have probably never left those harsh North Dakota winters. None of the scary bedtime stories my Grandma used to tell about farming are true in FarmVille. Farmers make money, plants grow, and animals never visit the red barn. I guess it's just that keep-your-shoes-clean back-to-the-land charm that has helped make FarmVille the "largest game in the world" in such an astonishingly short time.

How did FarmVille scale a web application to handle 75 million players a month? Fortunately FarmVille's Luke Rajlich has agreed to let us in on a few their challenges and secrets. Here's what Luke has to say...

Click to read more ...

Friday
Jan222010

How BuddyPoke Scales on Facebook Using Google App Engine

How do you scale a viral Facebook app that has skyrocketed to a mind boggling 65 million installs (the population of France)? That's the fortunate problem BuddyPoke co-founder Dave Westwood has and he talked about his solution at Wednesday's Facebook Meetup. Slides for the complete talk are here. For those not quite sure what BuddyPoke is, it's a social network application that lets users show their mood, hug, kiss, and poke their friends through on-line avatars.

In many ways BuddyPoke is the quintessentially modern web application. It thrives off the energy of social network driven ecosystems. Game play mechanics, viral loops, and creative monetization strategies are all part of if its everyday conceptualization. It mashes together different technologies, not in a dark Frankensteining sort of way, but in a smart way that gets the most bang for the buck. Part of it runs on Facebook servers (free). Part of it runs on flash in a browser (free). Part of it runs on a storage cloud (higher cost). And part of runs on a Platform as a Service environment (that's GAE) (low cost). It also integrates tightly with other services like PayPal (a slice). Real $$$ are made selling virtual goods like gold coins redeemable in pokes. User's can also have their avatars made into dolls, t-shirts, and a whole army of other Zazzle powered gifts.

Click to read more ...

Monday
Oct262009

Facebook's Memcached Multiget Hole: More machines != More Capacity 

When you are on the bleeding edge of scale like Facebook is, you run into some interesting problems. As of 2008 Facebook had over 800 memcached servers supplying over 28 terabytes of cache. With those staggering numbers it's a fair bet to think they've seen their share of Dr. House worthy memcached problems.

Jeff Rothschild, Vice President of Technology at Facebook, describes one such problem they've dubbed the Multiget Hole.

You fall into the multiget hole when memcached servers are CPU bound, adding more memcached servers seems like the right way to add more capacity so more requests can be served, but against all logic adding servers doesn't help serve more requests. This puts you in a hole that adding more servers can't dig you out of. What's the treatment?

Click to read more ...

Tuesday
Oct132009

Why are Facebook, Digg, and Twitter so hard to scale?

Real-time social graphs (connectivity between people, places, and things). That's why scaling Facebook is hard says Jeff Rothschild, Vice President of Technology at Facebook. Social networking sites like Facebook, Digg, and Twitter are simply harder than traditional websites to scale. Why is that? Why would social networking sites be any more difficult to scale than traditional web sites? Let's find out.

Traditional websites are easier to scale than social networking sites for two reasons:

Click to read more ...

Thursday
Jul022009

Product: Facebook's Cassandra - A Massive Distributed Store

Update 2: Presentation from the NoSQL conference: slides, video.
Update: Why you won't be building your killer app on a distributed hash table by Jonathan Ellis. Why I think Cassandra is the most promising of the open-source distributed databases --you get a relatively rich data model and a distribution model that supports efficient range queries. These are not things that can be grafted on top of a simpler DHT foundation, so Cassandra will be useful for a wider variety of applications.

James Hamilton has published a thorough summary of Facebook's Cassandra, another scalable key-value store for your perusal. It's open source and is described as a "BigTable data model running on a Dynamo-like infrastructure." Cassandra is used in Facebook as an email search system containing 25TB and over 100m mailboxes.

  • Google Code for Cassandra - A Structured Storage System on a P2P Network
  • SIGMOD 2008 Presentation.
  • Video Presentation at Facebook
  • Facebook Engineering Blog for Cassandra
  • Anti-RDBMS: A list of distributed key-value stores
  • Facebook Cassandra Architecture and Design by James Hamilton
  • Wednesday
    Jun102009

    Hive - A Petabyte Scale Data Warehouse using Hadoop

    This post about using Hive and Hadoop for analytics comes straight from Facebook engineers.

    Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics.

    These products range from simple reporting applications like Insights for the Facebook Ad Network, to more advanced kind such as Facebook's Lexicon product.

    As a result a flexible infrastructure that caters to the needs of these diverse applications and users and that also scales up in a cost effective manner with the ever increasing amounts of data being generated on Facebook, is critical. Hive and Hadoop are the technologies that we have used to address these requirements at Facebook.

    Read the rest of the article on Engineering @ Facebook's Notes page

    Monday
    May112009

    Facebook, Hadoop, and Hive

    Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data), Yahoo being the first.

    Learn how they do it and what are the challenges on DBMS2 blog, which is a blog for people who care about database and analytic technologies.