
Tuesday
Jun 22, 2010

Exploring the software behind Facebook, the world’s largest site

Peter Alguacil at Pingdom wrote a HighScalability-worthy article on Facebook's architecture: Exploring the software behind Facebook, the world's largest site. It covers the challenges Facebook faces, the software Facebook uses, and the techniques Facebook uses to keep on scaling. Definitely worth a look.


Monday
May 17, 2010

7 Lessons Learned While Building Reddit to 270 Million Page Views a Month

Steve Huffman, co-founder of social news site Reddit, gave an excellent presentation (slides, transcript) on the lessons he learned while building and growing Reddit to 7.5 million users per month, 270 million page views per month, and 20+ database servers.

Steve says a lot of the lessons were really obvious, so you may not find a lot of completely new ideas in the presentation. But Steve has an earnestness and genuineness about him that is so obviously grounded in experience that you can't help but think deeply about what you could be doing differently. And if Steve didn't know about these lessons, I'm betting others don't either.

There are seven lessons, each with its own summary section: Lesson 1: Crash Often; Lesson 2: Separation of Services; Lesson 3: Open Schema; Lesson 4: Keep it Stateless; Lesson 5: Memcache; Lesson 6: Store Redundant Data; Lesson 7: Work Offline.
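Lesson 3: Open Schema is the one that translates most directly into code. As Steve describes it, Reddit keeps a generic "thing" table plus a table of key/value attributes, so adding a new feature never requires a schema migration. Here is a minimal sketch of that entity/attribute idea using SQLite; the table layout and column names are simplified illustrations, not Reddit's actual DDL:

```python
import sqlite3

# A toy "open schema": one table of generic things, one table of key/value
# attributes. New features mean new attribute rows, not ALTER TABLE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (
    id    INTEGER PRIMARY KEY,
    type  TEXT NOT NULL,        -- 'link', 'comment', 'account', ...
    ups   INTEGER DEFAULT 0,
    downs INTEGER DEFAULT 0
);
CREATE TABLE data (
    thing_id INTEGER REFERENCES thing(id),
    key      TEXT NOT NULL,
    value    TEXT,
    PRIMARY KEY (thing_id, key)
);
""")

def create_thing(thing_type, **attrs):
    cur = conn.execute("INSERT INTO thing (type) VALUES (?)", (thing_type,))
    thing_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO data (thing_id, key, value) VALUES (?, ?, ?)",
        [(thing_id, k, str(v)) for k, v in attrs.items()],
    )
    return thing_id

# A brand new "sponsored" flag needs no migration, just another data row.
link_id = create_thing("link", title="Lessons from Reddit",
                       url="http://example.com", sponsored=False)
print(dict(conn.execute("SELECT key, value FROM data WHERE thing_id = ?",
                        (link_id,))))
```

The trade-off, of course, is that the database can no longer enforce much structure for you, which is exactly the flexibility Steve argues for.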

By far the most surprising feature of their architecture is in Lesson Six, whose essential idea is:

Click to read more ...

Monday
May 10, 2010

Sify.com Architecture - A Portal at 3900 Requests Per Second

Sify.com is one of the leading portals in India. Samachar.com is owned by the same company and is one of the top content aggregation sites in India, primarily targeting non-resident Indians from around the world. Ramki Subramanian, an Architect at Sify, has been generous enough to describe the common back-end for both these sites. One of the most notable aspects of their architecture is that Sify does not use a traditional database. They query Solr and then retrieve records from a distributed file system. Over the years many people have argued for file systems over databases. Filesystems can work for key-value lookups, but they don't work for queries; using Solr is a good way around that problem. Another interesting aspect of their system is the use of Drools for intelligent cache invalidation. As we have more and more data duplicated in multiple specialized services, the problem of how to keep them synchronized is a difficult one. A rules engine is a clever approach.
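To make that pattern concrete, here is a rough sketch of "Solr answers the query, the filesystem serves the record", assuming the pysolr client and JSON records laid out by ID on a mounted DFS path. The core name, field names, and paths are my assumptions for illustration, not Sify's actual setup:

```python
import json
import os

import pysolr  # third-party Solr HTTP client; any Solr client would do

SOLR_URL = "http://solr.example.com:8983/solr/articles"   # assumed core name
DFS_MOUNT = "/mnt/dfs/articles"                            # assumed DFS mount point

solr = pysolr.Solr(SOLR_URL, timeout=2)

def search_articles(query, limit=10):
    """Query Solr for matching IDs, then pull full records off the distributed
    file system. Solr answers 'which records?'; the filesystem is just a
    key/value store addressed by document ID."""
    hits = solr.search(query, fl="id", rows=limit)
    records = []
    for hit in hits:
        path = os.path.join(DFS_MOUNT, f"{hit['id']}.json")
        try:
            with open(path) as fh:
                records.append(json.load(fh))
        except FileNotFoundError:
            # The index and the store can drift apart; skip (or flag for repair).
            continue
    return records

# Example: the ten most relevant cricket stories.
for article in search_articles("cricket", limit=10):
    print(article.get("headline"))
```

The interesting engineering then shifts to keeping the index, the files, and the caches in sync, which is where their Drools-based invalidation comes in.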

Click to read more ...

Monday
May 3, 2010

MocoSpace Architecture - 3 Billion Mobile Page Views a Month

This is a guest post by Jamie Hall, Co-founder & CTO of MocoSpace, describing the architecture for their mobile social network. This is a timely architecture to learn from as it combines several hot trends: it is very large, mobile, and social. What they think is especially cool about their system: how it optimizes for device/browser fragmentation on the mobile Web; their multi-tiered, read/write, local/distributed caching system; and their choice of PostgreSQL over MySQL as a relational DB that can scale.
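Their multi-tiered local/distributed cache is the part most worth sketching. Here is a minimal illustration of the tiering idea, assuming a per-process dictionary in front of a shared Memcached pool via python-memcached; the hosts, TTLs, and loader are placeholders, and the full post describes MocoSpace's real design:

```python
import time

import memcache  # python-memcached client for the shared, distributed tier

shared = memcache.Client(["10.0.0.1:11211", "10.0.0.2:11211"])  # assumed hosts
local = {}            # per-process tier: fastest, but only node-local
LOCAL_TTL = 5         # seconds; keep short so nodes don't serve stale data for long

def cache_get(key, loader):
    """Check the local tier, then the distributed tier, then the database.
    Each miss populates the tier above it on the way back out."""
    entry = local.get(key)
    if entry and entry[1] > time.time():
        return entry[0]

    value = shared.get(key)
    if value is None:
        value = loader(key)              # e.g. a PostgreSQL query
        shared.set(key, value, time=60)  # distributed tier, longer TTL
    local[key] = (value, time.time() + LOCAL_TTL)
    return value

def load_profile(key):
    # Placeholder for the real database read.
    return {"id": key, "name": "demo user"}

profile = cache_get("profile:12345", load_profile)
```

The short local TTL is the usual compromise: it soaks up hot reads on each node while bounding how long nodes can disagree with each other.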

MocoSpace is a mobile social network, with 12 million members and 3 billion page views a month, which makes it one of the most highly trafficked mobile Websites in the US. Members access the site mainly from their mobile phone's Web browser, on everything from high end smartphones to lower end devices, as well as from the Web. Activities on the site include customizing profiles, chat, instant messaging, music, sharing photos & videos, games, eCards, and blogs. The monetization strategy focuses on advertising, on both the mobile and Web sites, as well as a virtual currency system and a handful of premium feature upgrades.

Stats

Click to read more ...

Monday
Apr 12, 2010

Poppen.de Architecture

This is a guest post by Alvaro Videla describing the architecture of Poppen.de, a popular German dating site. This site is very much NSFW, so be careful before clicking on the link. What I found most interesting is how they manage to successfully blend a little of the old with a little of the new, using technologies like Nginx, MySQL, CouchDB, Erlang, Memcached, RabbitMQ, PHP, Graphite, Red5, and Tsung.

What is Poppen.de?

Poppen.de (NSFW) is the top dating website in Germany, and while it may be a small site compared to giants like Flickr or Facebook, we believe it's a nice architecture to learn from if you are starting to get some scaling problems.

The Stats

  • 2,000,000 users
  • 20,000 concurrent users
  • 300,000 private messages per day
  • 250,000 logins per day
  • We have a team of eleven developers, two designers and two sysadmins for this project.

Click to read more ...

Friday
Mar 26, 2010

Strategy: Caching 404s Saved the Onion 66% on Server Time

In the article The Onion Uses Django, And Why It Matters To Us, a lot of interesting points are made about their ambitious infrastructure move from Drupal/PHP to Django/Python:

  • The move wasn't that hard, it just took time and work, helped by their previous experience moving the A.V. Club website.
  • Churn in core framework APIs makes it more attractive to move than to stay.
  • Supporting the structure of older versions of the site is an unsolved problem.
  • The built-in Django admin saved a lot of work.
  • Group development is easier with "fewer specialized or hacked together pieces".
  • They use IRC for distributed development.
  • Sphinx for full-text search.
  • Nginx as the media server and reverse proxy.
  • HAProxy made the launch process a 5 second procedure.
  • Capistrano for deployment.
  • Clean component separation makes moving easier.
  • Git for version control.
  • An ORM with complicated querysets is a performance problem.
  • Memcached for caching rendered pages.
  • The CDN checks for updates every 10 minutes.
  • Videos, articles, images, and 404 pages are all served by a CDN.
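The 404-caching strategy in the title is easy to approximate. Here is a hedged sketch as Django middleware that remembers recently missed paths in the cache, so repeat misses skip URL resolution and template rendering entirely; this illustrates the idea and is not The Onion's actual code:

```python
import hashlib

from django.core.cache import cache
from django.http import HttpResponseNotFound

NOT_FOUND_TTL = 600  # seconds; how long a known-dead path stays cached

class CachedNotFoundMiddleware:
    """Serve a cheap, cached 404 for paths we already know don't exist,
    instead of re-running URL resolution and rendering on every hit."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        key = "404:" + hashlib.md5(request.path.encode()).hexdigest()
        if cache.get(key):
            return HttpResponseNotFound("Not found.")

        response = self.get_response(request)
        if response.status_code == 404:
            cache.set(key, True, NOT_FOUND_TTL)
        return response
```

Bots and stale external links tend to hammer the same dead URLs over and over, which is why caching the misses can pay off so dramatically.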

But the most surprising point had to be:

Click to read more ...

Tuesday
Mar 16, 2010

1 Billion Reasons Why Adobe Chose HBase 

Cosmin Lehene wrote two excellent articles on Adobe's experiences with HBase: Why we’re using HBase: Part 1 and Why we’re using HBase: Part 2. Adobe needed a generic, real-time, structured data storage and processing system that could handle any data volume, with access times under 50ms, with no downtime and no data loss. The article goes into great detail about their experiences with HBase and their evaluation process, providing a "well reasoned impartial use case from a commercial user". It talks about failure handling, availability, write performance, read performance, random reads, sequential scans, and consistency. 
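To give a feel for the access patterns they benchmarked (random reads and sequential scans), here is a minimal client-side sketch against HBase using the happybase Thrift client. The host, table, and column names are made up for illustration and are not from Adobe's setup:

```python
import happybase  # Thrift-based HBase client

connection = happybase.Connection("hbase-thrift.example.com")  # assumed Thrift host
table = connection.table("events")                             # assumed table name

# Random read: fetch one row by key.
row = table.row(b"user42#2010-03-16")
print(row.get(b"d:payload"))

# Sequential scan: walk a contiguous slice of the keyspace.
for key, data in table.scan(row_start=b"user42#", row_stop=b"user42$", limit=100):
    print(key, data.get(b"d:payload"))

# Write path: a single put into one column family.
table.put(b"user42#2010-03-17", {b"d:payload": b"..."})
```

Because rows are stored sorted by key, the row-key design decides whether your reads look like cheap scans or scattered random lookups, which is a big part of what Adobe's evaluation measured.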

One of the knocks against HBase has been its complexity, as it has many parts that need installation and configuration. All is not lost, according to the Adobe team:

HBase is more complex than other systems (you need Hadoop, Zookeeper, cluster machines have multiple roles). We believe that for HBase, this is not accidental complexity and that the argument that “HBase is not a good choice because it is complex” is irrelevant. The advantages far outweigh the problems. Relying on decoupled components plays nice with the Unix philosophy: do one thing and do it well. Distributed storage is delegated to HDFS, so is distributed processing, cluster state goes to Zookeeper. All these systems are developed and tested separately, and are good at what they do. More than that, this allows you to scale your cluster on separate vectors. This is not optimal, but it allows for incremental investment in either spindles, CPU or RAM. You don’t have to add them all at the same time.

Highly recommended, especially if you need some sort of balance to the recent gush of Cassandra articles. 

Tuesday
Mar 16, 2010

Justin.tv's Live Video Broadcasting Architecture

The future is live. The future is real-time. The future is now. That's the hype anyway. And as it has a habit of doing, the hype is slowly becoming reality. We are seeing live searches, live tweets, live location, live reality augmentation, live crab (fresh and local), and live event publishing. One of the most challenging of all live technologies is that of live video broadcasting. Imagine a world in which everyone becomes a broadcaster and a consumer of video streams, all in real-time (< 250 msec latency), all so you can talk and interact directly without feeling like you are in the middle of a time shift war. The resources and the engineering needed to make this happen must be substantial. How do you do that?

To find out I talked to Kyle Vogt, Justin.tv Founder and VP of Engineering. Justin.tv certainly has the numbers. Their 30 million unique monthly visitors even outshine YouTube in the video upload game, reportedly uploading nearly 30 hours per minute of video compared to YouTube's 23. I asked for an interview after listening to an interview with Justin Kan, another Founder of the eponymously named Justin.tv. Justin talked about how live video is fundamentally different from YouTube's batch video approach, where all the video is stored on disk and replayed later on demand. Live video can't be done just by pushing video out faster; it takes a completely different architecture. Since the YouTube Architecture article is the most popular article ever on this site, I thought people might also enjoy learning about the live side of the video world. Kyle was unbelievably generous with his time and insight into how Justin.tv makes all this live video magic happen, going way beyond the call, providing a tremendous number of juicy details. Anyone building a system can learn something from how they run their business. I can't thank Kyle enough for putting up with my never ending prodding.

Click to read more ...

Wednesday
Mar 10, 2010

How FarmVille Scales - The Follow-up

Several readers had follow-up questions in response to How FarmVille Scales to Harvest 75 Million Players a Month. Here are Luke's responses to those questions (and a few of mine).

How does social networking make things easier or harder?

The most interesting aspect of social networking games is how you wind up with a graph of connected users who need to access each other's data on a frequent basis. This makes the overall dataset difficult, if not impossible, to partition.
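To make that concrete: if players are sharded by user ID, any one player's friends hash to arbitrary shards, so a single page view fans out into reads across much of the cluster. A toy illustration of that fan-out with a simple hash-based shard map, purely illustrative and not Zynga's actual layout:

```python
NUM_SHARDS = 8
shards = {i: {} for i in range(NUM_SHARDS)}   # stand-ins for separate databases

def shard_for(user_id):
    return user_id % NUM_SHARDS

def save_user(user_id, data):
    shards[shard_for(user_id)][user_id] = data

def load_neighbors(friend_ids):
    """One page view needs data for every friend, and friends hash to
    arbitrary shards, so the request fans out across the cluster."""
    touched = {shard_for(fid) for fid in friend_ids}
    print(f"{len(friend_ids)} friends live on {len(touched)} of {NUM_SHARDS} shards")
    return [shards[shard_for(fid)].get(fid) for fid in friend_ids]

for uid in range(100):
    save_user(uid, {"farm": f"farm-{uid}"})

load_neighbors([3, 17, 25, 42, 58, 77, 91])   # even a few friends touch several shards
```

No shard key keeps a whole friend graph together, which is why caching and denormalization end up doing so much of the work in social games.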

What are examples of the Facebook calls you try to avoid and how they impact game play?

We can make a call for Facebook friend data to retrieve information about your friends playing the game. Normally, we show a friend ladder at the bottom of the game that shows friend information, including name and Facebook photo.
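One way to avoid repeating that platform call on every page view is to cache the friend data per user and build the ladder from the cache. A hedged sketch of that idea; the fetch function, TTL, and Memcached setup are placeholders, not FarmVille's actual code:

```python
import memcache  # python-memcached client

cache = memcache.Client(["127.0.0.1:11211"])  # assumed cache host
FRIEND_TTL = 300  # seconds; a slightly stale friend ladder is acceptable

def fetch_friends_from_facebook(user_id):
    """Placeholder for the real platform call (names, photos, app usage).
    This is the expensive, rate-limited call we want to avoid repeating."""
    raise NotImplementedError

def get_friend_ladder(user_id):
    key = f"friends:{user_id}"
    friends = cache.get(key)
    if friends is None:
        friends = fetch_friends_from_facebook(user_id)
        cache.set(key, friends, time=FRIEND_TTL)
    return friends
```

The TTL is the knob: the longer you can tolerate stale friend data, the fewer platform calls you make and the less Facebook latency leaks into game play.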

Can you say where your cache is, what form it takes, and how much is cached? Do you have a peering relationship with Facebook, as one might expect at that bandwidth?

Click to read more ...

Thursday
Mar 4, 2010

How MySpace Tested Their Live Site with 1 Million Concurrent Users

This is a guest post by Dan Bartow, VP of SOASTA, talking about how they pelted MySpace with 1 million concurrent users using 800 EC2 instances. I thought this was an interesting story because: that's a lot of users, it takes big cojones to test your live site like that, and not everything worked out quite as expected. I'd like to thank Dan for taking the time to write and share this article.

In December of 2009 MySpace launched a new wave of streaming music video offerings in New Zealand, building on the previous success of MySpace Music.  These new features included the ability to watch music videos, search for artists' videos, create lists of favorites, and more. The anticipated load increase from a feature like this on a popular site like MySpace is huge, and they wanted to test these features before making them live.

If you manage the infrastructure that sits behind a high traffic application you don’t want any surprises.  You want to understand your breaking points, define your capacity thresholds, and know how to react when those thresholds are exceeded.  Testing the production infrastructure with actual anticipated load levels is the only way to understand how things will behave when peak traffic arrives. 

For MySpace, the goal was to test an additional 1 million concurrent users on their live site stressing the new video features.  The key word here is ‘concurrent’.  Not over the course of an hour or day… 1 million users concurrently active on the site. It should be noted that 1 million virtual users are only a portion of what MySpace typically has on the site during its peaks.  They wanted to supplement the live traffic with test traffic to get an idea of the overall performance impact of the new launch on the entire infrastructure.  This requires a massive amount of load generation capability, which is where cloud computing comes into play. To do this testing, MySpace worked with SOASTA to use the cloud as a load generation platform. 
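SOASTA's CloudTest platform did the real work here, but the core idea of cloud load generation is simple: each of many machines holds open a slice of the virtual users, and they all hammer the target together. Here is a toy per-instance generator in Python with asyncio and aiohttp, purely for illustration; the target URL, pacing, and run length are placeholders (800 instances at 1,250 users each would give the 1 million total):

```python
import asyncio

import aiohttp

TARGET = "http://test.example.com/videos"   # placeholder, never point this at a real live site
USERS_PER_INSTANCE = 1250                   # 800 instances x 1,250 = 1M virtual users
THINK_TIME = 5                              # seconds between requests per virtual user

async def virtual_user(session, results):
    while True:
        try:
            async with session.get(TARGET) as resp:
                results[resp.status] = results.get(resp.status, 0) + 1
        except aiohttp.ClientError:
            results["error"] = results.get("error", 0) + 1
        await asyncio.sleep(THINK_TIME)     # pacing, so users behave more like humans

async def main():
    results = {}
    connector = aiohttp.TCPConnector(limit=USERS_PER_INSTANCE)  # allow true concurrency
    async with aiohttp.ClientSession(connector=connector) as session:
        users = [asyncio.create_task(virtual_user(session, results))
                 for _ in range(USERS_PER_INSTANCE)]
        await asyncio.sleep(60)             # run for a minute, then report
        for user in users:
            user.cancel()
        await asyncio.gather(*users, return_exceptions=True)
    print(results)

if __name__ == "__main__":
    asyncio.run(main())
```

A real test harness also has to ramp users up gradually, coordinate hundreds of generators, and aggregate results centrally, which is exactly the hard part a platform like SOASTA's handles.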

Here are the details of the load that was generated during testing.

Click to read more ...