Entries in Example (248)

Tuesday
Feb 22, 2011

Is Node.js Becoming a Part of the Stack? SimpleGeo Says Yes.

This is an interview with Wade Simmons, an Infrastructure Engineer at SimpleGeo, a service making it easy for developers to create location-aware applications, on their increasing use of Node.js as a backend service component, replacing code that would have at one time been written in Java, Python or Ruby. Node.js is finding its way into many stacks these days and I was curious why that might be. My experience writing several messaging systems is that programmers don't like the async model, so it's a big surprise that a pure async programming model like Node.js, especially one that uses server-side JavaScript, would be taking off. Wade was generous enough to help explain their reasoning behind using Node.js at SimpleGeo. I'd really like to thank Wade for taking the time for this interview. He did a really great job and provided a lot of insight on how the modern web stack is evolving in the crucible of real-life experience.

And here begins the interview with Wade Simmons:

Click to read more ...

Tuesday
Feb 15, 2011

Wordnik - 10 million API Requests a Day on MongoDB and Scala

Wordnik is an online dictionary and language resource that has both a website and an API component. Their goal is to show you as much information as possible, as fast as they can find it, for every word in English, and to give you a place where you can make your own opinions about words known. As cool as that is, what is really cool is the information they share in their blog about their experiences building a web service. They've written an excellent series of articles and presentations you may find useful:
  • What has technology done for words lately?
    • Eventual consistency. Using an eventually consistent model, they can do work in parallel: they count as many words as possible when they can, and add them all up when there’s a lag. The count’s always in the ballpark, and they never have to stop.
    • Document-oriented storage. Dictionary entries are more naturally modeled as hierarchical documents, and using that model has made it quicker to find data and easier to develop against.
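
To make the eventual consistency point concrete, here is a minimal sketch of counting in parallel and adding things up later. It is not Wordnik's code: the `count_shard` and `merge_counts` names are made up for illustration, and in-process `Counter` objects stand in for whatever workers and storage they really use.

```python
from collections import Counter


def count_shard(texts):
    """Count words in one shard of text, independently of other shards."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def merge_counts(total, partials):
    """Fold pending partial counts into the running total."""
    for partial in partials:
        total.update(partial)
    return total


# Two shards counted independently (in a real system, in parallel).
shard_a = count_shard(["the quick brown fox", "the lazy dog"])
shard_b = count_shard(["The fox again"])

# Readers see a ballpark total until the remaining partials are merged.
total = merge_counts(Counter(), [shard_a])   # shard_b has not arrived yet
approximate_the = total["the"]               # 2
total = merge_counts(total, [shard_b])
exact_the = total["the"]                     # 3
```

The total is always usable, it just lags behind until every partial count has been folded in.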

Click to read more ...

Tuesday
Feb 08, 2011

Mollom Architecture - Killing Over 373 Million Spams at 100 Requests Per Second

Mollom is one of those cool SaaS companies every developer dreams of creating when they rack their brains looking for a viable software-as-a-service startup. Mollom profitably runs a useful service (spam filtering) with a small group of geographically distributed developers. Mollom helps protect nearly 40,000 websites from spam, including one of mine, which is where I first learned about Mollom. In a desperate attempt to stop spam on a Drupal site, where every other form of CAPTCHA had failed miserably, I installed Mollom in about 10 minutes and it immediately started working. That's the out-of-the-box experience I was looking for.

From the time Mollom opened its digital inspection system they've rejected over 373 million spams and in the process they've learned that a stunning 90% of all messages are spam. This spam torrent is handled by only two geographically distributed machines that handle 100 requests/second, each running a Java application server and Cassandra. So few resources are necessary because they've created a very efficient machine learning system. Isn't that cool? So, how do they do it?

Click to read more ...

Monday
Jan 10, 2011

Riak's Bitcask - A Log-Structured Hash Table for Fast Key/Value Data

How would you implement a key-value storage system if you were starting from scratch? The approach Basho settled on with Bitcask, their new backend for Riak, is an interesting combination of using RAM to store a hash map of file pointers to values and a log-structured file system for efficient writes.  In this excellent Changelog interview, some folks from Basho describe Bitcask in more detail.
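
To give a feel for the idea, here is a toy sketch of that combination: an in-memory hash map of file pointers in front of an append-only log. This is not Basho's implementation. The `TinyLogStore` class and its record layout are assumptions made up for illustration, and it skips everything that makes the real thing production-ready (rebuilding the index on startup, checksums, merging old files, deletes).

```python
import os
import struct
import tempfile

# Hypothetical record layout: key length, value length, key bytes, value bytes.
HEADER = struct.Struct(">II")


class TinyLogStore:
    def __init__(self, path):
        self.index = {}                  # key -> (value offset, value length)
        self.file = open(path, "a+b")    # append mode: writes always go to the end

    def put(self, key: bytes, value: bytes) -> None:
        self.file.seek(0, os.SEEK_END)
        start = self.file.tell()
        self.file.write(HEADER.pack(len(key), len(value)) + key + value)
        self.file.flush()
        # Point the key at its newest value; older records are never rewritten.
        self.index[key] = (start + HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.file.seek(offset)           # one seek, one read
        return self.file.read(length)

    def close(self) -> None:
        self.file.close()


path = os.path.join(tempfile.mkdtemp(), "data.log")
store = TinyLogStore(path)
store.put(b"user:1", b"alice")
store.put(b"user:2", b"bob")
store.put(b"user:1", b"alicia")          # an update is just another append
latest = store.get(b"user:1")
```

Writes are sequential appends and a read is a single lookup in RAM followed by a single file read, which is the trade the paragraph above describes: fast I/O in exchange for keeping every key in memory.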

The essential Bitcask:

Click to read more ...

Thursday
Jan 06, 2011

BankSimple Mini-Architecture - Using a Next Generation Toolchain

I know people are always interested in what others are using to build their systems. Alex Payne, CTO of the new startup BankSimple, gives us a quick hit on their toolchain choices in this Quora thread. BankSimple positions itself as a customer-focused alternative to online banking. You may remember Alex from the early days of Twitter. Alex was always helpful to me on Twitter's programmer support list, so I really wish them well. Alex is also a bit of an outside-the-box thinker, which is reflected in some of their choices:

Click to read more ...

Wednesday
Dec 29, 2010

Pinboard.in Architecture - Pay to Play to Keep a System Small  

How do you keep a system small enough, while still being successful, that a simple scale-up strategy becomes the preferred architecture? StackOverflow, for example, could stick with a tool chain they were comfortable with because they had a natural brake on how fast they could grow: there are only so many programmers in the world. If this doesn't work for you, here's another natural braking strategy to consider: charge for your service.

This interesting point, one I hadn't properly considered before, was brought up by Maciej Ceglowski, co-founder of Pinboard.in, in an interview with Leo Laporte and Amber MacArthur on their net@night show.

Pinboard is a lean, mean, pay-for-bookmarking machine, a timely replacement for the nearly departed Delicious. And as a self-professed anti-social bookmarking site, it emphasizes speed over socializing. Maciej considers Pinboard a personal archive, where you can keep a history of what you are reading: forever. When the demise of Delicious was announced, if Pinboard had been a free site they'd have been down immediately, but being a paid site helped flatten out their growth curve.

Bookmarking sites used to be about sharing links with your friends, but Twitter has largely taken over that role. Twitter, however, is infamous for presenting only a small slice of your tweet history. What you really want is a big server sucking down your bookmarks from wherever you might bookmark them, and that's just what Pinboard does.

A few points struck me as particularly cool about Pinboard:

Click to read more ...

Tuesday
Nov 16, 2010

Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month

You may have read somewhere that Facebook has introduced a new Social Inbox integrating email, IM, SMS, text messages, and on-site Facebook messages. All in all they need to store over 135 billion messages a month. Where do they store all that stuff? Facebook's Kannan Muthukkaruppan gives the surprise answer in The Underlying Technology of Messages: HBase. HBase beat out MySQL, Cassandra, and a few others.

Why a surprise? Facebook created Cassandra and it was purpose-built for an inbox-type application, but they found Cassandra's eventual consistency model wasn't a good match for their new real-time Messages product. Facebook also has an extensive MySQL infrastructure, but they found performance suffered as data sets and indexes grew larger. And they could have built their own, but they chose HBase.

HBase is a scale-out table store supporting very high rates of row-level updates over massive amounts of data. Exactly what is needed for a Messaging system. HBase is also a column-based key-value store built on the BigTable model. It's good at fetching rows by key or scanning ranges of rows and filtering. Also what is needed for a Messaging system. Complex queries are not supported, however. Queries are generally given over to an analytics tool like Hive, which Facebook created to make sense of their multi-petabyte data warehouse, and Hive is based on Hadoop's file system, HDFS, which is also used by HBase.
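
The access pattern described above (fetch a row by key, or scan a range of sorted row keys and filter) can be sketched in a few lines. This is an in-memory toy, not the HBase API: the `SortedTable` class and the `user/sequence` row-key scheme are assumptions invented for the example.

```python
import bisect


class SortedTable:
    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> row data

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def get(self, key):
        """Fetch a single row by its key."""
        return self._rows.get(key)

    def scan(self, start, stop, predicate=lambda row: True):
        """Yield rows with start <= key < stop that pass the filter."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if predicate(row):
                yield key, row


table = SortedTable()
table.put("alice/0002", {"from": "bob", "unread": False})
table.put("alice/0001", {"from": "carol", "unread": True})
table.put("bob/0001", {"from": "alice", "unread": True})

# "0" is the character right after "/", so this range covers every "alice/..." key.
inbox = list(table.scan("alice/", "alice0"))
unread = list(table.scan("alice/", "alice0", lambda row: row["unread"]))
```

Because the keys are sorted, one user's messages sit next to each other and a whole inbox is a single contiguous scan. Anything fancier than that, such as joins or ad hoc queries, is the kind of work that gets handed off to an analytics tool.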

Facebook chose HBase because they monitored their usage and figured out what they really needed. What they needed was a system that could handle two types of data patterns:

  1. A short set of temporal data that tends to be volatile
  2. An ever-growing set of data that rarely gets accessed

Makes sense. You read what's current in your inbox once and then rarely if ever take a look at it again. These are so different one might expect two different systems to be used, but apparently HBase works well enough for both.

Some key aspects of their system:

Click to read more ...

Tuesday
Oct 26, 2010

Scaling DISQUS to 75 Million Comments and 17,000 RPS

This presentation and video by Jason Yan and David Cramer discusses how they scaled DISQUS, a comments-as-a-service service for easily adding comments to your site and connecting communities. The presentation is very good, so here are just a few highlights:

  • Traffic: 17,000 requests/second peak; 450,000 websites; 15 million profiles; 75 million comments; 250 million visitors; 40 million monthly users / developer.
  • Forces: unpredictable traffic patterns because of celebrity gossip and events like disasters; discussions never expire, which means they can't fit in memory; must always be up.
  • Machines: 100 servers; 30% web servers (Apache + mod_wsgi); 10% databases (PostgreSQL); 25% cache servers (memcached); 20% load balancing / high availability (HAProxy + heartbeat); 15% Utility servers (Python scripts).
  • Architecture: Requests are load balanced across an Apache cluster. Apache talks to memcached, HAProxy/pgbouncer to handle connection pooling to the database, and a central queue service. 
  • Strategies: make sure indexes fit in memory; log slow queries; use connection pooling; the data model consists of user, forum, thread, post; partitions horizontally (Disqus, Your blog, etc) and vertically (forums, posts, users, sentry) at application level; joins performed in Python; Hudson is used for continuous integration; Redmine is used for bug tracking; extensive test suite; feature switches are used to turn off features; isolate slow functions from transactions; use autocommit for read slaves; a queue is used for low priority tasks; Django QuerySet caching is turned off to save memory.
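
One of the strategies in the list, feature switches, is easy to picture in code. The sketch below is a guess at the general shape, not DISQUS's implementation: `FeatureSwitches`, `render_thread`, and the `related_threads` flag are hypothetical names, and a plain dict stands in for wherever the flags really live.

```python
class FeatureSwitches:
    """Runtime on/off flags, so an expensive feature can be shed under load."""

    def __init__(self, defaults):
        self._flags = dict(defaults)

    def is_active(self, name):
        # Unknown features are treated as off.
        return self._flags.get(name, False)

    def set(self, name, active):
        self._flags[name] = bool(active)


switches = FeatureSwitches({"related_threads": True})


def find_related_threads(comments):
    # Stand-in for a slow, non-essential query.
    return ["thread-%d" % len(comments)]


def render_thread(comments):
    page = {"comments": comments}
    if switches.is_active("related_threads"):
        page["related"] = find_related_threads(comments)
    return page


normal_page = render_thread(["first!", "nice post"])
switches.set("related_threads", False)     # traffic spike: turn the extra off
degraded_page = render_thread(["first!", "nice post"])
```

The page still renders with the switch off. It just skips the optional work, which is the point when the site must always be up.
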
Tuesday
Sep 28, 2010

6 Strategies for Scaling BBC iPlayer

The BBC's iPlayer site averages 8 million page views a day for 1.3 million users. Technical Architect Simon Frost describes how they scaled their site in Scaling the BBC iPlayer to handle demand:

  1. Use frameworks. Frameworks support component-based development, which makes it convenient for team development, but can introduce delays that have to be minimized. Zend/PHP is used because it supports components and is easy to recruit for. MySQL is used for program metadata. CouchDB is used for key-value access for fast read/write of user-focused data.
  2. Prove architecture before building it. Eliminate guesswork by coming up with alternate architectures and creating prototypes to determine which option works best. Balance performance with factors like ease of development.
  3. Cache a lot. Data is cached in memcached for a few seconds to minutes. Short cache invalidation periods keep the data up to date for the users, but even these short periods make a huge difference in performance. Caching doesn't have to be for a long time to see a benefit. Varnish is used to cache HTML pages. Much of the invalidation is time- or action-based (e.g. someone adds a new favourite).
Click to read more ...
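
The "cache a lot" strategy, with its short expiry times and time- or action-based invalidation, can be sketched as follows. This is an in-process stand-in, not the BBC's memcached or Varnish setup: `ShortTTLCache`, the `favs:42` key, and `load_favourites` are hypothetical, and a fake clock is used so the example behaves the same on every run.

```python
import time


class ShortTTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}   # key -> (expiry time, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                       # still fresh: skip the backend
        value = compute()
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Action-based invalidation, e.g. the user adds a new favourite."""
        self._entries.pop(key, None)


fake_now = [0.0]
cache = ShortTTLCache(ttl_seconds=5, clock=lambda: fake_now[0])
backend_calls = []


def load_favourites():
    backend_calls.append(1)                       # pretend this hits the database
    return ["programme-%d" % len(backend_calls)]


first = cache.get_or_compute("favs:42", load_favourites)    # computed
second = cache.get_or_compute("favs:42", load_favourites)   # served from cache
fake_now[0] = 6.0                                           # the 5 seconds have passed
third = cache.get_or_compute("favs:42", load_favourites)    # recomputed
cache.invalidate("favs:42")                                 # user changed something
fourth = cache.get_or_compute("favs:42", load_favourites)   # recomputed
```

Even a five-second lifetime means a popular key hits the backend once per window rather than once per request, which is why short periods can still make a big difference.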

Tuesday
Sep 21, 2010

Playfish's Social Gaming Architecture - 50 Million Monthly Users and Growing

Ten million players a day and over fifty million players a month interact socially with friends using Playfish games on social platforms like Facebook, MySpace, and the iPhone. Playfish was an early innovator in the fastest growing segment of the game industry: social gaming, which is the love child of casual gaming and social networking. Playfish was also an early adopter of the Amazon cloud, running their system entirely on hundreds of cloud servers. Playfish finds itself at the nexus of some hot trends (which may be why EA bought them for $300 million and they think a $1 billion game is possible): building games on social networks, building applications in the cloud, mobile gaming, leveraging data-driven design to continuously evolve and improve systems, agile development and deployment, and selling virtual goods as a business model.

How can a small company make all this happen? To explain the magic I interviewed Playfish's Jodi Moran, Senior Director of Engineering, and Martin Frost, Chief Architect, first Engineer and Operations guy at Playfish. Lots of good stuff, so let's move on to the nitty gritty.

Click to read more ...