Entries in Mongrel (3)

Tuesday
Jul 28 2009

37signals Architecture

Update 7: Basecamp, now with more vroom. Basecamp application servers running Ruby code were upgraded and virtualization was removed from them. The result: a 66% reduction in response time while handling multiples of the traffic, which is beyond what I expected. They still use virtualization (Linux KVM), just less of it now.
Update 6: Things We’ve Learned at 37Signals. Themes: less is more; don't worry, be happy.
Update 5: Nuts & Bolts: HAProxy. A nice explanation (post, screencast) by Mark Imbriaco of why HAProxy (a load balancing proxy server) is their favorite (fast, efficient, graceful configuration, queues requests when Mongrels are busy) for spreading dynamic content between Apache web servers and Mongrel application servers.
Update 4: O'Reilly's Tim O'Brien interviews David Heinemeier Hansson, Rails creator and 37signals partner. He says Basecamp scales horizontally on the application and web tiers and scales up for the database, using one "big ass" 128GB machine. Says: As technology moves on, hardware gets cheaper and cheaper. In my mind, you don't want to shard unless you positively have to, sort of a last resort approach.
Update 3: The need for speed: Making Basecamp faster. Pages now load twice as fast, CPU usage was cut by a third, and database time by about half. Results achieved by: analysis, caching, MySQL optimizations, and hardware upgrades.
Update 2: Customer support is handled in real time using Campfire.
Update: Highly useful information on creating a customer billing system.


In the giving spirit of Christmas the folks at 37signals have shared a bit about how their system works. 37signals is most famous for loosing Ruby on Rails into the world, and they've used RoR to build their very popular Basecamp, Highrise, Backpack, and Campfire products. RoR takes a lot of heat for being a performance dog, but 37signals seems to handle a lot of traffic with relatively normal-sounding resources. This is just an initial data dump; they promise to add more details later. As they add more I'll update it here.

Site: http://www.37signals.com

Information Sources

  • Ask 37signals: Numbers?
  • Ask 37signals: How do you process credit cards?
  • Behind the scenes at 37signals: Support
  • Ask 37signals: Why did you restart Highrise?

    Platform

  • Ruby on Rails
  • Memcached
  • Xen
  • MySQL
  • S3 for image storage

    The Stats

  • 30 servers ranging from single processor file servers to 8 CPU application servers for about 100 CPUs and 200GB of RAM.
  • Plan to diagonally scale by reducing the number of servers to 16 for about 92 CPU cores (each significantly faster than those used today) and 230 GB of combined RAM.
  • Xen virtualization will be used to improve system management.
  • Basecamp (web based project management)
    * 2,000,000 people with accounts
    * 1,340,000 projects
    * 13,200,000 to-do items
    * 9,200,000 messages
    * 12,200,000 comments
    * 5,500,000 time tracking entries
    * 4,000,000 milestones

  • Backpack (personal and small business information management)
    * Just under 1,000,000 pages
    * 6,800,000 to-do items
    * 1,500,000 notes
    * 829,000 photos
    * 370,000 files

  • Overall storage stats (Nov 2007)
    * 5.9 terabytes of customer-uploaded files
    * 888 GB of files uploaded (900,000 requests)
    * 2 TB of files downloaded (8,500,000 requests)

    The Architecture

  • Memcached caching is used and they are looking to add more. Yields impressive performance results.
  • URL helper methods are used rather than building the URLs by hand.
  • Standard ActiveRecord-built queries are used, but for performance reasons they will also "dig in and use" find_by_sql when necessary (see the sketch after this list).
  • They fix Rails when they run into performance problems. It pays to be king :-)
  • Amazon’s S3 is used for storage of files uploaded by users. They are extremely happy with the results.
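
A minimal sketch of the two patterns above, read-through memcached caching and dropping down to find_by_sql for a hot query. This is illustrative only; the model, key, and column names are invented and are not 37signals' actual code.

    # Hypothetical sketch: read-through memcached caching plus a hand-written
    # SQL fallback for a query where the generated SQL isn't fast enough.
    require 'memcache'

    CACHE = MemCache.new('localhost:11211')

    class Project < ActiveRecord::Base
      # Read-through cache: try memcached first, fall back to the database.
      def self.cached_find(id)
        key = "project:#{id}"
        CACHE.get(key) || begin
          project = find(id)
          CACHE.set(key, project, 10.minutes)
          project
        end
      end

      # A standard ActiveRecord-built query...
      def self.overdue
        find(:all, :conditions => ['due_on < ?', Date.today])
      end

      # ...and the hand-tuned find_by_sql version used when profiling shows
      # the generated SQL is the bottleneck.
      def self.overdue_with_open_todo_counts
        find_by_sql(<<-SQL)
          SELECT projects.*, COUNT(todos.id) AS open_todos
          FROM projects
          LEFT JOIN todos ON todos.project_id = projects.id
                         AND todos.completed_at IS NULL
          WHERE projects.due_on < CURDATE()
          GROUP BY projects.id
        SQL
      end
    end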

    Credit Card Processing Process

  • Bill monthly. It makes credit card companies more comfortable because they won't be on the hook for a large chunk of change if your company goes out of business. Customers also like it better because it costs less up front and you don't need a contract. Just pay as long as you want the service.

  • Get a Merchant Account. One is needed to process credit cards. They use Chase Bank. Use someone you trust and later negotiate rates when you get enough volume that it matters.
  • Authorize.net is the gateway they use to process the credit card charge.
  • A custom-built system handles the monthly billing. It runs each night, bills the appropriate accounts, and records the results (a sketch of this kind of job appears after this list).
  • On success an invoice is sent via email.
  • On failure an explanation is sent to the customer.
  • If the card is declined three times the account is frozen until a valid card number is provided.
  • Error handling is critical because problems with charges are common. Freezing too fast is bad; freezing too slow is also bad.
  • All products are being converted to using a centralized billing service.
  • You need to be PCI DSS (Payment Card Industry Data Security Standard) compliant.
  • Use a gateway service that makes it so you don't have to store credit card numbers on your site. That makes your life easier because of the greater security. Some gateway services also offer recurring billing so you don't have to do it yourself.
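
The nightly run described above might look roughly like the following. This is a hypothetical sketch, not 37signals' actual billing code; the class names, gateway call, and mailers are invented, with the three-declines-then-freeze rule taken from the bullets above.

    # Hypothetical sketch of a nightly billing job: charge each account due
    # today through the gateway, email an invoice on success, email an
    # explanation on failure, and freeze the account after three declines.
    # All class and method names here are invented for illustration.
    class NightlyBilling
      MAX_DECLINES = 3

      def run
        Account.due_for_billing(Date.today).each do |account|
          result = Gateway.charge(account.billing_token, account.monthly_price)

          if result.success?
            account.record_payment!(result)
            InvoiceMailer.deliver_invoice(account, result)   # send the invoice
          else
            account.increment!(:decline_count)
            BillingMailer.deliver_declined(account, result)  # explain the failure
            # Freeze only after repeated declines: freezing too fast is as bad
            # as freezing too slowly.
            account.freeze_access! if account.decline_count >= MAX_DECLINES
          end
        end
      end
    end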

    Customer Support

  • Campfire is used for customer service. Campfire is a web-based group chat tool, password-protectable, with chatting, file sharing, image previewing, and decision making.
  • Issues discussed are used to drive code changes and the Subversion commit is shown in the conversation. They seem to skip a bug tracking system, which would make it hard to manage bugs and features in any traditional sense, i.e., you can't track Subversion changes back to a bug and you can't report what features and bugs are in a release.
  • Support can solve problems by customers uploading images, sharing screens, sharing files, and chatting in real-time.
  • Developers are always on in Campfire, addressing problems with customers in real time.

    Lessons Learned

  • Take a lesson from Amazon and build internal functions as services from the start. This makes it easier to share them across all product lines and transparently upgrade features.
  • Don't store credit card numbers on your site. This greatly reduces your security risk.
  • Developers and customers should interact in real time on a public forum. Customers get better service as developers handle issues as they come up in the normal flow of their development cycle. Several layers of the usual BS are removed. Developers learn what customers like and dislike, which makes product development more agile. Potential customers can see how responsive the company is by reading the interactions, which goes a long way toward giving them the confidence and motivation to sign up.
  • Evolve your software based on the actual features users need instead of making up features someone might need someday. Otherwise you end up building something nobody wants that won't work anyway.
    Tuesday
    Jul 22 2008

    Scaling Bumper Sticker: A 1 Billion Page Per Month Facebook RoR App  

    Several months ago I attended a Joyent presentation where the spokesman hinted that Joyent had the chops to support a one billion page per month Facebook Ruby on Rails application. Even under a few seconds of merciless grilling he would not give up the name of the application. Now we have the big reveal: it was LinkedIn's Bumper Sticker app. For those not currently sticking things on bumps, Bumper Sticker is quite surprisingly a viral media sharing application that allows users to express their individuality by sticking small virtual stickers on Facebook profiles. At the time I was quite curious how Joyent's cloud approach could be leveraged for this kind of app. Now that they've released a few details, we get to find out.

    Site: http://www.Facebook.com/apps/application.php?id=2427603417

    Information Sources

  • Video: Scaling to 1 Billion Page Views Per Month (very flashy)
  • Web Scalability Practices: Bumper Sticker on Rails by Ikai Lan and Jim Meyer from LinkedIn
  • 1 Billion Page Views a Month by David Young from Joyent
  • Ruby on Rails: scaling to 1 billion page views per month by Dennis Howlett from ZDNet
  • Joyent's Grid Accelerators for Web Applications by Jason Hoffman from Joyent
  • On Grids, the Ambitions of Amazon and Joyent by Jason Hoffman from Joyent
  • Scaling Ruby on Rails to 1 Billion Page Views a Month by Joe Pruitt from DevCentral

    The Platform

  • MySQL
  • Nginx
  • Mongrel
  • CDN
  • Ruby on Rails (rapid prototype development approach)
  • Facebook
  • Joyent Accelerator - provides a highly scalable on-demand infrastructure for running web sites, including rich web applications written in Ruby on Rails, PHP, Python and Java. Joyent Accelerators are next-generation virtual computers that can grow and multiply (or shrink and consolidate) depending on the real world demands faced by your Web application. Accelerators are built on OpenSolaris, multi-core (8+), RAM-rich servers (32GB+ each) and vast amounts of NAS storage.
  • Masochism Plugin - provides an easy solution for Ruby on Rails applications to work in a replicated database environment. The connection proxy sends some database queries (those in a transaction, update statements, and ActiveRecord::Base#reload) to the master database, and the rest to the slave database (a simplified sketch of this kind of read/write splitting follows this list).
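
    To illustrate the routing rule described above, here is a much-simplified sketch of a read/write-splitting connection proxy. It shows the idea only; it is not the Masochism plugin's actual implementation or API.

        # Simplified illustration of read/write splitting: writes, transactions,
        # and reloads go to the master connection, everything else to the slave.
        # Not the Masochism plugin's real code.
        class ConnectionProxy
          WRITE_METHODS = [:insert, :update, :delete, :execute]

          def initialize(master, slave)
            @master, @slave = master, slave
            @in_transaction = false
          end

          def transaction(*args, &block)
            @in_transaction = true
            @master.transaction(*args, &block)
          ensure
            @in_transaction = false
          end

          def method_missing(method, *args, &block)
            target = (@in_transaction || WRITE_METHODS.include?(method)) ? @master : @slave
            target.send(method, *args, &block)
          end
        end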

    The Stats

  • 1 billion page views per month
  • 13.5 million installations
  • 1.5 million daily active users. Recruited 1 million users in first 46 days.
  • 20-27 million canvas page views a day
  • 13 web application servers running Nginx and Mongrel
  • 8 static asset servers serving over 3,500,000 stickers (migrating to a CDN)
  • 4 MySQL servers in a master/slave configuration using Masochism as a proxy to load balance database operations.
  • Cost is about $25K/month.

    The Architecture

  • Bumper Sticker was an experiment to see how fast the Light Engineering Development (LED) team at LinkedIn could build a Ruby on Rails Facebook application.
  • RoR was an easy environment to prototype in, but they needed a production environment in which they could quickly develop, deploy, and scale. Joyent was selected.
  • Some Notes on Joyent:
    * Joyent is a scale on demand cloud. Allows customers to have a dynamic data center instead of being stuck using their own rigid infrastructure.
    * There's an API if you need one. The service is unmanaged; you get root on all your boxes.
    * They consider their infrastructure to be better and more open than Amazon's. You get access to a high-end load balancer and the capabilities of OpenSolaris (DTrace, Zones, lower request processing overhead, sub-10-second reboot times).
    * Joyent's primary scalability principle is to organize apps around silos built from their powerful Accelerator blocks: put applications on different servers based on the quality of service you want to give them. For example, put static content on their own servers so the static content is always served fast and reliably. This allows you to prioritize based on what's important to you. You could, for example, prioritize the virality of your application by putting the Invite Friends functionality on their own servers, thus assuring the growth of your application through your viral functionality possibly at the expense of less important functionality.
    * Has three data centers in the US and is opening a fourth; none are in Europe.
    * Considers their secret sauce to be their highly sophisticated administration system which allows a few people to easily manage a large infrastructure.
    * Has a peering relationship with Facebook. That means there are direct high-speed fiber links between Joyent’s data center in Emeryville and Facebook’s data center in San Francisco.

  • 80% of the content for Bumper Sticker is static. The Facebook API can directly render content at a specified memory location. Bumper Sticker was able to use the scripting feature of the F5 BIG-IP load balancer to directly load static content by passing a pointer to the Facebook API.

    The Lessons

  • Rails scales exactly like any other app. Take into account all the components from the moment the request is received at the load balancer all the way down and all the way back again.
  • The development process is: put some measurements in place, find problems, fix problems, more people adopt and scale you out of your solution, and the cycle repeats. Sun's DTrace feature makes it easy to instrument the stack to identify bottlenecks.
  • Rails scales as long as the development team using it understands that many of the bottlenecks are exactly those faced by developers on any other database-driven web platform.
  • Hit a disk spindle and you are screwed. Avoid going to the database or the file system. The more they avoided disk the fewer timeouts they experienced.
  • Convert anything dynamic into static content. Dynamic content is your enemy; turn it into static content so it can be removed from the disk path (see the page caching sketch after this list).
  • Push content to the edge. Move content as close to the client as possible. Move cache to the CDN. Reduce time going across the network.
  • Faster means more viral. On a viral system the better the performance the more people can play with your application. The more people who play with your system the more likely they are to pull more people in, which means the more the app will spread and go viral. Bumper Sticker has been successful at creating a community of fans who enjoy uploading and sharing their own stickers.
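
    Rails' built-in page caching is one way to apply the "convert dynamic to static" lesson: the first request renders the page, and every later request is served as a static file by the front-end web server without touching Rails or the database. A minimal sketch with hypothetical controller, model, and sweeper names; this is not Bumper Sticker's actual code.

        # Minimal sketch of Rails page caching as one way to turn dynamic pages
        # into static content. All names here are hypothetical.
        class StickersController < ApplicationController
          caches_page :show, :index   # write rendered HTML under public/ on first hit

          def index
            @stickers = Sticker.find(:all, :limit => 50)
          end

          def show
            @sticker = Sticker.find(params[:id])
          end
        end

        # Expire the cached files when a sticker changes so the web server keeps
        # serving fresh static pages. Attach this with `cache_sweeper :sticker_sweeper`
        # in whichever controller performs the writes.
        class StickerSweeper < ActionController::Caching::Sweeper
          observe Sticker

          def after_save(sticker)
            expire_page(:controller => 'stickers', :action => 'show', :id => sticker.id)
            expire_page(:controller => 'stickers', :action => 'index')
          end
        end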

    Some issues:
  • Since most of the content is static and served by the load balancer, the impact of Rails in the system is not clear.
  • The functionality of Bumper Sticker is relatively simple. What would the impact be on scalability if other often requested features like search were added?

    Related Articles

  • Friends for Sale Architecture - A 300 Million Page View/Month Facebook RoR App
    Monday
    Jan 07 2008

    How Ruby on Rails Survived a 550k Pageview Digging

    Shanti Braford details how his Ruby on Rails based website survived a 24-hour, 550,000+ pageview Digg attack. His post cleanly lays out all the juicy setup details, so there's not much I can add. Hosting costs $370 a month for 1 web server, 1 database server, and sufficient bandwidth. The site is built on RoR, Nginx, MySQL, and 7 Mongrel servers. He thinks Rails 2.0 has improved performance and credits database avoidance and fragment caching for much of the performance boost (a brief fragment caching sketch follows). Keep in mind his system is relatively static, but it's a very interesting and useful experience report.
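
    Fragment caching plus a read_fragment check in the controller is the classic way those two techniques combine: when the fragment is already cached, the view renders from the cache and the controller skips the database query entirely. A minimal sketch with hypothetical names; this is not Shanti's actual code.

        # Sketch of the "database avoidance + fragment caching" pattern in
        # Rails 2.0. The controller skips the expensive query when the view
        # fragment is already cached.
        class PostsController < ApplicationController
          def index
            unless read_fragment('recent_posts')
              @posts = Post.find(:all, :order => 'created_at DESC', :limit => 20)
            end
          end
        end

        # View (app/views/posts/index.html.erb):
        #
        #   <% cache 'recent_posts' do %>
        #     <%= render :partial => 'post', :collection => @posts %>
        #   <% end %>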
