Sunday, February 22, 2009

Building and Scaling a Startup on Rails: 12 Things We Learned the Hard Way

Garry Tan, cofounder of Posterous, lists 12 lessons for scaling that apply to more than just Rails.

  • Use cloud storage for static files.
  • Use HTTP Cache Control to tell the browser what it can cache.
  • Use Sphinx for text search.
  • Use InnoDB for more crash-resistant and faster writes.
  • Don't use textbook Rails ActiveRecord objects. Use New Relic to find exactly what is slow in your system.
  • Add memcached later, not first, so you find your database bottlenecks now.
  • Use mongrel proctitle to find your slow queries. You are only as fast as your slowest queries.
  • Use asynchronous job queuing to do work in parallel.
  • Use monitoring so you'll know when your site goes down and why.
  • Learn by reading the source code, fixing problems, and submitting the fixes back to the community.
  • Use new plugins. Old plugins can't be trusted.
  • Use new information. Old information can't be trusted.
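To make the asynchronous job queuing tip concrete, here is a minimal in-process sketch: the request side only enqueues work, and a small pool of worker threads drains the queue in parallel. The handler `process_job` and its `"thumbnail"` job type are hypothetical, and a production Rails setup would more likely use a separate worker process and a persistent queue rather than threads inside the web process.

```python
import queue
import threading


def process_job(job):
    """Hypothetical job handler standing in for slow work (image resizing,
    sending email) that should not block a web request."""
    kind, payload = job
    if kind == "thumbnail":
        return f"thumbnail:{payload}"
    return f"unknown:{payload}"


def run_workers(jobs, num_workers=4):
    """Enqueue jobs and let a pool of worker threads process them in parallel."""
    job_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: this worker has no more work
                job_queue.task_done()
                break
            result = process_job(job)
            with results_lock:
                results.append(result)
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        job_queue.put(job)  # the "request" is done once the job is enqueued
    for _ in threads:
        job_queue.put(None)  # one sentinel per worker, after all real jobs
    job_queue.join()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(run_workers([("thumbnail", "a.jpg"), ("thumbnail", "b.jpg")])))
```

Results arrive in whatever order the workers finish, which is why the usage line sorts them.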
Reader Comments (9)

    Great post. When you say cloud, do you mean something like Amazon S3?

    Unregistered Commenter niuwa

    Great tips! :-)

    Unregistered Commenter Ian

    Great article, thanks for sharing.
    I'm surprised you didn't mention using caching as much as possible (I mean Rails' caching capabilities, not only HTTP Cache Control). There are a lot of different caching methods (from finest-grained to coarsest; sorry if I've forgotten one):

    - SQL caching
    - fragment caching
    - action caching (requests still go through Rails, and in particular through filters)
    - page caching (doesn't go through Rails at all, so be careful!)
    - memcached caching (doesn't go through Rails at all, so be careful!)
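One way to keep fragment and memcached caching manageable is to put the record's last-modified time in the cache key, so an edited record produces a new key and the old fragment is simply never read again. The sketch below assumes that approach; `Post`, `FragmentCache`, and `post_fragment` are hypothetical names, and the dict stands in for a real store such as memcached, which would also evict unused entries (this dict never does).

```python
from dataclasses import dataclass


@dataclass
class Post:
    id: int
    title: str
    updated_at: int  # e.g. a Unix timestamp bumped on every save


class FragmentCache:
    """In-memory stand-in for a fragment store such as memcached."""

    def __init__(self):
        self.store = {}
        self.renders = 0  # how often the expensive render actually ran

    def fetch(self, key, render):
        if key not in self.store:
            self.renders += 1
            self.store[key] = render()
        return self.store[key]


def post_fragment(cache, post):
    # The key embeds updated_at, so editing the post changes the key and the
    # stale fragment is never read again; no explicit expiry is needed.
    key = f"posts/{post.id}-{post.updated_at}/summary"
    return cache.fetch(key, lambda: f"<h2>{post.title}</h2>")
```

Rendering the same unchanged post twice runs the render once; after bumping `updated_at` the next call renders a fresh fragment.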

    I would also add that using the cookie session store helps scaling, since sessions are stored on the client side rather than the server side. You can then safely use a Layer 4 load balancer instead of a costlier Layer 7 one.
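The reason a cookie session store works with an L4 load balancer is that any app server holding the shared secret can verify the session on its own, so no request has to be pinned to a particular server. Below is a minimal sketch of a signed (not encrypted) cookie, assuming an HMAC-SHA256 signature and a hypothetical `SECRET`; the client can still read the contents, so nothing sensitive belongs in it.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"change-me"  # hypothetical secret shared by every app server


def encode_session(data: dict) -> str:
    """Serialize the session and append an HMAC so tampering is detectable."""
    payload = base64.urlsafe_b64encode(
        json.dumps(data, sort_keys=True).encode()
    ).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}--{sig}"


def decode_session(cookie: str):
    """Return the session dict, or None if the cookie is malformed or forged."""
    try:
        payload, sig = cookie.rsplit("--", 1)  # hex signature never contains "-"
    except ValueError:
        return None
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))
```

Because verification needs only `SECRET`, whichever server the load balancer picks can call `decode_session` without consulting shared session storage.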

    Gravis

    Unregistered Commenter Gravis

    As well as considering "use cloud storage", I'd suggest an alternative: "don't serve static content from your inefficient Rails web server software".

    Using nginx, thttpd, or lighttpd, and/or a caching reverse proxy like Varnish, will help most Rails applications scale equally well or better (where processor/memory resources are the bottleneck, not disk storage), and is usually easier and cheaper to implement. And that's without even mentioning the reliability issues of S3 and other "cloud storage" services.
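Whether the cache is the browser or a reverse proxy such as Varnish, what may be stored and for how long is largely driven by the `Cache-Control` response header. The policy below is a hypothetical sketch: the extensions and `max-age` values are assumptions, and the year-long lifetime is only safe if an asset's URL changes whenever its content does.

```python
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif")  # assumed asset types


def cache_control_for(path: str, logged_in: bool) -> str:
    """Return a Cache-Control value for a response (hypothetical policy)."""
    bare_path = path.split("?", 1)[0]  # ignore query strings such as ?1234
    if bare_path.endswith(STATIC_EXTENSIONS):
        # Static assets: browsers and shared caches may keep them for a year.
        return "public, max-age=31536000"
    if logged_in:
        # "private" keeps shared caches from storing per-user pages;
        # "no-cache" makes the browser revalidate before reusing its copy.
        return "private, no-cache"
    # Anonymous pages: a shared cache may serve them for a short while.
    return "public, max-age=60"
```

With a policy like this, the front-end server or proxy can answer most static and anonymous requests without touching Rails at all.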

    Unregistered Commenter wg

    I'm honored to have a post show up here on the high scalability blog. I've been quite inspired by what you guys are doing here, and thanks for the great work.

    Unregistered Commenter Garry Tan

    I'm not sure what "wg" means by "cloud storage services' reliability issues", but I'm going to assume he's talking about the single, well-publicized S3 outage that lasted a few hours. In all likelihood, S3 provides an environment that is significantly more stable than anything a startup could provide, especially one still weighing architecture decisions. I'd say host everything you can on S3 + CloudFront. Use it as your web host. Put your applications on EC2. The infrastructure frameworks that have emerged to support Rails on EC2 are (relatively) phenomenal. For a database, use a single PostgreSQL instance and put memcached everywhere to alleviate read load. Use Linux software RAID to create a striped array of EBS volumes (and back them up to S3!). Learn to tune your PostgreSQL configs! Make sure you're shipping PITR logs somewhere for warm recovery. While MySQL has great support for read-only slaves and PostgreSQL's replication support sucks, memcached can pick up the slack in 90% of cases.
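"Put memcached everywhere to alleviate read load" is usually implemented as the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result with an expiry. A sketch under the assumption that a dict stands in for memcached; `CacheAside`, the TTL, and the injectable clock (there only to make expiry testable) are illustrative.

```python
import time


class CacheAside:
    """Check the cache first, fall back to the database on a miss, then
    populate the cache. A dict stands in for memcached here."""

    def __init__(self, load_from_db, ttl_seconds=300, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # key -> (expires_at, value)
        self.db_reads = 0

    def get(self, key):
        entry = self.store.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no database work
        self.db_reads += 1  # miss or expired entry: read from the database
        value = self.load_from_db(key)
        self.store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        # Call after a write so the next read goes back to the database.
        self.store.pop(key, None)
```

Until a key expires or is invalidated, reads may return stale data, so writes should call `invalidate` for anything that must be fresh.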

    PostgreSQL is not only more sophisticated but, especially as of 8.3, faster for highly concurrent, write-heavy, OLTP-ish loads (aka a database-backed web application). While MySQL has lost its way, PostgreSQL is being picked up at a rapid clip. I've used both databases and they're both absolutely phenomenal tools, but if you're moving forward with a new idea, there is no reason to choose MySQL, IMHO. HOWEVER! Don't take my word for it! Do your own benchmarks! Make sure you've got a correctly configured PostgreSQL and run your application code against it. Everyone's situation differs, and it's best to run real benchmarks in real-world scenarios. Make sure you benchmark a real architecture with caching and tuned ActiveRecord calls.
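In the spirit of "do your own benchmarks", here is a tiny harness that times the same operation many times and reports the median alongside the 95th percentile and the worst case, since the slow tail is what users notice. The callable you pass in (for example, rendering one page against each candidate database) is up to you; this is a sketch, not a substitute for a proper load test.

```python
import math
import statistics
import time


def benchmark(fn, iterations=100):
    """Time fn() repeatedly; report median, 95th percentile and max in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile: the sample at position ceil(0.95 * n).
    p95 = samples[math.ceil(0.95 * len(samples)) - 1]
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": p95,
        "max_ms": samples[-1],
    }
```

Running the same callable against each configuration with identical data makes the numbers comparable; a single fast median can hide a slow tail.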

    AWS, even with its perceived difficulties, is the only cloud environment that "gets it": basic components that give developers exactly what they need and nothing more. The dirty secret is that most good developers don't mind spending time on the systems side; they just don't want to waste time on monkey work like swapping out drivers and rebooting, swapping tapes, endless trial and error, etc. AWS eliminates (most of) the monkey work and lets you focus on architecture and development. It will take years for everyone else to catch up to where AWS is RIGHT NOW. Vogels and Bezos are brilliant.

    Unregistered Commenter Rick Branson

    Thanks for the tips.

    I have a question.
    >>Don't use textbook Rails ActiveRecord objects.

    So what should be used instead?

    Unregistered Commenter Anonymous

    Try DBI instead. ActiveRecord has a bad habit of maintaining connections with its own timeouts, and if a MySQL patch is not applied, the irritating "broken pipe" error might occur. DBI is simple and easy to use, even though it hasn't reached a 1.0 release yet.

    Unregistered Commenter Anonymous

    Posterous: We've been emailed about this problem and hopefully we'll be able to fix it soon. Posterous requires cookies, so if you've disabled cookies, you may see this error. Re-enable cookies and try again.

    Oops

    Unregistered Commenter S
