Friday, April 24, 2009

Heroku - Simultaneously Develop and Deploy Automatically Scalable Rails Applications in the Cloud

Update 4: Heroku versus GAE & GAE/J

Update 3: Heroku has gone live! Congratulations to the team. It's difficult right now to get a feeling for the relative cost and reliability of Heroku, but it's an impressive accomplishment and a viable option for people looking for a delivery platform.

Update 2: Heroku Architecture. A great interactive presentation of the Heroku stack. Requests flow into Nginx, which is used as an HTTP reverse proxy. Nginx routes requests into a Varnish-based HTTP cache. Requests are then injected into an Erlang-based routing mesh that balances them across a grid of dynos. Dynos are your application "VMs" that implement application-specific behaviors. Each dyno is itself a stack of: POSIX, Ruby VM, App Server, Rack, Middleware, Framework, Your App. Applications can access PostgreSQL. Memcached is used as an application caching layer.
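
To make the request path above a little more concrete, here is a minimal Python sketch of a cache sitting in front of a balancer that spreads requests across dynos. Everything in it is an assumption made for illustration: the class names `Dyno`, `RoutingMesh`, and `HttpCache` are hypothetical, and round-robin is a stand-in policy, since the presentation does not say how the Erlang mesh actually chooses a dyno.

```python
from itertools import cycle


class Dyno:
    """Stand-in for one application process (hypothetical, not Heroku's API)."""

    def __init__(self, name):
        self.name = name
        self.handled = 0

    def handle(self, path):
        self.handled += 1
        return f"{self.name} rendered {path}"


class RoutingMesh:
    """Round-robin balancer; the real mesh's policy is not described in the source."""

    def __init__(self, dynos):
        self._next = cycle(dynos)

    def route(self, path):
        return next(self._next).handle(path)


class HttpCache:
    """Toy cache in front of the mesh, playing the role Varnish plays."""

    def __init__(self, backend):
        self.backend = backend
        self.store = {}
        self.hits = 0

    def get(self, path):
        # Serve repeated paths from the cache so they never reach a dyno.
        if path in self.store:
            self.hits += 1
            return self.store[path]
        body = self.backend.route(path)
        self.store[path] = body
        return body


dynos = [Dyno("dyno-1"), Dyno("dyno-2")]
front = HttpCache(RoutingMesh(dynos))
responses = [front.get(p) for p in ["/a", "/b", "/a", "/c"]]
```

In this toy run the second request for `/a` is answered by the cache, so only three of the four requests reach a dyno.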

Update: Aaron Worsham Interview with James Lindenbaum, CEO of Heroku. Aaron nicely sums up their goal: Heroku is looking to eliminate all the reasons companies have for not doing software projects.


Adam Wiggins of Heroku presented at the lollapalooza that was the Cloud Computing Demo Night. The idea behind Heroku is that you upload a Rails application and it automatically deploys to EC2 and automatically scales using behind-the-scenes magic. They call this "liquid scaling." You just dump your code and go. You don't have to think about SVN, databases, mongrels, load balancing, or hosting. You just concentrate on building your application. Heroku's unique feature is their web-based development environment, which lets you develop applications completely from their control panel. Or you can stick with your own development environment and use their API and Git to move code in and out of their system.

For website developers this is as high up the stack as it gets. With Heroku we lose that "build your first lightsaber" moment marking the transition out of apprenticeship and into mastery. "Upload your code and go" isn't exactly a hero's journey, but it is damn effective...

I must confess to having an inherent love of Heroku's idea because I had a similar notion many moons ago, but the trendy language of the time was Perl instead of Ruby. At the time, though, it just didn't make sense. The economics of creating your own "cloud" for such a different model weren't there. It's amazing the niches utility computing will seed, fertilize, and help grow. Even today when using Eclipse I really wish it was hosted in the cloud and I didn't have to deal with all its deployment headaches. Firefox-based interfaces are pretty impressive these days. Why not?

Adam views their stack as:
1. Developer Tools
2. Application Management
3. Cluster Management
4. Elastic Compute Cloud

At the top level developers see a control panel that lets them edit code, deploy code, interact with the database, see logs, and so on. Your website is live from the first moment you start writing code. It's a powerful feeling to write normal code, see it run immediately, and know it will scale without further effort on your part. Now, will you be able to toss your Facebook app into the Heroku engine and immediately handle a deluge of 500 million hits a month? It will be interesting to see how far a generic scaling model can go without special tweaking by a certified scaling professional. Elastra has the same sort of issue.

Underneath, Heroku makes sure all the software components work together in Lennon-McCartney style harmony. They take care of (or will take care of) starting and stopping VMs, deploying to those VMs, billing, load balancing, scaling, storage, upgrades, failover, etc. The dynamic nature of Ruby and the development and deployment infrastructure of Rails are what make this type of hosting possible. You don't have to worry about builds. There's a great infrastructure for installing packages and plugins. And the big hard one, database upgrades, is tackled with the new migrations feature.
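
The idea behind migrations is simple enough to sketch: keep an ordered list of schema changes, record which ones have been applied, and apply only the newer ones on each deploy. The Python below illustrates that general idea with SQLite. It is not Rails' migration API, and the `MIGRATIONS` list, the `schema_version` table, and the `migrate` helper are all hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical migrations: (version, SQL) pairs applied in ascending order.
MIGRATIONS = [
    (1, "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)"),
    (2, "ALTER TABLE posts ADD COLUMN body TEXT"),
]


def current_version(conn):
    """Return the highest applied version, or 0 for a fresh database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn, migrations=MIGRATIONS):
    """Apply every migration newer than the recorded version; return versions applied."""
    applied = []
    start = current_version(conn)
    for version, sql in sorted(migrations):
        if version > start:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
            applied.append(version)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies versions 1 and 2
second_run = migrate(conn)  # nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(posts)")]
```

Because the applied versions are stored in the database itself, running the same deploy step twice is harmless, which is what makes unattended database upgrades practical for a hosted platform.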

A major issue in the Rails world is versioning. Given the Cambrian explosion of Rails tools, how does Heroku make sure all the various versions of everything work together? Heroku sees this as their big value add. They are in charge of making sure everything works together. We see a lot of companies on the web taking on the role of curator ([1], [2], [3]). A curator is a guardian or an overseer. Of curators Steve Rubel says: They acquire pieces that fit within the tone, direction and - above all - the purpose of the institution. They travel the corners of the world looking for "finds." Then, once located, clean them up and make sure they are presentable and offer the patron a high quality experience. That's the role Heroku will play for their deployable Rails environment.

With great automated power come great restrictions. And great opportunity. Curating has a cost for developers: flexibility. The database they support is Postgres. You're out of luck if you want MySQL. Want a different Ruby version or Rails version? Not if they don't support it. Want memcache? You can't just add it yourself. One forum poster wanted, for example, to use the command line version of ImageMagick but was told it wasn't installed and to use RMagick instead. Not the end of the world. This sort of curating has to be done to keep a happy and healthy environment running, but it is something to be aware of.

The upside of curation is that stuff will work. And we all know how hard it can be to get stuff to work. When I see an EC2 AMI that already has most of what I need my heart goes pitter patter over the headaches I'll save because someone already did the heavy curation for me. A lot of the value that services like rPath offer, for example, is in curation. rPath helps you build images that work, that can be deployed automatically, and that can be easily upgraded. It can take a big load off your shoulders.

There's a lot of competition for Heroku. Mosso has a hosting system that can do much of what Heroku wants to do. It can automatically scale up at the webserver, data, and storage tiers. It supports a variety of frameworks, including Rails. And Mosso also says all you have to do is load and go.

3Tera, with its AppLogic product, is another competitor. As one user said: It lets you visually (through a web UI) create "applications" based on "appliances". There is a standard portfolio of prebuilt applications (SugarCRM, etc.) and templates for LAMP, etc. So, we build our application by taking a firewall appliance, a CentOS appliance, a gateway, a MySQL appliance, glue them together, customize them, and then create our own template. You can specify, down to the appliance level, the amount of CPU, memory, disk, and bandwidth each is assigned, which lets you scale up your capacity simply by tweaking values through the UI. We can now deploy our Rails/Java hosted offering for new customers in about 20 minutes on our grid. AppLogic has automatic failover so that if anything goes wrong, it redeploys your application to a new node in your grid and restarts it. It's not as cheap as EC2, but much more powerful. True, 3Tera won't help with your application directly, but most of the hard bits are handled.

RightScale is another company that combines curation with load balancing, scaling, failover, and system management.

What differentiates Heroku is their web-based IDE, which allows you to focus solely on the application and ignore the details. Now that they have a command line interface as well, though, it's not as clear how they will differentiate themselves from other offerings.

The hosting model has a possible downside if you want to do something other than straight web hosting. Let's say you want your system to insert commercials into podcasts. That sort of large scale batch logic doesn't cleanly fit into the hosting model. A separate service accessed via something like a REST interface needs to be created. Possibly double the work. Mosso suffers from this same concern. But maybe leaving the web front end to Heroku is exactly what you want to do. That would leave you to concentrate on the back end service without worrying about the web tier. That's a good approach too.

Heroku is just getting started, so not everything is in place yet. They've been working on how to scale their own infrastructure. Next is working on scaling user applications beyond starting and stopping mongrels based on load. They aren't doing any vertical scaling of the database yet. They plan on memcaching reads, implementing read-only slaves via Slony, and using the automatic partitioning features built into Postgres 8.3. The idea is to start a little smaller with them now and grow as they grow. By the time you need to scale bigger they should have the infrastructure in place.
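
"Starting and stopping mongrels based on load" can be pictured as a small decision function. The Python sketch below is only an illustration under stated assumptions: the function names, the per-process capacity figure, and the minimum and maximum process counts are all hypothetical, and nothing here reflects how Heroku actually measures load or decides when to scale.

```python
import math


def desired_processes(requests_per_sec, capacity_per_process,
                      min_procs=1, max_procs=10):
    """How many app server processes to run for a given load (hypothetical limits)."""
    if capacity_per_process <= 0:
        raise ValueError("capacity_per_process must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_process)
    # Clamp to the allowed range so we never drop to zero or grow unbounded.
    return max(min_procs, min(max_procs, needed))


def scaling_action(running, requests_per_sec, capacity_per_process):
    """Return ("start" | "stop" | "hold", count) to move toward the target."""
    target = desired_processes(requests_per_sec, capacity_per_process)
    if target > running:
        return ("start", target - running)
    if target < running:
        return ("stop", running - target)
    return ("hold", 0)
```

For example, with 45 requests per second and an assumed capacity of 20 requests per second per process, `scaling_action(2, 45, 20)` asks for one more process. A rule like this only addresses the web tier, which is why the database plans mentioned above (cached reads, read-only slaves, partitioning) are a separate piece of work.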

One concern is that pricing isn't nailed down yet, but my gut says it will be fair. It's not clear how you will transfer an existing database over, especially from a non-Postgres database. And if you use the web IDE, I wonder how you will handle normal project stuff like continuous integration, upgrades, branching, release tracking, and bug tracking. Certainly there's a lot of work to do and a lot of details to work out, but I am sure it's nothing they can't handle.

Related Articles

  • Heroku Rails Podcast
  • Heroku Open Source Plugins etc

Reader Comments (8)

    This looks a lot like a frontend for EC2 On Rails (http://ec2onrails.rubyforge.org/) with a bunch of Capistrano tasks. The interface looks VERY VERY polished and nice, though.

    December 31, 1999 | Unregistered Commenter Anonymous

    Great article! I was unaware that these types of services ("auto" scaling) even existed.

    I think it's worth mentioning that the majority of scaling issues have to be dealt with at design time (statelessness, coupling, etc.) and not at deployment time. No amount of scaling will fix poor design.

    Also, the article suggests that the available databases (Postgres in Heroku's case) may limit a developer's flexibility. How does an abstracted data layer address this inflexibility? Many ORM tools (Hibernate, TopLink, etc.) abstract the underlying database implementation from the application logic. Wouldn't these tools and techniques reclaim some of the flexibility that these scaling services take away?

    December 31, 1999 | Unregistered Commenter Seth

    I'm always curious about this. I notice that no one provides details about how/what is scaled usually.

    When I dig in, I usually find that the scaling they provide is primitive at best.

    One fundamental issue is that scaling isn't just about throwing more machines at a problem. Another issue is that different apps scale around different bottleneck points.

    What bugs me is that 'auto-scaling' is more of a marketing buzzword at this point and less of a reality. I'd like to see more transparency and discussion around standardized mechanisms for scaling and less smoke and mirrors.

    --Randy

    December 31, 1999 | Unregistered Commenter Randy Bias

    Of great concern should be that these are all new companies, trying out a new business plan, relying on new technology. How stable is this moving, newborn, unproven foundation you're building on? The flipside of curating and making it a piece of cake to go from zero-to-big fast, may be that if they go away then you go from big-to-zero even faster...

    Even if you have all your code (and it's in a standard format, without proprietary additions), porting it over to a new host may be impossible if you don't have the underlying specs of how these curating hosts are assembled. And those specs may not be available if they are considered trade secrets. If you are hoping for (sane) venture funding, or if you're putting your own blood and sweat into a new project hosted by these guys, you had better make it your first weekend project to duplicate the host setup and verify that your app runs on the duplicate. Otherwise you're bound to your host and can never grow bigger than them.

    December 31, 1999 | Unregistered Commenter Ben Curtis

    Do check out www.morpheXchange.com for fuss-free deployments of your Rails apps. Guaranteed no lock-in, and hosted on top of AWS using open source technology.

    Best.
    alain

    December 31, 1999 | Unregistered Commenter friarminor

    To give this type of a job can be quite hectic but giving a start would still be effective.

    December 31, 1999 | Unregistered Commenter farhaj

    This is starting to sound like that fantasy everybody not-quite-technical-enough has about how someday software will be just dragging and dropping little boxes on the screen, and everybody can do it! As it turns out, drag & drop doesn't seem to work too well for much of software development, but the idea of software being _really_ easy to develop and deploy seems to be coming to fruition. The best part is that it's coming from the geeks, not the guys who want LabVIEW, so I think it's actually going to, y'know, work well for geeks. Here's to Saasaas!

    December 31, 1999 | Unregistered Commenter caballosweb


    Requests flow into Nginx used as a HTTP Reverse Proxy. Nginx routes requests into a Varnish based HTTP cache.

    Why is Nginx generally used as the proxy when Varnish is advertised as being capable of doing both? Is there a compelling reason for this?

    December 31, 1999 | Unregistered Commenter Shoan
