Entries in cloud (63)

Thursday, Sep 17, 2009

Infinispan narrows the gap between open source and commercial data caches 

Recently I attended a lecture presented by Manik Surtani, JBoss Cache & Infinispan project lead. The goal of the talk was to provide a technical overview of both products and outline Infinispan's road-map. Infinispan is the successor to the open-source JBoss Cache. JBoss Cache was originally targeted at simple web page caching, and Infinispan builds on this to take it into the Cloud paradigm.

Why did I attend? Well, over the past few years I have worked on projects that have used commercial distributed caching (aka data grid) technologies such as GemFire, GigaSpaces XAP or Oracle Coherence. These projects required more functionality than is currently provided by open-source solutions such as memcached or EHCache. Looking at the road-map for Infinispan, I was struck by its ambition. Will it provide the functionality that I need?

Read more at: http://bigdatamatters.com/bigdatamatters/2009/09/infinispan-vs-gigaspaces.html

Friday, Sep 11, 2009

The interactive cloud

How many times have you been called in the middle of the night by your operations team telling you that your application is throwing some odd red alerts? How many times have you found out that, when those issues happen, you don't have enough information to analyze the incident? Have you tried increasing the log level, only to find that your problem became even worse? Now your application spews tons of information on a continuous basis, most of which is complete garbage...

The current separation between the way we implement our application and the way we manage it leads to many of these ridiculous situations. Cloud makes those things even worse.

In this post I suggest an alternative approach. Why don't we run our application the way we run our business? I refer to this approach as the "interactive cloud", where our application behaves just like our project team and operations acts just like our managers. As in our business, our application would need to take more responsibility for the way it runs and take corrective actions such as balancing its own resources and re-assigning tasks to the available resources in case of failure. It would need to involve its manager only when it runs out of resources, and it would need to provide reports in a way that makes sense to our managers.

The first part of this post describes the general concept behind this model, and the second part provides technical background, including code snippets based on our experience at GigaSpaces.
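The core of the interactive cloud idea, an application that re-assigns work on its own when a resource fails and escalates to its "manager" only when it runs out of capacity, can be sketched in a few lines. This is a minimal Python sketch under our own assumptions: `Worker`, `SelfManagingPool` and the `escalate` callback are hypothetical names, not GigaSpaces APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Worker:
    name: str
    capacity: int                        # max tasks this worker can hold
    tasks: List[str] = field(default_factory=list)
    alive: bool = True

    def free_slots(self) -> int:
        return self.capacity - len(self.tasks)


class SelfManagingPool:
    """Hypothetical pool that recovers from failures on its own and involves
    its 'manager' (the escalate callback) only when it runs out of capacity."""

    def __init__(self, workers: List[Worker], escalate: Callable[[str], None]):
        self.workers = workers
        self.escalate = escalate

    def assign(self, task: str) -> bool:
        # Balance load: pick the least-loaded live worker that still has room.
        candidates = [w for w in self.workers if w.alive and w.free_slots() > 0]
        if not candidates:
            self.escalate(f"no capacity left for task {task}")
            return False
        min(candidates, key=lambda w: len(w.tasks)).tasks.append(task)
        return True

    def report_failure(self, name: str) -> List[str]:
        """Mark a worker as failed and re-assign its tasks to the survivors.
        Returns the tasks that could not be placed anywhere."""
        failed = next(w for w in self.workers if w.name == name)
        failed.alive = False
        orphans, failed.tasks = failed.tasks, []
        unplaced = []
        for task in orphans:
            if not self.assign(task):
                unplaced.append(task)
        return unplaced


# Example: two workers, one fails, the survivor absorbs what it can.
alerts: List[str] = []
pool = SelfManagingPool([Worker("a", 2), Worker("b", 2)], alerts.append)
for t in ["t1", "t2", "t3"]:
    pool.assign(t)
unplaced = pool.report_failure("a")
```

Here the manager hears about exactly one thing, the task that no surviving worker had room for; everything else was handled inside the pool.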

Monday, Aug 31, 2009

Scaling MySQL on Amazon Web Services

I've recently started working with a large company that is looking to take one of their heavily utilized applications and move it to Amazon Web Services. I'm not looking to start a debate on the merits of EC2; the decision to move to AWS has already been made (and is a much better decision than paying a vendor millions to host it).

I've done my research and I'm comfortable with creating this environment, with one exception: scaling MySQL. I haven't done much work with MySQL; I'm more of an Oracle guy up to now. I'm struggling to find a way to scale MySQL on the fly so that replication works, the new server takes its proper place in line for master candidacy, and the Apache servers become aware of it.

So this is really three questions:

1. What are some proven methods of load balancing the read traffic going from Apache to MySQL?
2. How do I let the load balancing mechanism know when I add or remove a MySQL server?
3. How do I alert the master of the new server and initiate replication in an automated environment?
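For the first two questions, one common application-side pattern is a round-robin pool of read replicas that an automation script updates as servers come and go. The sketch below assumes that approach; `ReplicaPool` and the host names are hypothetical, and deciding when a new slave has actually caught up on replication (question 3) is left out.

```python
import itertools
import threading


class ReplicaPool:
    """Hypothetical round-robin pool of MySQL read replicas whose membership
    can change at runtime (e.g. when a scaling script adds or removes a slave)."""

    def __init__(self, hosts=()):
        self._lock = threading.Lock()      # web threads and the scaler share this
        self._hosts = list(hosts)
        self._counter = itertools.count()

    def add(self, host: str) -> None:
        with self._lock:
            if host not in self._hosts:
                self._hosts.append(host)

    def remove(self, host: str) -> None:
        with self._lock:
            if host in self._hosts:
                self._hosts.remove(host)

    def next_host(self) -> str:
        """Return the replica that should serve the next read query."""
        with self._lock:
            if not self._hosts:
                raise RuntimeError("no read replicas available")
            return self._hosts[next(self._counter) % len(self._hosts)]


# Example: two replicas, then a scaling script adds a third and later removes it.
pool = ReplicaPool(["db-read-1", "db-read-2"])
first_four = [pool.next_host() for _ in range(4)]
pool.add("db-read-3")
next_three = [pool.next_host() for _ in range(3)]
pool.remove("db-read-3")
after_remove = {pool.next_host() for _ in range(4)}
```

The same register/deregister hooks could instead update a TCP load balancer's configuration; keeping the pool in the application just avoids an extra network hop.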

Personally, I don't like the idea of scaling the databases, but the traffic increases exponentially for three hours a day and then plummets to almost nothing, so this would provide a significant cost savings.

The only way to manage this sort of scaling that I've read about is on slides 18-25 of this presentation:
http://assets.en.oreilly.com/1/event/21/Tricks%20and%20Tradeoffs%20of%20Deploying%20MySQL%20Clusters%20in%20the%20Cloud%20Presentation
Has anyone tried this method and either had success or have scripts available to do this? I try not to reinvent the wheel when I don't have to. Thanks in advance.

Saturday, Aug 8, 2009

One database vs. many, and cloud hosting vs. dedicated server(s)?

My partner and I are making a blueprint for an online webshop service. The purpose of this project is to make webshops available to small companies and individuals automatically, just by creating an account with us. Our web app can be used to add products, pages and so on to the store, and we'll handle secure checkout through PayPal.

Our app should be scalable and manageable. Because we also want to offer free webshops, the number of webshops could exceed 10,000 within a few years. We are building on the Zend Framework and are using MySQL for the database.

From the start we want to build our application for optimal and easy scalability, to avoid a lot of changes to our app and database in the future.

Now our questions are:

Should we use:
* one database for all shops (or limited to X shops );
* one database for each new shop (each having products, orders... tables);

I think both approaches have pros and cons. What do you think? Does anyone have experience with this kind of structure?

PRO:
one database: easier to make changes to the database layout
multiple databases: more scalable, easier to back up/restore

CON:
one database: harder to code because of extra key fields in tables; slower or more difficult to back up/restore
multiple databases: takes longer to push changes to the database layout
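The "extra key fields" drawback of the single-database option looks roughly like this in practice: every tenant-owned table carries a shop key, and every query must filter on it. A minimal sketch using Python's built-in `sqlite3` so it runs self-contained (the posters use MySQL); the table and column names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE shops (
    shop_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    shop_id     INTEGER NOT NULL REFERENCES shops(shop_id),  -- the extra key field
    title       TEXT NOT NULL,
    price_cents INTEGER NOT NULL
);
-- Nearly every query filters on shop_id, so index it.
CREATE INDEX idx_products_shop ON products(shop_id);
"""


def add_product(conn, shop_id, title, price_cents):
    conn.execute(
        "INSERT INTO products (shop_id, title, price_cents) VALUES (?, ?, ?)",
        (shop_id, title, price_cents),
    )


def products_for_shop(conn, shop_id):
    # Forgetting this WHERE clause would leak one shop's data into another's.
    return conn.execute(
        "SELECT title, price_cents FROM products WHERE shop_id = ? ORDER BY title",
        (shop_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO shops (shop_id, name) VALUES (?, ?)",
                 [(1, "Alice's Mugs"), (2, "Bob's Books")])
add_product(conn, 1, "Blue mug", 850)
add_product(conn, 2, "Paperback", 1200)
add_product(conn, 1, "Red mug", 900)
```

With one database per shop the `shop_id` column and its filter disappear, but each schema change then has to be applied to every shop's database, which is the trade-off listed above.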

We are not at all clear on what hosting we should use. Would it be a solution to use a cloud service such as Mosso, Amazon or GoGrid? Or is it better to just start with one dedicated server and expand 'manually' later?

Thanks in advance for your help!

Friday, Aug 7, 2009

The Canonical Cloud Architecture 

Update 2: Elastic Load Balancer and EC2 instance bandwidth. It turns out we are limited by bandwidth and not by CPU. Solution: use DNS Round Robin for two to three HighCPU medium instances.
Update: The Skinny Straw: Cloud Computing's Bottleneck and How to Address It. For cloud computing, bandwidth to and from the cloud provider is a bottleneck. Solution: Evaluate application architecture and consider application partitioning.

I'm writing this post as a sort of penance. My sin was getting involved in another multi-threaded mess of a program that was rife with strange pauses and unexpected errors. I really should have known better. But when APIs choose to make callbacks from some mystery thread pool, it's hard to keep things straight. I eventually sobered up and posted all events to a queue so I could make sure the program would work correctly. Doh. I may never know why the .NET console output stopped working, but I'll live with it.
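The fix described above, having every callback do nothing but post its event to a queue and handling all events on a single thread, can be sketched in Python with `queue.Queue`. The thread pool here only simulates an API that calls back from threads you don't control; the handler itself is a placeholder.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

events = queue.Queue()
handled = []          # only ever touched by the consumer thread
_STOP = object()      # sentinel telling the consumer to exit


def callback(event):
    """Called from an arbitrary library thread: just enqueue and return."""
    events.put(event)


def consumer():
    # One thread drains the queue, so event handlers never run concurrently
    # and the state they touch needs no locks.
    while True:
        event = events.get()
        if event is _STOP:
            break
        handled.append(event)


worker = threading.Thread(target=consumer)
worker.start()

# Simulate an API firing callbacks from its own "mystery" thread pool.
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(100):
        pool.submit(callback, i)

# All callbacks have finished once the with-block exits, so STOP is queued last.
events.put(_STOP)
worker.join()
```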

And that reminded me that I've been meaning to write a post on the standard Cloud Architecture. I've tried to hit all the common architectures at one time or another, but there have been some excellent sources lately on structuring programs in a cloud. People may "know" these ideas in the same way I knew what not to do, but when the code hits the editor those thoughts may hide like a kid next to a broken cookie jar.

The easiest way to create a scalable service is to compose the service from other scalable services. This is how Google AppEngine works and is largely how AWS works as well (EC2, S3, SQS, SimpleDB, etc), though AWS also functions as a blank canvas on which you can draw your own designs.

The canonical cloud architecture that has evolved revolves around dynamically scalable CPUs consuming asynchronous, persistently queued events. We talked about this idea already in Flickr - Do the Essential Work Up-front and Queue the Rest. The cloud is just another way of implementing the same idea.

Amazon suggests a few applications of the Cloud Architecture as:

  • Processing Pipelines
    - Document processing pipelines – convert hundreds of thousands of documents from Microsoft Word to PDF, OCR millions of pages/images into raw searchable text
    - Image processing pipelines – create thumbnails or low resolution variants of an image, resize millions of images
    - Video transcoding pipelines – transcode AVI to MPEG movies
    - Indexing – create an index of web crawl data
    - Data mining – perform search over millions of records
  • Batch Processing Systems
    - Back-office applications (in financial, insurance or retail sectors)
    - Log analysis – analyze and generate daily/weekly reports
    - Nightly builds – perform nightly automated builds of source code repository every night in parallel
    - Automated Unit Testing and Deployment Testing – Test and deploy and perform automated unit testing (functional, load, quality) on different deployment configurations every night
  • Websites
    - Websites that “sleep” at night and auto-scale during the day
    - Instant Websites – websites for conferences or events (Super Bowl, sports tournaments)
    - Promotion websites
    - “Seasonal Websites” - websites that only run during the tax season or the holiday season (“Black Friday” or Christmas)

    A good list, but having worked on a seasonal website for taxes, I'd say AWS is a horrible match for that case. AWS only works at the instance level, so you need a whole instance turned on all the time even when there's no demand. This is a complete waste of money. An AWS model truly based on usage, combined with an SLA-driven dashboard, would be very convenient. But on to cases where AWS is a good fit.

    SmugMug's Cloud Architecture

    AWS pioneer Don MacAskill of SmugMug details how they process high-resolution photos and high-definition video using a cloud-hosted queuing architecture in SkyNet Lives! (aka EC2 @ SmugMug).

    SkyNet, as you might expect, operates completely without human minders and automatically scales up and down in relation to the work load. Their system has several components:
  • Work Initiators - Work comes in from your website and/or other software subsystems and is queued up for processing in the Queuing Service. Work doesn't have to be large requests either; it can be small independent parts of an overall pipeline. Don't keep state in the Workers. Bundle what you need done into a work request and shoot it back into the Queuing Service for processing.
  • Provisioning Service - This is Amazon's infrastructure that allows instances to be automatically scaled up and down in relation to the work load. This will be the major difference from your VPS or typical datacenter setup. There's an API for starting and stopping AMIs and mechanisms for automatically configuring and running VMs.
  • Workers - These are the guys that continually pull work off queues and do something interesting with it. For SmugMug the results are stored on S3 but the results could be put in your own database, SimpleDB or whatever.
  • Queuing Service - This is where work is queued for consumption by the workers. SmugMug built their own queuing service, but you could just as easily use Amazon's own SQS. Creating a scalable, distributed, performant, highly available queue service is not easy, so you may want to take a look at a number of different queue product suggestions in Flickr - Do the Essential Work Up-front and Queue the Rest.
  • Controller - This component monitors many variables related to the work flow and decides how many EC2 instances are necessary, based on optimizing a small set of goals. Instances are added and removed as needed.
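SmugMug's controller balances a small set of goals that aren't spelled out here. As a much simpler illustration, here is a hypothetical sizing rule in Python: run enough instances to drain the current queue backlog within a target time, clamped between a floor and a ceiling. The per-instance throughput, targets and limits are made-up parameters; a real controller would also weigh instance start-up time and how instances are billed.

```python
import math


def desired_instances(queue_depth, jobs_per_instance_per_min, target_minutes,
                      min_instances=1, max_instances=20):
    """Hypothetical sizing rule: enough instances to drain the current backlog
    within target_minutes, clamped to [min_instances, max_instances]."""
    if jobs_per_instance_per_min <= 0 or target_minutes <= 0:
        raise ValueError("throughput and target must be positive")
    needed = math.ceil(queue_depth / (jobs_per_instance_per_min * target_minutes))
    return max(min_instances, min(max_instances, needed))


def scaling_action(current, desired):
    """Translate the target into a start/stop/hold decision."""
    if desired > current:
        return ("start", desired - current)
    if desired < current:
        return ("stop", current - desired)
    return ("hold", 0)
```

For example, a backlog of 120 jobs with 10 jobs per instance per minute and a 5-minute target gives `desired_instances(120, 10, 5) == 3`; an empty queue still keeps the floor of one instance running.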

    Don shares a lot of practical detail on how to efficiently use AWS, how their queue service works, and how their controller manages to balance minimizing cost while still being responsive to users. Achieving fairness and balance in a queue system can be difficult, but SmugMug appears to have done a good job of that.

    What rocks about queuing architectures is that they are just so damn robust. Work is safe in the queues. A random reboot won't cause a loss. If one component is producing events too fast, the queue will buffer up events until they can be processed. New components can be cleanly added and removed from the system at any time. Timing isn't critical. Work is processed when someone gets around to it. Timeouts and retries are unnecessary. Programs are simple loops that block on the queue, do something, persist results, and feed more parallelizable work requests back into the queue. Very hard to screw up. Compare and contrast that with a complex multi-threaded system with shared state.
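The worker loop just described (block on the queue, do the work, persist the result, push follow-up work back onto the queue) looks like this as a minimal Python sketch. `queue.Queue` is an in-process stand-in: unlike SQS or SmugMug's own service it is not persistent, so it does not provide the "work is safe across a reboot" property. The document/line task types are hypothetical.

```python
import queue
import threading

work = queue.Queue()
results = {}                   # stands in for S3, SimpleDB, or a database
results_lock = threading.Lock()


def process(item):
    """Hypothetical work: a 'document' request is split into one request per
    line; a 'line' request counts its words and persists the count."""
    kind, key, payload = item
    if kind == "document":
        # Feed smaller, independent work requests back into the queue.
        return [("line", f"{key}:{i}", line)
                for i, line in enumerate(payload.splitlines())]
    with results_lock:
        results[key] = len(payload.split())      # persist the result
    return []


def worker():
    while True:
        item = work.get()                        # block until work arrives
        try:
            if item is None:                     # shutdown signal
                return
            for follow_up in process(item):
                work.put(follow_up)
        finally:
            work.task_done()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

work.put(("document", "doc1", "hello world\nqueues are robust"))
work.put(("document", "doc2", "one"))
work.join()      # returns once every request, including follow-ups, is done
for _ in threads:
    work.put(None)
for t in threads:
    t.join()
```

Because follow-up requests are queued before their parent is marked done, `work.join()` cannot return while any part of the pipeline is still outstanding.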

    Building GrepTheWeb in the Cloud

    Amazon has published a great couple of articles on building a canonical Cloud Architecture: Building GrepTheWeb in the Cloud, Part 1: Cloud Architectures and Building GrepTheWeb in the Cloud, Part 2: Best Practices.

    These are really tight and well written articles so I'll just hit certain high points. The example used is an application called GrepTheWeb. GrepTheWeb searches using a regular expression across millions of web documents. So it's a grep for the web, ah got it now. The idea is to take an unpredictable but possibly large number of search requests, apply the search expression to hundreds of terabytes of documents, and return the results in a reasonable period of time.

    How exactly would you do such a thing? Here's how you do it in the cloud:
  • Amazon S3 for retrieving input datasets and for storing the output dataset
  • Amazon SQS for durably buffering requests acting as a “glue” between controllers
  • Amazon SimpleDB for storing intermediate status, log, and for user data about tasks
  • Amazon EC2 for running a large distributed processing Hadoop cluster on-demand
  • Hadoop for distributed processing, automatic parallelization, and job scheduling

    Clearly these are all (except for Hadoop) built on Amazon services, but the general ideas apply anywhere. For storing large amounts of data and accessing it efficiently in parallel you need a distributed file system like S3. To coordinate and dispatch work you need a queuing service like SQS. For keeping intermediate state you need a scalable database store like SimpleDB, though you could also imagine using S3. For dynamically scaling processing nodes something like EC2 is necessary. And for actually carrying out the document search a framework like Hadoop provides a lot of features, though you can imagine using other compute grid products.
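A single-machine sketch of the GrepTheWeb idea: split the documents into chunks, apply the regular expression to each chunk in parallel (the map step), and merge the partial results (the reduce step). Python's `concurrent.futures` thread pool stands in for a Hadoop cluster here, and the document IDs and chunk size are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep_chunk(pattern, docs):
    """Map step: return (doc_id, line_no, line) for every matching line."""
    rx = re.compile(pattern)
    hits = []
    for doc_id, text in docs:
        for line_no, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((doc_id, line_no, line))
    return hits


def grep_the_web(pattern, documents, chunk_size=2, workers=4):
    """Split documents into chunks, search the chunks in parallel, then merge
    the partial results (the reduce step) into one sorted list."""
    items = sorted(documents.items())
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: grep_chunk(pattern, chunk), chunks))
    return sorted(hit for part in partials for hit in part)


documents = {
    "doc-a": "cloud computing\nqueues everywhere",
    "doc-b": "nothing to see",
    "doc-c": "SQS is a queue",
}
hits = grep_the_web(r"queue", documents)
```

In the real system the chunks would be S3 objects, the map and reduce steps would run as Hadoop tasks on EC2, and the request itself would arrive through SQS.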

    Here's their fabulous picture of what the system looks like:
    [Figure: GrepTheWeb system architecture]


    All the parts and linkages are described in the paper. What's important to note is that even though there are a lot of independently moving parts, all the boundaries are clear and well described. In your typical program few people will have any idea how it works. Using Cloud Architecture principles it's possible to create a system that both scales and is easy to understand and explain.

    The paper makes several key architectural recommendations:
  • Use Scalable Ingredients - Ensure that your application is scalable by designing each component to be scalable on its own. If every component implements a service interface, responsible for its own scalability in all appropriate dimensions, then the overall system will have a scalable base.
  • Have Loosely Coupled Systems - For better manageability and high-availability, make sure that your components are loosely coupled. The key is to build components without having tight dependencies between each other, so that if one component were to die (fail), sleep (not respond) or remain busy (slow to respond) for some reason, the other components in the system are built so as to continue to work as if no failure is happening.
  • Think Parallel - Implement parallelization for better use of the infrastructure and for performance. Distributing the tasks on multiple machines, multithreading your requests and effective aggregation of results obtained in parallel are some of the techniques that help exploit the infrastructure.
  • Use Designs that Are Resilient to Reboot and Re-Launch - After designing the basic functionality, ask the question “What if this fails?” Use techniques and approaches that will ensure resilience. If any component fails (and failures happen all the time), the system should automatically alert, failover, and re-sync back to the “last known state” as if nothing had failed.
  • Utilize On-Demand Requisition and Relinquishment - Don’t forget the cost factor. The key to building a cost-effective application is using on-demand resources in your design. It’s wasteful to pay for infrastructure that is sitting idle.
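One common way to make a worker resilient to reboot and re-launch is to checkpoint its "last known state" after each unit of work and resume from it on start-up. A minimal sketch, assuming a local JSON file as the checkpoint store (in the architecture above it would more likely live in SimpleDB or S3); the file layout and the `crash_at` parameter are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: write a temp file, then rename it over the old
    one, so a crash mid-write never leaves a half-written checkpoint.
    (No fsync here, so this guards against process crashes, not power loss.)"""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_index": 0, "total": 0}    # fresh start


def run(items, path, crash_at=None):
    """Sum items, resuming from the last checkpoint; crash_at simulates a reboot."""
    state = load_checkpoint(path)
    for i in range(state["next_index"], len(items)):
        if i == crash_at:
            raise RuntimeError("simulated reboot")
        state["total"] += items[i]
        state["next_index"] = i + 1
        save_checkpoint(path, state)
    return state["total"]


# Example: crash part-way, then "re-launch" and resume from the checkpoint.
workdir = tempfile.mkdtemp()
ckpt = os.path.join(workdir, "checkpoint.json")
numbers = [1, 2, 3, 4, 5]
try:
    run(numbers, ckpt, crash_at=3)
except RuntimeError:
    pass
after_crash = load_checkpoint(ckpt)   # {"next_index": 3, "total": 6}
final_total = run(numbers, ckpt)      # resumes at index 3 and returns 15
```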

    All good stuff, which is why I like this paper so much. There's a big conceptual shift here, especially if you are used to relatively simple client-server and N-tier systems. It's like simulating in your mind how to keep an army of ants all working independently while still communicating, coordinating, and making progress on a goal. We implemented similar architectures in datacenters long before the cloud; it was just a lot harder, as everything was roll-your-own. The cloud makes all the necessary components standard, featureful, and relatively inexpensive. This opens any application to completely different ways of structuring its backend than in the past.

    Related Articles

  • SkyNet Lives! (aka EC2 @ SmugMug).
  • Flickr - Do the Essential Work Up-front and Queue the Rest
  • Hadoop
  • GridGain: One Compute Grid, Many Data Grids
  • Building GrepTheWeb in the Cloud, Part 1: Cloud Architectures
  • Building GrepTheWeb in the Cloud, Part 2: Best Practices.

    Tuesday, Jun 30, 2009

    Hot New Trend: Linking Clouds Through Cheap IP VPNs Instead of Private Lines 

    You might think major Internet companies have a latency, availability, and bandwidth advantage because they can afford expensive dedicated point-to-point private line networks between their data centers. And you would be right. It's a great advantage. Or at least it was a great advantage. Cost is the great equalizer and companies are now scrambling for ways to cut costs. Many of the most recognizable Internet companies are moving to IP VPNs (Virtual Private Networks) as a much cheaper alternative to private lines. This is a strategy you can effectively use too.

    This trend has historical precedent in the data center. In the same way leading edge companies moved early to virtualize their data centers, leading edge companies are now virtualizing their networks using IP VPNs to build inexpensive private networks over a shared public network. In kindergarten we learned sharing was polite; it turns out sharing can also save a lot of money in both the data center and on the network.

    The line of reasoning for adopting IP VPNs goes something like this:

  • Major companies are saving 1/4 to 1/2 of their networking costs by moving from private lines to IP VPNs. This does not even include the benefits of lower equipment costs (GigE ports are basically free) and more flexible provisioning (any-to-any connectivity, easy bandwidth dialup).
  • Cheaper comes with a cost. Private lines are reliable. The Internet is inherently unreliable, especially when two endpoints are linked by potentially dozens of routers in between. In particular Internet connections suffer from: 1) dropped packets 2) out of order packets. Statistically this may happen for only 1% of packets, but when it does the user experience plummets. To get a feel for the impact imagine you have a 200ms latency link to Europe and you're trying to do something interactive. Lose a packet and you'll have to wait for a retransmission which will take at least 1 second. So IP VPNs can provide an order of magnitude more bandwidth for less money, but they often have less actual throughput and reliability.
  • Since latency and quality are so important to Internet companies, how can they possibly afford to use IP VPNs? They cheat. They fix the IP connection by using WAN accelerators.
  • WAN accelerators are typically thought to be mostly about caching, but they can also trick TCP into giving a better connection even over unreliable networks. It's like wearing corrective lenses for your network. And that's what you need when dropping dedicated lines for Internet connections.
  • Relatively inexpensive WAN accelerators can turn somewhat unreliable Internet connections into a very reliable cost effective connection option. Your customers won't believe it's not butter.
  • The result: lots of money saved and a quality customer experience.

    We take TCP for granted so to learn it has this unsightly packet loss/delay problem is a bit unsettling. But here's the impact packet loss has on throughput:
  • Latency: 100ms, Loss: 1%, Throughput: 1.2 Mbps
  • Latency: 200ms, Loss: 1%, Throughput: 0.6 Mbps
  • Latency: 100ms, Loss: 0.5%, Throughput: 1.7 Mbps

    These numbers are independent of your WAN link capacity. You could have a 100Mbps link with 1% loss and 100ms latency and you're limited to about 1Mbps!
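The three figures above match the macroscopic TCP model of Mathis et al. (1997), throughput ≈ (MSS / RTT) · (C / √p), if one assumes a 1460-byte MSS and takes the constant C as 1 (the paper's constant is about 1.22 for its simplest loss model). Whether the figures were originally derived this way is an assumption; the sketch simply reproduces them and shows why link capacity never enters the picture.

```python
import math

MSS_BYTES = 1460   # assumed maximum segment size (typical for Ethernet paths)


def mathis_throughput_mbps(rtt_ms, loss_pct, mss_bytes=MSS_BYTES, c=1.0):
    """Approximate TCP throughput ceiling: (MSS / RTT) * (C / sqrt(p)).
    The link capacity does not appear anywhere in the formula."""
    rtt_s = rtt_ms / 1000.0
    p = loss_pct / 100.0
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(p)) / 1e6


for rtt, loss in [(100, 1.0), (200, 1.0), (100, 0.5)]:
    print(f"Latency {rtt}ms, loss {loss}%: "
          f"{mathis_throughput_mbps(rtt, loss):.1f} Mbps")
```

This prints 1.2, 0.6 and 1.7 Mbps. Doubling the RTT halves throughput, while halving the loss rate only improves it by a factor of √2.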

    The reason we have this bandwidth-robbing state of affairs is that when TCP was designed, packet loss meant network congestion. The way to deal with congestion is to slow down sending data in order to avoid congestion. This drops throughput drastically for a very long time. Over long-distance WAN connections packets can be delayed, which looks like packet loss and causes congestion avoidance measures to kick in. Or maybe only a single packet was dropped, and that alone kicks in congestion avoidance.

    The trick is convincing TCP that everything is cool so the full connection bandwidth can be used. WAN accelerators have a number of complex features to keep TCP happy. Damon Ennis, VP Product Management and Customer Support for Silver Peak, a WAN accelerator company, talks about why clouds, IP VPNs, and WAN accelerators are a perfect match:
    Moving applications into the cloud offers substantial cost savings for enterprises. Unfortunately those savings come at the cost of application performance. Often performance is so hampered that users’ productivity is severely limited. In extreme cases, users refuse to utilize the cloud-based application altogether and resort to old habits like saving files locally without centralized backup or returning to their old “thick” applications.

    The cloud limits performance because the applications must be accessed over the WAN. WANs are different from LANs in three ways – WAN bandwidth is a fraction of LAN bandwidth, WAN latency is orders of magnitude higher than LAN latency, and packet loss exists on the WAN where none existed on the LAN. Most IT professionals are familiar with the impacts of bandwidth on transfer times – a 100MB file takes approximately 1 second to transfer on a Gbps LAN and approximately 10 seconds to transfer on a 100Mbps LAN. They then extrapolate this thinking to the WAN and assume that it will take 10 seconds to transfer the same file on a 100Mbps WAN. Unfortunately, this isn’t the case. Introduce 100ms of latency and this transfer now takes almost 3 minutes. Introduce just 1% packet loss and this transfer now takes over 10 minutes.

    There’s a calculator available that will let you figure out the effective throughput of your own WAN if you know its bandwidth, latency, and loss. Once you know your effective throughput simply divide 800Mb (100MB) by your effective throughput to determine how long it would take to transfer the same example file over your WAN.

    Latency and loss don’t just impact file transfer times, they also have a dramatic impact on any applications that need to be accessed in real-time over the WAN. In this context a real-time application is one that requires real-time response to users’ keystrokes – think of any application that is served over a thin-client infrastructure or Virtual Desktop Infrastructure (VDI). Not only is the server 100 ms away but any lost packet will result in delays of up to half a second waiting for the loss to be detected and the retransmission to occur. This is the root cause of the frustrated user banging on the enter key looking for a response.
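The arithmetic in the quoted example can be checked with the "divide 800Mb by your effective throughput" rule. The LAN cases come out at 0.8 and 8 seconds. For the 100ms case the sketch assumes a single TCP connection limited by a classic 64KB receive window (no window scaling), which is an assumption about the setup; that gives roughly 153 seconds, about 2.5 minutes, in the same ballpark as the "almost 3 minutes" quoted.

```python
def window_limited_bps(window_bytes, rtt_s):
    """One TCP connection can have at most one receive window in flight per RTT."""
    return window_bytes * 8 / rtt_s


def transfer_seconds(size_megabits, throughput_bps):
    """Time to move a file: size divided by effective throughput."""
    return size_megabits * 1e6 / throughput_bps


FILE_MEGABITS = 800                                  # the 100MB example file

lan_1g = transfer_seconds(FILE_MEGABITS, 1e9)        # about 0.8 s
lan_100m = transfer_seconds(FILE_MEGABITS, 100e6)    # about 8 s

# 100Mbps WAN with 100ms RTT and an assumed 64KB window (no window scaling):
effective = min(100e6, window_limited_bps(65535, 0.100))
wan_100ms = transfer_seconds(FILE_MEGABITS, effective)   # about 153 s
```

The effective throughput is the smallest of the limits in play: link capacity, window/RTT, and (once loss is introduced) the loss-limited rate.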

    This all seems like a lot of effort, doesn't it? Why not just dump TCP and move to a better protocol? Sounds good, but everything works on TCP, so changing now would be monumental. And as strange as it seems, TCP is doing its job. It's a protocol, so there's a separation of what's above it from what's below it, which allows innovation at the TCP level without breaking anything. And that's what layering is all about.

    The upshot is with a little planning you can take advantage of much cheaper IP VPN costs, improve latency, and maximize bandwidth usage. Just like the big guys.

    Related Articles

  • Cloud Computing Requires Infrastructure 2.0 by Gregory Ness
  • Myth of Bandwidth and Application Performance by Ameet Dhillon
  • How Does WAN Optimization Work? by Paul Rubens
  • SilverPeak Technology Overview

    Monday, Jun 29, 2009

    Google App Engine plus Amazon AWS: Best of both worlds

    Google App Engine (GAE) is focused on making development easy, but limits your options. Amazon Web Services is focused on making development flexible, but complicates the development process. Real enterprise applications require both of these paradigms to achieve success… What we really want is the flexibility of AWS and the simplicity of GAE.

    For the rest of the post see http://natishalom.typepad.com/nati_shaloms_blog/2009/06/google-app-engine-plus-amazon-aws-best-of-both-worlds.html

    Sunday, Jun 14, 2009

    CLOUD & GRID EVENT BY THE ONLINE GAMING HIGH SCALABILITY SIG

    The first meeting of this Online Gaming High Scalability SIG will be on the 9th of July 2009 in central London, starting at 10 AM and finishing around 5 PM.

    The main topic of this meeting will be potentials for using cloud and grid technologies in online gaming systems. In addition to experience reports from the community, we have invited some of the leading cloud experts in the UK to discuss the benefits such as resource elasticity and challenges such as storage and security that companies from other industries have experienced. We will have a track for IT managers focused on business opportunities and issues and a track for architects and developers more focused on implementation issues.

    The event is free, but up-front registration is required for capacity planning, so if you are planning to attend, please let us know in advance by completing the registration form on this page.

    To propose a talk or for programme enquiries, contact meetings [at] gamingscalability [dot] org.

    Note: The event is planned to finish around 5 PM so that people can make their way to Victoria in time for CloudCamp London. CloudCamp, a meeting of the cloud computing community with short talks, is also free, but you will have to register for it separately.

    PROGRAMME: http://skillsmatter.com/event/cloud-grid/online-gaming-high-scalability-sig/wd-99

    Friday, May 29, 2009

    Is Eucalyptus ready to be your private cloud?


    Update: Eucalyptus Goes Commercial with $5.5M Funding Round. This removes my objection that it's only an academic project. Go team go!

    Rich Wolski, professor of Computer Science at the University of California, Santa Barbara, gave a spirited talk on Eucalyptus to a large group of very interested cloudsters at the Eucalyptus Cloud Meetup. If Rich could teach computer science at every school, the state of the computer science industry would be stratospheric. Rich is dynamic, smart, passionate, and visionary. It's that vision that prompted him to create Eucalyptus in the first place. Rich and his group are experts in grid and distributed computing, having a long and glorious history in that space. When he saw cloud computing on the rise, he decided the best way to explore it was to implement what everyone accepted as a real cloud, Amazon's API. In a remarkably short time they implemented Eucalyptus and have been improving it and tracking Amazon's changes ever since.

    The question I had going into the meetup was: should Eucalyptus be used to make an organization's private cloud? The short answer is no. Wait wait, it's now yes, see the update at the beginning of the article.

    The project is of high quality, the people are of the highest quality, but in the end Eucalyptus is a research project from a university. As an academic project Eucalyptus is subject to changes in funding and the research interests of the team. When funding sources dry up so does the project. If the team finds another research area more interesting, or if they get tired of chasing a continuous stream of new Amazon features, or no new grad students sign on, which will happen in a few years, then the project goes dark.

    Fears over continuity have at least two solutions: community support and commercial support. Eucalyptus could become a community-supported open source project. This is unlikely to happen, though, as it conflicts with the research intent of Eucalyptus. The Eucalyptus team plans to control the core for research purposes and encourage external development of add-on services like SQS. Eucalyptus won't go commercial, as university projects must stay clear of commercial pretensions. Amazon is "no comment" on Eucalyptus, so it's not clear what they would think of commercial development should it occur.

    Taken together, these concerns imply Eucalyptus is not a good base for an enterprise quality private cloud, which they readily admit. It's not enterprise ready, Rich repeats. It's not that the quality isn't there. It is and will be. And some will certainly base their private cloud on Eucalyptus, but when making a decision of this type you have to be sure your cloud infrastructure will be around for the long haul. With Eucalyptus that is not necessarily the case. Eucalyptus is still a good choice for its original research purpose, as a cheap staging platform for Amazon, or as a base for temporary clouds, but as your rock-solid private cloud infrastructure of the future, Eucalyptus isn't the answer.

    The long answer is a little more nuanced and interesting.

    The primary purpose for Eucalyptus is research. It was never meant to be our little untethered private Amazon cloud. But if it works, why not?

    Eucalyptus is Not a Full Implementation of the Amazon Stack

    Eucalyptus implements most of EC2 and a little of S3. They hope to get community support for the rest. That of course makes Eucalyptus far less interesting as a development platform. But if your use for Eucalyptus is as an instant provisioning framework you are still in the game. Their emulation of EC2 is so good RightScale was able to operate on top of Eucalyptus. Impressive.

    But even in the EC2 arena I have to wonder for how long they'll track Amazon development. If you are a researcher implementing every new Amazon feature is going to get mighty old after a while. It will be time to move on and if you are dependent on Eucalyptus you are in trouble. Sure, you can move to Amazon but what about that $1 million data center buildout?

    If you are developing software not tied to the Amazon service stack, then Eucalyptus would work great.

    As an Amazon developer I would want my code to work without too much trouble in both environments. Certainly you can mock the different services for testing or create a service layer to hide different implementations, but that's not ideal and makes Eucalyptus as an Amazon proxy less attractive.
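The "service layer to hide different implementations" option looks something like this minimal Python sketch: application code depends on a small storage interface, with one in-memory implementation for local testing and one that wraps an S3 client. `BlobStore`, `ThumbnailService` and the key layout are hypothetical. The S3 class uses boto3's real `put_object`/`get_object` calls on a client the caller passes in, but boto3 is a modern library used here only for illustration.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Local stand-in for tests, or for a private cloud without S3."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class S3BlobStore:
    """Amazon S3 via a boto3 S3 client supplied by the caller."""

    def __init__(self, client, bucket: str) -> None:
        self.client, self.bucket = client, bucket

    def put(self, key: str, data: bytes) -> None:
        self.client.put_object(Bucket=self.bucket, Key=key, Body=data)

    def get(self, key: str) -> bytes:
        return self.client.get_object(Bucket=self.bucket, Key=key)["Body"].read()


class ThumbnailService:
    """Application code sees only the BlobStore interface."""

    def __init__(self, store: BlobStore) -> None:
        self.store = store

    def save_thumbnail(self, photo_id: int, data: bytes) -> str:
        key = f"thumbs/{photo_id}"
        self.store.put(key, data)
        return key


# Example: run against the in-memory store; swap in S3BlobStore in production.
svc = ThumbnailService(InMemoryBlobStore())
key = svc.save_thumbnail(42, b"thumbnail-bytes")
```

The cost of this approach is the one the paragraph names: every service you use needs an interface and at least two implementations to keep in sync.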

    One of the uses for Eucalyptus is to make Amazon cheaper and easier by testing code locally without having to deploy into Amazon all the time. Given the size of images, the bandwidth and storage costs add up after a while, so this could make Eucalyptus a valuable part of the development process.

    Eucalyptus is Not as Scalable as Amazon

    No kidding. Amazon has an army of sysadmins, network engineers, and programmers to make their system work at such ginormous scales. Eucalyptus was built on smarts, grit and pizza. It will never scale as well as Amazon, but Eucalyptus is scalable to 256 nodes right now. Which is not bad.

    Rich thinks that with some work they already know how to do, it could scale to 5000 nodes. Not exactly Amazon scale, but good enough for many data center dreams.

    One big limit Eucalyptus has is the self-imposed requirement to work well in any environment. It's just a tarball you can install on top of any network. They rightly felt this was necessary for adoption. Telling potential customers that they need to set up a special network before they can test the software tends to slow down adoption. By making Eucalyptus work as an overlay they soothed a lot of early adopter pain.

    But by giving up control of the machines, the OS, the disk, and the network they limited how scalable they can be. There's more to scalability than just software. Amazon has total control and that gives them power. Eucalyptus plans to make more invasive and more scalable options available in the future.

    Lacks Some Private Cloud Features

    Organizations interested in a private cloud are often interested in:

  • Control
  • Privacy and Security
  • Utility Chargeback System
  • Instant Provisioning Framework
  • Multi-tenancy
  • Temporary Infrastructure for Proof of Concept for "Real" Provisioning
  • Cloud Management Infrastructure

    Eucalyptus satisfies many of these needs, but a couple are left wanting:
  • The Utility Chargeback System allows companies to bill back departments for the resources they use and is a great way to get around a rigid provisioning process while still providing accountability back to the budgeting process. Eucalyptus won't do this for you.
  • A first class Cloud Management Infrastructure is not part of Eucalyptus because it's not part of Amazon's API. Amazon doesn't expose their internal management process. Eucalyptus is adding some higher level management tools, but they'll be pretty basic.
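    A chargeback system boils down to metering usage and pricing it per department. A minimal sketch, assuming usage is already recorded as (department, instance type, hours) tuples; the record format and the hourly rates are made-up assumptions, not anything Eucalyptus or Amazon provides.

```python
from collections import defaultdict

# Assumed internal hourly rates, for illustration only.
RATES_PER_HOUR = {"small": 0.10, "large": 0.40}


def chargeback(usage):
    """Sum instance-hours * hourly rate for each department."""
    bills = defaultdict(float)
    for dept, instance_type, hours in usage:
        bills[dept] += hours * RATES_PER_HOUR[instance_type]
    # Round to cents for the invoice.
    return {dept: round(total, 2) for dept, total in bills.items()}


usage = [
    ("marketing", "small", 100),  # 100 h * 0.10 = 10.00
    ("research", "large", 50),    # 50 h * 0.40 = 20.00
    ("research", "small", 30),    # 30 h * 0.10 = 3.00
]
print(chargeback(usage))  # {'marketing': 10.0, 'research': 23.0}
```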

    These features may or may not be important to you.

    Clouds vs Grids

    Endless pixels have been killed defining clouds, grids, and how they are different enough that there's really a whole new market to sell into. Rich actually makes a convincing argument that grids and clouds are different and do require a completely different infrastructure. The differences:

    Cloud

  • Full private cluster is provisioned
  • Individual user can only get a tiny fraction of the total resource pool
  • No support for cloud federation except through the client interface
  • Opaque with respect to resources

    Grid

  • Built so that individual users can get most, if not all of the resources in a single request
  • Middleware approach takes federation as a first principle
  • Resources are exposed, often as bare metal
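    The first difference, how much of the pool a single user can get, can be expressed as an allocation policy. This is a toy sketch; the 2% per-request cap for the cloud case is an arbitrary assumption chosen only to show the contrast, not a figure from Eucalyptus.

```python
CLOUD_MAX_PERCENT = 2  # assumed cap: one cloud request gets at most 2% of the pool


def grant(request_cores: int, free_cores: int, total_cores: int, model: str) -> int:
    """Return how many cores a single request receives under each model."""
    if model == "cloud":
        # Each user gets only a small slice, so many tenants can share the pool.
        cap = max(1, total_cores * CLOUD_MAX_PERCENT // 100)
        return min(request_cores, cap, free_cores)
    if model == "grid":
        # A single request may claim everything that is currently free.
        return min(request_cores, free_cores)
    raise ValueError(f"unknown model: {model}")


print(grant(500, free_cores=1000, total_cores=1000, model="cloud"))  # 20
print(grant(500, free_cores=1000, total_cores=1000, model="grid"))   # 500
```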

    Related Articles

  • Get Off of My Cloud by M. Jagger and K. Richards.
  • Rich Wolski's Home Page
  • Enomaly
  • Nimbus
    Friday
    Apr242009

    Heroku - Simultaneously Develop and Deploy Automatically Scalable Rails Applications in the Cloud

    Update 4: Heroku versus GAE & GAE/J

    Update 3: Heroku has gone live! Congratulations to the team. It's difficult right now to get a feeling for the relative cost and reliability of Heroku, but it's an impressive accomplishment and a viable option for people looking for a delivery platform.

    Update 2: Heroku Architecture. A great interactive presentation of the Heroku stack. Requests flow into Nginx, used as an HTTP reverse proxy. Nginx routes requests into a Varnish-based HTTP cache. Then requests are injected into an Erlang-based routing mesh that balances requests across a grid of dynos. Dynos are your application "VMs" that implement application-specific behaviors. Dynos themselves are a stack of: POSIX, Ruby VM, App Server, Rack, Middleware, Framework, Your App. Applications can access PostgreSQL. Memcached is used as an application caching layer.
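    The request path in Update 2 can be modeled as a chain of layers. This is a toy sketch in which plain Python objects stand in for Nginx, Varnish, the routing mesh, and dynos; round-robin balancing is an assumption, since the presentation only says the mesh balances requests.

```python
import itertools


class RoutingMesh:
    """Stand-in for the Erlang routing mesh: round-robin across dynos (assumed)."""

    def __init__(self, dynos):
        self._next = itertools.cycle(dynos)

    def route(self, path):
        dyno = next(self._next)
        return dyno(path)


class HttpCache:
    """Stand-in for Varnish: repeat requests are answered from memory."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}

    def handle(self, path):
        if path not in self._cache:
            self._cache[path] = self._backend.route(path)
        return self._cache[path]


def make_dyno(name):
    # A real dyno runs the full app stack; this one just labels its response.
    return lambda path: f"{name} served {path}"


def reverse_proxy(cache, path):
    # Nginx's role here: accept the request and forward it inward.
    return cache.handle(path)


mesh = RoutingMesh([make_dyno("dyno-1"), make_dyno("dyno-2")])
front = HttpCache(mesh)
print(reverse_proxy(front, "/a"))  # dyno-1 served /a
print(reverse_proxy(front, "/b"))  # dyno-2 served /b
print(reverse_proxy(front, "/a"))  # dyno-1 served /a  (cache hit, no dyno involved)
```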

    Update: Aaron Worsham Interview with James Lindenbaum, CEO of Heroku. Aaron nicely sums up their goal: Heroku is looking to eliminate all the reasons companies have for not doing software projects.


    Adam Wiggins of Heroku presented at the lollapalooza that was the Cloud Computing Demo Night. The idea behind Heroku is that you upload a Rails application into Heroku and it automatically deploys into EC2 and it automatically scales using behind the scenes magic. They call this "liquid scaling." You just dump your code and go. You don't have to think about SVN, databases, mongrels, load balancing, or hosting. You just concentrate on building your application. Heroku's unique feature is their web based development environment that lets you develop applications completely from their control panel. Or you can stick with your own development environment and use their API and Git to move code in and out of their system.
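    The post doesn't say how liquid scaling decides when to add capacity, so the following is only a generic, hypothetical rule of the kind such a system might use: size the number of dynos (or mongrels) to the observed request rate, clamped to configured bounds. `capacity_per_dyno` and the bounds are assumed parameters.

```python
import math


def desired_dynos(requests_per_sec: float, capacity_per_dyno: float,
                  min_dynos: int = 1, max_dynos: int = 20) -> int:
    """Hypothetical scaling rule: enough dynos to carry the load, within bounds."""
    if capacity_per_dyno <= 0:
        raise ValueError("capacity_per_dyno must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_dyno)
    # Never drop below the floor (site stays up) or exceed the ceiling (cost cap).
    return max(min_dynos, min(max_dynos, needed))


print(desired_dynos(230, capacity_per_dyno=50))   # 5
print(desired_dynos(5000, capacity_per_dyno=50))  # 20, clamped by max_dynos
```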

    For website developers this is as high up the stack as it gets. With Heroku we lose that "build your first lightsaber" moment marking the transition out of apprenticeship and into mastery. Upload your code and go isn't exactly a hero's journey, but it is damn effective...

    I must confess to having an inherent love of Heroku's idea because I had a similar notion many moons ago, but the trendy language of the time was Perl instead of Rails. At the time though it just didn't make sense. The economics of creating your own "cloud" for such a different model wasn't there. It's amazing the niches utility computing will seed, fertilize, and help grow. Even today when using Eclipse I really wish it was hosted in the cloud and I didn't have to deal with all its deployment headaches. Firefox based interfaces are pretty impressive these days. Why not?

    Adam views their stack as:
    1. Developer Tools
    2. Application Management
    3. Cluster Management
    4. Elastic Compute Cloud

    At the top level developers see a control panel that lets them edit code, deploy code, interact with the database, see logs, and so on. Your website is live from the first moment you start writing code. It's a powerful feeling to write normal code, see it run immediately, and know it will scale without further effort on your part. Now, will you be able to toss your Facebook app into the Heroku engine and immediately handle a deluge of 500 million hits a month? It will be interesting to see how far a generic scaling model can go without special tweaking by a certified scaling professional. Elastra has the same sort of issue.

    Underneath, Heroku makes sure all the software components work together in Lennon-McCartney style harmony. They take care of (or will take care of) starting and stopping VMs, deploying to those VMs, billing, load balancing, scaling, storage, upgrades, failover, etc. The dynamic nature of Ruby and the development and deployment infrastructure of Rails is what makes this type of hosting possible. You don't have to worry about builds. There's a great infrastructure for installing packages and plugins. And the big hard one of database upgrades is tackled with the new migrations feature.

    A major issue in the Rails world is versioning. Given the precambrian explosion of Rails tools, how does Heroku make sure all the various versions of everything work together? Heroku sees this as their big value add. They are in charge of making sure everything works together. We see a lot of companies on the web taking on the role of curator ([1], [2], [3]). A curator is a guardian or an overseer. Of curators Steve Rubel says: They acquire pieces that fit within the tone, direction and - above all - the purpose of the institution. They travel the corners of the world looking for "finds." Then, once located, clean them up and make sure they are presentable and offer the patron a high quality experience. That's the role Heroku will play for their deployable Rails environment.

    With great automated power comes great restrictions. And great opportunity. Curating has a cost for developers: flexibility. The database they support is Postgres. Out of luck if you want MySQL. Want a different Ruby version or Rails version? Not if they don't support it. Want memcache? You just can't add it yourself. One forum poster wanted, for example, to use the command line version of ImageMagick but was told it wasn't installed and to use RMagick instead. Not the end of the world. And this sort of curating has to be done to keep a happy and healthy environment running, but it is something to be aware of.
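    Curation of this kind amounts to checking a requested stack against a whitelist before deploying it. A minimal sketch; the catalog contents and version strings below are illustrative assumptions, not Heroku's actual supported list.

```python
# Hypothetical curated catalog; versions are placeholders, not Heroku's real list.
SUPPORTED = {
    "database": {"postgresql"},
    "ruby": {"1.8"},
    "rails": {"2.0", "2.1"},
    "image_library": {"rmagick"},
}


def check_stack(requested: dict) -> list:
    """Return a list of problems; an empty list means the stack is deployable."""
    problems = []
    for component, choice in requested.items():
        allowed = SUPPORTED.get(component)
        if allowed is None:
            problems.append(f"{component}: not offered on this platform")
        elif choice not in allowed:
            problems.append(f"{component} {choice}: supported are {sorted(allowed)}")
    return problems


print(check_stack({"database": "mysql", "rails": "2.1"}))
# ["database mysql: supported are ['postgresql']"]
```

    The restriction and the guarantee come from the same table: anything outside it is rejected, and everything inside it has been tested together.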

    The upside of curation is stuff will work. And we all know how hard it can be to get stuff to work. When I see an EC2 AMI that already has most of what I need my heart goes pitter patter over the headaches I'll save because someone already did the heavy curation for me. A lot of the value in services like the ones rPath offers, for example, is in curation. rPath helps you build images that work, that can be deployed automatically, and can be easily upgraded. It can take a big load off your shoulders.

    There's a lot of competition for Heroku. Mosso has a hosting system that can do much of what Heroku wants to do. It can automatically scale up at the webserver, data, and storage tiers. It supports a variety of frameworks, including Rails. And Mosso also says all you have to do is load and go.

    3Tera is another competitor. As one user said: It lets you visually (through a web ui) create "applications" based on "appliances". There is a standard portfolio of prebuilt applications (SugarCRM, etc.) and templates for LAMP, etc. So, we build our application by taking a firewall appliance, a CentOS appliance, a gateway, a MySql appliance, glue them together, customize them, and then create our own template. You can specify down to the appliance level the amount of cpu, memory, disk, and bandwidth each is assigned, which lets you scale up your capacity simply by tweaking values through the UI. We can now deploy our Rails/Java hosted offering for new customers in about 20 minutes on our grid. AppLogic has automatic failover so that if anything goes wrong, it redeploys your application to a new node in your grid and restarts it. It's not as cheap as EC2, but much more powerful. True, 3Tera won't help with your application directly, but most of the hard bits are handled.
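    The two AppLogic behaviors the user describes, resizing an appliance by editing its resource values and redeploying appliances from a failed node, can be sketched as data plus one function. The `Appliance` record, the node names, and round-robin placement onto spare nodes are hypothetical; this is not 3Tera's API.

```python
from dataclasses import dataclass


@dataclass
class Appliance:
    name: str
    cpu: float      # cores assigned
    memory_gb: int
    node: str


def failover(appliances: list, failed_node: str, spare_nodes: list) -> list:
    """Reassign every appliance on failed_node to spare nodes, round-robin.

    Returns the names of the moved appliances, which would then be restarted."""
    if not spare_nodes:
        raise ValueError("no spare nodes available")
    victims = [a for a in appliances if a.node == failed_node]
    for i, app in enumerate(victims):
        app.node = spare_nodes[i % len(spare_nodes)]
    return [a.name for a in victims]


grid = [
    Appliance("firewall", 0.5, 1, "node-a"),
    Appliance("web", 1.0, 2, "node-a"),
    Appliance("mysql", 2.0, 4, "node-b"),
]
grid[2].cpu = 4.0  # "scaling up" is just changing a value
print(failover(grid, "node-a", ["node-c", "node-d"]))  # ['firewall', 'web']
```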

    RightScale is another company that combines curation along with load balancing, scaling, failover, and system management.

    What differentiates Heroku is their web based IDE that allows you to focus solely on the application and ignore the details. Though now that they have a command line based interface as well, it's not as clear how they will differentiate themselves from other offerings.

    The hosting model has a possible downside if you want to do something other than straight web hosting. Let's say you want your system to insert commercials into podcasts. That sort of large scale batch logic doesn't cleanly fit into the hosting model. A separate service accessed via something like a REST interface needs to be created. Possibly double the work. Mosso suffers from this same concern. But maybe leaving the web front end to Heroku is exactly what you want to do. That would leave you to concentrate on the back end service without worrying about the web tier. That's a good approach too.

    Heroku is just getting started so everything isn't in place yet. They've been working on how to scale their own infrastructure. Next is working on scaling user applications beyond starting and stopping mongrels based on load. They aren't doing any vertical scaling of the database yet. They plan on memcaching reads, implementing read-only slaves via Slony, and using the automatic partitioning features built into Postgres 8.3. The idea is to start a little smaller with them now and grow as they grow. By the time you need to scale bigger they should have the infrastructure in place.
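    The planned read path, memcached in front of read-only replicas with writes going to the primary, follows the common cache-aside pattern. A toy sketch in which dictionaries stand in for memcached, the Postgres primary, and Slony replicas; `ReadWriteRouter` is a hypothetical name, and replication is applied immediately here, whereas real Slony replication is asynchronous and replicas can lag.

```python
import itertools


class ReadWriteRouter:
    """Writes go to the primary; reads try the cache, then a replica."""

    def __init__(self, replica_count: int = 2):
        self.cache = {}                                   # stands in for memcached
        self.primary = {}                                 # stands in for Postgres primary
        self.replicas = [{} for _ in range(replica_count)]  # stand in for Slony slaves
        self._next_replica = itertools.cycle(self.replicas)

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:  # simplified: real replication is asynchronous
            replica[key] = value
        self.cache.pop(key, None)      # invalidate so the next read is fresh

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = next(self._next_replica).get(key)  # spread reads across replicas
        if value is not None:
            self.cache[key] = value
        return value


router = ReadWriteRouter()
router.write("user:1", "alice")
print(router.read("user:1"))  # alice (from a replica, now cached)
```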

    One concern is that pricing isn't nailed down yet, but my gut says it will be fair. It's not clear how you will transfer an existing database over, especially from a non-Postgres database. And if you use the web IDE I wonder how you will handle normal project stuff like continuous integration, upgrades, branching, release tracking, and bug tracking. Certainly a lot of work to do and a lot of details to work out, but I am sure it's nothing they can't handle.

    Related Articles

  • Heroku Rails Podcast
  • Heroku Open Source Plugins etc