High Scalability -

The interactive cloud

Friday, September 11, 2009 at 7:24PM

How many times have you been called in the middle of the night by your operations guys telling you that your application is throwing some odd red alerts? How many times did you find out that when those issues happen you don't have enough information to analyze the incident? Have you tried to increase the log level, just to find out that your problem became even worse - now your application throws tons of information on a continuous basis, most of which is complete garbage...

The current separation between the way we implement our application and the way we manage it leads to many of these ridiculous situations. The cloud makes those things even worse.

In this post I suggest an alternative approach. Why don't we run our application the way we run our business? I refer to this approach as the "interactive cloud", where our application behaves just like our project team and operations just like our managers. As with our business, our application would need to take more responsibility for the way it runs and take corrective actions such as balancing its own resources, re-assigning tasks to the available resources in case of failure, etc. It will need to involve its manager only when it runs out of resources. It will need to provide reports in a way that makes sense to our managers.

The first part of this post describes the general concept behind this model, and the second part provides technical background, including code snippets based on our experience at GigaSpaces.

natis |

Application monitoring,

datacenter,

gigaspaces,

operations

Scaling MySQL on Amazon Web Services

Monday, August 31, 2009 at 7:57AM

I've recently started working with a large company that is looking to take one of its heavily utilized applications and move it to Amazon Web Services. I'm not looking to start a debate on the merits of EC2; the decision to move to AWS has already been made (and is a much better decision than paying a vendor millions to host it).

I've done my research and I'm comfortable with creating this environment with one exception: scaling MySQL. I haven't done much work with MySQL; I've been more of an Oracle guy up to now. I'm struggling to determine a way to scale MySQL on the fly so that replication works, the server takes its proper place in line for master candidacy, and the Apache servers become aware of it.

So this is really three questions:

1. What are some proven methods of load balancing the read traffic going from Apache to MySQL?
2. How do I let the load balancing mechanism know when I add or remove a MySQL server as I scale up / down?
3. How do I alert the master of the new server and initiate replication in an automated environment?

Personally, I don't like the idea of scaling the databases, but the traffic increases exponentially for three hours a day and then plummets to almost nothing, so this would provide significant cost savings.

The only way I've found to manage this sort of scaling is described on slides 18-25 here:
http://assets.en.oreilly.com/1/event/21/Tricks%20and%20Tradeoffs%20of%20Deploying%20MySQL%20Clusters%20in%20the%20Cloud%20Presentation
Has anyone tried this method and either had success or have scripts available to do this? I try not to reinvent the wheel when I don't have to. Thanks in advance.

paulusdd |

AWS,

EC2,

General Discussion,

MySQL,

MySQL Proxy,

cloud

1dbase vs. many and cloud hosting vs. dedicated server(s)?

Saturday, August 8, 2009 at 9:22AM

My partner and I are making a blueprint for an online webshop service. The purpose of this project is to make webshops available to small companies/individuals automatically, just by creating an account with us. Our webapp can be used to add products/pages/... to the store, and we'll handle secure checkout through PayPal.

Our app should be scalable and manageable. Because we also want to offer free webshops, the number of webshops could be 10,000+ within a few years. We are building on the Zend Framework and are using MySQL for the database.

From the start we want to build our application for optimal and easy scalability, to avoid a lot of changes to our app/database in the future.

Now our questions are:

Should we use:
* one database for all shops (or limited to X shops), or
* one database for each new shop (each having products, orders... tables)?

I think both approaches have pros and cons. What do you think? Does anyone have experience with this kind of structure?

PRO:
one database: easier to make changes to the database layout
multiple databases: more scalable, easier to backup/restore

CON:
one database: harder to code because of extra key fields in tables, slower or more difficult to backup/restore
multiple databases: takes longer to push changes to the database layout

We are not at all clear on what hosting we should use. Would it be a solution to use a cloud service such as Mosso/Amazon/GoGrid? Or is it better to just start with one dedicated server and expand 'manually' later?

Thanks in advance for your help!

jorre |

SkyNet Lives! (aka EC2 @ SmugMug).

The Canonical Cloud Architecture

Friday, August 7, 2009 at 12:06AM

Update 2: Elastic Load Balancer and EC2 instance bandwidth. It turns out we are limited by bandwidth and not by CPU. Solution: use DNS Round Robin for two to three HighCPU medium instances.
Update: The Skinny Straw: Cloud Computing's Bottleneck and How to Address It. For cloud computing, bandwidth to and from the cloud provider is a bottleneck. Solution: Evaluate application architecture and consider application partitioning.

I'm writing this post as a sort of penance. My sin was getting involved in another multi-threaded mess of a program that was rife with strange pauses and unexpected errors. I really should have known better. But when APIs choose to make callbacks from some mystery thread pool it's hard to keep things straight. I eventually sobered up and posted all events to a queue so I could make sure the program would work correctly. Doh. I may never know why the .NET console output stopped working, but I'll live with it.

And that reminded me that I've been meaning to write a post on the standard Cloud Architecture. I've tried to hit all the common architectures at one time or another, but there have been some excellent sources lately on structuring programs in a cloud that people may "know" in the same way I knew what not to do, but when the code hits the editor those thoughts may have hidden like a kid next to a broken cookie jar.

The easiest way to create a scalable service is to compose the service from other scalable services. This is how Google AppEngine works and is largely how AWS works as well (EC2, S3, SQS, SimpleDB, etc), though AWS also functions as a blank canvas on which you can draw your own designs.

The canonical cloud architecture that has evolved revolves around dynamically scalable CPUs consuming asynchronous, persistently queued events. We talked about this idea already in Flickr - Do the Essential Work Up-front and Queue the Rest. The cloud is just another way of implementing the same idea.

Amazon suggests a few applications of the Cloud Architecture as:

Processing Pipelines
- Document processing pipelines – convert hundreds of thousands of documents from Microsoft Word to PDF, OCR millions of pages/images into raw searchable text
- Image processing pipelines – create thumbnails or low resolution variants of an image, resize millions of images
- Video transcoding pipelines – transcode AVI to MPEG movies
- Indexing – create an index of web crawl data
- Data mining – perform search over millions of records

Batch Processing Systems
- Back-office applications (in financial, insurance or retail sectors)
- Log analysis – analyze and generate daily/weekly reports
- Nightly builds – perform nightly automated builds of source code repository every night in parallel
- Automated Unit Testing and Deployment Testing – Test and deploy and perform automated unit testing (functional, load, quality) on different deployment configurations every night

Websites
- Websites that “sleep” at night and auto-scale during the day
- Instant Websites – websites for conferences or events (Super Bowl, sports tournaments)
- Promotion websites
- “Seasonal Websites” - websites that only run during the tax season or the holiday season (“Black Friday” or Christmas)

A good list, but having worked on a seasonal website for taxes, I found AWS to be a horrible match. AWS only works on the instance level, so you need a whole instance turned on all the time even when there's no demand. This is a complete waste of money. An AWS model truly based on use, combined with an SLA-driven dashboard, would be very convenient. But on to cases where AWS is a good fit.

SmugMug's Cloud Architecture

AWS pioneer Don MacAskill of SmugMug details how they process high-resolution photos and high-definition video using a cloud-hosted queuing architecture in SkyNet Lives! (aka EC2 @ SmugMug).

SkyNet, as you might expect, operates completely without human minders and automatically scales up and down in relation to the work load. Their system has several components:

Work Initiators - Work comes in from your website and/or other software subsystems and is queued up for processing in the Queuing Service. Work doesn't have to be large requests either. Work can be small independent parts of an overall pipeline. Don't keep state in the Workers. Bundle what you need done into a work request and shoot it back into the Queuing Service for processing.

Provisioning Service - This is Amazon's infrastructure that allows instances to be automatically scaled up and down in relation to the work load. This will be the major difference from your VPS or typical datacenter setup. There's an API for starting and stopping AMIs and mechanisms for automatically configuring and running VMs.

Workers - These are the guys that continually pull work off queues and do something interesting with it. For SmugMug the results are stored on S3 but the results could be put in your own database, SimpleDB or whatever.

Queuing Service - This is where work is queued for consumption by the workers. SmugMug built their own queuing service, but you could just as easily use Amazon's own SQS. Creating a scalable, distributed, performant, highly available queue service is not easy, so you may want to take a look at a number of different queue product suggestions in Flickr - Do the Essential Work Up-front and Queue the Rest.

Controller - This component monitors many variables related to the work flow and decides how many EC2 instances are necessary based on optimizing a small set of goals. Instances are added and removed as needed.
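To make the Controller idea a little more concrete, here is a minimal sketch of one possible scaling rule. It is not SmugMug's algorithm, which the post does not spell out. The function names, the single "drain the queue within a target time" goal, and the instance limits are all assumptions made for illustration.

```python
import math


def desired_instances(queued_jobs, avg_job_seconds, target_drain_seconds,
                      min_instances=0, max_instances=20):
    """Hypothetical rule: enough workers to drain the queue within the target time."""
    if queued_jobs <= 0:
        return min_instances
    # Total work in the queue, expressed as seconds of single-worker time.
    work_seconds = queued_jobs * avg_job_seconds
    needed = math.ceil(work_seconds / target_drain_seconds)
    # Clamp to the configured floor and ceiling so cost stays bounded.
    return max(min_instances, min(max_instances, needed))


def scaling_action(current_instances, desired):
    """Positive result: instances to launch. Negative: instances to terminate."""
    return desired - current_instances


if __name__ == "__main__":
    want = desired_instances(queued_jobs=100, avg_job_seconds=30,
                             target_drain_seconds=300)
    print("desired:", want, "delta:", scaling_action(current_instances=4, desired=want))
```

A real controller would weigh more variables (job age, instance billing granularity, startup time), but the shape stays the same: measure the queue, compute a target, and ask the provisioning service for the difference.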

Don shares a lot of practical details on how to efficiently use AWS, how their queue service works, and how their controller manages to balance minimizing cost against staying responsive to users. Achieving fairness and balance in a queue system can be difficult, but SmugMug appears to have done a good job of that.

What rocks about queuing architectures is that they are just so damn robust. Work is safe in the queues. A random reboot won't cause a loss. If one component is producing events too fast the queue will buffer up events until they can be processed. New components can be cleanly added and removed from the system at any time. Timing isn't critical. Work is processed when someone gets around to it. Timeouts and retries are unnecessary. Programs are simple loops that block on the queue, do something, persist results, and feed more parallelizable work requests back into the queue. Very hard to screw up. Compare and contrast with a complex multi-threaded system with shared state.
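The worker loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-process queue.Queue stands in for a durable queue service such as SQS, a plain list stands in for persisting results to S3 or a database, and run_worker, split_or_process, and process_all are hypothetical names, not part of any SmugMug or Amazon API.

```python
import queue
import threading


def run_worker(work_queue, results, handle):
    """Block on the queue, do something, persist the result, feed back new work."""
    while True:
        item = work_queue.get()            # blocks until work is available
        try:
            if item is None:               # sentinel: shut this worker down
                return
            result, follow_ups = handle(item)
            results.append(result)         # stand-in for persisting to S3 or a database
            for next_item in follow_ups:   # queue follow-up work before marking done
                work_queue.put(next_item)
        finally:
            work_queue.task_done()


def split_or_process(item):
    """Hypothetical handler: split large ranges in half, 'process' small ones."""
    lo, hi = item
    if hi - lo > 2:
        mid = (lo + hi) // 2
        return ("split", lo, hi), [(lo, mid), (mid, hi)]
    return ("done", lo, hi), []


def process_all(initial_items, handle, num_workers=4):
    """Run workers until the queue, including any follow-up work, is fully drained."""
    work_queue = queue.Queue()
    results = []
    workers = [threading.Thread(target=run_worker, args=(work_queue, results, handle))
               for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    for item in initial_items:
        work_queue.put(item)
    work_queue.join()                      # wait for all work, including follow-ups
    for _ in workers:
        work_queue.put(None)               # one shutdown sentinel per worker
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    for record in sorted(process_all([(0, 8)], split_or_process)):
        print(record)
```

Note that the workers keep no state of their own: everything a worker needs arrives in the work item, which is what lets you add or remove workers at any time.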

Building GrepTheWeb in the Cloud

Amazon has published a great couple of articles on building a canonical Cloud Architecture: Building GrepTheWeb in the Cloud, Part 1: Cloud Architectures and Building GrepTheWeb in the Cloud, Part 2: Best Practices.

These are really tight and well written articles so I'll just hit certain high points. The example used is an application called GrepTheWeb. GrepTheWeb searches using a regular expression across millions of web documents. So it's a grep for the web, ah got it now. The idea is to take an unpredictable but possibly large number of search requests, apply the search expression to hundreds of terabytes of documents, and return the results in a reasonable period of time.

How exactly would you do such a thing? Here's how you do it in the cloud:

Amazon S3 for retrieving input datasets and for storing the output dataset

Amazon SQS for durably buffering requests acting as a “glue” between controllers

Amazon SimpleDB for storing intermediate status, log, and for user data about tasks

Amazon EC2 for running a large distributed processing Hadoop cluster on-demand

Hadoop for distributed processing, automatic parallelization, and job scheduling

Clearly these are all (except for Hadoop) built on Amazon services, but the general ideas apply anywhere. For storing large amounts of data and accessing it efficiently in parallel you need a distributed file system like S3. To coordinate and dispatch work you need a queuing service like SQS. For keeping intermediate state you need a scalable database store like SimpleDB, though you could also imagine using S3. For dynamically scaling processing nodes something like EC2 is necessary. And for actually carrying out the document search a framework like Hadoop provides a lot of features, though you can imagine using other compute grid products.

Here's their fabulous picture of what the system looks like:

All the parts and linkages are described in the paper. What's important to note is that even though there are a lot of independently moving parts all the boundaries are clear and well described. In your typical program few will have any idea how it works. Using Cloud Architecture principles it's possible to create a system that both scales and is easy to understand and explain.

The paper makes several key architectural recommendations:

Use Scalable Ingredients - Ensure that your application is scalable by designing each component to be scalable on its own. If every component implements a service interface, responsible for its own scalability in all appropriate dimensions, then the overall system will have a scalable base.

Have Loosely Coupled Systems - For better manageability and high-availability, make sure that your components are loosely coupled. The key is to build components without having tight dependencies between each other, so that if one component were to die (fail), sleep (not respond) or remain busy (slow to respond) for some reason, the other components in the system are built so as to continue to work as if no failure is happening.

Think Parallel - Implement parallelization for better use of the infrastructure and for performance. Distributing the tasks on multiple machines, multithreading your requests and effective aggregation of results obtained in parallel are some of the techniques that help exploit the infrastructure.

Use Designs that Are Resilient to Reboot and Re-Launch - After designing the basic functionality, ask the question “What if this fails?” Use techniques and approaches that will ensure resilience. If any component fails (and failures happen all the time), the system should automatically alert, failover, and re-sync back to the “last known state” as if nothing had failed.

Utilize On-Demand Requisition and Relinquishment - Don’t forget the cost factor. The key to building a cost-effective application is using on-demand resources in your design. It’s wasteful to pay for infrastructure that is sitting idle.

All good stuff, which is why I like this paper so much. There's a big conceptual shift here, especially if you are used to relatively simple client-server and N-tier systems. It's like simulating in your mind how to keep an army of ants all working independently while still communicating, coordinating, and making progress on a goal. We implemented similar architectures in datacenters long before the cloud; it was just a lot harder as everything was roll your own. The cloud makes all the necessary components standard, featureful, and relatively inexpensive. This opens any application to completely different ways of structuring its backend than in the past.

Flickr - Do the Essential Work Up-front and Queue the Rest

Hadoop

GridGain: One Compute Grid, Many Data Grids

Building GrepTheWeb in the Cloud, Part 1: Cloud Architectures

Building GrepTheWeb in the Cloud, Part 2: Best Practices.

Todd Hoff |

SilverPeak Technology Overview

Example,

cloud

Hot New Trend: Linking Clouds Through Cheap IP VPNs Instead of Private Lines

Tuesday, June 30, 2009 at 4:38AM

You might think major Internet companies have a latency, availability, and bandwidth advantage because they can afford expensive dedicated point-to-point private line networks between their data centers. And you would be right. It's a great advantage. Or at least it was a great advantage. Cost is the great equalizer and companies are now scrambling for ways to cut costs. Many of the most recognizable Internet companies are moving to IP VPNs (Virtual Private Networks) as a much cheaper alternative to private lines. This is a strategy you can effectively use too.

This trend has historical precedent in the data center. In the same way leading edge companies moved early to virtualize their data centers, leading edge companies are now virtualizing their networks using IP VPNs to build inexpensive private networks over a shared public network. In kindergarten we learned sharing was polite; it turns out sharing can also save a lot of money in both the data center and on the network.

The line of reasoning for adopting IP VPNs goes something like this:

Major companies are saving 1/4 to 1/2 of their networking costs by moving from private lines to IP VPNs. This does not even include the benefits of lower equipment costs (GigE ports are basically free) and more flexible provisioning (any-to-any connectivity, easy bandwidth dialup).

Cheaper comes with a cost. Private lines are reliable. The Internet is inherently unreliable, especially when two endpoints are linked by potentially dozens of routers in between. In particular Internet connections suffer from: 1) dropped packets 2) out of order packets. Statistically this may happen for only 1% of packets, but when it does the user experience plummets. To get a feel for the impact imagine you have a 200ms latency link to Europe and you're trying to do something interactive. Lose a packet and you'll have to wait for a retransmission which will take at least 1 second. So IP VPNs can provide an order of magnitude more bandwidth for less money, but they often have less actual throughput and reliability.

Since latency and quality are so important to Internet companies, how can they possibly afford to use IP VPNs? They cheat. They fix the IP connection by using WAN accelerators.

WAN accelerators are typically thought to be mostly about caching, but they can also trick TCP into giving a better connection even over unreliable networks. It's like wearing corrective lenses for your network. And that's what you need when dropping dedicated lines for Internet connections.

Relatively inexpensive WAN accelerators can turn somewhat unreliable Internet connections into a very reliable cost effective connection option. Your customers won't believe it's not butter.

The result: lots of money saved and a quality customer experience.

We take TCP for granted so to learn it has this unsightly packet loss/delay problem is a bit unsettling. But here's the impact packet loss has on throughput:

Latency: 100ms, Loss: 1%, Throughput: 1.2 Mbps

Latency: 200ms, Loss: 1%, Throughput: 0.6 Mbps

Latency: 100ms, Loss: 0.5%, Throughput: 1.7 Mbps

These numbers are independent of your WAN link capacity. You could have a 100Mbps link with 1% loss and 100ms latency and you're limited to roughly 1Mbps!
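The post doesn't say how these figures were computed, but they are consistent with the well-known simplified approximation for loss-limited TCP throughput, MSS / (RTT * sqrt(loss)). Here is a small sketch under that assumption. The function name and the 1460-byte segment size are my choices, not something stated in the source.

```python
import math


def tcp_throughput_mbps(rtt_ms, loss_rate, mss_bytes=1460):
    """Approximate loss-limited TCP throughput as MSS / (RTT * sqrt(loss)).

    Assumes a 1460-byte segment (typical for Ethernet) and a constant factor of 1.
    Link capacity never appears in the formula, which is the point of the text.
    """
    rtt_s = rtt_ms / 1000.0
    bits_per_second = (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))
    return bits_per_second / 1e6


if __name__ == "__main__":
    for rtt_ms, loss in [(100, 0.01), (200, 0.01), (100, 0.005)]:
        print(f"Latency: {rtt_ms}ms, Loss: {loss:.1%}, "
              f"Throughput: {tcp_throughput_mbps(rtt_ms, loss):.1f} Mbps")
```

Doubling the latency halves the throughput, while halving the loss only buys you a factor of about 1.4, so a long link hurts more than a lossy one.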

The reason we have this bandwidth-robbing state of affairs is that when TCP was designed, packet loss meant network congestion. The way to deal with congestion is to back off and send less data. This drops throughput drastically for a very long time. Over long distance WAN connections packets can be delayed, which looks like packet loss and causes congestion avoidance measures to kick in. Or maybe only a single packet was dropped and that kicks in congestion avoidance.

The trick is convincing TCP that everything is cool so the full connection bandwidth can be used. WAN accelerators have a number of complex features to keep TCP happy. Damon Ennis, VP Product Management and Customer Support for Silver Peak, a WAN accelerator company, talks about why clouds, IP VPNs, and WAN accelerators are a perfect match:

Moving applications into the cloud offers substantial cost savings for enterprises. Unfortunately those savings come at the cost of application performance. Often performance is so hampered that users’ productivity is severely limited. In extreme cases, users refuse to utilize the cloud-based application altogether and resort to old habits like saving files locally without centralized backup or returning to their old “thick” applications.

The cloud limits performance because the applications must be accessed over the WAN. WANs are different from LANs in three ways – WAN bandwidth is a fraction of LAN bandwidth, WAN latency is orders of magnitude higher than LAN latency, and packet loss exists on the WAN where none existed on the LAN. Most IT professionals are familiar with the impacts of bandwidth on transfer times – a 100MB file takes approximately 1 second to transfer on a 1Gbps LAN and approximately 10 seconds to transfer on a 100Mbps LAN. They then extrapolate this thinking to the WAN and assume that it will take 10 seconds to transfer the same file on a 100Mbps WAN. Unfortunately, this isn’t the case. Introduce 100ms of latency and this transfer now takes almost 3 minutes. Introduce just 1% packet loss and this transfer now takes over 10 minutes.

There’s a calculator available that will let you figure out the effective throughput of your own WAN if you know its bandwidth, latency, and loss. Once you know your effective throughput simply divide 800Mb (100MB) by your effective throughput to determine how long it would take to transfer the same example file over your WAN.

Latency and loss don’t just impact file transfer times, they also have a dramatic impact on any applications that need to be accessed in real-time over the WAN. In this context a real-time application is one that requires real-time response to users’ keystrokes – think of any application that is served over a thin-client infrastructure or Virtual Desktop Infrastructure (VDI). Not only is the server 100 ms away but any lost packet will result in delays of up to half a second waiting for the loss to be detected and the retransmission to occur. This is the root cause of the frustrated user banging on the enter key looking for a response.
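The file transfer arithmetic quoted above can be roughly reproduced with a back-of-the-envelope model. This is a sketch under my own assumptions, not the calculator Damon refers to. It assumes a single TCP connection with a 64KB window (no window scaling), a 1460-byte segment, and the simplified loss formula MSS / (RTT * sqrt(loss)). The function names are hypothetical.

```python
import math


def effective_throughput_bps(link_bps, rtt_s, loss_rate,
                             window_bytes=65535, mss_bytes=1460):
    """Effective throughput is the smallest of three limits.

    1. Raw link capacity.
    2. Window limit: at most one window of data per round trip.
    3. Loss limit: the simplified MSS / (RTT * sqrt(loss)) approximation.
    """
    limits = [link_bps]
    if rtt_s > 0:
        limits.append(window_bytes * 8 / rtt_s)
        if loss_rate > 0:
            limits.append(mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate)))
    return min(limits)


def transfer_seconds(file_bytes, throughput_bps):
    """Time to move a file of the given size at the given effective throughput."""
    return file_bytes * 8 / throughput_bps


if __name__ == "__main__":
    file_bytes = 100e6   # the 100MB (800Mb) example file
    link_bps = 100e6     # 100Mbps link
    for label, rtt_s, loss in [("LAN", 0.0, 0.0),
                               ("WAN, 100ms latency", 0.1, 0.0),
                               ("WAN, 100ms latency, 1% loss", 0.1, 0.01)]:
        bps = effective_throughput_bps(link_bps, rtt_s, loss)
        print(f"{label}: {transfer_seconds(file_bytes, bps) / 60:.1f} minutes")
```

Under these assumptions the latency-only case comes out at about two and a half minutes and the lossy case at a bit over eleven, in the same ballpark as the "almost 3 minutes" and "over 10 minutes" in the quote.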

This all seems like a lot of effort, doesn't it? Why not just dump TCP and move to a better protocol? Sounds good, but everything works on TCP so to change now would be monumental. And as strange as it seems, TCP is doing its job. It's a protocol, so there's a separation of what's above it from what's below it, which allows innovation at the TCP level without breaking anything. And that's what layering is all about.

The upshot is with a little planning you can take advantage of much cheaper IP VPN costs, improve latency, and maximize bandwidth usage. Just like the big guys.

Cloud Computing Requires Infrastructure 2.0 by Gregory Ness

Myth of Bandwidth and Application Performance by Ameet Dhillon

How Does WAN Optimization Work? by Paul Rubens

Todd Hoff |

Strategy,

cloud

Google App Engine plus Amazon AWS: Best of both worlds

Monday, June 29, 2009 at 7:02AM

Google App Engine (GAE) is focused on making development easy, but limits your options. Amazon Web Services is focused on making development flexible, but complicates the development process. Real enterprise applications require both of these paradigms to achieve success… What we really want is the flexibility of AWS and the simplicity of GAE.

For the rest of the post see http://natishalom.typepad.com/nati_shaloms_blog/2009/06/google-app-engine-plus-amazon-aws-best-of-both-worlds.html

natis |

GAE,

Java,

SAAS,

amazon,

gigaspaces,

paas

CLOUD & GRID EVENT BY THE ONLINE GAMING HIGH SCALABILITY SIG

Sunday, June 14, 2009 at 9:29PM

The first meeting of this Online Gaming High Scalability SIG will be on the 9th of July 2009 in central London, starting at 10 AM and finishing around 5PM.

The main topic of this meeting will be the potential for using cloud and grid technologies in online gaming systems. In addition to experience reports from the community, we have invited some of the leading cloud experts in the UK to discuss the benefits, such as resource elasticity, and the challenges, such as storage and security, that companies from other industries have experienced. We will have a track for IT managers focused on business opportunities and issues and a track for architects and developers more focused on implementation issues.

The event is free but up-front registration is required for capacity planning, so if you are planning to attend, please let us know in advance by completing the registration form on this page.

To propose a talk or for programme enquiries, contact meetings [at] gamingscalability [dot] org.

Note: The event is planned to finish around 5 PM so that people can make their way to Victoria in time for CloudCamp London. CloudCamp is a meeting of the cloud computing community with short talks. It is also free, but you will have to register for it separately.

PROGRAMME: http://skillsmatter.com/event/cloud-grid/online-gaming-high-scalability-sig/wd-99

wdevolder |

Event,

Grid,

gaming,

high-scalability,

london

Is Eucalyptus ready to be your private cloud?

Friday, May 29, 2009 at 12:48AM

Update: Eucalyptus Goes Commercial with $5.5M Funding Round. This removes my objection that it's an academic project only. Go team go!

Rich Wolski, professor of Computer Science at the University of California, Santa Barbara, gave a spirited talk on Eucalyptus to a large group of very interested cloudsters at the Eucalyptus Cloud Meetup. If Rich could teach computer science at every school the state of the computer science industry would be stratospheric. Rich is dynamic, smart, passionate, and visionary. It's that vision that prompted him to create Eucalyptus in the first place. Rich and his group are experts in grid and distributed computing, having a long and glorious history in that space. When he saw cloud computing on the rise he decided the best way to explore it was to implement what everyone accepted as a real cloud, Amazon's API. In a remarkably short time they implemented Eucalyptus and have been improving it and tracking Amazon's changes ever since.

The question I had going into the meetup was: should Eucalyptus be used to make an organization's private cloud? The short answer is no. Wait wait, it's now yes, see the update at the beginning of the article.

The project is of high quality, the people are of the highest quality, but in the end Eucalyptus is a research project from a university. As an academic project Eucalyptus is subject to changes in funding and the research interests of the team. When funding sources dry up so does the project. If the team finds another research area more interesting, or if they get tired of chasing a continuous stream of new Amazon features, or no new grad students sign on, which will happen in a few years, then the project goes dark.

Fears over continuity have at least two solutions: community support and commercial support. Eucalyptus could become a community-supported open source project. This is unlikely to happen, though, as it conflicts with the research intent of Eucalyptus. The Eucalyptus team plans to control the core for research purposes and encourage external development of add-on services like SQS. Eucalyptus won't go commercial, as university projects must stay clear of commercial pretensions. Amazon is "no comment" on Eucalyptus so it's not clear what they would think of commercial development should it occur.

Taken together these concerns imply Eucalyptus is not a good base for an enterprise quality private cloud. Which they readily admit. It's not enterprise ready, Rich repeats. It's not that the quality isn't there. It is and will be. And some will certainly base their private cloud on Eucalyptus, but when making a decision of this type you have to be sure your cloud infrastructure will be around for the long haul. With Eucalyptus that is not necessarily the case. Eucalyptus is still a good choice for its original research purpose, or as a cheap staging platform for Amazon, or as a base for temporary clouds, but as your rock solid private cloud infrastructure of the future Eucalyptus isn't the answer.

The long answer is a little more nuanced and interesting.

The primary purpose for Eucalyptus is research. It was never meant to be our little untethered private Amazon cloud. But if it works, why not?

Eucalyptus is Not a Full Implementation of the Amazon Stack

Eucalyptus implements most of EC2 and a little of S3. They hope to get community support for the rest. That of course makes Eucalyptus far less interesting as a development platform. But if your use for Eucalyptus is as an instant provisioning framework you are still in the game. Their emulation of EC2 is so good RightScale was able to operate on top of Eucalyptus. Impressive.

But even in the EC2 arena I have to wonder for how long they'll track Amazon development. If you are a researcher implementing every new Amazon feature is going to get mighty old after a while. It will be time to move on and if you are dependent on Eucalyptus you are in trouble. Sure, you can move to Amazon but what about that $1 million data center buildout?

If you are developing software that is not tied to the Amazon service stack, then Eucalyptus would work great.

As an Amazon developer I would want my code to work without too much trouble in both environments. Certainly you can mock the different services for testing or create a service layer to hide different implementations, but that's not ideal and makes Eucalyptus as an Amazon proxy less attractive.

One of the uses for Eucalyptus is to make Amazon cheaper and easier by testing code locally without having to deploy into Amazon all the time. Given the size of images the bandwidth and storage costs add up after a while, so this could make Eucalyptus a valuable part of the development process.

Eucalyptus is Not as Scalable as Amazon

No kidding. Amazon has an army of sysadmins, network engineers, and programmers to make their system work at such ginormous scales. Eucalyptus was built on smarts, grit and pizza. It will never scale as well as Amazon, but Eucalyptus is scalable to 256 nodes right now. Which is not bad.

Rich thinks that, with some work they already know about, it could scale to 5000 nodes. Not exactly Amazon scale, but good enough for many data center dreams.

One big limit Eucalyptus has is the self-imposed requirement to work well in any environment. It's just a tarball you can install on top of any network. They rightly felt this was necessary for adoption. Saying to potential customers that you need to set up a special network before you can test this software tends to slow down adoption. By making Eucalyptus work as an overlay they soothed a lot of early adopter pain.

But by giving up control of the machines, the OS, the disk, and the network they limited how scalable they can be. There's more to scalability than just software. Amazon has total control and that gives them power. Eucalyptus plans to make more invasive and more scalable options available in the future.

Lacks Some Private Cloud Features

Organizations interested in a private cloud are often interested in:

Control

Privacy and Security

Utility Chargeback System

Instant Provisioning Framework

Multi-tenancy

Temporary Infrastructure for Proof of Concept for "Real" Provisioning

Cloud Management Infrastructure

Eucalyptus satisfies many of these needs, but a couple are left wanting:

The Utility Chargeback System allows companies to bill back departments for the resources they use and is a great way to get around a rigid provisioning process and still provide accountability back to the budgeting process. Eucalyptus won't do this for you.

A first class Cloud Management Infrastructure is not part of Eucalyptus because it's not part of Amazon's API. Amazon doesn't expose their internal management process. Eucalyptus is adding some higher level management tools, but they'll be pretty basic.

These features may or may not be important to you.

Clouds vs Grids

Endless pixels have been killed defining clouds, grids, and how they are different enough that there's really a whole new market to sell into. Rich actually makes a convincing argument that grids and clouds are different and do require a completely different infrastructure. The differences:

Cloud

Full private cluster is provisioned

Individual user can only get a tiny fraction of the total resource pool

No support for cloud federation except through the client interface

Opaque with respect to resources

Grid

Built so that individual users can get most, if not all of the resources in a single request

Middleware approach takes federation as a first principle

Resources are exposed, often as bare metal

Get Off of My Cloud by M. Jagger and K. Richards.

Rich Wolski's Home Page

Enomaly

Nimbus

Todd Hoff |
