Friday, December 28, 2007

Amazon's EC2: Pay as You Grow Could Cut Your Costs in Half

Update 2: Summize Computes Computing Resources for a Startup. Lots of nice graphs showing Amazon is hard to beat for small machines but becomes less cost efficient for well-used larger machines. Long term storage costs may eat your savings away. And out-of-cloud bandwidth costs are high.
Update: via ProductionScale, a nice Digital Web article on how to set up S3 to store media files and how Blue Origin was able to handle 3.5 million requests and 758 GB of bandwidth in a single day for very little $$$. Also a RightScale article on network performance within Amazon EC2 and to Amazon S3: 75 MB/s between EC2 instances, 10.2 MB/s between EC2 and S3 for download, 6.9 MB/s for upload.

Now that Amazon's S3 (storage service) is out of beta and EC2 (elastic compute cloud) has added new instance types (the class of machine you can rent) with more CPU and more RAM, I thought it would be interesting to take a look at how their pricing stacks up.

The quick conclusion: the more you scale, the more you save. A six node configuration in Amazon is about half the cost of a similar setup using a service provider. But cost may not be everything...

EC2 gets a lot of positive pub, so if you would like a few other perspectives take a look at Jason Hoffman of Joyent's blog post on Why EC2 isn't yet a platform for "normal" web applications and Hostingfu's Short Comings of Amazon EC2. Both are well worth reading and tell a much needed cautionary tale.

The upshot is that batch operations clearly work well within EC2 and S3, but the jury is still out on deploying large database-centric websites completely within EC2. The important sticky issues seem to be: static IP addresses, load balancing/failover, lack of data center redundancy, lack of custom OS building, and problematic persistent block storage for databases. The lack of large RAM and CPU machines has been solved by the new instance types.

Assuming you are OK with all these issues, will EC2 cost less? Cost isn't the only consideration, of course. If dynamically scaling VMs is a key feature, if SQS (Amazon's message queue service) looks attractive, or if S3's endless storage is critical to you, then weigh those factors accordingly.

My two use cases are my VPS, for selfish reasons, and a quote from a leading service provider for a 6 node setup for a startup. Six nodes is small, but since the architecture featured horizontal scaling, the cost of expanding was pretty linear and incremental.

Here's a quick summary of Amazon's pricing:
  • Data transfer: $0.10 per GB for all data transfer in; $0.18 per GB for the first 10 TB/month of data transfer out; $0.16 per GB for the next 40 TB/month out; $0.13 per GB for transfer out over 50 TB/month. You don't pay for data transfer between EC2 and S3, so that's an advantage of using S3 within EC2.
  • S3: $0.15 per GB-Month, $0.01 per 1,000 PUT or LIST requests, $0.01 per 10,000 GET and all other requests. I have no idea how many requests I would use.
  • Small Instance at 10 cents/hour: 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform. The CPU capacity is that of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
  • Large Instance at 40 cents/hour: 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform.
  • Extra Large Instance at 80 cents/hour: 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform.

    You don't have to run these numbers by hand. To calculate the Amazon costs I used their handy dandy calculator. When performing calculations, per Amazon, I used 732 hours per month.
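
    For readers who want to poke at the numbers themselves, here's a rough Python sketch of the arithmetic involved, not the official calculator: it assumes the December 2007 prices listed above and Amazon's 732 hours-per-month convention, simplifies transfer to the first pricing tier, and ignores request charges.

```python
# Rough sketch of the cost arithmetic used in this article, based on the
# December 2007 prices listed above. Transfer pricing is simplified to the
# first tier and request charges are ignored, so this only approximates
# Amazon's official calculator.

HOURS_PER_MONTH = 732  # per Amazon's convention

INSTANCE_PRICE = {     # dollars per instance-hour
    "small": 0.10,
    "large": 0.40,
    "extra_large": 0.80,
}

def monthly_cost(instances, storage_gb=0, transfer_in_gb=0, transfer_out_gb=0):
    """instances is a dict such as {"small": 4, "extra_large": 2}."""
    compute = sum(INSTANCE_PRICE[kind] * count * HOURS_PER_MONTH
                  for kind, count in instances.items())
    s3 = storage_gb * 0.15                                     # $0.15 per GB-month
    transfer = transfer_in_gb * 0.10 + transfer_out_gb * 0.18  # first-tier rates
    return compute + s3 + transfer
```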

    Single VPS Configuration



    I was very curious about the economics of moving this simple site (http://highscalability.com) from a single managed VPS to EC2.

    Currently my plan provides:
  • 1GB RAM (with no room for expansion).
  • 50 GB of storage. I use about 4 GB.
  • 800 GB monthly transfer, of which I use 1 GB/month in and 10 GB/month out.
  • 8 IP addresses. Very nice for virtual hosts.
  • 100Mbps uplink speed.
  • Very responsive support. Very poor system monitoring.
  • 1 VM backup image. Would prefer two.
  • The CPU usage is hard to characterize, but it's been more than sufficient for my needs.
  • Cost: $105 per month.

    From Amazon:
  • The small instance looks good to me. What I need is more memory, not more CPU, so that's attractive. VPS memory pricing is painfully high.
  • 10 GB storage, 1 GB transfer in, 10 GB transfer out, 2000 requests.
  • Cost: about $80 per month (a quick check of this figure follows below).
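
    A quick back-of-the-envelope check of that figure, using the prices listed earlier (the request count is a guess, as noted above):

```python
# Back-of-the-envelope check of the "about $80 per month" figure,
# using the December 2007 prices listed earlier.
compute  = 0.10 * 732            # one small instance: $73.20
storage  = 10 * 0.15             # 10 GB in S3: $1.50
transfer = 1 * 0.10 + 10 * 0.18  # 1 GB in, 10 GB out: $1.90
requests = 2 * 0.01              # ~2,000 PUT/LIST requests: about $0.02
print(compute + storage + transfer + requests)  # ~$76.62
```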

    Will I switch? Probably not. I don't know how well Drupal runs in EC2/S3 and it's not really worth it for me to find out. Drupal isn't horizontally scalable so that feature of EC2 holds little attraction. But the extra RAM and affordable disk storage are attractive.

    So for very small deployments using your standard off the shelf software, there's no compelling reason to switch to EC2.

    Six Node Configuration for Startup



    This configuration is targeted at a real-life Web 2.0ish startup needing about 300GB of fast, highly available database storage. Currently there are no requirements for storing large quantities of BLOBs. There are 6 nodes overall: two database servers in failover configuration, two load balanced web servers, and two application servers.

    From the service provider:
  • Two database servers are $1500/month total for a dual processor quad core Xeon 5310, 4 GB of RAM, and 4x 300 GB 15K SCSI HDDs in a RAID 10 configuration, with 5 IP addresses, 10 Mbps public and private networks, and 2000 GB of public bandwidth.
  • The other four servers have 2 GB RAM each, a single quad core Xeon 5310, and 1 x 73 GB 10K RPM SAS drive, for about $250/month each.
  • For backup, 500 GB costs $200/month.
  • I'm not including load balancer or firewall services as these don't apply to Amazon, which may be a negative depending on your thinking. Plus the provider has an excellent service and management infrastructure.
  • Cost: $2700/month.

    From Amazon:
  • Two extra large instances for the database servers. Your architecture here is more open and could take some rethinking. You could just rent one and bring another online from the pool on failure, which would save about $500 a month. I'll assume we load balance read and write traffic here, so we'll have two. Using one extra large instance is about the same price as two large instances.
  • Four small instances for the other servers. Here is another place the architecture could be rethought. It would be easy enough to buy one or two servers upfront and then add servers in response to demand. That might save about $140/month under low load conditions. Adding another 4 servers adds about $300.
  • 300 GB of storage. Doubling to 600 GB only adds about $50/month. If storing large amounts of data does become a requirement, this could be a big win.
  • 200 GB transfer in, 1800 GB transfer out. This is a guesstimate. Doubling the numbers adds another $400.
  • 40,000 requests. No idea, but these are cheap so being wrong isn't that expensive.
  • Cost: about $1300/month using two large instances for the database.
  • Cost: about $1900/month using two extra large instances for the database (a quick check of both figures follows below).
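
    A quick back-of-the-envelope check of both figures, again using the published prices and the first-tier transfer rate:

```python
# Back-of-the-envelope check of the two database options, using the
# December 2007 prices listed earlier (request charges omitted).
HOURS = 732
four_small = 4 * 0.10 * HOURS          # $292.80
storage    = 300 * 0.15                # $45.00
transfer   = 200 * 0.10 + 1800 * 0.18  # $344.00 at first-tier rates
shared     = four_small + storage + transfer

two_large       = 2 * 0.40 * HOURS     # $585.60
two_extra_large = 2 * 0.80 * HOURS     # $1171.20

print(shared + two_large)        # ~$1267, i.e. "about $1300/month"
print(shared + two_extra_large)  # ~$1853, i.e. "about $1900/month"
```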

    The cost numbers really stand out. You pay about half for a similar setup and the cost of incrementally scaling along any dimension is relatively inexpensive. You could in fact start much smaller and much cheaper and simply pay as you grow. The comparison is not apples to apples, however. All the potential problems with EC2 have to be factored in as well. As someone said, architecting for EC2/S3 takes a different way of thinking about things. And that's really true. Many of the standard tricks don't apply anymore.
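
    To make the pay-as-you-grow idea concrete, here is a minimal sketch of adding and removing capacity programmatically. It uses the boto Python library, one third-party client for the EC2 API; the AMI id, credentials, and instance counts are placeholders rather than anything from this article.

```python
# Minimal sketch of "pay as you grow": launch instances when load rises and
# terminate them when it falls, so the bill tracks actual demand. Uses the
# boto Python library (a third-party EC2 API client); the AMI id,
# credentials, and counts are placeholders, not values from this article.
from boto.ec2.connection import EC2Connection

conn = EC2Connection("ACCESS_KEY", "SECRET_KEY")

def add_web_servers(ami_id="ami-00000000", count=2):
    # Launch additional small instances from a prebuilt image.
    reservation = conn.run_instances(ami_id, min_count=count, max_count=count,
                                     instance_type="m1.small")
    return [instance.id for instance in reservation.instances]

def remove_web_servers(instance_ids):
    # Shut the extra instances down; per-hour billing for them stops.
    conn.terminate_instances(instance_ids)
```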

    Deploying customer facing production websites in a grid is not a well traveled path. If you don't want to walk the bleeding edge then EC2 may not be for you. For example, in the service provider scenario you have blisteringly fast disks in a very scalable RAID 10 setup. That will work. Now, how will your database work over S3? Is it even possible to deploy your database over S3 with confidence? Do you need to add a ton of caching nodes? Will you have to radically change your architecture in a way that doesn't fit your skill set or schedule? Will the extra care and monitoring needed by EC2 be unacceptable? Is the single data center model a problem? Does the lack of a hardware firewall and a load balancer seem like too big a weakness? Can you have any faith in Amazon as a grid provider?

    Only the shadow may know the answers to these questions, but the potential cost savings and the potential ease of scaling make the questions worth answering.

    Related Articles



  • Build an Infinitely Scalable Infrastructure for $100 Using Amazon
  • An Unorthodox Approach to Database Design : The Coming of the Shard

    Reader Comments (18)

    I saw Amazon's new appliances earlier today, they look pretty impressive. Your application can be configured to bring nodes on and off-line as required, that's a big bonus if your application deals with uneven load. For example you could shut down half your servers overnight.

    I'm not clear on what the "instance storage" actually means though. You get 1 CPU, X ram, what does the storage do? Is it persistent? If you shut down the appliance, is that storage written to S3 or is it lost?

    Callum (http://www.callum-macdonald.com/)

    December 31, 1999 | Unregistered Commenter chmac

    > If you shut down the appliance, is that storage written to S3

    Nope. It's just local disk that goes away when your host goes away. You have to take care of the persistence.

    December 31, 1999 | Unregistered Commenter Todd Hoff
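
    For anyone wondering what "taking care of the persistence" might look like in practice, one minimal approach is to periodically copy anything important, say a nightly database dump, from the instance's local disk up to S3. A rough sketch using the boto Python library follows; the bucket name and paths are placeholders.

```python
# One way to "take care of the persistence" yourself: after a nightly
# mysqldump (or any other export), copy the file from the instance's local
# disk up to S3 with the boto Python library. The bucket name and paths are
# placeholders, not values from the article.
from boto.s3.connection import S3Connection
from boto.s3.key import Key

conn = S3Connection("ACCESS_KEY", "SECRET_KEY")
bucket = conn.create_bucket("example-backup-bucket")  # creates or returns the bucket

def backup_to_s3(local_path, s3_path):
    key = Key(bucket)
    key.key = s3_path
    key.set_contents_from_filename(local_path)

# For example, run from cron after the nightly dump:
# backup_to_s3("/var/backups/db.sql.gz", "backups/db-latest.sql.gz")
```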

    Hey Todd, I don't think EC2 is out of beta just yet. S3 is out of beta, but not EC2.

    --Kent

    December 31, 1999 | Unregistered Commenter Kent

    > I don't think EC2

    Whoops, good catch. Thanks.

    December 31, 1999 | Unregistered Commenter Todd Hoff

    However, with EC2 you get sluggish disk performance and non-persistent storage. And if EC2 has a hiccup, poooo.. all your data is gone, a classic disaster scenario.

    The prices you quoted for 4x 300GB 15K SCSI HDD in RAID 10 configuration are expensive for a reason. And dual CPU times quad core = 8 cores per server. It is more like the extra large instance on EC2.

    December 31, 1999 | Unregistered Commenter Jon

    Todd -

    I'm an unabashed promoter of utility computing, but I think you've created an apples-vs-oranges comparison. EC2 isn't suited to running databases due to the lack of persistent storage. Beyond that, though, you've compared 8 core servers vs 4 core AMI instances. Large instances are listed as 8 ECU, but each ECU is 1/2 core.

    On a more curious note, exactly what is the use case for 1.75TB of non-persistent storage in the large AMI?

    December 31, 1999 | Unregistered Commenter Bert Armijo

    > I'm an unabashed promoter of utility computing

    I am not much of a promoter, I am just trying to compare options for someone considering building a scalable website. I find it's better to take a real situation and see how it plays out.

    > EC2 isn't suited to running databases due to the lack of persistent storage

    Completely correct. Certainly it's not a match like coffee and chocolate, maybe it's more like oil and water with some vigorous shaking going on :-) For a typical MySQL installation a traditional SAN/NAS/etc solution is the easiest path. Though there is an S3 storage engine for MySQL (http://fallenpegasus.com/code/mysql-awss3), it's way too experimental for most.

    Another way to look at it is to consider that LAMP is dead, long live LAMP, and it's time to reconsider how systems are architected up front. We tend to start small, get into trouble, throw a lot of cache at things, and then rethink the big picture. In grid land that process is reversed.

    A BigTable/Dynamo/key-value/proprietary approach makes a lot of sense on top of S3, whereas an RDBMS doesn't. Maybe it's the database that we should be looking to change?

    > You've compared 8 core servers vs 4 core AMI instances

    Yep, Jon mentioned that too. Thanks. It was a mistake on my part. I'll rerun the numbers, but I don't think it will change the overall result.

    December 31, 1999 | Unregistered Commenter Todd Hoff
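
    As a toy illustration of that key-value direction, S3 itself can be treated as a crude key-value store rather than forcing an RDBMS on top of it. The sketch below uses the boto Python library with a placeholder bucket name and ignores consistency, caching, and indexing, which is where the real design work lives.

```python
# Toy illustration of the key-value direction: treat S3 itself as a simple
# key-value store instead of layering an RDBMS on top of it. Uses the boto
# Python library; the bucket name is a placeholder. Consistency, caching,
# and indexing are ignored, and that is where the real design work lives.
from boto.s3.connection import S3Connection
from boto.s3.key import Key

conn = S3Connection("ACCESS_KEY", "SECRET_KEY")
bucket = conn.create_bucket("example-kv-store")

def put(key_name, value):
    key = Key(bucket)
    key.key = key_name
    key.set_contents_from_string(value)

def get(key_name):
    key = bucket.get_key(key_name)
    return key.get_contents_as_string() if key else None
```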

    I'm doing a new Web 2.0 startup and have looked at Mosso, Media Temple, Amazon and others for hosting. I've considered the grid approach and VPS, but most of these services are too limiting for me. Mosso doesn't even have shell access and Media Temple is stuck on MySQL 4.1.

    My architecture is MySQL 5, PHP5+APC, Squid, Apache2 and Memcached. None of these grids or VPS hosting providers support my needs. Besides, I don't see the point of these grids as they will get just as bogged down once more clients are on board. 200 websites on a single server = same problems as 5000 websites on a grid. High traffic sites have already been kicked off these so called grids.

    In the end, it looks like I'll be buying a dual quad-core server w/ 16 gb of ram and 2 tb of RAIDed disk space for around $4500 from Thinkmate or Silicon Mechanics, haven't decided on the vendor yet. If there is anyone else I should look at, let me know.

    That should scale up pretty nicely as the site gets underway. I can colo that server for $100 a month. If things work out, I'll scale out horizontally myself, if not, then I have a server that I can do something else with or sell.

    I realize that I could use services like Media Temple's DV Nitro, but that is $750 a month. If I did that, the first year costs would be $9000. I could buy two of my proposed servers for that. I just don't see any good service to use other than rolling my own.

    Any comments? Thoughts? Suggestions?

    December 31, 1999 | Unregistered Commenter Enzo

    > I could buy two of my proposed servers for that.

    Sounds very similar to some of the arguments that have ping-ponged around my head :-) Since you've already picked your architecture, your approach makes sense. Going grid takes a different approach. The buy-or-rent decision obviously has pros and cons. If you have the cash to spend up front then you have more options; many don't, so they have to conserve cash. If you do need to expand quickly, do you have the cash to buy more boxes, including storage, switching, etc., or could that cash go further renting more machines to handle the load? How quickly can you bring those machines on-line? Do you want to admin and set up all those machines? If you have the cash, the skills, and the bandwidth, then DIY offers a lot of value for the money.

    December 31, 1999 | Unregistered Commenter Todd Hoff

    A single dual quad w/ 16 gb of RAM will be the first phase. The second phase, if the site takes off, will be a new server for load balancing and two new web servers. I'll keep the original box for the db server. So I'll have four servers at the end of the second phase of growth. Third phase would probably be a separate Squid box and probably another web server depending on need. After that it would depend on need.

    The goal would be to have a single server suffice until the site is well under way. Once the initial server reaches its peak, then I would fork over the cash to purchase additional servers. After the third phase, I would probably hire or contract out the SA work.

    Given what I've read of the PlentyOfFish architecture, my approach seems pretty reasonable and probably the best path to take at this time.

    I'm just wondering if there's any new grid hosting services that I may be overlooking.

    December 31, 1999 | Unregistered Commenter Enzo

    And now, it's into "unlimited beta". Still no official SLAs, and no word on what Amazon will do when they grow into all that excess capacity they're reselling.

    The big change is that there's no more throttle on the number of developer signups they'll accept per day.

    December 31, 1999 | Unregistered Commenter Michael Nygard

    Enzo, have you looked at TGL (http://www.thegridlayer.com/products/virtual-private-datacenters.php) -- using their developer-class grids for your phase 1 and a full production grid for phase2/later?

    For an overview, see earlier post on High Scalability:
    http://www.highscalability.com/should-you-build-your-next-website-using-3teras-grid-os

    (full disclosure: I am with 3Tera - the company making the grid OS used by TGL to provide virtual private datacenters on dedicated hardware)

    -- Peter
    www.3tera.com

    December 31, 1999 | Unregistered Commenter PeterNic

    I've followed Amazon EC2/S3 with genuine interest for quite some time, but the fact that you don't have persistent storage without using some sort of S3 "hack" and the fact that you don't have a static IP bothers me. It is of course still in beta, so things might improve..

    But just days ago, I discovered something called FlexiScale ( http://www.flexiscale.com/ ), and it looks very promising. Anyone here have any experience with them?

    - Christian

    December 31, 1999 | Unregistered Commenter cfelde

    I contacted FlexiScale as they look interesting. The problem is that they are a little more expensive and they don't have data centers in the US.

    December 31, 1999 | Unregistered Commenter Todd Hoff

    Nice, Todd :) Looking forward to the response..

    December 31, 1999 | Unregistered Commenter cfelde

    Amazon EC2 vs. hosting is indeed not an apples to apples comparison. I have not come across hosting providers that let me pay for the capacity I use by the hour and by the gigabyte, respond to spikes in demand without signing up for monthly contracts, or give a small startup the ability to test their system on a 20-node cluster for a few hours just as a sanity check before going live....
    As far as databases are concerned, Elastra (http://www.elastra.com) is currently in beta with customers successfully running MySQL, Postgres, and EnterpriseDB clusters, persistently storing their data on S3.

    December 31, 1999 | Unregistered Commenter Kirill Sheynkman

    ....Could you just not use something like ElasticDrive and then store all your stuff on S3 like it was another hard drive?? Including your MySQL database.... I set up ElasticDrive on my EC2 instance and it worked great.

    Yes, I'd rather use the instance's space than S3 because I don't have many files as a developer...but the experience, the ability to grow, and host client's sites and NEVER worry about if I'm covered is great. I could simply cover the S3 costs in client's hosting costs...if you charge purely on the S3 transfer rate and storage costs, you're offering someone a WAY cheaper site than they could ever get hosted elsewhere...and probably better uptime...ok I know Amazon did have that hiccup recently =)

    So. The downsides I've come across as a small guy are the cost of ElasticDrive (I'm just using the 5GB free version, I'd ultimately need the unlimited if I was a company) and the persistence issue like everyone complains about. I do wish they held that info at least for 24 hours or something. Then ditch it...but what you have to realize is that they have to be careful with resources in order to offer that kind of pricing. Same reason why they don't like it when you eat up static IP addresses....BUT they do offer those "elastic" IPs for ya now too. So EC2 + S3 really is a great solution. EVEN for the small guy....if you're obsessive anyway. I guess I could get a $30 VPS...but I want more.

    December 31, 1999 | Unregistered Commenter Tom

    Interesting article. In case you haven't seen it, check out an article some guys at Microsoft wrote about scaling for a large number of clients quickly:

    http://www.usenix.org/events/usenix08/tech/full_papers/elson/elson_html/index.html

    December 31, 1999 | Unregistered Commenter Ron Cormier
