Friday, Jun 05, 2009
HotPads Shows the True Cost of Hosting on Amazon

Mather Corgan, president of HotPads, gave a great talk on how HotPads uses AWS to run their real estate search engine. I loved the presentation for a few reasons:
This is a really good example of where many companies are, or would like to be, with their applications.
Their total costs are about $11K/month, which is about what they were paying at their previous provider. I found this a little surprising, as I thought the cloud would be more expensive, but they only pay for what they need instead of having to overprovision for transient uses like testing. And some servers aren't necessary anymore: EBS handles backups, so dedicated database slave servers for backups are no longer required.
There are lots more lessons like this that I've abstracted down below.
Site: http://hotpads.com - a map-based real estate search engine, listing homes for sale, apartments, condos, and rental houses.
Stats
Platform
Costs
* $150: 2 Small HAProxy Load Balancers - 2 for failover; these hold the elastic IPs, and round-robin DNS points at the elastic IPs.
* $1,200: 3-5 Large Tomcat Web Servers - an array of 3 run at night and 5 during the day.
* $1,500: 5 Large Tomcat Job Servers
* $900: 1 X-Large and 1 Large Index Servers - used to power property search; they have several GB of RAM for the JVM
* $1,200: 1 X-Large and 2 Large MySQL masters
* $1,200: 1 X-Large and 2 Large MySQL slaves
* $300: 1 Large Messaging Server (ActiveMQ) - will be replaced with SQS
* $300: 1 Large Map Tile Creation Server (TileCache)
* $600: Development/testing/migration servers
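As a quick check on the line items above, here is a small Python sketch that tallies them. The dollar figures are copied from the list; the dictionary keys are shorthand labels invented for this sketch, and the $11K total is only the approximate figure quoted earlier, so the remainder is a rough estimate of everything not itemized here.

```python
# Monthly server line items, in dollars, copied from the list above.
# The keys are shorthand labels chosen for this sketch, not names from the talk.
MONTHLY_SERVER_COSTS = {
    "haproxy_load_balancers": 150,
    "tomcat_web_servers": 1200,
    "tomcat_job_servers": 1500,
    "index_servers": 900,
    "mysql_masters": 1200,
    "mysql_slaves": 1200,
    "activemq_messaging": 300,
    "tilecache_map_tiles": 300,
    "dev_test_migration": 600,
}

# "About $11K/month" is the approximate total quoted in the article.
APPROX_TOTAL_BILL = 11_000


def server_subtotal(costs):
    """Return the sum of the itemized monthly server costs."""
    return sum(costs.values())


if __name__ == "__main__":
    subtotal = server_subtotal(MONTHLY_SERVER_COSTS)
    remainder = APPROX_TOTAL_BILL - subtotal
    print(f"Itemized servers: ${subtotal:,}/month")
    print(f"Rest of the ~$11K bill: roughly ${remainder:,}/month")
```

The itemized servers account for roughly two thirds of the quoted total; the rest would have to come from charges that are not broken out in this list.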
Lessons Learned
* A 67 KB object (600 px image) is the break-even point: the cost of putting the image into S3 equals the cost of storing it there and is about equal to the cost of transferring it once.
* For a 6.7 KB object (15 px thumbnail), the PUT cost (the small fee for putting an object into S3) is 10x the storage or transfer cost.
* In April, 330 GB of images downloaded at $.15/GB cost $49. 55mm GETs at $1/mm cost $55. 42mm PUTs at $.01/1k cost $420!
* $100 for downloads and GETs of map tiles.
* So S3 is very cheap for larger files, but watch out for lots of short-lived small files.
* CloudFront makes frequently viewed listings faster.
* For infrequently viewed listings, CloudFront has to go to S3 to get the file the first time, which means you pay twice for a file that will be viewed only once.
* EBS is used on database servers because it's faster than local storage (especially for random writes), blocks of data are redundant, and it supports easy backups and versioning via cloning.
* The cost overhead is only 10%.
* EBS allowed them to get rid of a second set of slaves. Backups were so CPU intensive that they needed dedicated slaves just to run them; EBS can snapshot running drives, so the extra slaves are unnecessary.
* Databases are I/O bound and the CPU is vastly underutilized so there's extra capacity when you need it.
* Reserved instances: 1 year for the cost of 6 months, and you are guaranteed to get an instance (they were denied one time).
* The con is being tied to an instance type. They want more flexibility to choose instance types as their software changes and to take advantage of new instance types as they are released.
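The S3 arithmetic in the lessons above can be checked with a short Python sketch. The transfer and GET prices are the ones quoted in the list; the PUT price of $0.01 per thousand is the rate implied by the $420 figure for 42 million PUTs. The storage price of $0.15 per GB-month is an assumption that is not stated in the article, and the function names are made up for illustration.

```python
# Prices taken from the lessons above (April figures).
TRANSFER_PER_GB = 0.15     # dollars per GB downloaded
GET_PER_MILLION = 1.00     # dollars per million GET requests
PUT_PER_THOUSAND = 0.01    # dollars per thousand PUTs (implied by 42mm PUTs = $420)

# ASSUMPTION, not stated in the article: storage at $0.15 per GB-month.
STORAGE_PER_GB_MONTH = 0.15

BYTES_PER_GB = 1_000_000_000  # decimal GB keeps the arithmetic simple


def put_cost():
    """Cost in dollars of a single PUT request."""
    return PUT_PER_THOUSAND / 1000


def storage_cost(size_bytes, months=1):
    """Cost in dollars of storing one object for the given number of months."""
    return size_bytes / BYTES_PER_GB * STORAGE_PER_GB_MONTH * months


def transfer_cost(size_bytes, downloads=1):
    """Cost in dollars of downloading one object the given number of times."""
    return size_bytes / BYTES_PER_GB * TRANSFER_PER_GB * downloads


def break_even_bytes():
    """Object size at which one PUT costs the same as one month of storage."""
    return put_cost() * BYTES_PER_GB / STORAGE_PER_GB_MONTH


def monthly_request_bill(gb_downloaded, gets, puts):
    """Break a month of S3 traffic into transfer, GET, and PUT charges."""
    return {
        "transfer": gb_downloaded * TRANSFER_PER_GB,
        "gets": gets / 1_000_000 * GET_PER_MILLION,
        "puts": puts / 1_000 * PUT_PER_THOUSAND,
    }


if __name__ == "__main__":
    print(f"Break-even object size: {break_even_bytes() / 1000:.1f} KB")
    ratio = put_cost() / storage_cost(6_700)
    print(f"PUT vs. one month of storage for a 6.7 KB object: {ratio:.1f}x")
    for item, dollars in monthly_request_bill(330, 55_000_000, 42_000_000).items():
        print(f"{item}: ${dollars:,.2f}")
```

Under these assumed prices the break-even size comes out at about 67 KB, matching the rule of thumb, and the PUT charge dominates the April bill even though PUTs are fewer than GETs.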
Reader Comments (5)
Love the 67kb rule of thumb for S3 put/storage ratio. Good to know!
Wow. This is a good study and thanks for the data and details. We at http://oCricket.com started off with AWS from Ground Zero and plan to go all the way along.
I always enjoy learning what other people think about Amazon Web Services and how they use them. Check out my very own tool, CloudBerry Explorer, which helps to manage S3 on Windows. It is freeware.
We at http://www.binfire.com decided not to go with EC2 and S3. We are using our own servers. So far it has been more cost effective for us, but this article gave me a few details which I did not have before!
Thanks...
You should check out a company called OpSource. www.OpSource.net
They provide a total web operations solution for many of today's largest enterprise SaaS and On-Demand applications.
Amazon is great if you're just starting off, but when you start to get serious and large enough you have to take into account many other hidden costs not included here. For example, I don't see any information breaking down staff costs. Also, how does Amazon help in relation to optimization? They seem to allow you to throw a ton of hardware at a problem, but if the application had some suggested optimizations, perhaps you would pay even less long term.
I would also ask about compliance, monitoring, and uptime.