Friday
Jun 05, 2009

HotPads Shows the True Cost of Hosting on Amazon

Mather Corgan, president of HotPads, gave a great talk on how HotPads uses AWS to run their real estate search engine. I loved the presentation for a few reasons:

  • It gives real costs on their servers, how many servers they have, what they are used for, and exactly how they use S3, EBS, CloudFront and other AWS services. This is great information for anybody trying to architect a system and wondering where to run it.
  • HotPads is a "real" application. It's a small company and at 4.5 million page-views/month it's large but not super large. It has custom server side components like indexing engines, image processing, and background database update engines for syncing new real estate data. And it also stores a lot of images and has low latency requirements.

    This is a really good example of the mix many companies have, or would like to have, in their applications.

    Their total costs are about $11K/month, which is about what they were paying at their previous provider. I found this a little surprising, as I thought the cloud would be more expensive, but they only pay for what they need instead of having to overprovision for transient uses like testing. And some servers aren't necessary anymore: EBS handles backups, so the extra database slave servers that existed only for backups are no longer required.

    There are lots more lessons like this that I've abstracted down below.

    Site: http://hotpads.com - a map-based real estate search engine, listing homes for sale, apartments, condos, and rental houses.

    Stats

  • 800,000 visits/month
  • 4.5 million page-views/month
  • 3.5 million real-estate listings updated daily

    Platform

  • Java
  • MySQL
  • AWS

    Costs

  • EC2 - $7400/month - they run about 20 instances of various sizes at any one time. Most of the work is background processing of images, not web serving.
    * $150: 2 Small HAProxy Load Balancers - 2 for failover. These have the elastic IPs, and round robin DNS points at the elastic IPs.
    * $1,200: 3-5 Large Tomcat Web Servers - an array of 3 runs at night and 5 during the day.
    * $1,500: 5 Large Tomcat Job Servers
    * $900: 1 X-Large and 1 Large Index Server - used to power property search; they have several GB of RAM for the JVM.
    * $1,200: 1 X-Large and 2 Large MySQL masters
    * $1,200: 1 X-Large and 2 Large MySQL slaves
    * $300: 1 Large Messaging Server (ActiveMQ) - will be replaced with SQS
    * $300: 1 Large Map Tile Creation Server (TileCache)
    * $600: Development/testing/migration servers
  • S3 - $1500/month - a few hundred million objects: files for maps and real-estate listing photos. 4TB of database backups are stored as EBS diffs ($600/month).
  • Elastic Block Storage - $500/month
  • CloudFront - $460/month - used to serve static files, map tiles, and listing photos throughout the world.
  • Elastic IP Addresses - $8/month
  • RightScale - $500/month - used for management and deployment.
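
    As a sanity check, the line items above can be tallied in a few lines of Python. This is only a sketch using the figures exactly as quoted. The dictionary keys are shortened labels of my own, and it is an assumption that the $600/month for backups is already included in the $1500 S3 figure; the second total shows what happens if it is billed on top.

```python
# Monthly figures as quoted in the cost list above (USD).
ec2_line_items = {
    "HAProxy load balancers": 150,
    "Tomcat web servers": 1200,
    "Tomcat job servers": 1500,
    "Index servers": 900,
    "MySQL masters": 1200,
    "MySQL slaves": 1200,
    "ActiveMQ messaging server": 300,
    "Map tile creation server": 300,
    "Dev/test/migration servers": 600,
}

services = {
    "EC2": 7400,
    "S3": 1500,
    "EBS": 500,
    "CloudFront": 460,
    "Elastic IPs": 8,
    "RightScale": 500,
}

# Assumption: the $600 backup figure may or may not be part of the S3 line.
BACKUP_DIFFS = 600

ec2_itemized = sum(ec2_line_items.values())
monthly_total = sum(services.values())
total_if_backups_extra = monthly_total + BACKUP_DIFFS

print(f"EC2 itemized: ${ec2_itemized:,} (quoted: ${services['EC2']:,})")
print(f"All services: ${monthly_total:,}")
print(f"If backups are billed on top of S3: ${total_if_backups_extra:,}")
for name, cost in sorted(services.items(), key=lambda kv: -kv[1]):
    print(f"  {name:<12} {cost / monthly_total:6.1%}")
```

    Either way the result lands close to the "about $11K/month" quoted earlier, with EC2 accounting for roughly 70% of the bill.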

    Lessons Learned

  • A major reason for choosing EC2 was the cloud API, which allows adding servers at any time. At their previous hosting service they had to prepay for a month at a time, so they would order the minimum necessary to get by that month. That doesn't leave room for development servers, test servers, preview servers for customers, or live database server upgrades (which require 2x the servers).
  • Overall cost is about the same as with the previous hosting provider, but the speed of development and ease of management are night and day different. They are getting more servers and a lot more flexibility.
  • HotPads is a small company and doesn't think the added trouble of colocation is worth it for them yet.
  • Advantage of Amazon over something like Google App Engine is that Amazon allows you to innovate by building your own services on your own machines.
  • S3 is better for larger objects. For small files that are not viewed often, the cost of PUTs outweighs everything else, so it's not a cache to use for short-lived objects.
    * A 67 KB object (600 px image) is about where the cost of putting an image into S3 equals the cost of storing it there.
    * For a 6.7 KB object (15 px thumbnail) the PUT cost (the small fee for putting an object into S3) is 10x the storage and transfer costs.
  • Costs have to be figured into the algorithms you use.
    * In April, 330 GB of images downloaded at $.15/GB cost $49. 55mm GETs at $1/mm cost $55. 42mm PUTs at $10/mm cost $420!
    * $100 for downloads and GETs of map tiles.
    * So S3 is very cheap for larger files, but watch out for lots of short-lived small files.
  • CloudFront is 10 times faster than S3 but is more expensive for infrequently viewed files.
    * Makes frequently viewed listings faster.
    * For infrequently viewed listings, CloudFront has to go to S3 to get the file the first time, which means you pay twice for a file that will be viewed only once.
  • EBS
    * Used on database servers because it's faster than local storage (especially for random writes), data blocks are stored redundantly, and it supports easy backups and versioning via cloning.
    * Only 10% cost overhead.
    * Allowed them to get rid of a second set of slaves. Backups were so CPU intensive that they had to have slaves just to do the backups. EBS allows snapshots of running drives, so the extra slaves are unnecessary.
    * Databases are I/O bound and the CPU is vastly underutilized so there's extra capacity when you need it.
  • SimpleDB - not using it; it's pretty proprietary. It may be of value because you only pay for what you use, given how underutilized your own database servers can be.
  • Reserved Instances
    * You get 1 year for the cost of 6 months and are guaranteed to get an instance (they were denied one once).
    * The con is being tied to an instance type. They want the flexibility to choose instance types as their software changes and to take advantage of new instance types as they are released.
  • Rather than having dedicated memcached machines they've scavenged 8 GB of memory from their existing servers.
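
    The S3 numbers in the lessons above can be reproduced with a little arithmetic. The sketch below is not HotPads' own calculation. The per-request rates are inferred from the quoted April totals ($55 for 55 million GETs, $420 for 42 million PUTs, which works out to $10 per million PUTs), and applying the quoted $.15/GB rate to a single store-or-download of an object is an assumption. The function and constant names are made up for illustration.

```python
# Rates implied by the April figures quoted above (USD).
PER_GB = 0.15             # $49 for 330 GB downloaded
GET_PER_MILLION = 1.00    # $55 for 55 million GETs
PUT_PER_MILLION = 10.00   # $420 for 42 million PUTs
KB_PER_GB = 1e6           # decimal units, assumed


def s3_request_bill(download_gb, gets, puts):
    """Return (transfer, get, put) costs in dollars for one month."""
    transfer = download_gb * PER_GB
    get_cost = gets / 1e6 * GET_PER_MILLION
    put_cost = puts / 1e6 * PUT_PER_MILLION
    return transfer, get_cost, put_cost


def per_gb_cost(size_kb):
    """Cost of storing or downloading one object once at the per-GB rate."""
    return size_kb / KB_PER_GB * PER_GB


transfer, get_cost, put_cost = s3_request_bill(330, 55e6, 42e6)
print(f"transfer ${transfer:.2f}, GETs ${get_cost:.2f}, PUTs ${put_cost:.2f}")

# Object size at which one PUT costs as much as the per-GB charge.
put_per_object = PUT_PER_MILLION / 1e6
break_even_kb = put_per_object / PER_GB * KB_PER_GB
print(f"break-even object size: {break_even_kb:.1f} KB")

# How many times larger the PUT fee is than the per-GB charge for a thumbnail.
ratio = put_per_object / per_gb_cost(6.7)
print(f"PUT fee vs per-GB cost for a 6.7 KB object: {ratio:.1f}x")
```

    Under these assumptions the break-even lands at roughly 67 KB and the thumbnail ratio at roughly 10x, which matches the figures in the talk. The PUT fee is fixed per object, so it dominates once objects get small.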

    Related Sites

  • AWS Start-Up Event DC 2009: HotPads On AWS Slideshow.
  • Cloud Programming Directly Feeds Cost Allocation Back into Software Design
  • AWS Elastic Load Balancer Tutorial

    Friday
    Jun 05, 2009

    SSL RPC API Scalability

    Hi all!

    So nice to start discussing cool things in this even cooler forum :)

    I am having a problem which I believe is already solved, but I would love for someone to confirm that from actual experience with the same topic.

    We are building a client / server architecture, consisting of a web server part and many clients.
    Transport will be provided as either XML-RPC / SOAP / JSON or all at once.
    All of the communication has to be encrypted and passed within SSL3.

    We expect a high load when the application starts (> 2000 concurrent requests).
    Combine this with xml parsing for the rpc api, things really look ugly :)
    So it's a big mess :)

    It will not be that much database bound behind the api - mostly files will be transferred from the server to the clients and simple api for control.

    So it's pretty much a matter of 'what-to-do-with-ssl'.

    I was thinking of hardware - NetApp or a similar application accelerator.
    Can anyone give examples of a hardware piece that combines: Load balancer / SSL accelerator?

    I have also been reading about open source software load balancers, but I really doubt they would meet our needs. Does anyone have (or has anyone had) the same experience? :)

    Thanks, all!

    Thursday
    Jun 04, 2009

    New Book: Even Faster Web Sites: Performance Best Practices for Web Developers

    Performance is critical to the success of any web site, and yet today's web applications push browsers to their limits with increasing amounts of rich content and heavy use of Ajax. In his new book Even Faster Web Sites: Performance Best Practices for Web Developers, Steve Souders, web performance evangelist at Google and former Chief Performance Yahoo!, provides valuable techniques to help you optimize your site's performance.

    Souders' previous book, the bestselling High Performance Web Sites, shocked the web development world by revealing that 80% of the time it takes for a web page to load is on the client side. In Even Faster Web Sites, Souders and eight expert contributors provide best practices and pragmatic advice for improving your site's performance in three critical categories:

    • JavaScript - Get advice for understanding Ajax performance, writing efficient JavaScript, creating responsive applications, loading scripts without blocking other components, and more.

    • Network - Learn to share resources across multiple domains, reduce image size without loss of quality, and use chunked encoding to render pages faster.

    • Browser - Discover alternatives to iframes, how to simplify CSS selectors, and other techniques.

    Speed is essential for today's rich media web sites and Web 2.0 applications. With this book, you'll learn how to shave precious seconds off your sites' load times and make them respond even faster.

    About the Author

    Steve Souders works at Google on web performance and open source initiatives. His book High Performance Web Sites explains his best practices for performance along with the research and real-world results behind them. Steve is the creator of YSlow, the performance analysis extension to Firebug. He is also co-chair of Velocity 2008, the first web performance conference sponsored by O'Reilly. He frequently speaks at such conferences as OSCON, Rich Web Experience, Web 2.0 Expo, and The Ajax Experience.

    Steve previously worked at Yahoo! as the Chief Performance Yahoo!, where he blogged about web performance on Yahoo! Developer Network. He was named a Yahoo! Superstar. Steve worked on many of the platforms and products within the company, including running the development team for My Yahoo!.

    Tuesday
    Jun 02, 2009

    GigaSpaces Launches a New Version of its Cloud Computing Framework

    This post includes details on who is using the platform and how, from enterprise applications, to ISVs looking for SaaS enablement, to partners and solution providers looking to gain a competitive advantage by deploying applications with a short time to market and a small initial investment.

    Monday
    Jun 01, 2009

    Guess How Many Users it Takes to Kill Your Site?

    Update: Here's the first result. Good response time until 400 users. At 1,340 users the response time was 6 seconds. And at 2,000 users the site was effectively dead. An interesting point was that errors that could harm a site's reputation started at 1,000 users. Cheers to the company that had the guts to give this a try.

    That which doesn't kill your site makes it stronger. Or at least that's the capacity planning strategy John Allspaw recommends (not really, but I'm trying to make a point here) in The Art of Capacity Planning:

    Using production traffic to define your resources ceilings in a controlled setting allows you to see firsthand what would happen when you run out of capacity in a particular resource. Of course I'm not suggesting that you run your site into the ground, but better to know what your real (not simulated) loads are while you're watching, than find out the hard way. In addition, a lot of unexpected systemic things can happen when load increases in a particular cluster or resource, and playing "find the butterfly effect" is a worthwhile exercise.

    The problem is how do you ever test to such a scale? That's where Randy Hayes of CapCal--a distributed performance testing system--comes in. Randy first contacted me asking for volunteers to try a test of a million users, which sounded like a great High Scalability sort of thing to do. Unfortunately he already found a volunteer so the idea now is to test how many users it takes to find a weakness in your site.

    If anyone wants to test their system to the breaking point, the process goes like this:

  • Guess how many users it will take to bring your average response time to two seconds.
  • Contact Randy at randy@capcal.com.
  • Download the CapCal client, record the test, and upload it to the server.
  • At a scheduled time the test will be run by ramping up virtual users until the average response time is >= 2 seconds.
  • You will get a link to the results on the CapCal server. Here's an example result.
  • How close was your guess?
  • The cost will be whatever Amazon charges. An hour's worth of tests on virtual user counts up to 10,000 is about $45.
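
    The ramp-up step in the process above can be sketched as a simple loop. This is not CapCal's implementation or API; `ramp_until_slow` and `toy_site` are hypothetical names, and `toy_site` is a made-up latency model standing in for real measurements against a real site.

```python
def ramp_until_slow(measure_avg_response, start_users=100, step=100,
                    max_users=10_000, threshold_s=2.0):
    """Increase virtual users until average response time crosses the threshold.

    measure_avg_response(users) -> average response time in seconds.
    Returns (users, avg_seconds) for the first step that meets or exceeds
    the threshold, or None if max_users is reached first.
    """
    users = start_users
    while users <= max_users:
        avg = measure_avg_response(users)
        if avg >= threshold_s:
            return users, avg
        users += step
    return None


def toy_site(users, capacity=2000, base_s=0.25):
    """Hypothetical site model: latency climbs as load nears capacity."""
    if users >= capacity:
        return float("inf")
    return base_s / (1 - users / capacity)


guess = 1500  # your guess from the first step of the process
result = ramp_until_slow(toy_site)
if result is None:
    print("Threshold never reached")
else:
    users, avg = result
    print(f"Average response time hit {avg:.2f}s at {users} users "
          f"(guess was off by {abs(users - guess)})")
```

    In a real run, the measurement function would drive load generators and average the observed response times. The step size trades test duration against how precisely the breaking point is located.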

    In the past, test generators were fun to write, but it was always difficult to get enough boxes to generate sufficient load. Maybe you remember installing test agents on people's work computers in cubeland so tests could be run overnight when everyone was sleeping?

    The cloud has changed all that. Testing-as-a-Service is one very obvious and solid use of the cloud. You need load? We got your load right here. Spin up more machines and you can drive your site into oblivion, but not in a denial-of-service attack sort of way :-)

    Randy has a nice write-up of how their system works in CapCal Architecture and Background. It's similar in concept to other distributed testing frameworks you may have used, only this one operates in AWS and not on your own servers.

    Not everyone is Google or Yahoo with zillions of users to test their software against. If you are interested in testing your site please contact Randy and give it a go. And when you are done it would be fun to have an experience report here about what you learned and what changes you needed to make.
    Monday
    Jun 01, 2009

    HotPads on AWS

    HotPads abandoned our managed hosting in December and took the leap over to EC2 and its siblings. The presentation has a lot of detail on costs and other things to watch out for, so if you're currently planning your "cloud" architecture, you'll find some of this really helpful.

    Monday
    Jun 01, 2009

    Data grid comparison: Oracle Coherence vs Gigaspaces XAP

    A short summary of differences between Oracle Coherence and GigaSpaces XAP.

    Sunday
    May 31, 2009

    Need help on Site loading & database optimization - URGENT

    Hi Friends,

    I need some help making my site faster. On average my site gets 2,500 hits per day, but on 16th May it had 60,000 hits. On that day the site loaded very slowly and even timed out. I checked the running processes using the "top" command, and it indicated that MySQL was taking most of the load.

    There are around 166 tables (Including PHPBB forum) in my database. All contents on site are displayed by fetching it from database. I have also added indexing to respective tables where it is required. Plain PHP/HTML coding is used.

    Technology:

    PHP -- 5.2
    MYSQL -- 5.0
    Apache -- 2.0
    Linux

    Following is all the server details of my site:

    CPU : Single Socket Dual Core AMD Opteron 1212HE
    Memory: 2GB DDR RAM
    Hard Drive: 250GB SATA
    Ethernet: 100Mb Primary Ethernet Card

    (/var/log) # uname -a
    Linux 2.6.9-67.0.15.ELsmp #1 SMP Tue Apr 22 13:50:33 EDT 2008 i686 athlon i386 GNU/Linux

    kernel version:
    2.6.9-67.0.15.ELsmp

    (/var/log) # free -m
                 total       used       free     shared    buffers     cached
    Mem:          2026       1976         49          0        143       1474
    -/+ buffers/cache:        359       1667
    Swap:         1027          0       1027

    RAM: 2 G

    (/var/log) # df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda5             227G   20G  196G  10% /
    /dev/sda1              99M   12M   82M  13% /boot
    none                 1014M     0 1014M   0% /dev/shm
    /dev/sda2             2.0G  196M  1.7G  11% /tmp

    Disk usage: 10% used/ 196 G available.

    It's a dedicated server and only 1 website is hosted on it.

    Can anybody please suggest how I can optimize the site so that it will not go down if traffic increases?

    Thanks
    Sandy

    Sunday
    May 31, 2009

    Parallel Programming for real-world

    Multicore computers shift the burden of software performance from chip designers and architects to software developers.

    What is parallel computing? What is the difference between multi-threading, concurrency, and parallelism? What are the differences between task and data parallelism? And how can we use them?

    A fundamental article on parallel programming...

    Friday
    May 29, 2009

    Is Eucalyptus ready to be your private cloud?


    Update: Eucalyptus Goes Commercial with $5.5M Funding Round. This removes my objection that it's an academic project only. Go team go!

    Rich Wolski, professor of Computer Science at the University of California, Santa Barbara, gave a spirited talk on Eucalyptus to a large group of very interested cloudsters at the Eucalyptus Cloud Meetup. If Rich could teach computer science at every school the state of the computer science industry would be stratospheric. Rich is dynamic, smart, passionate, and visionary. It's that vision that prompted him to create Eucalyptus in the first place. Rich and his group are experts in grid and distributed computing, having a long and glorious history in that space. When he saw cloud computing on the rise he decided the best way to explore it was to implement what everyone accepted as a real cloud, Amazon's API. In a remarkably short time they implemented Eucalyptus and have been improving it and tracking Amazon's changes ever since.

    The question I had going into the meetup was: should Eucalyptus be used to make an organization's private cloud? The short answer is no. Wait wait, it's now yes, see the update at the beginning of the article.

    The project is of high quality, the people are of the highest quality, but in the end Eucalyptus is a research project from a university. As an academic project Eucalyptus is subject to changes in funding and the research interests of the team. When funding sources dry up so does the project. If the team finds another research area more interesting, or if they get tired of chasing a continuous stream of new Amazon features, or no new grad students sign on, which will happen in a few years, then the project goes dark.

    Fears over continuity have at least two solutions: community support and commercial support. Eucalyptus could become a community-supported open source project. This is unlikely to happen though, as it conflicts with the research intent of Eucalyptus. The Eucalyptus team plans to control the core for research purposes and encourage external development of add-on services like SQS. Eucalyptus won't go commercial, as university projects must stay clear of commercial pretensions. Amazon is "no comment" on Eucalyptus, so it's not clear what they would think of commercial development should it occur.

    Taken together these concerns imply Eucalyptus is not a good base for an enterprise quality private cloud. Which they readily admit. It's not enterprise ready, Rich repeats. It's not that the quality isn't there. It is and will be. And some will certainly base their private cloud on Eucalyptus, but when making a decision of this type you have to be sure your cloud infrastructure will be around for the long haul. With Eucalyptus that is not necessarily the case. Eucalyptus is still a good choice for its original research purpose, or as a cheap staging platform for Amazon, or as a base for temporary clouds, but as your rock solid private cloud infrastructure of the future Eucalyptus isn't the answer.

    The long answer is a little more nuanced and interesting.

    The primary purpose for Eucalyptus is research. It was never meant to be our little untethered private Amazon cloud. But if it works, why not?

    Eucalyptus is Not a Full Implementation of the Amazon Stack

    Eucalyptus implements most of EC2 and a little of S3. They hope to get community support for the rest. That of course makes Eucalyptus far less interesting as a development platform. But if your use for Eucalyptus is as an instant provisioning framework you are still in the game. Their emulation of EC2 is so good RightScale was able to operate on top of Eucalyptus. Impressive.

    But even in the EC2 arena I have to wonder for how long they'll track Amazon development. If you are a researcher implementing every new Amazon feature is going to get mighty old after a while. It will be time to move on and if you are dependent on Eucalyptus you are in trouble. Sure, you can move to Amazon but what about that $1 million data center buildout?

    If you are developing software that is not tied to the Amazon service stack, then Eucalyptus would work great.

    As an Amazon developer I would want my code to work without too much trouble in both environments. Certainly you can mock the different services for testing or create a service layer to hide different implementations, but that's not ideal and makes Eucalyptus as an Amazon proxy less attractive.

    One of the uses for Eucalyptus is to make Amazon cheaper and easier by testing code locally without having to deploy into Amazon all the time. Given the size of images the bandwidth and storage costs add up after a while, so this could make Eucalyptus a valuable part of the development process.

    Eucalyptus is Not as Scalable as Amazon

    No kidding. Amazon has an army of sysadmins, network engineers, and programmers to make their system work at such ginormous scales. Eucalyptus was built on smarts, grit and pizza. It will never scale as well as Amazon, but Eucalyptus is scalable to 256 nodes right now. Which is not bad.

    Rich thinks with some work they already know about it could scale to 5000 nodes. Not exactly Amazon scale, but good enough for many data center dreams.

    One big limit Eucalyptus has is the self-imposed requirement to work well in any environment. It's just a tarball you can install on top of any network. They rightly felt this was necessary for adoption. Saying to potential customers that you need to set up a special network before you can test this software tends to slow down adoption. By making Eucalyptus work as an overlay they soothed a lot of early adopter pain.

    But by giving up control of the machines, the OS, the disk, and the network they limited how scalable they can be. There's more to scalability than just software. Amazon has total control and that gives them power. Eucalyptus plans to make more invasive and more scalable options available in the future.

    Lacks Some Private Cloud Features

    Organizations interested in a private cloud are often interested in:

  • Control
  • Privacy and Security
  • Utility Chargeback System
  • Instant Provisioning Framework
  • Multi-tenancy
  • Temporary Infrastructure for Proof of Concept for "Real" Provisioning
  • Cloud Management Infrastructure

    Eucalyptus satisfies many of these needs, but a couple are left wanting:
  • The Utility Chargeback System allows companies to bill back departments for the resources they use and is a great way to get around a rigid provisioning process and still provide accountability back to the budgeting process. Eucalyptus won't do this for you.
  • A first class Cloud Management Infrastructure is not part of Eucalyptus because it's not part of Amazon's API. Amazon doesn't expose their internal management process. Eucalyptus is adding some higher level management tools, but they'll be pretty basic.

    These features may or may not be important to you.

    Clouds vs Grids

    Endless pixels have been killed defining clouds, grids, and how they are different enough that there's really a whole new market to sell into. Rich actually makes a convincing argument that grids and clouds are different and do require a completely different infrastructure. The differences:

    Cloud

  • Full private cluster is provisioned
  • Individual user can only get a tiny fraction of the total resource pool
  • No support for cloud federation except through the client interface
  • Opaque with respect to resources

    Grid

  • Built so that individual users can get most, if not all of the resources in a single request
  • Middleware approach takes federation as a first principle
  • Resources are exposed, often as bare metal

    Related Articles

  • Get Off of My Cloud by M. Jagger and K. Richards.
  • Rich Wolski's Home Page
  • Enomaly
  • Nimbus