High Scalability -

33 Comments |

Permalink |

Ask 37signals: How do you process credit cards?

Microsoft

Tuesday

Jul282009

37signals Architecture

Tuesday, July 28, 2009 at 2:15PM

Update 7: Basecamp, now with more vroom. Basecamp application servers running Ruby code were upgraded and virtualization was removed. The result: A 66 % reduction in the response time while handling multiples of the traffic is beyond what I expected. They still use virtualization (Linux KVM), just less of it now.
Update 6: Things We’ve Learned at 37Signals. Themes: less is more; don't worry be happy.
Update 5: Nuts & Bolts: HAproxy . Nice explanation (post, screencast) by Mark Imbriaco of why HAProxy (load balancing proxy server) is their favorite (fast, efficient, graceful configuration, queues requests when Mongrels are busy) for spreading dynamic content between Apache web servers and Mongrel application servers.
Update 4: O'Rielly's Tim O'Brien interviews David Hansson, Rails creator and 37signals partner. Says BaseCamp scales horizontally on the application and web tier. Scales up for the database, using one "big ass" 128GB machine. Says: As technology moves on, hardware gets cheaper and cheaper. In my mind, you don't want to shard unless you positively have to, sort of a last resort approach.
Update 3: The need for speed: Making Basecamp faster. Pages now load twice as fast, cut CPU usage by a third and database time by about half. Results achieved by: Analysis, Caching, MySQL optimizations, Hardware upgrades.
Update 2: customer support is handled in real-time using Campfire.
Update: highly useful information on creating a customer billing system.

In the giving spirit of Christmas the folks at 37signals have shared a bit about how their system works. 37signals is most famous for loosing Ruby on Rails into the world and they've use RoR to make their very popular Basecamp, Highrise, Backpack, and Campfire products. RoR takes a lot of heat for being a performance dog, but 37signals seems to handle a lot of traffic with relatively normal sounding resources. This is just an initial data dump, they promise to add more details later. As they add more I'll update it here.

Site: http://www.37signals.com

Information Sources

Ask 37signals: Numbers?

Behind the scenes at 37signals: Support

Ask 37signals: Why did you restart Highrise?

Platform

Ruby on Rails

Memcached

Xen

MySQL

S3 for image storage

The Stats

30 servers ranging from single processor file servers to 8 CPU application servers for about 100 CPUs and 200GB of RAM.

Plan to diagonally scale by reducing the number of servers to 16 for about 92 CPU cores (each significantly faster than what are used today) and 230 GB of combined RAM.

Xen virtualization will be used to improve system management.

Basecamp (web based project management)
* 2,000,000 people with accounts
* 1,340,000 projects
* 13,200,000 to-do items
* 9,200,000 messages
* 12,200,000 comments
* 5,500,000 time tracking entries
* 4,000,000 milestones

Backpack (personal and small business information management)
* Just under 1,000,000 pages
* 6,800,000 to-do items
* 1,500,000 notes
* 829,000 photos
* 370,000 files

Overall storage stats (Nov 2007)
* 5.9 terabytes of customer-uploaded files
* 888 GB files uploaded (900,000 requests)
* 2 TB files downloaded (8,500,000 requests)

The Architecture

Memcached caching is used and they are looking to add more. Yields impressive performance results.

URL helper methods are used rather than building the URLs by hand.

Standard ActiveRecord built queries are used, but for performance reasons they will also "dig in and use" find_by_sql when necessary.

They fix Rails when they run into performance problems. It pays to be king :-)

Amazon’s S3 is used for storage of files upload by users. Extremely happy with results.

Credit Card Processing Process

Bill monthly. It makes credit card companies more comfortable because they won't be on the hook for a large chunk of change if your company goes out of business. Customers also like it better because it costs less up front and you don't need a contract. Just pay as long as you want the service.

Get a Merchant Account. One is needed to process credit cards. They use Chase Bank. Use someone you trust and later negotiate rates when you get enough volume that it matters.

Authorize.net is the gateway they use to process the credit card charge.

A custom built system handles the monthly billing. It runs each night and bills the appropriate people and records the result.

On success an invoice is sent via email.

On failure an explanation is sent to the customer.

If the card is declined three times the account is frozen until a valid card number is provided.

Error handling is critical because problems with charges are common. Freeze to fast is bad, freezing too slow is also bad.

All products are being converted to using a centralized billing service.

You need to be PCI DSS (Payment Card Industry Data Security Standard) compliant.

Use a gateway service that makes it so you don't have to store credit card numbers on your site. That makes your life easier because of the greater security. Some gateway services do have reoccurring billing so you don't have to do it yourself.

Customer Support

Campfire is used for customer service. Campfire is a web-based group chat tool, password-protectable, with chatting, file sharing, image previewing, and decision making.

Issues discussed are used to drive code changes and the subversion commit is shown in the conversation. Seems to skip a bug tracking system, which would make it hard to manage bugs and features in any traditional sense, ie, you can't track subversion changes back to a bug and you can't report what features and bugs are in a release.

Support can solve problems by customers uploading images, sharing screens, sharing files, and chatting in real-time.

Developers are always on within Campfire addressing problems in real-time with the customers.

Lessons Learned

Take a lesson from Amazon and build internal functions as services from the start. This make it easier to share them across all product lines and transparently upgrade features.

Don't store credit card numbers on your site. This greatly reduces your security risk.

Developers and customers should interact in real-time on a public forum. Customers get better service as developers handle issues as they come up in the normal flow of their development cycle. Several layers of the usual BS are removed. Developers learn what customers like and dislike which makes product development more agile. Customers can see the responsiveness of the company to customers by reading the interactions. This goes a long ways to give potential customers the confidence and the motivation to sign up.

Evolve your software by actual features needed by users instead of making up features someone might need someday. Otherwise you end up building something that nobody wants and won't work anyway.

Twitter API Traffic is 10x Twitter’s Site.

11 Comments |

Permalink |

MySQL,

RoR,

S3,

haproxy,

sharding

Saturday

Jun272009

Scaling Twitter: Making Twitter 10000 Percent Faster

Saturday, June 27, 2009 at 11:46PM

Update 6: Some interesting changes from Twitter's Evan Weaver: everything in RAM now, database is a backup; peaks at 300 tweets/second; every tweet followed by average 126 people; vector cache of tweet IDs; row cache; fragment cache; page cache; keep separate caches; GC makes Ruby optimization resistant so went with Scala; Thrift and HTTP are used internally; 100s internal requests for every external request; rewrote MQ but kept interface the same; 3 queues are used to load balance requests; extensive A/B testing for backwards capability; switched to C memcached client for speed; optimize critical path; faster to get the cached results from the network memory than recompute them locally.
Update 5: Twitter on Scala. A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners. A fascinating discussion of why Twitter moved to the Java JVM for their server infrastructure (long lived processes) and why they moved to Scala to program against it (high level language, static typing, functional). Ruby is used on the front-end but wasn't performant or reliable enough for the back-end.
Update 4: Improving Running Components at Twitter by Evan Weaver. Tells how Twitter changed their infrastructure to go from handling 3 requests to 139 requests a second. They moved to a messaging model, asynchronous process, 3 levels of cache, and moved their middleware to a mixture C and Scala/JVM.
Update 3: Upgrading Twitter without service disruptions by Gojko Adzic. Lots of good updates on the new Twitter architecture.
Update 2: a commenter in Twitter Fails Macworld Keynote Test said this entry needs to be updated. LOL. My uneducated guess is it's not a language or architecture problem, but more a problem of not being able to add hardware fast enough into their data center. The predictability of this problem is debatable, but once you have it, it's hard to fix.
Update: Twitter releases Starling - light-weight persistent queue server that speaks the MemCache protocol. It was built to drive Twitter's backend, and is in production across Twitter's cluster.

Twitter started as a side project and blew up fast, going from 0 to millions of page views within a few terrifying months. Early design decisions that worked well in the small melted under the crush of new users chirping tweets to all their friends. Web darling Ruby on Rails was fingered early for the scaling problems, but Blaine Cook, Twitter's lead architect, held Ruby blameless:

For us, it’s really about scaling horizontally - to that end, Rails and Ruby haven’t been stumbling blocks, compared to any other language or framework. The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January.

If Ruby on Rails wasn't to blame, how did Twitter learn to scale ever higher and higher?

Update: added slides Small Talk on Getting Big. Scaling a Rails App & all that Jazz

Site: http://twitter.com

Information Sources

Scaling Twitter Video by Blaine Cook.

Scaling Twitter Slides

Good News blog post by Rick Denatale

Scaling Twitter blog post Patrick Joyce.

A Small Talk on Getting Big. Scaling a Rails App & all that Jazz - really cute dog picks

The Platform

Ruby on Rails

Erlang

MySQL

Mongrel - hybrid Ruby/C HTTP server designed to be small, fast, and secure

Munin

Nagios

Google Analytics

AWStats - real-time logfile analyzer to get advanced statistics

Memcached

The Stats

Over 350,000 users. The actual numbers are as always, very super super top secret.

600 requests per second.

Average 200-300 connections per second. Spiking to 800 connections per second.

MySQL handled 2,400 requests per second.

180 Rails instances. Uses Mongrel as the "web" server.

1 MySQL Server (one big 8 core box) and 1 slave. Slave is read only for statistics and reporting.

30+ processes for handling odd jobs.

8 Sun X4100s.

Process a request in 200 milliseconds in Rails.

Average time spent in the database is 50-100 milliseconds.

Over 16 GB of memcached.

The Architecture

Ran into very public scaling problems. The little bird of failure popped up a lot for a while.

Originally they had no monitoring, no graphs, no statistics, which makes it hard to pinpoint and solve problems. Added Munin and Nagios. There were difficulties using tools on Solaris. Had Google analytics but the pages weren't loading so it wasn't that helpful :-)

Use caching with memcached a lot.
- For example, if getting a count is slow, you can memoize the count into memcache in a millisecond.
- Getting your friends status is complicated. There are security and other issues. So rather than doing a query, a friend's status is updated in cache instead. It never touches the database. This gives a predictable response time frame (upper bound 20 msecs).
- ActiveRecord objects are huge so that's why they aren't cached. So they want to store critical attributes in a hash and lazy load the other attributes on access.
- 90% of requests are API requests. So don't do any page/fragment caching on the front-end. The pages are so time sensitive it doesn't do any good. But they cache API requests.

Messaging
- Use message a lot. Producers produce messages, which are queued, and then are distributed to consumers. Twitter's main functionality is to act as a messaging bridge between different formats (SMS, web, IM, etc).
- Send message to invalidate friend's cache in the background instead of doing all individually, synchronously.
- Started with DRb, which stands for distributed Ruby. A library that allows you to send and receive messages from remote Ruby objects via TCP/IP. But it was a little flaky and single point of failure.
- Moved to Rinda, which a shared queue that uses a tuplespace model, along the lines of Linda. But the queues are persistent and the messages are lost on failure.
- Tried Erlang. Problem: How do you get a broken server running at Sunday Monday with 20,000 users waiting? The developer didn't know. Not a lot of documentation. So it violates the use what you know rule.
- Moved to Starling, a distributed queue written in Ruby.
- Distributed queues were made to survive system crashes by writing them to disk. Other big websites take this simple approach as well.

SMS is handled using an API supplied by third party gateway's. It's very expensive.

Deployment
- They do a review and push out new mongrel servers. No graceful way yet.
- An internal server error is given to the user if their mongrel server is replaced.
- All servers are killed at once. A rolling blackout isn't used because the message queue state is in the mongrels and a rolling approach would cause all the queues in the remaining mongrels to fill up.

Abuse
- A lot of down time because people crawl the site and add everyone as friends. 9000 friends in 24 hours. It would take down the site.
- Build tools to detect these problems so you can pinpoint when and where they are happening.
- Be ruthless. Delete them as users.

Partitioning
- Plan to partition in the future. Currently they don't. These changes have been enough so far.
- The partition scheme will be based on time, not users, because most requests are very temporally local.
- Partitioning will be difficult because of automatic memoization. They can't guarantee read-only operations will really be read-only. May write to a read-only slave, which is really bad.

Twitter's API Traffic is 10x Twitter’s Site
- Their API is the most important thing Twitter has done.
- Keeping the service simple allowed developers to build on top of their infrastructure and come up with ideas that are way better than Twitter could come up with. For example, Twitterrific, which is a beautiful way to use Twitter that a small team with different priorities could create.

Monit is used to kill process if they get too big.

Lessons Learned

Talk to the community. Don't hide and try to solve all problems yourself. Many brilliant people are willing to help if you ask.

Treat your scaling plan like a business plan. Assemble a board of advisers to help you.

Build it yourself. Twitter spent a lot of time trying other people's solutions that just almost seemed to work, but not quite. It's better to build some things yourself so you at least have some control and you can build in the features you need.

Build in user limits. People will try to bust your system. Put in reasonable limits and detection mechanisms to protect your system from being killed.

Don't make the database the central bottleneck of doom. Not everything needs to require a gigantic join. Cache data. Think of other creative ways to get the same result. A good example is talked about in Twitter, Rails, Hammers, and 11,000 Nails per Second.

Make your application easily partitionable from the start. Then you always have a way to scale your system.

Realize your site is slow. Immediately add reporting to track problems.

Optimize the database.
- Index everything. Rails won't do this for you.
- Use explain to how your queries are running. Indexes may not be being as you expect.
- Denormalize a lot. Single handedly saved them. For example, they store all a user IDs friend IDs together, which prevented a lot of costly joins.
- Avoid complex joins.
- Avoid scanning large sets of data.

Cache the hell out of everything. Individual active records are not cached, yet. The queries are fast enough for now.

Test everything.
- You want to know when you deploy an application that it will render correctly.
- They have a full test suite now. So when the caching broke they were able to find the problem before going live.

Long running processes should be abstracted to daemons.

Use exception notifier and exception logger to get immediate notification of problems so you can address the right away.

Don't do stupid things.
- Scale changes what can be stupid.
- Trying to load 3000 friends at once into memory can bring a server down, but when there were only 4 friends it works great.

Most performance comes not from the language, but from application design.

Turn your website into an open service by creating an API. Their API is a huge reason for Twitter's success. It allows user's to create an ever expanding and ecosystem around Twitter that is difficult to compete with. You can never do all the work your user's can do and you probably won't be as creative. So open you application up and make it easy for others to integrate your application with theirs.

For a discussion of partitioning take a look at Amazon Architecture, An Unorthodox Approach to Database Design : The Coming of the Shard, Flickr Architecture

The Mailinator Architecture has good strategies for abuse protection.

GoogleTalk Architecture addresses some interesting issues when scaling social networking sites.

Channel9 Interview with Markus Frind

76 Comments |

Permalink |

RoR,

Twitter

Friday

Jun262009

PlentyOfFish Architecture

Friday, June 26, 2009 at 3:58PM

Update 5: PlentyOfFish Update - 6 Billion Pageviews And 32 Billion Images A Month
Update 4: Jeff Atwood costs out Markus' scale up approach against a scale out approach and finds scale up wanting. The discussion in the comments is as interesting as the article. My guess is Markus doesn't want to rewrite his software to work across a scale out cluster so even if it's more expensive scale up works better for his needs.
Update 3: POF now has 200 million images and serves 10,000 images served per second. They'll be moving to a 250,000 IOPS RamSan to handle the load. Also upgraded to a core database machine with 512 GB of RAM, 32 CPU’s, SQLServer 2008 and Windows 2008.
Update 2: This seems to be a POF Peer1 love fest infomercial. It's pretty content free, but the production values are high. Lots of quirky sounds and fish swimming on the screen.
Update: by Facebook standards Read/WriteWeb says POF is worth a cool one billion dollars. It helps to talk like Dr. Evil when saying it out loud.

PlentyOfFish is a hugely popular on-line dating system slammed by over 45 million visitors a month and 30+ million hits a day (500 - 600 pages per second). But that's not the most interesting part of the story. All this is handled by one person, using a handful of servers, working a few hours a day, while making $6 million a year from Google ads. Jealous? I know I am. How are all these love connections made using so few resources?

Site: http://www.plentyoffish.com/

Information Sources

Blog of Markus Frind

Plentyoffish: 1-Man Company May Be Worth $1Billion

The Platform

Microsoft Windows

ASP.NET

IIS

Akamai CDN

Foundry ServerIron Load Balancer

The Stats

PlentyOfFish (POF) gets 1.2 billion page views/month, and 500,000 average unique logins per day. The peak season is January, when it will grow 30 percent.

POF has one single employee: the founder and CEO Markus Frind.

Makes up to $10 million a year on Google ads working only two hours a day.

30+ Million Hits a Day (500 - 600 pages per second).

1.1 billion page views and 45 million visitors a month.

Has 5-10 times the click through rate of Facebook.

A top 30 site in the US based on Competes Attention metric, top 10 in Canada and top 30 in the UK.

2 load balanced web servers with 2 Quad Core Intel Xeon X5355 @ 2.66Ghz), 8 Gigs of RAM (using about 800 MBs), 2 hard drives, runs Windows x64 Server 2003.

3 DB servers. No data on their configuration.

Approaching 64,000 simultaneous connections and 2 million page views per hour.

Internet connection is a 1Gbps line of which 200Mbps is used.

1 TB/day serving 171 million images through Akamai.

6TB storage array to handle millions of full sized images being uploaded every month to the site.

What's Inside

Revenue model has been to use Google ads. Match.com, in comparison, generates $300 million a year, primarily from subscriptions. POF's revenue model is about to change so it can capture more revenue from all those users. The plan is to hire more employees, hire sales people, and sell ads directly instead of relying solely on AdSense.

With 30 million page views a day you can make good money on advertising, even a 5 - 10 cents a CPM.

Akamai is used to serve 100 million plus image requests a day. If you have 8 images and each takes 100 msecs you are talking a second load just for the images. So distributing the images makes sense.

10’s of millions of image requests are served directly from their servers, but the majority of these images are less than 2KB and are mostly cached in RAM.

Everything is dynamic. Nothing is static.

All outbound Data is Gzipped at a cost of only 30% CPU usage. This implies a lot of processing power on those servers, but it really cuts bandwidth usage.

No caching functionality in ASP.NET is used. It is not used because as soon as the data is put in the cache it's already expired.

No built in components from ASP are used. Everything is written from scratch. Nothing is more complex than a simple if then and for loops. Keep it simple.

Load balancing
- IIS arbitrarily limits the total connections to 64,000 so a load balancer was added to handle the large number of simultaneous connections. Adding a second IP address and then using a round robin DNS was considered, but the load balancer was considered more redundant and allowed easier swap in of more web servers. And using ServerIron allowed advanced functionality like bot blocking and load balancing based on passed on cookies, session data, and IP data.
- The Windows Network Load Balancing (NLB) feature was not used because it doesn't do sticky sessions. A way around this would be to store session state in a database or in a shared file system.
- 8-12 NLB servers can be put in a farm and there can be an unlimited number of farms. A DNS round-robin scheme can be used between farms. Such an architecture has been used to enable 70 front end web servers to support over 300,000 concurrent users.
- NLB has an affinity option so a user always maps to a certain server, thus no external storage is used for session state and if the server fails the user loses their state and must relogin. If this state includes a shopping cart or other important data, this solution may be poor, but for a dating site it seems reasonable.
- It was thought that the cost of storing and fetching session data in software was too expensive. Hardware load balancing is simpler. Just map users to specific servers and if a server fails have the user log in again.
- The cost of a ServerIron was cheaper and simpler than using NLB. Many major sites use them for TCP connection pooling, automated bot detection, etc. ServerIron can do a lot more than load balancing and these features are attractive for the cost.

Has a big problem picking an ad server. Ad server firms want several hundred thousand a year plus they want multi-year contracts.

In the process of getting rid of ASP.NET repeaters and instead uses the append string thing or response.write. If you are doing over a million page views a day just write out the code to spit it out to the screen.

Most of the build out costs went towards a SAN. Redundancy at any cost.

Growth was through word of mouth. Went nuts in Canada, spread to UK, Australia, and then to the US.

Database
- One database is the main database.
- Two databases are for search. Load balanced between search servers based on the type of search performed.
- Monitors performance using task manager. When spikes show up he investigates. Problems were usually blocking in the database. It's always database issues. Rarely any problems in .net. Because POF doesn't use the .net library it's relatively easy to track down performance problems. When you are using many layers of frameworks finding out where problems are hiding is frustrating and hard.
- If you call the database 20 times per page view you are screwed no matter what you do.
- Separate database reads from writes. If you don't have a lot of RAM and you do reads and writes you get paging involved which can hang your system for seconds.
- Try and make a read only database if you can.
- Denormalize data. If you have to fetch stuff from 20 different tables try and make one table that is just used for reading.
- One day it will work, but when your database doubles in size it won't work anymore.
- If you only do one thing in a system it will do it really really well. Just do writes and that's good. Just do reads and that's good. Mix them up and it messes things up. You run into locking and blocking issues.
- If you are maxing the CPU you've either done something wrong or it's really really optimized. If you can fit the database in RAM do it.

The development process is: come up with an idea. Throw it up within 24 hours. It kind of half works. See what user response is by looking at what they actually do on the site. Do messages per user increase? Do session times increase? If people don't like it then take it down.

System failures are rare and short lived. Biggest issues are DNS issues where some ISP says POF doesn't exist anymore. But because the site is free, people accept a little down time. People often don't notice sites down because they think it's their problem.

Going from one million to 12 million users was a big jump. He could scale to 60 million users with two web servers.

Will often look at competitors for ideas for new features.

Will consider something like S3 when it becomes geographically load balanced.

Lessons Learned

You don't need millions in funding, a sprawling infrastructure, and a building full of employees to create a world class website that handles a torrent of users while making good money. All you need is an idea that appeals to a lot of people, a site that takes off by word of mouth, and the experience and vision to build a site without falling into the typical traps of the trade. That's all you need :-)

Necessity is the mother of all change.

When you grow quickly, but not too quickly you have a chance grow, modify, and adapt.

RAM solves all problems. After that it's just growing using bigger machines.

When starting out keep everything as simple as possible. Nearly everyone gives this same advice and Markus makes a noticeable point of saying everything he does is just obvious common sense. But clearly what is simple isn't merely common sense. Creating simple things is the result of years of practical experience.

Keep database access fast and you have no issues.

A big reason POF can get away with so few people and so little equipment is they use a CDN for serving large heavily used content. Using a CDN may be the secret sauce in a lot of large websites. Markus thinks there isn't a single site in the top 100 that doesn’t use a CDN. Without a CDN he thinks load time in Australia would go to 3 or 4 seconds because of all the images.

Advertising on Facebook yielded poor results. With 2000 clicks only 1 signed up. With a CTR of 0.04% Facebook gets 0.4 clicks per 1000 ad impressions, or .4 clicks per CPM. At 5 cent/CPM = 12.5 cents a click, 50 cent/CPM = $1.25 a click. $1.00/CPM = $2.50 a click. $15.00/CPM = $37.50 a click.

It's easy to sell a few million page views at high CPM’s. It's a LOT harder to sell billions of page views at high CPM’s, as shown by Myspace and Facebook.

The ad-supported model limits your revenues. You have to go to a paid model to grow larger. To generate 100 million a year as a free site is virtually impossible as you need too big a market.

Growing page views via Facebook for a dating site won't work. Having a visitor on you site is much more profitable. Most of Facebook's page views are outside the US and you have to split 5 cent CPM’s with Facebook.

Co-req is a potential large source of income. This is where you offer in your site's sign up to send the user more information about mortgages are some other product.

You can't always listen to user responses. Some users will always love new features and others will hate it. Only a fraction will complain. Instead, look at what features people are actually using by watching your site.

MySpace also uses Windows to run their site.

Markus Frind's posts on Webmaster World.

And the Money Comes Rolling In by Max Chafkin

How I started A Dating Empire by Markus Frind

Thanks to Erik Osterman for recommending profiling PlentyOfFish.

67 Comments |

Permalink |

.Net,

Cloud Programming Directly Feeds Cost Allocation Back into Software Design

Windows

Friday

Jun052009

HotPads Shows the True Cost of Hosting on Amazon

Friday, June 5, 2009 at 3:08AM

Mather Corgan, president of HotPads, gave a great talk on how HotPads uses AWS to run their real estate search engine. I loved the presentation for a few reasons:

It gives real costs on on their servers, how many servers they have, what they are used for, and exactly how they use S2, EBS, CloudFront and other AWS services. This is great information for anybody trying to architect a system and wondering where to run it.

HotPads is a "real" application. It's a small company and at 4.5 million page-views/month it's large but not super large. It has custom server side components like indexing engines, image processing, and background database update engines for syncing new real estate data. And it also stores a lot of images and has low latency requirements.

This a really good example mix of where many companies are or would like to be with their applications.

Their total costs are about $11K/month, which is about what they were paying at their previous provider. I found this is a little surprising as I thought the cloud would be more expensive, but they only pay for what they need instead of having to over provision for transient uses like testing. And some servers aren't necessary anymore as EBS handles backups so database slave servers are no longer required.

There are lots more lessons like this that I've abstracted down below.

Site: http://hotpads.com - a map-based real estate search engine, listing homes for sale, apartments, condos, and rental houses.

Stats

800,000 visits/month

4.5 million page-views/month

3.5 million real-estate listings updated daily

Platform

Java

MySQL

AWS

Costs

EC2 - $7400/month - run 20 of various size instances at anyone time. Most work is in the background processing of images, not web serving.
* $150: 2 Small HAProxy Load Balancers - 2 for failover, these have the elastic IPs, round robin DNS point at the elastic IPs.
* $1,200: 3-5 Large Tomcat Web Servers - an array of 3 run at night and 5 during the day.
* $1,500: 5 Large Tomcat Job Servers
* $900: 1 X-Large 1 Large Index Server - used to power property search and have several GB of RAM for the JVM
* $1,200: 1 X-Large 2 Large MySQL masters
* $1,200: 1 X-Large 2 Large MySQL slaves
* $300: 1 Large Messaging Server ActiveMQ - will be replaced with SQS
* $300: 1 Large Map tile creation servers Tilecache
* $600: Development/testing/migration/ servers

S3 - $1500/month - few hundred million objects for files for maps and real-estate listing photos. 4TB of database backup stored as EBS diffs ($600/month).

Elastic Block Storage - $500/month

CloudFront - $460/month - is used to serve static files and map files throughout the world. It serves static files, map tiles, and listing photos.

Elastic IP Addresses - $8/month

RightScale - $500/month - used for management and deployment.

Lessons Learned

Major reason for choosing EC2 was the cloud API which allows adding servers at any time. In their previous hosting service they had to prepay for a month at a time so they would order the minimum necessary to get by that month. That doesn't leave room for servers for development, test, preview servers for customers or making live database servers upgrades (which requires 2x servers)?

Overall cost is about the same as with previous hosting site but the overall speed of development and ease of management is night and day different. Getting more servers and lots more flexibility.

HotPads is a small company and doesn't think added trouble of colocation isn't worth it for them yet.

Advantage of Amazon over something like Google App Engine is that Amazon allows you to innovate by building your own services on your own machines.

S3 is better for larger objects because for small files that are not viewed often the cost of puts outweighs everything. Not a cache to use for short lived objects because the put costs start to dominate.
* For a 67 KB object (600 px image) which is where the cost of putting an image into S3 equals the cost of storing it there and about equal the cost of storing it once.
* For a 6.7 KB object (15 px thumb nail) the put (small fee for putting an object into S3) cost is 10x the storage transfer costs.

Costs have to figured into the algorithms you use.
* In April 330 GB of images downloaded at $.15/GB cost $49. 55mm GETs at $1/mm cost $55. 42mm PUTs at $1/1k cost $420!
* $100 download and GETs of maptiles.
* So S3 very cheap for larger files, watch out for lots of short lived small files.

CloudFront is 10 times faster than S3 but is more expensive for infrequently viewed files.
* Makes frequently viewed listings faster.
* For infrequently viewed listings the CloudFront has to go to S3 to get the file the first time which means you have to pay twice for a file that will be viewed only once.

EBS
* Used on database servers because it's faster than local storage (especially for random writes), blocks of data redundant, and supports easy backups and versioning via cloning.
* Only 10% cost overhead.
* Allowed them to get rid of second set of slaves because the backups were so CPU intensive they had to have slaves to do the backups. EBS allows snapshots of running drives so the extra slaves are unnecessary.
* Databases are I/O bound and the CPU is vastly underutilized so there's extra capacity when you need it.

SimpleDB - not using, pretty proprietary. May be of value because you only pay for what you use given how under utilized your own database servers can be.

Reserved Instances
* 1 year for the cost of 6 months and guaranteed (denied one time) to get an instance.
* Con is tied to an instance type and they want more flexibility to choose instance types as their software changes and take advantage of new instance types as they are released.

Rather than having dedicated memcached machines they've scavenged 8 GB of memory from their existing servers.

AWS Elastic Load Balancer Tutorial

5 Comments |

Permalink |

Heroku Open Source Plugins etc

amazon

Friday

May152009

Wolfram|Alpha Architecture

Friday, May 15, 2009 at 1:13AM

Making the world's knowledge computable

Today's Wolfram|Alpha is the first step in an ambitious, long-term project to make all systematic knowledge immediately computable by anyone. You enter your question or calculation, and Wolfram|Alpha uses its built-in algorithms and growing collection of data to compute the answer.

Answer Engine vs Search Engine

When Wolfram|Alpha launches later today, it will be one of the most computationally intensive websites on the internet. The Wolfram|Alpha computational knowledge engine is an "answer engine" that is able to produce answers to various questions such as

What is the GDP of France?
Weather is Springfield when David Ortiz was born
33 g of gold
LDL vs. serum potassium 150 smoker male age 40
life expectancy male age 40 finland
highschool teacher median wage

Wolfram|Alpha excels at different areas like mathematics, statistics, physics, engineering, astronomy, chemistry, life sciences, geology, business and finance as demonstrated by Steven Wolfram in his Introduction screencast.

The Stats

Abour 10,000 CPU cores at launch
10+ trillion of pieces of data
50,000+ types of algorithms
Able to handle about 175 million queries per day
5+ million lines of symbolic Mathematica code

The Computers Powering Computable Knowledge

There is no way to know exactly how much traffic to expect, especially during the initial period immediately following the launch, but the Wolfram|Alpha team is working hard to put reasonable capacity in place. As Stephen writes in the Wolfram|Alpha blog Alpha will run in 5 distributed colocation facilities. What computing power have they gathered in these facilities for launch day? Two supercomputers, just about 10,000 processor cores, hundreds of terabytes of disks, a heck of a lot of bandwidth, and what seems like enough air conditioning for the Sahara to host a ski resort. One of their launch partners, R Systems, created the world’s 44th largest supercomputer (per the June 2008 TOP500 list - it is listed as 66th per the latest Top500 list). They call it the R Smarr. It will be running Wolfram|Alpha on launch day! R Smarr has a Sum Rmax of 39580 GFlops using Dell DCS CS23-SH, QC HT 2.8 GHz computers, 4608 cores, 65536 GB of RAM and Infiniband interconnect. Dell is another of the launch partners with a data center full of quad-board, dual-processor, quad-core Harpertown servers. What does it all add up to? The ability to handle 175 million queries (yielding maybe a billion) per day—over 5 billion queries (encompassing around 30 billion calculations) per month.

The Launch of Wolfram|Alpha

Watch a live webcast of the Wolfram|Alpha system being brought online for the first time on

Friday, May 15, beginning at 7pm CST

The First Killer App of The New Kind of Science

The Genius behind Wolfram|Alpha is Stephen Wolfram. He is best know for his ambitious projects: Mathematica and A New Kind of Science (NKS). May 14, 2009 marks the 7th anniversary of the publication of his book A New Kind of Science. Stephen explains is his blog post: But for me the biggest thing that’s happened this year is the emergence of Wolfram|Alpha. Wolfram|Alpha is, I believe, going to be the first killer app of NKS.

Status

That it should be possible to build Wolfram|Alpha as it exists today in the first decade of the 21st century was far from obvious. And yet there is much more to come. As of now, Wolfram|Alpha contains 10+ trillion of pieces of data, 50,000+ types of algorithms and models, and linguistic capabilities for 1000+ domains. Built with Mathematica—which is itself the result of more than 20 years of development at Wolfram Research—Wolfram|Alpha's core code base now exceeds 5 million lines of symbolic Mathematica code. Running on supercomputer-class compute clusters, Wolfram|Alpha makes extensive use of the latest generation of web and parallel computing technologies, including webMathematica and gridMathematica.

How Mathematica Made Wolfram|Alpha Possible?

Wolfram|Alpha is a major software engineering development to make all systematic knowledge immediately computable by anyone. It is developed and deployed entirely with Mathematica—in fact, Mathematica has uniquely made Wolfram|Alpha possible. Here's why.

Computational knowledge and intelligence
High-performance enterprise deployment
One coherent architecture
Smart method selection
Dynamic report generation
Database connectivity
Built-in, computable data
High-level programming language
Efficient text processing and linguistic analysis
Wide-ranging, automated visualization capabilities
Automated importing
Development environment

Information Sources

The Secret behind the Computational Engine in Wolfram|Alpha
More info expected after the launch... stay tuned!

Congratulations Stephen!

Click to read more ...

geekr |

12 Comments |

Permalink |

Grid,

alpha,

wolfram

Friday

Apr242009

Heroku - Simultaneously Develop and Deploy Automatically Scalable Rails Applications in the Cloud

Friday, April 24, 2009 at 2:09AM

Update 4: Heroku versus GAE & GAE/J

Update 3: Heroku has gone live!. Congratulations to the team. It's difficult right now to get a feeling for the relative cost and reliability of Heroku, but it's an impressive accomplishment and a viable option for people looking for a delivery platform.

Update 2: Heroku Architecture. A great interactive presentation of the Heroku stack. Requests flow into Nginx used as a HTTP Reverse Proxy. Nginx routes requests into a Varnish based HTTP cache. Then requests are injected into an Erlang based routing mesh that balances requests across a grid of dynos. Dynos are your application "VMs" that implement application specific behaviors. Dynos themselves are a stack of: POSIX, Ruby VM, App Server, Rack, Middleware, Framework, Your App. Applications can access PostgreSQL. Memcached is used as an application caching layer.

Update: Aaron Worsham Interview with James Lindenbaum, CEO of Heroku. Aaron nicely sums up their goal: Heroku is looking to eliminate all the reasons companies have for not doing software projects.

Adam Wiggins of Heroku presented at the lollapalooza that was the Cloud Computing Demo Night. The idea behind Heroku is that you upload a Rails application into Heroku and it automatically deploys into EC2 and it automatically scales using behind the scenes magic. They call this "liquid scaling." You just dump your code and go. You don't have to think about SVN, databases, mongrels, load balancing, or hosting. You just concentrate on building your application. Heroku's unique feature is their web based development environment that lets you develop applications completely from their control panel. Or you can stick with your own development environment and use their API and Git to move code in and out of their system.

For website developers this is as high up the stack as it gets. With Heroku we lose that "build your first lightsaber" moment marking the transition out of apprenticeship and into mastery. Upload your code and go isn't exactly a heroes journey, but it is damn effective...

I must confess to having an inherent love of Heroku's idea because I had a similar notion many moons ago, but the trendy language of the time was Perl instead of Rails. At the time though it just didn't make sense. The economics of creating your own "cloud" for such a different model wasn't there. It's amazing the niches utility computing will seed, fertilize, and help grow. Even today when using Eclipse I really wish it was hosted in the cloud and I didn't have to deal with all its deployment headaches. Firefox based interfaces are pretty impressive these days. Why not?

Adam views their stack as:
1. Developer Tools
2. Application Management
3. Cluster Management
4. Elastic Compute Cloud

At the top level developers see a control panel that lets them edit code, deploy code, interact with the database, see logs, and so on. Your website is live from the first moment you start writing code. It's a powerful feeling to write normal code, see it run immediately, and know it will scale without further effort on your part. Now, will you be able toss your Facebook app into the Heroku engine and immediately handle a deluge of 500 million hits a month? It will be interesting to see how far a generic scaling model can go without special tweaking by a certified scaling professional. Elastra has the same sort of issue.

Underneath Heroku makes sure all the software components work together in Lennon-McCartney style harmony. They take care (or will take care of) starting and stopping VMs, deploying to those VMs, billing, load balancing, scaling, storage, upgrades, failover, etc. The dynamic nature of Ruby and the development and deployment infrastructure of Rails is what makes this type of hosting possible. You don't have to worry about builds. There's a great infrastructure for installing packages and plugins. And the big hard one of database upgrades is tackled with the new migrations feature.

A major issue in the Rails world is versioning. Given the precambrian explosion of Rails tools, how does Heroku make sure all the various versions of everything work together? Heroku sees this as their big value add. They are in charge of making sure everything works together. We see a lot companies on the web taking on the role of curator ([1], [2], [3]). A curator is a guardian or an overseer. Of curators Steve Rubel says: They acquire pieces that fit within the tone, direction and - above all - the purpose of the institution. They travel the corners of the world looking for "finds." Then, once located, clean them up and make sure they are presentable and offer the patron a high quality experience. That's the role Heroku will play for their deployable Rails environment.

With great automated power comes great restrictions. And great opportunity. Curating has a cost for developers: flexibility. The database they support is Postgres. Out of luck if you wan't MySQL. Want a different Ruby version or Rails version? Not if they don't support it. Want memcache? You just can't add it yourself. One forum poster wanted, for example, to use the command line version of ImageMagick but was told it wasn't installed and use RMagick instead. Not the end of the world. And this sort of curating has to be done to keep a happy and healthy environment running, but it is something to be aware of.

The upside of curation is stuff will work. And we all know how hard it can be to get stuff to work. When I see an EC2 AMI that already has most of what I need my heart goes pitter patter over the headaches I'll save because someone already did the heavy curation for me. A lot of the value in services like rPath offers, for example, is in curation. rPath helps you build images that work, that can be deployed automatically, and can be easily upgraded. It can take a big load off your shoulders.

There's a lot of competition for Heroku. Mosso has a hosting system that can do much of what Heroku wants to do. It can automatically scale up at the webserver, data, and storage tiers. It supports a variery of frameworks, including Rails. And Mosso also says all you have to do is load and go.

3Tera is another competitor. As one user said: It lets you visually (through a web ui) create "applications" based on "appliances". There is a standard portfolio of prebuilt applications (SugarCRM, etc.) and templates for LAMP, etc. So, we build our application by taking a firewall appliance, a CentOS appliance, a gateway, a MySql appliance, glue them together, customize them, and then create our own template. You can specify down to the appliance level, the amount of cpu, memory, disk, and bandwidth each are assigned which let's you scale up your capacity simply by tweaking values through the UI. We can now deploy our Rails/Java hosted offering for new customers in about 20 minutes on our grid. AppLogic has automatic failover so that if anything goes wrong, it reploys your application to a new node in your grid and restarts it. It's not as cheap as EC2, but much more powerful. True, 3Tera won't help with your application directly, but most of the hard bits are handled.

RightScale is another company that combines curation along with load balancing, scaling, failover, and system management.

What differentiates Heroku is their web based IDE that allows you to focus solely on the application and ignore the details. Though now that they have a command line based interface as well, it's not as clear how they will differentiate themselves from other offerings.

The hosting model has a possible downside if you want to do something other than straight web hosting. Let's say you want your system to insert commercials into podcasts. That sort of large scale batch logic doesn't cleanly fit into the hosting model. A separate service accessed via something like a REST interface needs to be created. Possibly double the work. Mosso suffers from this same concern. But maybe leaving the web front end to Heroku is exactly what you want to do. That would leave you to concentrate on the back end service without worrying about the web tier. That's a good approach too.

Heroku is just getting started so everything isn't in place yet. They've been working on how to scale their own infrastructure. Next is working on scaling user applications beyond starting and stopping mongrels based on load. They aren't doing any vertical scaling of the database yet. They plan on memcaching reads, implementing read-only slaves via Slony, and using the automatic partitioning features built into Postgres 8.3. The idea is to start a little smaller with them now and grow as they grow. By the time you need to scale bigger they should have the infrastructure in place.

One concern is that pricing isn't nailed down yet, but my gut says it will be fair. It's not clear how you will transfer an existing database over, especially from a non-Postgres database. And if you use the web IDE I wonder how you will normal project stuff like continuous integration, upgrades, branching, release tracking, and bug tracking? Certainly a lot of work to do and a lot of details to work out, but I am sure it's nothing they can't handle.

Heroku Rails Podcast

How Digg.com uses the LAMP stack to scale upward

8 Comments |

Permalink |

AWS,

RoR,

cloud

Tuesday

Apr072009

Six Lessons Learned Deploying a Large-scale Infrastructure in Amazon EC2

Tuesday, April 7, 2009 at 3:47AM

Lessons learned from OpenX's large-scale deployment to Amazon EC2:

Expect failures; what's more, embrace them

Fully automate your infrastructure deployments

Design your infrastructure so that it scales horizontally

Establish clear measurable goals

Be prepared to quickly identify and eliminate bottlenecks

Play wack-a-mole for a while, until things get stable

Click to read more ...

General Chicken |

Digg Architecture

Saturday, April 4, 2009 at 2:46AM

Update 4:: Introducing Digg’s IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g. integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported with more to follow). Update 3:: Scaling Digg and Other Web Applications. Update 2:: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog. A very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information. Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades. Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of requests a month? Site: http://digg.com

Information Sources

How Digg Works by Digg

Digg PHP's Scalability and Performance

Platform

MySQL

Linux

PHP

Lucene

Python

APC PHP Accelerator

MCache

Gearman - job scheduling system

MogileFS - open source distributed filesystem

Apache

Memcached

The Stats

Started in late 2004 with a single Linux server running Apache 1.3, PHP 4, and MySQL. 4.0 using the default MyISAM storage engine

Over 22 million users.

230 million plus page views per month

26 million unique visitors per month

Several billion page views per month

None of the scaling challenges faced had anything to do with PHP. The biggest issues faced were database related.

Dozens of web servers.

Dozens of DB servers.

Six specialized graph database servers to run the Recommendation Engine.

Six to ten machines that serve files from MogileFS.

What's Inside

Specialized load balancer appliances monitor the application servers, handle failover, constantly adjust the cluster according to health, balance incoming requests and caching JavaScript, CSS and images. If you don't have the fancy load balancers take a look at Linux Virtual Server and Squid as a replacement.

Requests are passed to the Application Server cluster. Application servers consist of: Apache+PHP, Memcached, Gearman and other daemons. They are responsible for making coordinating access to different services (DB, MogileFS, etc) and creating the response sent to the browser.

Uses a MySQL master-slave setup. - Four master databases are partitioned by functionality: promotion, profiles, comments, main. Many slave databases hang off each master. - Writes go to the masters and reads go to the slaves. - Transaction-heavy servers use the InnoDB storage engine. - OLAP-heavy servers use the MyISAM storage engine. - They did not notice a performance degradation moving from MySQL 4.1 to version 5. - The schema is denormalized more than "your average database design." - Sharding is used to break the database into several smaller ones.

Digg's usage pattern makes it easier for them to scale. Most people just view the front page and leave. Thus 98% of Digg's database accesses are reads. With this balance of operations they don't have to worry about the complex work of architecting for writes, which makes it a lot easier for them to scale.

They had problems with their storage system telling them writes were on disk when they really weren't. Controllers do this to improve the appearance of their performance. But what it does is leave a giant data integrity whole in failure scenarios. This is really a pretty common problem and can be hard to fix, depending on your hardware setup.

To lighten their database load they used the APC PHP accelerator MCache.

Memcached is used for caching and memcached servers seemed to be spread across their database and application servers. A specialized daemon monitors connections and kills connections that have been open too long.

You can configure PHP not parse and compile on each load using a combination of Apache 2’s worker threads, FastCGI, and a PHP accelerator. On a page's first load the PHP code is compiles so any subsequent page loads are very fast.

MogileFS, a distributed file system, serves story icons, user icons, and stores copies of each story’s source. A distributed file system spreads and replicates files across a lot of disks which supports fast and scalable file access.

A specialized Recommendation Engine service was built to act as their distributed graph database. Relational databases are not well structured for generating recommendations so a separate service was created. LinkedIn did something similar for their graph.

Lessons Learned

The number of machines isn't as important what the pieces are and how they fit together.

Don't treat the database as a hammer. Recommendations didn't fit will with the relational model so they made a specialized service.

Tune MySQL through your database engine selection. Use InnoDB when you need transactions and MyISAM when you don't. For example, transactional tables on the master can use MyISAM for read-only slaves.

At some point in their growth curve they were unable to grow by adding RAM so had to grow through architecture.

People often complain Digg is slow. This is perhaps due to their large javascript libraries rather than their backend architecture.

One way they scale is by being careful of which application they deploy on their system. They are careful not to release applications which use too much CPU. Clearly Digg has a pretty standard LAMP architecture, but I thought this was an interesting point. Engineers often have a bunch of cool features they want to release, but those features can kill an infrastructure if that infrastructure doesn't grow along with the features. So push back until your system can handle the new features. This goes to capacity planning, something the Flickr emphasizes in their scaling process.

You have to wonder if by limiting new features to match their infrastructure might Digg lose ground to other faster moving social bookmarking services? Perhaps if the infrastructure was more easily scaled they could add features faster which would help them compete better? On the other hand, just adding features because you can doesn't make a lot of sense either.

The data layer is where most scaling and performance problems are to be found and these are language specific. You'll hit them using Java, PHP, Ruby, or insert your favorite language here.

* LinkedIn Architecture * Live Journal Architecture * Flickr Architecture * An Unorthodox Approach to Database Design : The Coming of the Shard

Ebay Architecture

Click to read more ...

45 Comments |

Permalink |