High Scalability

Entries in Windows (3)

PlentyOfFish Architecture

Friday, June 26, 2009 at 3:58PM

Update 5: PlentyOfFish Update - 6 Billion Pageviews And 32 Billion Images A Month
Update 4: Jeff Atwood costs out Markus' scale up approach against a scale out approach and finds scale up wanting. The discussion in the comments is as interesting as the article. My guess is Markus doesn't want to rewrite his software to work across a scale out cluster so even if it's more expensive scale up works better for his needs.
Update 3: POF now has 200 million images and serves 10,000 images per second. They'll be moving to a 250,000 IOPS RamSan to handle the load. Also upgraded to a core database machine with 512 GB of RAM, 32 CPUs, SQL Server 2008, and Windows 2008.
Update 2: This seems to be a POF Peer1 love fest infomercial. It's pretty content free, but the production values are high. Lots of quirky sounds and fish swimming on the screen.
Update: by Facebook standards Read/WriteWeb says POF is worth a cool one billion dollars. It helps to talk like Dr. Evil when saying it out loud.

PlentyOfFish is a hugely popular on-line dating system slammed by over 45 million visitors a month and 30+ million hits a day (500 - 600 pages per second). But that's not the most interesting part of the story. All this is handled by one person, using a handful of servers, working a few hours a day, while making $6 million a year from Google ads. Jealous? I know I am. How are all these love connections made using so few resources?

Site: http://www.plentyoffish.com/

Information Sources

Channel9 Interview with Markus Frind

Blog of Markus Frind

Plentyoffish: 1-Man Company May Be Worth $1Billion

The Platform

Microsoft Windows

ASP.NET

IIS

Akamai CDN

Foundry ServerIron Load Balancer

The Stats

PlentyOfFish (POF) gets 1.2 billion page views/month, and 500,000 average unique logins per day. The peak season is January, when it will grow 30 percent.

POF has one single employee: the founder and CEO Markus Frind.

Makes up to $10 million a year on Google ads working only two hours a day.

30+ Million Hits a Day (500 - 600 pages per second).

1.1 billion page views and 45 million visitors a month.

Has 5-10 times the click through rate of Facebook.

A top 30 site in the US based on Compete's Attention metric, top 10 in Canada and top 30 in the UK.

2 load balanced web servers with 2 Quad Core Intel Xeon X5355 CPUs @ 2.66 GHz, 8 GB of RAM (using about 800 MB), and 2 hard drives, running Windows Server 2003 x64.

3 DB servers. No data on their configuration.

Approaching 64,000 simultaneous connections and 2 million page views per hour.

Internet connection is a 1Gbps line of which 200Mbps is used.

1 TB/day serving 171 million images through Akamai.

6TB storage array to handle millions of full sized images being uploaded every month to the site.

What's Inside

Revenue model has been to use Google ads. Match.com, in comparison, generates $300 million a year, primarily from subscriptions. POF's revenue model is about to change so it can capture more revenue from all those users. The plan is to hire more employees, hire sales people, and sell ads directly instead of relying solely on AdSense.

With 30 million page views a day you can make good money on advertising, even at a 5 - 10 cent CPM.

Akamai is used to serve 100 million plus image requests a day. If you have 8 images and each takes 100 msecs you are talking about nearly a second of load time just for the images. So distributing the images makes sense.
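
To put rough numbers on that claim, here is a minimal sketch of the CPM arithmetic. It assumes a single ad impression per page view, which the article does not state, so treat the result as a floor rather than POF's actual revenue. The function name is made up for illustration.

```python
def daily_ad_revenue(page_views_per_day: int, cpm_dollars: float) -> float:
    """Revenue for one day. CPM is the price paid per 1,000 impressions.

    Assumption: exactly one ad impression per page view.
    """
    return page_views_per_day / 1000 * cpm_dollars


# 30 million page views a day at a 5 cent and a 10 cent CPM.
low = daily_ad_revenue(30_000_000, 0.05)
high = daily_ad_revenue(30_000_000, 0.10)
print(f"${low:,.0f} to ${high:,.0f} per day")
```

Pages that carry several ad units multiply the figure accordingly.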

10’s of millions of image requests are served directly from their servers, but the majority of these images are less than 2KB and are mostly cached in RAM.

Everything is dynamic. Nothing is static.

All outbound Data is Gzipped at a cost of only 30% CPU usage. This implies a lot of processing power on those servers, but it really cuts bandwidth usage.
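
The CPU for bandwidth trade is easy to see on a small scale. The sketch below uses Python's standard gzip module rather than IIS compression, and the sample page is invented, so the ratio it reports says nothing about POF's real pages.

```python
import gzip


def gzip_body(body: bytes, level: int = 6) -> bytes:
    """Compress a response body. Higher levels cost more CPU for less bandwidth."""
    return gzip.compress(body, compresslevel=level)


# A made-up, repetitive HTML page standing in for dynamic output.
html = (
    b"<html><body>"
    + b"<div class='profile'>Hello from a sample profile</div>" * 200
    + b"</body></html>"
)
compressed = gzip_body(html)
ratio = len(compressed) / len(html)
print(f"{len(html)} bytes -> {len(compressed)} bytes ({ratio:.1%} of original)")
```

Markup compresses well because it is repetitive, which is why compressing everything can pay for itself even on dynamic pages.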

No caching functionality in ASP.NET is used. It is not used because as soon as the data is put in the cache it's already expired.

No built in components from ASP are used. Everything is written from scratch. Nothing is more complex than simple if-then statements and for loops. Keep it simple.

Load balancing
- IIS arbitrarily limits the total connections to 64,000 so a load balancer was added to handle the large number of simultaneous connections. Adding a second IP address and then using a round robin DNS was considered, but the load balancer was considered more redundant and allowed easier swap in of more web servers. And using ServerIron allowed advanced functionality like bot blocking and load balancing based on passed on cookies, session data, and IP data.
- The Windows Network Load Balancing (NLB) feature was not used because it doesn't do sticky sessions. A way around this would be to store session state in a database or in a shared file system.
- 8-12 NLB servers can be put in a farm and there can be an unlimited number of farms. A DNS round-robin scheme can be used between farms. Such an architecture has been used to enable 70 front end web servers to support over 300,000 concurrent users.
- NLB has an affinity option so a user always maps to a certain server, thus no external storage is used for session state and if the server fails the user loses their state and must relogin. If this state includes a shopping cart or other important data, this solution may be poor, but for a dating site it seems reasonable.
- It was thought that the cost of storing and fetching session data in software was too expensive. Hardware load balancing is simpler. Just map users to specific servers and if a server fails have the user log in again.
- The cost of a ServerIron was cheaper and simpler than using NLB. Many major sites use them for TCP connection pooling, automated bot detection, etc. ServerIron can do a lot more than load balancing and these features are attractive for the cost.
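
The "map users to specific servers and have them log in again on failure" idea can be sketched in a few lines. This is not how a ServerIron works internally. The class, its methods, and the server names are hypothetical, and a real balancer would key on a cookie or IP address.

```python
class AffinityBalancer:
    """Sticky sessions: each client key stays on one server until it fails."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.assignments = {}  # client key -> server name
        self._next = 0         # round-robin cursor for new clients

    def pick(self, client_key: str) -> str:
        server = self.assignments.get(client_key)
        if server is None:
            # New (or displaced) client: hand out servers round-robin.
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            self.assignments[client_key] = server
        return server

    def fail(self, server: str) -> list:
        """Remove a dead server. Returns the clients that lost their session."""
        self.servers.remove(server)
        lost = [key for key, srv in self.assignments.items() if srv == server]
        for key in lost:
            del self.assignments[key]
        return lost


balancer = AffinityBalancer(["web1", "web2"])
print(balancer.pick("alice"), balancer.pick("bob"), balancer.pick("alice"))
```

Session state lives only in the memory of the server a user is pinned to, so there is nothing to store or fetch on each request. The price is that everyone returned by `fail` has to log in again.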

Has a big problem picking an ad server. Ad server firms want several hundred thousand a year plus they want multi-year contracts.

In the process of getting rid of ASP.NET repeaters and instead using string appends or Response.Write. If you are doing over a million page views a day just write out the code to spit it out to the screen.

Most of the build out costs went towards a SAN. Redundancy at any cost.

Growth was through word of mouth. Went nuts in Canada, spread to UK, Australia, and then to the US.

Database
- One database is the main database.
- Two databases are for search. Load balanced between search servers based on the type of search performed.
- Monitors performance using task manager. When spikes show up he investigates. Problems were usually blocking in the database. It's always database issues. Rarely any problems in .net. Because POF doesn't use the .net library it's relatively easy to track down performance problems. When you are using many layers of frameworks finding out where problems are hiding is frustrating and hard.
- If you call the database 20 times per page view you are screwed no matter what you do.
- Separate database reads from writes. If you don't have a lot of RAM and you do reads and writes you get paging involved which can hang your system for seconds.
- Try and make a read only database if you can.
- Denormalize data. If you have to fetch stuff from 20 different tables try and make one table that is just used for reading.
- One day it will work, but when your database doubles in size it won't work anymore.
- If you only do one thing in a system it will do it really really well. Just do writes and that's good. Just do reads and that's good. Mix them up and it messes things up. You run into locking and blocking issues.
- If you are maxing the CPU you've either done something wrong or it's really really optimized. If you can fit the database in RAM do it.
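
The denormalization advice above can be sketched with an in-memory SQLite database. POF runs SQL Server, and every table and column name here is invented for illustration. The point is only the shape: writes go to normalized tables, and page views read a single row from a flat table that is rebuilt separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cities (user_id INTEGER, city TEXT);
CREATE TABLE photos (user_id INTEGER, url TEXT);
-- Flat table used only for reads; one row per profile page.
CREATE TABLE profile_read (
    user_id INTEGER PRIMARY KEY, name TEXT, city TEXT, photo_count INTEGER
);
INSERT INTO users  VALUES (1, 'sam'), (2, 'kim');
INSERT INTO cities VALUES (1, 'Vancouver');
INSERT INTO photos VALUES (1, 'a.jpg'), (1, 'b.jpg');
""")


def rebuild_profile_read(conn):
    """Do the expensive joins once, off the page-view path."""
    conn.execute("DELETE FROM profile_read")
    conn.execute("""
        INSERT INTO profile_read
        SELECT u.id, u.name, c.city, COUNT(p.url)
        FROM users u
        LEFT JOIN cities c ON c.user_id = u.id
        LEFT JOIN photos p ON p.user_id = u.id
        GROUP BY u.id, u.name, c.city
    """)
    conn.commit()


def load_profile(conn, user_id):
    """One primary-key lookup per page view instead of a multi-table join."""
    return conn.execute(
        "SELECT name, city, photo_count FROM profile_read WHERE user_id = ?",
        (user_id,),
    ).fetchone()


rebuild_profile_read(conn)
print(load_profile(conn, 1))
```

In a real deployment the read table would sit on a separate read-only server, which is what keeps reads from blocking behind writes.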

The development process is: come up with an idea. Throw it up within 24 hours. It kind of half works. See what user response is by looking at what they actually do on the site. Do messages per user increase? Do session times increase? If people don't like it then take it down.

System failures are rare and short lived. Biggest issues are DNS issues where some ISP says POF doesn't exist anymore. But because the site is free, people accept a little down time. People often don't notice sites down because they think it's their problem.

Going from one million to 12 million users was a big jump. He could scale to 60 million users with two web servers.

Will often look at competitors for ideas for new features.

Will consider something like S3 when it becomes geographically load balanced.

Lessons Learned

You don't need millions in funding, a sprawling infrastructure, and a building full of employees to create a world class website that handles a torrent of users while making good money. All you need is an idea that appeals to a lot of people, a site that takes off by word of mouth, and the experience and vision to build a site without falling into the typical traps of the trade. That's all you need :-)

Necessity is the mother of all change.

When you grow quickly, but not too quickly, you have a chance to grow, modify, and adapt.

RAM solves all problems. After that it's just growing using bigger machines.

When starting out keep everything as simple as possible. Nearly everyone gives this same advice and Markus makes a noticeable point of saying everything he does is just obvious common sense. But clearly what is simple isn't merely common sense. Creating simple things is the result of years of practical experience.

Keep database access fast and you have no issues.

A big reason POF can get away with so few people and so little equipment is they use a CDN for serving large heavily used content. Using a CDN may be the secret sauce in a lot of large websites. Markus thinks there isn't a single site in the top 100 that doesn’t use a CDN. Without a CDN he thinks load time in Australia would go to 3 or 4 seconds because of all the images.

Advertising on Facebook yielded poor results. With 2000 clicks only 1 signed up. With a CTR of 0.04% Facebook gets 0.4 clicks per 1,000 ad impressions. At a 5 cent CPM that is 12.5 cents a click, at a 50 cent CPM $1.25 a click, at a $1.00 CPM $2.50 a click, and at a $15.00 CPM $37.50 a click.
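
The per-click figures follow directly from the CPM and the click-through rate. A small sketch of that arithmetic, with a function name chosen here for illustration:

```python
def cost_per_click(cpm_dollars: float, ctr: float) -> float:
    """Effective cost of one click when buying impressions by CPM.

    ctr is a fraction, so a 0.04% click-through rate is 0.0004.
    """
    clicks_per_thousand = 1000 * ctr
    return cpm_dollars / clicks_per_thousand


CTR = 0.0004  # 0.04%, the Facebook figure quoted above
for cpm in (0.05, 0.50, 1.00, 15.00):
    print(f"${cpm:.2f} CPM -> ${cost_per_click(cpm, CTR):.3f} per click")
```

At one signup per 2,000 clicks, multiplying any of these click costs by 2,000 gives the cost of acquiring a single user.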

It's easy to sell a few million page views at high CPM’s. It's a LOT harder to sell billions of page views at high CPM’s, as shown by Myspace and Facebook.

The ad-supported model limits your revenues. You have to go to a paid model to grow larger. To generate 100 million a year as a free site is virtually impossible as you need too big a market.

Growing page views via Facebook for a dating site won't work. Having a visitor on your site is much more profitable. Most of Facebook's page views are outside the US and you have to split 5 cent CPM’s with Facebook.

Co-reg (co-registration) is a potential large source of income. This is where you offer in your site's sign up to send the user more information about mortgages or some other product.

You can't always listen to user responses. Some users will always love new features and others will hate it. Only a fraction will complain. Instead, look at what features people are actually using by watching your site.

MySpace also uses Windows to run their site.

Markus Frind's posts on Webmaster World.

And the Money Comes Rolling In by Max Chafkin

How I started A Dating Empire by Markus Frind

Thanks to Erik Osterman for recommending profiling PlentyOfFish.

Todd Hoff

.Net,

Example,

Windows

MySpace Architecture

Thursday, February 12, 2009 at 1:28AM

Update: Presentation: Behind the Scenes at MySpace.com. Dan Farino, Chief Systems Architect at MySpace, shares details of some of MySpace's cool internal operations tools.

MySpace.com is one of the fastest growing sites on the Internet with 65 million subscribers and 260,000 new users registering each day. Often criticized for poor performance, MySpace has had to tackle scalability issues few other sites have faced. How did they do it?

Site: http://myspace.com

Information Sources

Presentation: Behind the Scenes at MySpace.com

Inside MySpace.com

Platform

ASP.NET 2.0

Windows

IIS

SQL Server

What's Inside?

300 million users.

Pushes 100 gigabits/second to the internet. 10Gb/sec is HTML content.

4,500+ web servers running Windows 2003/IIS 6.0/ASP.NET.

1,200+ cache servers running 64-bit Windows 2003. 16GB of objects cached in RAM.

500+ database servers running 64-bit Windows and SQL Server 2005.

MySpace processes 1.5 billion page views per day and handles 2.3 million concurrent users during the day.

Membership Milestones:
- 500,000 Users: A Simple Architecture Stumbles
- 1 Million Users: Vertical Partitioning Solves Scalability Woes
- 3 Million Users: Scale-Out Wins Over Scale-Up
- 9 Million Users: Site Migrates to ASP.NET, Adds Virtual Storage
- 26 Million Users: MySpace Embraces 64-Bit Technology

500,000 accounts was too much load for two web servers and a single database.

At 1-2 Million Accounts
- They used a database architecture built around the concept of vertical partitioning, with separate databases for parts of the website that served different functions such as the log-in screen, user profiles and blogs.
- The vertical partitioning scheme helped divide up the workload for database reads and writes alike, and when users demanded a new feature, MySpace would put a new database online to support it.
- MySpace switched from using storage devices directly attached to its database servers to a storage area network (SAN), in which a pool of disk storage devices are tied together by a high-speed, specialized network, and the databases connect to the SAN. The change to a SAN boosted performance, uptime and reliability.

At 3 Million Accounts
- The vertical partitioning solution didn't last because they replicated some horizontal information like user accounts across all vertical slices. With so many replications one would fail and slow down the system.
- Individual applications like blogs on sub-sections of the Web site would grow too large for a single database server.
- Reorganized all the core data to be logically organized into one database.
- Split its user base into chunks of 1 million accounts and put all the data keyed to those accounts in a separate instance of SQL Server.

9 Million–17 Million Accounts
- Moved to ASP.NET which used less resources than their previous architecture. 150 servers running the new code were able to do the same work that had previously required 246.
- Saw storage bottlenecks again. Implementing a SAN had solved some early performance problems, but now the Web site's demands were starting to periodically overwhelm the SAN's I/O capacity (the speed with which it could read and write data to and from disk storage).
- Hit limits with the 1 million-accounts-per-database division approach as these limits were exceeded.
- Moved to a virtualized storage architecture where the entire SAN is treated as one big pool of storage capacity, without requiring that specific disks be dedicated to serving specific applications. MySpace now standardized on equipment from a relatively new SAN vendor, 3PARdata.

Added a caching tier—a layer of servers placed between the Web servers and the database servers whose sole job was to capture copies of frequently accessed data objects in memory and serve them to the Web application without the need for a database lookup.
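
A caching tier like this usually follows a cache-aside pattern: look in memory first, fall back to the database on a miss, and keep the result for next time. The sketch below is a single-process stand-in for MySpace's cache servers. The class and the loader function are hypothetical.

```python
class CacheTier:
    """In-memory object cache that sits in front of a slower lookup."""

    def __init__(self, load_from_database):
        self._load = load_from_database  # called only on a cache miss
        self._objects = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._objects:
            self.hits += 1
            return self._objects[key]
        self.misses += 1
        value = self._load(key)
        self._objects[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached object after the underlying row changes."""
        self._objects.pop(key, None)


def slow_profile_lookup(user_id):
    # Stand-in for a database query.
    return {"id": user_id, "name": f"user{user_id}"}


cache = CacheTier(slow_profile_lookup)
cache.get(42)
cache.get(42)
print(cache.hits, cache.misses)
```

The hit and miss counters matter in practice: the write-up later notes that a cache is only worth having where the hit rate is high.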

26 Million Accounts
- Moved to 64-bit SQL Server to work around their memory bottleneck issues. Their standard database server configuration uses 64 GB of RAM.

Horizontally Federated Database. Databases are partitioned by purpose: profile, email, and so on. Within each purpose the partition is based on user range, and 1 million users live in each database. So you have Profile1, Profile2 all the way up to Profile300 as they have 300 million users.
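
Range partitioning of this kind reduces to integer division on the user ID. A minimal sketch, assuming user IDs are sequential and start at 1 (the article does not say how IDs are assigned):

```python
USERS_PER_DATABASE = 1_000_000


def profile_database(user_id: int) -> str:
    """Name of the profile database holding this user.

    Assumption: IDs are sequential starting at 1, so users 1 to 1,000,000
    live in Profile1, the next million in Profile2, and so on.
    """
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    return f"Profile{(user_id - 1) // USERS_PER_DATABASE + 1}"


print(profile_database(1), profile_database(1_000_001), profile_database(300_000_000))
```

Because old ranges never move, adding capacity means bringing up one more database for the next million signups rather than reshuffling existing users.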

Doesn't use ASP cache because they don't have a high enough hit rate on the front-end. The middle tier cache does have a high hit rate.

Failure isolation. Segment requests into web server by database. Allow only 7 threads per database. So if the database is slow only those threads will slow down and the traffic in the other threads will flow.
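
One way to get that kind of isolation is a counting semaphore per database. The sketch below is an assumption about the mechanism, not MySpace's code. In particular, it rejects the eighth concurrent request immediately, whereas the article does not say whether excess requests are queued or refused.

```python
import threading

MAX_THREADS_PER_DATABASE = 7  # the limit quoted in the article


class DatabaseGate:
    """Caps concurrent work per database so one slow database cannot
    tie up every worker thread in the web server."""

    def __init__(self, limit: int = MAX_THREADS_PER_DATABASE):
        self._limit = limit
        self._slots = {}  # database name -> semaphore
        self._lock = threading.Lock()

    def _semaphore(self, database: str) -> threading.BoundedSemaphore:
        with self._lock:
            if database not in self._slots:
                self._slots[database] = threading.BoundedSemaphore(self._limit)
            return self._slots[database]

    def run(self, database: str, query):
        slot = self._semaphore(database)
        # Assumption: fail fast instead of waiting for a free slot.
        if not slot.acquire(blocking=False):
            raise RuntimeError(f"{database} is saturated")
        try:
            return query()
        finally:
            slot.release()


gate = DatabaseGate()
print(gate.run("Profile7", lambda: "row for user 6,500,000"))
```

If Profile7 hangs, at most seven threads hang with it. Requests for every other database still find free slots and keep flowing.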

Operations

PerfCollector. Centralized collection of performance data via UDP. More reliable than Windows and allows any client to connect and see stats.
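
Collecting counters over UDP is cheap for the sender because there is no connection to set up and nothing to wait for. The sketch below shows the general idea on the loopback interface. The JSON message format, host name, and counter name are all invented, since PerfCollector's wire format is not described.

```python
import json
import socket


def send_stat(sock, collector_addr, host: str, counter: str, value) -> None:
    """Fire-and-forget: one datagram per sample, no reply expected."""
    payload = json.dumps({"host": host, "counter": counter, "value": value})
    sock.sendto(payload.encode(), collector_addr)


def receive_stat(sock) -> dict:
    """Collector side: read one datagram and decode it."""
    data, _sender = sock.recvfrom(4096)
    return json.loads(data)


# Collector listens on an OS-assigned loopback port.
collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
collector.bind(("127.0.0.1", 0))
collector.settimeout(2.0)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_stat(sender, collector.getsockname(), "web042", "requests_per_sec", 513)
stat = receive_stat(collector)
print(stat)

sender.close()
collector.close()
```

UDP can drop datagrams under load, which is usually acceptable for performance samples: a missing data point is better than a web server blocked on its monitoring.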

Web Based Stack Dump Tool. Can right-click on a problem server and get stack dump of the .Net managed threads. Used to have to RDC into system and attach a debugger and 1/2 later get an answer. Slow, nonscalable, and tedious. Not just a stack dump, gives a lot of context about what the thread is doing. Troubleshooting is easier because you can see 90 threads are blocked on a database so the database may be down.

Web Based Heap Dump Tool. Dumps all memory allocations. Very useful for developers. Saves hours of doing it by hand.

Profiler. Traces a request from start to finish and produces a report. See URL, methods, status, everything that will help you identify a slow request. Looks at lock contentions, are a lot of exceptions being thrown, anything that might be interesting. Very light weight. It's running on one box in every VIP (group of 100 servers) in production. Samples 1 thread every 10 seconds. Always tracing in background.

PowerShell. Microsoft's new shell that runs in process and passes objects between commands instead of parsing text output. MySpace develops a lot of cmdlets to support operations.

Developed their own asynchronous communication technology to get around Windows networking problems and treat servers as a group. Can ship a .cs file, compile it, run it, and ship the response back.

Codespew. Pushes code updates on their communication technology. Used to do 5 code pushes a day, now down to 1 a week.

Lessons Learned

You can build big websites using Microsoft tech.

A cache should have been used from the beginning.

The cache is a better place to store transitory data that doesn't need to be recorded in a database, such as temporary files created to track a particular user's session on the Web site.

Built in OS features to detect denial of service attacks can cause inexplicable failures.

Distribute your data to geographically diverse data centers to handle power failures.

Consider using virtualized storage/clustered file systems from the start. It allows you to massively parallelize IO access while being able to add disk as needed without any reorganization needed.

Develop tools that work in a production environment. Can't simulate everything in test environment. The scale and variety of uses APIs are put to can't be simulated in QA during testing. Legitimate users and hackers will run into corner cases that weren't hit in testing, though QA will find most of the problems.

Throw hardware at problems. Easier than changing their backend software to a new way of doing things. The example is they add a new database server for every million users. It might be more efficient to change their approach to more efficiently use the database hardware, but it's easier just to add servers. For now.

Todd Hoff

.Net,

Example,

Windows,

operations

eBay Architecture

Tuesday, May 27, 2008 at 8:13AM

Update 2: eBay's Randy Shoup spills the secrets of how to service hundreds of millions of users and over two billion page views a day in Scalability Best Practices: Lessons from eBay on InfoQ. The practices: Partition by Function, Split Horizontally, Avoid Distributed Transactions, Decouple Functions Asynchronously, Move Processing To Asynchronous Flows, Virtualize At All Levels, Cache Appropriately.
Update: eBay Serves 5 Billion API Calls Each Month. Aren't we seeing more and more traffic driven by mashups composed on top of open APIs? APIs are no longer a bolt on, they are your application. Architecturally that argues for implementing your own application around the same APIs developers and users employ.

Who hasn't wondered how eBay does their business? As one of the largest, most heavily loaded websites in the world, it can't be easy. And the subtitle of the presentation hints at how creating such a monster system requires true engineering: Striking a balance between site stability, feature velocity, performance, and cost. You may not be able to emulate how eBay scales their system, but the issues and possible solutions are worth learning from.

Site: http://ebay.com

Information Sources

The eBay Architecture - Striking a balance between site stability, feature velocity, performance, and cost.

Podcast: eBay’s Transactions on a Massive Scale

Dan Pritchett on Architecture at eBay interview by InfoQ

Platform

Java

Oracle

WebSphere, servlets

Horizontal Scaling

Sharding

Mix of Windows and Unix

What's Inside?

This information was adapted from Johannes Ernst's Blog

The Stats

On an average day, it runs through 26 billion SQL queries and keeps tabs on 100 million items available for purchase.

212 million registered users, 1 billion photos

1 billion page views a day, 105 million listings, 2 petabytes of data, 3 billion API calls a month

Something like a factor of 35 in page views, e-mails sent, bandwidth from June 1999 to Q3/2006.

99.94% availability, measured as "all parts of site functional to everybody" vs. at least one part of a site not functional to some users somewhere
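
An availability percentage is easier to judge as hours of downtime. A quick conversion, assuming a 365 day year:

```python
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a system at this availability is not fully up."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


print(f"{downtime_hours_per_year(99.94):.2f} hours per year")
```

So 99.94% allows a little over five hours a year in which some part of the site is broken for someone, which is a strict target given how broadly eBay defines an outage.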

The database is virtualized and spans 600 production instances residing in more than 100 server clusters.

15,000 application servers, all J2EE. About 100 groups of functionality aka "apps". Notion of a "pool": "all the machines that deal with selling"...

The Architecture

Everything is planned with the question "what if load increases by 10x". Scaling only horizontal, not vertical: many parallel boxes.

Architecture is strictly divided into layers: data tier, application tier, search, operations.

Leverages MSXML framework for presentation layer (even in Java)

Oracle databases, WebSphere Java (still 1.3.1)

Split databases by primary access path, modulo on a key.
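
Splitting by "modulo on a key" means the shard is the remainder of the key divided by the number of shards. The sketch below shows the mapping and its main cost. The function name and the shard counts are illustrative, not eBay's.

```python
def shard_for(key: int, shard_count: int) -> int:
    """Pick a database shard from a numeric key such as a user or item ID."""
    return key % shard_count


# Consecutive keys spread evenly across 4 shards.
print([shard_for(item_id, 4) for item_id in range(8)])

# The catch: changing the shard count relocates most keys.
moved = sum(1 for key in range(1000) if shard_for(key, 4) != shard_for(key, 5))
print(f"{moved} of 1000 keys change shard when going from 4 to 5 shards")
```

That relocation cost is why modulo schemes tend to be paired with a lookup layer or with shard counts planned well ahead of need.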

Every database has at least 3 on-line databases. Distributed over 8 data centers

Some database copies run 15 min behind, 4 hours behind

Databases are segmented by function: user, item account, feedback, transaction, over 70 in all.

No stored procedures are used. There are some very simple triggers.

CPU-intensive work is moved out of the database layer to the application layer: referential integrity, joins, sorting done in the application layer! Reasoning: app servers are cheap, databases are the bottleneck.

No client-side transactions. No distributed transactions.
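
Doing a join in the application looks roughly like this: fetch the two row sets with simple queries, build a lookup table from one, and walk the other. The sketch is in Python rather than eBay's Java, and the row shapes are invented.

```python
def join_items_with_sellers(items, sellers):
    """Hash join plus sort, done on the app server instead of in SQL."""
    sellers_by_id = {seller["id"]: seller for seller in sellers}
    joined = []
    for item in items:
        seller = sellers_by_id.get(item["seller_id"])
        if seller is None:
            # Referential integrity is the application's job here:
            # skip items whose seller row is missing.
            continue
        joined.append(
            {"title": item["title"], "price": item["price"], "seller": seller["name"]}
        )
    joined.sort(key=lambda row: row["price"])
    return joined


items = [
    {"title": "lamp", "price": 30, "seller_id": 2},
    {"title": "bike", "price": 120, "seller_id": 1},
    {"title": "ghost", "price": 5, "seller_id": 9},
]
sellers = [{"id": 1, "name": "ann"}, {"id": 2, "name": "raj"}]
print(join_items_with_sellers(items, sellers))
```

The database only serves two cheap single-table reads. The CPU cost of matching and sorting lands on app servers, which are the tier that is easy to add more of.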

J2EE: use servlets, JDBC, connection pools (with rewrite). Not much else.

No state information in application tier. Transient state maintained in cookie or scratch database.
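
Keeping transient state in a cookie means the app servers can stay stateless, but the cookie then has to be protected against tampering. The sketch below signs the state with an HMAC. The signing step, the key, and the function names are assumptions made here, since the article does not describe how eBay encodes its cookies.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"example-secret-change-me"  # hypothetical; shared by all app servers


def encode_state(state: dict) -> str:
    """Serialize state into a cookie value with a signature appended."""
    body = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{signature}"


def decode_state(cookie: str) -> dict:
    """Verify the signature, then recover the state. Raises on tampering."""
    body, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("cookie signature mismatch")
    return json.loads(base64.urlsafe_b64decode(body))


cookie = encode_state({"step": "checkout", "item_id": 1234})
print(decode_state(cookie))
```

Any app server holding the key can pick up the request, which fits the rule that app servers never talk to each other.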

App servers do not talk to each other -- strict layering of architecture

Search, in 2002: 9 hours to update the index running on largest Sun box available -- not keeping up.

The average item on the site changes its search data 5 times before it is sold (e.g. price), so real-time search results are extremely important.

"Voyager": real-time feeder infrastructure built by eBay. Uses reliable multicast from primary database to search nodes, in-memory search index, horizontal segmentation, N slices, load-balances over M instances, cache queries.

Lessons Learned

Scale Out, Not Up
- Horizontal scaling at every tier.
- Functional decomposition.

Prefer Asynchronous Integration
- Minimize availability coupling.
- Improve scaling options.

Virtualize Components
- Reduce physical dependencies.
- Improve deployment flexibility.

Design for Failure
- Automated failure detection and notification.
- “Limp mode” operation of business features.

Move work out of the database into the applications because the database is the bottleneck. eBay does this in the extreme. We see it in other architectures using caching and the file system, but eBay even does a lot of traditional database operations in applications (like joins).

Use what you like and toss what you don't need. eBay didn't feel compelled to use the full blown J2EE stack. They liked Java and Servlets so that's all they used. You don't have to buy into any framework completely. Just use what works for you.

Don't be afraid to build solutions that meet and evolve with your needs. Every off the shelf solution will fail you at some point. You have to go the rest of the way on your own.

Operational controls become a larger and larger part of scalability as you grow. How do you upgrade, configure, and monitor thousands of machines while running a live system?

Architectures evolve. You need to be able to change, refine, and develop your new system while keeping your existing site running. That's the primary challenge of any growing website.

It's a mistake to worry too much about scalability from the start. Don't suffer from paralysis by analysis and worrying about traffic that may never come.

It's also a mistake not to worry about scalability at all. You need to develop an organization capable of dealing with architecture evolution. Understand you are never done. Your system will always evolve and change. Build those expectations and capabilities into your business from the start. Don't let people and organizations be why your site fails. Many people will think the system should be perfect from the start. It doesn't work that way. A good system is developed over time in response to real issues and concerns. Expect change and adapt to change.

Todd Hoff

Example,

Java,

Oracle,

Sun,

Windows