High Scalability -

Entries in PHP (14)

Wikimedia architecture

Wednesday, August 22, 2007 at 9:56AM

Wikimedia is the platform on which Wikipedia, Wiktionary, and the other seven wiki dwarfs are built. This document is just excellent for the student trying to scale the heights of giant websites. It is full of details and innovative ideas that have been proven on some of the most used websites on the internet. Site: http://wikimedia.org/

Information Sources

Wikimedia architecture

http://meta.wikimedia.org/wiki/Wikimedia_servers

Scale-out vs scale-up, from the From Oracle to MySQL blog.

Platform

Apache

Linux

MySQL

PHP

Squid

LVS

Lucene for Search

Memcached for Distributed Object Cache

Lighttpd Image Server

The Stats

8 million articles spread over hundreds of language projects (English, Dutch, ...)

10th busiest site in the world (source: Alexa)

Exponential growth: doubling every 4-6 months in terms of visitors / traffic / servers

30 000 HTTP requests/s during peak-time

3 Gbit/s of data traffic

3 data centers: Tampa, Amsterdam, Seoul

350 servers, ranging from 1x P4 to 2x Xeon Quad-Core, with 0.5 - 16 GB of memory

managed by ~ 6 people

3 clusters on 3 different continents

The Architecture

Geographic Load Balancing, based on the source IP of the client's resolver, directs clients to the nearest server cluster. IP addresses are statically mapped to countries, and countries to clusters.
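A minimal sketch of the static mapping described above, assuming a hand-maintained prefix table. The prefixes (documentation-only address ranges), the country codes, and the fallback cluster are all hypothetical; only the three cluster locations come from the post.

```python
import ipaddress

# Hypothetical static tables. The prefixes are reserved documentation ranges,
# not real allocations; a production table would be far larger.
PREFIX_TO_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "NL",
    ipaddress.ip_network("203.0.113.0/24"): "KR",
}
COUNTRY_TO_CLUSTER = {"US": "tampa", "NL": "amsterdam", "KR": "seoul"}
DEFAULT_CLUSTER = "tampa"  # assumption: unknown resolvers go to the primary site


def cluster_for_resolver(resolver_ip: str) -> str:
    """Map a DNS resolver's source IP to a cluster via its country."""
    addr = ipaddress.ip_address(resolver_ip)
    for network, country in PREFIX_TO_COUNTRY.items():
        if addr in network:
            return COUNTRY_TO_CLUSTER.get(country, DEFAULT_CLUSTER)
    return DEFAULT_CLUSTER
```

Because the lookup keys on the resolver's address rather than the end user's, a client using a far-away resolver can land on a distant cluster; that trade-off is inherent to this style of DNS-based balancing.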

HTTP reverse proxy caching implemented using Squid, grouped by text for wiki content and media for images and large static files.

55 Squid servers currently, plus 20 waiting for setup.

1,000 HTTP requests/s per server, up to 2,500 under stress

~ 100 - 250 Mbit/s per server

~ 14 000 - 32 000 open connections per server

Up to 40 GB of disk caches per Squid server

Up to 4 disks per server (1U rack servers)

8 GB of memory, half of that used by Squid

Hit rates: 85% for Text, 98% for Media, since the use of CARP.

PowerDNS provides geographical distribution.

In their primary and regional data centers they build text and media clusters on LVS, CARP Squid, and cache Squid. The media storage lives in the primary data center.

To make sure the latest revision of every page is served, invalidation requests are sent to all Squid caches.

One centrally managed & synchronized software installation for hundreds of wikis.

MediaWiki scales well with multiple CPUs, so we buy dual quad-core servers now (8 CPU cores per box)

Hardware shared with External Storage and Memcached tasks

Memcached is used to cache image metadata, parser data, differences, users and sessions, and revision text. Metadata, such as article revision history, article relations (links, categories etc.), user accounts and settings are stored in the core databases

Actual revision text is stored as blobs in External storage

Static (uploaded) files, such as images, are stored separately on the image server - metadata (size, type, etc.) is cached in the core database and object caches

Separate database per wiki (not separate server!)

One master, many replicated slaves

Read operations are load balanced over the slaves, write operations go to the master

The master is used for some read operations in case the slaves are not yet up to date (lagged)
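The read/write split with a lag fallback can be sketched roughly as follows. This is an illustration, not MediaWiki's load balancer: the `Replica` type, the `MAX_LAG_SECONDS` threshold, and the server names are assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    lag_seconds: float  # how far this slave is behind the master


MAX_LAG_SECONDS = 5.0  # assumption: the acceptable staleness for reads


def pick_server(is_write, replicas, master="master", rng=random):
    """Send writes to the master and reads to a sufficiently fresh slave."""
    if is_write:
        return master
    fresh = [r for r in replicas if r.lag_seconds <= MAX_LAG_SECONDS]
    if not fresh:
        # Every slave is lagged, so fall back to the master for this read.
        return master
    return rng.choice(fresh).name
```

Picking uniformly among fresh slaves is the simplest policy; a weighted choice by server capacity would follow the same shape.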

External Storage: article text is stored on separate data storage clusters, as simple append-only blob storage.

Saves space on expensive and busy core databases for largely unused data

Allows use of spare resources on application servers (2x 250-500 GB per server)

Currently replicated clusters of 3 MySQL hosts are used; this might change in the future for better manageability
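To illustrate the append-only idea, here is a small in-memory stand-in. The real system stores blobs on replicated MySQL hosts; the class name, the integer blob ids, and the use of zlib are assumptions for this sketch.

```python
import zlib


class AppendOnlyBlobStore:
    """In-memory stand-in for an append-only revision text store."""

    def __init__(self):
        self._blobs = []  # blobs are only ever appended, never updated

    def append(self, text: str) -> int:
        """Compress and store one revision, returning its blob id."""
        self._blobs.append(zlib.compress(text.encode("utf-8")))
        return len(self._blobs) - 1

    def read(self, blob_id: int) -> str:
        """Fetch and decompress a previously stored revision."""
        return zlib.decompress(self._blobs[blob_id]).decode("utf-8")
```

The core database would then hold only the small blob id per revision, which is what keeps the bulky, rarely read text off the busy servers.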

Lessons Learned

Focus on architecture, not so much on operations or nontechnical stuff.

Sometimes caching costs more than recalculating or looking up at the data source...profiling!

Avoid expensive algorithms, database queries, etc.

Cache every result that is expensive and has temporal locality of reference.

Focus on the hot spots in the code (profiling!).
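A tiny sketch of caching an expensive result that is likely to be requested again soon. The `TTLCache` class and its parameters are hypothetical, and a real deployment would sit on a shared cache such as memcached rather than a per-process dictionary.

```python
import time


class TTLCache:
    """Cache values for a fixed time so repeated requests skip recomputation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}    # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: reuse the cached value
        value = compute()  # expensive path: recompute and store
        self._store[key] = (now + self.ttl, value)
        return value
```

Whether a given call is worth caching at all is the profiling question raised above: if `compute` is cheaper than the cache round trip, the cache is a net loss.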

Scale by separating:

Read and write operations (master/slave)

Expensive operations from cheap and more frequent operations (query groups)

Big, popular wikis from smaller wikis

Separating this way improves caching: better temporal and spatial locality of reference, and a smaller data set per server.

Text is compressed and only the differences between revisions are stored.

Simple-seeming library calls, like using stat to check for a file's existence, can take too long under load.

Disk seek I/O is the limit: the more disk spindles, the better!

Scale-out using commodity hardware doesn't require using cheap hardware. Wikipedia's database servers these days are 16GB dual or quad core boxes with 6 15,000 RPM SCSI drives in a RAID 0 setup. That happens to be the sweet spot for the working set and load balancing setup they have. They would use smaller/cheaper systems if it made sense, but 16GB is right for the working set size and that drives the rest of the spec to match the demands of a system with that much RAM. Similarly the web servers are currently 8 core boxes because that happens to work well for load balancing and gives good PHP throughput with relatively easy load balancing.

It is a lot of work to scale out, more if you didn't design it in originally. Wikipedia's MediaWiki was originally written for a single master database server. Then slave support was added. Then partitioning by language/project was added. The designs from that time have stood the test well, though with much more refining to address new bottlenecks.

Anyone who wants to design their database architecture so that it'll allow them to inexpensively grow from one box and rank nothing to the top ten or hundred sites on the net should start out by designing it to handle slightly out-of-date data from replication slaves, know how to load balance to slaves for all read queries, and, if at all possible, design it so that chunks of data (batches of users, accounts, whatever) can go on different servers. You can do this from day one using virtualisation, proving the architecture when you're small. It's a LOT easier than doing it while load is doubling every few months!
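One way to picture "chunks of data can go on different servers" is a lookup table from chunk to server, sketched below. The chunk count, the modulo rule, and the server names are assumptions for illustration; the post does not describe a specific scheme.

```python
NUM_CHUNKS = 64  # assumption: a fixed number of logical chunks

# Hypothetical starting layout: all chunks spread over two servers
# (which could be two virtual machines on one box on day one).
CHUNK_TO_SERVER = {chunk: f"db{chunk % 2 + 1}" for chunk in range(NUM_CHUNKS)}


def chunk_for_user(user_id: int) -> int:
    """Assign each user to a fixed logical chunk."""
    return user_id % NUM_CHUNKS


def server_for_user(user_id: int) -> str:
    """Resolve the server that currently holds the user's chunk."""
    return CHUNK_TO_SERVER[chunk_for_user(user_id)]


def move_chunk(chunk: int, new_server: str) -> None:
    """Repoint one chunk after its data has been copied to a new server."""
    CHUNK_TO_SERVER[chunk] = new_server
```

The indirection is the point: users never change chunks, so growing means moving whole chunks and updating the table, not rehashing every user.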

Todd Hoff |

Apache,

Example,

Geo-distributed Clusters,

LVS,

Linux,

Lucene,

MySQL,

PHP,

Squid

Product: eAccelerator a PHP Accelerator

Thursday, July 26, 2007 at 5:58AM

eAccelerator is a free open-source PHP accelerator, optimizer, and dynamic content cache. It increases the performance of PHP scripts by caching them in their compiled state, so that the overhead of compiling is almost completely eliminated. It also optimizes scripts to speed up their execution. eAccelerator typically reduces server load and increases the speed of your PHP code by 1-10 times.

Todd Hoff |

ThemBid Architecture

Thursday, July 26, 2007 at 5:27AM

ThemBid provides a market where people needing work done broadcast their request and accept bids from people competing for the job. Unlike many of the sites profiled at HighScalability, ThemBid is not in the popular press as often as Paris Hilton. It's not a media darling or a giant of the industry. But what I like is they have a strategy, a point-of-view for building websites and were gracious enough to share very detailed instructions on how to go about building a website. They even delve into actual installation details of the various software packages they use. Anyone can benefit by taking a look at their work. Site: http://www.thembid.com/

Information Sources

Build Scalable Web 2.0 Sites with Ubuntu, Symfony, and Lighttpd

Platform

Linux (Ubuntu)

Symfony

Lighttpd

PHP

eAccelerator

Eclipse

Munin

AWStats

What's Inside?

The Stats

Started work in December of 2006 and had a full demo by March 2007.

One developer/sys admin worked with a part-time graphics designer.

Targeted a few thousand users after launch.

The Architecture

Hardware. Dual core server with 2GB RAM

Storage. 2 x 36 GB SCSI 10K RPM drives on RAID 1.

Data Center. They went with Layeredtech for the managed server because of past positive experiences.

Development Environment. Ubuntu and Eclipse.

OS. They chose the server distribution of Ubuntu because that's what they use on the client side and Ubuntu supports "simpler installation and easier maintenance than typical IT deployments."

Web Server. Lighttpd is used to handle static content and forward the dynamic PHP page requests to FastCGI.

Database. MySQL. When growth is necessary the idea is to move to a master-slave arrangement and then maybe MySQL Cluster.

Web Framework. Went with PHP because they knew it and other successful sites like Digg and Yahoo successfully deploy PHP. They chose Symfony as their framework because of its nice documentation and active development community. And Yahoo also uses Symfony. It's a decision that has worked well for them.

PHP Cache. eAccelerator is used to compile and cache PHP scripts.

Object and Content Cache. The plan is to cache a lot of content. For a bid site like theirs this makes sense. Many of the pieces are used over and over again, so putting them in memory will speed up the entire system and take pressure off the database and the IO system. Initially they used a SQLite cache on top of a memory-based file system. This choice was made because it was supported by Symfony. When a memcached plugin is available they'll try that.

Client Side Cache. Lighttpd's mod_expire module is used to prevent Javascript, style sheets, and images that rarely change from being unnecessarily redownloaded by the browser.

Monitoring. Munin is used to monitor their resource usage. It's as simple as visiting "yoursite.com/status" to see what's going on.

Log Analysis. AWStats is used to track hits and types of requests. This information can be used to target bottlenecks.

Scalability Plan.

Use Munin to tell when to think about upgrading. When your growth trend will soon cross your resources trend, it's time to do something.

Move MySQL to a separate server. This frees up resources (CPU, disk, memory). What you want to run on this server depends on its capabilities. Maybe run a memcached server on it.

Move to a distributed memory cache using memcached.

Add a MySQL master/slave configuration.

If more webservers are needed, use LVS on the front end as a load balancer.
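The "growth trend crossing the resources trend" check can be approximated with a straight-line fit over recent usage samples. This is a rough sketch assuming linear growth and a fixed capacity; the function name and the sample format are hypothetical, and Munin itself only supplies the graphs.

```python
def days_until_capacity(samples, capacity):
    """Estimate days until usage reaches capacity from (day, usage) samples.

    Fits a least-squares line through the samples. Returns None when usage
    is flat or falling, so no crossing is predicted.
    """
    n = len(samples)
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(usage for _, usage in samples) / n
    var_x = sum((day - mean_x) ** 2 for day, _ in samples)
    if var_x == 0:
        return None  # all samples share one day: no trend to fit
    slope = sum((day - mean_x) * (usage - mean_y) for day, usage in samples) / var_x
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (capacity - intercept) / slope
    last_day = samples[-1][0]
    return max(0.0, crossing_day - last_day)
```

For example, usage of 10, 20, and 30 units on days 0, 1, and 2 against a capacity of 100 gives a crossing on day 9, which is 7 days after the last sample.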

Future Directions. Work on fault tolerance.

Lessons Learned

It's possible to create a nice site fairly quickly with just a few people using commonly available low cost tools. And your system will be solid and powerful. No cut corners.

Use feedback from your system to know what needs optimizing and when it's time to scale.

Good documentation and an active community draw people. These are very attractive qualities for people making decisions about what to use. It's hard to go with a tool chain when it looks like you may get stuck in the future with no way out and no help. If you make tools make them dead easy to understand, learn, use, and deploy.

Stick with the familiar. It may not be optimal, it may not be the best, but it's more important that you get started and make progress. You don't want to delay releasing your site so you can learn a completely different tool chain that may make your life somewhat easier in some projected future. The future is now.

Use what works for other people. The fact that Yahoo and Digg use PHP is a good recommendation. Certainly PHP is not the only way to build a site, but it does cut your risk level and help you sleep at night. It also means there's an active community that can help you when you have problems.

Todd Hoff |

Linux,

MySQL,

PHP,

Symfony,

lighttpd

Friendster Architecture

Wednesday, July 11, 2007 at 3:18PM

Friendster is one of the largest social network sites on the web. It emphasizes genuine friendships and the discovery of new people through friends. Site: http://www.friendster.com/

Information Sources

Friendster - Scaling for 1 Billion Queries per day

Platform

MySQL

Perl

PHP

Linux

Apache

What's Inside?

Dual x86-64 AMD Opterons with 8 GB of RAM

Faster disk (SAN)

Optimized indexes

Traditional 3-tier architecture with hardware load balancer in front of the databases

Clusters based on types: ad, app, photo, monitoring, DNS, gallery search DB, profile DB, user info DB, IM status cache, message DB, testimonial DB, friend DB, graph servers, gallery search, object cache.

Lessons Learned

No persistent database connections.

Removed all sorts.

Optimized indexes

Don’t go after the biggest problems first

Optimize without downtime

Split load

Moved sorting query types into the application and added LIMITS.

Reduced ranges

Range on primary key
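A small sketch of the query pattern these points describe: a bounded range on the primary key, a LIMIT, and sorting done in the application instead of the database. It uses SQLite so it runs standalone; the `messages` table, its columns, and the helper names are hypothetical and not Friendster's schema.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, sent_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [(1, "ann", 300), (2, "bob", 100), (3, "cat", 200), (4, "dan", 400)],
    )
    return conn


def recent_messages(conn, min_id, max_id, limit=50):
    """Fetch a bounded primary-key range, then sort in the application."""
    rows = conn.execute(
        "SELECT id, sender, sent_at FROM messages "
        "WHERE id BETWEEN ? AND ? LIMIT ?",  # no ORDER BY: the database does no sort
        (min_id, max_id, limit),
    ).fetchall()
    # Newest first, sorted on the web tier where CPU is easier to add.
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Note that without ORDER BY the LIMIT returns an arbitrary subset when the range holds more rows than the limit, so the key range has to be kept narrow enough for the result to be meaningful.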

Benchmark -> Make Change -> Benchmark -> Make Change (Cycle of Improvement)

Stabilize: always have a plan to rollback

Work with a team

Assess: Define the issues

A key design goal for the new system was to move away from maintaining session state toward a stateless architecture that would clean up after each request

Rather than buy big, centralized boxes, [our philosophy] was about buying a lot of thin, cheap boxes. If one fails, you roll over to another box.

Todd Hoff |

Example,

Linux,

MySQL,

PHP,

Perl