Entries in Perl (9)

Monday
Mar302009

Ebay history and architecture

Ebay[1] Starts in 1995, initial name AuctionWeb (V1) : - very simple architecture - based on perl - no database, for data persistence they used plain files Because of rapid growth they needed to improve their architecture and so V2 (clever name) was born: - replaced perl with C/C++ - started using a database in a master-slave configuration - C++ back-end - XSLT front-end Any request will lead to an XML file being created in C++ and the XLST processor will transform that into html. *pretty sophisticated architecture for the 90s, XLST was cutting-edge back then* ebay v2 That hold out pretty well for a while but in the late 90s ebay experienced an exponential growth. They started having some trouble with outages and needed improvements, so V3 was developed: - based on java - search engine still used C++ - proof that relational databases can scale (aggressive caching) - developed a messaging layer for making a lot of asyncronious calls, they actually ended up being sued because of the delay in which images appear on the site after an item gets posted :-) - although they switched to java the basic principle of generating xml files for each request was still used. ebay v3 Combining the need for a multilingual website with the boom of AJAX technologies and flash applications they started to doubt their XSL system and moved on to what became in 2006 V4 of ebay: - remove everything that could be replaced with java, Ebay loves Java Everything is Java (Code): - Image - Java class - Link - Java class - Javascript - Java classes - Content - Java classes => lot's of code to write, they're using Eclipse for developing. ebay v4 For more details check: Eclipse at Ebay Tailoring Eclipse to the eBay architecture [1] Images and ideas/info are from the links above.

Click to read more ...

Sunday
Feb242008

Yandex Architecture

Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it. Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn:

  • 3.5 billion pages in the search index.
  • Over several thousand servers.
  • 35 million searches a day.
  • Several data centers around Russia.
  • Two-layer architecture.
  • The database is split in pieces and when a search is requested, it pulls the bits from the different database servers and brings it together for the user.
  • Languages used: c++, perl, some java.
  • FreeBSD is used as their server OS.
  • $72 million in revenue in 2006.

    Click to read more ...

  • Tuesday
    Nov132007

    Flickr Architecture

    Update: Flickr hits 2 Billion photos served. That's a lot of hamburgers.

    Flickr is both my favorite bird and the web's leading photo sharing site. Flickr has an amazing challenge, they must handle a vast sea of ever expanding new content, ever increasing legions of users, and a constant stream of new features, all while providing excellent performance. How do they do it?

    Site: http://www.flickr.com

    Information Sources

  • Flickr and PHP (an early document)
  • Capacity Planning for LAMP
  • Federation at Flickr: Doing Billions of Queries a Day by Dathan Pattishall.
  • Building Scalable Web Sites by Cal Henderson from Flickr.
  • Database War Stories #3: Flickr by Tim O'Reilly
  • Cal Henderson's Talks. A lot of useful PowerPoint presentations.

    Platform

    Click to read more ...

  • Monday
    Nov122007

    Slashdot Architecture - How the Old Man of the Internet Learned to Scale

    Slashdot effect: overwhelming unprepared sites with an avalanche of reader's clicks after being mentioned on Slashdot. Sure, we now have the "Digg effect" and other hot new stars, but Slashdot was the original. And like many stars from generations past, Slashdot plays the elder statesman's role with with class, dignity, and restraint. Yet with millions and millions of users Slashdot is still box office gold and more than keeps up with the young'ins. And with age comes the wisdom of learning how to handle all those users. Just how does Slashdot scale and what can you learn by going old school? Site: http://slashdot.org

    Information Sources

  • Slashdot's Setup, Part 1- Hardware
  • Slashdot's Setup, Part 2- Software
  • History of Slashdot Part 3- Going Corporate
  • The History of Slashdot Part 4 - Yesterday, Today, Tomorrow

    The Platform

  • MySQL
  • Linux (CentOS/RHEL)
  • Pound
  • Apache
  • Perl
  • Memcached
  • LVS

    The Stats

  • Started building the system in 1999.
  • 5.5 million user visits per month.
  • 7,000 comments are added every day.
  • Over 9 million pages views daily.
  • Over 21 million comments.
  • Average monthly bandwidth usage is around 40-50 mbit/sec.
  • For the same story Kottke.org found Slashdot delivered 4 times more users than Digg. So Slashdot ain't dead yet.
  • From The History of Slashdot Part 4: On [September 11th] the mainstream news websites buckled under the loads, and although we had to turn off logging, we managed to stay up, sharing news in a time where it was often difficult to get. That was the day where the team of engineers that make this site happen pulled together and did the impossible, forcing our limited little hardware cluster to handle traffic that was probably triple or quadruple a normal day.

    The Hardware Architecture

  • Data center design is similar to all the other SourceForge, Inc. sites and has proven to scale well.
  • Two Active-Active gigabit uplinks.
  • A pair of Cisco 7301s serve as gateway/border routers. Perform some basic filtering. Filtering is tiered to spread the load.
  • Foundry BigIron 8000s act as core switches/routers.
  • Foundry FastIron 9604s are used as switches for some racks.
  • A pair of Rackable System (1Us; P4 Xeon 2.66Gz, 2G RAM, 2x80GB IDE, running CentOS and LVS) serve as load balancing firewalls, distributing traffic to web servers. BIG-IP F5's are being deployed in their new datacenter.
  • All servers are at least RAID 1.
  • 16 web servers: - Running Red Hat 9. - Rackable 1U servers with 2 Xeon 2.66Ghz processors, 2GB of RAM, and 2x80GB IDE hard drives. - Two serve static content: javascript, images and the front page for non logged-in users. - Four serve the front page to logged in users - 10 handle comment pages. - Host roles are changed in response to load. - All NFS mounts are in read-only mode.
  • NFS server is a Rackable 2U with 2 Xeon 2.4Ghz processors, 2GB of RAM, and 4x36GB 15K RPM SCSI drives.
  • 7 database servers: - All run CentOS 4. - 2 in a Master-master configuration: -- Dual Opteron 270's with 16GB RAM, 4x36GB 15K RPM SCSI -- One master is the write only database. -- One master is the read only database. -- They can failover at any time and switch roles. - 2 reader databases: -- Dual Opteron 270's with 8GB RAM, 4x36GB 15K RPM SCSI Drive -- Each syncs from one of the master databases. -- Can add more to scale, but plenty fast enough for now. - 3 miscellaneous databases -- Quad P3 Xeon 700Mhz with 4GB RAM, 8x36GB 10K RPM SCSI Drives -- Accesslog writer and accesslog reader. Separate databases are used because moderation and stats require a lot of CPU time for computation. -- Search database.

    The Software Architecture

  • Logged in and non-logged in users are treated differently. - Non-logged in user see the same page. This page is a static page that is updated every couple of minutes. - Logged in users have custom options which can't be cached so generating pages for these users take more resources.
  • 6 pound servers (1 for SSL) are used as reverse proxies: - If a request can't be handled it is forwarded on to a web server. - Pound servers are run on the same machines as the web servers. - They are distributed for load balancing and redundancy. - SSL is handled by the pound server so the web server doesn't need to support SSL.
  • 16 apache web servers (version 1.3): - Software is mounted from /usr/local on the read-only NFS server. - The images are kept simple. All that is compiled in is: -- mod_perl -- lingerd to free up RAM during delivery. -- mod_auth_useragent to block bots. - 1 For SSL. - 2 for static (.shtml) requests. - 4 for the dynamic homepage. - 6 for dynamic comment-delivery pages (comments, article, pollBooth.pl). - 3 for all other dynamic scripts (ajax, tags, bookmarks, firehose).
  • Reasons for segregating apache servers to different roles: - Isolate the servers in case there are performance problems or a DDoS attack on a specific page. The rest of the system will function even when one part is failing. - For efficiency reasons like httpd-level caching and MaxClients tuning. The web server can be tuned differently for each role. MaxClients is set to 5-15 for dynamic web servers and 25 for static servers. The bottleneck is CPU, not RAM so if requests aren't process quickly then something's wrong and queuing more requests won't help the CPU process them any faster.
  • Using read-only mounted has contributed to the robustness of the system. Tasks that write to /usr/local, for example, to update index.html every second, run on the NFS server.
  • Use their own SQL API built on top of DBD::mysql and DBI.pm.
  • A huge performance boost was provided by caching users, stories, and comment text using memcached.
  • Most data access is through get and set methods written custom for each data type and through methods that perform one specific update or select.
  • The Multiple-master replication architecture allows keeping the site fully live even during blocking queries like ALTER TABLE.
  • Multi-pass log processing is to detect abuse and picking which users get mod points.
  • The moderation system was created in response to spam. It was just a few friends at first and then a lot of friends. This didn't scale. So the 'mod points' system was introduced so that any user who contributed to the system could moderate the system.
  • Active users are banned to protect from excessive usage from bots.

    Lessons Learned

  • The most creatively satisfying period was when money was tight, the group was small, and everyone was helping everyone else with anything that needed to be done.
  • Don't waste your time optimizing code because you are too cheap to buy more machines. Buy the hardware and spend your time working on features.
  • Sell out to a large corporation and you lose control. There's continual pressure to go to the dark side of creating new products, blending in advertiser supplied content, and serving giant ads.
  • Say no to the forces that want you to become just like everyone else. Though many competitors have come and gone, Slashdot is still around because they: continue to maintain editorial independence, moderate advertising quantity with a clear distinction between advertising and content, and of course, that we continue to select the right stories to appeal to our existing audience... not to spend our time courting other audiences that would only dilute the discussions that bring so many of you here day after day.
  • Segregate servers into different policy domains so you can optimize their configuration.
  • Optimizing usually means caching, caching, caching.
  • Tables not fully, but mostly normalized. This improves performance in most cases.
  • Over the last seven years the process of developing database backed websites has changed: The database used to be the bottleneck: centralized, hard to expand, slow. Now even a cheap DB server can run a pretty big site if you code defensively, and thanks to Moore's Law, memcached, and improvements in open-source database software, that part of the scaling issue isn't really a problem until you're practically the size of eBay. It's an exciting time to be coding web applications.

    Click to read more ...

  • Sunday
    Oct282007

    Scaling Early Stage Startups

    Mark Maunder of No VC Required--who advocates not taking VC money lest you be turned into a frog instead of the prince (or princess) you were dreaming of--has an excellent slide deck on how to scale an early stage startup. His blog also has some good SEO tips and a very spooky widget showing the geographical location of his readers. Perfect for Halloween! What is Mark's other worldly scaling strategies for startups? Site: http://novcrequired.com/

    Information Sources

  • Slides from Seattle Tech Startup Talk.
  • Scaling Early Stage Startups blog post by Mark Maunder.

    The Platform

  • Linxux
  • An ISAM type data store.
  • Perl
  • Httperf is used for benchmarking.
  • Websitepulse.com is used for perf monitoring.

    The Architecture

  • Performance matters because being slow could cost you 20% of your revenue. The UIE guys disagree saying this ain't necessarily so. They explain their reasoning in Usability Tools Podcast: The Truth About Page Download Time. The idea is: "There was still another surprising finding from our study: a strong correlation between perceived download time and whether users successfully completed their tasks on a site. There was, however, no correlation between actual download time and task success, causing us to discard our original hypothesis. It seems that, when people accomplish what they set out to do on a site, they perceive that site to be fast." So it might be a better use of time to improve the front-end rather than the back-end.
  • MySQL was dumped because of performance problems: MySQL didn't handle a high number of writes and deletes on large tables, writes blow away the query cache, large numbers of small tables (over 10,000) are not well supported, uses a lot of memory to cache indexes, maxed out at 200 concurrent read/write queuries per second with over 1 million records.
  • For data storage they evolved to a fixed length ISAM like record scheme that allows seeking directly to the data. Still uses file level locking and its benchmarked at 20,000+ concurrent reads/writes/deletes. Considering moving to BerkelyDB which is a very highly performing and is used by many large websites, especially when you primarily need key-value type lookups. I think it might be interesting to store json if a lot of this data ends up being displayed on the web page.
  • Moved to httpd.prefork for Perl. That with no keepalive on the application servers uses less RAM and works well.

    Lessons Learned

  • Configure your DB and web server correctly. MySQL and Apache's memory usage can easily spiral out of control which leads gridingly slow performance as swapping increases. Here are a few resources for helping with configuration issues.
  • Serve only the users you care about. Block content theives that crawl your site using a lot of valuable resources for nothing. Monitor the number of content pages they fetch per minute. If a threshold is exceeded and then do a reverse lookup on their IP address and configure your firewall to block them.
  • Cache as much DB data and static content as possible. Perl's Cache::FileCache was used to cache DB data and rendered HTML on disk.
  • Use two different host names in URLs to enable browser clients to load images in parallele.
  • Make content as static as possible Create a separate Image and CSS server to serve the static content. Use keepalives on static content as static content uses little memory per thread/process.
  • Leave plenty of spare memory. Spare memory allows Linux to use more memory fore file system caching which increased performance about 20 percent.
  • Turn Keepalive off on your dynamic content. Increasing http requests can exhaust the thread and memory resources needed to serve them.
  • You may not need a complex RDBMS for accessing data. Consider a lighter weight database BerkelyDB.

    Click to read more ...

  • Tuesday
    Sep182007

    Amazon Architecture

    This is a wonderfully informative Amazon update based on Joachim Rohde's discovery of an interview with Amazon's CTO. You'll learn about how Amazon organizes their teams around services, the CAP theorem of building scalable systems, how they deploy software, and a lot more. Many new additions from the ACM Queue article have also been included. Amazon grew from a tiny online bookstore to one of the largest stores on earth. They did it while pioneering new and interesting ways to rate, review, and recommend products. Greg Linden shared is version of Amazon's birth pangs in a series of blog articles Site: http://amazon.com

    Information Sources

  • Early Amazon by Greg Linden
  • How Linux saved Amazon millions
  • Interview Werner Vogels - Amazon's CTO
  • Asynchronous Architectures - a nice summary of Werner Vogels' talk by Chris Loosley
  • Learning from the Amazon technology platform - A Conversation with Werner Vogels
  • Werner Vogels' Weblog - building scalable and robust distributed systems

    Platform

  • Linux
  • Oracle
  • C++
  • Perl
  • Mason
  • Java
  • Jboss
  • Servlets

    The Stats

  • More than 55 million active customer accounts.
  • More than 1 million active retail partners worldwide.
  • Between 100-150 services are accessed to build a page.

    The Architecture

  • What is it that we really mean by scalability? A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added. Increasing performance in general means serving more units of work, but it can also be to handle larger units of work, such as when datasets grow.
  • The big architectural change that Amazon made was to move from a two-tier monolith to a fully-distributed, decentralized, services platform serving many different applications.
  • Started as one application talking to a back end. Written in C++.
  • It grew. For years the scaling efforts at Amazon focused on making the back-end databases scale to hold more items, more customers, more orders, and to support multiple international sites. In 2001 it became clear that the front-end application couldn't scale anymore. The databases were split into small parts and around each part and created a services interface that was the only way to access the data.
  • The databases became a shared resource that made it hard to scale-out the overall business. The front-end and back-end processes were restricted in their evolution because they were shared by many different teams and processes.
  • Their architecture is loosely coupled and built around services. A service-oriented architecture gave them the isolation that would allow building many software components rapidly and independently.
  • Grew into hundreds of services and a number of application servers that aggregate the information from the services. The application that renders the Amazon.com Web pages is one such application server. So are the applications that serve the Web-services interface, the customer service application, and the seller interface.
  • Many third party technologies are hard to scale to Amazon size. Especially communication infrastructure technologies. They work well up to a certain scale and then fail. So they are forced to build their own.
  • Not stuck with one particular approach. Some places they use jboss/java, but they use only servlets, not the rest of the J2EE stack.
  • C++ is uses to process requests. Perl/Mason is used to build content.
  • Amazon doesn't like middleware because it tends to be framework and not a tool. If you use a middleware package you get lock-in around the software patterns they have chosen. You'll only be able to use their software. So if you want to use different packages you won't be able to. You're stuck. One event loop for messaging, data persistence, AJAX, etc. Too complex. If middleware was available in smaller components, more as a tool than a framework, they would be more interested.
  • The SOAP web stack seems to want to solve all the same distributed systems problems all over again.
  • Offer both SOAP and REST web services. 30% use SOAP. These tend to be Java and .NET users and use WSDL files to generate remote object interfaces. 70% use REST. These tend to be PHP or PERL users.
  • In either SOAP or REST developers can get an object interface to Amazon. Developers just want to get job done. They don't care what goes over the wire.
  • Amazon wanted to build an open community around their services. Web services were chosed because it's simple. But hat's only on the perimeter. Internally it's a service oriented architecture. You can only access the data via the interface. It's described in WSDL, but they use their own encapsulation and transport mechanisms.
  • Teams are Small and are Organized Around Services - Services are the independent units delivering functionality within Amazon. It's also how Amazon is organized internally in terms of teams. - If you have a new business idea or problem you want to solve you form a team. Limit the team to 8-10 people because communication hard. They are called two pizza teams. The number of people you can feed off two pizzas. - Teams are small. They are assigned authority and empowered to solve a problem as a service in anyway they see fit. - As an example, they created a team to find phrases within a book that are unique to the text. This team built a separate service interface for that feature and they had authority to do what they needed. - Extensive A/B testing is used to integrate a new service . They see what the impact is and take extensive measurements.
  • Deployment - They create special infrastructure for managing dependencies and doing a deployment. - Goal is to have all right services to be deployed on a box. All application code, monitoring, licensing, etc should be on a box. - Everyone has a home grown system to solve these problems. - Output of deployment process is a virtual machine. You can use EC2 to run them.
  • Work From the Customer Backwards to Verify a New Service is Worth Doing - Work from the customer backward. Focus on value you want to deliver for the customer. - Force developers to focus on value delivered to the customer instead of building technology first and then figuring how to use it. - Start with a press release of what features the user will see and work backwards to check that you are building something valuable. - End up with a design that is as minimal as possible. Simplicity is the key if you really want to build large distributed systems.
  • State Management is the Core Problem for Large Scale Systems - Internally they can deliver infinite storage. - Not all that many operations are stateful. Checkout steps are stateful. - Most recent clicked web page service has recommendations based on session IDs. - They keep track of everything anyway so it's not a matter of keeping state. There's little separate state that needs to be kept for a session. The services will already be keeping the information so you just use the services.
  • Eric Brewer's CAP Theorem or the Three properties of Systems - Three properties of a system: consistency, availability, tolerance to network partitions. - You can have at most two of these three properties for any shared-data system. - Partitionability: divide nodes into small groups that can see other groups, but they can't see everyone. - Consistency: write a value and then you read the value you get the same value back. In a partitioned system there are windows where that's not true. - Availability: may not always be able to write or read. The system will say you can't write because it wants to keep the system consistent. - To scale you have to partition, so you are left with choosing either high consistency or high availability for a particular system. You must find the right overlap of availability and consistency. - Choose a specific approach based on the needs of the service. - For the checkout process you always want to honor requests to add items to a shopping cart because it's revenue producing. In this case you choose high availability. Errors are hidden from the customer and sorted out later. - When a customer submits an order you favor consistency because several services--credit card processing, shipping and handling, reporting--are simultaneously accessing the data.

    Lessons Learned

  • You must change your mentality to build really scalable systems. Approach chaos in a probabilistic sense that things will work well. In traditional systems we present a perfect world where nothing goes down and then we build complex algorithms (agreement technologies) on this perfect world. Instead, take it for granted stuff fails, that's reality, embrace it. For example, go more with a fast reboot and fast recover approach. With a decent spread of data and services you might get close to 100%. Create self-healing, self-organizing lights out operations.
  • Create a shared nothing infrastructure. Infrastructure can become a shared resource for development and deployment with the same downsides as shared resources in your logic and data tiers. It can cause locking and blocking and dead lock. A service oriented architecture allows the creation of a parallel and isolated development process that scales feature development to match your growth.
  • Open up you system with APIs and you'll create an ecosystem around your application.
  • Only way to manage as large distributed system is to keep things as simple as possible. Keep things simple by making sure there are no hidden requirements and hidden dependencies in the design. Cut technology to the minimum you need to solve the problem you have. It doesn't help the company to create artificial and unneeded layers of complexity.
  • Organizing around services gives agility. You can do things in parallel is because the output is a service. This allows fast time to market. Create an infrastructure that allows services to be built very fast.
  • There's bound to be problems with anything that produces hype before real implementation
  • Use SLAs internally to manage services.
  • Anyone can very quickly add web services to their product. Just implement one part of your product as a service and start using it.
  • Build your own infrastructure for performance, reliability, and cost control reasons. By building it yourself you never have to say you went down because it was company X's fault. Your software may not be more reliable than others, but you can fix, debug, and deployment much quicker than when working with a 3rd party.
  • Use measurement and objective debate to separate the good from the bad. I've been to several presentations by ex-Amazoners and this is the aspect of Amazon that strikes me as uniquely different and interesting from other companies. Their deep seated ethic is to expose real customers to a choice and see which one works best and to make decisions based on those tests. Avinash Kaushik calls this getting rid of the influence of the HiPPO's, the highest paid people in the room. This is done with techniques like A/B testing and Web Analytics. If you have a question about what you should do code it up, let people use it, and see which alternative gives you the results you want.
  • Create a frugal culture. Amazon used doors for desks, for example.
  • Know what you need. Amazon has a bad experience with an early recommender system that didn't work out: "This wasn't what Amazon needed. Book recommendations at Amazon needed to work from sparse data, just a few ratings or purchases. It needed to be fast. The system needed to scale to massive numbers of customers and a huge catalog. And it needed to enhance discovery, surfacing books from deep in the catalog that readers wouldn't find on their own."
  • People's side projects, the one's they follow because they are interested, are often ones where you get the most value and innovation. Never underestimate the power of wandering where you are most interested.
  • Involve everyone in making dog food. Go out into the warehouse and pack books during the Christmas rush. That's teamwork.
  • Create a staging site where you can run thorough tests before releasing into the wild.
  • A robust, clustered, replicated, distributed file system is perfect for read-only data used by the web servers.
  • Have a way to rollback if an update doesn't work. Write the tools if necessary.
  • Switch to a deep services-based architecture (http://webservices.sys-con.com/read/262024.htm).
  • Look for three things in interviews: enthusiasm, creativity, competence. The single biggest predictor of success at Amazon.com was enthusiasm.
  • Hire a Bob. Someone who knows their stuff, has incredible debugging skills and system knowledge, and most importantly, has the stones to tackle the worst high pressure problems imaginable by just leaping in.
  • Innovation can only come from the bottom. Those closest to the problem are in the best position to solve it. any organization that depends on innovation must embrace chaos. Loyalty and obedience are not your tools.
  • Creativity must flow from everywhere.
  • Everyone must be able to experiment, learn, and iterate. Position, obedience, and tradition should hold no power. For innovation to flourish, measurement must rule.
  • Embrace innovation. In front of the whole company, Jeff Bezos would give an old Nike shoe as "Just do it" award to those who innovated.
  • Don't pay for performance. Give good perks and high pay, but keep it flat. Recognize exceptional work in other ways. Merit pay sounds good but is almost impossible to do fairly in large organizations. Use non-monetary awards, like an old shoe. It's a way of saying thank you, somebody cared.
  • Get big fast. The big guys like Barnes and Nobel are on your tail. Amazon wasn't even the first, second, or even third book store on the web, but their vision and drive won out in the end.
  • In the data center, only 30 percent of the staff time spent on infrastructure issues related to value creation, with the remaining 70 percent devoted to dealing with the "heavy lifting" of hardware procurement, software management, load balancing, maintenance, scalability challenges and so on.
  • Prohibit direct database access by clients. This means you can make you service scale and be more reliable without involving your clients. This is much like Google's ability to independently distribute improvements in their stack to the benefit of all applications.
  • Create a single unified service-access mechanism. This allows for the easy aggregation of services, decentralized request routing, distributed request tracking, and other advanced infrastructure techniques.
  • Making Amazon.com available through a Web services interface to any developer in the world free of charge has also been a major success because it has driven so much innovation that they couldn't have thought of or built on their own.
  • Developers themselves know best which tools make them most productive and which tools are right for the job.
  • Don't impose too many constraints on engineers. Provide incentives for some things, such as integration with the monitoring system and other infrastructure tools. But for the rest, allow teams to function as independently as possible.
  • Developers are like artists; they produce their best work if they have the freedom to do so, but they need good tools. Have many support tools that are of a self-help nature. Support an environment around the service development that never gets in the way of the development itself.
  • You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.
  • Developers should spend some time with customer service every two years. Their they'll actually listen to customer service calls, answer customer service e-mails, and really understand the impact of the kinds of things they do as technologists.
  • Use a "voice of the customer," which is a realistic story from a customer about some specific part of your site's experience. This helps managers and engineers connect with the fact that we build these technologies for real people. Customer service statistics are an early indicator if you are doing something wrong, or what the real pain points are for your customers.
  • Infrastructure for Amazon, like for Google, is a huge competitive advantage. They can build very complex applications out of primitive services that are by themselves relatively simple. They can scale their operation independently, maintain unparalleled system availability, and introduce new services quickly without the need for massive reconfiguration.

    Click to read more ...

  • Monday
    Aug202007

    TypePad Architecture

    TypePad is considered the largest paid blogging service in the world. After experience problems because of their meteoric growth, they eventually transitioned to an architecture patterned after their sister company, LiveJournal. Site: http://www.typepad.com/

    The Platform

  • MySQL
  • Memcached
  • Perl
  • MogileFS
  • Apache
  • Linux

    The Stats

  • As of 2005 TypePad sends 250mbps of traffic using multiple network pipes for 3TB of traffic a day. They were growing by 10-20% each month. I was unable to find more recent statistics.

    The Architecture

  • Original Architecture: - Single server running Linux, Apache, Postgres, Perl, mod_perl - Storage was NFS on a filer.
  • A Devastating Crash Caused a New Direction - A RAID controller failed and spewed data across all RAID disks. - The database was corrupted and the backups were corrupted. - Their redundant filers suffered from "split brain" syndrome.
  • They move to LiveJournal Architecture type architecture which isn't surprising since TypePad and LiveJounral are both owned by Six Apart. - Replicated MySQL clusters partitioned by ID. - A global DB generated globally unique sequence numbers and mapped users to partitions. - Other data was mapped by role.
  • Highly Available Database Configuration: - A master-master MySQL replication model is used. - The Linux clustering heartbeat was used to failover using virtual IP addresses.
  • MogileFS is used to serve images.
  • Perlbal is used as reverse proxy and to load balance requests.
  • A reliable, asynchronous job dispatch system called TheSchwartz is used to support moblogging, adding comments, future publishing, cache invalidation, and publishing.
  • Memcached is used to store counts, sets, stats, and heavyweight data.
  • Migration from the old architecture to the new architecture was tricky: - All users were migrated over without service interruption. - Postgres was removed. - During the migration images were served from NFS and MogileFS.
  • Benefits of their new architecture: - Can easily add new machines and adjust workload. - More highly available and is cheaply scalable

    Lessons Learned

  • Small details are important.
  • Every mistake is a learning experience.
  • Success requires coordination and cooperation.

    Related Articles

  • LiveJournal Architecture.
  • Linux High Availability.

    Click to read more ...

  • Wednesday
    Jul112007

    Friendster Architecture

    Friendster is one of the largest social network sites on the web. it emphasizes genuine friendships and the discovery of new people through friends. Site: http://www.friendster.com/

    Information Sources

  • Friendster - Scaling for 1 Billion Queries per day

    Platform

  • MySQL
  • Perl
  • PHP
  • Linux
  • Apache

    What's Inside?

  • Dual x86-64 AMD Opterons with 8 GB of RAM
  • Faster disk (SAN)
  • Optimized indexes
  • Traditional 3-tier architecture with hardware load balancer in front of the databases
  • Clusters based on types: ad, app, photo, monitoring, DNS, gallery search DB, profile DB, user infor DB, IM status cache, message DB, testimonial DB, friend DB, graph servers, gallery search, object cache.

    Lessons Learned

  • No persistent database connections.
  • Removed all sorts.
  • Optimized indexes
  • Don’t go after the biggest problems first
  • Optimize without downtime
  • Split load
  • Moved sorting query types into the application and added LIMITS.
  • Reduced ranges
  • Range on primary key
  • Benchmark -> Make Change -> Benchmark -> Make Change (Cycle of Improvement)
  • Stabilize: always have a plan to rollback
  • Work with a team
  • Assess: Define the issues
  • A key design goal for the new system was to move away from maintaining session state toward a stateless architecture that would clean up after each request
  • Rather than buy big, centralized boxes, [our philosophy] was about buying a lot of thin, cheap boxes. If one fails, you roll over to another box.

    Click to read more ...

  • Tuesday
    Jul102007

    mixi.jp  Architecture

    Mixi is a fast growing social networking site in Japan. They provide services like: diary, community, message, review, and photo album. Having a lot in common with LiveJournal they also developed many of the same approaches. Their write up on how they scaled their system is easily one of the best out there. Site: http://mixi.jp

    Information Sources

  • mixi.jp - scaling out with open source

    Platform

  • Linux
  • Apache
  • MySQL
  • Perl
  • Memcached
  • Squid
  • Shard

    What's Inside?

  • They grew to approximately 4 million users in two years and add over 15,000 new users/day.
  • Ranks 35th on Alexa and 3rd in Japan.
  • More than 100 MySQL servers
  • Add more than 10 servers/month
  • Use non-persistent connections.
  • Diary traffic is 85% read and 15% write.
  • Message traffic is is 75% read and 25% write.
  • Ran into replication performance problems so they had to split the database.
  • Considered splitting vertically by user or splitting horizontally by table type.
  • The ended up partitioning by table type and user. So all the messages for a group of users would be assigned to a particular database. Partitioning key is used to decide in which database data should be stored.
  • For caching they use memcached with 39 machines x 2 GB memory.
  • Stores more than 8 TB of images with about 23 GB added per day.
  • MySQL is only used to store metadata about the images, not the images themselves.
  • Images are either frequently accessed or rarely accessed.
  • Frequently accessed images are cached using Squid on multiple machines.
  • Rarely accessed images are served from the file system. There's no profit in caching them.

    Lessons Learned

  • When using dynamic partitioning it's difficult to pick keys and algorithms for where data should be stored.
  • Once you partition data you can no longer do joins and you have to open a lot of connections to different databases to merge the data back together.
  • It's hard to add new hosts and rearrange data when you partition. For example, let's say your partitioning algorithm stores all the messages for users 1-N on host 1. Now let's say host 1 becomes overburdened and you want to repartition users across more hosts. This is very difficult to do.
  • By using distributed memory caching they rarely hit the DB and there average page load time is about .02 seconds. This reduces the problems associated with partitioning.
  • You will often have to develop strategies based on the type of content. For example, image will be treated differently than short text posts.
  • Social networking sites are very time oriented, so it might be useful to partition data by time as well as user and type.

    Click to read more ...