Entries in Example (248)

Wednesday
Jul 11, 2012

FictionPress: Publishing 6 Million Works of Fiction on the Web

FictionPress operates both FictionPress.com and FanFiction.net and is home to over 6 million works of fiction, with millions of writers/readers participating from around the world in over 30 languages.

Issues addressed:

  • Support complex and efficient indexes at 100+ million rows.
  • Predictable and consistent performance regardless of data size growth.
  • Fast recovery.

The Challenge:

Click to read more ...

Monday
Jun 25, 2012

StubHub Architecture: The Surprising Complexity Behind the World’s Largest Ticket Marketplace

StubHub is an interesting architecture to take a look at because, as market makers for tickets, they are in a different business than we normally get to consider.

StubHub is surprisingly large, growing at 20% a year, serving 800K complex pages per hour, selling 5 million tickets per year, and handling 2 million API calls per hour. 

And the ticket space is surprisingly rich in complexity. StubHub's traffic is tricky. It's bursty, centering around unpredictable game outcomes, events, schedules, and seasons. There’s a lot of money involved. There are a lot of different actors involved. There are a lot of complex business processes involved. And StubHub has several complementary but very different parts of their business: they have an ad server component serving ads to sites like ESPN, a rich interactive UI, and a real-time ticket market component.

Most interesting to me is how StubHub is bringing into the digital realm the once quintessentially high-touch physical world of tickets, point-of-sale systems, FedEx delivery, buyers and sellers, and money. They are making it happen with deep electronic integration into organizations (like Major League Baseball) and a Lifecycle Bus that moves complex business processes out of the application space.
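The slides don't spell out how the Lifecycle Bus works, but the general pattern of pulling business processes out of application code looks something like the sketch below. The event names, fields, and handlers here are my guesses at the shape of the thing, not StubHub's actual design:

    import queue

    # Toy "lifecycle bus": the app publishes facts, workers own the process.
    bus = queue.Queue()

    def sell_ticket(order_id):
        # The web app's job ends at recording that a sale happened.
        bus.put({"event": "ticket.sold", "order_id": order_id})

    # Each step of the business process is a subscriber, not application
    # code, so steps can be added, retried, or reordered independently.
    HANDLERS = {
        "ticket.sold": [
            lambda e: print("charge buyer for", e["order_id"]),
            lambda e: print("schedule delivery for", e["order_id"]),
            lambda e: print("pay out seller for", e["order_id"]),
        ],
    }

    def run_worker():
        while not bus.empty():
            event = bus.get()
            for handle in HANDLERS.get(event["event"], []):
                handle(event)

    sell_ticket("order-123")
    run_worker()

A real bus would be a durable, distributed queue rather than an in-process one, which is what lets the business process survive application restarts.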

It's an interesting problem made more complex by having to move forward while dealing with legacy systems built when getting business-building features out the door was the priority. Let's see how StubHub makes it all work...

Click to read more ...

Wednesday
Jun 20, 2012

iDoneThis - Scaling an Email-based App from Scratch

This is a guest post by Rodrigo Guzman, CTO of iDoneThis, which makes status reporting happen at your company with the lightest possible touch.

iDoneThis is a simple management application that emails your team at the end of every day to ask, "What'd you get done today?" Just reply with a few lines of what you got done. The following morning everyone on your team gets a digest with what the team accomplished the previous day to keep everyone in the loop and kickstart another awesome day.
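To make the mechanics concrete, here is a minimal sketch of the two email jobs that drive the product. This is my illustration, not iDoneThis's actual code; the addresses, roster, and reply store are all made up:

    import smtplib
    from email.message import EmailMessage

    # Hypothetical roster and store of parsed replies from yesterday.
    TEAM = ["alice@example.com", "bob@example.com"]
    DONES = {"alice@example.com": ["Shipped the billing fix"],
             "bob@example.com": ["Closed 12 support tickets"]}

    def send(msg):
        # Assumes a local SMTP relay; real delivery would differ.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

    def evening_prompt():
        # One email per person; their reply gets parsed into DONES.
        for person in TEAM:
            msg = EmailMessage()
            msg["From"] = "ask@idonethis.example"
            msg["To"] = person
            msg["Subject"] = "What'd you get done today?"
            msg.set_content("Just reply with a few lines of what you got done.")
            send(msg)

    def morning_digest():
        # Everyone gets the same roll-up of yesterday's replies.
        body = "\n".join("%s:\n  %s" % (who, "\n  ".join(items))
                         for who, items in DONES.items())
        for person in TEAM:
            msg = EmailMessage()
            msg["From"] = "digest@idonethis.example"
            msg["To"] = person
            msg["Subject"] = "What your team got done yesterday"
            msg.set_content(body)
            send(msg)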

Before we launched, we built iDoneThis over a weekend in the most rudimentary way possible. I kid you not, we sent the first few batches of daily emails using the BCC field of a Gmail inbox. The upshot is that we've had users on the site since Day 3 of its existence.

We’ve gone from launch in January 2011 when we sent hundreds of emails out per day by hand to sending out over 1 million emails and handling over 200,000 incoming emails per month. In total, customers have recorded over 1.7 million dones.

Stats 

Click to read more ...

Monday
May 28, 2012

The Anatomy of Search Technology: Crawling using Combinators

This is the second guest post (part 1, part 3) in a series by Greg Lindahl, CTO of blekko, the spam-free search engine. Previously, Greg was Founder and Distinguished Engineer at PathScale, where he was the architect of the InfiniPath low-latency InfiniBand HCA, used to build tightly-coupled supercomputing clusters.

What's so hard about crawling the web?

Web crawlers have been around as long as the web has -- and before the web, there were crawlers for gopher and ftp. You would think that 25 years of experience would render crawling a solved problem, but the vast growth of the web and new inventions in the technology of webspam and other unsavory content result in a constant supply of new challenges. The general difficulty of tightly-coupled parallel programming also rears its head, as the web has scaled from millions to 100s of billions of pages.
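To give a taste of why even the basics are fiddly, here's a toy frontier, my illustration rather than anything from blekko's crawler, handling two of the first problems every crawler hits: URL deduplication and per-host politeness:

    import time
    from collections import deque
    from urllib.parse import urlparse

    class Frontier:
        """Toy crawl frontier: dedup plus per-host rate limiting."""
        def __init__(self, delay_per_host=10.0):
            self.queue = deque()
            self.seen = set()   # at web scale this becomes a distributed structure
            self.next_ok = {}   # host -> earliest time we may fetch it again
            self.delay = delay_per_host

        def add(self, url):
            if url not in self.seen:
                self.seen.add(url)
                self.queue.append(url)

        def pop(self):
            # Rotate past hosts we've hit too recently; real crawlers keep
            # per-host queues instead of cycling one big one.
            for _ in range(len(self.queue)):
                url = self.queue.popleft()
                host = urlparse(url).netloc
                now = time.time()
                if self.next_ok.get(host, 0.0) <= now:
                    self.next_ok[host] = now + self.delay
                    return url
                self.queue.append(url)
            return None   # nothing fetchable right now

Everything here breaks at scale: the seen set won't fit in one machine's memory, the queue must be partitioned, and robots.txt, redirects, and spam detection haven't even entered the picture yet.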

Existing Open-Source Crawlers and Crawls

Click to read more ...

Monday
May 21, 2012

Pinterest Architecture Update - 18 Million Visitors, 10x Growth, 12 Employees, 410 TB of Data

There has been an update on Pinterest, Pinterest growth driven by Amazon cloud scalability, since our last post: A Short on the Pinterest Stack for Handling 3+ Million Users.

With Pinterest we see a story very similar to that of Instagram. Huge growth, lots of users, lots of data, with remarkably few employees, all on the cloud.

While it's true that both Pinterest and Instagram are not making great advances in science and technology, that is more an indicator of the easy power of today's commodity environments than a sign of Silicon Valley's lack of innovation. The numbers are so huge and the valuations are so high that we naturally want some sort of fundamental technological revolution to underlie their growth. The revolution is more subtle. It really is just that easy to attain such growth these days, if you can execute on the right idea. Get used to it. This is the new normal.

Here's what Pinterest looks like today: 

Click to read more ...

Wednesday
Apr 25, 2012

The Anatomy of Search Technology: blekko’s NoSQL database

This is a guest post (part 2, part 3) by Greg Lindahl, CTO of blekko, the spam-free search engine that had over 3.5 million unique visitors in March. Greg Lindahl was Founder and Distinguished Engineer at PathScale, where he was the architect of the InfiniPath low-latency InfiniBand HCA, used to build tightly-coupled supercomputing clusters.

Imagine that you're crazy enough to think about building a search engine.  It's a huge task: the minimum index size needed to answer most queries is a few billion webpages. Crawling and indexing a few billion webpages requires a cluster with several petabytes of usable disk -- that's several thousand 1 terabyte disks -- and produces an index that's about 100 terabytes in size.

Serving query results quickly involves having most of the index in RAM or on solid state (flash) disk. If you can buy a server with 100 gigabytes of RAM for about $3,000, that's 1,000 servers at a capital cost of $3 million, plus about $1 million per year of server co-location cost (power/cooling/space). The SSD alternative requires fewer servers, but serves a lot fewer queries per second, because SSDs are much slower than RAM.
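Since the sizing argument is all arithmetic, it's easy to check. A quick back-of-the-envelope sketch using the round numbers above (these are the post's estimates, nothing more):

    # Back-of-the-envelope search cluster sizing, using the text's round numbers.
    index_size_gb = 100 * 1000        # ~100 TB index that should live in RAM
    ram_per_server_gb = 100           # one ~$3,000 commodity server
    cost_per_server = 3000

    servers = index_size_gb // ram_per_server_gb
    capital_cost = servers * cost_per_server

    print(servers)        # -> 1000 servers
    print(capital_cost)   # -> $3,000,000, plus ~$1M/year co-location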

You might think that Amazon's AWS cloud would be a great way to reduce the cost of starting a search engine. It isn't, for 4 main reasons: 

Click to read more ...

Monday
Apr 16, 2012

Instagram Architecture Update: What’s new with Instagram?

The fascination with Instagram continues and fortunately we have several new streams of information to feed the insanity. So consider this article an update to The Instagram Architecture Facebook Bought For A Cool Billion Dollars, based primarily on Scaling Instagram, a slide deck for an AirBnB tech talk given by Instagram co-founder Mike Krieger. Several other information sources, listed at the bottom of the article, were also used.

Unfortunately we just have a slide deck, so the connective tissue of the talk is missing, but it's still very interesting, in the same spirit as the wisdom presentations we often see when developers come up for air after significant time in the trenches.

If you expect to dive deep into the technological details and find a billion reasons why Instagram was acquired, you will be disappointed. That magic can be found in the emotional investment in the relationship between all of the users and the product, not in the bits about how the bytes are managed.

So what’s new with Instagram?

Click to read more ...

Monday
Apr 2, 2012

YouPorn - Targeting 200 Million Views a Day and Beyond

Erick Pickup, lead developer at YouPorn.com, presented their architecture in a talk titled Building a Website To Scale given at the ConFoo conference. As you might expect, YouPorn is a beast, streaming three full DVDs of video every second, handling 300K queries every second, and generating up to 15GB of log data per hour.

Unfortunately, all we have are the slides of the talk, so this article isn't as technical as I might like; there's no visibility at all into the video handling, for example, but we do get some interesting details.

The most interesting takeaway is that YouPorn is a pretty conventional LAMP stack, with a NoSQL twist as Redis now replaces MySQL in the live datapath. Reminds me a little of YouTube in its simplicity.
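The slides don't include code, but the shape of a Redis-in-the-live-datapath read is easy to imagine. Here's a minimal sketch using the redis-py client; the key layout and field names are my invention, not YouPorn's actual schema:

    import redis

    # Hypothetical key layout; the real schema isn't in the slides.
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def get_video(video_id):
        # Live reads hit Redis directly; MySQL is out of the hot path.
        key = "video:%s" % video_id
        video = r.hgetall(key)
        if video:
            r.hincrby(key, "views", 1)  # counters are cheap Redis operations
        return video

    # An ingest pipeline would populate the same hash:
    r.hset("video:42", mapping={"title": "...", "duration": 613, "views": 0})
    print(get_video("42"))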

The second most interesting takeaway was the great switchover. Common wisdom says never rewrite, but in 2011 YouPorn rewrote their entire site to use PHP + Redis instead of a complex Perl + MySQL based architecture. And by all accounts the switchover went well. The site is 10% faster and they moved over 6 years of legacy data with no downtime.

Read on to learn more about the YouPorn architecture...

Click to read more ...

Monday
Mar 26, 2012

7 Years of YouTube Scalability Lessons in 30 Minutes

If you started out building a dating site and instead ended up  building a video sharing site (YouTube) that handles 4 billion views a day, then it’s just possible you learned something along the way. And indeed, Mike Solomon, one of the original engineers at YouTube, did learn a lot and he has given a talk about it at PyCon: Scalability at YouTube.

This isn't an architecture driven talk where we are led through a description of how a lot of boxes connect to each other. Mike could give that sort of talk. He has worked on building YouTube's servlet infrastructure, video indexing feature, video transcoding system, their full text search, a CDN, and much more. But instead, he's taken a step back, taken a long look at what time has wrought, and shared some deep lessons, obviously hard-won from experience.

The key takeaway of the talk for me was how much can be done with really simple tools. While many teams are moving on to more complex ecosystems, YouTube really does keep it simple. They program primarily in Python, use MySQL as their database, they've stuck with Apache, and even new features for such a massive site start as a very simple Python program.

That doesn’t mean YouTube doesn’t do cool stuff, they do, but what makes everything work together is more a philosophy or a way of doing things than technological hocus pocus. What made YouTube into one of the world’s largest websites? Read on and see...

Click to read more ...

Monday
Mar 19, 2012

LinkedIn: Creating a Low Latency Change Data Capture System with Databus

This is a guest post by Siddharth Anand, a senior member of LinkedIn's Distributed Data Systems team.

Over the past 3 years, I've had the good fortune to work with many emerging NoSQL products in the context of supporting the needs of a high-traffic, customer-facing web site.

In 2010, I helped Netflix to successfully transition its web scale use-cases from Oracle to SimpleDB, AWS' hosted database service. On completion of that migration, we started a second migration, this time from SimpleDB to Cassandra. The first transition was key to our move from our own data center to AWS' cloud. The second was key to our expansion from one AWS Region to multiple geographically-distributed Regions -- today Netflix serves traffic out of two AWS Regions, one in Virginia, the other in Ireland (F1). Both of these transitions have been successful, but have involved integration pain points such as the creation of database replication technology.

In December 2011, I moved to LinkedIn's Distributed Data Systems (DDS) team. DDS develops data infrastructure, including but not limited to, NoSQL databases and data replication systems. LinkedIn, no stranger to building and open-sourcing innovative projects, is doubling down on NoSQL to accelerate its business -- DDS is developing a new NoSQL database called Espresso (R1), a topic for a future post.

Having observed two high-traffic web companies solve similar problems, I cannot help but notice a set of wheel-reinventions. Some of these problems are difficult, and it is truly unfortunate for each company to solve its problems separately. At the same time, each company has had to solve these problems due to the absence of a reliable open-source alternative. This clearly has implications for an industry dominated by fast-moving start-ups that cannot build 50-person infrastructure development teams or dedicate months away from building features.

Change Data Capture Systems
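The full post digs into Databus itself; as a primer, here is the rough shape of a change data capture consumer loop. This is a generic sketch of the pattern, not Databus's actual API; fetch_changes and apply_to_replica are hypothetical stand-ins:

    import time

    def fetch_changes(since_scn, limit=100):
        # Hypothetical relay call: returns ordered change events, each
        # tagged with a monotonically increasing sequence number (SCN).
        return []   # a real relay would tail the database's commit log

    def apply_to_replica(table, row):
        # e.g. update a search index, cache, or downstream store
        print(table, row)

    def consume(checkpoint=0):
        while True:
            events = fetch_changes(since_scn=checkpoint)
            for scn, table, row in events:
                apply_to_replica(table, row)
                checkpoint = scn   # persist so a restart resumes, not replays
            if not events:
                time.sleep(0.1)    # poll; real systems long-poll or push

The checkpoint is the heart of the pattern: because every change carries an ordered sequence number, a consumer that crashes can pick up exactly where it left off instead of re-reading the source database.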

Click to read more ...