Monday, May 21, 2012

Pinterest Architecture Update - 18 Million Visitors, 10x Growth, 12 Employees, 410 TB of Data

There has been an update on Pinterest since our last post, A Short on the Pinterest Stack for Handling 3+ Million Users: Pinterest growth driven by Amazon cloud scalability.

With Pinterest we see a story very similar to that of Instagram. Huge growth, lots of users, lots of data, with remarkably few employees, all on the cloud.

While it's true that neither Pinterest nor Instagram is making great advances in science and technology, that is more an indicator of the easy power of today's commodity environments than a sign of Silicon Valley's lack of innovation. The numbers are so huge and the valuations are so high that we naturally want some sort of fundamental technological revolution to underlie their growth. The revolution is more subtle. It really is just that easy to attain such growth these days, if you can execute on the right idea. Get used to it. This is the new normal.

Here's what Pinterest looks like today: 

  • 80 million objects stored in S3 with 410 terabytes of user data, 10x what they had in August. EC2 instances have grown by 3x. Around $39K for S3 and $30K for EC2.
  • 12 employees as of last December. Using the cloud, a site can grow dramatically while maintaining a very small team. Looks like 31 employees as of now.
  • Paying only for what you use saves money. Most traffic happens in the afternoons and evenings, so they reduce the number of instances at night by 40%. At peak traffic, $52 an hour is spent on EC2; at night, during off-peak hours, the spend is as little as $15 an hour.
  • 150 EC2 instances in the web tier
  • 90 instances for in-memory caching, which removes database load
  • 35 instances used for internal purposes
  • 70 master databases with a parallel set of backup databases in different regions around the world for redundancy
  • Written in Python and Django 
  • Sharding is used. A database is split when it reaches 50% of capacity, which allows easy growth and gives sufficient IO capacity.
  • ELB is used to load balance across instances. The ELB API makes it easy to move instances in and out of production.
  • One of the fastest growing sites in history. Cites AWS for making it possible to handle 18 million visitors in March, a 50% increase from the previous month, with very little IT infrastructure.
  • The cloud supports easy and low-cost experimentation. New services can be tested without buying new servers, so there are no big up-front costs.
  • Hadoop-based Elastic Map Reduce is used for data analysis and costs only a few hundred dollars a month.
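The pay-for-what-you-use numbers above can be turned into a rough monthly estimate. The sketch below uses only the two hourly rates quoted in the list ($52 at peak, $15 at night). The 16-hour peak window, the 30-day month, and the function name are assumptions made for illustration; real spend changes gradually across the day rather than switching between two flat rates.

```python
# Hourly EC2 spend quoted in the post.
PEAK_RATE_USD_PER_HOUR = 52.0
NIGHT_RATE_USD_PER_HOUR = 15.0


def monthly_ec2_cost(peak_hours_per_day, days=30,
                     peak_rate=PEAK_RATE_USD_PER_HOUR,
                     night_rate=NIGHT_RATE_USD_PER_HOUR):
    """Estimate monthly EC2 spend from a two-rate day.

    peak_hours_per_day and days are assumptions; the post does not
    say how long the peak window lasts.
    """
    if not 0 <= peak_hours_per_day <= 24:
        raise ValueError("peak_hours_per_day must be between 0 and 24")
    night_hours = 24 - peak_hours_per_day
    daily = peak_hours_per_day * peak_rate + night_hours * night_rate
    return daily * days


if __name__ == "__main__":
    scaled = monthly_ec2_cost(peak_hours_per_day=16)      # assumed 16h peak
    always_peak = monthly_ec2_cost(peak_hours_per_day=24)  # never scale down
    print(f"scaled down at night: ${scaled:,.0f}/month")
    print(f"always at peak:       ${always_peak:,.0f}/month")
    print(f"estimated saving:     ${always_peak - scaled:,.0f}/month")
```

Under the assumed 16-hour peak window the estimate comes out somewhat under the roughly $30K EC2 figure quoted above, which suggests the two hourly rates and the total are at least consistent with each other.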

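The split-at-50% sharding rule from the list can be sketched in a few lines. This is not Pinterest's implementation: the post does not say how keys are mapped to shards or what "capacity" measures, so the range-based key layout, the use of storage in GB as the capacity metric, and every name below are hypothetical.

```python
from dataclasses import dataclass

SPLIT_THRESHOLD = 0.5  # from the post: split a database at 50% of capacity


@dataclass
class Shard:
    name: str
    key_lo: int        # inclusive lower bound of the key range (assumed layout)
    key_hi: int        # exclusive upper bound of the key range
    used_gb: float     # assumption: capacity is measured as storage
    capacity_gb: float

    def utilization(self):
        return self.used_gb / self.capacity_gb


def needs_split(shard, threshold=SPLIT_THRESHOLD):
    """True once the shard has reached the split threshold."""
    return shard.utilization() >= threshold


def split(shard):
    """Split a shard's key range in half onto two equally sized databases.

    Assumes data is spread evenly over the key range, so each half
    starts with half of the original data.
    """
    if shard.key_hi - shard.key_lo < 2:
        raise ValueError("key range too small to split")
    mid = (shard.key_lo + shard.key_hi) // 2
    half = shard.used_gb / 2
    left = Shard(shard.name + "-a", shard.key_lo, mid, half, shard.capacity_gb)
    right = Shard(shard.name + "-b", mid, shard.key_hi, half, shard.capacity_gb)
    return left, right


if __name__ == "__main__":
    db = Shard("db1", 0, 1000, used_gb=260.0, capacity_gb=500.0)
    if needs_split(db):
        for child in split(db):
            print(child.name, child.key_lo, child.key_hi,
                  f"{child.utilization():.0%}")
```

Splitting at half capacity rather than waiting until a database is nearly full leaves headroom on each child, which fits the post's point about easy growth and sufficient IO capacity.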
Reader Comments (15)

What is the database used for sharding? 70 Masters seems to be higher. Are they in VM? What is the cost involved in keeping the 400+tb data?

May 21, 2012 | Unregistered CommenterMani

"Sites AWS for making it possible"

*Cites

May 21, 2012 | Unregistered CommenterAndyT

I assume you meant *cites instead of "Sites AWS for making" on the 3rd to last bullet.

Always interesting to see this stuff, 410TB is a huge amount of S3 data!

May 21, 2012 | Unregistered CommenterJustin Steele

So, $31K+ / month for storage (and that's reduced redundancy), ~$30K / month for EC2 computation?

May 21, 2012 | Unregistered CommenterChris Brown

So their Amazon bill is roughly $60K/month for EC2 instances and 410TB of storage (not even trying to guess bandwidth) and they have at least another $100K in payroll. They have as far as I can see $0 in revenue. What exactly is the model here?

May 21, 2012 | Unregistered CommenterJoe

What is the latest thought on EC2 having pretty bad I/O performance? Does adding more instances make up for this cost-effectively? Thanks.

May 21, 2012 | Unregistered Commenterbasis

To improve I/O performance, create multiple volumes and stripe them with software RAID.

May 21, 2012 | Unregistered Commenterdcolon

Is the ~$69K for storage, per month or per year?

May 22, 2012 | Unregistered CommenterChatz

@Joe: Pinterest has experimented with affiliate links in the past, so they've taken in some negligible revenue, but they seem to still be actively figuring it out.

May 22, 2012 | Unregistered CommenterLuigi Montanez

> Here's what Pinterst looks like today:

** Pinterest

May 22, 2012 | Unregistered CommenterA

Our private cloud team took a look at the above configuration and prices. Based on our experience, private cloud infrastructure would provide a ROI in approximately 20 months. When moving from Amazon to private cloud, we also typically see a 3 to 1 consolidation of instances after migration due to performance increases in I/O and compute power.

May 22, 2012 | Unregistered CommenterRedapt

So you split up the db when it hits 50% of the current server. What do you mean by that? Memory or?

May 24, 2012 | Unregistered CommenterThierry

90% of that 410TB likely infringes copyrights. They have absolutely no right to copy fullsize photos to their servers.

June 1, 2012 | Unregistered CommenterPaul

Are 18M unique visitors?

September 18, 2012 | Unregistered CommenterRBAL

They have 152 employees now. What growth!

July 28, 2013 | Unregistered CommenterCesar
