Entries by HighScalability Team (1576)

Friday
Feb 19, 2010

Twitter’s Plan to Analyze 100 Billion Tweets

If Twitter is the “nervous system of the web” as some people think, then what is the brain that makes sense of all those signals (tweets) from the nervous system? That brain is the Twitter Analytics System, and Kevin Weil, as Analytics Lead at Twitter, is the homunculus within, in charge of figuring out what those over 100 billion tweets (approximately the number of neurons in the human brain) mean.

Twitter has only 10% of the expected 100 billion tweets now, but a good brain always plans ahead. Kevin gave a talk, Hadoop and Protocol Buffers at Twitter, at the Hadoop Meetup, explaining how Twitter plans to use all that data to answer key business questions.

What type of questions is Twitter interested in answering? Questions that help them better understand Twitter. Questions like:

Click to read more ...

Tuesday
Feb 16, 2010

Seven Signs You May Need a NoSQL Database

While exploring deep into some dusty old library stacks, I dug up Nostradamus' long lost NoSQL codex. What are the chances? Strangely, it also gave the plot to the next Dan Brown novel, but I left that out for reasons of sanity. About NoSQL, here is what Nosty (his friends call him Nosty) predicted are the signs you may need a NoSQL database...

Click to read more ...

Monday
Feb 15, 2010

Scaling Ambition at StackOverflow

Joel Spolsky and Jeff Atwood are raising VC money for StackOverflow. This is interesting for three reasons: 1) Joel has always seemed like a keep-it-small-and-grow-organically type of guy, so this is a big step in a different direction. 2) It means they think there's a very big market in the Q&A space and they mean to capture as much of the market as possible. 3) Most importantly for this blog, Joel gives some good advice on when to stay fresh and local and when it's time to jump for the brass ring, scale up your ambition, and go for VC money. Please see Joel's blog post for the details, but here's when to go VC:

Click to read more ...

Monday
Feb 15, 2010

The Amazing Collective Compute Power of the Ambient Cloud

This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.

Earlier we talked about how a single botnet could harness more compute power than our largest supercomputers. Well, that's just the start of it. The amount of compute power available to the Ambient Cloud will be truly astounding.

2 Billion Personal Computers

Click to read more ...

Friday
Feb 12, 2010

Hot Scalability Links for February 12, 2010

  1. My Life With HBase by Lars George. The hardscrabble tale of HBase's growth from infancy to maturity. A very good introduction and overview of HBase.
  2. NoSQL Alternatives -- Common Principles and Patterns for Building Scalable Applications. Explore the common principles behind the major NoSQL alternatives and how they compare with the traditional database approach in terms of consistency, transaction, and query semantics. We will also explore how we can make the transition between the two models smoother through the support of standard interfaces such as JPA.
  3. Moore’s Law: The Future of Cloud Computing from the Bottom Up. Will Intel's 48 mega core chip change the world or be just another Spruce Goose?
  4. Rent or Own: Amazon EC2 vs. Colocation Comparison for Hadoop Clusters. It's much cheaper to own when you have a large, relatively fixed-size cluster and can find really cheap labor to maintain it all.
  5. A cloud in a plug - brilliant. A tiny, low-power, low-cost home server and NAS device powered by Tonido software that allows you to access your apps, files, music and media from anywhere.
  6. Seeking A Database That Doesn't Suck by Pixy Misa. Quick recap of databases that suck - or at least, suck for my purposes - and some that I'm still investigating.

Click to read more ...

Monday
Feb 08, 2010

How FarmVille Scales to Harvest 75 Million Players a Month

Several readers had follow-up questions in response to this article. Luke's responses can be found in How FarmVille Scales - The Follow-up.

If real farming were as comforting as it is in Zynga's mega-hit FarmVille, then my family would probably never have left those harsh North Dakota winters. None of the scary bedtime stories my Grandma used to tell about farming are true in FarmVille. Farmers make money, plants grow, and animals never visit the red barn. I guess it's just that keep-your-shoes-clean back-to-the-land charm that has helped make FarmVille the "largest game in the world" in such an astonishingly short time.

How did FarmVille scale a web application to handle 75 million players a month? Fortunately, FarmVille's Luke Rajlich has agreed to let us in on a few of their challenges and secrets. Here's what Luke has to say...

Click to read more ...

Thursday
Feb 04, 2010

Hot Scalability Links for February 4, 2010

Lots of cool stuff happening this week...

  1. Voldemort gets rebalancing. It's one thing to shard data to scale; it's a completely different level of functionality to manage those shards intelligently. Voldemort has stepped up by adding advanced rebalancing functionality: dynamic addition of new nodes to the cluster; deletion of nodes from the cluster; load balancing of data inside a cluster.
  2. Microsoft Finally Opens Azure for Business. Out of the blue, Microsoft opens up their platform-as-a-service offering. Good to have more competition, and we'll keep an eye out for experience reports.
  3. New details on LinkedIn architecture by Greg Linden. LinkedIn appears to only use caching minimally, preferring to spend their efforts and machine resources on making sure they can recompute computations quickly rather than on hiding poor performance behind caching layers.
  4. The end of SQL and relational databases? by David Intersimone. For new projects, I believe, we have genuine non-relational alternatives on the table (pun intended).
  5. HipHop for PHP: Move Fast. When you make millions of widgets, saving pennies per widget quickly adds up to real money. Facebook released HipHop, a PHP compiler, aimed at shaving off CPU cycles and bytes of memory in production of their social widgets.

Click to read more ...

Wednesday
Feb 03, 2010

NoSQL Means Never Having to Store Blobs Again

Morgan Tocker has an awesome article and comment thread in the MySQL Performance Blog about When should you store serialized objects in the database? Before the NoSQL age it was very common to simulate schemalessness by storing blobs in MySQL. Sharding was implemented by running multiple MySQL instances and spreading writes across them. While not ideal for the purpose, developers felt comfortable with MySQL. They knew how to install it, back it up, replicate it; in short, they knew how to make it work. Yet they also needed to store objects without the penalty of joins. Searches and aggregate queries were handled by indexes kept in separate tables, which kept that work off the fast path to objects.

This all made perfect sense. Usually we just want stuff to work and going with what you know is often the best path to that goal. And what we have known is MySQL. All the different pros and cons of this approach are covered wonderfully in the post.
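
The blob-plus-index pattern described above can be sketched roughly as follows. This is a minimal illustration, not code from the post: it assumes in-memory SQLite databases as stand-ins for the separate MySQL instances, and the table names, the `city` index, and the helpers `put`, `get`, and `find_by_city` are all hypothetical.

```python
import json
import sqlite3
import zlib

NUM_SHARDS = 4  # assumption: each SQLite database stands in for one MySQL instance


def make_shard():
    """Create one shard with a blob table and a separate index table."""
    conn = sqlite3.connect(":memory:")
    # The object itself is an opaque serialized blob: no schema, no joins.
    conn.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, body BLOB)")
    # Searchable attributes live in their own table (hypothetical 'city' index).
    conn.execute("CREATE TABLE idx_city (city TEXT, object_id TEXT)")
    conn.execute("CREATE INDEX idx_city_lookup ON idx_city (city)")
    return conn


shards = [make_shard() for _ in range(NUM_SHARDS)]


def shard_for(object_id):
    """Pick a shard with a stable hash so writes spread across instances."""
    return shards[zlib.crc32(object_id.encode("utf-8")) % NUM_SHARDS]


def put(object_id, obj):
    """Serialize the object, store it as a blob, and refresh its index row."""
    conn = shard_for(object_id)
    body = json.dumps(obj).encode("utf-8")
    with conn:  # one transaction per write
        conn.execute(
            "INSERT OR REPLACE INTO objects (id, body) VALUES (?, ?)",
            (object_id, body),
        )
        conn.execute("DELETE FROM idx_city WHERE object_id = ?", (object_id,))
        if "city" in obj:
            conn.execute(
                "INSERT INTO idx_city (city, object_id) VALUES (?, ?)",
                (obj["city"], object_id),
            )


def get(object_id):
    """Fast path: a single primary-key lookup on a single shard."""
    row = shard_for(object_id).execute(
        "SELECT body FROM objects WHERE id = ?", (object_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def find_by_city(city):
    """Search path: consult the index table on every shard."""
    ids = []
    for conn in shards:
        rows = conn.execute(
            "SELECT object_id FROM idx_city WHERE city = ?", (city,)
        )
        ids.extend(r[0] for r in rows)
    return sorted(ids)
```

Reading an object touches one shard and one row, while a search has to ask every shard's index table, which is the trade-off this style of design accepts.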

But the world has changed.

Click to read more ...

Monday
Feb 01, 2010

What Will Kill the Cloud?

This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.

If datacenters are the new castles, then what will be the new gunpowder? As soon as gunpowder came on the scene, castles, which are defensive structures, quickly became the future's cold, drafty hotels. Gunpowder-fueled cannonballs made short work of castle walls.

There's a long history of "gunpowder" type inventions in the tech industry. PCs took out the timeshare model. The cloud is taking out the PC model. There must be something that will take out the cloud.

Right now it's hard to believe the cloud will one day be no more. It seems so much the future, but something will transcend the cloud.

Click to read more ...

Wednesday
Jan 27, 2010

Hot Scalability Links for January 28 2010

  1. Google's Research Areas of Interest: Building scalable, robust cluster applications. At Google we see distributed systems as a technology in its infancy, with huge gaps in the supporting research that represent some of the most important problems in the space. Here are some examples: resource sharing; balancing cost, performance, and reliability; self-maintaining systems.
  2. Amazon SimpleDB: A Simple Way to Store Complex Data by Paul Tremblett. The most effective way I have found to understand SimpleDB is to think about it in terms of something else we all use and understand -- a spreadsheet.
  3. Rackspace Cloud Servers versus Amazon EC2: Performance Analysis. The Bitsource conducted a review of the two cloud computing platforms, Rackspace Cloud Servers and Amazon Elastic Compute Cloud (EC2), to get a general idea of overall system performance.
  4. Private Clouds Are Not The Future by James Hamilton. Private clouds are better than nothing, but an investment in a private cloud is an investment in a temporary fix that will only slow the path to the final destination: shared clouds.
  5. What is the right way to measure scale? by Daniel Abadi. So which scales better? Is using the number of nodes a better proxy than size of data? Hadoop can “scale” to 3800 nodes. So far, all we know is that Greenplum can “scale” to 96 nodes. Can it handle more nodes?

Click to read more ...