Entries by HighScalability Team (1576)

Friday

Jun 18, 2010

Paper: The Declarative Imperative: Experiences and Conjectures in Distributed Logic

Friday, June 18, 2010 at 7:17AM

The Declarative Imperative: Experiences and Conjectures in Distributed Logic was written by UC Berkeley's Joseph Hellerstein for a keynote speech he gave at PODS. The video version of the talk is here. You may have heard about Mr. Hellerstein through the Berkeley Orders Of Magnitude project (BOOM), whose purpose is to help people build systems that are OOM (orders of magnitude) bigger than those being built today, with OOM less effort than traditional programming methodologies require. A noble goal, which may be why BOOM was rated as a top 10 emerging technology for 2010 by MIT Technology Review. Quite an honor.

The motivation for the talk is a familiar one: it's a dark period for computer programming, and if we don't learn how to write parallel programs the children of Moore's law will destroy us all. We have more and more processors, yet we are stuck on figuring out how the average programmer can exploit them. The BOOM solution is the Bloom language, which is based on Dedalus:

Click to read more ...

HighScalability Team |

1 Comment |

BigData,

Paper,

boom

Wednesday

Jun 16, 2010

Hot Scalability Links for June 16, 2010

Wednesday, June 16, 2010 at 7:58AM

You're Doing it Wrong by Poul-Henning Kamp. Don't look so guilty: he's not talking about you-know-what, he's talking about writing high-performance server programs. Not just wrong as in not perfect, but wrong as in wasting half, or more, of your performance. What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.
A Microsoft Windows Azure primer: the basics by Peter Bright. Nice article explaining the basics of Azure and how it compares to Google and Amazon.
A call to change the name from NoSQL to Postmodern Databases. Interesting idea, but the problem is the same one I have with Postmodern Art: when is it? I always feel like I'm in the post-postmodern period, yet for art it's really in the early 1900s. Let's save future developers from this existential time crisis.
Constructions from Dots and Lines by Marko A. Rodriguez, Peter Neubauer. Delightful yet in-depth explanation of the complex world of graph data structures. To make use of the graphs beyond simply representing their explicit structure, graph traversal frameworks and algorithms have been developed in order to shape graphs by driving the evolution of the entities that they model—e.g. humans and their relationships to one another and the objects of their world

Click to read more ...

HighScalability Team |

Paper: Propagation Networks: A Flexible and Expressive Substrate for Computation

Wednesday, June 9, 2010 at 6:46AM

Alexey Radul, in his fascinating 174-page dissertation Propagation Networks: A Flexible and Expressive Substrate for Computation, offers to help us break free of the tyranny of linear time. We can do this by organizing computation as a network of interconnected machines of some kind, each of which is free to run when it pleases, propagating information around the network as proves possible. The consequence of this freedom is that the structure of the aggregate does not impose an order of time. The abstract from his thesis is:

Click to read more ...
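
To make the propagation idea a little more concrete, here is a toy sketch in Python. It is not code from the dissertation: the `Cell`, `propagator`, and `run_network` names are made up for illustration, and the design is deliberately simplified (a cell holds a single value, and a contradiction is just an error). Three small machines enforce one constraint, a + b = total, and whichever one has enough information runs.

```python
from collections import deque


class Cell:
    """Holds one value; None means nothing is known yet."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.watchers = []  # propagators to wake when this cell learns something

    def add_content(self, value, queue):
        if self.value == value:
            return  # nothing new, so nobody needs to wake up
        if self.value is not None:
            raise ValueError(f"contradiction in cell {self.name}")
        self.value = value
        queue.extend(self.watchers)


def propagator(inputs, output, fn, queue):
    """Attach a machine that writes fn(*inputs) to output once every input is known."""

    def run():
        values = [cell.value for cell in inputs]
        if all(v is not None for v in values):
            output.add_content(fn(*values), queue)

    for cell in inputs:
        cell.watchers.append(run)
    queue.append(run)


def run_network(queue):
    """Run pending propagators until the network is quiet."""
    while queue:
        queue.popleft()()


queue = deque()
a, b, total = Cell("a"), Cell("b"), Cell("total")

# One constraint, a + b = total, wired so information can flow in any direction.
propagator([a, b], total, lambda x, y: x + y, queue)
propagator([total, a], b, lambda t, x: t - x, queue)
propagator([total, b], a, lambda t, x: t - x, queue)

a.add_content(3, queue)
total.add_content(10, queue)
run_network(queue)
print(b.value)  # 7
```

Nothing in the wiring says which direction the computation runs. Supplying `b` and `total` instead would fill in `a` with the same three propagators, which is the sense in which the structure does not impose an order of time.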

HighScalability Team |

3 Comments |

BigData,

Paper

Tuesday

Jun 8, 2010

Twitter has a big hairy audacious goal of reaching one billion users by 2013. Three forces stand against Twitter. First, the world will end in 2012. But let's be optimistic and assume we'll make it. Next is Facebook. Currently Facebook is the user leader with over 400 million users. Will Facebook stumble or will they rocket to one billion users before Twitter? And lastly, there's Twitter's "low" starting point and "slow" growth rate. Twitter currently has 106 million registered users and adds about 300,000 new users a day. That doesn't add up to a billion in three years. Twitter needs to roughly triple the number of registered users they add per day. How will Twitter reach its goal of over one billion users served?

Click to read more ...
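
The arithmetic behind "that doesn't add up" is easy to check. The sketch below uses only the figures quoted above and assumes a constant signup rate, no account attrition, and 365-day years; the variable names are just for illustration.

```python
current_users = 106_000_000
daily_signups = 300_000
target_users = 1_000_000_000
days = 3 * 365

# Where the current pace lands after three years.
projected_users = current_users + daily_signups * days

# The pace needed to hit the target in the same window.
required_daily = (target_users - current_users) / days
multiple = required_daily / daily_signups

print(f"projected users:  {projected_users:,}")
print(f"required per day: {required_daily:,.0f}")
print(f"multiple of today's pace: {multiple:.2f}x")
```

At today's pace Twitter ends up around 434.5 million users, well short of a billion. Closing the gap takes roughly 816,000 signups a day, about 2.7 times the current rate.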

HighScalability Team |

2 Comments |

Friday

Jun 4, 2010

Strategy: Cache Larger Chunks - Cache Hit Rate is a Bad Indicator

Friday, June 4, 2010 at 8:14AM

Isn't the secret to fast, scalable websites to cache everything? Caching, if not the secret sauce of many a website, is at least a popular condiment. But not so fast, says Peter Zaitsev in Beyond great cache hit ratio. The point Peter makes is that we read about websites like Amazon and Facebook that can literally make hundreds of calls to satisfy a user request. Even if you have an awesome cache hit ratio, pages can still be slow because making and processing all those requests takes time. The solution is to remove requests altogether. You do this by caching larger blocks so you have to make fewer requests.

The post has a lot of good advice worth reading: 1) Make non-cacheable blocks as small as possible, 2) Maximize the number of uses of each cache item, 3) Control invalidation, 4) Multi-Get.
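
A rough sketch of why hit rate alone misleads: the `CountingCache` class below is a hypothetical in-memory stand-in for a remote cache (it is not a real client library), and it simply counts round trips. Every lookup is a hit in all three strategies, yet the number of requests differs by two orders of magnitude.

```python
class CountingCache:
    """In-memory stand-in for a remote cache that counts round trips."""

    def __init__(self):
        self.store = {}
        self.round_trips = 0

    def set(self, key, value):
        self.store[key] = value

    def get(self, key):
        self.round_trips += 1
        return self.store.get(key)

    def get_multi(self, keys):
        self.round_trips += 1  # one request no matter how many keys
        return {k: self.store[k] for k in keys if k in self.store}


def render_per_fragment(cache, keys):
    """One cache request per page fragment."""
    return "".join(cache.get(k) for k in keys)


def render_multi_get(cache, keys):
    """All fragments fetched in a single batched request."""
    found = cache.get_multi(keys)
    return "".join(found[k] for k in keys)


def render_whole_block(cache, page_key):
    """The assembled page cached as one larger block."""
    return cache.get(page_key)


def round_trips_for(cache, render, *args):
    cache.round_trips = 0
    render(cache, *args)
    return cache.round_trips


cache = CountingCache()
keys = [f"frag:{i}" for i in range(100)]
for i, key in enumerate(keys):
    cache.set(key, f"<div>{i}</div>")
cache.set("page:home", "".join(cache.store[k] for k in keys))

per_fragment = round_trips_for(cache, render_per_fragment, keys)
multi_get = round_trips_for(cache, render_multi_get, keys)
whole_block = round_trips_for(cache, render_whole_block, "page:home")
print(per_fragment, multi_get, whole_block)  # 100 1 1
```

The larger block costs more to invalidate, since any fragment change dirties the whole page, which is where the advice about controlling invalidation and keeping non-cacheable blocks small comes in.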

HighScalability Team |

4 Comments |

Strategy

Thursday

Jun 3, 2010

Hot Scalability Links for June 3, 2010

Thursday, June 3, 2010 at 7:57AM

How Big is a Yottabyte? Not so big that the NSA can't hope to store it, says CrunchGear: There are a thousand gigabytes in a terabyte, a thousand terabytes in a petabyte, a thousand petabytes in an exabyte, a thousand exabytes in a zettabyte, and a thousand zettabytes in a yottabyte. In other words, a yottabyte is 1,000,000,000,000,000GB.
The CMS data aggregation system. The Large Hadron Collider project is using MongoDB as a cache. Here we discuss a new data aggregation system which consumes, indexes and delivers information from different relational and non-relational data sources to answer cross data-service queries and explore meta-data associated with petabytes of experimental data.
Google I/O 2010 Videos are now available (many of them anyway). You might be particularly interested in Google Storage for Developers, Building high-throughput data pipelines with Google App Engine, Batch data processing with App Engine, BigQuery and Prediction APIs, Measure in milliseconds redux: Meet Speed Tracer.
Scale at Facebook by Director of Engineering, Aditya Agarwal. You can't scale Facebook using traditional horizontal partitioning. People make friends across many networks. Every new user can potentially access any other user. There's no way to cut the data into partitions such that access stays within a single partition.

Click to read more ...
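
For anyone who wants to double-check the yottabyte figure from the first link, a quick sketch using the decimal (powers of 1000) prefixes as quoted; the helper name is made up for illustration.

```python
PREFIXES = ["giga", "tera", "peta", "exa", "zetta", "yotta"]


def gigabytes_in(prefix):
    """Gigabytes in one <prefix>byte, stepping up by a factor of 1000 each time."""
    return 1000 ** PREFIXES.index(prefix)


print(f"{gigabytes_in('yotta'):,}GB")  # 1,000,000,000,000,000GB
```

Five steps of a thousand from giga to yotta gives 10^15 gigabytes, which matches the number CrunchGear quotes.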

HighScalability Team |

Get Your High Scalability Fix at Digg

Interested in working on cutting-edge high-scale infrastructure at Digg? We're making a big investment in scaling and have committed to the NoSQL (Not only SQL) path with Cassandra. We're using other open-source infrastructure to help us scale including Hadoop, RabbitMQ, Zookeeper, Thrift, HDFS and Lucene. We're rewriting Digg from the ground up and we need amazing developers to join our world-class team. If you think you are up for the challenge, or you know someone who might be, take a look at our jobs page for more information.

HighScalability Team |

Strategy: Rule of 3 Admins to Save Your Sanity

Tuesday, May 25, 2010 at 7:47AM

The idea came up in this Hacker News thread, commenting on a 37signals interview, that three is the minimum optimal number of system administrators. Everyone wants to lower their costs by having each admin administer a lot of machines. The problem is that when you have fewer than three admins you can never get a break from the constant corrosive pressure of always being on call. When every moment of your life you are dreading the next emergency, it eats at you. Having three admins solves that problem. With three admins you can:

Go on a real vacation. The two remaining admins can switch off being on call.
Not be on call all the time.

A larger shop will naturally have more admins so it's not as big an issue, but at smaller shops trying to minimize head count, carrying three admins (or people in those roles) might be something to consider.

HighScalability Team |

6 Comments |

Strategy

Thursday

May 20, 2010

Strategy: Scale Writes to 734 Million Records Per Day Using Time Partitioning

Thursday, May 20, 2010 at 6:43AM

In Scaling writes in MySQL (slides), Philip Tellis, while working for Yahoo, describes how, using time-based partitions, they were able to increase their write capability from 2100 inserts per second (7 million a day) to a sustained 8500 inserts per second (734 million a day). This was capacity enough to handle the load during Michael Jackson's memorial service. In summary, the secrets to scalable writes are:

Click to read more ...
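
As a rough illustration of the two ideas in the summary, the sketch below converts a sustained insert rate into rows per day and derives a partition name from a row's timestamp. The daily granularity, the `events` prefix, and both function names are assumptions made for the example; the excerpt does not say how the partitions in the slides were actually laid out.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_day(inserts_per_second):
    """Rows written in a day at a sustained insert rate."""
    return inserts_per_second * SECONDS_PER_DAY


def partition_for(timestamp, prefix="events"):
    """Name of the (hypothetical) daily partition a row with this timestamp lands in."""
    return f"{prefix}_{timestamp:%Y%m%d}"


print(f"{rows_per_day(8500):,}")  # 734,400,000
print(partition_for(datetime(2010, 5, 20, 6, 43, tzinfo=timezone.utc)))  # events_20100520
```

The 734 million figure is exactly 8500 inserts per second sustained for 24 hours. The same conversion puts 2100 inserts per second at about 181 million rows a day, so the 7 million quoted for the earlier setup may describe something other than a full day at that rate; the excerpt does not say. The appeal of partitioning by time is that new rows only ever touch the current partition, and old data can be dropped a whole partition at a time.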

HighScalability Team |

4 Comments |

MySQL,

Strategy