« Blog: MySQL Performance Blog - Everything about MySQL Performance. | Main | Web Analytics: An Hour a Day »
Monday
Jul162007

Paper: Replication Under Scalable Hashing

Replication Under Scalable Hashing: A Family of Algorithms for Scalable
Decentralized Data Distribution

From the abstract:
Typical algorithms for decentralized data distribution
work best in a system that is fully built before it first used;
adding or removing components results in either extensive
reorganization of data or load imbalance in the system.
We have developed a family of decentralized algorithms,
RUSH (Replication Under Scalable Hashing), that
maps replicated objects to a scalable collection of storage
servers or disks. RUSH algorithms distribute objects to
servers according to user-specified server weighting. While
all RUSH variants support addition of servers to the system,
different variants have different characteristics with
respect to lookup time in petabyte-scale systems, performance
with mirroring (as opposed to redundancy codes),
and storage server removal. All RUSH variants redistribute
as few objects as possible when new servers are
added or existing servers are removed, and all variants
guarantee that no two replicas of a particular object are
ever placed on the same server. Because there is no central
directory, clients can compute data locations in parallel,
allowing thousands of clients to access objects on thousands
of servers simultaneously.

Related Articles

References (1)

References allow you to track sources for this article, as well as articles that were written in response to this article.

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

PostPost a New Comment

Enter your information below to add a new comment.
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>