Thursday, October 18, 2007

Another Approach to Replication

File replication based on erasure codes can reduce total replica storage by a factor of two or more compared to keeping full copies.
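As a rough illustration of that claim (my sketch; the (k=10, n=14) parameters are assumptions, not from the post): with 3x copy replication every byte is stored three times, while a (k, n) erasure code splits data into k fragments plus n-k parity fragments, any k of which suffice to rebuild the original, for an overhead of n/k.

    # Storage overhead: full-copy replication vs. a (k, n) erasure code.
    # Illustrative parameters only; the post does not specify a scheme.

    def replication_overhead(copies: int) -> float:
        """Bytes stored per byte of user data with full copies."""
        return float(copies)

    def erasure_overhead(k: int, n: int) -> float:
        """Bytes stored per byte of user data with a (k, n) code:
        data is split into k fragments and expanded to n fragments,
        any k of which suffice to rebuild the original."""
        return n / k

    if __name__ == "__main__":
        print(replication_overhead(3))   # 3.0 -> triplication, tolerates 2 lost copies
        print(erasure_overhead(10, 14))  # 1.4 -> tolerates 4 lost fragments

Here 3.0 / 1.4 ≈ 2.1, i.e. the "two times and more" savings at comparable or better fault tolerance.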

Reader Comments (4)

The thing I thought was really clever about the Google File System paper was that they didn't use erasure coding for replication. Copy-based replication is simpler, and you can read any replica without having to know about any other replica. With storage being so cheap these days, I think you'd really have to think hard about why you were using erasure code-based replication over a simple copy-based one.

People should look more at erasure coding for bulk transfer and distribution of data, though. There it could be really awesome, although I think there are some patents out on the "rateless" variants.

Unregistered Commenter Toby DiPasquale

I guess GFS already chooses which replica is better, i.e. nearer, so it always knows about all replicas for a given file. The major question here, I think, is whether Google pushes as many hard drives into one box as possible. If yes, then the method might help reduce the number of boxes, clusters, power usage, and replication bandwidth; if not, then the bottleneck is somewhere else, i.e. network bandwidth or computing power, and the method is unlikely to be suitable.

BTW, I did not find any real implementation of erasure-code replication, just some PDFs about how it might be useful.

Unregistered Commenter Anonymous

Parity is an erasure code; thus, all RAID is based on erasure code replication.

Unregistered Commenter Toby DiPasquale
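A minimal illustration of the parity point above (my own sketch, not part of the comment): a single parity block, as in RAID-4/5, is the XOR of the data blocks, and any one missing block, data or parity, can be rebuilt from the rest.

    # Single-parity erasure code, as used in RAID-4/5: the parity block is
    # the XOR of the data blocks, so any one missing block can be rebuilt.

    from functools import reduce

    def xor_blocks(blocks: list) -> bytes:
        """XOR equal-length blocks byte-by-byte."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    data = [b"aaaa", b"bbbb", b"cccc"]
    parity = xor_blocks(data)

    # Lose one data block; recover it from the parity and the survivors.
    lost = data[1]
    recovered = xor_blocks([data[0], data[2], parity])
    assert recovered == lost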

I am not sure about the technology behind it (erasure codes, schmerasure codes!), but a year or so ago I met with the folks at Cleversafe, who have both a commercial and an open-source offering. Check it out here:

http://www.cleversafe.org/dispersed-storage

According to their CTO, they take the original data and split it up into 11 slices, each slice about 10% of the size of the original data. For retrieval, it is sufficient to have 6 of the 11 slices accessible (i.e., 5 can be down). An added security benefit is that the slices are prepared in such a way that no slice, captured separately, carries any recognizable data.

Regards,
-- Peter
www.3tera.com

Unregistered Commenter PeterNic
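A minimal sketch of the kind of (k=6, n=11) threshold scheme Peter describes, assuming a simple Vandermonde code over GF(257). This is an illustration, not Cleversafe's actual algorithm; real deployments typically combine dispersal with encryption or an all-or-nothing transform to get the secrecy property he mentions.

    # (k=6, n=11) dispersal sketch: split data into k-symbol groups, encode
    # each group into n slice symbols with a Vandermonde matrix over GF(257);
    # any k slices reconstruct the group by solving a linear system.

    P = 257      # small prime field; real systems use GF(2^8) or GF(2^16)
    K, N = 6, 11

    def encode(group):
        """group: K data symbols -> N slice symbols.
        Slice i holds sum_j (i+1)^j * d_j mod P (Vandermonde row i)."""
        return [sum(pow(i + 1, j, P) * d for j, d in enumerate(group)) % P
                for i in range(N)]

    def decode(slices):
        """slices: any K pairs (index, value) -> the original K symbols.
        Solves the K x K Vandermonde system by Gauss-Jordan elimination mod P."""
        rows = [[pow(i + 1, j, P) for j in range(K)] + [v] for i, v in slices]
        for c in range(K):
            pivot = next(r for r in range(c, K) if rows[r][c])
            rows[c], rows[pivot] = rows[pivot], rows[c]
            inv = pow(rows[c][c], P - 2, P)   # modular inverse via Fermat
            rows[c] = [x * inv % P for x in rows[c]]
            for r in range(K):
                if r != c and rows[r][c]:
                    f = rows[r][c]
                    rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[c])]
        return [row[K] for row in rows]

    group = [104, 101, 108, 108, 111, 33]    # "hello!" as byte values
    slices = encode(group)
    survivors = list(enumerate(slices))[5:]  # any 6 of the 11 slices
    assert decode(survivors) == group

Any 6 of the 11 slice indices select 6 rows of the Vandermonde matrix, which are always invertible over the field, so reconstruction is just solving a 6x6 linear system; with 5 slices or fewer the system is underdetermined, which is where the threshold property comes from.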
