High Scalability -

DIY,

storage

Thursday

Feb052009

Beta testers wanted for ultra high-scalability/performance clustered object storage system designed for web content delivery

Thursday, February 5, 2009 at 3:05AM

DataDirect Networks (www.ddn.com) is searching for beta testers for our exciting new object-based clustered storage system. Does this sound like you? * Need to store millions to hundreds of billions of files * Want to use one big file system but can't because no single file system scales big enough * Running out of inodes * Have to constantly tweak file systems to perform better * Need to replicate content to more than one data center across geographies * Have thumbnail images or other small files that wreak havoc on your file and storage systems * Constantly tweaking and engineering around performance and scalability limits * No storage system delivers enough IOPS to serve your content * Spend time load balancing the storage environment * Want a single, simple way to manage all this data If this sounds like you, please contact me at jgoldstein@ddn.com. DataDirect Networks is a 10-year old, well-established storage systems company specializing in Extreme Storage environments. We've deployed both the largest and the fastest storage/file systems on the planet - currently running at over 250GB/s. Our upcoming product is going to change the way storage is deployed for scalable web content and we're seeking testers who can throw their most challenging problems at our new system. It's time for something better and we're going to deliver it.

jshgldstn |

1 Comment |

Permalink |

CDN,

Cluster File System,

General Discussion,

cluster,

cluster storage system,

storage

Tuesday

Dec092008

Rules of Thumb in Data Engineering

Tuesday, December 9, 2008 at 9:12PM

This is an interesting and still relevant research paper by Jim Gray, Prashant Shenoy at Microsoft Research that examines the rules of thumb for the design of data storage systems. It looks at storage, processing, and networking costs, ratios, and trends with a particular focus on performance and price/performance. Jim Gray has an updated presentation on this interesting topic: Long Term Storage Trends and You. Robin Harris has a great post that reflects on the Rules of Thumb whitepaper on his StorageMojo blog: Architecting the Internet Data Center - Parts I-IV.

geekr |

1 Comment |

Permalink |

cache,

storage

Sunday

Oct262008

Should you use a SAN to scale your architecture?

Sunday, October 26, 2008 at 1:52AM

This is a question everyone must struggle with when building out their datacenter. Storage choices are always the ones I have the least confidence in. David Marks in his blog You Can Change It Later! asks the question Should I get a SAN to scale my site architecture? and answers no. A better solution is to use commodity hardware, directly attach storage on servers, and partition across servers to scale and for greater availability. David's reasoning is interesting:

A SAN creates a SPOF (single point of failure) that is dependent on a vendor to fly and fix when there's a problem. This can lead to long down times during this outage you have no access to your data at all.

Using easily available commodity hardware minimizes risks to your company, it's not just about saving money. Zooming over to Fry's to buy emergency equipment provides the kind of agility startups need in order to respond quickly to ever changing situations. It's hard to beat the power and flexibility (backups, easy to add storage, mirroring, etc) of a good SAN, but Mark makes a good case.

Todd Hoff |

8 Comments |

Permalink |

Strategy,

storage

Tuesday

Oct142008

Implementing the Lustre File System with Sun Storage: High Performance Storage for High Performance Computing

Tuesday, October 14, 2008 at 11:08AM

Much of the focus of high performance computing (HPC) has centered on CPU performance. However, as computing requirements grow, HPC clusters are demanding higher rates of aggregate data throughput. Today's clusters feature larger numbers of nodes with increased compute speeds. The higher clock rates and operations per clock cycle create increased demand for local data on each node. In addition, InfiniBand and other high-speed, low-latency interconnects increase the data throughput available to each node. Traditional shared file systems such as NFS have not been able to scale to meet this growing demand for data throughput on HPC clusters. Scalable cluster file systems that can provide parallel data access to hundreds of nodes and petabytes of storage are needed to provide the high data throughput required by large HPC applications, including manufacturing, electronic design, and research. This paper describes an implementation of the Sun Lustre file system as a scalable storage cluster using Sun Fire servers, high-speed/low-latency InfiniBand interconnects, and additional networking and storage devices. Furthermore, this paper explores the use of the Sun Lustre file system at a shared government and education research site, including configuration information and details on testing that was performed on-site to evaluate the performance of Sun's scalable storage solution.

peterb_2008 |

Sun Storage and Archive Solution for HPC

Tuesday, October 14, 2008 at 11:04AM

When designing data storage solutions for High Performance Computing (HPC) environments, IT architects strive to balance complex and often conflicting requirements. The need to manage a skyrocketing amount of data, along with the goals of controlling cost and immediate data availability, can make it difficult to meet HPC application demands within the constraints of today's IT budgets. To help customers address an almost bewildering set of architectural challenges, Sun has developed the Sun Storage and Archive Solution for HPC, a reference architecture that can be easily customized to meet specific application goals and business requirements. This article is intended for IT managers and storage architects familiar with HPC applications and data requirements in the organization. It assumes that the audience has a technical background and some familiarity with issues surrounding the task of configuring systems and storage.

peterb_2008 |

Wuala - P2P Online Storage Cloud

Sunday, August 17, 2008 at 9:39PM

How do you design a reliable distributed file system when the expected availability of the individual nodes are only ~1/5? That is the case for P2P systems. Dominik Grolimund, the founder of a Swiss startup Caleido will show you how! They have launched Wuala, the social online storage service which scales as new nodes join the P2P network. The goal of Wua.la is to provide distributed online storage that is:

large
scalable
reliable
secure

by harnessing the idle resources of participating computers. This challenge is an old dream of computer science. In fact as Andrew Tanenbaum wrote in 1995: "The design of a world-wide, fully transparent distributed filesystem fot simultaneous use by millions of mobile and frequently disconnected users is left as an exercise for the reader" After three years of research and development at at ETH Zurich, the Swiss Federal Institute of Technology on a distributed storage system, Caleido is ready to unveil the result: Wuala. Wuala is a new way of storing, sharing, and publishing files on the internet. It enables its users to trade parts of their local storage for online storage and it allows us to provide a better service for free. In this Google Tech Talk, Dominik will explain what Wuala is and how it works, and he will also show a demo. The availability problem is solved by redundancy (just like in Google File System). However simple replication techniques would result in too much overhead because of the low availability of the nodes. Instead Wuala employs erasure coding and splits the data into small pieces. Optimal erasure codes produce n/r fragments where any n fragments is sufficient to recover the original message. These pieces are then distributed in the P2P network providing good availability at a reasonable overhead. The P2P network consists of client, storage and routing nodes. The Wuala architecture uses a mix of regular and random graphs to optimize routing. Dominik also explains how Wuala architecture is designed to provide security and fairness. Wuala employs the 128 bit AES algorithm for encryption and the 2048 bit RSA algorithm for authentication. If you're interested in how Wuala manages encryption, have a look at their publication on Cryptree. They have also implemented distributed reputation audit and maintenance functions. Check out the Tech Talk! It is worth the time!

geekr |

1 Comment |

Permalink |

P2P,

cloud,

high availability,

security,

storage

Monday

Oct012007

SmugMug Found their Perfect Storage Array

Monday, October 1, 2007 at 2:42PM

SmugMug's CEO & Chief Geek Don MacAskill smugly (hard to resist) gushes over finally finding, after a long and arduous quest, their "best bang-for-the-buck storage array." It's the Dell MD300. His in-depth explanation of why he prefers the MD3000 should help anyone with their own painful storage deliberations. His key points are: The price is right; DAS via SAS, 15 spindles at 15K rpm each, 512MB of mirrored battery-backed write cache; You can disable read caching; You can disable read-ahead prefetching; The stripe sizes are configurable up to 512KB; The controller ignores host-based flush commands by default; They support an ‘Enhanced JBOD’ mode. His reasoning for the desirability each option is astute and he even gives you the configuration options for carrying out the configuration. This is not your average CEO. Don also speculates that a three tier system using flash (system RAM + flash storage + RAID disks) is a possible future direction. Unfortunately, flash may not be the dream solution it has been thought to be. StorageMojo talks about this in Flash vs disk at DISKCON 2007.

Todd Hoff |

3 Comments |

Permalink |