Cheap storage: how Backblaze takes matters into its own hands

Backblaze blogs about how they built their own storage infrastructure on the cheap to run their cloud backup service. This episode: the hardware.
Sorry, just a link this time.
DataDirect Networks (www.ddn.com) is searching for beta testers for our exciting new object-based clustered storage system. Does this sound like you?
* Need to store millions to hundreds of billions of files
* Want to use one big file system but can't because no single file system scales big enough
* Running out of inodes
* Have to constantly tweak file systems to perform better
* Need to replicate content to more than one data center across geographies
* Have thumbnail images or other small files that wreak havoc on your file and storage systems
* Constantly tweaking and engineering around performance and scalability limits
* No storage system delivers enough IOPS to serve your content
* Spend time load balancing the storage environment
* Want a single, simple way to manage all this data
If this sounds like you, please contact me at jgoldstein@ddn.com. DataDirect Networks is a 10-year-old, well-established storage systems company specializing in Extreme Storage environments. We've deployed both the largest and the fastest storage/file systems on the planet, currently running at over 250GB/s. Our upcoming product is going to change the way storage is deployed for scalable web content, and we're seeking testers who can throw their most challenging problems at our new system. It's time for something better and we're going to deliver it.
This is an interesting and still relevant research paper by Jim Gray and Prashant Shenoy at Microsoft Research that examines the rules of thumb for the design of data storage systems. It looks at storage, processing, and networking costs, ratios, and trends, with a particular focus on performance and price/performance. Jim Gray has an updated presentation on the topic: Long Term Storage Trends and You. Robin Harris has a great post reflecting on the Rules of Thumb whitepaper on his StorageMojo blog: Architecting the Internet Data Center - Parts I-IV.
This is a question everyone must struggle with when building out their datacenter. Storage choices are always the ones I have the least confidence in. David Marks, in his blog You Can Change It Later!, asks the question Should I get a SAN to scale my site architecture? and answers no. A better solution, he argues, is to use commodity hardware, attach storage directly to the servers, and partition data across servers for scale and greater availability. David's reasoning is interesting:
Much of the focus of high performance computing (HPC) has centered on CPU performance. However, as computing requirements grow, HPC clusters are demanding higher rates of aggregate data throughput. Today's clusters feature larger numbers of nodes with increased compute speeds. The higher clock rates and operations per clock cycle create increased demand for local data on each node. In addition, InfiniBand and other high-speed, low-latency interconnects increase the data throughput available to each node. Traditional shared file systems such as NFS have not been able to scale to meet this growing demand for data throughput on HPC clusters. Scalable cluster file systems that can provide parallel data access to hundreds of nodes and petabytes of storage are needed to provide the high data throughput required by large HPC applications, including manufacturing, electronic design, and research. This paper describes an implementation of the Sun Lustre file system as a scalable storage cluster using Sun Fire servers, high-speed/low-latency InfiniBand interconnects, and additional networking and storage devices. Furthermore, this paper explores the use of the Sun Lustre file system at a shared government and education research site, including configuration information and details on testing that was performed on-site to evaluate the performance of Sun's scalable storage solution.
When designing data storage solutions for High Performance Computing (HPC) environments, IT architects strive to balance complex and often conflicting requirements. The need to manage a skyrocketing amount of data, along with the goals of controlling cost and immediate data availability, can make it difficult to meet HPC application demands within the constraints of today's IT budgets. To help customers address an almost bewildering set of architectural challenges, Sun has developed the Sun Storage and Archive Solution for HPC, a reference architecture that can be easily customized to meet specific application goals and business requirements. This article is intended for IT managers and storage architects familiar with HPC applications and data requirements in the organization. It assumes that the audience has a technical background and some familiarity with issues surrounding the task of configuring systems and storage.
How do you design a reliable distributed file system when the expected availability of the individual nodes is only ~1/5? That is the case for P2P systems. Dominik Grolimund, the founder of the Swiss startup Caleido, will show you how! They have launched Wuala, the social online storage service which scales as new nodes join the P2P network. The goal of Wua.la is to provide distributed online storage that is:
SmugMug's CEO & Chief Geek Don MacAskill smugly (hard to resist) gushes over finally finding, after a long and arduous quest, their "best bang-for-the-buck storage array." It's the Dell MD3000. His in-depth explanation of why he prefers the MD3000 should help anyone with their own painful storage deliberations. His key points are: The price is right; DAS via SAS, 15 spindles at 15K rpm each, 512MB of mirrored battery-backed write cache; You can disable read caching; You can disable read-ahead prefetching; The stripe sizes are configurable up to 512KB; The controller ignores host-based flush commands by default; They support an ‘Enhanced JBOD’ mode. His reasoning for the desirability of each option is astute, and he even gives you the settings needed to configure the array this way. This is not your average CEO. Don also speculates that a three-tier system using flash (system RAM + flash storage + RAID disks) is a possible future direction. Unfortunately, flash may not be the dream solution it has been thought to be. StorageMojo talks about this in Flash vs disk at DISKCON 2007.