Wednesday, Jun 22, 2011
It's the Fraking IOPS - 1 SSD is 44,000 IOPS, Hard Drive is 180

Planning your next buildout and thinking SSDs are still far in the future? Still too expensive, too low density? Hard disks are cheap, familiar, and store lots of stuff. In this short and entertaining video, Wikia's Artur Bergman wants to change your mind about SSDs. SSDs are for today; get with the math already.
Here's Artur's logic:
- Wikia is all SSD in production. The new Wikia file servers have a theoretical read rate of ~10 GB/sec sequential, 6 GB/sec random, and 1.2 million IOPS. If you can't do math or love the past, you love spinning rust. If you are awesome you love SSDs.
- SSDs are cheaper than hard drives by the most relevant metric: $/GB/IOPS. One SSD is 44,000 IOPS and one hard drive is 180 IOPS, so on raw IOPS you need one SSD instead of roughly 250 hard drives (see the rough math after this list).
- With 8 million files there's a 9 minute fsck. Full backup in 12 minutes (X25-M based).
- 4 GB/sec random read, average latency 1 msec.
- 2.2 GB/sec random write, average latency 1 msec.
- 50TBs of SSDs in one machine for $80,000. With these densities most products can skip sharding completely.
- Joins are slow because random-access disk IO is slow. That's not true with SSDs, so joins will perform well.
- The best way to save power, because you need fewer CPUs.
- Recommends starting small, with Intel 320s. You don't need fancy high-end cards; 40K IOPS goes a long way. About $1,000 for 600 GB.
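To make the bullet-point math concrete, here is a rough back-of-the-envelope sketch in Python. The SSD figures are the ones quoted above; the hard drive price is a purely illustrative assumption, so treat the $/IOPS comparison as a sketch rather than a benchmark.

```python
# Rough IOPS math using the figures quoted in the list above.
# The hard drive price is an assumed placeholder, not from the talk.

SSD_IOPS = 44_000      # one Intel 320-class SSD (figure quoted above)
HDD_IOPS = 180         # one spinning disk (figure quoted above)
SSD_PRICE = 1_000      # ~$1,000 for a 600 GB Intel 320 (figure quoted above)
HDD_PRICE = 200        # hypothetical spindle price, purely illustrative

print(f"Spindles needed to match one SSD on raw IOPS: {SSD_IOPS / HDD_IOPS:.0f}")  # ~244

print(f"SSD cost per IOPS: ${SSD_PRICE / SSD_IOPS:.4f}")   # ~$0.02
print(f"HDD cost per IOPS: ${HDD_PRICE / HDD_IOPS:.2f}")   # ~$1.11

# Drives needed to sustain a hypothetical 100,000-IOPS random-read workload.
target = 100_000
print(f"SSDs needed: {-(-target // SSD_IOPS)}")   # ceiling division -> 3
print(f"HDDs needed: {-(-target // HDD_IOPS)}")   # ceiling division -> 556
```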
The video is well worth watching. The simplicity of not having to fight IO can be a real win. Some people have claimed SSDs aren't reliable; others claim they are. And if you need a lot of CPU processing you'll need those machines anyway, so centralizing storage on fast SSDs may not be that big a win. The point here is that it's probably high time to consider SSD-based architectures.
Related Articles
- Can flash SSDs be trusted? by Robin Harris. TL;DR: Yes.
Reader Comments (15)
Artur is not the first person I've seen become infatuated with SSDs, and I doubt he'll be the last. The profanity and the "can't do math" vs. "awesome" bit probably appeal to the other hipster pricks in the audience, but it doesn't make his arguments more convincing.
(1) You don't need that many IOPS for all of your data, so "SSD for everything" is a waste of money. Just a little bit of intelligence about which storage to use for which data can go a long way.
(2) Joins are slow because random access *in general* is slow, and single servers have limits that are soon reached with a few SSDs. If you have multiple nodes for either performance or availability reasons, then your joins are going to be slow because of *network* latency even if disk latency drops to zero. This is true even at RAM latencies, and it remains true for SSDs.
(3) SSDs also have interesting write-block/erase-block boundary issues that affect performance over and above what we're already used to dealing with for disk blocks. Any *real* performance guru might have mentioned that.
(4) He doesn't even mention the longevity issue. If you treat SSDs as consumables, with a rigorous program of monitoring and replacement (good luck even keeping track of how many write cycles have occurred), then you can avoid the nastier failure issues... but those cost-per-whatever numbers don't look so good any more.
(5) Fsck is fast? Thank the folks who improved that code (I'm not one but I know who they are) because they had as much to do with it as SSDs did.
SSDs are great for warm data. The key is that they should, as much as possible, only be used for warm data - hot data should go in RAM and cold data should go on spinning/sliding media. It's not *that* hard to approximate such a pattern, and it's much more cost-effective than just dumbly slapping SSDs into everything. Over time, we might even develop algorithms that do this autonomously and semi-effectively, in contrast to the current crop of hybrid drives and auto-tiering drivers that burn write/erase cycles for data that won't actually be accessed again before it's evicted. Unfortunately, anybody who actually listens to this kind of "SSDs are magic pixie dust" BS won't be pursuing other, better approaches.
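As an illustration only (not anything Jeff or Wikia actually runs), here is a toy sketch of the hot/warm/cold placement policy described above. The thresholds, hit count, and tier names are made up.

```python
import time

# Hypothetical thresholds - purely illustrative, not a production policy.
HOT_WINDOW_SECS  = 5 * 60          # touched in the last 5 minutes -> RAM cache
WARM_WINDOW_SECS = 7 * 24 * 3600   # touched in the last week      -> SSD tier
HOT_MIN_HITS     = 100             # and accessed often enough to be worth caching

def pick_tier(last_access_ts, access_count, now=None):
    """Classify an object as hot (RAM), warm (SSD), or cold (spinning disk)."""
    now = time.time() if now is None else now
    age = now - last_access_ts
    if age <= HOT_WINDOW_SECS and access_count >= HOT_MIN_HITS:
        return "ram"
    if age <= WARM_WINDOW_SECS:
        return "ssd"
    return "hdd"

# Example: an object read 500 times and last touched a minute ago belongs in RAM.
print(pick_tier(last_access_ts=time.time() - 60, access_count=500))  # -> "ram"
```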
Where exactly can one get "50TBs of SSDs in one machine for $80,000"?
Great response Jeff, thanks.
@Jeff: Re #4: Even the 25nm flash is still pretty reliable. If you have a layer that aggregates writes to a full flash block, you can get something like ~820 TiB of writes to a single 160GB Intel 320 Series SSD.
(We're seeing ~8.1 TiB of writes per percentage point on the wearout indicator.)
You can also pull all these stats (number of MiB written, percentage of spare flash and wearout) via SMART, at least for the Intel SSDs.
Monitoring with RRDs written to 2 OCZ SSDs (mdadm RAID0), the estimated lifetime is ~21 years.
So flash endurance isn't a concern. Have a read of http://www.usenix.org/event/hotstorage10/tech/full_papers/Mohan.pdf
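As a sanity check on those numbers, here is the endurance arithmetic using the ~8.1 TiB-per-percent figure from this comment. The daily write rate is an assumed example workload, not something the commenter reported.

```python
# Endurance estimate from the SMART wearout indicator.
# 8.1 TiB per percentage point comes from the comment above; the daily
# write rate is an assumed example workload.

tib_per_percent = 8.1
total_endurance_tib = tib_per_percent * 100
print(f"Implied total endurance: ~{total_endurance_tib:.0f} TiB")   # ~810 TiB

daily_writes_tib = 0.1   # hypothetical: ~100 GiB of writes per day
years = total_endurance_tib / daily_writes_tib / 365
print(f"Estimated lifetime at that rate: ~{years:.0f} years")       # ~22 years
```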
Paul Nendick, by building something like this http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
Obviously for reliability you don't want to run a single SSD drive in your server.
However, AFAIK no RAID controller supports the TRIM command, so the drives' longevity and long-term performance are questionable.
+1 for Jeff's post ... Lots of insight, zero hype :)
@Paul Intel - SSDSA2CW600G3K5 - 600GB SSD 320 Series ~$1100
Like Paul I'm also interested in the "50TBs of SSDs in one machine for $80,000" quote.
In the comment above Chris has said you can get the 600 GB Intel 320 SSD for ~$1,100.
51,200 GB / 600 GB ≈ 86 drives (a quick version of this arithmetic is sketched after this comment).
So for roughly $94,600 (86 * $1,100) I can get the SSDs for my 50TB, but what am I going to stick them in? @mxx pointed to Backblaze's home-grown storage solution, but like many others I'm not in the game of building my own hardware and would rather purchase from a provider.
The best option I can find is a Dell MD1220 direct attached storage which allows for 24 drives. Unfortunately, the biggest SSD they supply is 150GB giving a total of 3.6TB.
Does anyone know any better options?
Side note: we want to RAID our SSDs as they can fail, but we'll leave that out of this discussion.
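For what it's worth, here is a quick version of that sizing arithmetic. Prices are the ones quoted in the comments above; the RAID 10 line simply doubles the drive count per the side note and ignores hot spares and enclosure cost.

```python
import math

# Sizing arithmetic for the 50TB build discussed above.
target_gb = 51_200        # 50TB as computed in the comment (50 * 1,024 GB)
drive_gb = 600            # Intel 320 Series, 600 GB
drive_price = 1_100       # ~$1,100 each, as quoted above

drives = math.ceil(target_gb / drive_gb)
print(f"Drives (no redundancy): {drives} -> ${drives * drive_price:,}")                # 86 -> $94,600

raid10_drives = drives * 2
print(f"Drives (RAID 10):       {raid10_drives} -> ${raid10_drives * drive_price:,}")  # 172 -> $189,200
```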
A SuperMicro SC417E16-RJBOD1 will do the trick - 4U, 88 x 2.5" bays.
Excellent suggestions everyone.
Now what about securing this pool of data against drive failure? Last I checked, Intel's TRIM for RAID only supported RAID 0 or RAID 1 - nothing like RAID 5 or greater.
And beyond that, what sort of RAID controller(s) can handle 88 SSDs? What would the usable amount of space be after RAID? What would the rebuild time be after a single disk failure?
And what about a holistic view of the throughput? If we were to add in a pair of bonded 10 GigE ethernet ports to this JBOD, how much throughput could a smattering of NFS clients pull before the PCI bus inside this JBOD gets saturated? One can quickly negate the investment in SSD by mating that tech to other parts of a typical NAS or SAN stack that haven't evolved in performance at the same rate SSDs have (some rough numbers are sketched after this comment).
Paul
PS: that Backblaze design isn't without criticism: http://www.c0t0d0s0.org/archives/5899-Some-perspective-to-this-DIY-storage-server-mentioned-at-Storagemojo.html
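To put a rough number on that last point, here is a back-of-the-envelope comparison of the array's potential read throughput against a pair of bonded 10 GigE ports. The array figure reuses the ~4 GB/sec random-read number quoted in the article; the NIC efficiency factor is an assumption.

```python
# How quickly the network becomes the bottleneck in front of an SSD array.
GIGABIT_BYTES = 1e9 / 8                    # bytes/sec in one gigabit per second

array_read_bps = 4e9                       # ~4 GB/sec random read (figure from the article)
bonded_nics_bps = 2 * 10 * GIGABIT_BYTES   # 2 x 10 GigE, bonded
nic_efficiency = 0.9                       # assumed protocol and bonding overhead

usable_network_bps = bonded_nics_bps * nic_efficiency
print(f"SSD array:      {array_read_bps / 1e9:.1f} GB/s")
print(f"Bonded 10 GigE: {usable_network_bps / 1e9:.2f} GB/s usable")
print(f"Fraction of the array the network can expose: "
      f"{usable_network_bps / array_read_bps:.0%}")   # a bit over half
```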
Perhaps forgo hardware RAID controllers? Get a JBOD enclosure with multiple independent SATA ports and connect it to a dedicated storage server doing software RAID. You'll get the benefit of the TRIM command and won't have to worry about the enclosure's/controller's proprietary RAID setup/limitations.
Software RAID is not an easy answer. CPU power might not be a problem but you have the extra bandwidth of all the duplicated IO to worry about when it is not offloaded to dedicated hardware.
May I go one step further and say that anyone who demands RAID 5 is a moron. It's one thing to think you want something because you don't know any better. But to demand it when there is ample information that RAID 5 is a dinosaur is inexcusable. RAID 10 is fine.
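For the usable-space and rebuild questions above, here is a rough sketch for 88 x 600 GB SSDs in that SuperMicro chassis. The layouts and the rebuild write rate are assumptions for illustration, not recommendations.

```python
# Usable capacity and naive rebuild time for 88 x 600 GB SSDs.
DRIVES = 88
DRIVE_GB = 600
raw_gb = DRIVES * DRIVE_GB                   # 52,800 GB raw

raid10_gb = raw_gb / 2                       # mirrored pairs: half the raw space
raid6_gb = raw_gb * (DRIVES - 2) / DRIVES    # one big RAID 6 group (illustrative only)

print(f"Raw:     {raw_gb / 1000:.1f} TB")
print(f"RAID 10: {raid10_gb / 1000:.1f} TB usable")
print(f"RAID 6:  {raid6_gb / 1000:.1f} TB usable")

# Naive best-case rebuild of one failed drive: rewrite the whole 600 GB
# at an assumed sustained rate.
rebuild_mb_per_sec = 200                     # assumed sustained rebuild rate
hours = DRIVE_GB * 1000 / rebuild_mb_per_sec / 3600
print(f"Rebuild of one drive at {rebuild_mb_per_sec} MB/s: ~{hours:.1f} hours")
```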
Follow-up 10 years later.
Many of us run SSD drives in our servers, and some NASes are all-SSD.
SSD as it existed 10 years ago (SATA/AHCI) is on the way out because the interface only allows a single, shallow command queue - effectively one access at a time.
Newer tech (NVMe) combined with SSD has been overtaking it and will dominate the marketplace in the next few years, as more of us want bigger, faster, multi-threaded storage.
So for those who thought SSD was a flash in the pan...
sorry for the pun...
it was not, and many of us, including me, have yet to see an SSD die in service.
I suppose having a litigious society may have had a good effect on the industry. Many of the SSD makers reported their endurance expectations on the conservative side instead of the optimistic side. This set our expectations to a more realistic use pattern, and the drives exceeded them by more than we expected.
For those who wonder if IOPS is really everything, I will harken back to the 1980s, when the lovely Grace Hopper gave a speech to the computer techs and programmers of the time about the speed of code. She held up a small piece of wire and stated that this is how far electricity travels in a nanosecond. She then held up a coil of wire and said, "this is how far electricity travels in a microsecond." She lectured everyone on the basics: the display is one of the slowest parts of the computer because of the bus it sits on; less use of the display, faster programs, and so on. So it is always about the IOPS, no matter what you believe is important. The question is whether you are using them as proficiently as you should, because electricity travels at a very specific and unchanging rate.
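For reference, the arithmetic behind Hopper's prop, assuming a signal in copper propagates at roughly two-thirds the speed of light (the velocity factor is an assumption; in a vacuum the nanosecond distance is closer to 30 cm):

```python
# How far a signal travels in a given time, per Grace Hopper's demonstration.
C = 299_792_458          # speed of light in a vacuum, m/s
VELOCITY_FACTOR = 0.66   # assumed propagation factor for copper wire

def distance_m(seconds):
    """Distance in meters a signal travels in the given time."""
    return C * VELOCITY_FACTOR * seconds

print(f"1 nanosecond:  {distance_m(1e-9) * 100:.0f} cm")    # ~20 cm of wire
print(f"1 microsecond: {distance_m(1e-6):.0f} m")           # ~200 m coil
print(f"1 millisecond: {distance_m(1e-3) / 1000:.0f} km")   # ~200 km
```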