Wednesday, November 4, 2015

Strategy: Avoid Lots of Little Files

I've been bitten by this one. It happens when you quite naturally use the file system as a quick and dirty database. A directory is a lot like a table and a file name looks a lot like a key. You can store many-to-one relationships via subdirectories. And the path to a file makes a handy quick lookup key. 
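To make the pattern concrete, here is a minimal sketch of the idea (the directory layout, paths, and function names are illustrative, not from any particular product):

```python
# A toy key-value store built directly on the file system:
# a directory plays the role of a table, a file name the role of a key,
# and the full path doubles as the lookup key.
import os

DB_ROOT = "/tmp/quickdb"  # assumed location for this sketch

def put(table: str, key: str, value: bytes) -> None:
    table_dir = os.path.join(DB_ROOT, table)
    os.makedirs(table_dir, exist_ok=True)
    with open(os.path.join(table_dir, key), "wb") as f:
        f.write(value)

def get(table: str, key: str) -> bytes:
    with open(os.path.join(DB_ROOT, table, key), "rb") as f:
        return f.read()

put("users", "alice", b'{"email": "alice@example.com"}')
print(get("users", "alice"))
```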

The problem is a file system isn't a database. That realization doesn't hit until you reach a threshold where there are actually lots of files. Everything works perfectly until then.

When the threshold is hit, iterating a directory becomes very slow because most file system directory data structures are not optimized for the lots-of-small-files case. Even opening a file becomes slow.

According to Steve Gibson on Security Now (@16:10), 1Password ran into this problem. 1Password stored every item in the vault in an individual file. This allowed standard file-syncing technology to update only the changed files: updating a password changes just one file, so only that file is synced.

Steve thinks this is a design mistake, but the approach makes perfect sense. It's simple and robust, which is good design given what I assume was the original, reasonable expectation of relatively small vaults.

The problem is that the file-per-item approach doesn't scale to larger vaults with thousands of files for thousands of web sites. Interestingly, decrypting the files was not the bottleneck; the overhead of opening them was. The slowdown came from the elaborate security checks the OS performs to validate whether a process has the rights to open a file.

The new version of 1Password uses a UUID to shard items into one of 16 files based on the first hex digit of the UUID. Given good random number generation, the files should grow more or less equally as items are added. Problem solved. Would this be your first solution when building a product? Probably not.
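As a rough sketch of how such a scheme might look (the shard naming and file layout here are assumptions, not 1Password's actual implementation):

```python
# Shard items into one of 16 files keyed by the first hex digit of the
# item's UUID. With uniformly random UUIDs the shards grow roughly evenly.
import uuid

def shard_file_for(item_id: uuid.UUID) -> str:
    first_hex_digit = item_id.hex[0]            # one of '0'..'f'
    return f"vault_shard_{first_hex_digit}.db"  # assumed naming scheme

item_id = uuid.uuid4()
print(item_id, "->", shard_file_for(item_id))
```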

Apologies to 1Password if this is not a correct characterization of their situation, but even if wrong, the lesson still remains.


Reader Comments (7)

1Password had the problem, not LastPass.

Glad you're trying to quote Steve Gibson. Security Now is a great podcast.

November 4, 2015 | Unregistered CommenterDavinder

I personally think that using a filesystem has incredible advantages when the query/access model is simple. I wonder when developers gave up the notion of persisting state in a file; it must have been around the time the big relational database vendors entered the scene and offered lots of guarantees and theory around reliability. The funny thing is, even they store on files in most cases. So in this day and age of three-times-redundant storage that auto-syncs across volumes, I think a developer can simplify things massively by not going down the relational database route. The only problem is that a whole generation of developers now sneers at storing things in files; they have been weaned off it for so long that it feels alien to many to use the simplest thing possible, which is a storage mechanism implemented in the OS kernel.
When scaling this, eventual consistency and replication obviously start to be a factor, but all of this is fairly well understood at the filesystem level. In my opinion, options like Ceph that allow different access patterns to the same stored entity are an interesting new technology that should hopefully get developers back on track and out of the claws of database systems/vendors.

Rant Over :-)

November 4, 2015 | Unregistered CommenterMartin Schinz

Modern local file systems can handle very large numbers of files pretty well, but you have to use the right mkfs and/or mount options. Also, this UUID-sharding technique can have unexpected downsides. For example, on the Gluster project our geo-replication used to keep track of which directories had been changed (had files added/removed/renamed) so we wouldn't have to scan quiescent ones. (It's now completely log-based, in part to avoid problems like the one I'm describing.) This generally worked well, except that one user did exactly the same kind of UUID sharding you recommend. As a result, during each reasonably short time interval they'd add one or two files in each and every subdirectory, forcing us to scan all of them. When they changed their sharding scheme to be time-based, so that all new files were in one or two subdirectories, their performance increased substantially. It's unfortunate that some "recent but not modern" file systems like HDFS still encourage people to contort their directory structures this way.
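A minimal sketch of the time-based layout described here (the window size and directory naming are assumptions): all new files land in the directory for the current time window, so only one or two directories change per interval.

```python
# Pick a shard directory based on the current time window rather than a UUID,
# so writes within a window cluster into a single directory.
import os
import time

def current_shard_dir(root: str, window_seconds: int = 3600) -> str:
    window = int(time.time()) // window_seconds
    return os.path.join(root, f"window-{window}")

path = current_shard_dir("/tmp/uploads")
os.makedirs(path, exist_ok=True)
print("new files land in:", path)
```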

November 4, 2015 | Unregistered CommenterJeff Darcy

Thanks Davinder. What a screw up! How embarrassing. Steve talks about LastPass after that section, so I must have s//'d.

And agreed, Steve is great.

thanks!

November 4, 2015 | Registered CommenterHighScalability Team

But why would you iterate over these directories?
If the full path to a certain file is stored in an index file (or somewhere else), you can open it directly.

November 5, 2015 | Unregistered Commenterflorin

Florin, one case is when you want to copy a directory to another directory. In my particular failure it was copying a primary's database, stored as files in a directory tree, to a secondary hot standby on bootup.

Another example is simply doing a search. You have to iterate over directories, open up files, and scan each file looking for a match.
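A minimal sketch of that kind of search, assuming plain text files under a root directory; the point is that every candidate file requires its own open() before a single byte is scanned:

```python
# Walk a directory tree, open every file, and scan it for a match.
# With lots of little files the per-file open overhead dominates.
import os

def search_tree(root: str, needle: str):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "r", errors="ignore") as f:  # one open per file
                if needle in f.read():
                    yield path

for match in search_tree("/tmp/quickdb", "password"):
    print(match)
```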

November 5, 2015 | Registered CommenterHighScalability Team

At Egnyte we ran into this problem very early, in 2009 or so, and we built a layer of indirection: in MySQL we would store metadata for the file, and each file would be represented internally by a UUID. Then on disk we would create directories by the first two characters of the UUID, again inside them by the next two characters, and again by the next two, and store the final file at the third level. This allowed us to store many billions of files and petabytes of data over an array of commodity machines with replicated copies. I don't think there is a problem in storing large numbers of small files as long as you can balance the read/write loads. We used replication/caching/sharding to solve the upload/download issues.
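A small sketch of the fan-out described above (the root path and helper name are illustrative): a UUID such as ab12cd... maps to <root>/ab/12/cd/<uuid>, which keeps any single directory small.

```python
# Build a three-level directory path from the first six hex characters of
# a file's UUID, then store the file under its full UUID at that level.
import os
import uuid

def storage_path(root: str, file_uuid: uuid.UUID) -> str:
    h = file_uuid.hex
    return os.path.join(root, h[0:2], h[2:4], h[4:6], h)

u = uuid.uuid4()
print(storage_path("/data/files", u))  # e.g. /data/files/ab/12/cd/ab12cd...
```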

November 15, 2015 | Unregistered CommenterKalpesh Patel
