Better for performance – many files in one directory, or many subdirectories each with one file

Tags: file, filesystems

When building web applications, we often have files associated with database entries, e.g. we have a user table and each user has an avatar field, which holds the path to the associated image.

To make sure there are no conflicts in filenames we can either:

  • rename files upon upload to ID.jpg; the path would then be /user-avatars/ID.jpg
  • or create a sub-directory for each entity and leave the original filename intact; the path would then be /user-avatars/ID/original_filename.jpg

where ID is the user's unique ID number.
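To make the two layouts concrete, here is a minimal sketch of how each path could be built. The function names and the `ROOT` constant are hypothetical, chosen only for illustration; the `.jpg` extension and the `/user-avatars` root come from the examples above.

```python
from pathlib import PurePosixPath

# Root taken from the example paths in the question.
ROOT = PurePosixPath("/user-avatars")


def flat_path(user_id: int) -> PurePosixPath:
    """Option 1: rename the upload to ID.jpg inside one shared directory."""
    return ROOT / f"{user_id}.jpg"


def per_user_dir_path(user_id: int, original_filename: str) -> PurePosixPath:
    """Option 2: one sub-directory per user, original filename kept."""
    # .name drops any directory components a client may have sent along.
    safe_name = PurePosixPath(original_filename).name
    return ROOT / str(user_id) / safe_name


print(flat_path(12345))
print(per_user_dir_path(12345, "original_filename.jpg"))
```

Either way, /user-avatars ends up with one entry per user: a file in the first case, a directory in the second.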

Both are perfectly valid from the application logic's point of view.

But which one would be better from a filesystem performance point of view? We have to keep in mind that the number of user entries can be very high (millions).

Is there any limit to the number of sub-directories a directory can hold?

Best Solution

It's going to depend on your file system, but I'm going to assume you're talking about something simple like ext3, and you're not running a distributed file system (some of which are quite good at this). In general, file systems perform poorly over a certain number of entries in a single directory, regardless of whether those entries are directories or files. So whether you create one directory per image or put every image directly in the root directory, you will run into scaling problems. If you look at this answer:

How many files in a directory is too many (on Windows and Linux)?

You'll see that ext3 runs into limits at about 32K entries in a directory (for ext3 this is a hard cap on the number of subdirectories), far fewer than you're proposing.

Off the top of my head, I'd suggest doing some rudimentary sharding into a multilevel directory tree, something like /user-avatars/1/2/12345/original_filename.jpg. (Or something appropriate for your type of ID, but I am interpreting your question to be about numeric IDs.) Doing that will also make your life easier later when you decide you want to distribute across a storage cluster, since you can spread the directories around.
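A rough sketch of that kind of sharding, assuming non-negative numeric IDs and taking the leading decimal digits of the ID as the shard directories, as in the example path. The function name `sharded_avatar_path` and its `levels` and `root` parameters are made up for illustration.

```python
from pathlib import PurePosixPath


def sharded_avatar_path(
    user_id: int,
    filename: str,
    levels: int = 2,
    root: str = "/user-avatars",
) -> PurePosixPath:
    """Build root/d1/d2/.../ID/filename from the leading digits of the ID."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    # Left-pad short IDs with zeros so every ID has at least `levels` digits.
    digits = str(user_id).zfill(levels)
    shards = digits[:levels]  # e.g. "12" for 12345, one character per level
    # .name drops any directory components in the uploaded filename.
    safe_name = PurePosixPath(filename).name
    return PurePosixPath(root, *shards, str(user_id), safe_name)


print(sharded_avatar_path(12345, "original_filename.jpg"))
# /user-avatars/1/2/12345/original_filename.jpg
```

Two single-digit levels give only 100 buckets, so 5 million IDs would still average about 50,000 directories per bucket, above the 32K figure mentioned earlier. Raising `levels`, or using more than one digit per level, keeps each directory smaller. Sharding on the trailing digits or on a hash of the ID instead would likely spread sequential IDs more evenly.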