Thousands of Images, How to Organize the Directory Structure? (Linux)


I would do the following:

  1. Take an MD5 hash of each image as it comes in.
  2. Write that MD5 hash in the database where you are keeping track of these things.
  3. Store each image in a directory structure that uses the first couple of characters of the MD5 hex string as directory names, one character per level. So if the hash is 'abcdef1234567890' (shortened; a real MD5 hex string is 32 characters), you would store it as 'a/b/abcdef1234567890'.

Using a hash also lets you deduplicate: the same image uploaded multiple times produces the same hash, so it only needs to be stored once.
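The three steps above can be sketched in Ruby with the standard library's Digest::MD5. This is a minimal illustration, not a drop-in implementation: hashed_path and store_image are hypothetical helper names, and the database write from step 2 is left to the caller (store_image just returns the path you would record).

```ruby
require "digest"
require "fileutils"

# Hypothetical helper: maps image bytes to root/<1st hex char>/<2nd hex char>/<full hash>,
# i.e. the 'a/b/abcdef...' layout from step 3.
def hashed_path(data, root)
  hash = Digest::MD5.hexdigest(data)
  File.join(root, hash[0], hash[1], hash)
end

# Hypothetical helper: writes the bytes only if that hash is not stored yet, so
# identical uploads collapse into a single file. Returns the path to record in the DB.
def store_image(data, root)
  path = hashed_path(data, root)
  unless File.exist?(path)
    FileUtils.mkdir_p(File.dirname(path))
    File.binwrite(path, data)
  end
  path
end
```

Calling store_image twice with the same bytes returns the same path and leaves one file on disk, which is the deduplication described above.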

What is the best directory structure for handling large amounts of images uploaded?

Drilling down into subdirectories based on a unique hash like that is a good solution, but your example nests far more subdirectories than you need. A two-character hex prefix gives 256 possible subdirectories per level, so if you're going to have 5,000 users, you'll get only about 20 files per subdirectory going just a single level deep, which is perfectly reasonable. Two levels deep (65,536 subdirectories) will easily handle millions of users.
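The arithmetic behind those numbers, as a small Ruby sketch. It assumes the hash prefixes spread files evenly across subdirectories (which a good hash does on average); files_per_dir and FANOUT are illustrative names.

```ruby
# Average number of files per leaf directory when every level is named by two
# hex characters of the hash (16 * 16 = 256 possible subdirectories per level).
# Assumes the hash spreads files evenly across the prefixes.
FANOUT = 16 * 16

def files_per_dir(total_files, levels)
  total_files.fdiv(FANOUT**levels)
end

puts files_per_dir(5_000, 1).round(1)      # prints 19.5
puts files_per_dir(1_000_000, 2).round(1)  # prints 15.3
puts files_per_dir(10_000_000, 2).round(1) # prints 152.6
```

Even ten million files at two levels averages only around 150 per directory.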

Also, I wouldn't truncate the filename to whatever characters of the hash are left over. Use the full hash as the filename, regardless of how many levels deep you go. Files will be much easier to manage if you ever need to (for example) move them to a new store. In other words, don't do this:

49/f68a5c8493ec2c0bf489821c21fc3b.jpg

Do this:

49/49f68a5c8493ec2c0bf489821c21fc3b.jpg
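To illustrate why the full hash helps, here is a minimal Ruby sketch; store_path and path_in_new_store are hypothetical names. Because the basename is the complete hash, a file's location in a new store can be rebuilt from its filename alone. With the truncated form, you would also have to read the parent directory name to recover the hash.

```ruby
# Hypothetical helper: builds "<root>/<first two hash chars>/<full hash>.<ext>",
# e.g. 49/49f68a5c8493ec2c0bf489821c21fc3b.jpg.
def store_path(hash, ext, root: ".")
  File.join(root, hash[0, 2], "#{hash}.#{ext}")
end

# Hypothetical helper for moving to a new store: the hash comes straight from the
# basename, so neither the old parent directory nor a database lookup is needed.
def path_in_new_store(old_path, new_root)
  ext = File.extname(old_path)         # e.g. ".jpg"
  hash = File.basename(old_path, ext)  # the full 32-character hash
  store_path(hash, ext.delete_prefix("."), root: new_root)
end
```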

Just 3 images per user, do I need to go to the trouble of creating a clever directory structure?

I don't see why hashing the files would be of use unless you were expecting a good deal of them to be identical; then you could save lots of storage.

If users will ever be able to change their usernames (and you can't tell the future!), then I wouldn't use usernames in the structure. Use a database primary key or something else that will never change.

If you did want a bit more structure, you could still split the IDs across directory levels. For example, a user with ID 1234 would have images stored at:

/images/domain.com/12/34/imagename.png

That would at least mean you never have more than a hundred directories in any one listing.
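A sketch of that mapping in Ruby: id_dir is a hypothetical helper that zero-pads the ID to a fixed width (four digits here, an assumption that matches the 1234 example) and turns each pair of digits into one directory level. Padding to a fixed width keeps every user at the same depth; without it, user 1234's directory 12/34 would also end up holding user 123456's subdirectory 56.

```ruby
# Hypothetical helper: zero-pads the id to a fixed, even width and turns each
# pair of digits into one directory level, so 1234 maps to "12/34".
# Each level then has at most 100 entries (00..99).
def id_dir(id, root, width: 4)
  digits = id.to_s.rjust(width, "0")
  raise ArgumentError, "id #{id} needs more than #{width} digits" if digits.length > width
  File.join(root, *digits.scan(/../))
end
```

File.join(id_dir(1234, "/images/domain.com"), "imagename.png") then yields the path from the example; pick a width large enough for the biggest ID you expect.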

How to store images in your filesystem

Just split your user ID from the end, e.g.:

UserID = 6435624 
Path = /images/24/56/6435624
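A minimal Ruby sketch of this "split from the end" mapping, which reproduces the example path. reversed_id_path is a hypothetical name, and zero-padding IDs shorter than four digits is an assumption (the answer doesn't cover them). One plausible reason to take digits from the end: with sequential IDs the last digits change fastest, so new users spread evenly across the 100 × 100 buckets instead of filling one bucket at a time.

```ruby
# Hypothetical helper: last two digits -> first level, the two digits before
# them -> second level, full id -> the user's own directory.
def reversed_id_path(id, root = "/images")
  digits = id.to_s.rjust(4, "0") # assumption: pad short ids so both levels exist
  File.join(root, digits[-2, 2], digits[-4, 2], id.to_s)
end
```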

As for backups, you could use MySQL replication and back up the slave database to avoid problems (e.g. locks) while the backup runs.

Choosing a directory to save item images in an e-commerce shop

Having a folder structure such as pictures/xxxx/picture_name.jpg will not make it faster to display an image. It will, however, be much easier to manually find images as the system grows -- so your data is easier to maintain.

Defining the folder structure is fairly straightforward. For example, if you're using CarrierWave in your Rails application, then defining a custom folder structure is simply a matter of overriding one method.
For instance, here's a generic folder structure that you may wish to use:

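  class ImageUploader < CarrierWave::Uploader::Base # hypothetical uploader name
    # Files land in e.g. uploads/user/avatar/42 (model class, mount point, record id)
    def store_dir
      "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
    end
  end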

However, not all files in your system will be uploaded through the interface. Read the Rails guide on the asset pipeline and make sure you understand how the app/assets, vendor/assets and public/ folders differ in usage. Using the framework properly here can greatly improve application performance.

Does a large number of directories negatively impact performance?

1) The answer depends on

  • OS (e.g. Linux vs Windows) and

  • filesystem (e.g. ext3 vs NTFS).

2) Keep in mind that every subdirectory you create uses up another inode.

3) Linux usually handles "many files per directory" better than Windows.

4) A couple of additional links (assuming you're on Linux):

  • 200,000 images in single folder in linux, performance issue or not?

  • https://serverfault.com/questions/43133/filesystem-large-number-of-files-in-a-single-directory

  • https://serverfault.com/questions/147731/do-large-folder-sizes-slow-down-io-performance

  • http://tldp.org/LDP/intro-linux/html/sect_03_01.html

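To put point 2 in numbers, here is a rough Ruby sketch of how many inodes the directories alone consume in a hashed layout, assuming two-hex-character levels (256 subdirectories each) and that every possible subdirectory eventually gets created. directory_inodes is a hypothetical name.

```ruby
# Directories created by `levels` levels of 2-hex-character subdirectories,
# one inode each, assuming every possible subdirectory ends up existing.
def directory_inodes(levels, fanout: 256)
  (1..levels).sum { |level| fanout**level }
end

directory_inodes(1) # => 256
directory_inodes(2) # => 65792
directory_inodes(3) # => 16843008
```

Each extra level multiplies the directory count by 256, which is one more reason not to nest deeper than your file count requires.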

How many files can I put in a directory?

FAT32:

  • Maximum number of files: 268,173,300
  • Maximum number of files per directory: 2¹⁶ - 1 (65,535)
  • Maximum file size: 2 GiB - 1 without LFS, 4 GiB - 1 with
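If a store might live on FAT32, its per-directory cap is the one that bites first. A hypothetical planning helper in Ruby, assuming the two-hex-character levels used earlier and an even spread of hashes; it only bounds the average, so an individual directory can still land above it and you should leave headroom.

```ruby
# Hypothetical planning helper: smallest number of two-hex-character levels
# (256 subdirectories each) that keeps the *average* number of files per leaf
# directory at or below a per-directory limit, here FAT32's 2^16 - 1.
FAT32_MAX_PER_DIR = 2**16 - 1

def levels_needed(total_files, limit = FAT32_MAX_PER_DIR, fanout: 256)
  levels = 0
  # Add a level until the even-spread average fits under the limit.
  levels += 1 while total_files.fdiv(fanout**levels) > limit
  levels
end

levels_needed(50_000)      # => 0
levels_needed(10_000_000)  # => 1
levels_needed(100_000_000) # => 2
```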

NTFS:

  • Maximum number of files: 2³² - 1 (4,294,967,295)
  • Maximum file size

    • Implementation: 2⁴⁴ - 2¹⁶ bytes (16 TiB - 64 KiB)
    • Theoretical: 2⁶⁴ - 2¹⁶ bytes (16 EiB - 64 KiB)
  • Maximum volume size

    • Implementation: 2³² - 1 clusters (256 TiB - 64 KiB)
    • Theoretical: 2⁶⁴ - 1 clusters (1 YiB - 64 KiB)

ext2:

  • Maximum number of files: 10¹⁸
  • Maximum number of files per directory: ~1.3 × 10²⁰ (performance issues past 10,000)
  • Maximum file size

    • 16 GiB (block size of 1 KiB)
    • 256 GiB (block size of 2 KiB)
    • 2 TiB (block size of 4 KiB)
    • 2 TiB (block size of 8 KiB)
  • Maximum volume size

    • 4 TiB (block size of 1 KiB)
    • 8 TiB (block size of 2 KiB)
    • 16 TiB (block size of 4 KiB)
    • 32 TiB (block size of 8 KiB)

ext3:

  • Maximum number of files: min(volumeSize / 2¹³, numberOfBlocks)
  • Maximum file size: same as ext2
  • Maximum volume size: same as ext2
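A worked example of the ext3 formula, as a small Ruby sketch with volume and block sizes in bytes. ext3_max_files is an illustrative name; the sizes are example values, not recommendations.

```ruby
# Evaluates min(volumeSize / 2^13, numberOfBlocks), sizes in bytes.
def ext3_max_files(volume_bytes, block_bytes)
  number_of_blocks = volume_bytes / block_bytes
  [volume_bytes / 2**13, number_of_blocks].min
end

ext3_max_files(2**40, 4096)       # 1 TiB, 4 KiB blocks   => 134217728 (2^27)
ext3_max_files(100 * 2**30, 4096) # 100 GiB, 4 KiB blocks => 13107200
```

With block sizes of 8 KiB or less, volumeSize / 2¹³ is never larger than the block count, so it is the term that decides the result.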

ext4:

  • Maximum number of files: 2³² - 1 (4,294,967,295)
  • Maximum number of files per directory: unlimited
  • Maximum file size: 2⁴⁴ - 1 bytes (16 TiB - 1)
  • Maximum volume size: 2⁴⁸ - 1 bytes (256 TiB - 1)

