Thousands of images, how should I organize the directory structure? (linux)
I would do the following:
- Take an MD5 hash of each image as it comes in.
- Write that MD5 hash in the database where you are keeping track of these things.
- Store them in a directory structure where you use the first couple of characters of the MD5 hex string as the directory names. So if the hash is 'abcdef1234567890' you would store it as 'a/b/abcdef1234567890'.
Using a hash also lets you merge the same image uploaded multiple times.
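The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the function names `hashed_path` and `store_image` and the `root` argument are hypothetical, and the database write from the second step is left out.

```python
import hashlib
from pathlib import Path


def hashed_path(data: bytes, root: Path) -> Path:
    """Build root/<1st hex char>/<2nd hex char>/<full md5 hex digest>."""
    digest = hashlib.md5(data).hexdigest()
    return root / digest[0] / digest[1] / digest


def store_image(data: bytes, root: Path) -> Path:
    """Write the image once; identical uploads resolve to the same file."""
    target = hashed_path(data, root)
    if not target.exists():
        # Create the two shard directories on demand.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return target
```

Because the path is derived only from the file contents, uploading the same bytes twice yields the same path, which is what makes the merging of duplicates possible.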
What is the best directory structure for handling large amounts of images uploaded?
Drilling down subdirs on a unique hash like that is a good solution, but the number of subdirs in your example is way too many. A two-character hex prefix gives 256 possible subdirs per level, so if you're going to have 5000 users (at one file apiece), you'll get only about 20 files per subdir when going just a single level deep, which is perfectly reasonable. Two levels deep (65,536 subdirs) will easily handle millions of users.
Also, I wouldn't truncate the filename to just the characters left over after the prefix. Use the full hash for the filename, regardless of how many levels deep you go. Files will be much easier to manage if you need to (for example) move them to a new store. I.e., don't do this:
49/f68a5c8493ec2c0bf489821c21fc3b.jpg
Do this:
49/49f68a5c8493ec2c0bf489821c21fc3b.jpg
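As a rough sketch of that layout, the helper below builds a path from two-character prefixes while keeping the full hash as the filename, and a second helper reproduces the "about 20 files per subdir" arithmetic. The names `sharded_path` and `files_per_dir` and their parameters are made up for illustration, and the estimate assumes the hashes spread evenly.

```python
from pathlib import PurePosixPath


def sharded_path(digest: str, ext: str, levels: int = 1, width: int = 2) -> PurePosixPath:
    """Use `levels` prefixes of `width` hex chars as directories, full hash as filename."""
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*parts, f"{digest}{ext}")


def files_per_dir(total_files: int, levels: int = 1, width: int = 2) -> float:
    """Average files per leaf directory, assuming an even hash distribution."""
    leaf_dirs = (16 ** width) ** levels  # 256 per level for two hex characters
    return total_files / leaf_dirs


print(sharded_path("49f68a5c8493ec2c0bf489821c21fc3b", ".jpg"))
# 49/49f68a5c8493ec2c0bf489821c21fc3b.jpg
print(round(files_per_dir(5000), 1))
# 19.5
```

With `levels=2` the same hash maps to `49/f6/49f68a5c8493ec2c0bf489821c21fc3b.jpg`, and the filename alone is still enough to locate or move the file.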
Just 3 images per user, do I need to go to the trouble of creating a clever directory structure?
I don't see why hashing the files would be of use unless you were expecting a good deal of them to be identical; then you could save lots of storage.
If users will ever be able to edit their usernames (and you can't tell the future!) then I wouldn't use that in the structure. Use a database primary key or something which will never change.
If you did want a bit more structure you could still split the ids across levels of structure, so for example: a user with an id 1234 would have images stored at:
/images/domain.com/12/34/imagename.png
Which would at least mean you didn't have more than a hundred directories when you look at the list...
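A minimal sketch of that id-based split, assuming ids are zero-padded to a fixed, even number of digits (the padding rule, the `digits` parameter, and the function name `id_path` are assumptions, not something the answer specifies):

```python
from pathlib import PurePosixPath


def id_path(user_id: int, filename: str,
            root: str = "/images/domain.com", digits: int = 4) -> PurePosixPath:
    """Split a zero-padded id into two-digit directory levels."""
    if digits % 2:
        raise ValueError("digits must be even so the id splits into pairs")
    padded = f"{user_id:0{digits}d}"
    if len(padded) > digits:
        raise ValueError("id does not fit in the configured number of digits")
    pairs = [padded[i:i + 2] for i in range(0, digits, 2)]
    return PurePosixPath(root, *pairs, filename)


print(id_path(1234, "imagename.png"))
# /images/domain.com/12/34/imagename.png
```

Each level then holds at most 100 entries ("00" to "99"), which matches the point about keeping directory listings short.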
How to store images in your filesystem
Just split your userid from behind. e.g.
UserID = 6435624
Path = /images/24/56/6435624
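The split from behind can be sketched like this. The function name `reversed_id_path` and the zero-padding of short ids are assumptions added for illustration.

```python
def reversed_id_path(user_id: int, root: str = "/images") -> str:
    """Use the last two digits, then the two before them, as directory levels."""
    s = str(user_id).zfill(4)  # pad short ids so both levels always exist
    return f"{root}/{s[-2:]}/{s[-4:-2]}/{user_id}"


print(reversed_id_path(6435624))
# /images/24/56/6435624
```

One likely reason for taking the digits from the end: with sequentially assigned ids the trailing digits change fastest, so new users spread evenly across the directories instead of piling into one.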
As for the backup, you could use MySQL replication and back up the replica (slave) database to avoid problems (e.g. locks) while the backup runs.
Choosing directory to save item images in e-commerce shop
Having a folder structure such as pictures/xxxx/picture_name.jpg will not make it faster to display an image. It will, however, make images much easier to find manually as the system grows, so your data is easier to maintain.
Defining the folder structure is fairly straightforward. For example, if you're using CarrierWave in your Rails application, then defining a custom folder structure is simply a matter of overriding one method.
For instance, here's a generic folder structure that you may wish to use:
    def store_dir
      "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
    end
However, not all files in your system will be uploaded through the interface. Have a read through the Rails tutorials on the asset pipeline; understand the difference in usage between the app/assets, vendor/assets and public/ folders. A proper understanding and use of the framework here will greatly improve the application performance.
Does a large number of directories negatively impact performance?
1) The answer depends on the OS (e.g. Linux vs Windows) and the filesystem (e.g. ext3 vs NTFS).
2) Keep in mind that when you arbitrarily create a new subdirectory, you're using more inodes.
3) Linux usually handles "many files per directory" better than Windows.
4) A couple of additional links (assuming you're on Linux):
200,000 images in single folder in linux, performance issue or not?
https://serverfault.com/questions/43133/filesystem-large-number-of-files-in-a-single-directory
https://serverfault.com/questions/147731/do-large-folder-sizes-slow-down-io-performance
http://tldp.org/LDP/intro-linux/html/sect_03_01.html
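Since every extra subdirectory costs an inode, it can be worth checking how many inodes a filesystem has left before settling on a deep layout. The sketch below uses Python's `os.statvfs`, which is available on Unix-like systems; the function name `inode_usage` is hypothetical, and some filesystems report zero total inodes because they allocate them dynamically.

```python
import os


def inode_usage(path: str = "/") -> tuple[int, int]:
    """Return (used_inodes, total_inodes) for the filesystem containing `path`."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    return total - free, total


used, total = inode_usage("/")
print(f"{used} of {total} inodes in use")
```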
How many files can I put in a directory?
FAT32:
- Maximum number of files: 268,173,300
- Maximum number of files per directory: 2^16 - 1 (65,535)
- Maximum file size: 2 GiB - 1 without LFS, 4 GiB - 1 with
NTFS:
- Maximum number of files: 2^32 - 1 (4,294,967,295)
- Maximum file size
  - Implementation: 2^44 - 2^16 bytes (16 TiB - 64 KiB)
  - Theoretical: 2^64 - 2^16 bytes (16 EiB - 64 KiB)
- Maximum volume size
  - Implementation: 2^32 - 1 clusters (256 TiB - 64 KiB)
  - Theoretical: 2^64 - 1 clusters (1 YiB - 64 KiB)
ext2:
- Maximum number of files: 10^18
- Maximum number of files per directory: ~1.3 × 10^20 (performance issues past 10,000)
- Maximum file size
  - 16 GiB (block size of 1 KiB)
  - 256 GiB (block size of 2 KiB)
  - 2 TiB (block size of 4 KiB)
  - 2 TiB (block size of 8 KiB)
- Maximum volume size
  - 4 TiB (block size of 1 KiB)
  - 8 TiB (block size of 2 KiB)
  - 16 TiB (block size of 4 KiB)
  - 32 TiB (block size of 8 KiB)
ext3:
- Maximum number of files: min(volumeSize / 2^13, numberOfBlocks)
- Maximum file size: same as ext2
- Maximum volume size: same as ext2
ext4:
- Maximum number of files: 2^32 - 1 (4,294,967,295)
- Maximum number of files per directory: unlimited
- Maximum file size: 2^44 - 1 bytes (16 TiB - 1)
- Maximum volume size: 2^48 - 1 bytes (256 TiB - 1)