How to Store One Billion Files on Ext4

How to store one billion files on ext4?

When creating an ext4 file system, you can specify the usage type:

mkfs.ext4 -T usage-type /dev/something

The available usage types are listed in /etc/mke2fs.conf. The
main difference between usage types is the inode ratio, that is,
the number of bytes of disk space per inode. The lower the inode
ratio, the more files you can create in your file system.

The usage type in mke2fs.conf which allocates the highest number of
inodes in the file system is "news". With this usage type on a 1
TB hard drive, ext4 creates 244 million inodes.

# tune2fs -l /dev/sdb1 | grep -i "inode count"
Inode count: 244219904
# sgdisk --print /dev/sdb
Disk /dev/sdb: 1953525168 sectors, 931.5 GiB

This means that it would require more than 4 TB to create an ext4
file system with "-Tnews" that could possibly hold 1 billion
inodes.
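
The arithmetic behind that estimate can be sketched roughly as follows. It assumes that the "news" usage type sets inode_ratio = 4096 (one inode per 4096 bytes), as in a stock mke2fs.conf, and that the sectors are 512 bytes; check both on your own system. The function names are made up for illustration.

```python
# Rough inode arithmetic for "mkfs.ext4 -T news".
# Assumptions (not stated in the original answer): inode_ratio = 4096
# for "news", and 512-byte sectors on /dev/sdb.
SECTOR_SIZE = 512
NEWS_INODE_RATIO = 4096


def estimated_inodes(size_bytes, inode_ratio):
    """Approximate inode count: one inode per inode_ratio bytes."""
    return size_bytes // inode_ratio


def bytes_needed(inode_count, inode_ratio):
    """Approximate file system size needed for inode_count inodes."""
    return inode_count * inode_ratio


disk_bytes = 1953525168 * SECTOR_SIZE  # sector count from sgdisk above
print("estimated inodes:", estimated_inodes(disk_bytes, NEWS_INODE_RATIO))
print("TB for 1e9 inodes:", bytes_needed(10**9, NEWS_INODE_RATIO) / 10**12)
```

The estimate lands close to the count reported by tune2fs; the small difference is presumably due to how mke2fs rounds the inode count per block group.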

What is the best way to store 1 Billion small text files?

Your first option is much faster.

Think of a directory in a file system as a text file containing an unsorted list of all files in that directory, each with an address telling where to find the file on the disk. To read a file you need to know its address on the disk. If you have a path like '/myfilename', you first need to find the file /, which is a directory and lists all files in that directory. Then you need to scan this file for the entry 'myfilename', which in the worst case requires you to traverse the entire file. In the average case that takes about N/2 comparisons, where N is apparently 1 billion (the total number of files in this directory).

If you have multiple directories, say always 1000 files in a directory so that you have 3 levels of directories and your file path is now /A/B/myfilename, then you first need to open the / directory and find A (about 1000/2 comparisons), open that file and find B (1000/2 again), and open that file to find myfilename (yet again 1000/2). Adding those up gives 3 * 1000/2 = 1500 comparisons, which is MUCH faster than the 500,000,000 we had previously.
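
One way to spread files over such a directory tree is to derive the subdirectory names from a hash of the file name. The sketch below is only an illustration under that assumption: the names sharded_path and expected_scan_cost, the /data root, and the use of SHA-256 are all hypothetical choices, not something the original answer prescribes. The cost function simply encodes the "half the entries per directory" model used above.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root, filename, levels=2, fanout=1000):
    """Map filename to root/<bucket>/<bucket>/filename (hypothetical layout).

    Buckets come from a SHA-256 hash of the name, so the same name always
    maps to the same directory and each level has at most `fanout` entries.
    """
    digest = int.from_bytes(hashlib.sha256(filename.encode("utf-8")).digest(), "big")
    width = len(str(fanout - 1))  # zero-pad bucket names to equal length
    parts = []
    for _ in range(levels):
        digest, bucket = divmod(digest, fanout)
        parts.append(str(bucket).zfill(width))
    return PurePosixPath(root, *parts, filename)


def expected_scan_cost(entries_per_dir, depth):
    """Average comparisons if each directory is an unsorted list."""
    return depth * entries_per_dir // 2


print(sharded_path("/data", "myfilename"))
print(expected_scan_cost(1000, 3))      # three levels of 1000 entries
print(expected_scan_cost(10**9, 1))     # one flat directory
```

Because the path is computed from the name alone, no separate lookup table is needed to find a file again.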

This is a very important aspect of file systems to keep in mind. If a directory is in danger of growing beyond 10,000 files, I'd strongly recommend thinking about a strategy for sorting those files into subdirectories.

Whether a relational database would be the better choice depends on other questions: Do you need backups (to be created concurrently)? Do you need transactions beyond what simple journaling file systems offer? Do you need concurrency control? Do you need to search through your files? How often do you need to access the files? How often do you change your files?

For further reading on file systems I recommend the book Modern Operating Systems by Tanenbaum (chapter 6, "File Systems"), which is available online here: http://lovingod.host.sk/index.html?page=tanenbaum%2FOperating-Systems-Design.html

What is the maximum number of files per directory in ext4?

It depends on the mkfs parameters used when the file system was created. Different Linux distributions have different defaults, so it's really impossible to answer your question definitively.

How many files should I allow in one folder on a Linux server for uploading by users?

This isn't a problem at all. Store them all in one directory.

The time when it's a bad idea is if it's a directory you're going to be managing by hand. Then, very large numbers start to cause you a headache. It would be like having one massive drawer in your filing cabinet rather than lots of separate ones.

But if you've made sure that the files all have unique names, and if you have a programmatic method of managing everything, then having them all in one directory is a good plan. In fact, it's easier, because if you split them up then you need to store something to help you find the directory containing the file you want.

How to increase inode limits on Ubuntu?

IMHO, for many file systems you cannot change the inode limit after the file system has been created.
You can set the number of inodes with the -N switch of mkfs. Before recreating the file system, you can check the default inode calculation with the -n switch.
For more information, see this answer: How to store one billion files on ext4?
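
Before recreating a file system, it can help to check how many inodes the existing one has and how many are still free. A minimal sketch using Python's os.statvfs (Unix only) follows; the helper name inode_usage and the "/" path are assumptions for illustration.

```python
import os


def inode_usage(path="/"):
    """Return total, free and used inode counts for the file system at path."""
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes
    free = st.f_ffree    # number of free inodes
    return {"total": total, "free": free, "used": total - free}


usage = inode_usage("/")  # "/" is only an example mount point
print(usage)
```

Some file systems do not use a fixed inode table and may report 0 for the total, so treat the numbers as a hint rather than a hard limit.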


