Fast Linux File Count for a Large Number of Files

The fastest way to count the number of files in a directory (including subdirectories)

I ran a quick test. Using a directory with 100,000 files, I compared the following commands:

ls -R <dir>
ls -lR <dir>
find <dir> -type f

I ran them twice, once redirecting into a file (>file), and once piping into wc (|wc -l). Here are the run times in seconds:

            >file    |wc -l
    ls -R     14       14
    find      89       56
    ls -lR    91       82

The difference between >file and |wc -l is smaller than the difference between ls and find.

It appears that ls -R is at least 4x faster than find.

What is the fastest / easiest way to count a large number of files in a directory (in Linux)?

ls does a stat(2) call for every file. Other tools, like find(1) and the shell wildcard expansion, may avoid this call and just do readdir. One shell command combination that might work is find dir -maxdepth 1 | wc -l, but it will gladly count the directory itself and miscount any filename containing a newline.

From Python, the straightforward way to get just these names is os.listdir(directory). Unlike os.walk and os.path.walk, it does not need to recurse, check file types, or make further Python function calls.
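A minimal sketch of that approach (the directory here is a throwaway temp dir created just for illustration):

```python
import os
import tempfile

# Create a throwaway directory with a few files to count.
tmp = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.txt"):
    open(os.path.join(tmp, name), "w").close()

# os.listdir returns just the entry names in one pass over the
# directory, with no stat per file; len() of that list is the count.
# Note it includes subdirectories, and excludes '.' and '..'.
count = len(os.listdir(tmp))
print(count)  # 3
```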

Addendum: It seems ls doesn't always stat. At least on my GNU system, it can do only a getdents call when further information (such as which names are directories) is not requested. getdents is the underlying system call used to implement readdir in GNU/Linux.

Addition 2: One reason for a delay before ls outputs results is that it sorts and tabulates. ls -U1 may avoid this.

Recursively counting files in a Linux directory

This should work:

find DIR_NAME -type f | wc -l

Explanation:

  • -type f to include only files.
  • | (and not ¦) pipes the find command's standard output into the wc command's standard input.
  • wc (short for word count) counts newlines, words and bytes on its input.
  • -l to count just newlines.

Notes:

  • Replace DIR_NAME with . to execute the command in the current folder.
  • You can also remove the -type f to include directories (and symlinks) in the count.
  • It's possible this command will overcount if filenames can contain newline characters.
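If newlines in filenames are a concern, one portable alternative is to do the recursive count in a scripting language instead of piping names through wc. A Python sketch (the sample tree is built inline for illustration):

```python
import os
import tempfile

# Build a small tree: two files at the top, one in a subdirectory.
root = tempfile.mkdtemp()
open(os.path.join(root, "x"), "w").close()
open(os.path.join(root, "y"), "w").close()
sub = os.path.join(root, "sub")
os.mkdir(sub)
open(os.path.join(sub, "z"), "w").close()

# os.walk yields (dirpath, dirnames, filenames) for each directory;
# summing len(filenames) counts files recursively, and is immune to
# newlines in names because no text is ever re-parsed.
total = sum(len(files) for _, _, files in os.walk(root))
print(total)  # 3
```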

Explanation of why your example does not work:

In the command you showed, you did not use the pipe (|) to connect the two commands, but the broken bar (¦), which the shell does not recognize as a command or operator. That is why you get that error message.

Fast way to find the number of files in one directory on Linux

Why should the data structure contain the number? A tree doesn't need to know its size in O(1), unless that's a requirement (and providing it could require more locking, and possibly a performance bottleneck).

By tree I don't mean including subdirectory contents, only the files matched by -maxdepth 1 -- supposing they are not really stored as a flat list.

Edit: ext2 stored directory entries as a linked list.

Modern ext3 implements hashed B-trees.

Having said that, /bin/ls does a lot more than counting, and actually scans all the inodes. Write your own C program or script using opendir() and readdir().

from here:

#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main(void)
{
    int count = 0;          /* initialized, in case opendir() fails */
    DIR *d;                 /* DIR is a typedef, not a struct tag */

    /* Count every entry in the current directory, including "." and "..". */
    if ((d = opendir(".")) != NULL) {
        for (count = 0; readdir(d) != NULL; count++)
            ;
        closedir(d);
    }
    printf("%d\n", count);
    return 0;
}

Count lines in large files

Try: sed -n '$=' filename

Also, cat is unnecessary: wc -l filename is enough on its own.
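The same line count can be done in Python by streaming the file in fixed-size binary chunks and counting newline bytes, which avoids loading the whole file into memory. A sketch (the sample file and chunk size are illustrative):

```python
import tempfile

# Write a small sample file with three lines.
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
    f.write("one\ntwo\nthree\n")
    path = f.name

def count_lines(filename, chunk_size=1 << 20):
    """Count newline bytes by reading fixed-size binary chunks."""
    lines = 0
    with open(filename, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            lines += chunk.count(b"\n")
    return lines

print(count_lines(path))  # 3
```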

quickest way to count the number of files in a directory containing hundreds of thousands of files

What you do right now reads the whole directory (more or less) into memory only to discard that content for its count. Avoid that by streaming the directory instead:

my $count = 0;
opendir(my $dh, $curDir) or die "opendir($curDir): $!";
while (my $de = readdir($dh)) {
    next if $de =~ /^\./ or $de =~ /config_file/;
    $count++;
}
closedir($dh);

Importantly, don't use glob() in any of its forms. glob() will expensively stat() every entry, which is not overhead you want.

Now, you might have much more sophisticated and lighter weight ways of doing this depending on OS capabilities or filesystem capabilities (Linux, by way of comparison, offers inotify), but streaming the dir as above is about as good as you'll portably get.
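The same streaming idea in Python would use os.scandir, which yields entries lazily rather than building the whole list first. A sketch mirroring the Perl loop's skip rules (the config_file filter comes from the question; the temp dir is illustrative):

```python
import os
import tempfile

# Set up a directory with two data files plus entries to skip.
tmp = tempfile.mkdtemp()
for name in ("data1", "data2", ".hidden", "config_file"):
    open(os.path.join(tmp, name), "w").close()

# Stream the directory one entry at a time; skip dotfiles and the
# config file, mirroring the readdir loop above.
count = 0
with os.scandir(tmp) as it:
    for entry in it:
        if entry.name.startswith(".") or "config_file" in entry.name:
            continue
        count += 1
print(count)  # 2
```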

How to count number of files in each directory?

Assuming you have GNU find, let it find the directories and let bash do the rest:

find . -type d -print0 | while read -d '' -r dir; do
    files=("$dir"/*)
    printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
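A Python equivalent produces one line of output per directory via os.walk. Note it differs slightly from the glob above: it counts files only, and includes dotfiles. The sample tree is built inline for illustration:

```python
import os
import tempfile

# Sample tree: root has 2 files plus a subdirectory with 1 file.
root = tempfile.mkdtemp()
open(os.path.join(root, "a"), "w").close()
open(os.path.join(root, "b"), "w").close()
os.mkdir(os.path.join(root, "sub"))
open(os.path.join(root, "sub", "c"), "w").close()

# One count per directory, formatted like the bash loop above.
counts = {}
for dirpath, dirnames, filenames in os.walk(root):
    counts[dirpath] = len(filenames)
    print("%5d files in directory %s" % (len(filenames), dirpath))
```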

How to fast count large set of files

Use the code below to count all files and directories directly inside the "/opt" folder:

QDir dir("/opt/");
int count = dir.count();

Use the code below to list and count *.jpg files in "/opt" and all its subdirectories.

QDirIterator it("/opt/", QStringList() << "*.jpg", QDir::Files, QDirIterator::Subdirectories);
int count = 0;
while (it.hasNext()) {
    qDebug() << it.next();
    count++;
}
qDebug() << "count:" << count;

