The fastest way to count the number of files in a directory (including subdirectories)
I did some quick research. Using a directory with 100,000 files I compared the following commands:
ls -R <dir>
ls -lR <dir>
find <dir> -type f
I ran them twice, once redirecting into a file (>file), and once piping into wc (|wc -l). Here are the run times in seconds:
         >file   |wc -l
ls -R      14      14
find       89      56
ls -lR     91      82
The difference between >file and |wc -l is smaller than the difference between ls and find. It appears that ls -R is at least 4x faster than find.
What is the fastest / easiest way to count large number of files in a directory (in Linux)?
ls does a stat(2) call for every file. Other tools, like find(1) and the shell wildcard expansion, may avoid this call and just do readdir. One shell command combination that might work is find dir -maxdepth 1 | wc -l, but it will gladly count the directory itself and miscount any filename with a newline in it.
From Python, the straightforward way to get just these names is os.listdir(directory). Unlike os.walk and os.path.walk, it does not need to recurse, check file types, or make further Python function calls.
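As a minimal sketch of the os.listdir approach just described (the directory argument is a placeholder for whatever path you want to count):

```python
import os

def count_entries(directory):
    # os.listdir returns every name in the directory (it never includes
    # "." or ".."), without stat()ing each entry, so there is no
    # per-file overhead beyond reading the directory itself.
    return len(os.listdir(directory))

print(count_entries("."))
```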
Addendum: It seems ls doesn't always stat. At least on my GNU system, it can do only a getdents call when further information (such as which names are directories) is not requested. getdents is the underlying system call used to implement readdir in GNU/Linux.
Addition 2: One reason for a delay before ls outputs results is that it sorts and tabulates. ls -U1 may avoid this.
Recursively counting files in a Linux directory
This should work:
find DIR_NAME -type f | wc -l
Explanation:
- -type f to include only files.
- | (and not ¦) redirects the find command's standard output to the wc command's standard input.
- wc (short for word count) counts newlines, words and bytes on its input (docs).
- -l to count just newlines.
Notes:
- Replace DIR_NAME with . to execute the command in the current folder.
- You can also remove the -type f to include directories (and symlinks) in the count.
- It's possible this command will overcount if filenames can contain newline characters.
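If newlines in filenames are a concern, the recursive count can be done without parsing a text stream at all. A minimal Python sketch (the root path is a placeholder for DIR_NAME):

```python
import os

def count_files(root):
    # os.walk yields (dirpath, dirnames, filenames) for each directory;
    # summing the filename lists counts files recursively. No newline-
    # separated output is parsed, so odd filenames cannot skew the count.
    return sum(len(files) for _, _, files in os.walk(root))
```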
Explanation of why your example does not work:
In the command you showed, you do not use the pipe (|) to connect the two commands, but the broken bar (¦), which the shell does not recognize as a pipe or anything similar. That's why you get that error message.
Fast way to find the number of files in one directory on Linux
Why should the data structure contain the number? A tree doesn't need to know its size in O(1) unless that's a requirement (and providing it could require more locking, and possibly a performance bottleneck).
By tree I don't mean including subdirectory contents, but the files at -maxdepth 1, supposing they are not really stored as a list.
Edit: ext2 stored them as a linked list; modern ext3 implements hashed B-trees.
Having said that, /bin/ls does a lot more than counting, and actually scans all the inodes. Write your own C program or script using opendir() and readdir().
from here:
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main(void)
{
    int count = 0;       /* initialized so it is defined even if opendir fails */
    DIR *d;              /* note: DIR, not struct DIR */

    if ((d = opendir(".")) != NULL)
    {
        /* counts every entry, including "." and ".." */
        for (count = 0; readdir(d) != NULL; count++)
            ;
        closedir(d);
    }
    printf("%d\n", count);
    return 0;
}
Count lines in large files
Try: sed -n '$=' filename
Also, cat is unnecessary: wc -l filename alone is enough.
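For comparison, the same count can be streamed from Python without reading the whole file into memory (a sketch; for a file that ends with a newline this agrees with wc -l):

```python
def count_lines(path):
    # Iterate the file object line by line; only the current line is held
    # in memory, so this also works for very large files.
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```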
quickest way to count the number of files in a directory containing hundreds of thousands of files
What you do right now reads the whole directory (more or less) into memory only to discard that content for its count. Avoid that by streaming the directory instead:
my $count = 0;
opendir(my $dh, $curDir) or die "opendir($curDir): $!";
while (defined(my $de = readdir($dh))) {
    next if $de =~ /^\./ or $de =~ /config_file/;
    $count++;
}
closedir($dh);
Importantly, don't use glob() in any of its forms. glob() will expensively stat() every entry, which is not overhead you want.
Now, you might have much more sophisticated and lighter weight ways of doing this depending on OS capabilities or filesystem capabilities (Linux, by way of comparison, offers inotify), but streaming the dir as above is about as good as you'll portably get.
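The same streaming idea can be sketched in Python with os.scandir, which also yields entries one at a time instead of building a full list first (the dot-file and config_file filters mirror the Perl above; adjust them to your case):

```python
import os

def count_dir_entries(path):
    count = 0
    # os.scandir streams directory entries; "." and ".." are never yielded.
    with os.scandir(path) as it:
        for entry in it:
            # Skip dot-files and anything matching "config_file",
            # mirroring the Perl filter above.
            if entry.name.startswith(".") or "config_file" in entry.name:
                continue
            count += 1
    return count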
How to count number of files in each directory?
Assuming you have GNU find, let it find the directories and let bash do the rest:
shopt -s nullglob  # so an empty directory counts as 0, not a literal "*"
find . -type d -print0 | while IFS= read -r -d '' dir; do
    files=("$dir"/*)
    printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
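A Python equivalent of the per-directory count can be sketched as follows; like the "$dir"/* glob above, it counts all non-hidden entries (files and subdirectories), not only regular files:

```python
import os

def files_per_directory(root):
    # For each directory under root, count its non-hidden entries,
    # mirroring what the "$dir"/* glob matches in the shell version.
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        entries = [n for n in dirnames + filenames if not n.startswith(".")]
        result[dirpath] = len(entries)
    return result
```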
How to fast count large set of files
Use the code below to count all files and directories inside the "opt" folder:
QDir dir("/opt/");
dir.count();
Use the code below to list *.jpg files in /opt/ and all its subdirectories:
QDirIterator it("/opt/", QStringList() << "*.jpg", QDir::Files, QDirIterator::Subdirectories);
int count = 0;
while (it.hasNext()){
qDebug() << it.next();
count++;
}
qDebug() << "count:" << count;
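The same recursive *.jpg count can be sketched in Python with pathlib (the root path is a placeholder):

```python
from pathlib import Path

def count_jpg(root):
    # Path.rglob("*.jpg") walks root and every subdirectory, roughly like
    # QDirIterator with QDirIterator::Subdirectories and a "*.jpg" filter.
    return sum(1 for p in Path(root).rglob("*.jpg") if p.is_file())
```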