Why Doesn't "Total" from Ls -L Add Up to Total File Sizes Listed

Why doesn't total from ls -l add up to total file sizes listed?

You can find the definition of that line in the ls documentation for your platform. For coreutils ls (the one found on a lot of Linux systems), the information can be found via info coreutils ls:

For each directory that is listed, preface the files with a line
`total BLOCKS', where BLOCKS is the total disk allocation for all
files in that directory.
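In other words, BLOCKS counts disk allocation in block units (1024 bytes by default for GNU ls, or 512 with POSIXLY_CORRECT set), not the sum of the byte sizes shown in the listing, so the two figures rarely match. A quick way to see both numbers side by side, assuming GNU ls and awk:

# The "total" line, in 1024-byte units by default
ls -l | head -n 1

# Sum of the apparent byte sizes in column 5, skipping the total line
ls -l | awk 'NR > 1 { sum += $5 } END { print sum " bytes" }'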

Using ls to list directories and their total sizes

Try something like:

du -sh *

short version of:

du --summarize --human-readable *

Explanation:

du: Disk Usage

-s: Display a summary for each specified file. (Equivalent to -d 0)

-h: "Human-readable" output. Use unit suffixes: Byte, Kibibyte (KiB), Mebibyte (MiB), Gibibyte (GiB), Tebibyte (TiB) and Pebibyte (PiB). (BASE2)

Summing total file sizes of directory is different by a large margin: Ruby -e, du -ach, ls -al total

If du is showing you a lot of 4K and 8K entries for tiny files, that is because it reports disk allocation in whole blocks. For performance, storage on disk is made up of blocks. A typical block these days is 4K. Even a single byte will take a full block.

$ echo '1' > this

$ hexdump this
0000000 31 0a
0000002

$ ls -l this
-rw-r--r-- 1 schwern staff 2 Dec 5 15:16 this

$ du -h this
4.0K this

$ du --apparent-size -h this
2 this

$ ruby -e 'puts File.size(ARGV[0])' this
2

The file in question has 2 bytes of content. ls -l and File.size report the content size: two bytes.

du, by default, reports the disk allocation of the file, in whole blocks. This is because it is a Disk Usage tool and you want to know the true amount of disk taken up. Those 2 bytes occupy 4K of disk. 1000 two-byte files will take up 4000K, not 2000 bytes.

For this reason, many programs will avoid having many tiny files and instead save disk space by packing them together into a single image file. A simple example is Git packfiles.
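To see both figures for a single file at once, GNU stat can print the apparent size next to the allocated blocks (a sketch assuming GNU coreutils; %b is counted in units of %B bytes, typically 512):

$ stat -c '%s bytes apparent, %b blocks of %B bytes allocated' this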

total size of group of files selected with 'find'

The command du tells you about disk usage. Example usage for your specific case:

find rapidly_shrinking_drive/ -name "offender1" -mtime -1 -print0 | du --files0-from=- -hc | tail -n1

(Previously I wrote du -hs, but on my machine that appears to disregard find's input and instead summarises the size of the cwd.)
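If what you actually want is the sum of apparent file sizes rather than disk usage, one alternative sketch (assuming GNU find, whose -printf supports %s for the size in bytes):

find rapidly_shrinking_drive/ -name "offender1" -mtime -1 -printf '%s\n' | awk '{ sum += $1 } END { print sum " bytes" }'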

fastest way to sum the file sizes by owner in a directory

Get a listing, add up sizes, and sort it by owner (with Perl)

perl -wE'
    chdir (shift // ".");
    for (glob ".* *") {
        next if not -f;
        ($owner_id, $size) = (stat)[4,7]
            or do { warn "Trouble stat for: $_"; next };
        $rept{$owner_id} += $size;
    }
    say (getpwuid($_)//$_, " => $rept{$_} bytes") for sort keys %rept
'

I haven't benchmarked this, and it would be worth comparing it against an approach that iterates over the directory with readdir, as opposed to glob-ing it (though I found glob much faster in a related problem).

I expect good runtimes in comparison with ls, which slows down dramatically as the file list in a single directory gets long. That slowdown happens at the system level, so Perl is affected as well, but as far as I recall it copes far better. However, I've seen a dramatic slowdown only once entries reach half a million or so, not a few thousand, so I am not sure why it runs slowly on your system.

If this needs to recurse into the directories it finds, use File::Find. For example:

perl -MFile::Find -wE'
    $dir = shift // ".";
    find( sub {
        return if not -f;
        ($owner_id, $size) = (stat)[4,7]
            or do { warn "Trouble stat for: $_"; return };
        $rept{$owner_id} += $size;
    }, $dir );
    say (getpwuid($_)//$_, " => $rept{$_} bytes") for keys %rept
'

This scans a directory with 2.4 GB of mostly small files, spread over a hierarchy of subdirectories, in a little over 2 seconds. du -sh took around 5 seconds (the first time round).


It is reasonable to combine these two into one script:

use warnings;
use strict;
use feature 'say';
use File::Find;
use Getopt::Long;

my %rept;

# Accumulate the size of each regular file under its owner's uid
sub get_sizes {
    return if not -f;
    my ($owner_id, $size) = (stat)[4,7]
        or do { warn "Trouble stat for: $_"; return };
    $rept{$owner_id} += $size;
}

my ($dir, $recurse) = ('.', '');
GetOptions('recursive|r!' => \$recurse, 'directory|d=s' => \$dir)
    or die "Usage: $0 [--recursive] [--directory dirname]\n";

# Without --recursive, the preprocess hook drops subdirectories,
# so find() only visits the files in the top-level directory
($recurse)
    ? find( { wanted => \&get_sizes }, $dir )
    : find( { wanted => \&get_sizes,
              preprocess => sub { return grep { -f } @_ } }, $dir );

say (getpwuid($_)//$_, " => $rept{$_} bytes") for keys %rept;

I find this to perform about the same as the one-dir-only code above, when run non-recursively (default as it stands).
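Assuming the script is saved as, say, sizes_by_owner.pl (the name is chosen here just for illustration), usage looks like:

# Top-level directory only (the default)
perl sizes_by_owner.pl --directory /some/dir

# Whole tree, using the short option names
perl sizes_by_owner.pl -r -d /some/dir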

Note that the File::Find::Rule interface has many conveniences but is slower in some important use cases, which clearly matters here. (That analysis should be redone, since it's a few years old.)

bash script: calculate sum size of files

Ultimately, as other answers point out, it's not a good idea to parse the output of ls, because it may vary between systems. But it's still worth knowing why the script doesn't work.

The ambiguous redirect error occurs because you need quotes around your ls command, i.e.:

while IFS='' read -r line || [[ -n "$line" ]]; do
    echo $line
done < "`ls -l | grep opencv | awk '{print $5}'`"

But this still doesn't do what you want. The "<" operator expects a filename, which here is being defined as the output of the ls command. But you don't want to read a file; you want to read the output of ls. For that you can use the "<<<" operator, also known as a "here string", i.e.:

while IFS='' read -r line || [[ -n "$line" ]]; do
    echo $line
done <<< "`ls -l | grep opencv | awk '{print $5}'`"

This works as expected but has some drawbacks. With a here string, the command must first execute in full, and its output is stored temporarily before the loop reads it. This can be a problem if the command takes a long time to execute or produces a large output.

IMHO the best and most standard method of iterating over a command's output line by line is the following:

ls -l | grep opencv | awk '{print $5}' | while read -r line; do
    echo "line: $line"
done
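Note that in bash each stage of a pipeline runs in a subshell, so a total accumulated inside that while loop would be lost when the loop ends. If the goal is the sum itself, it's simpler to let awk do the arithmetic directly (with the same caveat about parsing ls):

ls -l | grep opencv | awk '{ sum += $5 } END { print sum " bytes" }'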

ls -ltr command in UNIX and Linux - Behaviour

It's because of differences between filesystems. The total line shows how many blocks are used by the files. Add -s and you will see the per-file block counts (ls -ltrs).
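With GNU ls you can also pin the unit so the numbers are unambiguous; a small sketch (assuming GNU coreutils, where --block-size affects both the -s column and the total line):

ls -ltrs --block-size=1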


