How Does the Stat Command Calculate the Blocks of a File

How does the stat command calculate the blocks of a file?

The stat command-line tool uses the stat() / fstat() system calls, which return data in a stat structure. The st_blocks member of the stat structure returns:

The total number of physical blocks of size 512 bytes actually allocated on disk. This field is not defined for block special or character special files.
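
For illustration, here is a minimal C sketch that reads those fields directly via stat(); the file name "Email" is just the example from the question:

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;

    if (stat("Email", &st) != 0) {     /* "Email" is the example file name */
        perror("stat");
        return 1;
    }
    printf("size:      %lld bytes\n", (long long)st.st_size);
    printf("blocks:    %lld (512-byte units)\n", (long long)st.st_blocks);
    printf("allocated: %lld bytes\n", (long long)st.st_blocks * 512);
    printf("io block:  %ld bytes\n", (long)st.st_blksize);
    return 0;
}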

So for your "Email" example, with a size of 965 and a block count of 8, stat is indicating that 8*512 = 4096 bytes are physically allocated on disk. The reason the count isn't 2 is that the file system does not allocate space in units of 512 bytes; it evidently allocates in units of 4096. (And the unit of allocation may vary with file size and filesystem sophistication. E.g. ZFS supports different units of allocation.)
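
As a rough model, assuming a filesystem with a fixed 4096-byte allocation unit (which, as noted, is not true of every filesystem), the block count follows from rounding the size up to the allocation unit and dividing by 512:

/* Hypothetical helper: predicted st_blocks for a size-byte file on a
 * filesystem with a fixed 4096-byte allocation unit (an assumption). */
long long predicted_blocks(long long size)
{
    const long long unit = 4096;
    long long allocated = ((size + unit - 1) / unit) * unit;  /* round up */
    return allocated / 512;            /* st_blocks is in 512-byte units */
}
/* predicted_blocks(965) == 8, matching the "Email" example above. */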

Similarly, for the wxPython example, it indicates that 7056*512 = 3612672 bytes are physically allocated on disk. You get the idea.

The IO block size is "a hint as to the 'best' unit size for I/O operations"; it's usually the unit of allocation on the physical disk. Don't confuse the IO block size with the blocks stat uses to report physical size: the blocks for physical size are always 512 bytes.

Update based on comment:

Like I said, st_blocks is how the OS indicates how much space is used by the file on disk. The actual units of allocation on disk are the choice of the file system. For example, ZFS can have allocation blocks of variable size, even in the same file, because of the way it allocates blocks: files start out having a small block size, and the block size keeps on increasing until it reaches a particular point. If the file is later truncated, it will probably keep the old block size. So based on the history of the file, it can have multiple possible block sizes. So given a file size it is not always obvious why it has a particular physical size.

Concrete example: on my Solaris box, with a ZFS file system, I can create a very short file:

$ echo foo > test
$ stat test
Size: 4 Blocks: 2 IO Block: 512 regular file
(irrelevant details omitted)

OK, small file, 2 blocks: physical disk usage is 1024 bytes for this file.

$ dd if=/dev/zero of=test2 bs=8192 count=4
$ stat test2
Size: 32768 Blocks: 65 IO Block: 32768 regular file

OK, now we see physical disk usage of 32.5K, and an IO block size of 32K. I then copied it to test3 and truncated this test3 file in an editor:

$ cp test2 test3
$ joe -hex test3
$ stat test3
Size: 4 Blocks: 65 IO Block: 32768 regular file

Well now, here's a file with 4 bytes in it - just like test - but it's using 32.5K physically on the disk, because of the way the ZFS file system allocates space. Block sizes increase as the file gets larger, but they don't decrease when the file gets smaller. (And yes, this can lead to substantial wasted space depending on the kinds of files and file operations you do on ZFS, which is why it allows you to set the maximum block size on a per-filesystem basis, and change it dynamically.)

Hopefully, you can now appreciate that there isn't necessarily a simple relationship between file size and physical disk usage. Even in the example above it's not clear why 32.5K bytes are needed to store a file that's exactly 32K in size; it appears that ZFS generally needs an extra 512 bytes of storage of its own, perhaps for checksums, reference counts, or transaction state: file system bookkeeping. By including these extras in the reported physical size, ZFS avoids misleading the user about the physical cost of the file. That doesn't mean it's trivial to reverse-engineer the calculation without intimate knowledge of the underlying file system implementation.

stat() and block size in Cygwin

The stat command line tool has a %B format option, which displays the block size it is using. It appears stat uses a 1024-byte block in Cygwin.

Also, it appears that the NTFS 4096-byte cluster size is what is actually being used under the hood, and stat is just presenting it in 1024-byte blocks: a 4095-byte file occupies one 4096-byte cluster (4 blocks of 1024 bytes), while a 4097-byte file spills into a second cluster (8*1024 = 8192 bytes):

$ dd if=/dev/urandom of=foo count=1 bs=4095
$ stat -c '%B %b' foo
1024 4
$ dd if=/dev/urandom of=foo count=1 bs=4097
$ stat -c '%B %b' foo
1024 8

There is a discussion of where the 512-byte vs 1024-byte block sizes come from at https://unix.stackexchange.com/questions/28780/file-block-size-difference-between-stat-and-ls. Apparently it comes down to Linux kernel conventions versus GNU utility conventions.

Wrong number of data blocks given by gnuplot's stats command

stats gives you the number of indexable blocks in your data file. These blocks are separated by pairs of blank records (i.e. two blank lines).

If you did plot 'modele.out' index 0 you would find that it plotted all your data points as well, whereas index 1 would give you an error. There is only one (indexable) block in your data.

The solution

  • separate your blocks by two blank lines
  • change your splot command to splot 'modele.out' index i using 2:3:5 notitle

When you are using splot, a single blank line separates each row (or datablock, to use the manual's term). This isn't the same thing as a block! In all other contexts (as far as I'm aware) there are two blank lines between each block (or indexable block, to use the manual's term); see the example layout below.
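
For example, a data file laid out like this has two indexable blocks (the file name and values are purely illustrative):

# modele.out (illustrative layout; five columns to match "using 2:3:5")
1  0.0  0.0  10  1.0
2  0.5  0.5  10  1.5
(one blank line)        <- starts a new row/datablock within block 0
3  1.0  1.0  10  2.0
(two blank lines)       <- start a new indexable block, block 1
4  9.0  9.5  20  7.0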



Update

As suggested by Christoph in the comments, if you wanted to keep your file in the same format and were sure that there were no blank lines at the end, you could change your loop to this:

do for [i=0:STATS_blank] {

and use your original splot line (with every, rather than index).

Difference between block and block size in Advanced Unix

See

 man fstat

which reports:

  • The blocks field indicates the number of blocks allocated to the file, in 512-byte units.

  • The blocksize field gives the "preferred" blocksize for efficient file system I/O.
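
In C terms, these are the st_blocks and st_blksize members of struct stat; a minimal sketch of how they are typically used:

#include <sys/stat.h>

/* st_blocks:  blocks allocated to the file, in 512-byte units */
/* st_blksize: preferred block size for efficient file I/O     */
long long allocated_bytes(const struct stat *st)
{
    return (long long)st->st_blocks * 512;  /* always 512-byte units */
}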

Calculate blocksize of files in nested directories in C

I solved it! I was passing just the name of the file to lstat instead of the relative path to the file, as the synopsis for the stat function requires.
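
A minimal sketch of that fix (the function and parameter names are illustrative): join the directory path and the entry name before calling lstat:

#include <stdio.h>
#include <sys/stat.h>

/* dir and name are illustrative parameters */
int stat_entry(const char *dir, const char *name, struct stat *st)
{
    char path[4096];

    snprintf(path, sizeof path, "%s/%s", dir, name);  /* relative path, not just the name */
    return lstat(path, st);
}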

C, total from unix ls function

I fixed it:

/* assumes <dirent.h>, <sys/stat.h>, <string.h>, <stdlib.h>, <stdio.h>;
 * path, filter, and l_option come from the surrounding program */
int n, i = 0;
long long total = 0;            /* sum of st_blocks (512-byte units) */
struct dirent **namelist;
struct stat s;

n = scandir(path, &namelist, filter, alphasort);

if (l_option) {                 /* flag: -l was given */
    while (i < n) {
        /* build path + entry name; assumes path ends with '/' */
        char *temp = malloc(strlen(path) + strlen(namelist[i]->d_name) + 1);
        strcpy(temp, path);                    /* copy path to temp     */
        strcat(temp, namelist[i]->d_name);     /* append the file name  */
        if (stat(temp, &s) == 0)               /* stat path + file name */
            total += s.st_blocks;
        free(temp);
        free(namelist[i++]);
    }
    free(namelist);
    /* ls reports 1024-byte blocks; st_blocks counts 512-byte ones */
    printf("total %lld\n", total / 2);
}

So basically, I build a new char array containing the directory path plus the name of the file, then use the stat structure it fills in to add up the total.


