Query Size of Block Device File in Python

The “most clean” (i.e. not dependent on external volumes and most reusable) Python solution I've reached is to open the device file and seek to the end, returning the file offset:

import os

def get_file_size(filename):
    "Get the file size by seeking to the end"
    fd = os.open(filename, os.O_RDONLY)
    try:
        # lseek returns the new offset, which equals the size in bytes
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
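
The same idea can be sketched with an ordinary file object, which makes it easy to try without root access. The helper name get_size_via_file_object and the temporary file are illustrative assumptions, not part of the answer above.

```python
import os
import tempfile

def get_size_via_file_object(path):
    """Return the size in bytes by seeking to the end of the file."""
    with open(path, "rb") as f:
        # seek() returns the new absolute position, i.e. the size
        return f.seek(0, os.SEEK_END)

# Demonstration on a regular file; a block device such as /dev/sda is
# handled the same way but usually needs root privileges to open.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)

print(get_size_via_file_object(tmp.name))  # 4096
os.remove(tmp.name)
```

Seeking is used instead of os.stat() because st_size is typically reported as 0 for block device nodes on Linux.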

Get storage device block size from name/descriptor of a file on that device

As the fstat() man page says:

 int fstat(int fildes, struct stat *buf);  
 int stat(const char *path, struct stat *buf);

The stat() function obtains information about the file pointed to by path. Read, write or execute permission of the named file is not required, but all directories listed in the path name leading to the file must be searchable.

The fstat() obtains the same information about an open file known by the file descriptor fildes.

The buf argument is a pointer to a stat structure as defined by <sys/stat.h> and into which information is placed concerning the file.
The stat structure is defined as:

struct stat {
    dev_t    st_dev;    /* device inode resides on */
    ino_t    st_ino;    /* inode's number */
    mode_t   st_mode;   /* inode protection mode */
    nlink_t  st_nlink;  /* number of hard links to the file */
    uid_t    st_uid;    /* user-id of owner */
    gid_t    st_gid;    /* group-id of owner */
    dev_t    st_rdev;   /* device type, for special file inode */
    struct timespec st_atimespec;  /* time of last access */
    struct timespec st_mtimespec;  /* time of last data modification */
    struct timespec st_ctimespec;  /* time of last file status change */
    off_t    st_size;   /* file size, in bytes */
    quad_t   st_blocks; /* blocks allocated for file */
    u_long   st_blksize;/* optimal file sys I/O ops blocksize */
 };

I hope it helps you.
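
For readers working in Python rather than C, the same fields are exposed by os.stat() and os.fstat(). This is a minimal sketch assuming a Unix-like system (st_blksize is not available on Windows); the two function names and the temporary file are made up for illustration.

```python
import os
import tempfile

def block_size_from_path(path):
    """Preferred I/O block size of the filesystem holding path."""
    return os.stat(path).st_blksize

def block_size_from_fd(fd):
    """Same value, obtained from an already open file descriptor."""
    return os.fstat(fd).st_blksize

# Both calls describe the same file, so they report the same block size.
with tempfile.NamedTemporaryFile() as tmp:
    print(block_size_from_path(tmp.name), block_size_from_fd(tmp.fileno()))
```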

Determine the size of a block device

fdisk doesn't understand the partition layout used by my Mac running Linux, nor any other non-PC partition format. (Yes, there's mac-fdisk for old Mac partition tables, and gdisk for newer GPT partition tables, but those aren't the only other partition layouts out there.)

Since the kernel already scanned the partition layouts when the block device came into service, why not ask it directly?


$ cat /proc/partitions
major minor  #blocks  name

   8       16  390711384 sdb
   8       17     514079 sdb1
   8       18  390194752 sdb2
   8       32  976762584 sdc
   8       33     514079 sdc1
   8       34  976245952 sdc2
   8        0  156290904 sda
   8        1     514079 sda1
   8        2  155774272 sda2
   8       48 1465138584 sdd
   8       49     514079 sdd1
   8       50 1464621952 sdd2
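
If you want these numbers in a script, the table is easy to parse. A rough sketch, assuming the four-column layout shown above and that #blocks counts 1024-byte units (as on Linux); parse_partitions is a hypothetical helper name.

```python
def parse_partitions(text):
    """Map device name -> size in bytes from /proc/partitions content."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields and start with a numeric major number;
        # this skips the header row and the blank line after it.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, blocks, name = fields
        sizes[name] = int(blocks) * 1024  # #blocks is in 1 KiB units
    return sizes

sample = """major minor  #blocks  name

   8       16  390711384 sdb
   8       17     514079 sdb1
"""
print(parse_partitions(sample)["sdb"])  # 400088457216
```

On a live system you would pass in open("/proc/partitions").read() instead of the sample string.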

Find size and free space of the filesystem containing a given file

If you just need the free space on a device, see the answer using os.statvfs() below.

If you also need the device name and mount point associated with the file, you should call an external program to get this information. df will provide all the information you need -- when called as df filename it prints a line about the partition that contains the file.

To give an example:

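import subprocess

# "filename" is a placeholder for the path you are interested in
df = subprocess.Popen(["df", "filename"], stdout=subprocess.PIPE)
output = df.communicate()[0].decode()  # bytes -> str on Python 3
device, size, used, available, percent, mountpoint = \
    output.split("\n")[1].split()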

Note that this is rather brittle, since it depends on the exact format of the df output, but I'm not aware of a more robust solution. (There are a few solutions relying on the /proc filesystem below that are even less portable than this one.)
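
For the simpler case mentioned at the top (free space only), os.statvfs() avoids parsing df output. A minimal sketch for Unix-like systems; the function name fs_usage and the choice of "/" as the path are assumptions for the example.

```python
import os

def fs_usage(path):
    """Return (total, available, free) bytes for the filesystem at path."""
    st = os.statvfs(path)
    total = st.f_frsize * st.f_blocks       # size of the whole filesystem
    available = st.f_frsize * st.f_bavail   # free for unprivileged users
    free = st.f_frsize * st.f_bfree         # free including reserved blocks
    return total, available, free

total, available, free = fs_usage("/")
print(total, available, free)
```

This does not give you the device name or mount point, which is why the df approach is still needed for those.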

How do I get a block device's size correctly in go?

The OP asked how to get the size of a block device. To get the size of a block device (or any file), you can call File.Seek to move to the end of the file using io.SeekEnd and read the position returned. Credit to others for the Python and C versions.

Running the example getsize.go below shows:

$ sudo go run getsize.go /dev/sda
/dev/sda is 256060514304 bytes.

lsblk --bytes /dev/device will give you the same information. That is how much data the block device can store.

The Statfs_t approach and df /path/to/mounted/filesystem tell you how much data you can store in the filesystem mounted at the given path. Filesystems have overhead, probably in the 2-5% range depending on the details of the filesystem, and also keep track of how much space is free or used.

There is no API that I am aware of that can provide information about unmounted filesystems on a block device. dumpe2fs can give you that information for the ext{2,3,4} filesystems. Tools likely exist for other filesystems, but such tools are filesystem-specific. Once you mount the filesystem, the Linux kernel's filesystem driver exposes the information that df returns.

Code:

// getsize.go: get the size of a block device or file
package main

import (
    "fmt"
    "io"
    "os"
)

func main() {
    var path string
    if len(os.Args) < 2 {
        fmt.Println("Give path to file/disk")
        os.Exit(1)
    }
    path = os.Args[1]
    file, err := os.Open(path)
    if err != nil {
        fmt.Printf("error opening %s: %s\n", path, err)
        os.Exit(1)
    }
    pos, err := file.Seek(0, io.SeekEnd)
    if err != nil {
        fmt.Printf("error seeking to end of %s: %s\n", path, err)
        os.Exit(1)
    }
    fmt.Printf("%s is %d bytes.\n", path, pos)
}

Preferred block size when reading/writing big binary files

Let the OS make the decision for you. Use the mmap module:

https://docs.python.org/3/library/mmap.html

It uses your OS's underlying memory mapping mechanism for mapping the contents of a file into RAM.

Be aware that there's a 2GB file size limit if you're using 32-bit Python, so be sure to use the 64-bit version if you decide to go this route.

For example:

import mmap

f1 = open('input_file', 'r+b')
m1 = mmap.mmap(f1.fileno(), 0)
f2 = open('out_file', 'a+b') # out_file must already be non-empty: mmap cannot map an empty file
m2 = mmap.mmap(f2.fileno(), 0)
m2.resize(len(m1))
m2[:] = m1 # copy input_file to out_file
m2.flush() # flush results

Note that you never had to call any read() functions and decide how many bytes to bring into RAM. This example just copies one file into another, but as you said in your example, you can do whatever processing you need in between. Note that while the entire file is mapped to an address space in RAM, that doesn't mean it has actually been copied there. It will be copied piecewise, at the discretion of the OS.
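
As an illustration of "processing in between", here is a small read-only sketch that scans a mapped file without ever choosing a read size. The function count_byte and the temporary file are hypothetical; note that mmap raises ValueError for an empty file, so this assumes the input is non-empty.

```python
import mmap
import os
import tempfile

def count_byte(path, byte):
    """Count occurrences of a one-byte pattern by scanning a memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            pos = m.find(byte)
            while pos != -1:
                count += 1
                pos = m.find(byte, pos + 1)  # continue after the last hit
            return count

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a\nb\nc\n")

print(count_byte(tmp.name, b"\n"))  # 3
os.remove(tmp.name)
```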

Determine cluster size of file system in Python

On UNIX/Linux platforms, use Python's built-in os.statvfs. On Windows, unless you can find a third-party library that does it, you'll need to use ctypes to call the Win32 function GetDiskFreeSpace, like this:

import ctypes

# GetDiskFreeSpaceW writes 32-bit DWORD values, so use c_ulong here
sectorsPerCluster = ctypes.c_ulong(0)
bytesPerSector = ctypes.c_ulong(0)
rootPathName = ctypes.c_wchar_p(u"C:\\")

ctypes.windll.kernel32.GetDiskFreeSpaceW(rootPathName,
    ctypes.pointer(sectorsPerCluster),
    ctypes.pointer(bytesPerSector),
    None,
    None,
)

print(sectorsPerCluster.value, bytesPerSector.value)

Note that ctypes only became part of the Python stdlib in 2.5.

I put this sort of thing in a function that first checks whether the UNIX variant is present, and falls back to ctypes if it is not (presumably because it's running on Windows). That way, if Python ever does implement statvfs on Windows, it will just use that.
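
Such a wrapper might look roughly like the following. This is a sketch rather than a tested cross-platform library: get_cluster_size is a made-up name, f_frsize is assumed to be the value you want on UNIX (f_bsize is the other candidate), and on Windows the path is assumed to be a volume root such as "C:\\".

```python
import ctypes
import os

def get_cluster_size(path):
    """Return the allocation unit size, in bytes, of the filesystem at path."""
    if hasattr(os, "statvfs"):
        # UNIX/Linux: f_frsize is the fundamental filesystem block size
        return os.statvfs(path).f_frsize
    # Windows fallback: path must be a volume root such as "C:\\"
    sectors_per_cluster = ctypes.c_ulong(0)
    bytes_per_sector = ctypes.c_ulong(0)
    ok = ctypes.windll.kernel32.GetDiskFreeSpaceW(
        ctypes.c_wchar_p(path),
        ctypes.byref(sectors_per_cluster),
        ctypes.byref(bytes_per_sector),
        None,
        None,
    )
    if not ok:
        raise ctypes.WinError()
    return sectors_per_cluster.value * bytes_per_sector.value

# os.path.abspath(os.sep) is "/" on UNIX and the current drive root on Windows
print(get_cluster_size(os.path.abspath(os.sep)))
```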
