Files on Multiple Processes

No, other processes can change the file contents while you are reading it. Try running "man fcntl" and ignore the section on "advisory" locks; those are "optional" locks that processes only have to respect if they choose to. Instead, look for the (alas, non-POSIX) "mandatory" locks. Those are the ones that will protect you from other programs. Try a read lock.
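As a minimal sketch (the file name is just a placeholder), this is how a shared read lock can be requested through Python's fcntl module. Note that on Linux these locks are advisory unless the file system is mounted with the mand option and the file's permission bits are set up for mandatory locking, so treat this as an illustration of the locking API rather than a guaranteed protection:

    import fcntl

    # Open the file we want to read while holding a shared (read) lock.
    # "shared_data.txt" is a placeholder path for illustration.
    with open("shared_data.txt", "rb") as f:
        # Request a shared lock on the whole file; blocks until it is granted.
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            data = f.read()
        finally:
            # Release the lock before the file is closed.
            fcntl.lockf(f, fcntl.LOCK_UN)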

Can multiple processes write to the same folder?

This has nothing to do with Python itself, as file operations in Python go through OS-level system calls (unless run as root, your Python program would not have permission to do raw device writes anyway, and doing them as root would be incredibly stupid).

A little bit of file system theory if anyone cares to read:

Yes, if you study file system architecture and how data is actually stored on drives, there are similarities between files and directories - but only at the data storage level, simply because there is no need to separate the two. For example, the ext4 file system stores information about a file (its metadata) in small units called inodes, separately from the file contents themselves. An inode contains pointers to the disk space where the file data can be found.
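As a small illustration (assuming a Linux/ext4-style file system and a placeholder file path), you can see from Python which inode a file's metadata lives in:

    import os

    # "example.txt" is just a placeholder path for illustration.
    info = os.stat("example.txt")

    print("inode number:", info.st_ino)    # which inode holds this file's metadata
    print("size in bytes:", info.st_size)  # length recorded in that inode
    print("hard links:", info.st_nlink)    # how many directory entries point to it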

File systems are generally rather agnostic about directories. A file system is basically just this: information about free disk space, information about files with pointers to their data, and the data itself. Part of a file's metadata is the directory in which it resides. In modern file systems (ancient FAT, which is still in use, being the exception) where data is stored on disk has nothing to do with directories. Directories exist so that both humans and the file system implementation can locate files and folders quickly, instead of walking sequentially through the list of inodes until the correct file is found.

You may have read that directories are just files. Yes, they are "files" that contain a list of the files in them (actually a tree in many implementations - but please do not confuse this with the directory tree; it is just a mechanism for storing large directories so that entries do not have to be searched sequentially). The reason a directory is a file is simply that a file is the mechanism file systems use to store data. There is no need for a separate storage mechanism, because a directory only contains a list of file names and pointers to their inodes. You could think of it as a small database, or even more simply as a text file. In the end it is just a file that contains pointers - not something allocated on the disk surface to physically contain the files stored in the directory.
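To make the "list of names plus inode pointers" idea concrete, here is a small sketch (the path is just an example) that lists a directory's entries together with the inode each name points to:

    import os

    # "." is just an example; any directory path works.
    with os.scandir(".") as entries:
        for entry in entries:
            # Each directory entry is essentially a (name, inode) pair.
            print(f"{entry.name} -> inode {entry.inode()}")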

That was the background.

The file system implementation on your computer is just a piece of software that knows how to deal with all this. When you open a file in a certain directory for writing, something like this usually happens:

  1. A free inode is located and an entry is created there
  2. The free clusters/blocks database is queried to find storage space for the file contents
  3. The file data is stored and the blocks/clusters are marked "in use" in that database
  4. The inode is updated to contain the file metadata and a pointer to this disk space
  5. The "file" containing the directory data of the target directory is located
  6. That file is modified so that one record is added; the record contains a pointer to the inode just created, as well as the file name
  7. The inode of the file is updated to contain a link back to the directory, too

It is the job of the operating system, and the file system driver within it, to ensure all this happens consistently. In practice that means the file system driver queues operations. Writing several files into the same directory simultaneously is a routine operation - for example, web browser cache directories get updated this way when you browse the internet. Under the hood the file system driver queues these operations and completes steps 1-7 for each new file before it starts processing the next one.
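Here is a small, self-contained sketch of that routine case from the application's point of view (the directory name, file names and worker count are made up): several processes write separate files into the same directory at once, and the operating system serializes the underlying directory updates for you:

    import os
    from multiprocessing import Pool

    CACHE_DIR = "cache"  # example directory shared by all workers

    def write_entry(i):
        # Each worker writes its own file; the OS handles the directory updates.
        path = os.path.join(CACHE_DIR, f"entry_{i}.txt")
        with open(path, "w") as f:
            f.write(f"data for entry {i}\n")
        return path

    if __name__ == "__main__":
        os.makedirs(CACHE_DIR, exist_ok=True)
        with Pool(processes=4) as pool:
            print(pool.map(write_entry, range(10)))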

To make it a bit more complex, there is a journal acting as an intermediate buffer. Your transactions are written to the journal first, and when the file system is idle, the file system driver commits the journal transactions to the actual storage space - but the theory remains the same. The journal exists for performance and reliability reasons.

You do not need to worry about any of this at the application level, as it is the operating system's job to do all that.

That said, if you create a lot of randomly named files in the same directory, in theory there could be a conflict at some point if your random name generator produced two identical file names. There are ways to mitigate this, and that is the part you do need to worry about in your application. Anything deeper than that is the task of the operating system.
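One common mitigation (sketched here with an example directory name) is to let the OS refuse to create a name that already exists by opening with O_CREAT | O_EXCL, or to let the standard library pick a guaranteed-unique name for you:

    import os
    import tempfile

    TARGET_DIR = "output"  # example directory
    os.makedirs(TARGET_DIR, exist_ok=True)

    def create_exclusively(name):
        # O_EXCL makes open() fail atomically if another process created the name first.
        fd = os.open(os.path.join(TARGET_DIR, name),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        return os.fdopen(fd, "w")

    # Alternative: have tempfile generate a unique name inside the directory.
    fd, path = tempfile.mkstemp(dir=TARGET_DIR, suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("unique file\n")
    print("created", path)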


