Why You Can Successfully Move a File in Linux While It Is Being Written To

Why can I successfully move a file in Linux while it is being written to?

When you move a file inside the same filesystem, the file itself (the inode) isn't moved at all. The only things that change are the directory entries in that filesystem. (The system call invoked by mv in this case is rename(2); see that man page for additional information and restrictions.)

When a process opens a file, the filename is passed to the OS to indicate which file is meant, but the file descriptor you get back isn't linked to that name at all (you can't get back a filename from it) – it is linked to the inode.

Since the inode remains unchanged when you rename a file (inside the same filesystem), processes that have it open can happily keep reading from and writing to it – nothing changed for them, their file descriptor is still valid and pointing to the right data.
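
You can watch this happen with a quick experiment (the file names here are arbitrary):

# start a background writer that appends a line every second
( while true; do date; sleep 1; done ) >> /tmp/demo.log &

# rename the file while it is being written to
mv /tmp/demo.log /tmp/renamed.log

# the writer's descriptor still points at the same inode,
# so new lines keep appearing under the new name
tail -f /tmp/renamed.log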

The same thing happens if you delete a file: processes can keep reading from and writing to it even though the file is no longer reachable through any directory entry. (This can lead to confusing situations where df reports that your disk is full, but du says you're using much less space than df reports. The blocks assigned to deleted files that are still open won't be released until those processes close their file descriptors.)
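
If you run into that df/du discrepancy, lsof (where installed) can list the deleted-but-still-open files responsible:

lsof +L1
# shows open files whose link count is less than 1 (i.e. deleted),
# along with the processes that still hold them open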

If the mv moves the file across filesystems, then the behavior is different since inodes are specific to each filesystem. In that case, mv will actually copy the data over, creating a new inode (and directory entry) on the destination filesystem. When the copy is over, the old file is unlinked, and removed if there are no open filehandles on it, as above.
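
If you're curious which path mv takes in a given case, you can watch the relevant system calls with strace (a sketch; /other-fs stands for any mount point on a different filesystem):

strace -e trace=rename,renameat,renameat2,unlink,unlinkat mv somefile /other-fs/
# same filesystem: a single successful rename-family call
# across filesystems: the rename fails with EXDEV, and mv
# falls back to copying the data and unlinking the source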

In your case, if you had crossed a filesystem boundary, you'd have ended up with a partial file at the destination, and your upload process would have kept happily writing to a deleted file you couldn't easily access, possibly filling up that filesystem, until the upload finished, at which point the inode would be dropped.

Some posts on Unix & Linux that you may find interesting:

  • How are directories implemented in Unix filesystems?
  • What is a Superblock, Inode, Dentry and a File?

Linux: Move txt files not starting with a String to Another folder

To move only regular files, add -type f and exclude the files you don't want with \!:

find . -type f -name '*.txt' \! -name '*tmp*' -exec mv -i -t /home {} +

The -i asks for permission before overwriting a file that already exists, and the + instead of \; is used to move as many files as possible with one invocation of mv (which is why the target directory has to be specified with the -t option: {} can expand to more than one file).
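
To preview what either form would do without actually moving anything, you can prefix the mv with echo (a quick sketch):

find . -type f -name '*.txt' \! -name '*tmp*' -exec echo mv -i -t /home {} \;   # prints one mv command per file
find . -type f -name '*.txt' \! -name '*tmp*' -exec echo mv -i -t /home {} +    # prints a single mv command for all files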

move only if file exists in a shell script

You should test whether the file exists first:

if [ -f blah ]; then
    mv blah destination
fi
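
If that is all the script does, the same test fits on one line (equivalent behaviour, same placeholder names):

[ -f blah ] && mv blah destination

Note that -f tests specifically for a regular file; use -e instead if directories and other file types should be moved as well.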

Move files to different directories based on file name tokens

It's not a bash script, but perl is much better at this kind of thing and is installed on virtually all Linux systems:

while (<>) {
    chomp;
    $file = $_;

    # split the filename on underscores into its tokens
    ($colour, $location, $name, $year, $city, $numbers) = split(/_/, $file);

    # build the destination path, creating directories as needed
    $dest0 = "/dir/work/$colour";
    $dest1 = "$dest0/$name";
    mkdir($dest0) unless -d $dest0;
    mkdir($dest1) unless -d $dest1;

    # move the file into place
    rename($file, "$dest1/$file");
}

The script splits each input filename on the underscore character, creates the destination directories, and then renames the file into place. rename takes care of the move for you; in fact it just changes the directory entries without any copying at all (which also means it only works when source and destination are on the same filesystem).

UPDATE

The above version takes its input from a file containing a list of filenames to process. For an alternative version which processes all files in the current directory, replace the while line with

while(glob("*")) {
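
Either way, a run might look like this (move_files.pl is a hypothetical name for the script; the filename follows the pattern from the question):

perl move_files.pl filelist.txt    # list version: filelist.txt holds one filename per line
perl move_files.pl                 # glob version: run it from the directory holding the files

A file named blue_north_smith_2001_london_0042.txt would end up as /dir/work/blue/smith/blue_north_smith_2001_london_0042.txt.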

Prevent files from being moved by another process in Linux

First and foremost: rename is atomic. It is not possible for a file to be moved twice; one of the moves will fail, because the file is no longer there. If the scripts run in parallel, both may list the same 10 files, and instead of the first 10 files going to /tmp/task1 and the next 10 to /tmp/task2, you may get 4 moved to /tmp/task1 and 6 to /tmp/task2. Or maybe 5 and 5, or 9 and 1, or any other combination. But each file will end up in only one task.

So nothing is incorrect; each file is still processed only once. But it is inefficient, because you could be processing 10 files at a time while only processing 5. If you want to make sure you always process 10 whenever there are enough files available, you will have to do some synchronization. There are basically two options:

  1. Place a lock around the list+copy step. This is most easily done using flock(1) and a lock file. There are two ways to call that, too:

    1. Call the whole copying operation via flock:

      flock targdir -c copy-script

      This requires that you put the part that should run under the lock into a separate script.

    2. Lock via file descriptor. Before the copying, do

      exec 3>targdir/.lock
      flock 3

      and after it do

      flock -u 3

      This lets you lock only part of the script (a fuller sketch follows this list). This does not work in Cygwin (but you probably don't need that).

  2. Move the files one by one until you have enough:

    ls -1h targdir/*.json > "${TMP_LIST_FILE}"
    # ^^^ do NOT limit here
    COUNT=0
    while read REMOTE_FILE
    do
        if mv "$REMOTE_FILE" "$SCRDRL" 2>/dev/null; then
            COUNT=$(($COUNT + 1))
        fi
        if [ "$COUNT" -ge "$LIMIT" ]; then
            break
        fi
    done < "${TMP_LIST_FILE}"
    rm -f "${TMP_LIST_FILE}"

    The mv will sometimes fail, in which case you don't count the file and simply try the next one, on the assumption that the mv failed because the file was moved by the other script in the meantime. Each script moves at most $LIMIT files, but the selection may be rather random.
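
For reference, here is a minimal sketch of option 1's descriptor-based locking wrapped around the list+move step (variable names are the ones from the question; the .lock file is the one used above). Because the listing now happens under the lock, limiting it early is safe:

(
    exec 3>targdir/.lock
    flock 3    # blocks until any other script releases the lock

    # safe to limit here: no other script can list concurrently
    ls -1h targdir/*.json | head -n "$LIMIT" > "${TMP_LIST_FILE}"
    while read REMOTE_FILE
    do
        mv "$REMOTE_FILE" "$SCRDRL"
    done < "${TMP_LIST_FILE}"
    rm -f "${TMP_LIST_FILE}"

    flock -u 3    # release the lock
)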

On a side note, if you don't absolutely need to set shell variables in the while loop, you can do without a temporary file. Simply:

ls -1h targdir/*.json | while read REMOTE_FILE
do
    ...
done

You can't propagate variables out of such a loop, because, as part of a pipeline, it runs in a subshell.
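
A three-line demonstration of the subshell effect:

COUNT=0
ls targdir/*.json | while read f; do COUNT=$((COUNT + 1)); done
echo "$COUNT"    # prints 0: the loop body ran in a subshell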

If you do need to set variables and can live with using bash specifically (I usually try to stick to /bin/sh), you can also write:

while read REMOTE_FILE
do
    ...
done < <(ls -1h targdir/*.json)

In this case the loop runs in the current shell, but this kind of redirection (process substitution) is a bash extension.
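
The same demonstration with this form keeps the count:

COUNT=0
while read f; do COUNT=$((COUNT + 1)); done < <(ls targdir/*.json)
echo "$COUNT"    # prints the actual number of files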


