Why can I successfully move a file in Linux while it is being written to?
When you move a file inside the same filesystem, the file itself (the inode) isn't moved at all. The only thing that changes are the directory entries in that filesystem. (The system call invoked by mv in this case is rename(2) – check that man page for additional information and restrictions.)
When a process opens a file, the filename is passed to the OS to indicate which file is meant, but the file descriptor you get back isn't linked to that name at all (you can't get back a filename from it) – it is linked to the inode.
Since the inode remains unchanged when you rename a file (inside the same filesystem), processes that have it open can happily keep reading from and writing to it – nothing changed for them, their file descriptor is still valid and pointing to the right data.
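A quick way to see this in action is the following sketch: open a descriptor on a file, rename the file, and keep writing through the descriptor. (The filenames here are made up for the demonstration.)

```shell
cd "$(mktemp -d)"                 # scratch directory for the demo
echo "first line" > data.log

exec 3>> data.log                 # fd 3 now refers to the inode, not the name
mv data.log renamed.log           # rename(2): only the directory entry changes

echo "second line" >&3            # the write lands in the renamed file
exec 3>&-                         # close fd 3

cat renamed.log                   # prints both lines
```

The write through fd 3 ends up in renamed.log, because the descriptor was never tied to the name data.log in the first place.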
Same thing if you delete a file. Processes can keep reading from and writing to it even if the file is no longer reachable through any directory entry. (This can lead to confusing situations where df reports that your disk is full, but du says you're using much less space than df reports. The blocks assigned to deleted files that are still open won't be released until those processes close their file descriptors.)
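The deleted-but-open case can be demonstrated the same way (again a sketch with made-up filenames):

```shell
cd "$(mktemp -d)"
echo "still here" > ghost.txt

exec 3< ghost.txt                 # open a read descriptor on the inode
rm ghost.txt                      # removes the directory entry only

ls ghost.txt 2>/dev/null || echo "no directory entry"
cat <&3                           # the data is still readable: "still here"
exec 3<&-                         # closing the last fd finally frees the blocks
```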
If the mv moves the file across filesystems, the behavior is different, since inodes are specific to each filesystem. In that case, mv will actually copy the data over, creating a new inode (and directory entry) on the destination filesystem. When the copy is done, the old file is unlinked, and removed if there are no open file handles on it, as above.
In your case, if you had crossed a filesystem boundary, you'd have a partial file in the destination, and your upload process would have kept happily writing to a deleted file you can't easily access, possibly filling up that filesystem, until the upload finished, at which point the inode would be dropped.
Some posts on Unix & Linux that you could find interesting:
- How are directories implemented in Unix filesystems?
- What is a Superblock, Inode, Dentry and a File?
Linux Move txt files not start with String to Another folder
To move only regular files, add -type f and exclude the files you don't want with \!:

find . -type f -name '*.txt' \! -name '*tmp*' -exec mv -i -t /home {} +

The -i asks for permission to overwrite a file if it already exists, and the + (instead of \;) moves as many files as possible with one invocation of mv (which is why we need to specify the target directory with the -t option: {} expands to more than one file).
move only if file exists in a shell script
You should test whether the file exists first:
if [ -f blah ]; then
mv blah destination
fi
Move files to different directories based on file name tokens
It's not a bash script, but Perl is much better at this kind of thing and is installed on almost all Linux systems:
while(<>) {
chomp;
$file = $_;
($colour, $location, $name, $year, $city, $numbers) = split(/_/,$file);
$dest0 = "/dir/work/$colour";
$dest1 = "$dest0/$name";
mkdir ($dest0) unless (-d $dest0);
mkdir ($dest1) unless (-d $dest1);
rename ($file, "$dest1/$file");
}
The script splits each input filename on the underscore character, creates the directories on the destination path as needed, and then renames the file into place. rename takes care of the move for you; within the same filesystem it just changes the directory entries without copying any data at all.
UPDATE
The above version takes its input from a file containing a list of filenames to process. For an alternative version which processes all files in the current directory, replace the while line with

while(glob("*")) {
Prevent files to be moved by another process in linux
First and foremost: rename is atomic. It is not possible for a file to be moved twice; one of the moves will fail because the file is no longer there. If the scripts run in parallel, both may list the same 10 files, and instead of the first 10 files going to /tmp/task1 and the next 10 to /tmp/task2, you may get 4 moved to /tmp/task1 and 6 to /tmp/task2. Or maybe 5 and 5, or 9 and 1, or any other combination. But each file will only end up in one task.
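The "one of the moves will fail" behavior is easy to reproduce in isolation; this sketch (with placeholder names) tries to move the same file twice:

```shell
cd "$(mktemp -d)"
mkdir task1 task2
echo data > file.json

mv file.json task1/                   # the first mv wins the race
mv file.json task2/ 2>/dev/null \
    || echo "second mv failed"        # the name is already gone
```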
So nothing is incorrect; each file is still processed only once. But it will be inefficient, because you could process 10 files at a time, yet you are only processing 5. If you want to make sure you always process 10 when there are enough files available, you will have to do some synchronization. There are basically two options:
Place a lock around the list+copy. This is most easily done using flock(1) and a lock file. There are two ways to call that, too.

Call the whole copying operation via flock:

flock targdir -c copy-script

This requires that you make the part that should be excluded a separate script.

Or lock via file descriptor. Before the copying, do

exec 3>targdir/.lock
flock 3

and after it do

flock -u 3

This lets you lock over part of the script only. This does not work in Cygwin (but you probably don't need that).
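Putting the file-descriptor variant together, a sketch of the locked list+move step might look like this (targdir, DEST, and LIMIT are placeholders taken from the question's setup):

```shell
# Serialize the list+move step across competing scripts with flock(1).
DEST=/tmp/task1                   # placeholder destination
LIMIT=10                          # placeholder batch size

exec 3> targdir/.lock             # fd 3 holds the lock file open
flock 3                           # blocks until this script owns the lock

ls targdir/*.json 2>/dev/null | head -n "$LIMIT" | while read -r f; do
    mv "$f" "$DEST"/
done

flock -u 3                        # release so the other script can proceed
exec 3>&-
```

Because both the listing and the moves happen under the lock, the two scripts can no longer see the same batch of files.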
Move the files one by one until you have enough.
ls -1h targdir/*.json > "${TMP_LIST_FILE}"
# ^^^ do NOT limit here
COUNT=0
while read -r REMOTE_FILE
do
    if mv "$REMOTE_FILE" "$SCRDRL" 2>/dev/null; then
        COUNT=$(($COUNT + 1))
    fi
    if [ "$COUNT" -ge "$LIMIT" ]; then
        break
    fi
done < "${TMP_LIST_FILE}"
rm -f "${TMP_LIST_FILE}"

The mv will sometimes fail, in which case you don't count the file and try to move the next one, assuming the mv failed because the file was meanwhile moved by the other script. Each script moves at most $LIMIT files, but it may be a rather random selection.
On a side note, if you don't absolutely need to set environment variables in the while loop, you can do without a temporary file. Simply:

ls -1h targdir/*.json | while read -r REMOTE_FILE
do
    ...
done

You can't propagate variables out of such a loop, because as part of a pipeline it runs in a subshell.
If you do need to set environment variables and can live with using bash specifically (I usually try to stick to /bin/sh), you can also write

while read -r REMOTE_FILE
do
    ...
done < <(ls -1h targdir/*.json)

In this case the loop runs in the current shell, but this kind of redirection (process substitution) is a bash extension.