Best Approach to Detecting a Move or Rename to a File in Linux

Best approach to detecting a move or rename to a file in Linux?

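Look into inotify, which lets a process get notified whenever anything happens to files in a directory it is watching. A rename typically shows up as an IN_MOVED_FROM event and an IN_MOVED_TO event that carry the same cookie value, which is how the old and new names can be matched up.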
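As a rough illustration of the inotify approach, here is a minimal Python sketch that talks to the kernel interface through ctypes. The helper names (parse_events, pair_renames, wait_for_renames) are made up for this example, and it assumes a Linux system with a C library that exposes inotify_init and inotify_add_watch. A real program would more likely use a maintained binding.

```python
import ctypes
import ctypes.util
import os
import struct

# Mask bits from <sys/inotify.h>.
IN_MOVED_FROM = 0x40  # a name left the watched directory
IN_MOVED_TO = 0x80    # a name appeared in the watched directory

# Header of struct inotify_event: int wd; uint32_t mask, cookie, len;
_HEADER = struct.Struct("iIII")


def parse_events(buf):
    """Split a raw inotify read buffer into (mask, cookie, name) tuples."""
    events = []
    offset = 0
    while offset + _HEADER.size <= len(buf):
        _wd, mask, cookie, name_len = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        # The name is NUL-padded up to name_len bytes.
        raw_name = buf[offset:offset + name_len].rstrip(b"\0")
        offset += name_len
        events.append((mask, cookie, os.fsdecode(raw_name)))
    return events


def pair_renames(events):
    """Match MOVED_FROM and MOVED_TO events that share a cookie."""
    pending = {}
    renames = []
    for mask, cookie, name in events:
        if mask & IN_MOVED_FROM:
            pending[cookie] = name
        elif mask & IN_MOVED_TO and cookie in pending:
            renames.append((pending.pop(cookie), name))
    return renames


def wait_for_renames(directory):
    """Block until something is moved in `directory`; return (old, new) pairs."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        wd = libc.inotify_add_watch(
            fd, os.fsencode(directory), IN_MOVED_FROM | IN_MOVED_TO
        )
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        # A single read may not contain both halves of a rename.
        return pair_renames(parse_events(os.read(fd, 4096)))
    finally:
        os.close(fd)
```

The sketch only pairs events that arrive in the same read. A file moved out of the watched directory produces only IN_MOVED_FROM, and one moved in from elsewhere produces only IN_MOVED_TO, so unpaired events are expected.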

How to move and rename each file in folder to top folder? [linux shell] [bash]

folder="folderX"
find "$folder" -type f -exec cp '{}' '{}'.bak \; -exec mv '{}'.bak "$folder" \;

Set the root folder in the variable folder, then run find with two -exec actions. For every regular file in the directory tree, the first action copies the file to the same name followed by ".bak", and the second moves that copy into the root folder. Note that files in different subdirectories that share a name will overwrite one another in the root folder.
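For comparison, the same idea can be sketched in Python. The function name copy_files_to_top is hypothetical, and unlike the find one-liner this version skips files that already sit in the top folder and refuses to overwrite an existing copy; both choices are assumptions about what is wanted.

```python
import os
import shutil


def copy_files_to_top(folder, suffix=".bak"):
    """Copy every file below `folder` into `folder` itself as <name><suffix>."""
    top = os.path.abspath(folder)
    # Collect the paths first so the copies we create are never revisited.
    sources = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(folder)
        if os.path.abspath(dirpath) != top
        for name in names
    ]
    created = []
    for source in sources:
        target = os.path.join(folder, os.path.basename(source) + suffix)
        if os.path.exists(target):
            # Two subdirectories held a file with the same name: keep the first.
            continue
        shutil.copy2(source, target)
        created.append(target)
    return created
```

Calling copy_files_to_top("folderX") leaves the originals where they are and returns the list of copies it created.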

How do the UNIX commands mv and rm work with open files?

Unix filesystems use reference counting and a two-layer architecture for finding files.

The filename refers to something called an inode, for information node or index node. The inode stores (a pointer to) the file contents as well as some metadata, such as the file's type (ordinary, directory, device, etc.) and who owns it.

Multiple filenames can refer to the same inode; they are then called hard links. In addition, a file descriptor (fd) refers to an inode. An fd is the type of object a process gets when it opens a file.

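A file in a Unix filesystem only disappears when the last reference to it is gone, that is, when no names (hard links) and no fds refer to it any more. So rm does not actually remove a file; it removes a reference to a file. Likewise, mv within a single filesystem only changes the name that points at the inode, so a process that already has the file open keeps working with the same file.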

This filesystem setup may seem confusing and it sometimes poses problems (esp. with NFS), but it has the benefit that locking is not necessary for a lot of applications. Many Unix programs also use the situation to their advantage by opening a temporary file and deleting it immediately after. As soon as they terminate, even if they crash, the temporary file is gone.
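The behaviour described above can be observed with a small Python experiment. This is only a sketch: the function name is invented, and it assumes a local Unix filesystem (NFS behaves differently, as noted).

```python
import os
import tempfile


def demo_open_file_survives():
    """Rename and then delete a file while holding it open."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "data.txt")
        with open(path, "w") as writer:
            writer.write("hello")

        reader = open(path)
        inode_before = os.fstat(reader.fileno()).st_ino

        # Renaming changes the name only; the inode stays the same.
        new_path = os.path.join(directory, "renamed.txt")
        os.rename(path, new_path)
        same_inode = os.stat(new_path).st_ino == inode_before

        # Removing the last name drops the link count, but the open
        # descriptor still references the inode, so the data is readable.
        os.remove(new_path)
        links = os.fstat(reader.fileno()).st_nlink
        content = reader.read()
        reader.close()
        return same_inode, links, content
```

The storage is released only once the reader is closed, which is the same mechanism the temporary-file trick relies on.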

How does git detect similar files, for its rename detection?

Git tracks file contents, not filenames. So renaming a file without changing its content is easy for git to detect. (Git does not track, but performs detection; using git mv or git rm and git add is effectively the same.)

When a file is added to the repository, the filename is stored in the tree object. The actual file contents are added as a binary large object (blob) in the repository. Git will not add another blob for additional files that contain the same content. In fact, Git cannot, as a loose object is stored in the filesystem with the first two characters of its hash as the directory name and the rest as the name of the file within it. So detecting an exact rename is a matter of comparing hashes.
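To make the hashing concrete, here is a small Python sketch of how a blob id is derived from content alone. It assumes the default SHA-1 object format (repositories created with the SHA-256 format differ), and the function names are made up for the example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash content the way Git names a blob: a short header, then the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Location of a loose object: two hex digits for the directory, the rest for the file."""
    return ".git/objects/" + object_id[:2] + "/" + object_id[2:]
```

Because the filename never enters the hash, two files with identical content map to the same blob no matter what they are called, which is why an unmodified rename is cheap to spot.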

To detect small changes to a renamed file, Git uses certain algorithms and a threshold limit to see if this is a rename. For example, have a look at the -M flag for git diff. There are also configuration values such as merge.renameLimit (the number of files to consider when performing rename detection during a merge).
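The idea of a similarity threshold can be illustrated with Python's difflib. To be clear, this is not Git's algorithm; it is a stand-in that compares lines and applies a 50% cutoff, which is the default threshold for -M. The function name is hypothetical.

```python
import difflib


def looks_like_rename(old_text, new_text, threshold=0.5):
    """Return True if the two contents are at least `threshold` similar."""
    matcher = difflib.SequenceMatcher(
        None, old_text.splitlines(), new_text.splitlines()
    )
    return matcher.ratio() >= threshold
```

A deleted file and an added file whose contents clear the threshold would be reported as a rename; below it, they are shown as an unrelated delete and add.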

To understand how git treats similar files (i.e., what file transformations are considered renames), explore the configuration options and flags mentioned above. You need not be concerned with the how. To understand how git actually accomplishes these tasks, look at the algorithms for finding differences in text, and read the git source code.

Algorithms are applied only for diff, merge, and log purposes -- they do not affect how git stores them. Any small change in file content means a new object is added for it. There is no delta or diff happening at that level. Of course, later, the objects might be packed where deltas are stored in packfiles, but that is not related to the rename detection.

Python/Linux: How to determine when a moved file is fully available?

For a conceptual explanation of atomic and cross-filesystem moves, see the write-up on moves in Python (it can really save you time). In short, a move within one filesystem is a single atomic rename, while a move across filesystems is a copy followed by a delete, so the destination file can be visible before it is complete.

You can take the following approaches to deal with your problem:

-> Monitor filesystem events with Pyinotify.

-> Lock the file for a few seconds using flock.

-> Use lsof to check which processes are using a particular file.

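`from subprocess import CalledProcessError, PIPE, Popen, check_output

# filename is the path of the file to check; it must be set by the caller.
try:
    # lsof lists the processes that have the file open.
    lsout = Popen(['lsof', filename], stdout=PIPE, shell=False)
    # grep exits non-zero when lsof printed no line mentioning the file.
    check_output(["grep", filename], stdin=lsout.stdout, shell=False)
except CalledProcessError:
    # check_output raises here if no process is using the file.
    pass  # the file is free: process it here`

Just write your log processing code in the except part and you are good to go.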

-> Run a daemon that monitors the parent folder for any changes, e.g., by using the watchdog library.

-> Check whether the file is in use by another process by looping through the PIDs in /proc and looking for a specific id (assuming you control the program that keeps adding new files, so that you can identify its id).

-> Check whether a process has a handle on the file using psutil.
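The /proc approach from the list above can be sketched in a few lines of Python. The function name pids_with_file_open is invented for this example, and it assumes a Linux /proc layout; processes owned by other users are silently skipped unless the script has permission to inspect them.

```python
import os


def pids_with_file_open(path):
    """Return the PIDs whose /proc/<pid>/fd entries point at `path`."""
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            descriptors = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we may not inspect it
        for descriptor in descriptors:
            try:
                link = os.readlink(os.path.join(fd_dir, descriptor))
            except OSError:
                continue  # descriptor was closed in the meantime
            if link == target:
                pids.append(int(entry))
                break
    return pids
```

A moved file could be treated as fully available once this returns an empty list, with the caveat that there is a race between the check and the moment the file is actually used.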

Rename multiple files by replacing a particular pattern in the filenames using a shell script

An example to help you get off the ground.

for f in *.jpg; do mv "$f" "$(echo "$f" | sed s/IMG/VACATION/)"; done

In this example, I am assuming that all your image files contain the string IMG and you want to replace IMG with VACATION.

The shell automatically evaluates *.jpg to all the matching files.

The second argument of mv (the new name of the file) is the output of the sed command that replaces IMG with VACATION.

If your filenames include whitespace, pay careful attention to the "$f" notation. You need the double quotes to preserve the whitespace.
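The same loop can be written in Python, where whitespace in filenames needs no special quoting. This is a sketch with a hypothetical function name; skipping a rename when the target already exists is an added safety assumption, not something the shell loop does.

```python
from pathlib import Path


def rename_replacing(directory, old, new, pattern="*.jpg"):
    """Replace the first occurrence of `old` with `new` in matching filenames."""
    renamed = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(directory).glob(pattern)):
        if old not in path.name:
            continue
        target = path.with_name(path.name.replace(old, new, 1))
        if target.exists():
            continue  # never clobber an existing file
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

For example, rename_replacing(".", "IMG", "VACATION") mirrors the shell loop above and returns the list of (old name, new name) pairs.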

Recursively rename files using find and sed

This happens because sed receives the string {} as input, as can be verified with:

find . -exec echo `echo "{}" | sed 's/./foo/g'` \;

which prints foofoo for each file in the directory, recursively. The reason for this behavior is that the pipeline is executed once, by the shell, when it expands the entire command.

There is no way of quoting the sed pipeline in such a way that find will execute it for every file, since find doesn't execute commands via the shell and has no notion of pipelines or backquotes. The GNU findutils manual explains how to perform a similar task by putting the pipeline in a separate shell script:

#!/bin/sh
echo "$1" | sed 's/_test.rb$/_spec.rb/'

(There may be some perverse way of using sh -c and a ton of quotes to do all this in one command, but I'm not going to try.)
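If shell quoting gets in the way, the recursive rename can also be sketched in Python, which sidesteps the question of when the pipeline is expanded. The function name and its defaults are assumptions chosen to match the _test.rb to _spec.rb example above.

```python
import re
from pathlib import Path


def rename_recursively(root, pattern=r"_test\.rb$", replacement="_spec.rb"):
    """Apply a regex substitution to every filename below `root`."""
    renamed = []
    # sorted() takes a snapshot of the tree before anything is renamed.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # directories are left alone
        new_name = re.sub(pattern, replacement, path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))
            renamed.append((path.name, new_name))
    return renamed
```

Here the substitution runs once per file inside the loop, which is exactly what the backquoted pipeline in the find command fails to do.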

Bash - How to rename files inside a directory based on names.txt

Since a rename with the mv command is not reversible, I suggest a three-step approach:

  1. Write all the mv commands into a file renames.sh.

    awk '{print "mv \""$1"\"", "\""$2"\""}' RS="\n\n" names.txt > renames.sh
  2. Inspect and correct the renames.sh file.

Note: clear all quotes ' and/or double-quotes " from your file names. Each file name needs to be wrapped in double-quotes ". Script will fail if there is ' or " in file name.


  3. Execute all the mv commands by running the renames.sh file as a script.

    bash renames.sh

awk script explanation:

RS="\n\n"

Set awk record separator to empty line.

print "mv \""$1"\"", "\""$2"\""

Print a bash command mv "file1" "file2" .

File1 retrieved from 1st awk field $1, File2 retrieved from 2nd awk field $2.
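The generation step can also be sketched in Python. The function name is hypothetical, and it assumes the same input layout the awk command implies: records separated by an empty line, with the old name first and the new name second. Using shlex.quote means names containing quote characters are escaped instead of breaking the script.

```python
import shlex


def build_mv_commands(text):
    """Turn blank-line-separated (old, new) records into mv command lines."""
    commands = []
    for record in text.split("\n\n"):
        fields = record.split()
        if len(fields) < 2:
            continue  # skip empty or incomplete records
        old_name, new_name = fields[0], fields[1]
        commands.append(
            "mv " + shlex.quote(old_name) + " " + shlex.quote(new_name)
        )
    return commands
```

Writing the returned lines to renames.sh keeps the inspect-before-running workflow intact. Like the awk version, this splits fields on whitespace, so names containing spaces would need a different record format.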


