Rename Part of File Name Based on Exact Match in Contents of Another File


You can combine your for and while loops so that mv is the only external command needed:

while read -r from to; do
    for i in test*; do
        # Rename only when the substitution actually changes the name
        if [ "$i" != "${i/$from/$to}" ]; then
            mv -- "$i" "${i/$from/$to}"
        fi
    done
done < replacements.txt

An alternative with GNU sed is the e flag of the s command, which executes the result of the substitution as a shell command. (Use with caution! Run it without the trailing e first to print the commands that would be executed.)

Hence:

sed 's/\(\w\+\)\s\+\(\w\+\)/mv sample_\1\.txt sample_\2\.txt/e' replacements.txt

would parse your replacements.txt file and rename all your .txt files as desired.

We just have to add a loop to deal with the other extensions:

for j in .txt .bak .tsv .fq .fq.abc ; do
    sed "s/\(\w\+\)\s\+\(\w\+\)/mv 'sample_\1$j' 'sample_\2$j'/e" replacements.txt
done

(Note that mv prints an error message whenever it is asked to rename a file that does not exist, for example when it tries to execute mv sample_ACGT.fq sample_name1.fq but sample_ACGT.fq does not exist.)
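The "print first, then execute" workflow can also be sketched in Python, for readers who would rather not have sed run commands. This is only an illustration under assumptions: the function names plan_renames and apply_renames are hypothetical, the sample_ prefix and the extension list are taken from the commands above, and replacements.txt is assumed to hold one "old new" pair per line.

```python
from pathlib import Path

# Extensions taken from the loop above
EXTENSIONS = (".txt", ".bak", ".tsv", ".fq", ".fq.abc")


def plan_renames(replacements_text, directory, prefix="sample_", extensions=EXTENSIONS):
    """Return (source, target) pairs, but only for source files that exist."""
    plan = []
    for line in replacements_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        old, new = parts
        for ext in extensions:
            src = Path(directory) / f"{prefix}{old}{ext}"
            if src.is_file():
                plan.append((src, src.with_name(f"{prefix}{new}{ext}")))
    return plan


def apply_renames(plan, dry_run=True):
    """Print the planned renames, or perform them when dry_run is False."""
    for src, dst in plan:
        if dry_run:
            print(f"mv {src} {dst}")
        else:
            src.rename(dst)
```

Calling apply_renames(plan_renames(Path("replacements.txt").read_text(), ".")) only prints the planned moves; passing dry_run=False performs them. Because missing files are filtered out while planning, no "file not found" errors are produced.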

Replace exact part of file name with Powershell

In a regex, the dot matches any character. If you do not escape it, the pattern matches more than you intended.

Try

Rename-Item -NewName {$_.Name -replace ('{0}$' -f [regex]::Escape('.L.wav')),'_L.wav'}

or manually escape the regex metacharacters:

Rename-Item -NewName {$_.Name -replace '\.L\.wav$','_L.wav'}

The $ at the end anchors the match to the end of the string.

Also, instead of ls *.* | Rename-Item {...}, it is better to use

(Get-ChildItem -Filter '*.L.wav' -File) | Rename-Item {...}

(ls is an alias for Get-ChildItem.)

  • Using the -Filter you can specify what files you're looking for.
  • Using the -File switch, you make sure you do not also try to rename folder objects.
  • By surrounding the Get-ChildItem part of the code in parentheses, you make sure the gathering of the files is complete before you start renaming them. Otherwise, the pipeline may pick up files that have already been renamed and try to rename them again.
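The same three ideas (escape the dots, anchor at the end, collect the file list before renaming) can be sketched in Python. This is an illustrative equivalent, not part of the PowerShell answer; new_name and rename_all are hypothetical helper names.

```python
import re
from pathlib import Path

SUFFIX = ".L.wav"
# re.escape makes the dots literal; "$" anchors the match at the end
PATTERN = re.compile(re.escape(SUFFIX) + "$")


def new_name(name):
    """Replace a trailing '.L.wav' with '_L.wav'; other names are unchanged."""
    return PATTERN.sub("_L.wav", name)


def rename_all(directory):
    # Build the complete list first, so renamed files are never revisited
    files = [p for p in Path(directory).glob("*.L.wav") if p.is_file()]
    for path in files:
        path.rename(path.with_name(new_name(path.name)))
```

With the unescaped pattern, re.sub(".L.wav$", "_L.wav", "xLywav") rewrites a name that contains no dots at all, which is the kind of surprise the escaping avoids.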

Replacing / removing specific part of file names using regex

You can use the regex:

_\d{6}(\.[^.]+)$

and replace with $1 instead.

The regex matches an underscore followed by six digits; group 1, (\.[^.]+), then captures the extension, which the replacement string $1 puts back. The extension is matched as "a dot followed by one or more non-dot characters". The end-of-string anchor $ asserts that all of this must occur at the end of the string.

Change your code to:

string newName = Regex.Replace(f.FullName, @"_\d{6}(\.[^.]+)$", "$1");
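To see the pattern at work outside of C#, here is a minimal Python sketch using the same regex. The function name strip_numeric_suffix and the sample file names in the usage note are made up for illustration; Python writes the group reference as \1 rather than $1.

```python
import re

# Underscore, six digits, then the captured extension at the end of the string
SUFFIX = re.compile(r"_\d{6}(\.[^.]+)$")


def strip_numeric_suffix(file_name):
    """Drop a trailing '_dddddd' that sits right before the extension."""
    return SUFFIX.sub(r"\1", file_name)
```

For example, strip_numeric_suffix("report_123456.txt") gives "report.txt", while "report_12345.txt" (only five digits) is left unchanged.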

How to rename files using certain string inside each file?

You can use the following script to achieve your goal. Note that the script relies on GNU grep (for the -P option). To run it on macOS, install GNU grep via Homebrew and replace the grep call with ggrep.

  • The script will search the current directory and all its subdirectories for *.html files.
  • It will substitute only the names of the files that contain the specific tag.
  • For multiple files that contain the same tag, each subsequent file after the first will have an identifier appended to its name, e.g. 1_234.html, 1_234_1.html, 1_234_2.html.
  • For files that contain multiple tags, the first tag encountered will be used.

#!/bin/bash

rename_file ()
{
    # Check that the file name received is an existing regular file
    file_name="$(realpath "${1}")"
    if [ ! -f "${file_name}" ]; then
        echo "No argument or non existing file or non regular file provided"
        exit 1
    fi

    # Get the tag number. If the number does not exist, the variable tag will be
    # empty. The first tag on a file will be used if there are multiple tags
    # within a file.
    tag="$(grep -oP -m 1 '(?<=<div id="myID" style="display:none">).*?(?=</div>)' \
        -- "${file_name}")"

    # Rename the file only if it contained a tag
    if [ -n "${tag}" ]; then
        file_path="$(dirname "${file_name}")"

        # Change directory to the file's location silently
        pushd "${file_path}" > /dev/null || return

        # Check for multiple occurrences of files with the same tag.
        # The parentheses make -type f apply to both -name tests.
        if [ -e "${tag}.html" ]; then
            counter="$(find ./ -maxdepth 1 -type f \( -name "${tag}.html" -o -name "${tag}_*.html" \) | wc -l)"
            tag="${tag}_${counter}"
        fi

        # Rename the file
        mv "${file_name}" "${tag}.html"

        # Return to previous directory silently
        popd > /dev/null || return
    fi
}

# Necessary in order to call rename_file from find command within main
export -f rename_file

# The entry point function of the script. This function searches for all the
# html files in the directory that the script is run, and all subdirectories.
# The function calls rename_file upon each of the found files.
main ()
{
    find ./ -type f -name "*.html" -exec bash -c 'rename_file "${1}"' _ {} \;
}

main
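
The naming rule described in the bullets (first file gets tag.html, later ones get tag_1.html, tag_2.html, and so on) can be tried out in isolation with a small Python sketch. It is a simplified stand-in for the script, not a port of it: target_name and rename_by_tag are hypothetical names, the tag is assumed to sit on a single line, and files are read as UTF-8.

```python
import re
from pathlib import Path

# Same hidden div as in the grep call; the group captures the tag text
TAG = re.compile(r'<div id="myID" style="display:none">(.*?)</div>')


def target_name(tag, directory):
    """Return tag.html, or tag_N.html when tag.html is already taken."""
    directory = Path(directory)
    if not (directory / f"{tag}.html").exists():
        return f"{tag}.html"
    # Count tag.html plus every existing tag_*.html, like the find | wc -l step
    taken = 1 + len(list(directory.glob(f"{tag}_*.html")))
    return f"{tag}_{taken}.html"


def rename_by_tag(path):
    """Rename one HTML file after its first tag; return the resulting path."""
    path = Path(path)
    match = TAG.search(path.read_text(encoding="utf-8", errors="replace"))
    if match is None or not match.group(1):
        return path  # no tag found: leave the file alone
    target = path.with_name(target_name(match.group(1), path.parent))
    path.rename(target)
    return target
```

To cover a whole tree, as main does with find, one could loop over Path(".").rglob("*.html"), collecting the results into a list first so that renamed files are not visited again.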

mv/rename files with common part but unknown file pattern

I wasn't looking in the right place: the if test was my real problem. This version works:

ABC_Files=$(ls "$DOSSIER/$OLD_NAME"*.abc 2> /dev/null | wc -l)
if [ "$ABC_Files" != "0" ]; then
    for i in "${DOSSIER}/$OLD_NAME"*.abc; do
        [ -f "$i" ] || continue
        mv "$i" "${i/$OLD_NAME/$NEW_NAME}"
    done
fi

This assumes that:

$DOSSIER is the path

$OLD_NAME is the current file name

$NEW_NAME is the new file name

Match file names and replace with new name

A simple solution in Python:

from collections import OrderedDict

LINES_PER_CYCLE = 1000

with open('output.txt', 'w') as output, open('test_2.txt') as fin:
    fin_line = ''

    # Loop until fin reaches EOF.
    while True:
        cache = OrderedDict()

        # Fill the cache with up to LINES_PER_CYCLE entries.
        for _ in range(LINES_PER_CYCLE):
            fin_line = fin.readline()
            if not fin_line:
                break

            key, rest = fin_line.strip().split(' ', 1)
            cache[key] = ['', rest]

        # Scan test_1.txt for tags whose id is in the cache.
        with open('test_1.txt') as f1:
            for line in f1:
                tag, _ = line.split(' ', 1)
                _, idx = tag.rsplit('_', 1)
                if idx in cache:
                    cache[idx][0] = tag

        # Write matched lines to the output file, in the same order
        # as the lines were inserted into the cache.
        for tag, rest in cache.values():
            output.write('{} {}\n'.format(tag, rest))

        # If fin has reached EOF, break.
        if not fin_line:
            break

It reads up to LINES_PER_CYCLE entries from test_2.txt, finds the matching entries in test_1.txt, and writes them to the output. Because the cache is capped to limit memory use, test_1.txt is scanned once per cycle, i.e. multiple times.

This assumes that the tag/id part is separated by whitespace from the -------, and that the tag and id are separated from each other by an underscore, i.e. 'tag_idx blah blah'.
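
The assumed line format can be checked in isolation with a small helper. split_entry is a hypothetical name, and the sample line in the usage note is invented; the two split calls are the same ones the loop over test_1.txt uses.

```python
def split_entry(line):
    """Split a 'tag_idx rest of line' entry into (tag, idx, rest)."""
    # The first space separates the tag/id part from the rest of the line
    tag, rest = line.rstrip("\n").split(" ", 1)
    # The id is whatever follows the last underscore of the tag/id part
    _, idx = tag.rsplit("_", 1)
    return tag, idx, rest
```

For example, split_entry("abc_42 blah blah") returns ("abc_42", "42", "blah blah"): the full first token is kept as the tag, and "42" is the key looked up in the cache. A line without a space or without an underscore raises ValueError, which is one way to notice input that violates the assumption.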


