How to check for duplicate files, if duplicates are found then append filenames?
You can have cp make the backup:
cp --backup --suffix=.JPG "$image" "$DEST"
From man cp:
--backup[=CONTROL]
make a backup of each existing destination file
-S, --suffix=SUFFIX
override the usual backup suffix
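As a quick sanity check of that behavior (the /tmp/cpdemo paths and the photo.JPG name are invented for this demo):

```shell
# A destination that already holds photo.JPG, plus a new photo.JPG to copy in.
rm -rf /tmp/cpdemo
mkdir -p /tmp/cpdemo/dest
echo "old" > /tmp/cpdemo/dest/photo.JPG
echo "new" > /tmp/cpdemo/photo.JPG

# cp renames the existing destination file to photo.JPG.JPG (the suffix is
# appended) before copying the new file into place.
cp --backup --suffix=.JPG /tmp/cpdemo/photo.JPG /tmp/cpdemo/dest/
```

Afterwards dest/ contains both photo.JPG (the new copy) and photo.JPG.JPG (the backup of the old file).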
Bash Script To Check for Duplicate File Names from Command Line Arguments
You are overcomplicating things.
for var in "$@"
do
    if [ -e "$dirName/$var" ]; then
        : # prompt user with options - i can handle this part :)
    else
        mv "$var" "$dirName"
    fi
done
Make sure you use adequate quoting everywhere, by the way. Variables which contain file names should basically always be double quoted.
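For instance, the prompt branch could be filled in along these lines (the move_unique name and the /tmp/movedemo default are invented for this sketch):

```shell
# Sketch of the loop above as a function; dirName falls back to an
# invented default if the caller has not set it.
move_unique() {
    local dirName="${dirName:-/tmp/movedemo}"
    mkdir -p "$dirName"
    for var in "$@"
    do
        if [ -e "$dirName/$var" ]; then
            # Ask before clobbering an existing file; default answer is "no".
            read -r -p "Overwrite $dirName/$var? [y/N] " answer
            [ "$answer" = "y" ] && mv -f -- "$var" "$dirName"
        else
            mv -- "$var" "$dirName"
        fi
    done
}
```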
How to find duplicate files with same name but in different case that exist in same directory in Linux?
The other answer is great, but instead of the "rather monstrous" Perl script I suggest
perl -pe 's!([^/]+)$!lc $1!e'
Which will lowercase just the filename part of the path.
Edit 1: In fact the entire problem can be solved with:
find . | perl -ne 's!([^/]+)$!lc $1!e; print if 1 == $seen{$_}++'
Edit 3: I found a solution using sed, sort and uniq that will also print out the duplicates, but it only works if there is no whitespace in filenames:
find . |sed 's,\(.*\)/\(.*\)$,\1/\2\t\1/\L\2,'|sort|uniq -D -f 1|cut -f 1
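The same invented test directory run through that pipeline; note that \t and \L are GNU sed features, and that this demo keys the sort on the lowercased second field so duplicates stay adjacent regardless of locale collation:

```shell
# Builds "original<TAB>lowercased" pairs, groups on the lowercased field,
# and keeps only the original names from groups that repeat.
d=$(mktemp -d) && cd "$d"
touch Foo.txt foo.TXT bar.txt
dups=$(find . | sed 's,\(.*\)/\(.*\)$,\1/\2\t\1/\L\2,' \
  | LC_ALL=C sort -t "$(printf '\t')" -k 2 | uniq -D -f 1 | cut -f 1)
printf '%s\n' "$dups"
```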
Edit 2: And here is a longer script that will print out the names. It takes a list of paths on stdin, as given by find. Not so elegant, but still:
#!/usr/bin/perl -w
use strict;
use warnings;

my %dup_series_per_dir;
while (<>) {
    my ($dir, $file) = m!(.*/)?([^/]+?)$!;
    push @{$dup_series_per_dir{$dir||'./'}{lc $file}}, $file;
}

for my $dir (sort keys %dup_series_per_dir) {
    my @all_dup_series_in_dir = grep { @{$_} > 1 } values %{$dup_series_per_dir{$dir}};
    for my $one_dup_series (@all_dup_series_in_dir) {
        print "$dir\{" . join(',', sort @{$one_dup_series}) . "}\n";
    }
}
Linux: what's a fast way to find all duplicate files in a directory?
On Debian 11:
% mkdir files; (cd files; echo "one" > 1; echo "two" > 2a; cp 2a 2b)
% find files/ -type f -print0 | xargs -0 md5sum | tee listing.txt | \
awk '{print $1}' | sort | uniq -c | awk '$1>1 {print $2}' > dups.txt
% grep -f dups.txt listing.txt
c193497a1a06b2c72230e6146ff47080 files/2a
c193497a1a06b2c72230e6146ff47080 files/2b
- Find and print all files null terminated (-print0).
- Use xargs to md5sum them.
- Save a copy of the sums and filenames in the "listing.txt" file.
- Grab the sum and pass to sort, then uniq -c to count, saving into the "dups.txt" file.
- Use awk to list duplicates, then grep to find the sum and filename.
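The listing/dups round trip can also be collapsed into a single awk pass (a sketch; the test directory is created fresh here, and like the original it breaks on filenames containing whitespace, since md5sum output is split on blanks):

```shell
# Remember the first filename seen per checksum; once a checksum repeats,
# print the remembered name (once) and every later match.
d=$(mktemp -d) && cd "$d"
echo "one" > 1; echo "two" > 2a; cp 2a 2b
dups=$(find . -type f -print0 | xargs -0 md5sum |
  awk '$1 in first { if (first[$1] != "") { print first[$1]; first[$1] = "" }
                     print $2; next }
       { first[$1] = $2 }')
printf '%s\n' "$dups"
```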