Bash: Checking If Files Are Duplicates Within a Directory

How do I check for duplicate files and, if a duplicate is found, append a suffix to the filename?

You can have cp make the backup:

cp --backup --suffix=.JPG "$image" "$DEST"

From man cp:

--backup[=CONTROL]
       make a backup of each existing destination file

-S, --suffix=SUFFIX
       override the usual backup suffix
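For example, here is a minimal sketch (the $DEST directory and the image name below are just placeholders): if the destination already holds a file with the same name, cp keeps the old copy under the .JPG backup suffix and writes the new one under the original name.

DEST=/photos          # hypothetical destination directory
image=holiday.jpg     # hypothetical source file
cp --backup --suffix=.JPG "$image" "$DEST"
# If $DEST/holiday.jpg already existed, the old copy is kept as
# $DEST/holiday.jpg.JPG and the freshly copied file becomes $DEST/holiday.jpg.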

Bash Script To Check for Duplicate File Names from Command Line Arguments

You are overcomplicating things.

for var in "$@"
do
    if [ -e "$dirName/$var" ]; then
        # prompt user with options goes here (i can handle this part :)
        :
    else
        mv "$var" "$dirName"
    fi
done

Make sure you use adequate quoting everywhere, by the way. Variables which contain file names should basically always be double quoted.
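For completeness, here is a sketch of the same loop with a simple overwrite prompt filled in; taking the target directory from the first argument and the y/N question are only placeholders for whatever options you actually want to offer:

#!/bin/bash
# Hypothetical usage: ./movefiles.sh TARGET_DIR FILE...
dirName=$1
shift

for var in "$@"
do
    if [ -e "$dirName/$var" ]; then
        read -r -p "\"$var\" already exists in $dirName - overwrite? [y/N] " answer
        if [ "$answer" = "y" ]; then
            mv -f "$var" "$dirName"
        fi
    else
        mv "$var" "$dirName"
    fi
done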

How to find duplicate files with same name but in different case that exist in same directory in Linux?

The other answer is great, but instead of the "rather monstrous" perl script I suggest:

perl -pe 's!([^/]+)$!lc $1!e'

This lowercases just the filename part of the path.
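For example, only the last path component gets lowercased:

echo 'Some/Dir/FileName.TXT' | perl -pe 's!([^/]+)$!lc $1!e'
Some/Dir/filename.txt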

Edit 1: In fact the entire problem can be solved with:

find . | perl -ne 's!([^/]+)$!lc $1!e; print if 1 == $seen{$_}++'

Edit 3: I found a solution using sed, sort and uniq that will also print out the duplicates, but it only works if there is no whitespace in the filenames:

find . |sed 's,\(.*\)/\(.*\)$,\1/\2\t\1/\L\2,'|sort|uniq -D -f 1|cut -f 1
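As a quick illustration, in a hypothetical directory containing only Readme.TXT and readme.txt, the sed step pairs each path with a lowercased copy, uniq -D -f 1 compares only that second field, and cut returns the original paths:

find . | sed 's,\(.*\)/\(.*\)$,\1/\2\t\1/\L\2,' | sort | uniq -D -f 1 | cut -f 1
./Readme.TXT
./readme.txt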

Edit 2: And here is a longer script that will print out the names. It takes a list of paths on stdin, as given by find. Not so elegant, but still:

#!/usr/bin/perl -w

use strict;
use warnings;

my %dup_series_per_dir;
while (<>) {
    my ($dir, $file) = m!(.*/)?([^/]+?)$!;
    push @{$dup_series_per_dir{$dir||'./'}{lc $file}}, $file;
}

for my $dir (sort keys %dup_series_per_dir) {
    my @all_dup_series_in_dir = grep { @{$_} > 1 } values %{$dup_series_per_dir{$dir}};
    for my $one_dup_series (@all_dup_series_in_dir) {
        print "$dir\{" . join(',', sort @{$one_dup_series}) . "}\n";
    }
}
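A possible way to run it, assuming the script above has been saved as dup_names.pl (that filename and the sample output line are only illustrative):

find . | perl dup_names.pl
./photos/{IMG_001.jpg,img_001.JPG}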

Linux: what's a fast way to find all duplicate files in a directory?

On Debian 11:

% mkdir files; (cd files; echo "one" > 1; echo "two" > 2a; cp 2a 2b)
% find files/ -type f -print0 | xargs -0 md5sum | tee listing.txt | \
awk '{print $1}' | sort | uniq -c | awk '$1>1 {print $2}' > dups.txt
% grep -f dups.txt listing.txt
c193497a1a06b2c72230e6146ff47080 files/2a
c193497a1a06b2c72230e6146ff47080 files/2b
  • Find all files and print them null-terminated (-print0).
  • Use xargs to md5sum them.
  • Use tee to save a copy of the sums and filenames in the "listing.txt" file.
  • Grab the sum with awk and pass it to sort, then to uniq -c to count occurrences.
  • Use a second awk to keep only the sums that appear more than once, saving them into the "dups.txt" file, then grep listing.txt for those sums to get the matching filenames.
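If you have GNU coreutils, roughly the same result can be had without the temporary files by letting uniq compare only the 32-character hash at the start of each md5sum line; this is a sketch rather than a drop-in replacement for the listing.txt/dups.txt approach above:

find files/ -type f -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

Each group of files sharing a hash is printed together, with a blank line between groups.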

