Fastest Way to Tell If Two Files Have the Same Contents in Unix/Linux

Fastest way to tell if two files have the same contents in Unix/Linux?

I believe cmp will stop at the first byte difference:

cmp --silent "$old" "$new" || echo "files are different"
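If you need this check in more than one place, it could be wrapped in a small function. This is only a sketch: the function name same_contents and the sample files are made up for illustration, and -s is the POSIX short form of --silent.

```shell
# Hypothetical helper: exit status 0 when both files are byte-identical.
# cmp -s prints nothing and only reports the result through its exit status.
same_contents() {
    cmp -s "$1" "$2"
}

# Demo on throwaway files (the names are illustrative only).
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/a"
printf 'hello\n' > "$tmpdir/b"
printf 'hellO\n' > "$tmpdir/c"

if same_contents "$tmpdir/a" "$tmpdir/b"; then r1=same; else r1=different; fi
if same_contents "$tmpdir/a" "$tmpdir/c"; then r2=same; else r2=different; fi
echo "a vs b: $r1"
echo "a vs c: $r2"
rm -r "$tmpdir"
```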

Fastest way of finding differences between two files in unix?

You could try:

comm -13 <(sort file1) <(sort file2) > file3

or

grep -Fxvf file1 file2 > file3

or

diff file1 file2 | grep '^>' | sed 's/^> //' > file3

or

join -v 2 <(sort file1) <(sort file2) > file3
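To see what these commands select, here is a small sketch on made-up data. The file contents are assumptions for illustration, and temporary sorted copies stand in for the <(sort ...) process substitutions so that it also runs in plain sh.

```shell
tmpdir=$(mktemp -d)
# Illustrative sample data, not taken from the question.
printf '%s\n' apple banana cherry > "$tmpdir/file1"
printf '%s\n' banana date apple elderberry > "$tmpdir/file2"

# comm expects sorted input, so sort copies of both files first.
sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -1 hides lines unique to file1, -3 hides lines common to both,
# which leaves the lines that appear only in file2.
only_in_file2_comm=$(comm -13 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")

# -F fixed strings, -x whole-line match, -v invert, -f read patterns from file1.
# No sorting is needed here; the output keeps the order of file2.
only_in_file2_grep=$(grep -Fxvf "$tmpdir/file1" "$tmpdir/file2")

printf 'comm:\n%s\n' "$only_in_file2_comm"
printf 'grep:\n%s\n' "$only_in_file2_grep"
rm -r "$tmpdir"
```

On this data both variants report date and elderberry, the two lines that exist only in file2.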

How do I compare two files in unix based on their columns

From the condition c[$1] == 0 in the awk script from the question I assumed you want to print lines from file2 that contain a code that is not present in file1.

Now that it has been clarified that you want to count the codes that are present in both files, see the end of the answer for the reverse check.

Slight modifications to your script will fix the problems:

awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1]++ == 0' file1 file2

Option -F , specifies comma (,) as field separator.

The condition if(NR!=1)c[$1]++; skips the header line in file1.

The post-increment operator in c[$1]++ == 0 will make the condition fail for the second or later occurrence of the same code in file2.

I omit the trailing | wc -l here to show the output lines.

I modified file2 to contain two lines with the same code in column 1 that is not present in file1.

With file2 shown here

AND,Europe,Andorra,2020-07-26,897.0
ABW,North America,Aruba,2020-03-13,2.0
ABW,North America,Aruba,2020-10-06,4079.0
ALB,Europe,Albania,2020-08-23,8275.1
ALB,Europe,Albania,2020-08-23,8275.2
AFG,Asia,Afghanistan,2020-09-06,38324.0
AFG,Asia,Afghanistan,2020-09-06,38324.0

and file1 from the question I get this output:

AND,Europe,Andorra,2020-07-26,897.0
ALB,Europe,Albania,2020-08-23,8275.1

(Only the first line with ALB is printed.)

You can also implement the counting in awk instead of using wc -l:

awk -F , 'NR==FNR { if(NR!=1)c[$1]++; next } c[$1]++ == 0 {count++} END {print count+0}' file1 file2

If you want to print the lines from file2 that contain a code that is present in file1, the script can be modified like this:

awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1] { c[$1]=0; print}' file1 file2

This prints

ABW,North America,Aruba,2020-03-13,2.0
AFG,Asia,Afghanistan,2020-09-06,38324.0

(The first line with code ABW.)
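For a self-contained way to try both checks, here is a sketch that builds the two files in a temporary directory. The file1 contents are a hypothetical stand-in (a header line plus the codes ABW and AFG), since the real file1 is only shown in the question. It prints just the code column instead of the whole line.

```shell
tmpdir=$(mktemp -d)
# Hypothetical stand-in for file1: a header line plus two codes.
cat > "$tmpdir/file1" <<'EOF'
iso_code,continent,location
ABW,North America,Aruba
AFG,Asia,Afghanistan
EOF
cat > "$tmpdir/file2" <<'EOF'
AND,Europe,Andorra,2020-07-26,897.0
ABW,North America,Aruba,2020-03-13,2.0
ABW,North America,Aruba,2020-10-06,4079.0
ALB,Europe,Albania,2020-08-23,8275.1
ALB,Europe,Albania,2020-08-23,8275.2
AFG,Asia,Afghanistan,2020-09-06,38324.0
AFG,Asia,Afghanistan,2020-09-06,38324.0
EOF

# Codes from file2 that are missing in file1 (first occurrence only).
missing_codes=$(awk -F, 'NR==FNR { if (NR!=1) c[$1]++; next } c[$1]++ == 0 { print $1 }' "$tmpdir/file1" "$tmpdir/file2")

# Codes from file2 that are present in file1 (first occurrence only).
present_codes=$(awk -F, 'NR==FNR { if (NR!=1) c[$1]++; next } c[$1] { c[$1]=0; print $1 }' "$tmpdir/file1" "$tmpdir/file2")

printf 'missing in file1:\n%s\n' "$missing_codes"
printf 'present in file1:\n%s\n' "$present_codes"
rm -r "$tmpdir"
```

With this stand-in file1, the first command reports AND and ALB and the second reports ABW and AFG.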


Alternative solution as requested in a comment.

tail -n +2 file1 | cut -f1 -d, | sort -u > code1
cut -f1 -d, file2 | sort -u > code2
fgrep -f code1 code2
rm code1 code2

Or combined in one command without using temporary files code1 and code2:

fgrep -f <(tail -n +2 file1|cut -f1 -d,|sort -u) <(cut -f1 -d, file2|sort -u)

Add | wc -l to count the lines instead of printing them.

Explanation:

tail -n +2 print everything starting from the 2nd line

cut -f1 -d, print the first field, delimited with ,

sort -u sort lines and remove duplicates

fgrep -f code1 code2 print all lines from code2 that contain any of the strings from code1
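The steps above can be tried end to end with the following sketch. The sample data is hypothetical, the intermediate files live in a temporary directory, and grep -F is used as the standardized spelling of fgrep.

```shell
tmpdir=$(mktemp -d)
# Hypothetical sample data; file1 has a header line, file2 does not.
cat > "$tmpdir/file1" <<'EOF'
iso_code,continent,location
ABW,North America,Aruba
AFG,Asia,Afghanistan
EOF
cat > "$tmpdir/file2" <<'EOF'
AND,Europe,Andorra,2020-07-26,897.0
ABW,North America,Aruba,2020-03-13,2.0
ABW,North America,Aruba,2020-10-06,4079.0
AFG,Asia,Afghanistan,2020-09-06,38324.0
EOF

# Unique codes from each file (the header of file1 is skipped).
tail -n +2 "$tmpdir/file1" | cut -f1 -d, | sort -u > "$tmpdir/code1"
cut -f1 -d, "$tmpdir/file2" | sort -u > "$tmpdir/code2"

# Codes from file2 that also occur in file1.
common_codes=$(grep -F -f "$tmpdir/code1" "$tmpdir/code2")
# tr strips the padding that some wc implementations add around the number.
common_count=$(grep -F -f "$tmpdir/code1" "$tmpdir/code2" | wc -l | tr -d ' ')

printf '%s\n' "$common_codes"
echo "count: $common_count"
rm -r "$tmpdir"
```

Because grep -F matches substrings, adding -x would restrict it to whole-line matches if one code can be contained in another.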

compare two files in UNIX

I got the solution by using comm

comm -23 file1 file2 

will give you the desired output.

Note that comm expects both files to be sorted first.
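A sketch of the whole sequence, with made-up file contents: sort -c only checks whether a file is already sorted, and the sorted copies are then passed to comm.

```shell
tmpdir=$(mktemp -d)
# Illustrative data; file1 is deliberately unsorted.
printf '%s\n' pear apple fig > "$tmpdir/file1"
printf '%s\n' fig kiwi > "$tmpdir/file2"

# sort -c exits with status 0 only if the input is already sorted.
if sort -c "$tmpdir/file1" 2>/dev/null; then file1_sorted=yes; else file1_sorted=no; fi
echo "file1 already sorted: $file1_sorted"

sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -2 hides lines unique to file2, -3 hides lines common to both,
# which leaves the lines that appear only in file1.
only_in_file1=$(comm -23 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$only_in_file1"
rm -r "$tmpdir"
```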

How to detect only the different files in my bash shell script?

Here is your script corrected:

while IFS= read -r filename;
do
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# inspecting the digest of each file individually #
# shows many files are identical and so are the digests #
# It also prints MD5 (full file path) = md5_signature! #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
md5 "old/$filename" # please use double quotes
md5 "new/$filename"
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Using -q eliminates all output from md5 except the sig #
# Your script now works correctly #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

[[ $(md5 -q "old/$filename") == $(md5 -q "new/$filename") ]] || echo differs; # differs
done < files.txt

Problems:

  1. You had a typo of new/$fullfile rather than new/$filename
  2. You should use "new/$filename" (i.e., use double quotes) around the file name expansions
  3. Use md5 -q to compare output of md5 on different files. Otherwise md5, by default, prints the input file path in the form of MD5 (full_path/base_name) = 2504fcc0c0a57d14aa6b4193b5efaf94. Since these paths are guaranteed to be different in two different directories, the different path names will cause the failure in the string comparison.

The comments above assume you are using md5 on BSD or, likely, on macOS.

Here is an alternate solution that works both on Linux with md5sum and BSD with md5. Just feed the content of the file to the stdin of either program and only the md5 signature is printed:

$ md5 <new/file.pdf
2504fcc0c0a57d14aa6b4193b5efaf94

Whereas if you pass the file name as an argument, both the path and the MD5 signature are printed:

$ md5 new/file.pdf
MD5 (new/file.pdf) = 2504fcc0c0a57d14aa6b4193b5efaf94

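The same approach works with md5sum on Linux (GNU core utilities). When reading from stdin, md5sum prints the hash followed by " -" instead of a file path, so the outputs for two files are still directly comparable.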
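One way to hide the platform difference is a small wrapper that always returns just the hash. This is a sketch under assumptions: the name md5_of and the sample files are hypothetical, and it simply prefers md5sum when that command exists and falls back to md5 otherwise.

```shell
# Hypothetical helper: print only the MD5 hex digest of a file.
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        # GNU md5sum prints "<hash>  -" for stdin, so keep only the first field.
        md5sum < "$1" | awk '{print $1}'
    else
        # BSD/macOS md5 prints just the hash when reading stdin.
        md5 < "$1"
    fi
}

# Demo on throwaway files (names are illustrative only).
tmpdir=$(mktemp -d)
mkdir "$tmpdir/old" "$tmpdir/new"
printf 'same bytes\n' > "$tmpdir/old/file.txt"
printf 'same bytes\n' > "$tmpdir/new/file.txt"
printf 'other bytes\n' > "$tmpdir/new/other.txt"

h_old=$(md5_of "$tmpdir/old/file.txt")
h_new=$(md5_of "$tmpdir/new/file.txt")
h_other=$(md5_of "$tmpdir/new/other.txt")

[ "$h_old" = "$h_new" ] && echo "file.txt: identical" || echo "file.txt: differs"
[ "$h_old" = "$h_other" ] && echo "other.txt: identical" || echo "other.txt: differs"
rm -r "$tmpdir"
```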

Compare two files line by line and find the largest and smallest number using shell scripting

You can use:

sort -n file1 file2 > _sorted.tmp
min=$(head -n 1 _sorted.tmp)
max=$(tail -n 1 _sorted.tmp)

Without temporary file:

arr=( $(sort -n file1 file2) )
min=${arr[0]}   # bash arrays are zero-indexed
max=${arr[@]:(-1)}
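If sorting is not needed for anything else, a single awk pass is another option. This is a different technique from the sort-based answers above, shown only as a sketch: it assumes one number per line and non-empty input, and the sample values are made up.

```shell
tmpdir=$(mktemp -d)
# Illustrative data: one number per line.
printf '%s\n' 5 12 -3 > "$tmpdir/file1"
printf '%s\n' 7 40 0 > "$tmpdir/file2"

# Track the smallest and largest value seen so far; "+ 0" forces numeric comparison.
min_max=$(awk '
    NR == 1      { min = max = $1 + 0 }
    $1 + 0 < min { min = $1 + 0 }
    $1 + 0 > max { max = $1 + 0 }
    END          { print min, max }
' "$tmpdir/file1" "$tmpdir/file2")

echo "min and max: $min_max"
rm -r "$tmpdir"
```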

IF Statement to Compare Two Files in Unix

if [ "$(md5sum < version1.txt)" = "$(md5sum < version2.txt)" ]; then
    echo "Files have the same content"
else
    echo "Files do NOT have the same content"
fi

If one of the MD5 checksums is already computed and stored in a text file, you can use

if [ "$(md5sum < version1.txt | awk '{print $1}')" = "$(awk '{print $1}' md5hash.txt)" ]; then
...

Comparing two files in linux terminal

Here is my solution:

mkdir -p ~/temp ~/results
cp /usr/share/dict/american-english ~/temp/american-english-dictionary
cp /usr/share/dict/british-english ~/temp/british-english-dictionary
wc -l < ~/temp/american-english-dictionary > ~/results/count-american-english-dictionary
wc -l < ~/temp/british-english-dictionary > ~/results/count-british-english-dictionary
grep -Fxf ~/temp/american-english-dictionary ~/temp/british-english-dictionary > ~/results/common-english
grep -Fxvf ~/results/common-english ~/temp/american-english-dictionary > ~/results/unique-american-english
grep -Fxvf ~/results/common-english ~/temp/british-english-dictionary > ~/results/unique-british-english

Compare two files and display difference in table form linux shell script

If you would like nice side-by-side output, you can use:

$ diff -y --suppress-common-lines file1.txt file2.txt

Example Use/Output

$ diff -y --suppress-common-lines file1.txt file2.txt
2:tar-1.23-13.el6.x86_64/ | 2:tar-1.23-15.el6_8.x86_64/
> samba-common-3.6.23-43.el6_9.x86_64/
> samba-winbind-clients-3.6.23-43.el6_9.x86_64/
> samba-winbind-3.6.23-43.el6_9.x86_64/

