Fastest way to tell if two files have the same contents in Unix/Linux?
I believe cmp will stop at the first byte difference:
cmp --silent "$old" "$new" || echo "files are different"
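Here is a minimal sketch of wrapping this in a reusable check, assuming a POSIX shell. same_contents is a hypothetical helper name. cmp -s is the POSIX spelling of GNU's --silent. cmp exits with 0 when the files are identical, 1 when they differ, and a value greater than 1 on an error such as a missing file. If errors should be reported rather than treated as "different", check for the error status separately.

```shell
#!/bin/sh
# same_contents is a hypothetical helper name: it succeeds only when both
# files can be read and are byte-for-byte identical.
same_contents() {
    # -s prints nothing; only cmp's exit status is used.
    cmp -s "$1" "$2"
}

dir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$dir/old"
printf 'alpha\nbeta\n' > "$dir/copy"
printf 'alpha\ngamma\n' > "$dir/new"

if same_contents "$dir/old" "$dir/copy"; then
    echo "old and copy are identical"
fi
same_contents "$dir/old" "$dir/new" || echo "files are different"

# Exit status: 0 = identical, 1 = different, greater than 1 = error
# (here, the second file does not exist).
status=0
cmp -s "$dir/old" "$dir/missing" || status=$?
echo "cmp exit status for a missing file: $status"

rm -r "$dir"
```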
Fastest way of finding differences between two files in unix?
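You could try one of these. Each writes to file3 the lines of file2 that do not appear in file1:
comm -13 <(sort file1) <(sort file2) > file3
or
grep -Fxvf file1 file2 > file3
or
diff file1 file2 | grep '^>' | sed 's/^> //' > file3
or
join -v 2 <(sort file1) <(sort file2) > file3
Note that diff compares the files in order, so a line that merely moved can also show up. join compares only the first whitespace-separated field of each line, not the whole line.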
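You can check two of these variants against each other with hypothetical sample files. This assumes bash for the <( ) process substitution. comm -13 needs sorted input and prints its result in sorted order, while grep -Fxvf keeps the order of file2, so the grep result is sorted before comparing:

```shell
#!/usr/bin/env bash
# Hypothetical sample data; any two line-oriented files work.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' date banana apple fig > "$dir/file2"

# Lines of file2 that are not in file1 (comm needs sorted input).
via_comm=$(comm -13 <(sort "$dir/file1") <(sort "$dir/file2"))

# Same question answered with fixed-string, whole-line matching;
# grep preserves file2's order, so sort it to compare with comm.
via_grep=$(grep -Fxvf "$dir/file1" "$dir/file2" | sort)

echo "$via_comm"
rm -r "$dir"
```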
How do I compare two files in unix based on their columns
From the condition c[$1] == 0 in the awk script from the question, I assumed you want to print lines from file2 that contain a code that is not present in file1.
Since it has now been clarified that you want to count the codes that are present in both files, see the end of this answer for the reverse check.
Slight modifications to your script will fix the problems:
awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1]++ == 0' file1 file2
Option -F , specifies comma (,) as the field separator.
The statement if(NR!=1)c[$1]++; skips the header line in file1.
The post-increment operator in c[$1]++ == 0 makes the condition fail for the second and later occurrences of the same code in file2.
I omit the trailing | wc -l here to show the output lines.
I modified file2 to contain two lines with the same code in column 1 that is not present in file1.
With file2 as shown here
AND,Europe,Andorra,2020-07-26,897.0
ABW,North America,Aruba,2020-03-13,2.0
ABW,North America,Aruba,2020-10-06,4079.0
ALB,Europe,Albania,2020-08-23,8275.1
ALB,Europe,Albania,2020-08-23,8275.2
AFG,Asia,Afghanistan,2020-09-06,38324.0
AFG,Asia,Afghanistan,2020-09-06,38324.0
and file1 from the question, I get this output:
AND,Europe,Andorra,2020-07-26,897.0
ALB,Europe,Albania,2020-08-23,8275.1
(Only the first line with ALB is printed.)
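You can also implement the counting in awk instead of using wc -l:
awk -F , 'NR==FNR { if(NR!=1)c[$1]++; next } c[$1]++ == 0 {count++} END {print count+0}' file1 file2
(Printing count+0 instead of count makes the script print 0 rather than an empty line when no line matches.)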
If you want to print the lines from file2 that contain a code that is present in file1, the script can be modified like this:
awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1] { c[$1]=0; print}' file1 file2
This prints:
ABW,North America,Aruba,2020-03-13,2.0
AFG,Asia,Afghanistan,2020-09-06,38324.0
(Only the first line for each code is printed, e.g. only the first of the two ABW lines.)
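Here is a self-contained sketch that runs the three awk variants above on the file2 data from this answer. The question's file1 is not reproduced here, so the file1 below is a hypothetical stand-in (a header line followed by the codes ABW and AFG) chosen to match the outputs shown. Only its first column matters. In this sketch the counting variant prints count+0, so it prints 0 rather than an empty line when nothing matches.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Hypothetical file1: a header line, then codes in column 1.
cat > "$dir/file1" <<'EOF'
iso_code,continent,location
ABW,North America,Aruba
AFG,Asia,Afghanistan
EOF
# file2 as shown in the answer.
cat > "$dir/file2" <<'EOF'
AND,Europe,Andorra,2020-07-26,897.0
ABW,North America,Aruba,2020-03-13,2.0
ABW,North America,Aruba,2020-10-06,4079.0
ALB,Europe,Albania,2020-08-23,8275.1
ALB,Europe,Albania,2020-08-23,8275.2
AFG,Asia,Afghanistan,2020-09-06,38324.0
AFG,Asia,Afghanistan,2020-09-06,38324.0
EOF

# First line of file2 for each code that is missing from file1.
missing=$(awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1]++ == 0' "$dir/file1" "$dir/file2")
# First line of file2 for each code that is present in file1.
present=$(awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1] { c[$1]=0; print}' "$dir/file1" "$dir/file2")
# Number of distinct codes in file2 that are missing from file1.
missing_count=$(awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next } c[$1]++ == 0 {count++} END {print count+0}' "$dir/file1" "$dir/file2")

printf '%s\n' "$missing" "$present" "$missing_count"
rm -r "$dir"
```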
An alternative solution, as requested in a comment:
tail -n +2 file1 | cut -f1 -d, | sort -u > code1
cut -f1 -d, file2 | sort -u > code2
grep -Fxf code1 code2
rm code1 code2
Or combined in one command, without the temporary files code1 and code2:
grep -Fxf <(tail -n +2 file1 | cut -f1 -d, | sort -u) <(cut -f1 -d, file2 | sort -u)
Add | wc -l to count the lines instead of printing them.
Explanation:
- tail -n +2: print everything starting from the 2nd line
- cut -f1 -d,: print the first field, delimited by ,
- sort -u: sort the lines and remove duplicates
- grep -Fxf code1 code2: print all lines from code2 that exactly match one of the lines in code1 (-F treats the patterns as fixed strings, -x requires a whole-line match; grep -F is the current spelling of fgrep)
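Here is a self-contained run of the one-command form on hypothetical data, assuming bash (for <( )) and GNU grep. It uses grep -F, the current spelling of fgrep. It also shows why whole-line matching matters: without -x, a short code such as AL in file1 would also match inside ALB in file2.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
# Hypothetical inputs: file1 has a header line, file2 does not.
printf '%s\n' 'code,name' 'ABW,Aruba' 'AL,Example' > "$dir/file1"
printf '%s\n' 'ABW,x,1' 'ALB,x,2' 'ALB,x,3' 'AND,x,4' > "$dir/file2"

# Unique codes from each file (header skipped for file1).
codes1() { tail -n +2 "$dir/file1" | cut -f1 -d, | sort -u; }
codes2() { cut -f1 -d, "$dir/file2" | sort -u; }

# Codes present in both files: fixed-string, whole-line matches only.
common=$(grep -Fxf <(codes1) <(codes2))
common_count=$(grep -Fxf <(codes1) <(codes2) | wc -l)
# Without -x, "AL" also matches inside "ALB".
loose_count=$(grep -Ff <(codes1) <(codes2) | wc -l)

echo "common: $common ($common_count), substring matches: $loose_count"
rm -r "$dir"
```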
compare two files in UNIX
I got the solution by using comm:
comm -23 file1 file2
prints only the lines that are unique to file1, which gives the desired output.
Both files must be sorted first, e.g. comm -23 <(sort file1) <(sort file2).
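comm prints three columns: lines only in the first file, lines only in the second, and lines in both. -1, -2 and -3 suppress the corresponding column. Here is a sketch on hypothetical word lists, assuming bash for <( ):

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf '%s\n' pear apple kiwi > "$dir/file1"
printf '%s\n' kiwi plum apple > "$dir/file2"

# comm requires both inputs sorted in the same collation order.
only_in_1=$(comm -23 <(sort "$dir/file1") <(sort "$dir/file2"))  # column 1
only_in_2=$(comm -13 <(sort "$dir/file1") <(sort "$dir/file2"))  # column 2
in_both=$(comm -12 <(sort "$dir/file1") <(sort "$dir/file2"))    # column 3

printf 'only in file1: %s\nonly in file2: %s\n' "$only_in_1" "$only_in_2"
echo "in both:"
echo "$in_both"
rm -r "$dir"
```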
How to detect only the different files in my bash shell script?
Here is your script corrected:
while IFS= read -r filename;
do
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# inspecting the digest of each file individually #
# shows many files are identical and so are the digests #
# It also prints MD5 (full file path) = md5_signature! #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
md5 "old/$filename" # please use double quotes
md5 "new/$filename"
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Using -q eliminates all output from md5 except the sig #
# Your script now works correctly #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
[[ "$(md5 -q "old/$filename")" == "$(md5 -q "new/$filename")" ]] || echo "$filename differs"
done < files.txt
Problems:
- You had a typo: new/$fullfile rather than new/$filename.
- You should use "new/$filename" (i.e., use double quotes) around the file name expansions.
- Use md5 -q to compare the output of md5 on different files. Otherwise md5, by default, prints the input file path in the form MD5 (full_path/base_name) = 2504fcc0c0a57d14aa6b4193b5efaf94. Since these paths are guaranteed to differ between two directories, the different path names make the string comparison fail.
The comments above assume you are using md5 on BSD or, likely, macOS.
Here is an alternate solution that works both on Linux with md5sum and on BSD with md5. Just feed the content of the file to the stdin of either program, so that no file path is printed:
$ md5 <new/file.pdf
2504fcc0c0a57d14aa6b4193b5efaf94
Whereas if you pass the file name, the path is printed along with the MD5 signature:
$ md5 new/file.pdf
MD5 (new/file.pdf) = 2504fcc0c0a57d14aa6b4193b5efaf94
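The same holds for md5sum on Linux (GNU coreutils), except that when reading stdin it prints - after the hash in place of a file name; two such outputs can still be compared directly.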
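Here is a sketch of the question's loop rewritten around stdin hashing, assuming bash and GNU md5sum (on BSD or macOS, md5 would take its place). The old/ and new/ directories and files.txt follow the question's layout, but their contents here are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical layout mirroring the question: old/ and new/ directories
# plus files.txt listing the names to compare.
work=$(mktemp -d)
mkdir "$work/old" "$work/new"
printf 'same\n' > "$work/old/a.txt"
printf 'same\n' > "$work/new/a.txt"
printf 'v1\n' > "$work/old/b.txt"
printf 'v2\n' > "$work/new/b.txt"
printf '%s\n' a.txt b.txt > "$work/files.txt"

differing=()
while IFS= read -r filename; do
    # Reading from stdin keeps the path out of md5sum's output, so the
    # two results differ only if the contents differ.
    if [[ "$(md5sum < "$work/old/$filename")" != "$(md5sum < "$work/new/$filename")" ]]; then
        echo "$filename differs"
        differing+=("$filename")
    fi
done < "$work/files.txt"

rm -r "$work"
```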
Compare two files line by line and find the largest and smallest number using shell scripting
You can use:
sort -n file1 file2 > _sorted.tmp
min=$(head -n 1 _sorted.tmp)
max=$(tail -n 1 _sorted.tmp)
rm _sorted.tmp
Without a temporary file:
arr=( $(sort -n file1 file2) )
min=${arr[0]}
max=${arr[@]:(-1)}
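Here is a self-contained check of the version without a temporary file, on hypothetical inputs with one number per line. bash arrays are zero-indexed, so the smallest value is ${arr[0]}, and ${arr[@]:(-1)} expands to the last element. Splitting the command output into words is only safe here because each line holds a single number.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf '%s\n' 42 7 19 > "$dir/file1"
printf '%s\n' -3 100 8 > "$dir/file2"

# Numerically sorted values from both files, one array element per line.
arr=( $(sort -n "$dir/file1" "$dir/file2") )
min=${arr[0]}          # first element: bash arrays are zero-indexed
max=${arr[@]:(-1)}     # last element (same as ${arr[-1]} in bash 4.3+)

echo "min=$min max=$max"
rm -r "$dir"
```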
IF Statement to Compare Two Files in Unix
if [ "$(md5sum < version1.txt)" = "$(md5sum < version2.txt)" ]; then
echo "Files have the same content"
else
echo "Files do NOT have the same content"
fi
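If one of the MD5 checksums is already computed and stored in a text file, compare only the hash field on both sides, because md5sum appends - to the hash when reading stdin:
if [ "$(md5sum < version1.txt | awk '{print $1}')" = "$(awk '{print $1}' md5hash.txt)" ]; then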
...
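Here is a complete version of the stored-checksum comparison. This sketch assumes GNU md5sum and a hypothetical md5hash.txt in md5sum's own format (hash, two spaces, file name). When reading stdin, md5sum prints the hash followed by -, so the first field is taken from both sides before comparing.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'release 1\n' > "$dir/version1.txt"
printf 'release 1\n' > "$dir/version2.txt"
# Hypothetical stored checksum, e.g. saved from an earlier run.
md5sum "$dir/version2.txt" > "$dir/md5hash.txt"

# Keep only the 32-character hash on both sides.
current=$(md5sum < "$dir/version1.txt" | awk '{print $1}')
stored=$(awk '{print $1}' "$dir/md5hash.txt")

if [ "$current" = "$stored" ]; then
    result="Files have the same content"
else
    result="Files do NOT have the same content"
fi
echo "$result"
rm -r "$dir"
```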
Comparing two files in linux terminal
Here is my solution for this:
mkdir -p ~/temp ~/results
cp /usr/share/dict/american-english ~/temp/american-english-dictionary
cp /usr/share/dict/british-english ~/temp/british-english-dictionary
wc -l < ~/temp/american-english-dictionary > ~/results/count-american-english-dictionary
wc -l < ~/temp/british-english-dictionary > ~/results/count-british-english-dictionary
grep -Fxf ~/temp/american-english-dictionary ~/temp/british-english-dictionary > ~/results/common-english
grep -Fxvf ~/results/common-english ~/temp/american-english-dictionary > ~/results/unique-american-english
grep -Fxvf ~/results/common-english ~/temp/british-english-dictionary > ~/results/unique-british-english
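You can try the same grep -Fx logic on two tiny hypothetical word lists without the /usr/share/dict files, which may not be installed on every system. This sketch writes everything to a temporary directory instead of ~/temp and ~/results.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' color center apple > "$dir/american"
printf '%s\n' colour centre apple > "$dir/british"

# -F: fixed strings, -x: whole-line matches, -f: read patterns from a file.
grep -Fxf "$dir/american" "$dir/british" > "$dir/common"
common=$(cat "$dir/common")
# -v inverts the match: lines NOT found in the common list.
unique_american=$(grep -Fxvf "$dir/common" "$dir/american")
unique_british=$(grep -Fxvf "$dir/common" "$dir/british")
american_count=$(wc -l < "$dir/american")

printf 'common: %s\n' "$common"
rm -r "$dir"
```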
Compare two files and display difference in table form linux shell script
If you would like nice side-by-side output, you can use:
$ diff -y --suppress-common-lines file1.txt file2.txt
Example Use/Output
$ diff -y --suppress-common-lines file1.txt file2.txt
2:tar-1.23-13.el6.x86_64/ | 2:tar-1.23-15.el6_8.x86_64/
> samba-common-3.6.23-43.el6_9.x86_64/
> samba-winbind-clients-3.6.23-43.el6_9.x86_64/
> samba-winbind-3.6.23-43.el6_9.x86_64/
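In this output, | marks a line that differs between the two files, > a line present only in the second file, and < a line present only in the first. Here is a minimal sketch that reproduces the markers with hypothetical package lists, assuming GNU diff, where both options are available.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' tar-1.23-13 bash-4.1 > "$dir/file1.txt"
printf '%s\n' tar-1.23-15 bash-4.1 samba-common > "$dir/file2.txt"

# diff exits with 1 when the files differ, so don't treat that as a failure.
out=$(diff -y --suppress-common-lines "$dir/file1.txt" "$dir/file2.txt" || true)
echo "$out"

# Count changed lines (|) and lines only in the second file (>).
changed=$(printf '%s\n' "$out" | grep -c '|' || true)
added=$(printf '%s\n' "$out" | grep -c '>' || true)
rm -r "$dir"
```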