Copy differences between two files in unix
Another way to get diff is by using awk:
awk 'FNR==NR{a[$0];next}!($0 in a)' file1 file2
Though I must admit that I haven't run any benchmarks and can't say which is the fastest solution.
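To see what the one-liner does, here is a minimal sketch on two made-up files (the file contents and the variable name only_in_file2 are assumptions for illustration). While awk reads the first file, FNR==NR is true and every line is stored as a key of the array a; for the second file only the lines that are not keys of a are printed.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana date apple elderberry > "$dir/file2"

# Lines of file2 that do not occur in file1, kept in file2's original order.
only_in_file2=$(awk 'FNR==NR{a[$0];next}!($0 in a)' "$dir/file1" "$dir/file2")
printf '%s\n' "$only_in_file2"

rm -rf "$dir"
```

One caveat: if file1 is empty, FNR==NR also holds for every line of file2, so nothing is printed at all.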
Fastest way of finding differences between two files in unix?
You could try:
comm -13 <(sort file1) <(sort file2) > file3
or
grep -Fxvf file1 file2 > file3
or
diff file1 file2 | grep '^>' | sed 's/^> //' > file3
or
join -v 2 <(sort file1) <(sort file2) > file3
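These alternatives are not interchangeable in every detail. The sketch below, on made-up input, compares the comm and grep variants: both report the lines that are only in file2, but comm returns them in sorted order while grep keeps the order of file2. Temporary sorted files stand in for the process substitution so that the example also runs in a plain POSIX shell; the variable names are assumptions.

```shell
dir=$(mktemp -d)
printf '%s\n' b a > "$dir/file1"
printf '%s\n' d a c > "$dir/file2"

# comm needs sorted input, so its result comes out in sorted order.
sort "$dir/file1" > "$dir/file1.sorted"
sort "$dir/file2" > "$dir/file2.sorted"
via_comm=$(comm -13 "$dir/file1.sorted" "$dir/file2.sorted")

# grep: -F fixed strings, -x whole-line match, -v invert, -f read patterns from file1.
via_grep=$(grep -Fxvf "$dir/file1" "$dir/file2")

printf 'comm:\n%s\ngrep:\n%s\n' "$via_comm" "$via_grep"
rm -rf "$dir"
```

Note that join -v 2 compares only the join field (by default the first blank-separated field), so it agrees with the others only when each line consists of a single field.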
compare two files in UNIX
I got the solution by using comm
comm -23 file1 file2
will give you the desired output.
The files need to be sorted first anyway.
Compare two files in unix and add the delta to one file
Sort the two files together with the -u option to remove duplicates.
sort -u File1.txt File2.txt > NewFile.txt && mv NewFile.txt File1.txt
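A small sketch of that command on made-up data (the file contents and the variable merged are assumptions). The detour through NewFile.txt matters: redirecting straight into File1.txt would truncate it before sort had a chance to read it.

```shell
dir=$(mktemp -d)
printf '%s\n' alpha beta > "$dir/File1.txt"
printf '%s\n' beta gamma > "$dir/File2.txt"

# Merge both files, drop duplicate lines, then replace File1.txt with the result.
sort -u "$dir/File1.txt" "$dir/File2.txt" > "$dir/NewFile.txt" &&
    mv "$dir/NewFile.txt" "$dir/File1.txt"

merged=$(cat "$dir/File1.txt")
printf '%s\n' "$merged"
rm -rf "$dir"
```

The result is sorted, so the original line order of File1.txt is not preserved. POSIX sort also allows the output file given with -o to be one of the input files, which would avoid the temporary file.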
Comparing differences two files in Unix to produce bool
You're looking for cmp:
if cmp -s file1 file2; then
    echo "They're the same."
else
    echo "They're different"
fi
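If you need the yes/no answer in several places, the test can be wrapped in a function. This is only a sketch: same_content is a hypothetical helper name, and the sample files are made up.

```shell
# same_content is a hypothetical helper, not a standard command.
same_content() {
    # -s silences cmp; only the exit status is used
    # (0 = identical, 1 = different, greater than 1 = error).
    cmp -s "$1" "$2"
}

dir=$(mktemp -d)
printf 'hello\n' > "$dir/a"
printf 'hello\n' > "$dir/b"
printf 'world\n' > "$dir/c"

if same_content "$dir/a" "$dir/b"; then ab=same; else ab=different; fi
if same_content "$dir/a" "$dir/c"; then ac=same; else ac=different; fi
echo "a vs b: $ab, a vs c: $ac"
rm -rf "$dir"
```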
Compare two files line by line and generate the difference in another file
diff(1) is not the answer, but comm(1) is.
NAME
comm - compare two sorted files line by line
SYNOPSIS
comm [OPTION]... FILE1 FILE2
...
-1 suppress lines unique to FILE1
-2 suppress lines unique to FILE2
-3 suppress lines that appear in both files
So
comm -2 -3 file1 file2 > file3
The input files must be sorted. If they are not, sort them first. This can be done with a temporary file, or...
comm -2 -3 <(sort file1) <(sort file2) > file3
provided that your shell supports process substitution (bash does).
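The three options can be combined to select any one of the three columns. A minimal sketch on made-up files (contents and variable names are assumptions), using temporary sorted files so that it does not depend on process substitution:

```shell
dir=$(mktemp -d)
printf '%s\n' c a b | sort > "$dir/file1.sorted"
printf '%s\n' d c b | sort > "$dir/file2.sorted"

only_in_1=$(comm -23 "$dir/file1.sorted" "$dir/file2.sorted")  # suppress columns 2 and 3
only_in_2=$(comm -13 "$dir/file1.sorted" "$dir/file2.sorted")  # suppress columns 1 and 3
in_both=$(comm -12 "$dir/file1.sorted" "$dir/file2.sorted")    # suppress columns 1 and 2

printf 'only in file1: %s\n' "$only_in_1"
printf 'only in file2: %s\n' "$only_in_2"
printf 'in both: %s\n' "$(printf '%s' "$in_both" | tr '\n' ' ')"
rm -rf "$dir"
```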
How do I compare two files in unix based on their columns
From the condition c[$1] == 0 in the awk script from the question, I assumed you wanted to print the lines from file2 that contain a code that is not present in file1.
Now that you have clarified that you want to count the codes that are present in both files, see the end of this answer for the reverse check.
Slight modifications to your script will fix the problems:
awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1]++ == 0' file1 file2
The option -F, specifies a comma (,) as the field separator.
The condition if(NR!=1)c[$1]++; skips the header line in file1.
The post-increment operator in c[$1]++ == 0 makes the condition fail for the second or later occurrence of the same code in file2.
I omit the trailing | wc -l here to show the output lines.
I modified file2 to contain two lines with the same code in column 1 that is not present in file1.
With file2 shown here
AND,Europe,Andorra,2020-07-26,897.0
ABW,North America,Aruba,2020-03-13,2.0
ABW,North America,Aruba,2020-10-06,4079.0
ALB,Europe,Albania,2020-08-23,8275.1
ALB,Europe,Albania,2020-08-23,8275.2
AFG,Asia,Afghanistan,2020-09-06,38324.0
AFG,Asia,Afghanistan,2020-09-06,38324.0
and file1 from the question, I get this output:
AND,Europe,Andorra,2020-07-26,897.0
ALB,Europe,Albania,2020-08-23,8275.1
(Only the first line with ALB is printed.)
You can also implement the counting in awk instead of using wc -l.
awk -F , 'NR==FNR { if(NR!=1)c[$1]++; next } c[$1]++ == 0 {count++} END {print count}' file1 file2
If you want to print the lines from file2 that contain a code that is present in file1, the script can be modified like this:
awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1] { c[$1]=0; print}' file1 file2
This prints
ABW,North America,Aruba,2020-03-13,2.0
AFG,Asia,Afghanistan,2020-09-06,38324.0
(Only the first line with each matching code, such as ABW, is printed.)
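The following sketch runs the three awk variants end to end. The file1 used here is hypothetical (the one from the question is not reproduced in this answer): it is assumed to have a header line and to contain the codes ABW and AFG. The variable names are also assumptions.

```shell
dir=$(mktemp -d)
# Hypothetical file1: a header line plus the codes ABW and AFG.
cat > "$dir/file1" <<'EOF'
iso_code,continent,location
ABW,North America,Aruba
AFG,Asia,Afghanistan
EOF
cat > "$dir/file2" <<'EOF'
AND,Europe,Andorra,2020-07-26,897.0
ABW,North America,Aruba,2020-03-13,2.0
ABW,North America,Aruba,2020-10-06,4079.0
ALB,Europe,Albania,2020-08-23,8275.1
ALB,Europe,Albania,2020-08-23,8275.2
AFG,Asia,Afghanistan,2020-09-06,38324.0
AFG,Asia,Afghanistan,2020-09-06,38324.0
EOF

# First line per code in file2 whose code is NOT in file1.
not_in_file1=$(awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1]++ == 0' "$dir/file1" "$dir/file2")
# The same check, counted inside awk instead of with wc -l.
missing_count=$(awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next } c[$1]++ == 0 {count++} END {print count}' "$dir/file1" "$dir/file2")
# First line per code in file2 whose code IS in file1.
in_file1=$(awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1] { c[$1]=0; print}' "$dir/file1" "$dir/file2")

printf 'not in file1:\n%s\ncount: %s\nin file1:\n%s\n' "$not_in_file1" "$missing_count" "$in_file1"
rm -rf "$dir"
```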
Alternative solution as requested in a comment.
tail -n +2 file1|cut -f1 -d,|sort -u>code1
cut -f1 -d, file2|sort -u>code2
fgrep -f code1 code2
rm code1 code2
Or combined in one command without using temporary files code1 and code2:
fgrep -f <(tail -n +2 file1|cut -f1 -d,|sort -u) <(cut -f1 -d, file2|sort -u)
Add | wc -l to count the lines instead of printing them.
Explanation:
- tail -n +2 prints everything starting from the 2nd line
- cut -f1 -d, prints the first field, delimited with ,
- sort -u sorts lines and removes duplicates
- fgrep -f code1 code2 prints all lines from code2 that contain any of the strings from code1
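Because fgrep matches substrings, a code such as AB would also match ABW. The sketch below is a variant under the assumption that you want exact code matches: it uses grep -F (the POSIX spelling of fgrep) and adds -x so that only whole lines match. The sample data, including the header line of file1, is made up.

```shell
dir=$(mktemp -d)
# Hypothetical inputs; file1 is assumed to start with a header line.
printf '%s\n' 'iso_code,location' 'ABW,Aruba' 'AFG,Afghanistan' > "$dir/file1"
printf '%s\n' 'AND,Andorra' 'ABW,Aruba' 'ABW,Aruba' 'ALB,Albania' 'AFG,Afghanistan' > "$dir/file2"

tail -n +2 "$dir/file1" | cut -f1 -d, | sort -u > "$dir/code1"
cut -f1 -d, "$dir/file2" | sort -u > "$dir/code2"

# Codes present in both files; -x requires the whole line to match.
common=$(grep -Fxf "$dir/code1" "$dir/code2")
# grep -c counts the matching lines directly, so no wc -l is needed.
common_count=$(grep -c -Fxf "$dir/code1" "$dir/code2")

printf '%s\n' "$common"
echo "count: $common_count"
rm -rf "$dir"
```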
Compare two files and append the differences at the end
I think you can solve this problem by using diff -U <large number>. This gives you output that is easy to parse to reconstruct what you want. If <large number> is larger than the longer of your two files, you get a predictable output format:
$ diff -U 1000 file1 file2
--- file1 2019-07-22 14:39:39.344674000 -0400
+++ file2 2019-07-22 14:39:45.072654000 -0400
@@ -1,4 +1,4 @@
 A
+B
 C
-D
 E
Then you can use grep and sed to reconstruct the two output files you want:
diff -U 1000 file1 file2 | sed '1,3d' > tmp
grep '^ ' tmp | sed 's/^ //' > file1.out
cp file1.out file2.out
grep '^-' tmp | sed 's/^-//' >> file1.out
grep '^+' tmp | sed 's/^+//' >> file2.out
Notes:
- sed '1,3d' just deletes the first three lines of the diff output, since they're not contents. I previously had tail +3 here but that is not so portable; sed is safer.
- The first grep extracts lines in common (start with a space in the diff).
- The next two greps extract lines not in common (- means in file1 only, + in file2 only).
- If file1 and file2 are identical, this will yield empty output files.
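Here is the whole procedure as a runnable sketch, using the four-line sample files from the diff output above. It assumes a diff that supports -U with a context count (POSIX, GNU and BSD diff do) and works in a scratch directory; the variables out1 and out2 are only there to show the result.

```shell
dir=$(mktemp -d)
printf '%s\n' A C D E > "$dir/file1"
printf '%s\n' A B C E > "$dir/file2"

# diff exits with status 1 when the files differ; accept that, but not real errors.
diff -U 1000 "$dir/file1" "$dir/file2" > "$dir/full.diff" || [ $? -eq 1 ]
sed '1,3d' "$dir/full.diff" > "$dir/tmp"          # drop the ---, +++ and @@ lines

grep '^ ' "$dir/tmp" | sed 's/^ //' > "$dir/file1.out"    # lines in common
cp "$dir/file1.out" "$dir/file2.out"
grep '^-' "$dir/tmp" | sed 's/^-//' >> "$dir/file1.out"   # lines only in file1
grep '^+' "$dir/tmp" | sed 's/^+//' >> "$dir/file2.out"   # lines only in file2

out1=$(cat "$dir/file1.out")
out2=$(cat "$dir/file2.out")
printf 'file1.out:\n%s\nfile2.out:\n%s\n' "$out1" "$out2"
rm -rf "$dir"
```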
Compare two files field by field in Unix
paste/awk solution
$ paste -d'|' file1 file2 |
  awk -F'|' '{w=NF/2;
              for(i=1;i<=w;i++)
                if($i!=$(i+w)) printf "%d %d %s %s\n", NR,i,$i,$(i+w)}'
1 5 lmn 123
I changed the order: it makes more sense to me to print the line number first and then the field number, but you can easily change it.
Once paste has joined the corresponding lines of the two files, awk goes over the fields of the first half (from the first file), compares each one with the matching field in the second half (from the second file), and prints the differences. awk loops over all records (lines) implicitly. I haven't tested this with large files, but for the awk part the size doesn't matter because it works record by record. I'm not sure how eager paste is, but I doubt it will blink.
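A self-contained sketch of this approach on made-up input. It assumes both files have the same number of lines and the same number of |-separated fields per line; this version of the awk program prints each difference on its own line. The file contents and the variable differences are assumptions.

```shell
dir=$(mktemp -d)
printf '%s\n' 'abc|def|ghi|jkl|lmn' 'x|y|z|p|q' > "$dir/file1"
printf '%s\n' 'abc|def|ghi|jkl|123' 'x|y|z|p|q' > "$dir/file2"

# Output format: line number, field number, value in file1, value in file2.
differences=$(paste -d'|' "$dir/file1" "$dir/file2" |
    awk -F'|' '{
        w = NF / 2    # fields 1..w come from file1, fields w+1..NF from file2
        for (i = 1; i <= w; i++)
            if ($i != $(i + w))
                printf "%d %d %s %s\n", NR, i, $i, $(i + w)
    }')
printf '%s\n' "$differences"
rm -rf "$dir"
```

Be aware that awk compares two fields numerically when both look like numbers, so 1 and 1.0 count as equal; comparing ($i "") with ($(i + w) "") forces a string comparison if that is not what you want.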
Unix - how to print the difference between two text files?
Your best bet is probably to sort the two files and run comm on the result. If you have bash as your shell, you can use Process Substitution:
comm -3 <(sort a.unl) <(sort b.unl)
This will print all the lines in a.unl but not in b.unl, and all the lines in b.unl but not in a.unl (they will be indented by a tab); the -3 suppresses the lines in both a.unl and b.unl.
If you don't have bash, you probably need something like:
sort a.unl > a.srt
sort b.unl > b.srt
comm -3 a.srt b.srt
rm -f a.srt b.srt
To make that more nearly bombproof (so it doesn't leave intermediate files around if you interrupt things), you need:
tmp=tmp.$$
trap "rm -f $tmp.?; exit 1" 0 1 2 3 13 15
sort a.unl > $tmp.a
sort b.unl > $tmp.b
comm -3 $tmp.a $tmp.b
rm -f $tmp.?
trap 0