Copy Differences Between Two Files in Unix

Copy differences between two files in unix

Another way to get the differences is by using awk:

awk 'FNR==NR{a[$0];next}!($0 in a)' file1 file2

Though I must admit that I haven't run any benchmarks and can't say which is the fastest solution.
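For example, with two small hypothetical files, it prints the lines of file2 that do not occur anywhere in file1 (no sorting is required, unlike with comm):

$ cat file1
apple
banana
$ cat file2
banana
cherry
$ awk 'FNR==NR{a[$0];next}!($0 in a)' file1 file2
cherry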

Fastest way of finding differences between two files in unix?

You could try:

comm -13 <(sort file1) <(sort file2) > file3

or

grep -Fxvf file1 file2 > file3

or

diff file1 file2 | grep '^<' | sed 's/^< //' > file3

or

join -v 2 <(sort file1) <(sort file2) > file3
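Note that these do not all select the same lines: the comm -13 and grep -Fxvf variants keep the lines that are only in file2 (join -v 2 does the same, but matches on the first field), while the diff | grep '^<' pipeline keeps the lines that are only in file1. A small hypothetical run of the comm -13 variant:

$ cat file1
alpha
bravo
charlie
$ cat file2
bravo
charlie
delta
$ comm -13 <(sort file1) <(sort file2)
delta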

compare two files in UNIX

I got the solution by using comm

comm -23 file1 file2 

will give you the desired output.

The files need to be sorted first anyway.
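If they are not already sorted, you can sort them on the fly with process substitution (assuming bash or another shell that supports it):

comm -23 <(sort file1) <(sort file2)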

Compare two files in unix and add the delta to one file

Sort the two files together with the -u option to remove duplicates.

sort -u File1.txt File2.txt > NewFile.txt && mv NewFile.txt File1.txt
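As a small hypothetical example, every distinct line from both files ends up in File1.txt; note that the result is sorted, so the original line order of File1.txt is not preserved:

$ cat File1.txt
one
two
$ cat File2.txt
two
three
$ sort -u File1.txt File2.txt
one
three
two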

Comparing differences two files in Unix to produce bool

You're looking for cmp:

if cmp -s file1 file2; then
    echo "They're the same."
else
    echo "They're different"
fi
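Because cmp -s only sets its exit status (0 when the files are identical, non-zero otherwise), it also works well inline, for example:

cmp -s file1 file2 && echo "They're the same." || echo "They're different"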

Compare two files line by line and generate the difference in another file

diff(1) is not the answer, but comm(1) is.

NAME
       comm - compare two sorted files line by line

SYNOPSIS
       comm [OPTION]... FILE1 FILE2

...

       -1     suppress lines unique to FILE1

       -2     suppress lines unique to FILE2

       -3     suppress lines that appear in both files

So

comm -2 -3 file1 file2 > file3

The input files must be sorted. If they are not, sort them first. This can be done with a temporary file, or...

comm -2 -3 <(sort file1) <(sort file2) > file3

provided that your shell supports process substitution (bash does).
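To make the columns concrete, here is a small hypothetical run. Plain comm prints three columns: lines only in file1, lines only in file2 (indented by one tab) and lines in both (indented by two tabs); -2 -3 leaves only the first column:

$ cat file1
a
b
c
$ cat file2
b
d
$ comm file1 file2
a
		b
c
	d
$ comm -2 -3 file1 file2
a
c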

How do I compare two files in unix based on their columns

From the condition c[$1] == 0 in the awk script in the question, I assumed you want to print the lines from file2 that contain a code that is not present in file1.

As it has now been clarified that you want to count the codes that are present in both files, see the end of the answer for the reverse check.

Slight modifications to your script will fix the problems:

awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1]++ == 0' file1 file2

The option -F, specifies the comma (,) as field separator.

The statement if(NR!=1)c[$1]++; counts the codes from file1, skipping the header line.

The post-increment operator in c[$1]++ == 0 will make the condition fail for the second or later occurrence of the same code in file2.

I omit the trailing | wc -l here to show the output lines.

I modified file2 so that it contains two lines with the same code in column 1, where that code is not present in file1.

With file2 shown here

AND,Europe,Andorra,2020-07-26,897.0
ABW,North America,Aruba,2020-03-13,2.0
ABW,North America,Aruba,2020-10-06,4079.0
ALB,Europe,Albania,2020-08-23,8275.1
ALB,Europe,Albania,2020-08-23,8275.2
AFG,Asia,Afghanistan,2020-09-06,38324.0
AFG,Asia,Afghanistan,2020-09-06,38324.0

and file1 from the question I get this output:

AND,Europe,Andorra,2020-07-26,897.0
ALB,Europe,Albania,2020-08-23,8275.1

(Only the first line with ALB is printed.)

You can also implement the counting in awk instead of using wc -l.

awk -F , 'NR==FNR { if(NR!=1)c[$1]++; next } c[$1]++ == 0 {count++} END {print count}' file1 file2

If you want to print the lines from file2 that contain a code that is present in file1, the script can be modified like this:

awk -F, 'NR==FNR { if(NR!=1)c[$1]++; next} c[$1] { c[$1]=0; print}' file1 file2

This prints

ABW,North America,Aruba,2020-03-13,2.0
AFG,Asia,Afghanistan,2020-09-06,38324.0

(Only the first of the two lines with code ABW and the first of the two lines with code AFG are printed.)


Alternative solution as requested in a comment.

tail -n +2 file1|cut -f1 -d,|sort -u>code1
cut -f1 -d, file2|sort -u>code2
fgrep -f code1 code2
rm code1 code2

Or combined in one command without using temporary files code1 and code2:

fgrep -f <(tail -n +2 file1|cut -f1 -d,|sort -u) <(cut -f1 -d, file2|sort -u)

Add | wc -l to count the lines instead of printing them.

Explanation:

  • tail -n +2 prints everything starting from the 2nd line

  • cut -f1 -d, prints the first field, delimited with ,

  • sort -u sorts the lines and removes duplicates

  • fgrep -f code1 code2 prints all lines from code2 that contain any of the strings from code1
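With small hypothetical code lists, the last step behaves like this (note that fgrep -f matches substrings; use grep -Fxf if you need exact whole-line matches):

$ cat code1
ABW
AFG
$ cat code2
ABW
ALB
AND
$ fgrep -f code1 code2
ABW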

Compare two files and append the differences at the end

I think you can solve this problem by using diff -U <large number>. This will give you output that will be easy to parse to reconstruct what you want. If <large number> is larger than the longer of your two files, then you will get a predictable output format:

$ diff -U 1000 file1 file2
--- file1 2019-07-22 14:39:39.344674000 -0400
+++ file2 2019-07-22 14:39:45.072654000 -0400
@@ -1,4 +1,4 @@
 A
+B
 C
-D
 E

Then you can use grep and sed to reconstruct the two output files you want:

diff -U 1000 file1 file2 | sed '1,3d' > tmp
grep '^ ' tmp | sed 's/^ //' > file1.out
cp file1.out file2.out
grep '^-' tmp | sed 's/^-//' >> file1.out
grep '^+' tmp | sed 's/^+//' >> file2.out

Notes:

  • sed '1,3d' just deletes the first three lines of the diff output, since they're not contents. I previously had tail +3 here but that is not so portable; sed is safer.
  • The first grep extracts lines in common (start with a space in the diff).
  • The next two greps extract lines not in common (- means in file1 only, + in file2 only).
  • If file1 and file2 are identical, this will yield empty output files.
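For the sample diff shown above (file1 containing A, C, D, E and file2 containing A, B, C, E), the reconstruction would give:

$ cat file1.out
A
C
E
D
$ cat file2.out
A
C
E
B

The common lines come first in their original relative order, followed by the lines unique to each file.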

Compare two files field by field in Unix

paste/awk solution

$ paste -d'|' file1 file2 |
  awk -F'|' '{
    w = NF/2;
    for (i = 1; i <= w; i++)
      if ($i != $(i+w))
        printf "%d %d %s %s", NR, i, $i, $(i+w);
    print ""
  }'

1 5 lmn 123

I changed the order; it makes more sense to me to print the line number first and then the field number, but you can change that easily...

Once paste has joined the matching lines of the two files, awk goes over the fields of the first half (the first file), compares each with the corresponding field of the second half (the second file), and prints the differences. awk loops implicitly over all records (lines). I haven't tested this with large files, but for the awk part it shouldn't matter, since it works record by record. I'm not sure how eagerly paste reads its input, but I doubt it will blink.
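As a hypothetical illustration, assuming the input files are pipe-delimited (so that after paste -d'|' the whole combined line splits cleanly on |), a single pair of lines differing in the fifth field gives exactly the output shown above:

$ cat file1
abc|def|ghi|jkl|lmn
$ cat file2
abc|def|ghi|jkl|123
$ paste -d'|' file1 file2 |
  awk -F'|' '{w=NF/2; for(i=1;i<=w;i++) if($i!=$(i+w)) printf "%d %d %s %s", NR,i,$i,$(i+w); print ""}'
1 5 lmn 123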

Unix - how to print the difference between two text files?

Your best bet is probably to sort the two files and run comm on the result. If you have bash as your shell, you can use Process Substitution:

comm -3 <(sort a.unl) <(sort b.unl)

This will print all the lines that are in a.unl but not in b.unl, and all the lines that are in b.unl but not in a.unl (the latter indented by a tab); the -3 suppresses the lines that appear in both files.
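With two small hypothetical files, the output looks like this (the lines unique to b.unl appear in the second column, indented by a tab):

$ cat a.unl
x
y
$ cat b.unl
y
z
$ comm -3 <(sort a.unl) <(sort b.unl)
x
	z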

If you don't have bash, you probably need something like:

sort a.unl > a.srt
sort b.unl > b.srt
comm -3 a.srt b.srt
rm -f a.srt b.srt

To make that more nearly bombproof (so that it doesn't leave intermediate files around if you interrupt things), you need:

tmp=tmp.$$
trap "rm -f $tmp.?; exit 1" 0 1 2 3 13 15

sort a.unl > $tmp.a
sort b.unl > $tmp.b
comm -3 $tmp.a $tmp.b

rm -f $tmp.?
trap 0

