Print Differences of File1 to File2 Without Deleting Anything from File2

compare 2 files and append a value from file1 to end of file2 after match

Use the code below:

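# Read file2 into memory once so it can be scanned for every row of file1.
with open("input2.txt") as file2:
    file2_rows = [line.rstrip("\n") for line in file2]

with open("input1.txt") as file1, open("output.txt", "a") as out:
    for row in file1:
        fields = row.rstrip("\n").split(",")
        element_to_check = fields[0]
        for row_to_check in file2_rows:
            if element_to_check in row_to_check:
                # Append the third field of the file1 row to the matching file2 line.
                out.write("%s,%s\n" % (row_to_check, fields[2]))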

This code reads the two input files (I am assuming they are text files) and compares the data. If the condition is satisfied, it appends the value to the matching line and writes the result to an output file.

How can I print lines from file1 and file2 where column 9 in file1 is less than column 4 in file2

Your key from file1 is field 2, not field 1.

$ awk 'NR==FNR {a[$2]=$0; next} 
$1 in a {split(a[$1],t);
if(t[9]>=$4 && t[10]<=$5) print a[$1], $0}' file1 file2 | column -t

BG chr2 100.000 15 0 0 1 15 216745730 216745744 5.1 chr2 hg38_refGene exon 216645730 216845744
BG chr1 100.000 15 0 0 1 15 6195235 6195335 5.1 chr1 hg38_refGene CDS 6095235 6395421
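
The same join can be sketched in Python. This is only a rough equivalent, assuming whitespace-separated fields, integer coordinates, and the key in field 2 of file1 and field 1 of file2, as in the awk command above. The function name join_contained and the in-memory line lists are illustrative and not part of the original answer. Like the awk array, the dictionary keeps only the last file1 line for each key.

```python
def join_contained(file1_lines, file2_lines):
    """Pair file1 rows with file2 rows whose interval contains them."""
    by_key = {}
    for line in file1_lines:
        fields = line.split()
        if len(fields) >= 10:            # skip lines too short to hold fields 9 and 10
            by_key[fields[1]] = fields   # awk: a[$2] = $0

    joined = []
    for line in file2_lines:
        f2 = line.split()
        if len(f2) < 5 or f2[0] not in by_key:
            continue
        f1 = by_key[f2[0]]
        # awk: t[9] >= $4 && t[10] <= $5 (awk fields are 1-based)
        if int(f1[8]) >= int(f2[3]) and int(f1[9]) <= int(f2[4]):
            joined.append(" ".join(f1 + f2))
    return joined
```

Calling it with the lines of file1 and file2 returns the joined rows as strings, which can then be printed or aligned as needed.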

Comparison of two files in Unix and display differences

This will do what I think you want; you might want to tweak the output format:

$ cat tst.awk
BEGIN { FS="[= ]" }
{
    match(" "$0,/ v_party_id="[^"]+"/)
    key = substr($0,RSTART,RLENGTH)
}
NR==FNR {
    file1[key] = $0
    next
}
{
    if ( key in file1 ) {
        nf = split(file1[key],tmp)
        for (i=1; i<nf; i+=2) {
            f1[key,tmp[i]] = tmp[i+1]
        }

        msg = sep = ""
        for (i=1; i<NF; i+=2) {
            if ( $(i+1) != f1[key,$i] ) {
                msg = msg sep OFS ARGV[1] "." $i "=" f1[key,$i] OFS FILENAME "." $i "=" $(i+1)
                sep = ","
            }
        }
        if ( msg != "" ) {
            print "Mismatch in row " FNR msg
        }
        delete file1[key]
    }
    else {
        file2[key] = $0
    }
}
END {
    for (key in file1) {
        print "In file1 only:", key, file1[key]
    }
    for (key in file2) {
        print "In file2 only:", key, file2[key]
    }
}


$ awk -f tst.awk file1 file2
Mismatch in row 1 file1.n_pd_percent="0.16323687" file2.n_pd_percent="0.2045", file1.v_accounting_standard="IFRS" file2.v_accounting_standard="SQRT"
Mismatch in row 2 file1.v_accounting_standard="SQRT" file2.v_accounting_standard="IFRS"
In file1 only: v_party_id="103811200" d_report_ref_date="2021-03-31" n_pd_percent="0.16323687" v_accounting_standard="IFRS" v_party_default_status_cd="NOTDFLT" v_party_id="103811200" v_src_system_id="SMT"
In file1 only: v_party_id="103811100" d_report_ref_date="2021-03-31" n_pd_percent="0.16323687" v_accounting_standard="IFRS" v_party_default_status_cd="NOTDFLT" v_party_id="103811100" v_src_system_id="SMT"
In file2 only: v_party_id="103811400" d_report_ref_date="2021-03-31" n_pd_percent="0.16323687" v_accounting_standard="IFRS" v_party_default_status_cd="NOTDFLT" v_party_id="103811400" v_src_system_id="SMT"

awk print line of file2 based on condition of file1

Not as clean as an awk solution:

$ paste file2 file1 | sed '/0/d' | cut -f1
B
C

You mentioned millions of lines. To make just a single pass through the files, I'd resort to Python. Something like this, perhaps (Python 3):

with open("file1") as fd1, open("file2") as fd2:
    # zip is lazy in Python 3, so both files are read in step, line by line.
    for l1, l2 in zip(fd1, fd2):
        if not l1.startswith('0'):
            print(l2.strip())

compare and print 2 columns from 2 files in awk or perl

Assuming your $3 values are unique within each input file as shown in your sample input/output:

$ cat tst.awk
NR==FNR {
    foos[$3] = $1
    bars[$3] = $2
    next
}
$3 in foos {
    print foos[$3] "-" $1, bars[$3] "-" $2, $3
}


$ awk -f tst.awk file1.txt file2.txt
0001-2451 00000001-00000010 084010800001080

I named the arrays foos[] and bars[] as I don't know what the first 2 columns of your input actually represent - choose a more meaningful name.

List differences in two files using awk

If you really want to use awk:

$ cat f1
a|1
b|2
c|1
$ cat f2
b|2
c|1
d|0
$ awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) print k }' f1 f2
a|1
d|0
$
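
The awk one-liner flips a flag each time a line is read, so only lines seen an odd number of times are still set at the end. A rough Python equivalent, assuming the inputs are passed as lists of lines (the function name odd_occurrences is illustrative):

```python
def odd_occurrences(*files):
    """Return lines that occur an odd number of times across all inputs."""
    flags = {}
    for lines in files:
        for line in lines:
            line = line.rstrip("\n")
            flags[line] = not flags.get(line, False)  # awk: h[$0] = !h[$0]
    return [line for line, odd in flags.items() if odd]


print(odd_occurrences(["a|1", "b|2", "c|1"], ["b|2", "c|1", "d|0"]))
# prints ['a|1', 'd|0']
```

A line repeated inside a single file also flips the flag, so the result is a true difference only when neither file contains duplicate lines.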

Compare two text files and if the second file has a row which contains both the columns of first file delete that row

This assumes the pairs in file1 never have the same value in both fields:

$ cat tst.awk
NR==FNR {
    pairs1[NR] = $1
    pairs2[NR] = $2
    next
}
{
    orig = $0
    gsub(/[[:space:],]+/," ")
    delete vals
    for (i=1; i<=NF; i++) {
        vals[$i]
    }
    for (nr in pairs1) {
        if ( (pairs1[nr] in vals) && (pairs2[nr] in vals) ) {
            next
        }
    }
    print orig
}


$ awk -f tst.awk file1 file2
2002, 5052, 7001, 1500, 2500
2003, 5051, 3500, 4500, 4952

Extracting difference values between two files

Using awk you can do this:

awk 'FNR==NR { seen[$0]=FILENAME; next }
{if ($1 in seen) delete seen[$1]; else print $1, FILENAME}
END { for (i in seen) print i, seen[i] }' file{1,2}
6 file2
7 file2
5 file1

While traversing file1, each line is stored as a key in the array seen, with FILENAME as its value. While iterating over file2, each entry missing from seen is printed, and each entry found in seen (a common entry) is deleted. Finally, the END block prints every entry remaining in seen, which are the values found only in file1.
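
For comparison, here is a minimal Python sketch of the same bookkeeping. It assumes one value per line and takes the inputs as lists of lines; the function name diff_values and its parameters are illustrative, not part of the original answer.

```python
def diff_values(name1, lines1, name2, lines2):
    """Return (value, source) pairs for values present in only one input."""
    # First input: remember every non-empty line and where it came from.
    seen = {line.strip(): name1 for line in lines1 if line.strip()}

    only = []
    for line in lines2:
        fields = line.split()
        if not fields:
            continue
        value = fields[0]
        if value in seen:
            del seen[value]              # common entry, drop it
        else:
            only.append((value, name2))  # present only in the second input

    only.extend(seen.items())            # what is left exists only in the first input
    return only
```

With made-up inputs such as ["1", "2", "5"] for file1 and ["1", "2", "6", "7"] for file2, the result lists 6 and 7 as file2-only, followed by 5 as file1-only.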

Shell script to pull variable file1 and compare against variable in file2 and print difference

Something like this maybe.

It parses the log file to extract only the ticket numbers.

It then iterates over them and uses grep to check whether each one was sent.

LOGFILE="file1"
TMPFILE="file2"

for TICKET in $(awk '{print $1}' "$LOGFILE"); do
    if ! grep -q "Ticket #: ${TICKET} " "$TMPFILE"; then
        send_email_for_ticket "$TICKET"
    fi
done
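
The same check can be sketched in Python, which avoids starting one grep process per ticket. This is a rough equivalent under a few assumptions: the ticket number is the first whitespace-separated field of each log line, and the second file marks sent tickets with the text "Ticket #: N " as in the grep pattern above. The function name unsent_tickets is illustrative, and sending the email is left to the caller.

```python
def unsent_tickets(log_lines, sent_text):
    """Return ticket numbers from the log with no 'Ticket #: N ' entry."""
    missing = []
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        ticket = fields[0]  # same as awk '{print $1}'
        # Plain substring test, so ticket numbers are not treated as regex patterns.
        if f"Ticket #: {ticket} " not in sent_text:
            missing.append(ticket)
    return missing
```

The returned list can then be passed, one ticket at a time, to whatever sends the email.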

