compare 2 files and append a value from file1 to end of file2 after match
Hi, use the code below:
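# Open both inputs for reading and the output for appending; "with" closes them.
with open("input1.txt") as file1, open("input2.txt") as file2, open("output.txt", "a") as out:
    # Strip newlines so the appended value lands on the same output line.
    file2_rows = [line.rstrip("\n") for line in file2]
    for row in file1:
        fields = row.rstrip("\n").split(",")
        element_to_check = fields[0]
        for row_to_check in file2_rows:
            # Substring match: the key from file1 appears anywhere in the file2 row.
            if element_to_check in row_to_check:
                out.write('%s,%s\n' % (row_to_check, fields[2]))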
This code reads the two input files (I am assuming they are text files) and compares their data. When the condition is satisfied, it appends the value from file1 to the matching line of file2 and writes the result to an output file.
How can I print lines from file1 and file2 where column 9 in file 1 is less than column 4 in file 2
Your key from file 1 is field 2, not field 1.
$ awk 'NR==FNR {a[$2]=$0; next}
       $1 in a {split(a[$1],t);
                if(t[9]>=$4 && t[10]<=$5) print a[$1], $0}' file1 file2 | column -t
BG chr2 100.000 15 0 0 1 15 216745730 216745744 5.1 chr2 hg38_refGene exon 216645730 216845744
BG chr1 100.000 15 0 0 1 15 6195235 6195335 5.1 chr1 hg38_refGene CDS 6095235 6395421
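For readers more comfortable with Python, the same two-pass idea (index file1 by its second field, then test each file2 line against the stored row) might be sketched as below. This is only an illustration: the function name `join_within_range` and the sample rows are assumptions, fields are assumed to be whitespace-separated, and awk's 1-based field numbers become 0-based list indexes.

```python
def join_within_range(file1_lines, file2_lines):
    """Pair each file2 row with the file1 row sharing its key when the
    file1 interval (fields 9-10) lies inside the file2 interval (fields 4-5)."""
    # First pass: index file1 rows by field 2 (list index 1), like a[$2]=$0.
    # As in the awk version, a later row with the same key overwrites an earlier one.
    by_key = {}
    for line in file1_lines:
        fields = line.split()
        if len(fields) >= 10:
            by_key[fields[1]] = fields

    # Second pass: look up each file2 row by its field 1 (list index 0).
    matches = []
    for line in file2_lines:
        fields = line.split()
        if len(fields) < 5 or fields[0] not in by_key:
            continue
        t = by_key[fields[0]]
        # Numeric comparison, assuming the coordinates are integers.
        if int(t[8]) >= int(fields[3]) and int(t[9]) <= int(fields[4]):
            matches.append(" ".join(t + fields))
    return matches


# Hypothetical input rows, reconstructed from the output shown above.
file1 = ["BG chr2 100.000 15 0 0 1 15 216745730 216745744 5.1"]
file2 = [
    "chr2 hg38_refGene exon 216645730 216845744",
    "chr2 hg38_refGene exon 1 100",
]
for row in join_within_range(file1, file2):
    print(row)
```

With these sample rows only the first file2 line should be paired, because the second interval does not contain the file1 coordinates.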
Comparison of two files in Unix and display differences
This'll do what I think you want; you might want to tweak the output format:
$ cat tst.awk
BEGIN { FS="[= ]" }
{
    match(" "$0,/ v_party_id="[^"]+"/)
    key = substr($0,RSTART,RLENGTH)
}
NR==FNR {
    file1[key] = $0
    next
}
{
    if ( key in file1 ) {
        nf = split(file1[key],tmp)
        for (i=1; i<nf; i+=2) {
            f1[key,tmp[i]] = tmp[i+1]
        }
        msg = sep = ""
        for (i=1; i<NF; i+=2) {
            if ( $(i+1) != f1[key,$i] ) {
                msg = msg sep OFS ARGV[1] "." $i "=" f1[key,$i] OFS FILENAME "." $i "=" $(i+1)
                sep = ","
            }
        }
        if ( msg != "" ) {
            print "Mismatch in row " FNR msg
        }
        delete file1[key]
    }
    else {
        file2[key] = $0
    }
}
END {
    for (key in file1) {
        print "In file1 only:", key, file1[key]
    }
    for (key in file2) {
        print "In file2 only:", key, file2[key]
    }
}
$ awk -f tst.awk file1 file2
Mismatch in row 1 file1.n_pd_percent="0.16323687" file2.n_pd_percent="0.2045", file1.v_accounting_standard="IFRS" file2.v_accounting_standard="SQRT"
Mismatch in row 2 file1.v_accounting_standard="SQRT" file2.v_accounting_standard="IFRS"
In file1 only: v_party_id="103811200" d_report_ref_date="2021-03-31" n_pd_percent="0.16323687" v_accounting_standard="IFRS" v_party_default_status_cd="NOTDFLT" v_party_id="103811200" v_src_system_id="SMT"
In file1 only: v_party_id="103811100" d_report_ref_date="2021-03-31" n_pd_percent="0.16323687" v_accounting_standard="IFRS" v_party_default_status_cd="NOTDFLT" v_party_id="103811100" v_src_system_id="SMT"
In file2 only: v_party_id="103811400" d_report_ref_date="2021-03-31" n_pd_percent="0.16323687" v_accounting_standard="IFRS" v_party_default_status_cd="NOTDFLT" v_party_id="103811400" v_src_system_id="SMT"
awk print line of file2 based on condition of file1
Not as clean as an awk solution:
$ paste file2 file1 | sed '/0/d' | cut -f1
B
C
You mentioned something about millions of lines. To make just a single pass through the files, I'd resort to Python. Something like this, perhaps (Python 2.7):
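from itertools import izip  # lazy pairing; Python 2's zip() would build a full list

with open("file1") as fd1, open("file2") as fd2:
    for l1, l2 in izip(fd1, fd2):
        # Print the file2 line only when the paired file1 line does not start with 0.
        if not l1.startswith('0'):
            print l2.strip()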
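The snippet above targets Python 2.7. A rough Python 3 equivalent is sketched below. `select_lines` is a hypothetical helper name, and the in-memory sample lists are assumptions standing in for the two files. In Python 3 the built-in `zip` is lazy, so the inputs are still consumed in a single pass.

```python
def select_lines(flags, values):
    """Yield each line of `values` whose paired line in `flags`
    does not start with '0'."""
    # zip() is lazy in Python 3, so both inputs are consumed line by line.
    for flag_line, value_line in zip(flags, values):
        if not flag_line.startswith("0"):
            yield value_line.strip()


# Hypothetical sample data; with real files, pass the open file objects:
#   with open("file1") as fd1, open("file2") as fd2:
#       for line in select_lines(fd1, fd2):
#           print(line)
flags = ["0\n", "1\n", "1\n"]
values = ["A\n", "B\n", "C\n"]
for line in select_lines(flags, values):
    print(line)
```

Because the helper is a generator, it can be fed file objects directly and never holds more than one pair of lines at a time.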
compare and print 2 columns from 2 files in awk or perl
Assuming your $3 values are unique within each input file, as shown in your sample input/output:
$ cat tst.awk
NR==FNR {
    foos[$3] = $1
    bars[$3] = $2
    next
}
$3 in foos {
    print foos[$3] "-" $1, bars[$3] "-" $2, $3
}
$ awk -f tst.awk file1.txt file2.txt
0001-2451 00000001-00000010 084010800001080
I named the arrays foos[] and bars[] as I don't know what the first 2 columns of your input actually represent - choose a more meaningful name.
List differences in two files using awk
If you really want to use awk:
$ cat f1
a|1
b|2
c|1
$ cat f2
b|2
c|1
d|0
$ awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) print k }' f1 f2
a|1
d|0
$
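A comparable result can be sketched in Python with a set symmetric difference. Note that this is a slightly different technique from the awk toggle above: the toggle flips on every occurrence, so a line repeated within one file can cancel itself out, whereas sets ignore duplicates. The function name `lines_in_one_file_only` is an assumption, and the sample lists mirror f1 and f2 above.

```python
def lines_in_one_file_only(lines1, lines2):
    """Return the lines that appear in exactly one of the two inputs."""
    # ^ on sets is the symmetric difference; sorted() gives a stable order.
    return sorted(set(lines1) ^ set(lines2))


f1 = ["a|1", "b|2", "c|1"]
f2 = ["b|2", "c|1", "d|0"]
for line in lines_in_one_file_only(f1, f2):
    print(line)
```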
Compare two text files and if the second file has a row which contains both the columns of first file delete that row
This assumes the pairs in file1 never have the same value in both fields:
$ cat tst.awk
NR==FNR {
    pairs1[NR] = $1
    pairs2[NR] = $2
    next
}
{
    orig = $0
    gsub(/[[:space:],]+/," ")
    delete vals
    for (i=1; i<=NF; i++) {
        vals[$i]
    }
    for (nr in pairs1) {
        if ( (pairs1[nr] in vals) && (pairs2[nr] in vals) ) {
            next
        }
    }
    print orig
}
$ awk -f tst.awk file1 file2
2002, 5052, 7001, 1500, 2500
2003, 5051, 3500, 4500, 4952
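The same filtering logic could be sketched in Python roughly as follows. The function name `drop_rows_with_pairs` and the sample pairs and rows are assumptions made for illustration, since the original input files are not shown. Like the awk script, this sketch assumes the two values of a pair differ.

```python
import re


def drop_rows_with_pairs(pair_lines, row_lines):
    """Keep only the rows that do not contain both values of any pair."""
    # Each pair line is assumed to hold two whitespace-separated values.
    pairs = []
    for line in pair_lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))

    kept = []
    for row in row_lines:
        # Split on runs of whitespace and commas, like the gsub() in the awk script.
        values = {v for v in re.split(r"[\s,]+", row.strip()) if v}
        if any(a in values and b in values for a, b in pairs):
            continue  # both halves of some pair are present: drop the row
        kept.append(row)
    return kept


# Hypothetical sample data (the pairs and the first row are invented).
pairs = ["2001 5050", "7001 9999"]
rows = [
    "2001, 5050, 7001, 1500, 2500",
    "2002, 5052, 7001, 1500, 2500",
]
for row in drop_rows_with_pairs(pairs, rows):
    print(row)
```

Here the first row is dropped because it contains both 2001 and 5050, while the second survives because it contains 7001 but not its partner 9999.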
Extracting difference values between two files
Using awk you can do this:
awk 'FNR==NR { seen[$0]=FILENAME; next }
     { if ($1 in seen) delete seen[$1]; else print $1, FILENAME }
     END { for (i in seen) print i, seen[i] }' file{1,2}
6 file2
7 file2
5 file1
While traversing file1, we store each row (a single column here) in the array seen, with FILENAME as the value. Next, while iterating over file2, we print each entry that is missing from seen and delete each entry that is found (the common entries). Finally, in the END block, we print all remaining entries from seen.
Shell script to pull variable file1 and compare against variable in file2 and print difference
Something like this, maybe.
It parses the logfile to extract only the ticket numbers.
Then it iterates over them and uses grep to find out whether each one was sent.
LOGFILE="file1"
TMPFILE="file2"
# Quote the variables so paths containing spaces are handled correctly.
for TICKET in $(awk '{print $1}' "$LOGFILE"); do
    if ! grep -q "Ticket #: ${TICKET} " "$TMPFILE"; then
        send_email_for_ticket "$TICKET"
    fi
done
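For comparison, the check itself might look like this in Python. This is a minimal sketch: `unsent_tickets` is a hypothetical name, the sample log lines and sent-mail text are invented because the real file layouts are not shown, and the actual email step is left out.

```python
def unsent_tickets(log_lines, sent_text):
    """Return ticket numbers from the log that have no 'Ticket #: N ' entry
    in the sent-mail text."""
    missing = []
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ticket = fields[0]
        # Same substring check as the grep; the trailing space keeps
        # ticket 100 from matching inside ticket 1001.
        if f"Ticket #: {ticket} " not in sent_text:
            missing.append(ticket)
    return missing


# Hypothetical sample data.
log = ["1001 printer broken", "1002 password reset", "100 vpn down"]
sent = "Ticket #: 1001 sent\nTicket #: 1002 sent\n"
print(unsent_tickets(log, sent))
```

Returning the list rather than sending mail inside the loop keeps the comparison easy to test before wiring in a notifier.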