How to Find Common Rows in Multiple Files Using Awk

awk find the common rows to two files and combine the rows to a row in a third file

You can use this awk command to achieve your output:

awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {
print $0, ($3 in a ? a[$3] : "NA")}' file2.tab file1.tab

name level regno dept sex grade
john 900 123 csc male A
debby 800 378 mth male NA
ken 800 234 csc male A
sol 700 923 mth female NA
dare 900 273 phy male B
olanna 800 283 csc female D
olumba 400 245 phy male NA
petrus 800 284 mth female NA

Finding common value across multiple files containing single column values

awk to the rescue!

to find the common element in all files (assuming uniqueness within the same file)

awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' files

count all occurrences and print the values where count equals number of files.

find common rows between two dataframes based on two columns using bash

Just adding the explanation to oguz' fine answer in the comments above:

BEGIN{FS=OFS=","} defines , to be the separator for both input and output.

NR==FNR{pair[$1,$2];next} while the record number of the entire input matches the current file's record number (in other words, for the first file) add an element with the first and second field as index to the array pair.

($1,$2) in pair||($2,$1) in pair{print $1,$2} operating on the second file, check if field one and two in any order are present as index in the array pair, and print them if they are.

how to find out common columns and its records from two files using awk

Following awk may help you on same.

awk -F"|" 'FNR==NR{for(i=1;i<=NF;i++){a[$i]};next} FNR==1 && FNR!=NR{for(j=1;j<=NF;j++){if($j in a){b[++p]=j}}} {for(o=1;o<=p;o++){printf("%s%s",$b[o],o==p?ORS:OFS)}}' OFS="|" File2  File1

Adding a non-one liner form of solution too now.

awk -F"|" '
FNR==NR{
for(i=1;i<=NF;i++){
a[$i]};
next}
FNR==1 && FNR!=NR{
for(j=1;j<=NF;j++){
if($j in a){ b[++p]=j }}
}
{
for(o=1;o<=p;o++){
printf("%s%s",$b[o],o==p?ORS:OFS)}
}
' OFS="|" File2 File1

Edit by Ed Morton: FWIW here's the same script written with normal indenting/spacing and a couple of more meaningful variable names:

BEGIN { FS=OFS="|" }
NR==FNR {
for (i=1; i<=NF; i++) {
names[$i]
}
next
}
FNR==1 {
for (i=1; i<=NF; i++) {
if ($i in names) {
f[++numFlds] = i
}
}
}
{
for (i=1; i<=numFlds; i++) {
printf "%s%s", $(f[i]), (i<numFlds ? OFS : ORS)
}
}

subsetting rows from a file by matching columns from another file

With your shown samples please try following.

awk 'FNR==NR{arr[$1,$2];next} (($1,$2) in arr)' Input_file2  Input_file1

In case its tab delimited then try following:

awk 'BEGIN{FS=OFS="\t"} FNR==NR{arr[$1,$2];next} (($1,$2) in arr)' file2 file1

Explanation: Adding detailed explanation for above.

awk '                         ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when file2 is being read.
arr[$1,$2] ##Creating array arr with index of $1 and $2 here.
next ##next will skip all further statements from here.
}
(($1,$2) in arr) ##Checking if $1,$2 is present in arr then print line.
' Input_file2 Input_file1 ##Mentioning Input_file names here.


Related Topics



Leave a reply



Submit