How to Find Common Rows in Multiple Files Using Awk

awk find the common rows to two files and combine the rows to a row in a third file

You can use this awk command to achieve your output:

awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {
     print $0, ($3 in a ? a[$3] : "NA")}' file2.tab file1.tab

name    level  regno  dept  sex     grade
john    900    123    csc   male    A
debby   800    378    mth   male    NA
ken     800    234    csc   male    A
sol     700    923    mth   female  NA
dare    900    273    phy   male    B
olanna  800    283    csc   female  D
olumba  400    245    phy   male    NA
petrus  800    284    mth   female  NA

Finding common value across multiple files containing single column values

awk to the rescue!

to find the common element in all files (assuming uniqueness within the same file)

awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' files

count all occurrences and print the values where count equals number of files.

find common rows between two dataframes based on two columns using bash

Just adding the explanation to oguz' fine answer in the comments above:

BEGIN{FS=OFS=","} defines , to be the separator for both input and output.

NR==FNR{pair[$1,$2];next} while the record number of the entire input matches the current file's record number (in other words, for the first file) add an element with the first and second field as index to the array pair.

($1,$2) in pair||($2,$1) in pair{print $1,$2} operating on the second file, check if field one and two in any order are present as index in the array pair, and print them if they are.

how to find out common columns and its records from two files using awk

Following awk may help you on same.

awk -F"|" 'FNR==NR{for(i=1;i<=NF;i++){a[$i]};next} FNR==1 && FNR!=NR{for(j=1;j<=NF;j++){if($j in a){b[++p]=j}}} {for(o=1;o<=p;o++){printf("%s%s",$b[o],o==p?ORS:OFS)}}' OFS="|" File2  File1

Adding a non-one liner form of solution too now.

awk -F"|" '
FNR==NR{
   for(i=1;i<=NF;i++){
     a[$i]};
   next}
FNR==1 && FNR!=NR{
   for(j=1;j<=NF;j++){
     if($j in a){ b[++p]=j }}
}
{
   for(o=1;o<=p;o++){
     printf("%s%s",$b[o],o==p?ORS:OFS)}
}
' OFS="|" File2  File1

Edit by Ed Morton: FWIW here's the same script written with normal indenting/spacing and a couple of more meaningful variable names:

BEGIN { FS=OFS="|" }
NR==FNR {
    for (i=1; i<=NF; i++) {
        names[$i]
    }
    next
}
FNR==1 {
    for (i=1; i<=NF; i++) {
        if ($i in names) {
            f[++numFlds] = i
        }
    }
}
{
    for (i=1; i<=numFlds; i++) {
        printf "%s%s", $(f[i]), (i<numFlds ? OFS : ORS)
    }
}

subsetting rows from a file by matching columns from another file

With your shown samples please try following.

awk 'FNR==NR{arr[$1,$2];next} (($1,$2) in arr)' Input_file2  Input_file1

In case its tab delimited then try following:

awk 'BEGIN{FS=OFS="\t"} FNR==NR{arr[$1,$2];next} (($1,$2) in arr)' file2 file1

Explanation: Adding detailed explanation for above.

awk '                         ##Starting awk program from here.
FNR==NR{                      ##Checking condition FNR==NR which will be TRUE when file2 is being read.
  arr[$1,$2]                  ##Creating array arr with index of $1 and $2 here.
  next                        ##next will skip all further statements from here.
}
(($1,$2) in arr)              ##Checking if $1,$2 is present in arr then print line.
' Input_file2  Input_file1    ##Mentioning Input_file names here.