awk find the common rows to two files and combine the rows to a row in a third file
You can use this awk command to achieve your output:
awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {
print $0, ($3 in a ? a[$3] : "NA")}' file2.tab file1.tab
Output:
name    level  regno  dept  sex     grade
john    900    123    csc   male    A
debby   800    378    mth   male    NA
ken     800    234    csc   male    A
sol     700    923    mth   female  NA
dare    900    273    phy   male    B
olanna  800    283    csc   female  D
olumba  400    245    phy   male    NA
petrus  800    284    mth   female  NA
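To try the command without the original files, here is a minimal sketch. The contents of file1.tab and file2.tab are assumptions reconstructed from a few rows of the output above (the question's real inputs are not shown here), and the temporary directory and the `joined` variable are only for illustration.

```shell
#!/bin/sh
# Hypothetical inputs, reconstructed from the output shown above.
tmp=$(mktemp -d)

# file1.tab: the main table, with regno in the third column.
printf 'name\tlevel\tregno\tdept\tsex\n' >  "$tmp/file1.tab"
printf 'john\t900\t123\tcsc\tmale\n'     >> "$tmp/file1.tab"
printf 'debby\t800\t378\tmth\tmale\n'    >> "$tmp/file1.tab"
printf 'dare\t900\t273\tphy\tmale\n'     >> "$tmp/file1.tab"

# file2.tab: the lookup table, regno -> grade (header row included).
printf 'regno\tgrade\n123\tA\n273\tB\n'  >  "$tmp/file2.tab"

# First pass (NR==FNR) loads file2.tab into a[]; second pass appends the
# looked-up grade to each file1.tab row, or NA when regno has no match.
joined=$(awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {
  print $0, ($3 in a ? a[$3] : "NA")}' "$tmp/file2.tab" "$tmp/file1.tab")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

The header row needs no special handling: because file2.tab carries its own regno/grade header, the lookup of "regno" yields "grade".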
Finding common value across multiple files containing single column values
awk to the rescue! To find the elements common to all files (assuming each value is unique within its own file):
awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' files
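This counts every occurrence of each value and prints the values whose count equals the number of files. ARGC also counts the command name, so ARGC-1 is the number of file arguments.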
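A small sketch of the counting approach on three made-up files. The file names f1, f2, f3 and their contents are assumptions for illustration. The output is piped through sort because the order of `for (k in a)` is unspecified in awk.

```shell
#!/bin/sh
# Hypothetical single-column inputs; each value is unique within its file.
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/f1"
printf 'banana\ncherry\ndate\n'  > "$tmp/f2"
printf 'cherry\nbanana\nfig\n'   > "$tmp/f3"

# A value seen in all three files ends up with a count of ARGC-1 == 3.
common=$(awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' \
  "$tmp/f1" "$tmp/f2" "$tmp/f3" | sort)

printf '%s\n' "$common"
rm -rf "$tmp"
```

If a value can repeat within one file, the count no longer equals the number of files it appears in, so this shortcut only holds under the uniqueness assumption stated above.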
find common rows between two dataframes based on two columns using bash
Just adding the explanation to oguz' fine answer in the comments above:
BEGIN{FS=OFS=","}
defines , to be the separator for both input and output.
NR==FNR{pair[$1,$2];next}
While the record number of the entire input matches the current file's record number (in other words, for the first file), add an element with the first and second fields as its index to the array pair.
($1,$2) in pair||($2,$1) in pair{print $1,$2}
Operating on the second file, check whether fields one and two, in either order, are present as an index in the array pair, and print them if they are.
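Putting the three pieces together, a minimal runnable sketch. The file names first.csv and second.csv and their rows are assumptions for illustration, not the question's data.

```shell
#!/bin/sh
# Hypothetical comma-separated inputs.
tmp=$(mktemp -d)
printf 'a,b\nc,d\ne,f\n'      > "$tmp/first.csv"
printf 'b,a\nc,d\ne,g\nx,y\n' > "$tmp/second.csv"

# Remember every ($1,$2) pair from the first file, then print rows of the
# second file whose two fields match a stored pair in either order.
matches=$(awk 'BEGIN{FS=OFS=","}
  NR==FNR{pair[$1,$2];next}
  ($1,$2) in pair||($2,$1) in pair{print $1,$2}' \
  "$tmp/first.csv" "$tmp/second.csv")

printf '%s\n' "$matches"
rm -rf "$tmp"
```

Here b,a matches the stored pair a,b in reversed order and c,d matches directly, while e,g and x,y are dropped.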
how to find out common columns and its records from two files using awk
The following awk command may help.
awk -F"|" 'FNR==NR{for(i=1;i<=NF;i++){a[$i]};next} FNR==1 && FNR!=NR{for(j=1;j<=NF;j++){if($j in a){b[++p]=j}}} {for(o=1;o<=p;o++){printf("%s%s",$b[o],o==p?ORS:OFS)}}' OFS="|" File2 File1
Here is the same solution in multi-line form:
awk -F"|" '
FNR==NR{
for(i=1;i<=NF;i++){
a[$i]};
next}
FNR==1 && FNR!=NR{
for(j=1;j<=NF;j++){
if($j in a){ b[++p]=j }}
}
{
for(o=1;o<=p;o++){
printf("%s%s",$b[o],o==p?ORS:OFS)}
}
' OFS="|" File2 File1
Edit by Ed Morton: FWIW here's the same script written with normal indenting/spacing and a couple of more meaningful variable names:
BEGIN { FS=OFS="|" }
NR==FNR {
    for (i=1; i<=NF; i++) {
        names[$i]
    }
    next
}
FNR==1 {
    for (i=1; i<=NF; i++) {
        if ($i in names) {
            f[++numFlds] = i
        }
    }
}
{
    for (i=1; i<=numFlds; i++) {
        printf "%s%s", $(f[i]), (i<numFlds ? OFS : ORS)
    }
}
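As a quick check, Ed Morton's version above can be run on two small pipe-delimited files. The contents of File2 and File1 below are assumptions for illustration. Note that the first block stores every field of every File2 line (not only its header) in names, so the sample values are chosen not to collide with File1's column names.

```shell
#!/bin/sh
# Hypothetical pipe-delimited inputs.
tmp=$(mktemp -d)
printf 'id|name|city\n1|ann|oslo\n'     > "$tmp/File2"
printf 'name|age|id\nbob|30|7\neve|25|9\n' > "$tmp/File1"

# Pass 1 collects File2's fields; on File1's header, record the positions
# of columns whose names were seen; then print only those columns.
cols=$(awk '
BEGIN { FS=OFS="|" }
NR==FNR {
    for (i=1; i<=NF; i++) names[$i]
    next
}
FNR==1 {
    for (i=1; i<=NF; i++)
        if ($i in names) f[++numFlds] = i
}
{
    for (i=1; i<=numFlds; i++)
        printf "%s%s", $(f[i]), (i<numFlds ? OFS : ORS)
}' "$tmp/File2" "$tmp/File1")

printf '%s\n' "$cols"
rm -rf "$tmp"
```

The common columns come out in File1's order (name, then id), because the positions are recorded while scanning File1's header.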
subsetting rows from a file by matching columns from another file
With your shown samples, please try the following.
awk 'FNR==NR{arr[$1,$2];next} (($1,$2) in arr)' Input_file2 Input_file1
If the input is tab-delimited, try the following instead:
awk 'BEGIN{FS=OFS="\t"} FNR==NR{arr[$1,$2];next} (($1,$2) in arr)' file2 file1
Explanation: a detailed, line-by-line explanation of the first command above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when Input_file2 (the first file) is being read.
arr[$1,$2] ##Creating array arr with index of $1 and $2 here.
next ##next will skip all further statements from here.
}
(($1,$2) in arr) ##Checking if $1,$2 is present in arr then print line.
' Input_file2 Input_file1 ##Mentioning Input_file names here.
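A minimal sketch of the whitespace-delimited variant on made-up data. The rows in Input_file2 and Input_file1 are assumptions for illustration, since the question's samples are not reproduced here.

```shell
#!/bin/sh
# Hypothetical inputs: Input_file2 holds the key pairs to keep,
# Input_file1 holds the rows to be subset.
tmp=$(mktemp -d)
printf 'chr1 100\nchr2 200\n' > "$tmp/Input_file2"
printf 'chr1 100 geneA\nchr1 150 geneB\nchr2 200 geneC\nchr3 300 geneD\n' \
  > "$tmp/Input_file1"

# Keep only the Input_file1 rows whose first two fields appear together
# in Input_file2; a true pattern with no action prints the whole line.
subset=$(awk 'FNR==NR{arr[$1,$2];next} (($1,$2) in arr)' \
  "$tmp/Input_file2" "$tmp/Input_file1")

printf '%s\n' "$subset"
rm -rf "$tmp"
```

The row chr1 150 geneB is dropped even though chr1 appears in the key file, because both fields must match as a pair.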