How to Use Awk to Test If a Column Value Is in Another File

How to use awk to test if a column value is in another file?

You could use something like this:

awk -F, 'NR == FNR { a[$0]; next } $2 in a' A.txt B.txt

This saves each line from A.txt as a key in the array a and then prints any lines from B.txt whose second field is in the array.

NR == FNR is the standard way to target the first file passed to awk, as NR (the total record number) is only equal to FNR (the record number for the current file) for the first file. next skips to the next record so the $2 in a part is never reached until the second file.

How to cross check values in one column in a file in another file and print the missing values?

1st solution: With your shown samples please try following awk code, written and tested in GNU awk. Both the codes will print 1st field from file1 if its NOT present in 2nd field, its taking care of addiotnal " present before and after $1 in file2.

awk '
FNR==1 { next }
FNR==NR{
gsub(/^"|"$/,"",$1)
arr[$1]
next
}
!($1 in arr){
print $1
}
' file2 file1



2nd solution OR without using gsub as used in above awk code a slight different from above one, where I am not doing substitution of " values present before and after in $1, I am simply wrapping $1 of file1 and checking if that is present in file2 or not here.

awk '
FNR==1 { next }
FNR==NR{
arr[$1]
next
}
!("\""$1"\"" in arr){
print $1
}
' file2 file1

using awk with column value conditions

If you're looking for a particular string, put quotes around it:

awk '$1 == "findtext" {print $3}'

Otherwise, awk will assume it's a variable name.

awk the column of one file from another file

It's not clear (to me) the format of file2 (eg, is that a space or tab between fields?), or if a line in file2 could have more than 3 (white) spaced delimited strings (eg, apples black raspberries 6), so picking a delimiter for file2 would require more details. Having said that ...

  • there are no pipes ('|') in the sample files so the current code (using -F"|") is going to lump the entire line into awk variable $1
  • we can make this a bit easier by recognizing that we're only interested in the last field from file2

Adding an entry to file2:

$ cat file2
apples peaches 3
apples peaches 9
oranges pears 7
apricots figs 1
apples black raspberries 2

A couple small changes to the current awk code:

awk 'FNR==NR {f1[$1]; next} $(NF) in f1' file1 file2

This generates:

apples  peaches 3
apricots figs 1
apples black raspberries 2

awk check if value from a column in file A exists in file B and if not print whole line of file A

With your shown samples, could you please try following. Written and tested in GNU awk.

awk -v RS='\r?\n' 'FNR==NR{arr[$0];next} !($4 in arr)' fileB FS=";" fileA

Explanation: Adding detailed explanation for above.

awk -v RS='\r?\n' '    ##Starting awk program from here.
FNR==NR{ ##Checking condition which will be TRUE when fileB is being read.
arr[$0] ##Creating array arr with current line index.
next ##next will skip all further statements from here.
}
!($4 in arr) ##Checking condition if 4th column of fileA is present in arr then print line.
' fileB FS=";" fileA ##Mentioning Input_file(s) and setting FS=";" before fileA.

Awk to check if value of column1 from file 1 is present in file2 and update field in another column

awk 'NR==FNR { if(FNR!=1&&$2!="INR") arr[$1]=1; next}
FNR!=1 && $0 && $1 in arr { $2="EXPWT" }
1;' file2 file1

result is

DocumentN   SupplyType

262842964 EXPWT

E08/1002/00798 TAX

E08/1002/00816 EXPWT

INDTR1117GSM1459 CAN

INDTR1117GSM5530 EXPWT

262852823 TAX

I assume you just want to update the 2nd fields of records in file1, where it has non-IND currency in file2, to EXPWT

$0 in FNR!=1 && $0 && $1 in arr, is to take consideration of empty lines

as @kvantour answered you may use sub($2,"EXPWT") to retain the original field separator, but make sure the first field doesn't contain the string of second field

How to check value of a column lies between values of two columns in other file and print corresponding value from column in Unix?

There's a standard idiom in awk which uses FNR (file record number) and NR (overall record number) to detect when you're reading the first file. You read and save the values of the first file in arrays, and then use the arrays while reading the second file.

In this context, you want to read file1 first, saving the records based on the value in column 1 ($1). This assumes that the keys in file1 (the first field) are unique. Then, when reading the second file,

awk 'FNR == NR { val[$1] = $2 }
FNR != NR { if ($1 in val && val[$1] >= $2 && val[$1] <= $3)
print $1, val[$1], $4
}' file1 file2

Sample output:

scaffold1_size11 12 Os08
scaffold2_size22 26 Os09
scaffold3_size33 67 Os02

Note that this is different from the sample output in the question, which is:

scaffold1_size11    12  Os08
scaffold2_size22 26
scaffold3_size33 67 Os02

I assume that's a typo in the question since none of the rows in file2 is missing the fourth column.

You'll also see the idiom used like:

awk 'FNR == NR { …save…; next }
{ …process… }'

The next skips the second block of code while reading the first file. It might be marginally more efficient, but I tend to like the explicit clarity of the two inverted conditions.

If spacing in the output is an issue, use an appropriate printf statement in place of the print.

Awk: check if field value is in one of the given

This should do:

awk '$2~/^(23|17|21|1)$/' file

This will test if field #2 is one of 23,17,21 or 1


Just en example on how to do it with array:

awk 'BEGIN{split("23 17 21 1",tmp); for (i in tmp) arr[tmp[i]]} $2 in arr' file

Make a variable data with number to use

Split it with split into array arr
Loop trough all value in arr fore every line and print if $2 is found in arr.

EDIT: Updated with Eds suggestions.



Related Topics



Leave a reply



Submit