How to use awk to test if a column value is in another file?
You could use something like this:
awk -F, 'NR == FNR { a[$0]; next } $2 in a' A.txt B.txt
This saves each line from A.txt
as a key in the array a
and then prints any lines from B.txt
whose second field is in the array.
NR == FNR
is the standard way to target the first file passed to awk, as NR
(the total record number) is only equal to FNR
(the record number for the current file) for the first file. next
skips to the next record so the $2 in a
part is never reached until the second file.
How to cross check values in one column in a file in another file and print the missing values?
1st solution: With your shown samples please try following awk
code, written and tested in GNU awk
. Both the codes will print 1st field from file1 if its NOT present in 2nd field, its taking care of addiotnal "
present before and after $1 in file2.
awk '
FNR==1 { next }
FNR==NR{
gsub(/^"|"$/,"",$1)
arr[$1]
next
}
!($1 in arr){
print $1
}
' file2 file1
2nd solution OR without using gsub
as used in above awk
code a slight different from above one, where I am not doing substitution of "
values present before and after in $1, I am simply wrapping $1 of file1 and checking if that is present in file2 or not here.
awk '
FNR==1 { next }
FNR==NR{
arr[$1]
next
}
!("\""$1"\"" in arr){
print $1
}
' file2 file1
using awk with column value conditions
If you're looking for a particular string, put quotes around it:
awk '$1 == "findtext" {print $3}'
Otherwise, awk will assume it's a variable name.
awk the column of one file from another file
It's not clear (to me) the format of file2
(eg, is that a space or tab between fields?), or if a line in file2
could have more than 3 (white) spaced delimited strings (eg, apples black raspberries 6
), so picking a delimiter for file2
would require more details. Having said that ...
- there are no pipes ('|') in the sample files so the current code (using
-F"|"
) is going to lump the entire line intoawk
variable$1
- we can make this a bit easier by recognizing that we're only interested in the last field from
file2
Adding an entry to file2
:
$ cat file2
apples peaches 3
apples peaches 9
oranges pears 7
apricots figs 1
apples black raspberries 2
A couple small changes to the current awk
code:
awk 'FNR==NR {f1[$1]; next} $(NF) in f1' file1 file2
This generates:
apples peaches 3
apricots figs 1
apples black raspberries 2
awk check if value from a column in file A exists in file B and if not print whole line of file A
With your shown samples, could you please try following. Written and tested in GNU awk
.
awk -v RS='\r?\n' 'FNR==NR{arr[$0];next} !($4 in arr)' fileB FS=";" fileA
Explanation: Adding detailed explanation for above.
awk -v RS='\r?\n' ' ##Starting awk program from here.
FNR==NR{ ##Checking condition which will be TRUE when fileB is being read.
arr[$0] ##Creating array arr with current line index.
next ##next will skip all further statements from here.
}
!($4 in arr) ##Checking condition if 4th column of fileA is present in arr then print line.
' fileB FS=";" fileA ##Mentioning Input_file(s) and setting FS=";" before fileA.
Awk to check if value of column1 from file 1 is present in file2 and update field in another column
awk 'NR==FNR { if(FNR!=1&&$2!="INR") arr[$1]=1; next}
FNR!=1 && $0 && $1 in arr { $2="EXPWT" }
1;' file2 file1
result is
DocumentN SupplyType
262842964 EXPWT
E08/1002/00798 TAX
E08/1002/00816 EXPWT
INDTR1117GSM1459 CAN
INDTR1117GSM5530 EXPWT
262852823 TAX
I assume you just want to update the 2nd fields of records in file1, where it has non-IND
currency in file2, to EXPWT
$0 in FNR!=1 && $0 && $1 in arr
, is to take consideration of empty lines
as @kvantour answered you may use sub($2,"EXPWT")
to retain the original field separator, but make sure the first field doesn't contain the string of second field
How to check value of a column lies between values of two columns in other file and print corresponding value from column in Unix?
There's a standard idiom in awk
which uses FNR
(file record number) and NR
(overall record number) to detect when you're reading the first file. You read and save the values of the first file in arrays, and then use the arrays while reading the second file.
In this context, you want to read file1
first, saving the records based on the value in column 1 ($1
). This assumes that the keys in file1
(the first field) are unique. Then, when reading the second file,
awk 'FNR == NR { val[$1] = $2 }
FNR != NR { if ($1 in val && val[$1] >= $2 && val[$1] <= $3)
print $1, val[$1], $4
}' file1 file2
Sample output:
scaffold1_size11 12 Os08
scaffold2_size22 26 Os09
scaffold3_size33 67 Os02
Note that this is different from the sample output in the question, which is:
scaffold1_size11 12 Os08
scaffold2_size22 26
scaffold3_size33 67 Os02
I assume that's a typo in the question since none of the rows in file2
is missing the fourth column.
You'll also see the idiom used like:
awk 'FNR == NR { …save…; next }
{ …process… }'
The next
skips the second block of code while reading the first file. It might be marginally more efficient, but I tend to like the explicit clarity of the two inverted conditions.
If spacing in the output is an issue, use an appropriate printf
statement in place of the print
.
Awk: check if field value is in one of the given
This should do:
awk '$2~/^(23|17|21|1)$/' file
This will test if field #2
is one of 23,17,21 or 1
Just en example on how to do it with array:
awk 'BEGIN{split("23 17 21 1",tmp); for (i in tmp) arr[tmp[i]]} $2 in arr' file
Make a variable data
with number to use
Split it with split
into array arr
Loop trough all value in arr
fore every line and print if $2
is found in arr.
EDIT: Updated with Eds suggestions.
Related Topics
Bind Outgoing Traffic to Eth0 Instead of Eth0:1
Why Do I Get The Information of "Suspended (Tty Input)" When I Run My Script in The Background
Using Flycheck/Flymake on Kernel Source Tree
Is There an Os Command I Can Run to Determine If Running Inside a Xen Based Virtual Machine
How to Write Kernel Space Memory (Physical Address) to a File Using O_Direct
Check If a Git Branch Is Ahead of Another Using a Script
X11 Forwarding of Gui App in Docker Container
Posix_Fadvise(Willneed) Makes Io Slower
What Is The Safest Way to Empty a Directory in *Nix
Installing 32 Bit Libraries (Glibc) on 64 Bit Rhel Without Using Yum
Set Environment Variable in Gdb from Output of Command
Arm Performance Counters Vs Linux Clock_Gettime
Install Sqlite3 Dev and Other Packages in Centos
How to Redirect Entire Output of Spark-Submit to a File