Filtering Rows Based on Number of Columns with Awk

You need to use the NF (number of fields) variable to control the actions, such as in the following transcript:

$ echo '0333 foo
> bar
> 23243 qux' | awk 'NF==2{print}{}'
0333 foo
23243 qux

This prints the line if the number of fields is two; otherwise it does nothing. The trailing {} is an empty action with no pattern, so it runs for every line and does nothing. It was meant as a guard against awk implementations that print unmatched lines by default, but a POSIX-conforming awk never prints a line unless some rule prints it, so the guard is not needed there.

Because a pattern with no action defaults to { print }, the short form is enough:

awk 'NF==2'

Both versions produce the same output.
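A minimal sketch that wraps the short form in a shell function so the required field count can be passed as an argument. The function name filter_by_nf is hypothetical. The -v option sets the awk variable n before any input is read, and because its value looks like a number, NF == n is compared numerically.

```shell
#!/bin/sh
# Hypothetical helper: print only the input lines that have exactly $1 fields.
# Fields are split on awk's default separator (runs of blanks).
filter_by_nf() {
    awk -v n="$1" 'NF == n'
}

# Same input as the transcript above; prints "0333 foo" and "23243 qux".
printf '0333 foo\nbar\n23243 qux\n' | filter_by_nf 2
```

Passing 1 instead of 2 would keep only the bar line.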

Filter out rows from column A based on values in column B

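Use an array. This assumes there are no duplicates in the first column. The program skips the header line (NR>1), adds 1 for every value seen in the first column and subtracts 1 for every value seen in the second, then prints the values whose final count is exactly 1, that is, values that appear in column A but not in column B.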

awk -F ',' 'NR>1 {
    array[$1]++; array[$2]--
}
END {
    for (i in array) { if (array[i] == 1) { print i } }
}' file

As one line:

awk -F ',' 'NR>1{ array[$1]++; array[$2]-- } END{for(i in array){ if(array[i]==1){ print i } } }' file

Output (the order of the lines may vary, because for (i in array) visits keys in an unspecified order):

esther@example.com
daisy@example.com
bill@example.com
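The counting trick depends on the no-duplicates assumption: a value listed twice in the first column ends with a count of 2 and is silently skipped. A variation that does not depend on it (a swapped-in technique, not the original answer's) stores each column's values in its own array and prints the keys of the first array that never occur in the second. The function name and the sample data below are hypothetical.

```shell
#!/bin/sh
# Print values from column 1 that never appear in column 2 of a comma-separated
# file with a header line. Duplicates in either column are fine, because only
# membership (k in in_b) is tested, not a running count.
values_only_in_first_column() {
    awk -F ',' '
        NR > 1 { in_a[$1]; in_b[$2] }      # referencing an element creates it
        END {
            for (k in in_a)
                if (!(k in in_b))
                    print k
        }' "$1"
}

# Made-up sample input: carl@example.com appears twice in the first column.
tmp=$(mktemp)
printf 'A,B\nann@example.com,bob@example.com\nbob@example.com,ann@example.com\ncarl@example.com,zoe@example.com\ncarl@example.com,\n' > "$tmp"
values_only_in_first_column "$tmp"    # prints carl@example.com
rm -f "$tmp"
```

On the same made-up sample, the counting version prints nothing, because carl@example.com ends with a count of 2.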

Using awk command to filter out specific columns from a huge file

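I think you may be looking for something more along the lines of:

gzcat filename.csv.gz |
awk -F, -v OFS=, '{print $19,$26,$27}' |
gzip > filename_FILTERED.csv.gz

Setting OFS=, keeps the output comma-separated; without it, awk joins the printed fields with a space. On systems that lack gzcat, gzip -dc does the same job.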
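A sketch of the same pipeline as a reusable function, assuming a simple CSV in which no quoted field contains a comma (plain -F, splitting cannot handle that case). The function name extract_columns is hypothetical; gzip -dc is used in place of gzcat so that only gzip needs to be installed.

```shell
#!/bin/sh
# Hypothetical helper: keep columns 19, 26 and 27 of a gzip-compressed CSV.
# $1 = input .csv.gz, $2 = output .csv.gz
# Assumes plain comma splitting is enough (no quoted commas inside fields).
extract_columns() {
    gzip -dc "$1" |
        awk -F, -v OFS=, '{ print $19, $26, $27 }' |
        gzip > "$2"
}
```

Usage: extract_columns filename.csv.gz filename_FILTERED.csv.gz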

How to use Awk to filter rows using a column value under double quotes

Just match the literal " by escaping it with a backslash. The filter below matches the literal "AA", quotes included, in the first column. Since awk works on a pattern { action } basis, the comparison can be used directly as the pattern, without an explicit { print }.

When the condition is true for a line, the rule behaves like awk 1 file: a true pattern with no action prints the line.

awk -v FS=, '$1=="\"AA\""' file

You can also avoid the escapes by passing the match string, in single quotes, as an awk variable and comparing the field against that variable:

awk -v FS=, -v m='"AA"' '$1==m' file
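A small self-check, on made-up sample lines, that the escaped-literal form and the variable form select the same rows. Only the first field is compared, and the surrounding double quotes are part of the value being matched, so the unquoted AA row is not selected.

```shell
#!/bin/sh
# Made-up sample: the first field of most lines includes literal double quotes.
sample='"AA",1
"AB",2
AA,3
"AA",4'

# Form 1: escape the quotes inside the awk string literal.
printf '%s\n' "$sample" | awk -v FS=, '$1=="\"AA\""'

# Form 2: pass the quoted value in as an awk variable; no escaping needed.
printf '%s\n' "$sample" | awk -v FS=, -v m='"AA"' '$1==m'
```

Both commands print the "AA",1 and "AA",4 lines.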

Filter a file using a column value greater than a number (awk not working)

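You need to make awk treat $8 as a number by computing $8+0. If awk does not recognize a value as numeric, it compares it as a string, and as strings "9" >= "10" is true; adding 0 forces a numeric comparison. Using GNU awk is recommended to avoid differences between awk implementations. Also, if the file was created on Windows, run dos2unix on it first to normalize the line endings.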
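To see why the +0 matters, the sketch below compares the same two made-up values first as strings and then as numbers. String comparison goes character by character, and '9' sorts after '1'.

```shell
#!/bin/sh
# String comparison: "9" >= "10" is true because '9' comes after '1'.
awk 'BEGIN { if ("9" >= "10") print "as strings: 9 >= 10"; else print "as strings: 9 < 10" }'

# Numeric comparison: adding 0 converts both operands to numbers first.
awk 'BEGIN { if ("9" + 0 >= "10" + 0) print "as numbers: 9 >= 10"; else print "as numbers: 9 < 10" }'
```

The first command prints "as strings: 9 >= 10" and the second prints "as numbers: 9 < 10".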

The whole command can be written as

awk -F"," '/^LQN/ && $8+0 >= 10 {print $1, $8}' df_TPM.csv


NOTE: To count these lines instead of printing them, use:

awk -F, '/^LQN/ && $8+0 >= 10 {cnt++} END{print cnt}' df_TPM.csv

To count the lines that do not start with LQN instead, add the negation operator ! before /^LQN/:

awk -F, '!/^LQN/ && $8+0 >= 10 {cnt++} END{print cnt}' df_TPM.csv

Details

  • -F"," (= -F,) - set the field separator to a comma
  • /^LQN/ && $8+0 >= 10 - if the current line starts with LQN and the eighth field, converted to a number, is greater than or equal to 10
  • !/^LQN/ && $8+0 >= 10 - if the current line does not start with LQN and the eighth field, converted to a number, is greater than or equal to 10
  • {print $1, $8} - print fields 1 and 8
  • {cnt++} - increment the cnt variable
  • END{print cnt} - print the cnt variable once awk has finished processing all lines (if no line matched, cnt is unset and an empty line is printed; print cnt+0 prints 0 instead)
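A single-pass variation (not part of the original answer) that counts both groups at once and strips a trailing carriage return inside awk, in case the file still has Windows line endings and dos2unix is not available. The function name count_by_prefix is hypothetical; assigning to $0 through sub() makes awk re-split the fields, so $8 no longer carries the CR.

```shell
#!/bin/sh
# Count rows whose 8th field is >= 10, split by whether the line starts with LQN.
# $1 = input CSV. Prints two numbers: "<starts with LQN> <does not>".
count_by_prefix() {
    awk -F, '
        { sub(/\r$/, "") }                  # drop a trailing CR (CRLF input)
        $8 + 0 >= 10 {
            if (/^LQN/) lqn++; else other++
        }
        END { print lqn + 0, other + 0 }    # "+ 0" prints 0 instead of ""
    ' "$1"
}
```

For example, count_by_prefix df_TPM.csv prints the two counts separated by a space.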
