How to Use If/Else in Awk to Evaluate a File and Extract Information

How to evaluate or process if statements in data?

Without writing a full language parser, if you're looking for something cheap and cheerful then this might be a decent starting point:

$ cat tst.awk
# Replace every literal $1 in the input with the quoted value of arg1.
{ gsub(/\$1/,"\047"arg1"\047") }
# Capture the operands, the operator and both actions of a one-line IF ... END IF.
match($0,/^IF\s+(\S+)\s+(\S+)\s+(\S+)\s+THEN\s+(\S+)\s+(\S+)\s+ELSE\s+(\S+)\s+(\S+)\s+END\s+IF/,a) {
    lhs = a[1]
    op = a[2]
    rhs = a[3]
    trueAct = (a[4] == "PERFORM" ? "SELECT" : a[4]) FS a[5]
    falseAct = (a[6] == "PERFORM" ? "SELECT" : a[6]) FS a[7]

    if (op == "=") {
        print (lhs == rhs ? trueAct : falseAct)
    }
}

$ awk -v arg1='customer1' -f tst.awk file
SELECT subfunction1('customer1');

$ awk -v arg1='bob' -f tst.awk file
SELECT subfunction2('bob');

The above uses GNU awk for the 3rd arg to match() and for the \s and \S regexp shorthands. Hopefully it's easy enough to understand that you can massage it as needed to handle other constructs or other variations of this construct.
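
For awks without the three-argument match(), the same idea can be sketched with plain field splitting. This is only a sketch under assumptions: the input line is hypothetical (the file from the question is not shown), tokens are assumed to be separated by whitespace, and the helper name eval_rule is made up for illustration.

```shell
# Hypothetical one-line input; the real file from the question is not shown.
tmp=$(mktemp -d)
printf '%s\n' "IF \$1 = 'customer1' THEN PERFORM subfunction1(\$1); ELSE PERFORM subfunction2(\$1); END IF" > "$tmp/file"

# eval_rule ARG FILE: substitute ARG for $1, then pick the THEN or ELSE action.
eval_rule() {
    awk -v arg1="$1" '
        { gsub(/\$1/, "\047" arg1 "\047") }   # $0 is re-split into fields afterwards
        $1 == "IF" && $3 == "=" && $5 == "THEN" && $8 == "ELSE" {
            trueAct  = ($6 == "PERFORM" ? "SELECT" : $6) " " $7
            falseAct = ($9 == "PERFORM" ? "SELECT" : $9) " " $10
            print ($2 == $4 ? trueAct : falseAct)
        }' "$2"
}

out_match=$(eval_rule customer1 "$tmp/file")
out_other=$(eval_rule bob "$tmp/file")
printf '%s\n' "$out_match" "$out_other"
rm -rf "$tmp"
```

Field positions are hard-coded here, so this variant is less tolerant of layout changes than the regexp version.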

Awk if else with conditions

This awk should work for you:

awk 'FNR==NR {map[$2]=$4; next} {print $4, map[$4]+0}' mapfile testfile

190568 0.009489
194947 0.009707
197042 0
212894 0

This awk command processes mapfile first, storing $4 as the value under key $2 in an associative array named map.
When it then processes testfile in the second block, it prints $4 from that file together with the value stored in map under the key $4. Adding 0 to the stored value ensures that we get 0 when $4 is not present in map.
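
To see the two-file idiom and the effect of +0 in isolation, here is a small sketch. The file contents are invented for illustration (only the column positions match the command above), and the temporary file names are arbitrary.

```shell
tmp=$(mktemp -d)
# Hypothetical mapfile: key in column 2, value in column 4.
printf '%s\n' 'a 190568 x 0.009489' 'b 194947 x 0.009707' > "$tmp/mapfile"
# Hypothetical testfile: lookup key in column 4.
printf '%s\n' 'p q r 190568' 'p q r 197042' > "$tmp/testfile"

with_zero=$(awk 'FNR==NR {map[$2]=$4; next} {print $4, map[$4]+0}' "$tmp/mapfile" "$tmp/testfile")
# Without +0 a missing key yields an empty string instead of 0.
without_zero=$(awk 'FNR==NR {map[$2]=$4; next} {print $4, "[" map[$4] "]"}' "$tmp/mapfile" "$tmp/testfile")

printf '%s\n' "$with_zero" "$without_zero"
rm -rf "$tmp"
```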

Analysing two files using awk with if condition

Here I corrected your AWK code:

awk -F"," 'NR==FNR{
    number_day = $1 FS $2
    batch[number_day]=$7
    next
}
{
    number_day = $1 FS $2
    print $0 "," batch[number_day]
}' sam_batch.csv sam_name.csv

Output:

Number,Day,Sample,Batch
171386,0,38_171386_D0_2-1.raw,1
171386,0,38_171386_D0_2-2.raw,1
171386,2,30_171386_D2_1-1.raw,5
171386,2,30_171386_D2_1-2.raw,5
171386,-1,40_171386_D-1_1-1.raw,6
171386,-1,40_171386_D-1_1-2.raw,6

(No need for double-checking if you understand how the script works.)


Here's another AWK solution (my original answer):

awk -v "b=sam_batch.csv" 'BEGIN {
    FS=OFS=","
    while(( getline line < b) > 0) {
        n = split(line,a)
        nd = a[1] FS a[2]
        nd2b[nd] = a[n]
    }
}
{ print $1,$2,$3,nd2b[$1 FS $2] }' sam_name.csv

Both solutions parse sam_batch.csv first to build a dictionary of (number, day) -> batch. Then they parse sam_name.csv, printing its first three fields together with the "Batch" value looked up from sam_batch.csv.

extracting information from a file using awk and storing it into a variable with bash

This can be an approach to create a bash array:

$ declare -a mylist
$ i=1
$ while read -r line; do mylist[$i]=$(echo "$line" | awk '{print $1}'); ((i++)); done < test.conf

Then you can access the values with:

$ for i in "${mylist[@]}"; do echo "$i"; done
ip1
ip2
ip3

Alternatively, with Jonathan Leffler's very interesting approach, you can populate the array with the following command:

mylist=( $(awk '{print $1}' test.conf) )

It will store data like this:

mylist=(ip1 ip2 ip3 ...)
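
If the goal is just to loop over the first field of each line, the shell's own read can split the line, which avoids starting one awk per line. A minimal sketch, assuming test.conf holds whitespace-separated fields with the value of interest first; the file contents and the variable names ip, rest and collected are made up here.

```shell
tmp=$(mktemp -d)
# Hypothetical test.conf: first field is the value of interest.
printf '%s\n' 'ip1 portA' 'ip2 portB' 'ip3 portC' > "$tmp/test.conf"

collected=""
# read puts the first field in ip and the remainder of the line in rest.
while read -r ip rest; do
    collected="$collected${collected:+ }$ip"
done < "$tmp/test.conf"

echo "$collected"
rm -rf "$tmp"
```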

How to use conditional expression to select data?

I would be very tempted to leave the input delimiter unmodified so blanks and tabs separate fields, rather than insisting on tabs only. That means you want records after the first (to skip the headings line) that have six fields:

awk 'NR > 1 && NF == 6 { if ($6 == "+") x = $4; else x = $5; print $1, $2, $3, x; }'

If you want to control the output format more, you can dink with OFS, or use printf:

awk 'BEGIN { OFS = "\t" }
NR > 1 && NF == 6 { if ($6 == "+") x = $4; else x = $5; print $1, $2, $3, x; }'

awk 'NR > 1 && NF == 6 { if ($6 == "+") x = $4; else x = $5;
printf "%-8s %-12s %s %9s\n", $1, $2, $3, x; }'

There are other ways to handle it, I'm sure...

The first script produces:

Susd4 NM_144796 chr1 184695027
Ptpn14 NM_008976 chr1 191552147
Cd34 NM_001111059 chr1 196765080
Gm5698 NM_001166637 chr1 31055753
Epha4 NM_007936 chr1 77511663
Sp110 NM_175397 chr1 87495392
Bcl2 NM_009741 chr1 108610879

The content is correct, I believe; the formatting can be improved in many ways. The last script produces:

Susd4    NM_144796    chr1 184695027
Ptpn14   NM_008976    chr1 191552147
Cd34     NM_001111059 chr1 196765080
Gm5698   NM_001166637 chr1  31055753
Epha4    NM_007936    chr1  77511663
Sp110    NM_175397    chr1  87495392
Bcl2     NM_009741    chr1 108610879

You can tweak field widths as necessary.
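
The if/else can also be folded into awk's conditional expression. A sketch with invented rows (the real input file is not shown): column 6 is the strand, and columns 4 and 5 are the two candidate positions.

```shell
tmp=$(mktemp -d)
# Hypothetical input: header line plus two six-field records.
printf '%s\n' \
    'name id chr start end strand' \
    'geneA idA chr1 100 200 +' \
    'geneB idB chr1 300 400 -' > "$tmp/genes.txt"

# Same selection as the if/else form, using cond ? a : b.
picked=$(awk 'NR > 1 && NF == 6 { print $1, $2, $3, ($6 == "+" ? $4 : $5) }' "$tmp/genes.txt")

printf '%s\n' "$picked"
rm -rf "$tmp"
```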

If else script in bash using grep and awk

The attempt to use awk as the argument to for is basically a syntax error, and you have a number of syntax problems and inefficiencies here.

Try this:

for chr in {1..22}; do
    awk '{print $4}' "test$chr.bim" |
    while IFS="" read -r snp; do
        if ! grep -q "$snp" "map$chr.txt"; then
            echo "$snp 0"
        else
            awk -v snp="$snp" '
                $0 ~ snp { print snp, $4 }' "map$chr.txt"
        fi >> "position.$chr"
    done
done

The entire thing could probably be further refactored to a single Awk script.

for chr in {1..22}; do
    awk 'NR == FNR { ++a[$4]; next }
        $2 in a { print $2, $4; ++found[$2] }
        END { for (k in a) if (!found[k]) print k, 0 }' \
        "test$chr.bim" "map$chr.txt" >> "position.$chr"
done

The correct for syntax for what I'm guessing you wanted would look like

for snp in $(awk '{print $4}' "test$chr.bim"); do

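but this has other problems: the output is split on any whitespace rather than on line boundaries, and each resulting word is subject to glob expansion; see "Don't read lines with for".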
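
One of those problems is word splitting, which is easy to check on a file whose lines contain spaces. In this sketch the file contents are invented; the counters only show how many iterations each loop performs.

```shell
tmp=$(mktemp -d)
# Two lines, each containing a space.
printf '%s\n' 'rs1 A' 'rs2 B' > "$tmp/lines.txt"

n_for=0
# Unquoted command substitution is split into words, not lines.
for word in $(cat "$tmp/lines.txt"); do
    n_for=$((n_for + 1))
done

n_read=0
# read -r with an empty IFS consumes exactly one line per iteration.
while IFS= read -r line; do
    n_read=$((n_read + 1))
done < "$tmp/lines.txt"

echo "for: $n_for iterations, while read: $n_read iterations"
rm -rf "$tmp"
```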

awk conditional statement based on a value between colon

Your code:

awk '{print $10}' file.txt | awk -F  ":" '/1/ {print $3}'

should be just 1 awk script:

awk '$10 ~ /1/ { split($10,f,/:/); print f[3] }' file.txt

but I'm not sure that code is doing what you think it does. If you want to print the 3rd value of all $10s that contain :s, as it sounds like from your text, that'd be:

awk 'split($10,f,/:/) > 1 { print f[3] }' file.txt

and to print the rows where that value is less than 7 would be:

awk '(split($10,f,/:/) > 1) && (f[3] < 7)' file.txt
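
A quick way to check the filter on sample data. The records below are invented, and the last command is a stricter variant that is not part of the answer above: requiring at least three pieces keeps out rows whose tenth field has only one colon, where f[3] is missing and still compares as less than 7.

```shell
tmp=$(mktemp -d)
# Hypothetical records: field 10 is "x:y:z", "x:y", or has no colon at all.
printf '%s\n' \
    'a b c d e f g h i 0/1:20:5' \
    'a b c d e f g h i 1/1:30:9' \
    'a b c d e f g h i 0/0:40' \
    'a b c d e f g h i .' > "$tmp/file.txt"

# Filter as written above; prints field 10 of every row that passes.
below7=$(awk '(split($10,f,/:/) > 1) && (f[3] < 7) { print $10 }' "$tmp/file.txt")
# Stricter variant: insist that a third piece actually exists.
strict=$(awk '(split($10,f,/:/) >= 3) && (f[3] < 7) { print $10 }' "$tmp/file.txt")

printf '%s\n' "$below7" "--" "$strict"
rm -rf "$tmp"
```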

awk filter out CSV file content based on condition on a column

1st Solution (preferred): The following awk may help you.

awk 'FNR==NR{a[$1];next} !($2 in a)' exclude.list  FS="," myfile.csv

2nd Solution (comprehensive): This variant reads the input files in the opposite order. The first solution is preferable; I am adding this one to cover the other possibility :) Note that it keeps only the last row for each distinct value of column 2.

awk '
FNR==NR{
    a[$2]=$0;
    if(!b[$2]++){ c[++i]=$2 };
    next}
($1 in a) { delete a[$1]}
END{
    for(j=1;j<=i;j++){
        if(a[c[j]]){print a[c[j]]}
    }}
' FS="," myfile.csv FS=" " exclude.list
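
A small end-to-end check of the first solution, using invented file contents: exclude.list holds one ID per line, and myfile.csv has the ID in its second comma-separated column.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs.
printf '%s\n' 'id2' 'id4' > "$tmp/exclude.list"
printf '%s\n' 'r1,id1,x' 'r2,id2,y' 'r3,id3,z' 'r4,id4,w' > "$tmp/myfile.csv"

# FS="," between the file names takes effect only when myfile.csv is opened.
kept=$(awk 'FNR==NR{a[$1];next} !($2 in a)' "$tmp/exclude.list" FS="," "$tmp/myfile.csv")

printf '%s\n' "$kept"
rm -rf "$tmp"
```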

Extracting a field from a line with condition in bash

You can do this all in awk, using getline:

awk '{var1=$5; var2=$6
    while ((getline < "file2.txt") > 0)
        if (index($0, var1) && index($0, var2)) print $3
    close("file2.txt")
}' file1.txt

This reads each line of file1.txt and saves fields 5 and 6 in awk variables to test later. The while/getline loop then goes through every line of file2.txt and prints that line's $3 whenever the line contains both saved values. close() resets file2.txt so that the loop for the next record of file1.txt starts again from its first line.

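Or, if you want to have a bash loop over file1.txt and then use awk, you can pass the variables in with -v (as mentioned here by someone else), or break out of the single quotes to splice them in:

awk '{if ($2 == "'"$var1"'") print $3}' file2.txt

The double quotes inside the awk script make awk see the value of the bash variable $var1 as a string; without them the value is pasted in as awk code and parsed as a number or a variable name. The spliced form breaks if the value contains a double quote or a backslash, so -v is the more robust choice.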
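
A minimal sketch of the -v form, assuming file2.txt has the value to match in column 2; the file contents, the value of var1 and the awk variable name v are invented for illustration.

```shell
tmp=$(mktemp -d)
# Hypothetical file2.txt: match on column 2, report column 3.
printf '%s\n' 'x abc first' 'x def second' 'x abc third' > "$tmp/file2.txt"

var1='abc'
# -v copies the shell value into the awk variable v as a string.
matches=$(awk -v v="$var1" '$2 == v { print $3 }' "$tmp/file2.txt")

printf '%s\n' "$matches"
rm -rf "$tmp"
```

Note that -v interprets backslash escape sequences in the value; reading the value through ENVIRON is an option when that matters.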


