How to Feed Awk Input from Both Pipe and File

How to feed awk input from both pipe and file?

As Karoly suggests,

str=$( rest of commands that will give a string )
awk -v s="$str" -F, '$7==s {print $5; exit}' file

If you want to feed awk with a pipe:

cmds | awk -F, 'NR==FNR {str=$0; next}; $7==str {print $5}' - file

I think the first option is more readable.
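To make the two approaches concrete, here is a minimal sketch with throwaway sample data (the file path, field layout, and key names are invented for illustration):

```shell
# Sample CSV: field 7 holds a key, field 5 the value we want.
printf 'a,b,c,d,VAL1,f,key1\na,b,c,d,VAL2,f,key2\n' > /tmp/demo.csv

# Variable approach: pass the lookup string with -v.
str=$(echo key2)   # stands in for the real command pipeline
awk -v s="$str" -F, '$7==s {print $5; exit}' /tmp/demo.csv

# Pipe approach: "-" makes stdin the first input file.
echo key2 | awk -F, 'NR==FNR {str=$0; next} $7==str {print $5}' - /tmp/demo.csv
```

Both commands print VAL2.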

Using a pipe to input in an awk statement

You almost certainly have something wrong with your variables. Print them out, and build the pipeline up one command at a time to debug it.

As it stands, it works fine for the following values:

$ max_year=2000
$ max_price=10000
$ model=a

$ grep "$model" cars
toyota corolla 1970 2500
chevy malibu 1999 3000
ford mustang 1965 10000
chevy malibu 2000 3500
honda civic 1985 450
honda accord 2001 6000
ford taurus 2004 17000
toyota rav4 2002 750
chevy impala 1985 1550

$ grep "$model" cars | awk '($3+0) >= ("'$max_year'"+0) && ($4+0) <= ("'$max_price'"+0)'
chevy malibu 2000 3500
honda accord 2001 6000
toyota rav4 2002 750

There are also better ways of doing this that avoid managing your command string the way you have, which is error-prone. You can use:

grep "$model" cars |
awk -v Y="$max_year" -v P="$max_price" '$3>=Y && $4<=P {print}'

(You'll note I'm not using the string+0 trick there. GNU awk, which you're almost certainly using under Linux, handles that just fine: it compares numerically when both operands are numeric in nature.)
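The pitfall the string+0 trick guards against is string comparison, which goes character by character. A small sketch of the difference:

```shell
# String constants compare character by character, so "9" > "10";
# adding 0 forces a numeric comparison.
awk 'BEGIN {
    x = "9"; y = "10"
    print (x > y)        # string comparison: prints 1
    print (x+0 > y+0)    # numeric comparison: prints 0
}'
```

Values passed in with -v are treated as numeric strings when they look like numbers, which is why the -v form above compares numerically without the trick.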

Reading from stdin OR file using awk

You may use a dash (-) as the filename; awk understands it to mean stdin. For example:

awk '{++a[length()]} END{for (i in a) print i, a[i]}' -

Also, when you specify no filename at all, awk reads stdin:

awk '{++a[length()]} END{for (i in a) print i, a[i]}'

And note that you can mix them. The following will process file1.txt, stdin, and file2.txt, in that order:

awk '{++a[length()]} END{for (i in a) print i, a[i]}' file1.txt - file2.txt
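A quick way to see that ordering, using two throwaway files (names invented here):

```shell
printf 'from file1\n' > /tmp/f1.txt
printf 'from file2\n' > /tmp/f2.txt

# stdin ("-") is consumed in its position between the two files
printf 'from stdin\n' |
awk '{ print NR": "$0 }' /tmp/f1.txt - /tmp/f2.txt
```

This prints the file1 line first, the stdin line second, and the file2 line third.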

When programs like awk get input through a pipe, do they read it line by line?

Both ways you write your code:

while IFS=, read a b c
do
    echo $a $b $c
done < textfile.txt

OR

cat textfile.txt | awk '{print $1 $2 $3}'

are wrong. The shell loop will be very slow and can produce bizarre results depending on the content of your input file. The correct way to write it, avoiding the bizarre results, is (you should arguably use printf instead of echo too):

while IFS=, read -r a b c
do
    echo "$a $b $c"
done < textfile.txt

but it'd still be incredibly slow. The shell is an environment from which to call tools with a language to sequence those calls, it is NOT a tool for text processing - the UNIX text-processing is awk.

The cat | awk command should be written as:

awk '{print $1, $2, $3}' textfile.txt

since awk is perfectly capable of opening files on its own, and NO UNIX command EVER needs cat to open a file for it: they can all either open the file themselves (cmd file) or have the shell open it for them (cmd < file).

awk processes each input record one at a time, where an input record is any chunk of text separated by the value of awk's RS variable (a newline by default). It doesn't matter how or where those records are coming from. The only thing you also [rarely] need to consider is buffering; see your awk and shell man pages for info on that.
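For example, setting RS changes what counts as a record; here a comma-separated stream arriving on a pipe is read as three records:

```shell
# With RS="," each comma-delimited chunk is one record, regardless
# of whether the data arrives from a pipe or a file.
printf 'a,b,c' | awk 'BEGIN { RS="," } { print NR, $0 }'
```

This prints "1 a", "2 b", "3 c" on separate lines.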

One way to set shell variables from awk output:

$ cat file
the quick brown fox

$ array=( $(awk '{print $1, $2, $3}' file) )

$ echo "${array[0]}"
the
$ echo "${array[1]}"
quick
$ echo "${array[2]}"
brown

Set individual shell variables from the array contents if you like or just use the array.
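Note that the unquoted array=( $(...) ) form is subject to glob expansion; if that is a concern, one bash-specific alternative is to read into named variables directly (variable names invented here):

```shell
printf 'the quick brown fox\n' > /tmp/words.txt

# read -r splits on IFS but performs no glob expansion
read -r first second third <<< "$(awk '{ print $1, $2, $3 }' /tmp/words.txt)"
echo "$first / $second / $third"
```

The here-string keeps the read in the current shell, so the variables remain set afterwards (piping into read would set them only in a subshell).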

Another way:

$ set -- $(awk '{print $1, $2, $3}' file)

$ echo "$1"
the
$ echo "$2"
quick
$ echo "$3"
brown

awk: process input from pipe, insert result before pattern in output file

Here's a modified version of your executable awk script that produces the ordering you want:

#!/usr/bin/awk -f

BEGIN { FS="[{}]"; mils="0.3527"; built=1 }

FNR==NR {
if( $1 !~ /set lineno/ ) {
if( lineno != "" ) { footer[++cnt]=$0; if(cnt==3) { FS = "[\" ]+" } }
else print
}
else { lineno=$2 }
next
}

FNR!=NR && NF > 0 { built += buildObjs( built+1 ) }

END {
print "set lineno {" built "}"
for(i=1;i<=cnt;i++ ) {
print footer[i]
}
}

function buildObjs( n )
{
x=$4*mils; y=-$5*mils; w=$6*mils; h=$7*mils
print "## element" n " [x]=" x " [y]=" y " [width]=" w " [height]=" h
print "set fsize(" n ") {FALSE}"
print "set fmargin(" n ") {FALSE}"
print "set fmaster(" n ") {TRUE}"
print "set ftype(" n ") {box}"
print "set fname(" n ") {" w " " h "}"
print "set fatt(" n ") {1}"
print "set dplObjectSetup(" n ",TRA) {" x " " y "}"
print "set fnum(" n ") {}"
return 1
}

When put into a file called awko it would be run like:

hunspell -L -H ./text.xml | ./awko ./output.xml -

I don't have hunspell installed, so I tested this by running the Edit3 piped output from a file via cat:

cat ./pipeddata | ./awko ./output.xml -

Notice the - after the output file. It tells awk to read from stdin as the 2nd input to the script, which lets me handle the first file with the standard FNR==NR { do stuff; next } logic.

Here's the breakdown:

  • For personal preference, I moved the buildObjs() function to the end of the script. Notice I added an n argument to it, since NR won't be used in the output. I dropped the a array because it didn't seem necessary, and changed its return from 0 to 1.
  • In the BEGIN block, set up the output.xml field parsing, mils, and the initial value of built
  • Whenever the FILENAME changes to -, change FS for parsing that input. The piped data FS could instead be set on the command line between the output file and the -.
  • When FNR==NR handle the first file
  • Basically, print the "header" info when your anchor hasn't been read
  • When the anchor is read, store its value in lineno
  • After the anchor is read, store the remaining lines of the output file in the footer array in cnt order. Knowing there are only 3 lines at the end, I "cheated" to adjust the FS before the first record is read from STDIN.
  • When FNR!=NR and the line isn't blank (NF>0), process the piped input, incrementing built and passing built+1 as an arg to buildObjs() (built starts at 1, so the first generated element is numbered 2).
  • In the END, the set lineno line is reconstructed/printed with the final value of built.
  • Then the footer from the first file is printed in order based on the cnt variable

Using the cat form, I get the following:

#    file.encoding: UTF-8
# sun.jnu.encoding: UTF-8

set toolVersion {1.20}
set ftype(0) {pgs}
set fsize(0) {FALSE}
set fmargin(0) {FALSE}
set fsize(1) {TRUE}
set fmargin(1) {TRUE}
set fmaster(1) {FALSE}
set ftype(1) {pgs}
set fname(1) {}
set fatt(1) {0}
set dplObjectSetup(1,TRA) {}
set fnum(1) {}
## element2 [x]=32.6389 [y]=-21.7 [width]=3.35171 [height]=0
set fsize(2) {FALSE}
set fmargin(2) {FALSE}
set fmaster(2) {TRUE}
set ftype(2) {box}
set fname(2) {3.35171 0}
set fatt(2) {1}
set dplObjectSetup(2,TRA) {32.6389 -21.7}
set fnum(2) {}
## element3 [x]=32.3073 [y]=-38.0119 [width]=3.68325 [height]=0
set fsize(3) {FALSE}
set fmargin(3) {FALSE}
set fmaster(3) {TRUE}
set ftype(3) {box}
set fname(3) {3.68325 0}
set fatt(3) {1}
set dplObjectSetup(3,TRA) {32.3073 -38.0119}
set fnum(3) {}
## element4 [x]=46.7197 [y]=-11.5499 [width]=2.58776 [height]=0
set fsize(4) {FALSE}
set fmargin(4) {FALSE}
set fmaster(4) {TRUE}
set ftype(4) {box}
set fname(4) {2.58776 0}
set fatt(4) {1}
set dplObjectSetup(4,TRA) {46.7197 -11.5499}
set fnum(4) {}
set lineno {4}
set mode {1}
set preservePDF {1}
set preservePDFAction {Continue}

Seems like your buildObjs() function logic needs some attention to get things just the way you want (I suspect the indexes you've chosen need shifting).

Read file and pipe it to awk does not print the expected results

In your first version awk is not getting stdin as you intended.

Copying from the comment by @William Purcell: "The read command reads the first line of input, and awk reads the next 2 (the 2 and the 3). That's why you see two lines of output."

For the next two lines, the variable c still holds 1 (from the first read), which is why both printed lines show 1.

If you wrap your statement in a BEGIN block, it will work as intended:

$ seq 3 | while IFS= read -r a; do awk -v c="$a" 'BEGIN{print c}'; done

however, that's a rather inefficient way of doing things.
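To see the difference side by side, here is the broken per-line body next to the BEGIN form:

```shell
# Broken: awk's main loop reads the rest of the pipe, starving `read`,
# and prints c (still "1") once per line it swallows -- two lines of "1".
seq 3 | while IFS= read -r a; do awk -v c="$a" '{ print c }'; done

# Fixed: a BEGIN-only program never reads stdin, so the loop keeps the
# pipe to itself and prints 1, 2, 3.
seq 3 | while IFS= read -r a; do awk -v c="$a" 'BEGIN{ print c }'; done
```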

how to explain awk command with double pipe?

As in many languages, in awk || means or. That command will produce output if the current input line is the first one (NR == 1) or (||) the value of the last input field ($NF) on the current line is less than the given value ($NF < 0.05/461).

So it's printing the header line and any other lines for which the 2nd condition is true.

This involves a UUOC (useless use of cat) though:

cat CHR.17.dat | awk 'NR == 1 || $NF < 0.05/461'

and should instead be written:

awk 'NR == 1 || $NF < 0.05/461' CHR.17.dat
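As a self-contained illustration with made-up data (threshold kept as in the question, roughly 0.000108):

```shell
# NR == 1 keeps the header line; the second test keeps rows whose
# last field is below the threshold.
printf 'name p\na 0.0001\nb 0.9\nc 0.00005\n' |
awk 'NR == 1 || $NF < 0.05/461'
```

This prints the header plus rows a and c; row b fails the test.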

Pipe Command Output into Awk

It doesn't seem like you really want to pipe the value to awk. Instead, you want to pass it as a parameter. You could read it from the pipe with something like:

cmd1 | awk 'NR==FNR{a=$0} NR!=FNR{print $0,a}' - input.txt

but it seems much more natural to do:

awk '{print $0,a}' a="$(cmd1)" input.txt
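Command-line assignments of the form var=value are processed in order along with the file arguments, so a=... takes effect before the file is read. A sketch with stand-in names:

```shell
printf 'x\ny\n' > /tmp/in.txt

# The assignment is evaluated when awk reaches it in the argument list,
# i.e. before /tmp/in.txt is opened.
awk '{ print $0, a }' a="$(echo suffix)" /tmp/in.txt
```

This prints "x suffix" and "y suffix". Placing the assignment after the filename would leave a empty while that file is processed.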

awk on multiple files and piping the output of each of its runs to the wc command separately

You could use awk to keep track of the count for each file in an array. Then at the end print the contents of the array:

  awk -F"," '$1==""{a[FILENAME]+=1} END{for(file in a) { print file, a[file] }}' `ls`

This way you don't have to tangle with wc and just shoot the contents right over to gnuplot

Example in use:

$> cat file1
,test
2,test
3,
$> cat file2
,test
2,test
3,
,test
$> awk -F"," '$1==""{a[FILENAME]+=1} END{for(file in a) { print file, a[file] }}' `ls`
file1 1
file2 2
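One caveat: for (file in a) visits keys in an unspecified order, so pipe through sort if the order matters. Reproducing the example with explicit file arguments:

```shell
printf ',test\n2,test\n3,\n' > /tmp/file1
printf ',test\n2,test\n3,\n,test\n' > /tmp/file2

# for(file in a) gives no ordering guarantee, hence the sort
awk -F"," '$1==""{a[FILENAME]+=1} END{for(file in a) print file, a[file]}' \
    /tmp/file1 /tmp/file2 | sort
```

This prints "/tmp/file1 1" and "/tmp/file2 2".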

