How to Display Filename from a Column Using Awk

How to display filename from a column using Awk?

FILENAME is a string, not a number. Use %s:

awk -F"," '{ if(NF > 5) printf("Filename: %s  Index: %d Number of commas : %d\n",FILENAME,NR, NF-1); }' dsc* >> filename.csv

From the section of man awk that discusses printf:

  %d, %i  A decimal number (the integer part).

%s A character string.
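To see the difference in practice, here is a minimal sketch. The file name dsc1.csv and its contents are made up for illustration; the point is only that %s prints FILENAME as text while %d first converts it to a number.

```shell
# Hypothetical sample file with 7 comma-separated fields (6 commas).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a,b,c,d,e,f,g\n' > dsc1.csv

# Correct: %s for the file name, %d for the numeric values.
good=$(awk -F',' 'NF > 5 { printf("%s %d %d\n", FILENAME, NR, NF-1) }' dsc1.csv)
printf '%s\n' "$good"

# Wrong: %d converts the non-numeric string "dsc1.csv" to the number 0.
bad=$(awk '{ printf("%d\n", FILENAME) }' dsc1.csv)
printf '%s\n' "$bad"
```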

How to add the filename as a column in the output with awk?

As long as the number of files is not huge, why not just:

grep NM_001080771 *_2.5kb.txt | awk -F: '{print $2,$1}'

If you have too many files for that to work, here's a script-based approach that uses awk to append the filename:

for i in *_2.5kb.txt; do
    < $i grep "NM_001080771" | \
    awk -v where=`basename $i` '{print $0,where}'
done

./thatscript | head > prom_genes_2.5kb.txt

Here we are using awk's -v VAR=VALUE command line feature to pass in the filename (because we are using stdin we don't have anything useful in awk's built-in FILENAME variable).
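A minimal sketch of the -v mechanism on its own, with made-up input lines and a hypothetical file name (sample_2.5kb.txt), so it can be tried without any real data files:

```shell
# awk reads from a pipe here, so the file name is passed in with -v.
# Both the input lines and the name sample_2.5kb.txt are illustrative only.
v_out=$(printf 'NM_001080771 chr1\nNM_000000 chr2\n' |
    grep "NM_001080771" |
    awk -v where="sample_2.5kb.txt" '{ print $0, where }')
printf '%s\n' "$v_out"
```

Only the matching line survives grep, and awk appends the value of where to it.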

You can also use such a loop around @karakfa's elegant awk-only approach:

for i in *_2.5kb.txt; do
    awk '/NM_001080771/ {print $0, FILENAME}' $i
done

And finally, here's a version with the desired filename munging:

for i in *_2.5kb.txt; do
    awk -v TAG=${i%_merged_peaks_2.5kb.txt} '/NM_001080771/ {print $0, TAG}' $i
done

(this uses the shell's variable substitution ${variable%pattern} to trim pattern from the end of variable)
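The suffix trimming can be tried in isolation. This is a small sketch with hypothetical file names; it assumes nothing beyond a POSIX shell:

```shell
# Hypothetical file name ending in the suffix we want to drop.
i="liver_merged_peaks_2.5kb.txt"
tag="${i%_merged_peaks_2.5kb.txt}"
printf '%s\n' "$tag"

# If the suffix does not match, the value is left unchanged.
other="notes.txt"
untouched="${other%_merged_peaks_2.5kb.txt}"
printf '%s\n' "$untouched"
```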


Guessing you might want to search for other strings in the future, so why don't we pass in the search string like so:

what=${1?Need search string}
for i in *_2.5kb.txt; do
    awk -v TAG=${i%_merged_peaks_2.5kb.txt} /${what}/' {print $0, TAG}' $i
done

./thatscript NM_001080771 | head > prom_genes_2.5kb.txt


Or if you have a pathological need to over-complicate and pedantically quote things, even in 5-line "throwaway" scripts:

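shopt -s nullglob

# Assumed values, taken from the earlier examples.
filematch="*_2.5kb.txt"
trimsuffix="_merged_peaks_2.5kb.txt"
what="${1?Need search string}"

for filename in $filematch; do
    awk -v tag="${filename%${trimsuffix}}" \
        -v what="${what}" \
        '$0 ~ what {print $0, tag}' "$filename"
done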

awk extract a column and output a file named by the column header

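Here's a similar one. It writes each column, paired with the first column, to a file named after that column's header; the listing after the command is what head shows for the resulting files.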

$ awk 'NR==1 {n=split($0,h)} 
{for(i=2;i<=n;i++) print $1,$i > (h[i]".reform.txt")}' file

==> col2.reform.txt <==
col1 col2
1 3
2 4
3 1
5 3

==> col3.reform.txt <==
col1 col3
1 4
2 6
3 5
5 7

==> col4.reform.txt <==
col1 col4
1 A
2 B
3 D
5 F
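For a version that can be run end to end, here is a sketch that first creates the input. The input file is not shown in the answer, so its contents here are an assumption reconstructed from the output listing above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed input, inferred from the three output files shown above.
cat > file <<'EOF'
col1 col2 col3 col4
1 3 4 A
2 4 6 B
3 1 5 D
5 3 7 F
EOF

# h[] holds the header names; every line (header included) is split
# into one output file per column, each paired with column 1.
awk 'NR==1 { n = split($0, h) }
     { for (i = 2; i <= n; i++) print $1, $i > (h[i] ".reform.txt") }' file

col3=$(cat col3.reform.txt)
printf '%s\n' "$col3"
```

The parentheses around the output file name matter: without them, some awk implementations parse the redirection target ambiguously.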

Using awk to include file name with format in column

This would handle one or multiple input files:

awk -v OFS='\t' '
NR==1 { print "file", $0 }
FNR==1 { n=split(FILENAME,t,/[_.]/); fname=t[n-1]; next }
{ print fname, $0 }
' *.txt
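The key step is split(FILENAME,t,/[_.]/), which cuts the file name at every underscore or dot, so t[n-1] is the piece just before the extension. A small sketch with two hypothetical files (run_A.txt and run_B.txt, names and contents invented for illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two made-up input files that share the same header line.
printf 'id value\n1 10\n' > run_A.txt
printf 'id value\n2 20\n' > run_B.txt

# "run_A.txt" splits into "run", "A", "txt", so t[n-1] is "A".
merged=$(awk -v OFS='\t' '
NR==1  { print "file", $0 }
FNR==1 { n = split(FILENAME, t, /[_.]/); fname = t[n-1]; next }
       { print fname, $0 }
' *.txt)
printf '%s\n' "$merged"
```

The header is printed once (from the first file only), and the header line of every later file is skipped by the FNR==1 rule.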

Use row 1 column ith as output filename awk

You should keep column headers in an array.

awk 'NR==1 {
    for (i=2; i<=NF; ++i) {
        fnames[i] = gensub(/\x27/, "", "g", $i)
        print $1, $i > (fnames[i] ".txt")
    }
    next
}
{
    for (i=2; i<=NF; ++i)
        print $1, "\x27" $i "\x27" > (fnames[i] ".txt")
}' myfile.txt
  • \x27 is single quote in hex-escaped form
  • gensub(/\x27/, "", "g", $i) removes single quotes from column headers to name output files as you wanted.
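Note that gensub is a GNU awk extension. If only a POSIX awk is available, a possible substitute (not part of the original answer) is to copy the field and strip the quotes with gsub, writing the single quote as the octal escape \047. A minimal sketch with a made-up header value:

```shell
# 'col2' (with the single quotes) stands in for a quoted column header.
clean=$(printf "%s\n" "'col2'" |
    awk '{ name = $1; gsub(/\047/, "", name); print name ".txt" }')
printf '%s\n' "$clean"
```

gsub modifies its target in place, which is why the field is copied into name first.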

Print filenames & line number with number of fields greater than 'x'

If you have GNU awk, try the following. It checks whether NF is greater than 7 and, if so, prints the file name and line number. nextfile then skips straight to the next input file, which saves time because the rest of the current file does not have to be read.

awk -F',' 'NF>7{print FILENAME,FNR;nextfile}' *.csv

The above prints only the first match in each file. To print all matching lines, use:

awk -F',' 'NF>7{print FILENAME,FNR}' *.csv
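If your awk lacks nextfile, one possible substitute (an assumption of this sketch, not part of the original answer) is a seen[] array keyed by FILENAME. It still reads every line, so it does not save time the way nextfile does, but it reports the same first match per file. The sample CSV files are invented for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# a.csv has two lines with more than 7 fields; b.csv has one.
printf '1,2,3\n1,2,3,4,5,6,7,8\n1,2,3,4,5,6,7,8,9\n' > a.csv
printf '1,2,3,4,5,6,7,8\n' > b.csv

# First match per file only, without nextfile.
first=$(awk -F',' 'NF>7 && !seen[FILENAME]++ { print FILENAME, FNR }' *.csv)
printf '%s\n' "$first"

# Every matching line.
all=$(awk -F',' 'NF>7 { print FILENAME, FNR }' *.csv)
printf '%s\n' "$all"
```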

Add column of filename to hundreds of files bash

tmp=$(mktemp) || { ret="$?"; printf 'Failed to create temp file\n'; exit "$ret"; }
for file in *.txt; do
    awk 'BEGIN{OFS="\t"} {print $0, (FNR>1 ? FILENAME : "name")}' "$file" > "$tmp" &&
    mv -- "$tmp" "$file" || exit
done

If you have GNU awk, and not so many files that you exceed the shell's argument limit, you can instead use a single call to awk, with no surrounding shell loop and no explicitly created temp file (it still uses a temp file behind the scenes, just like all tools that have an option for "inplace" editing):

awk -i inplace 'BEGIN{OFS="\t"} {print $0, (FNR>1 ? FILENAME : "name")}' *.txt
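Before overwriting hundreds of files, it may be worth previewing the result on a single file. This sketch runs the same awk program but writes to standard output only; sample.txt and its contents are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up tab-separated file with one header line and one data line.
printf 'id\tvalue\n1\t10\n' > sample.txt

# Same program as above, but nothing is modified on disk.
preview=$(awk 'BEGIN{OFS="\t"} {print $0, (FNR>1 ? FILENAME : "name")}' sample.txt)
printf '%s\n' "$preview"
```

The header line gets the literal column title "name", and every other line gets the file name.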

How to print filename in awk?

Try the following. You do not need a for loop for this.

awk 'BEGIN{OFS="\t\t"} FNR==2{print FILENAME, $7}' r*  | column -t > out

If you have GNU awk, you can use nextfile to save many cycles by jumping directly to the next file once the condition is met:

awk 'BEGIN{OFS="\t\t"} FNR==2{print FILENAME, $7; nextfile}' r*  | column -t > out

To use \t\t as the output separator, set OFS in the BEGIN section. Piping the output through column -t aligns it into columns.

Remove BEGIN{OFS="\t\t"} if it is not required, since the column command already aligns the output (it was added because the OP asked for tabs in the output).

