How to Display Filename from a Column Using Awk

How to display filename from a column using Awk?

FILENAME is a string, not a number. Use %s:

awk -F"," '{ if(NF > 5) printf("Filename: %s  Index: %d Number of commas : %d\n",FILENAME,NR, NF-1); }' dsc* >> filename.csv

From the section of man awk that discusses printf:

  %d, %i  A decimal number (the integer part).

  %s      A character string.
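To see why the conversion specifier matters, here is a minimal sketch. The file name dsc1.csv and its contents are made up for the demo. With %d, awk converts the non-numeric string to the number 0, while %s prints it unchanged.

```shell
# Hypothetical sample file, created in a scratch directory for this demo.
dir=$(mktemp -d)
printf 'a,b,c,d,e,f,g\n' > "$dir/dsc1.csv"

# %d forces a numeric conversion; a non-numeric string such as this file name becomes 0.
wrong=$(cd "$dir" && awk -F',' '{ printf("%d\n", FILENAME) }' dsc1.csv)

# %s prints the string unchanged.
right=$(cd "$dir" && awk -F',' '{ printf("%s\n", FILENAME) }' dsc1.csv)

printf 'with %%d: %s\nwith %%s: %s\n' "$wrong" "$right"
rm -rf "$dir"
```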

How to add the filename as a column in the output with awk?

As long as the number of files is not huge, why not just:

grep NM_001080771 *_2.5kb.txt | awk -F: '{print $2,$1}'

If you have too many files for that to work, here's a script-based approach that uses awk to append the filename:

#!/bin/sh
for i in *_2.5kb.txt; do
< $i grep "NM_001080771" | \
awk -v where=`basename $i` '{print $0,where}'
done

./thatscript | head > prom_genes_2.5kb.txt

Here we are using awk's -v VAR=VALUE command line feature to pass in the filename (because we are using stdin we don't have anything useful in awk's built-in FILENAME variable).
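A minimal sketch of the -v mechanism on its own, assuming a made-up label sample_2.5kb.txt. The assignment happens before any input is read, so the value is available even though the data arrives on stdin.

```shell
# The label is a hypothetical file name; the input comes from a pipe, not a file.
out=$(printf 'NM_001080771 chr1 100\n' |
      awk -v where="sample_2.5kb.txt" '{ print $0, where }')
echo "$out"
```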

You can also use such a loop around @karakfa's elegant awk-only approach:

#!/bin/sh
for i in *_2.5kb.txt; do
awk '/NM_001080771/ {print $0, FILENAME}' $i
done

And finally, here's a version with the desired filename munging:

#!/bin/sh
for i in *_2.5kb.txt; do
awk -v TAG=${i%_merged_peaks_2.5kb.txt} '/NM_001080771/ {print $0, TAG}' $i
done

(this uses the shell's variable substitution ${variable%pattern} to trim pattern from the end of variable)
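The trimming can be tried in isolation. This sketch assumes a made-up file name that follows the pattern used in the scripts above:

```shell
# Hypothetical file name, invented for illustration.
i="liver_merged_peaks_2.5kb.txt"

# ${i%pattern} removes the shortest match of pattern from the end of $i.
tag=${i%_merged_peaks_2.5kb.txt}
echo "$tag"

# If the suffix does not match, the value is left unchanged.
j="notes.txt"
same=${j%_merged_peaks_2.5kb.txt}
echo "$same"
```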

Bonus

Guessing you might want to search for other strings in the future, so why don't we pass in the search string like so:

#!/bin/sh
what=${1?Need search string}
for i in *_2.5kb.txt; do
awk -v TAG=${i%_merged_peaks_2.5kb.txt} /${what}/' {print $0, TAG}' $i
done

./thatscript NM_001080771 | head > prom_genes_2.5kb.txt

YET ANOTHER EDIT

Or if you have a pathological need to over-complicate and pedantically quote things, even in 5-line "throwaway" scripts:

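#!/bin/bash
# shopt is a bash builtin, so this script needs bash rather than plain sh.
shopt -s nullglob

what="${1?Need search string}"
filematch="*_2.5kb.txt"
trimsuffix="_merged_peaks_2.5kb.txt"

# $filematch is deliberately left unquoted so the shell expands the glob.
for filename in $filematch; do
    awk -v tag="${filename%"${trimsuffix}"}" \
        -v what="${what}" \
        '$0 ~ what {print $0, tag}' "$filename"
done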

awk extract a column and output a file named by the column header

here's a similar one...

$ awk 'NR==1 {n=split($0,h)} 
{for(i=2;i<=n;i++) print $1,$i > (h[i]".reform.txt")}' file

==> col2.reform.txt <==
col1 col2
1 3
2 4
3 1
5 3

==> col3.reform.txt <==
col1 col3
1 4
2 6
3 5
5 7

==> col4.reform.txt <==
col1 col4
1 A
2 B
3 D
5 F
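The input file is not shown in the answer. The sketch below reconstructs one from the three outputs above (an inference, not the OP's actual data), runs the same command on it, and lists the results with head, which prints a ==> name <== banner before each file when given several.

```shell
dir=$(mktemp -d)
# Assumed input, inferred from the outputs shown above.
cat > "$dir/file" <<'EOF'
col1 col2 col3 col4
1 3 4 A
2 4 6 B
3 1 5 D
5 3 7 F
EOF

(
  cd "$dir" || exit 1
  # Row 1 fills h[] with the header names; every row (header included)
  # then writes column 1 plus column i into <header>.reform.txt.
  awk 'NR==1 { n = split($0, h) }
       { for (i = 2; i <= n; i++) print $1, $i > (h[i] ".reform.txt") }' file
  head *.reform.txt
)
col4=$(cat "$dir/col4.reform.txt")
```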

Using awk to include file name with format in column

This would handle one or multiple input files:

awk -v OFS='\t' '
NR==1 { print "file", $0 }
FNR==1 { n=split(FILENAME,t,/[_.]/); fname=t[n-1]; next }
{ print fname, $0 }
' *.txt
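As a sketch of what this does, assume two hypothetical tab-separated files named run_alpha.txt and run_beta.txt, each with a header row. split breaks FILENAME on underscores and dots, so t[n-1] is the piece just before the extension. The header of the first file is printed once, and the headers of later files are skipped.

```shell
dir=$(mktemp -d)
# Hypothetical inputs: names and contents are made up for illustration.
printf 'id\tval\n1\t10\n' > "$dir/run_alpha.txt"
printf 'id\tval\n2\t20\n' > "$dir/run_beta.txt"

out=$(cd "$dir" && awk -v OFS='\t' '
  NR==1  { print "file", $0 }                                  # header, printed once
  FNR==1 { n = split(FILENAME, t, /[_.]/); fname = t[n-1]; next }
         { print fname, $0 }
' *.txt)
printf '%s\n' "$out"
rm -rf "$dir"
```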

Use the ith column of row 1 as the output filename in awk

You should keep column headers in an array.

awk 'NR==1 {
    for (i=2; i<=NF; ++i) {
        fnames[i] = gensub(/\x27/, "", "g", $i)
        print $1, $i > (fnames[i] ".txt")
    }
    next
}
{
    for (i=2; i<=NF; ++i)
        print $1, "\x27" $i "\x27" > (fnames[i] ".txt")
}' myfile.txt
  • \x27 is single quote in hex-escaped form
  • gensub(/\x27/, "", "g", $i) removes single quotes from column headers to name output files as you wanted.
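gensub is a GNU awk extension, and hex escapes such as \x27 are not guaranteed by POSIX awk. A possible portable variant, sketched under the assumption that the header cells look like 'a' and 'b' (the sample file below is invented), passes the quote character in with -v and strips it with gsub on a copy of the field:

```shell
dir=$(mktemp -d)
# Hypothetical input: quoted header cells, plain data cells.
printf "id 'a' 'b'\n1 x y\n2 z w\n" > "$dir/myfile.txt"

(
  cd "$dir" || exit 1
  awk -v q="'" 'NR==1 {
      for (i = 2; i <= NF; ++i) {
        name = $i
        gsub(q, "", name)               # strip the quotes from the copy only
        fnames[i] = name
        print $1, $i > (fnames[i] ".txt")
      }
      next
    }
    {
      for (i = 2; i <= NF; ++i)
        print $1, q $i q > (fnames[i] ".txt")
    }' myfile.txt
)
a_out=$(cat "$dir/a.txt")
b_out=$(cat "$dir/b.txt")
printf '%s\n' "$a_out"
```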

Print filenames & line number with number of fields greater than 'x'

If you have GNU awk, try the following. For each file it prints the file name and line number of the first line whose field count (NF) is greater than 7; nextfile then skips to the next input file, which saves time because the rest of the current file is never read.

awk -F',' 'NF>7{print FILENAME,FNR;nextfile}' *.csv

The command above prints only the first matching line in each file. To print every matching line, drop nextfile:

awk -F',' 'NF>7{print FILENAME,FNR}' *.csv
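Where nextfile is unavailable, a flag that resets at the start of each file gives the same first-match-only output, at the cost of still reading every line. A sketch with invented sample files; the variable name seen is an arbitrary choice:

```shell
dir=$(mktemp -d)
# Hypothetical CSVs: a.csv has two lines with more than 7 fields, b.csv has one.
printf '1,2,3\n1,2,3,4,5,6,7,8\n1,2,3,4,5,6,7,8,9\n' > "$dir/a.csv"
printf '1,2,3,4,5,6,7,8\n' > "$dir/b.csv"

# First match per file only: seen is cleared on the first line of each file.
first=$(cd "$dir" && awk -F',' 'FNR==1 { seen = 0 }
                                NF>7 && !seen { print FILENAME, FNR; seen = 1 }' *.csv)

# Every matching line.
all=$(cd "$dir" && awk -F',' 'NF>7 { print FILENAME, FNR }' *.csv)

printf 'first:\n%s\nall:\n%s\n' "$first" "$all"
rm -rf "$dir"
```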

Add column of filename to hundreds of files bash

tmp=$(mktemp) || { ret="$?"; printf 'Failed to create temp file\n'; exit "$ret"; }
for file in *.txt; do
awk 'BEGIN{OFS="\t"} {print $0, (FNR>1 ? FILENAME : "name")}' "$file" > "$tmp" &&
mv -- "$tmp" "$file" || exit
done

If you have GNU awk, and not so many files that you exceed the shell's argument-length limit, a single awk call can replace both the shell loop and the explicitly created temp file (it still uses a temp file behind the scenes, like every tool that offers an "inplace" editing option):

awk -i inplace 'BEGIN{OFS="\t"} {print $0, (FNR>1 ? FILENAME : "name")}' *.txt

How to print filename in awk?

Try the following; no for loop is needed.

awk 'BEGIN{OFS="\t\t"} FNR==2{print FILENAME, $7}' r*  | column -t > out

If you have GNU awk, you can use nextfile to jump straight to the next file as soon as the condition is met, which saves many cycles:

awk 'BEGIN{OFS="\t\t"} FNR==2{print FILENAME, $7; nextfile}' r*  | column -t > out

Setting OFS in the BEGIN section makes \t\t the output field separator. Piping the result through column -t then aligns it into columns.

Remove BEGIN{OFS="\t\t"} if it is not required, since column -t already handles the alignment (it was added because the OP asked for tabs in the output).
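A small sketch of the FNR==2 pattern, assuming two invented files r1 and r2 with seven whitespace-separated fields per line. A single tab is used as OFS here and the column step is left out to keep the output easy to check:

```shell
dir=$(mktemp -d)
# Hypothetical inputs; only line 2 of each file is of interest.
printf 'h1 h2 h3 h4 h5 h6 h7\na b c d e f g1\nx x x x x x x\n' > "$dir/r1"
printf 'h1 h2 h3 h4 h5 h6 h7\na b c d e f g2\n' > "$dir/r2"

# FNR restarts at 1 for each file, so FNR==2 selects the second line of every file.
out=$(cd "$dir" && awk 'BEGIN { OFS = "\t" } FNR==2 { print FILENAME, $7 }' r*)
printf '%s\n' "$out"
rm -rf "$dir"
```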


