How to display filename from a column using Awk?
FILENAME is a string, not a number. Use %s:
awk -F"," '{ if(NF > 5) printf("Filename: %s Index: %d Number of commas : %d\n",FILENAME,NR, NF-1); }' dsc* >> filename.csv
From the section of man awk that discusses printf:
%d, %i A decimal number (the integer part).
%s A character string.
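To see the corrected format string in action, here is a minimal sketch that runs the same command on one throwaway file. The temp directory, the file name dsc1.csv, and its contents are made up for the illustration.

```shell
# Hypothetical sample data: one line with 6 commas, one line with 1 comma.
tmpdir=$(mktemp -d)
printf 'a,b,c,d,e,f,g\n1,2\n' > "$tmpdir/dsc1.csv"

# %s prints FILENAME as text; %d is kept for the numeric values NR and NF-1.
out=$(cd "$tmpdir" && awk -F"," '{ if (NF > 5) printf("Filename: %s Index: %d Number of commas: %d\n", FILENAME, NR, NF-1) }' dsc*)
printf '%s\n' "$out"

rm -rf "$tmpdir"
```

With this input only the first line has more than 5 fields, so a single line is reported for dsc1.csv.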
How to awk the filename as a column in the output?
As long as the number of files is not huge, why not just:
grep NM_001080771 *_2.5kb.txt | awk -F: '{print $2,$1}'
If you have too many files for that to work, here's a script-based approach that uses awk to append the filename:
#!/bin/sh
for i in *_2.5kb.txt; do
< $i grep "NM_001080771" | \
awk -v where=`basename $i` '{print $0,where}'
done
./thatscript | head > prom_genes_2.5kb.txt
Here we are using awk's -v VAR=VALUE command-line feature to pass in the filename (because we are reading from stdin, awk's built-in FILENAME variable holds nothing useful).
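A minimal sketch of the -v technique on a single made-up file. The temp directory, the file name sample_2.5kb.txt, and its contents are assumptions for the example.

```shell
# Hypothetical input: two lines, only the first contains the search string.
tmpdir=$(mktemp -d)
i="$tmpdir/sample_2.5kb.txt"
printf 'NM_001080771 chr1 100\nNM_000000 chr2 200\n' > "$i"

# awk reads from the pipe here, so the name has to be passed in with -v.
out=$(grep "NM_001080771" < "$i" | awk -v where="$(basename "$i")" '{ print $0, where }')
printf '%s\n' "$out"

rm -rf "$tmpdir"
```

The matching line comes out with the base name of the file appended as the last column.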
You can also use such a loop around @karakfa's elegant awk-only approach:
#!/bin/sh
for i in *_2.5kb.txt; do
awk '/NM_001080771/ {print $0, FILENAME}' $i
done
And finally, here's a version with the desired filename munging:
#!/bin/sh
for i in *_2.5kb.txt; do
awk -v TAG=${i%_merged_peaks_2.5kb.txt} '/NM_001080771/ {print $0, TAG}' $i
done
(this uses the shell's variable substitution ${variable%pattern} to trim pattern from the end of variable)
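For illustration, here is what that substitution does on its own. The file names are hypothetical; only the suffix matters.

```shell
# Hypothetical file name that ends in the suffix to be trimmed.
i="liver_merged_peaks_2.5kb.txt"

# ${i%pattern} removes the shortest match of pattern from the end of $i.
tag=${i%_merged_peaks_2.5kb.txt}
printf '%s\n' "$tag"

# If the suffix does not match, the value is left unchanged.
other="notes.txt"
unchanged=${other%_merged_peaks_2.5kb.txt}
printf '%s\n' "$unchanged"
```

The first printf shows liver and the second shows notes.txt, since that name does not end in the suffix.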
Bonus
Guessing you might want to search for other strings in the future, so why don't we pass in the search string like so:
#!/bin/sh
what=${1?Need search string}
for i in *_2.5kb.txt; do
awk -v TAG=${i%_merged_peaks_2.5kb.txt} /${what}/' {print $0, TAG}' $i
done
./thatscript NM_001080771 | head > prom_genes_2.5kb.txt
YET ANOTHER EDIT
Or if you have a pathological need to over-complicate and pedantically quote things, even in 5-line "throwaway" scripts:
#!/bin/bash
shopt -s nullglob
what="${1?Need search string}"
filematch="*_2.5kb.txt"
trimsuffix="_merged_peaks_2.5kb.txt"
for filename in $filematch; do
awk -v tag="${filename%${trimsuffix}}" \
-v what="${what}" \
'$0 ~ what {print $0, tag}' "$filename"
done
awk extract a column and output a file named by the column header
Here's a similar one:
$ awk 'NR==1 {n=split($0,h)}
{for(i=2;i<=n;i++) print $1,$i > (h[i]".reform.txt")}' file
$ head *.reform.txt
==> col2.reform.txt <==
col1 col2
1 3
2 4
3 1
5 3
==> col3.reform.txt <==
col1 col3
1 4
2 6
3 5
5 7
==> col4.reform.txt <==
col1 col4
1 A
2 B
3 D
5 F
Using awk to include file name with format in column
This would handle one or multiple input files:
awk -v OFS='\t' '
NR==1 { print "file", $0 }
FNR==1 { n=split(FILENAME,t,/[_.]/); fname=t[n-1]; next }
{ print fname, $0 }
' *.txt
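The script derives the tag by splitting FILENAME on underscores and dots and taking the second-to-last piece. A minimal sketch with two made-up input files (the names run_A.txt and run_B.txt and their contents are assumptions for the example):

```shell
# Hypothetical inputs: same header in both files, one data row each.
tmpdir=$(mktemp -d)
printf 'id value\n1 10\n' > "$tmpdir/run_A.txt"
printf 'id value\n2 20\n' > "$tmpdir/run_B.txt"

# "run_A.txt" splits into run, A, txt; t[n-1] is therefore "A".
out=$(cd "$tmpdir" && awk -v OFS='\t' '
NR==1  { print "file", $0 }
FNR==1 { n = split(FILENAME, t, /[_.]/); fname = t[n-1]; next }
       { print fname, $0 }
' run_A.txt run_B.txt)
printf '%s\n' "$out"

rm -rf "$tmpdir"
```

The header is printed once with a leading "file" column, and each data row is prefixed with A or B followed by a tab.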
Use the ith column of row 1 as the output filename in awk
You should keep column headers in an array.
awk 'NR==1 {
for (i=2; i<=NF; ++i) {
fnames[i] = gensub(/\x27/, "", "g", $i)
print $1, $i > (fnames[i] ".txt")
}
next
}
{
for (i=2; i<=NF; ++i)
print $1, "\x27" $i "\x27" > (fnames[i] ".txt")
}' myfile.txt
\x27 is a single quote in hex-escaped form. gensub(/\x27/, "", "g", $i) removes single quotes from the column headers to name the output files as you wanted.
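gensub is a GNU awk extension. For other awks, a minimal sketch of the same idea uses gsub on a copy of the field, with the octal escape \047 standing in for the single quote. The header names alpha and beta, the variable name, and the temp directory are assumptions for the example.

```shell
# Hypothetical input: quoted column headers, unquoted data.
tmpdir=$(mktemp -d)
printf '%s\n' "id 'alpha' 'beta'" "1 3 4" "2 5 6" > "$tmpdir/myfile.txt"

(
  cd "$tmpdir" || exit 1
  awk 'NR==1 {
         for (i = 2; i <= NF; ++i) {
           name = $i
           gsub(/\047/, "", name)      # strip the quote characters from the copy
           fnames[i] = name ".txt"
           print $1, $i > fnames[i]
         }
         next
       }
       {
         for (i = 2; i <= NF; ++i)
           print $1, "\047" $i "\047" > fnames[i]
       }' myfile.txt
)

alpha=$(cat "$tmpdir/alpha.txt")
printf '%s\n' "$alpha"

rm -rf "$tmpdir"
```

Storing the full file name in fnames[i] keeps the redirection target a single variable, which avoids an unparenthesized concatenation after >.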
Print filenames & line number with number of fields greater than 'x'
If you have GNU awk, try the following. It checks whether NF is greater than 7; if so, it prints that file's name along with the line number, and nextfile then skips to the next input file, which saves time because the rest of the file does not have to be read.
awk -F',' 'NF>7{print FILENAME,FNR;nextfile}' *.csv
The above prints only the first matching line in each file. To print all matching lines, try the following:
awk -F',' 'NF>7{print FILENAME,FNR}' *.csv
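nextfile is not available in every awk. As a hedged fallback, the sketch below reports only the first qualifying line per file by remembering which file names were already reported; unlike nextfile, it still reads every file to the end. The sample files and the array name seen are assumptions for the example.

```shell
# Hypothetical inputs: a.csv has two lines with more than 7 fields, b.csv has one.
tmpdir=$(mktemp -d)
printf '1,2,3\n1,2,3,4,5,6,7,8\n1,2,3,4,5,6,7,8,9\n' > "$tmpdir/a.csv"
printf '1,2,3,4,5,6,7,8\n' > "$tmpdir/b.csv"

# First match per file only: seen[FILENAME] is 0 the first time, non-zero afterwards.
first=$(cd "$tmpdir" && awk -F',' 'NF>7 && !seen[FILENAME]++ { print FILENAME, FNR }' *.csv)

# Every matching line, for comparison.
all=$(cd "$tmpdir" && awk -F',' 'NF>7 { print FILENAME, FNR }' *.csv)

printf 'first match per file:\n%s\n' "$first"
printf 'all matches:\n%s\n' "$all"

rm -rf "$tmpdir"
```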
Add column of filename to hundreds of files bash
tmp=$(mktemp) || { ret="$?"; printf 'Failed to create temp file\n'; exit "$ret"; }
for file in *.txt; do
awk 'BEGIN{OFS="\t"} {print $0, (FNR>1 ? FILENAME : "name")}' "$file" > "$tmp" &&
mv -- "$tmp" "$file" || exit
done
If you have GNU awk and not so many files that you exceed the shell's argument limit, you can instead use a single call to awk, with no surrounding shell loop and no explicitly created temp file (it will still use a temp file behind the scenes, just like every tool that offers "inplace" editing):
awk -i inplace 'BEGIN{OFS="\t"} {print $0, (FNR>1 ? FILENAME : "name")}' *.txt
How to print filename in awk?
Could you please try the following? You do not need a for loop for it.
awk 'BEGIN{OFS="\t\t"} FNR==2{print FILENAME, $7}' r* | column -t > out
In case you have GNU awk, you can use nextfile to save many cycles and jump directly to the next file once the condition is met:
awk 'BEGIN{OFS="\t\t"} FNR==2{print FILENAME, $7; nextfile}' r* | column -t > out
To use \t\t as the output separator, set OFS in the BEGIN section. Add | column -t to the output to align the columns. Remove BEGIN{OFS="\t\t"} if it is not required, since the column command is added (the OP asked for TAB in the output, so it was added for that).