Too many open files error while running awk command
Before starting on the next file, close the previous one:
awk '/pattern here/{close("file"i); i++}{print > ("file"i)}' InputFile
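As a small sketch of the same idea, the commands below split a made-up input on a hypothetical START marker and close each output file before opening the next one. The marker, the part1.txt-style names, and the temporary directory are assumptions for illustration only.

```shell
# Hypothetical sample data: three sections, each beginning with a START line.
workdir=$(mktemp -d)
printf 'START\na\nb\nSTART\nc\nSTART\nd\ne\n' > "$workdir/input.txt"

# On every START line, close the current output file, then switch to the next one.
( cd "$workdir" &&
  awk '/^START/ { if (out != "") close(out); out = "part" (++i) ".txt" }
       out != ""  { print > out }' input.txt )

part_count=$(ls "$workdir" | grep -c '^part')
second_part=$(cat "$workdir/part2.txt")
echo "$part_count files written"
```

Because only one output file is open at any moment, the number of sections no longer matters for the open-file limit.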
error in awk: cannot open - too many open files
Copy/paste exactly this command and it will work:
awk 'BEGIN{OFS="\t"} {out=$10"_"$8".txt"; print $1,$2,$3,$4,$12 >> out; close(out)}' mybigfile.txt
You've been experiencing 2 problems:
1) You're using an awk that is not GNU awk and so doesn't close files for you when needed, and
2) You're re-typing the commands people are suggesting you use instead of copy-pasting them and messing up the quotes when you do so, just like in the script in your question.
If you can use gawk then it'd simply be:
awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$12 > ($10"_"$8".txt")}' mybigfile.txt
Unlike with several other awks you don't technically need to parenthesize the expression on the right side of output redirection with gawk but it's a good habit to get into for portability and helps readability.
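The portable command above uses >> rather than > for a reason that a small experiment can show: once a file has been closed, reopening it with > starts it from scratch. The sketch below uses made-up two-column data and hypothetical output names derived from the first column.

```shell
workdir=$(mktemp -d)
printf 'a 1\nb 2\na 3\n' > "$workdir/data.txt"

# Append and close on every record: at most one output file is open at a time.
( cd "$workdir" &&
  awk '{ out = $1 ".txt"; print $2 >> out; close(out) }' data.txt )
append_result=$(cat "$workdir/a.txt")

# Same loop with ">": reopening a.txt after close() truncates it,
# so only the last value written for key "a" survives.
( cd "$workdir" && rm -f a.txt b.txt &&
  awk '{ out = $1 ".txt"; print $2 > out; close(out) }' data.txt )
truncate_result=$(cat "$workdir/a.txt")

printf 'with >>:\n%s\nwith >:\n%s\n' "$append_result" "$truncate_result"
```

One consequence of >> is that output files left over from an earlier run are appended to as well, so it may be worth removing them before rerunning.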
Too many open files in AWK
To awk, the output is a pipe to "gzip >> "_fn, not the file whose name is stored in _fn, so that is what you need to close, e.g. close("gzip >> "_fn). You should copy/paste your shell script into http://shellcheck.net and fix the issues it tells you about first, though, as you have some quoting and other issues outside of the awk script.
Anyway, it seems like this might be what you're trying to do (untested):
for csv in "${_in_path}${_letter}_"*_*'.csv.gz'; do
    zcat "$csv" |
    sort -t',' -T tmp -k4 |
    awk -F ',' '
        $4 != key {
            close(out)
            key = $4
            fn = "requests_by_IP/" key ".csv.gz"
            out = "gzip >> " fn
        }
        { print | out }
    '
done
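The rule that a pipe is closed by its command string, not by the file name inside it, can be tried on a small scale. This is only a sketch: "cat >> file" is a hypothetical stand-in for the gzip command, and the sample data and .out names are made up.

```shell
workdir=$(mktemp -d)
# Input already sorted by the key in column 1.
printf 'x,1\nx,2\ny,3\n' > "$workdir/sorted.csv"

( cd "$workdir" &&
  awk -F',' '
    $1 != key {
        if (cmd != "") close(cmd)    # close the pipe, named by its command string
        key = $1
        cmd = "cat >> " key ".out"   # stand-in for "gzip >> " fn
    }
    { print | cmd }
    END { if (cmd != "") close(cmd) }
  ' sorted.csv )

x_out=$(cat "$workdir/x.out")
y_out=$(cat "$workdir/y.out")
```

Keeping the command in one variable and passing that same variable to both print and close() avoids the two strings drifting apart.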
awk: cannot open pipe Too many open files
First of all, the error is probably caused by not calling close. But even after resolving that, if we make one call to the system date command for every log line, and logs usually have many lines, we end up with an extremely slow script.
So it is much better to use the GNU awk time functions or, better still when requirements allow, as they do here, to use only string functions. Usually we just rearrange fields with the help of split() or match(), but if there are month names to convert to numbers, there is a standard way to do it.
awk 'NR>3{ split($1, dat, "-"); split($2, tim, ":")
m=(index("JanFebMarAprMayJunJulAugSepOctNovDec", dat[2])+2)/3
print dat[3], m, dat[1], tim[1], tim[2], $4 }' file
We define a string containing all the 3-letter month names and, for any month to convert, use index() to get the position where its substring begins (Jan starts at character 1, Feb at 4, Mar at 7, etc.), so (i+2)/3 gives the month number.
Output:
2020 9 27 16 00 83.004784
2020 9 27 16 01 82.821602
2020 9 27 16 02 82.786552
2020 9 27 16 03 82.666336
2020 9 27 16 04 82.837242
2020 9 27 16 05 82.579857
2020 9 27 16 06 82.693413
2020 9 27 16 08 82.700043
2020 9 27 16 09 82.646797
2020 9 27 16 10 82.794540
2020 9 27 16 11 82.600845
2020 9 27 16 12 82.815422
2020 9 27 16 13 82.866974
So these are the data; you can use printf for any formatting you may want.
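The index arithmetic can be checked in isolation with a few month names fed on standard input. The input lines here are made up for the check and are not part of the original log.

```shell
month_numbers=$(printf 'Jan\nFeb\nSep\nDec\n' |
  awk '{ print $1, (index("JanFebMarAprMayJunJulAugSepOctNovDec", $1) + 2) / 3 }')
echo "$month_numbers"
```

A string that is not a month name makes index() return 0, so the result is 2/3 rather than a whole number; that can serve as a cheap sanity check on the input.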
cannot open pipe too many open files
The issue you are having is that you are not closing the command which you pipe to getline. You write:
"echo -n "$6" | tail -c 3" | getline terminalCountry
Awk does the following with this:
If the same file name or the same shell command is used with getline more than once during the execution of an awk program, the file is opened (or the command is executed) the first time only. At that time, the first record of input is read from that file or command. The next time the same file or command is used with getline, another record is read from it, and so on.
This implies that if several records have an identical $6, your command will only work correctly the first time. Furthermore, each distinct command opens a "file" (a pipe) to which the command writes its output. If you have many records, awk keeps opening these without ever closing them, which leads to the error.
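This behaviour can be reproduced with a harmless command. In the sketch below, two identical input records build the same "echo a" command; the placeholder text "(nothing read)" is a made-up marker showing when getline did not assign anything. The second variant differs only in calling close() after reading.

```shell
# Same command string used twice, never closed: the second getline hits end of input.
without_close=$(printf 'a\na\n' | awk '{
    cmd = "echo " $1
    val = "(nothing read)"
    cmd | getline val
    print val
}')

# Closing the command after each read makes awk run it again next time.
with_close=$(printf 'a\na\n' | awk '{
    cmd = "echo " $1
    val = "(nothing read)"
    cmd | getline val
    close(cmd)
    print val
}')

printf 'without close:\n%s\nwith close:\n%s\n' "$without_close" "$with_close"
```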
To make this work correctly, you should close the "file" again after reading from it. That is to say, you should write:
command="echo -n \047" $6 "\047 | tail -c 3"
command | getline terminalCountry
close(command)
But that feels like overkill here; to get the last three characters, you might just be interested in:
terminalCountry=substr($6,length($6)-2)
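As a quick check of the string-only approach: taking the last three characters, which is what tail -c 3 does, means starting substr() at length minus 2. The six-field sample line is made up for illustration.

```shell
last_three=$(printf 'f1 f2 f3 f4 f5 TERMINAL-USA\n' |
  awk '{ print substr($6, length($6) - 2) }')
echo "$last_three"
```

No external process is started at all, so there is nothing to close.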
Interesting reads:
- https://www.gnu.org/software/gawk/manual/gawk.html#Getline
- https://www.gnu.org/software/gawk/manual/gawk.html#Close-Files-And-Pipes
awk - too many open files issue / date parsing
Your problem is that you need to close your command:
unix="date -d\""$1" "$2"\" \"+%s\""; unix | getline timestamp; close(unix)
If you don't do this, a new pipe is opened for each record in your input file, which leads to the problem that you are experiencing.
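A minimal sketch of the same pattern on one made-up record follows. It assumes a date command that accepts -d, as GNU date does, and adds -u so the result does not depend on the local time zone.

```shell
epoch=$(printf '1970-01-02 00:00:00\n' | awk '{
    cmd = "date -u -d \"" $1 " " $2 "\" +%s"   # assumes a date that supports -d
    cmd | getline timestamp
    close(cmd)                                  # release the pipe before the next record
    print timestamp
}')
echo "$epoch"
```

This still starts one date process per record, so for large inputs a string-based conversion or the GNU awk time functions are likely to be much faster.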