Does awk CR LF handling break on cygwin?
I just checked with Arnold Robbins (the gawk maintainer). The CR stripping is done by the C library, not by awk itself. To stop it happening you should set gawk's BINMODE variable to 3:
$ echo -e "line1\r\nline2" | awk '1' | cat -v
line1
line2
$ echo -e "line1\r\nline2" | awk -v BINMODE=3 '1' | cat -v
line1^M
line2
See the gawk man page (the BINMODE variable) for more information.
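If you run awk on CRLF data often, a small shell function can make sure -v BINMODE=3 always comes before the other arguments. This is a sketch: the name awkb is made up. Per the gawk man page, BINMODE values 1, 2 and 3 request binary mode for input files, output files, or all files. An awk that does not implement BINMODE treats it as an ordinary, unused variable. The -v form is used because setting BINMODE in BEGIN is reported not to work on Cygwin.

```shell
#!/bin/sh
# awkb (hypothetical helper): run awk with binary-mode I/O requested up front.
# BINMODE is passed with -v, ahead of the program text; on awks without
# BINMODE support it is just a user variable that nothing reads.
awkb() {
    awk -v BINMODE=3 "$@"
}

# Usage: the CR on line1 should survive (cat -v shows it as ^M).
printf 'line1\r\nline2\n' | awkb '1' | cat -v
```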
Different awk results on Linux and mingw64 with CRLF line endings
After searching a while, I found this question, and this answer to it:
it's something done by the C libraries and to stop it happening you should set the awk BINMODE variable to 3
I changed your code to:
echo -n $'boo\r\nboo\r\n' | awk -v BINMODE=3 $'BEGIN { RS="\\n" } {gsub("boo","foo"); print}' | cat -v
Then I tried it on Unix, Linux, macOS, and Windows. All of them produce this output:
foo^M
foo^M
So -v BINMODE=3 is what you are looking for.
NOTE that only the -v BINMODE=3 form (the option placed before the program text) works.
Usually we can pass a variable to awk in three ways: with the -v switch, in the BEGIN block, or as an assignment after the program text and before the file names. In this case I tried all three, and only -v BINMODE=3 works.
I guess it's something to do with awk's compiling process.
Example (under Cygwin on Windows):
$ echo -n $'boo\r\nboo\r\n' | awk -v BINMODE=3 '1' | cat -v
boo^M
boo^M
$ echo -n $'boo\r\nboo\r\n' | awk 'BEGIN{BINMODE=3}1' | cat -v
boo
boo
$ echo -n $'boo\r\nboo\r\n' | awk '1' BINMODE=3 | cat -v
boo
boo
On the other platforms mentioned, all three commands produce:
boo^M
boo^M
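You can repeat this comparison on your own platform without reading cat -v output by eye. The sketch below counts the CR bytes in awk's output for each of the three ways of setting BINMODE. The count_cr helper is a hypothetical name, and it relies only on POSIX tr and wc.

```shell
#!/bin/sh
# count_cr (hypothetical helper): print the number of CR bytes read from stdin.
count_cr() {
    tr -cd '\r' | wc -c | tr -d ' '
}

# printf format string for two CRLF-terminated lines
input='boo\r\nboo\r\n'

# 1) -v before the program text
v_opt=$(printf "$input" | awk -v BINMODE=3 '1' | count_cr)
# 2) assignment inside BEGIN
in_begin=$(printf "$input" | awk 'BEGIN { BINMODE = 3 } 1' | count_cr)
# 3) assignment operand after the program text
operand=$(printf "$input" | awk '1' BINMODE=3 | count_cr)

printf '%s\n' "-v BINMODE=3:        $v_opt CR(s) kept"
printf '%s\n' "BEGIN { BINMODE=3 }: $in_begin CR(s) kept"
printf '%s\n' "operand BINMODE=3:   $operand CR(s) kept"
```

Going by the Cygwin results above, you would expect 2, 0 and 0 there. On the other platforms mentioned, all three should report 2.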
removing duplicates files with CRLF using awk command
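You could add the CRLF sequence to the record separator. gawk and mawk treat a multi-character RS like this as a regular expression, but POSIX leaves it unspecified: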
awk -v RS='\n|\r\n' '!seen[$0]++' file
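Your awk may honour only the first character of RS. In that case, one alternative (swapped in here, not part of the answer above) is to dedupe on a copy of the line with any trailing CR removed, while printing the original line unchanged. A sketch follows, where dedupe_ignoring_cr is a hypothetical name:

```shell
#!/bin/sh
# dedupe_ignoring_cr (hypothetical helper): print the first occurrence of each
# line, treating "text\r\n" and "text\n" as the same line. Only the comparison
# key has its trailing CR stripped; the printed line ($0) is left untouched.
dedupe_ignoring_cr() {
    awk '{ key = $0; sub(/\r$/, "", key) } !seen[key]++' "$@"
}

# Usage: "a" appears with both endings; only the first copy is printed.
printf 'a\r\na\nb\n' | dedupe_ignoring_cr | cat -v
```

The RS version consumes the CR as part of the separator and prints every record with a plain LF. This version instead keeps each printed line's original ending.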
Portable way to split an external variable containing newlines in awk?
POSIX awk does not allow physical newlines in string values.
Bash's ANSI-C quoting, like $'a\nb', puts a real newline into the value passed with -v. That is not portable, and a strictly POSIX-compliant awk can reject it.
Even GNU awk returns the following error when you enable the --posix option:
awk --posix -v s=$'X\nX' 'BEGIN { print split(s,a,"\n") }'
awk: fatal: POSIX does not allow physical newlines in string values
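However, you can drop the $'...' notation and pass the two characters \ and n instead. awk's own escape processing of -v values turns them into a newline, and the error goes away: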
awk --posix -v s="X\nX" 'BEGIN { print split(s,a,"\n") }'
2
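Another portable route, swapped in here rather than taken from the answer above, is to pass the value through the environment. POSIX awk provides the ENVIRON array, and its values do not go through the escape processing that -v applies, so real newlines (and backslashes) arrive unchanged. A sketch, where S is just an illustrative variable name:

```shell
#!/bin/sh
# A value containing a real newline, e.g. from a command substitution.
s='X
X'

# Pass it through the environment instead of -v: ENVIRON values are taken
# as-is, so the newline reaches awk unchanged and split() sees two parts.
S="$s" awk 'BEGIN { n = split(ENVIRON["S"], parts, "\n"); print n }'   # prints 2
```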
How to find out line-endings in a text file?
You can use the file utility to give you an indication of the type of line endings.
Unix:
$ file testfile1.txt
testfile1.txt: ASCII text
"DOS":
$ file testfile2.txt
testfile2.txt: ASCII text, with CRLF line terminators
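file may not be available, or you may want to know how many lines are affected when endings are mixed. In either case, a rough sketch is to count the lines that end in a CR. crlf_count is a hypothetical name. As discussed earlier on this page, on Cygwin/MinGW the C library may strip the CR before awk sees it, so the sketch sets BINMODE=3.

```shell
#!/bin/sh
# crlf_count (hypothetical helper): print "CRLF-lines total-lines" for the
# given files (or stdin). BINMODE=3 keeps gawk on Cygwin/MinGW from
# dropping the CRs; other awks ignore it.
crlf_count() {
    awk -v BINMODE=3 '/\r$/ { crlf++ } END { print crlf + 0, NR }' "$@"
}

# Usage: input with mixed endings reports "1 2".
printf 'dos\r\nunix\n' | crlf_count
```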
To convert from "DOS" to Unix:
$ dos2unix testfile2.txt
To convert from Unix to "DOS":
$ unix2dos testfile1.txt
Converting an already converted file has no effect so it's safe to run blindly (i.e. without testing the format first) although the usual disclaimers apply, as always.
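Where dos2unix/unix2dos are not installed, awk can do roughly the same job. This sketch uses hypothetical function names and writes to stdout instead of converting in place. Like the tools, it is safe to run on already-converted input: to_unix removes a trailing CR if present, and to_dos removes it before adding exactly one.

```shell
#!/bin/sh
# to_unix / to_dos (hypothetical helpers): convert line endings, input -> stdout.
# BINMODE=3 stops gawk on Cygwin/MinGW from translating CR/LF behind our back.
to_unix() {
    awk -v BINMODE=3 '{ sub(/\r$/, ""); print }' "$@"
}
to_dos() {
    awk -v BINMODE=3 '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"
}

# Usage: converting twice gives the same result as converting once.
printf 'a\nb\r\n' | to_dos | to_dos | cat -v    # prints a^M and b^M
```

To convert a file in place, redirect to a temporary file and move it back, e.g. to_unix testfile2.txt > tmp && mv tmp testfile2.txt.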
How come the POSIX mode of GNU Awk does not consider a new line a field, when setting the RS to another thing?
Reading the POSIX standard, we find:
The awk utility shall interpret each input record as a sequence of fields where, by default, a field is a string of non-<blank> non-<newline> characters. This default <blank> and <newline> field delimiter can be changed by using the FS built-in variable [...] If FS is <space>, skip leading and trailing <blank> and <newline> characters; fields shall be delimited by sets of one or more <blank> or <newline> characters.
source: POSIX awk standard: IEEE Std 1003.1-2017
That said, the proper behaviour should be the following:
$ echo | awk 'BEGIN{RS="a"}{print NR,NF,length}'
1 0 1
- a single record: no "a" character has been encountered
- no fields: FS is the default space, so all leading and trailing <blank> and <newline> characters are skipped
- length one: there is only a single character in the record (the newline printed by echo).
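When FS is set explicitly, the story is completely different. With FS="b" the newline is ordinary data, so the record is a single field. With FS="\n" the newline is the separator, giving two empty fields: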
$ echo | awk 'BEGIN{FS="b";RS="a"}{print NR,NF,length}'
1 1 1
$ echo | awk 'BEGIN{FS="\n";RS="a"}{print NR,NF,length}'
1 2 1
In conclusion: I believe the GNU awk documentation is wrong.
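To see what NF is counting in the FS="\n" case, the sketch below prints each field between brackets. The loop is illustrative and not part of the original answer.

```shell
#!/bin/sh
# echo emits a single newline; with RS="a" that newline is the whole record.
# FS="\n" splits it into an empty field before and after the newline.
echo | awk 'BEGIN { FS = "\n"; RS = "a" }
            { for (i = 1; i <= NF; i++) printf "field %d: [%s]\n", i, $i }'
```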