Does awk CR LF Handling Break on Cygwin

Does awk CR LF handling break on cygwin?

I just checked with Arnold Robbins (the maintainer of gawk), and the answer is that the CR stripping is done by the C libraries. To stop it happening, set the awk BINMODE variable to 3:

$ echo -e "line1\r\nline2" | awk '1' | cat -v
line1
line2

$ echo -e "line1\r\nline2" | awk -v BINMODE=3 '1' | cat -v
line1^M
line2

See the gawk man page (the BINMODE entry) for more information.
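To check how a particular awk build treats CR on your own platform, a small probe can help. The sketch below is only an illustration: the function name awk_keeps_cr is made up here, and it assumes a POSIX shell with printf. Extra arguments are forwarded to awk, so -v BINMODE=3 can be tried the same way.

```shell
# Hypothetical helper: report whether awk sees the CR of a CRLF-terminated line.
# Any arguments are passed through to awk as options (e.g. -v BINMODE=3).
awk_keeps_cr() {
    printf 'x\r\n' |
        awk "$@" '{ print (length($0) == 2 ? "kept" : "stripped") }'
}

awk_keeps_cr                 # default mode
awk_keeps_cr -v BINMODE=3    # binary mode for reads and writes
```

Going by the transcripts above, a Cygwin setup like the one described would be expected to report "stripped" for the first call and "kept" for the second. Where the C library does no translation, both calls should report "kept".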

Different awk results on Linux and mingw64 with CRLF line endings

After searching for a while, I found this question, and this in its answer:

it's something done by the C libraries and to stop it happening you should set the awk BINMODE variable to 3

I changed your code to:

echo -n $'boo\r\nboo\r\n' | awk -v BINMODE=3 $'BEGIN { RS="\\n" } {gsub("boo","foo"); print}' | cat -v

I tried it on Unix, Linux, macOS, and Windows; all produce this output:

foo^M
foo^M

So -v BINMODE=3 is what you are looking for.

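Note that only the -v BINMODE=3 form, placed before the program text, works.

Usually we can pass a variable to awk with the -v switch, set it in a BEGIN block, or assign it after the program text and before the file names.

In this case I tried all three ways, and only -v BINMODE=3 works.

My guess is that it has something to do with when awk sets up its input streams: the gawk manual says the modes for standard input and standard output are set once, after the command line is read but before any of the program is processed.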

Example (under cygwin on Windows):

$ echo -n $'boo\r\nboo\r\n' | awk -v BINMODE=3 '1' | cat -v    
boo^M
boo^M

$ echo -n $'boo\r\nboo\r\n' | awk 'BEGIN{BINMODE=3}1' | cat -v
boo
boo

$ echo -n $'boo\r\nboo\r\n' | awk '1' BINMODE=3 | cat -v
boo
boo

On the other platforms mentioned, all three commands produce:

boo^M
boo^M
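The three ways of setting a variable differ in when the assignment happens, which an ordinary variable can illustrate. This is a sketch only: x and the function names are placeholders, not part of the original answer. Per POSIX, -v assignments happen before BEGIN, while an assignment operand after the program text is processed only when awk reaches it in the operand list, after BEGIN has run.

```shell
# -v: the value is already set when the BEGIN block runs.
with_dash_v() {
    awk -v x=1 'BEGIN { print (x == "" ? "unset" : "set") }'
}

# Operand after the program text: not yet assigned during BEGIN ...
operand_in_begin() {
    awk 'BEGIN { print (x == "" ? "unset" : "set") }' x=1 </dev/null
}

# ... but assigned by the time the first record of standard input is read.
operand_in_main() {
    echo | awk '{ print (x == "" ? "unset" : "set") }' x=1
}

with_dash_v
operand_in_begin
operand_in_main
```

That timing is consistent with the transcript above: by the time BEGIN runs, or an operand is processed, standard input may already have been opened in text mode.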

removing duplicates files with CRLF using awk command

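You could add the CRLF sequence to the record separator (a regular-expression RS is supported by gawk and mawk, but POSIX leaves a multi-character RS unspecified):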

awk -v RS='\n|\r\n' '!seen[$0]++' file
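If a regular-expression RS is not available, a variation that should work in any POSIX awk is to keep the default RS and strip a trailing CR from each record before the duplicate check. This is a sketch, not the original answer's method, and dedupe_crlf is a hypothetical name. Note that it writes the surviving lines with plain LF endings.

```shell
# Hypothetical helper: drop duplicate lines, treating "foo\r\n" and "foo\n" as equal.
dedupe_crlf() {
    awk '{ sub(/\r$/, "") }   # remove a trailing CR, if any
         !seen[$0]++          # print a line only the first time it appears
        ' "$@"
}

printf 'a\r\nb\na\nb\r\n' | dedupe_crlf
```

File names can be passed as arguments (dedupe_crlf file); with no arguments the function reads standard input.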

Portable way to split an external variable containing newlines in awk?

POSIX awk does not allow physical newlines in string values.

With the C/Bash string notation $'a\nb', the shell puts a physical newline into the value, so any POSIX-compliant awk implementation will fail.

Even gnu-awk returns the following error when the --posix option is enabled:

awk --posix -v s=$'X\nX' 'BEGIN { print split(s,a,"\n") }'
awk: fatal: POSIX does not allow physical newlines in string values

However, without the $'...' notation there is no error, because awk receives the two characters \ and n and expands the escape sequence itself:

awk --posix -v s="X\nX" 'BEGIN { print split(s,a,"\n") }'
2
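One way to pass a value that really does contain newlines, without going through -v, is the environment: ENVIRON is part of POSIX awk, and its values are not subject to escape processing. A minimal sketch, with AWK_S and count_parts as made-up names:

```shell
# Hypothetical helper: count the newline-separated parts of its first argument.
count_parts() {
    # The assignment prefix puts AWK_S in awk's environment for this call only.
    AWK_S=$1 awk 'BEGIN { print split(ENVIRON["AWK_S"], parts, "\n") }'
}

count_parts "$(printf 'X\nX')"
```

Because the value never passes through -v, backslashes in it also arrive unchanged, which may or may not be what you want.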

How to find out line-endings in a text file?

You can use the file utility to give you an indication of the type of line endings.

Unix:

$ file testfile1.txt
testfile1.txt: ASCII text

"DOS":

$ file testfile2.txt
testfile2.txt: ASCII text, with CRLF line terminators

To convert from "DOS" to Unix:

$ dos2unix testfile2.txt

To convert from Unix to "DOS":

$ unix2dos testfile1.txt

Converting an already converted file has no effect, so it's safe to run blindly (i.e. without testing the format first), although the usual disclaimers apply, as always.
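Where the file utility is not available, awk itself can give a rough count. The sketch below assumes an awk that leaves CR in the record (as on Linux; on a platform that strips it, -v BINMODE=3 may be needed), and count_crlf is a hypothetical name.

```shell
# Hypothetical helper: print how many lines of a file end in CR LF.
count_crlf() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

tmp=$(mktemp)
printf 'one\r\ntwo\nthree\r\n' > "$tmp"
count_crlf "$tmp"
rm -f "$tmp"
```

A result of 0 suggests Unix line endings throughout, a result equal to the line count suggests "DOS" endings, and anything in between points to a file with mixed endings.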

How come the POSIX mode of GNU Awk does not consider a new line a field, when setting the RS to another thing?

Reading the POSIX standard, we find:

The awk utility shall interpret each input record as a sequence of fields where, by default, a field is a string of non-<blank> non-<newline> characters. This default <blank> and <newline> field delimiter can be changed by using the FS built-in variable

If FS is <space>, skip leading and trailing <blank> and <newline> characters; fields shall be delimited by sets of one or more <blank> or <newline> characters.

source: POSIX awk standard: IEEE Std 1003.1-2017

That said, the proper behaviour should be the following:

$ echo | awk 'BEGIN{RS="a"}{print NR,NF,length}'
1 0 1
  • a single record: no <a>-character has been encountered
  • no fields: FS is the default space, so all leading and trailing <blank> and <newline> characters are skipped
  • length one: there is only a single character in the record.

When FS is set explicitly, the story is completely different:

$ echo | awk 'BEGIN{FS="b";RS="a"}{print NR,NF,length}'
1 1 1
$ echo | awk 'BEGIN{FS="\n";RS="a"}{print NR,NF,length}'
1 2 1

In conclusion: I believe the GNU awk documentation is wrong.


