Dos2Unix: Binary Symbol Found, Skipping Binary File

dos2unix: Binary symbol found, skipping binary file

The ^@ is Vim's representation of a null byte; see :help <Nul>.

Ordinary text files (ASCII or UTF-8) do not contain null characters. Binary files typically contain many of them and would be corrupted if converted as a whole; that's why dos2unix refuses to convert such a file.

You have several options:

  • That null character may have been inserted by accident or is garbage. Edit the file (in Vim) or recreate it. If you're using Vim, you can do the conversion in it as well (via :help ++ff, e.g. :w ++ff=unix). Command-line tools like dos2unix still have their use for non-interactive invocations.
  • That null character belongs there. In that case, the dos2unix command has a -f (--force) option to force the conversion.
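
To decide between the two options, it helps to see where the null bytes actually are. The following is a minimal Python sketch, not part of dos2unix; the function name find_null_bytes and the limit parameter are made up for illustration. It reports the line number and byte offset of each null byte in a byte string.

```python
def find_null_bytes(data: bytes, limit: int = 10) -> list:
    """Return up to `limit` (line_number, byte_offset) pairs for null bytes.

    Lines are counted by LF bytes, starting at 1; offsets start at 0.
    """
    hits = []
    line = 1
    for offset, byte in enumerate(data):
        if byte == 0x00:
            hits.append((line, offset))
            if len(hits) >= limit:
                break
        elif byte == 0x0A:
            line += 1
    return hits
```

Called on the contents of a file opened in binary mode (open(path, "rb").read()), a single isolated hit suggests stray garbage, while hits spread throughout the file suggest real binary data or a UTF-16 text file.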

dos2unix: Binary symbol 0x04 found at line 1703

That 0x0004 character you are seeing in your file has nothing at all to do with the BOM (which is fine, by the way) -- it's an EOT (End of Transmission) character from the C0 control set, and has been at that codepoint since 7-bit ASCII was the new hotness. (It's also the familiar Control-D Unix EOF sequence.)

Unfortunately, the pre-dos2unix approach of using tr to strip the carriage returns won't work directly, because the file is UTF-16. Since iconv works for you, though, you can use it to convert the file to UTF-8 (which tr can handle) and then run this tr command:

tr -d '\r' < crs_2013_data_temp.txt > crs_2013_data_unix.txt

in order to get the text file into the Unix line ending convention. You will have to keep an eye on whatever tools you're feeding the file to, though, to make sure that they don't choke on the Ctrl-D/EOT character; if they do, you can use

tr -d '\004' < crs_2013_data_unix.txt > crs_2013_data_clean.txt

to get rid of it.
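
For comparison, the same three steps (convert from UTF-16, drop carriage returns, optionally drop EOT) can be sketched in Python in one pass. This is only an illustration of the pipeline above, not a replacement for iconv or tr; the function name utf16_to_unix_utf8 and the strip_eot parameter are hypothetical, and the input is assumed to start with a BOM, as the file discussed here does.

```python
def utf16_to_unix_utf8(raw: bytes, strip_eot: bool = True) -> bytes:
    """Decode UTF-16 input and return UTF-8 bytes with Unix line endings."""
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    text = raw.decode("utf-16")
    # Equivalent of tr -d '\r': remove every carriage return.
    text = text.replace("\r", "")
    if strip_eot:
        # Equivalent of tr -d '\004': remove the EOT (Ctrl-D) character.
        text = text.replace("\x04", "")
    return text.encode("utf-8")
```

As with the tr command, every carriage return is removed, not only those that are part of a CRLF pair.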

As to how it got there in the first place? I blame the Belgians for letting it sneak into the data they gave the OECD, which they probably keyed in with cat - > file or some other similarly underwhelming means. Also, some text editors try to be a bit too helpful by hiding control characters, even though other tools will bail out when they see them as they think you just stuffed a binary file in that was pretending to be text for a while.

dos2unix modifies binary files - why

This is the relevant part of the dos2unix source code:

if ((ipFlag->Force == 0) &&
    (TempChar < 32) &&
    (TempChar != 0x0a) &&  /* Not an LF */
    (TempChar != 0x0d) &&  /* Not a CR */
    (TempChar != 0x09) &&  /* Not a TAB */
    (TempChar != 0x0c)) {  /* Not a form feed */
    RetVal = -1;
    ipFlag->status |= BINARY_FILE ;
    if (ipFlag->verbose) {
        if ((ipFlag->stdio_mode) && (!ipFlag->error)) ipFlag->error = 1;
        d2u_fprintf(stderr, "%s: ", progname);
        d2u_fprintf(stderr, _("Binary symbol 0x00%02X found at line %u\n"),TempChar, line_nr);
    }
    break;
}

It seems that a file is considered binary and skipped only if it contains a control character (a byte below 32) other than LF, CR, TAB, or form feed; otherwise it is processed as a text file. So if a binary file (e.g. an image) happens not to contain any of these characters, it will be converted and therefore corrupted.
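
The check quoted above can be paraphrased in Python to try it on sample data. This is a sketch of the condition only, not of dos2unix as a whole: the function name first_binary_symbol is made up, and counting lines by LF bytes is an assumption, since the quoted excerpt does not show how line_nr is maintained.

```python
# Control bytes the quoted check tolerates: LF, CR, TAB, form feed.
ALLOWED_CONTROL = {0x0A, 0x0D, 0x09, 0x0C}


def first_binary_symbol(data: bytes):
    """Return (byte_value, line_number) for the first byte the quoted
    check would flag as a binary symbol, or None if there is none."""
    line_nr = 1
    for byte in data:
        if byte < 32 and byte not in ALLOWED_CONTROL:
            return byte, line_nr
        if byte == 0x0A:
            # Assumption: lines are counted by LF bytes.
            line_nr += 1
    return None
```

For example, first_binary_symbol(bytes(range(32, 256))) returns None even though that input is clearly not text, which is the situation in which a binary file would be converted.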

How to strip binary characters from a file?

There's something called ansifilter which does exactly this. I tested it out on my file and it works.
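
If installing a separate tool is not an option, a plain byte filter can be sketched in Python. This is not a reimplementation of ansifilter; the name strip_control_bytes is hypothetical, and the set of bytes kept (TAB, LF, form feed, CR) is an assumption chosen to match the characters dos2unix tolerates. It is only suitable for ASCII or UTF-8 input, because in UTF-16 the null bytes are part of ordinary characters.

```python
# Bytes below 32 that are kept: TAB, LF, form feed, CR.
_KEEP = {0x09, 0x0A, 0x0C, 0x0D}
# Every other C0 control byte is deleted.
_DELETE = bytes(b for b in range(32) if b not in _KEEP)


def strip_control_bytes(data: bytes) -> bytes:
    """Remove C0 control bytes other than TAB, LF, form feed, and CR."""
    return data.translate(None, _DELETE)
```

Note that this removes only the ESC byte of a terminal escape sequence and leaves the rest of the sequence (for example "[31m") in place.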

dos2unix doesn't convert the env file even with -f option

Put the -f option first, then the file name:

sudo dos2unix -f env

Unusual ./configure error when building GDAL 2.0.0 from source

./configure seems to be sensitive to the source being in a sub-directory inside a VM shared folder (vmhgfs):

.host:/adam    500105212 141512588 358592624  29% /mnt/adam
  • In /mnt/adam/gdal-2.0.0, ./configure works correctly.
  • In ~/adam/gdal/gdal-2.0.0, ./configure works correctly.
  • However, in /mnt/adam/gdal/gdal-2.0.0, ./configure fails with the error in question.

I can only assume this is some kind of Unix permissions issue.


