Difference Between Opening a File in Binary VS Text

The link you gave does actually describe the differences, but it's buried at the bottom of the page:

http://www.cplusplus.com/reference/cstdio/fopen/

Text files are files containing sequences of lines of text. Depending on the environment where the application runs, some special character conversion may occur in input/output operations in text mode to adapt them to a system-specific text file format. Although on some environments no conversions occur and both text files and binary files are treated the same way, using the appropriate mode improves portability.

The conversion could normalize \r\n to \n (or vice versa), or it could ignore characters beyond 0x7F (à la "text mode" in FTP). Personally, I'd open everything in binary mode and use a good Unicode or other text-encoding library to deal with text.
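As a rough illustration of the "open in binary, handle text yourself" approach, here is a minimal Python sketch. The helper name read_text_portably and the UTF-8 default are assumptions made for the example, not something the original answer prescribes.

```python
import os
import tempfile


def read_text_portably(path, encoding="utf-8"):
    """Read raw bytes, decode explicitly, then normalize line endings to \\n."""
    with open(path, "rb") as f:          # binary mode: no platform translation
        raw = f.read()
    text = raw.decode(encoding)          # the encoding is chosen by us, not the OS
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Demo: a file with Windows line endings and a non-ASCII character.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "wb") as f:
        f.write("caf\u00e9\r\nsecond line\r\n".encode("utf-8"))
    demo_text = read_text_portably(demo_path)

print(repr(demo_text))
```

Because the decoding and the newline handling are both explicit, the result should be the same on every platform.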

Differences between writing/reading binary/text in c

Just do everything with "binary" files. At the OS level, Linux makes no distinction between "text" and "binary" files; there are just files with bytes in them. That is, expect that a file may contain every possible byte value, and don't write different code for different kinds of content.

There is a difference on Windows: text mode means that a line break (\n) in the program is converted to \r\n when writing to a file, and \r\n is converted back to \n when reading. A text file written this way and then read in binary mode will contain these two bytes instead of the original \n, and vice versa. (Additionally, MS isn't very clear in the documentation that, besides treating Ctrl-Z as end-of-file on input, this is the only difference, which can easily confuse beginners.)
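The translation can be reproduced on any platform with a small Python sketch. Passing newline="\r\n" to open() is used here only to force the Windows-style behaviour; the file name is made up for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")

    # newline="\r\n" forces the Windows-style output translation everywhere.
    with open(path, "w", newline="\r\n") as f:
        f.write("one\ntwo\n")            # 8 characters in the program

    with open(path, "rb") as f:
        raw_bytes = f.read()             # binary mode: bytes exactly as stored

    with open(path, "r", newline=None) as f:
        text_again = f.read()            # text mode: \r\n folded back to \n

print(len(raw_bytes), raw_bytes)
print(repr(text_again))
```

Note that the file is two bytes longer than the string that was written, which matters if you compute offsets or sizes from the in-memory text.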

If you use standard C fopen and fclose instead of the Linux-specific open etc., you can specify whether to open a file in binary or text mode (on Linux too). This is so that code using fopen works on Windows and Linux without any OS-specific changes; what you choose in fopen doesn't matter when running on Linux (which can be verified by reading the source code of fopen etc.).

And about the sockets:

Linux: No difference (again)

Windows: No difference either. There are just bytes, and no strange line break conversions.
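A quick way to convince yourself is to push awkward bytes through a socket and compare. This is a minimal Python sketch using a local socket pair; the payload is arbitrary.

```python
import socket

# Line breaks, a NUL, a Ctrl-Z and a high byte: everything a text mode might touch.
payload = b"line one\nline two\r\n\x00\x1a\xff"

left, right = socket.socketpair()
try:
    left.sendall(payload)
    left.shutdown(socket.SHUT_WR)        # signal end of data to the reader
    chunks = []
    while True:
        chunk = right.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    received = b"".join(chunks)
finally:
    left.close()
    right.close()

print(received == payload)
```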

File Binary vs Text

As a general rule, define a text format, and use it. It's much
easier to develop and debug, and it's much easier to see what is
going wrong if it doesn't work.

If you find that the files are becoming too big, or taking too
much time to transfer over the wire, consider compressing them.
A compressed text file is often smaller than what you can
achieve with binary. Or consider a less verbose text format;
it's possible to reliably transmit a text representation of your
data with far fewer characters than XML uses.
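To get a feel for how much compression helps a text format, here is a small Python sketch. The record layout (id, name, value) and the choice of JSON lines plus zlib are assumptions for the demo; real data will compress differently.

```python
import json
import zlib

# Hypothetical records; repetitive field names are typical of text formats.
records = [{"id": i, "name": f"sensor-{i % 5}", "value": i * 0.5} for i in range(200)]

# One JSON object per line: still readable and easy to debug.
text_payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")

compressed = zlib.compress(text_payload, 9)   # 9 = best compression
restored = zlib.decompress(compressed)

print(len(text_payload), "bytes as text,", len(compressed), "bytes compressed")
```

The uncompressed form stays available for debugging, and the compressed form is what goes over the wire.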

And finally, if you do end up having to use binary, try to choose
an existing format (e.g. Google's Protocol Buffers), or base your
format on an existing format. Just remember that:

  • Binary is a lot more work than text, since you practically
    have to write all of the << operators again, including those
    in the standard library.

  • Binary is a lot more difficult to debug, because you can't
    easily see what you've actually done.

Concerning your last edit:

  • Once you've encrypted, the results will be binary. You can
    use a text representation of the binary (base64 or some such),
    but the results won't be any more readable than the binary, so
    it's not worth the bother. If you're encrypting in process,
    before writing to disk, you automatically lose all of the
    advantages of text.

  • The issues concerning powering off mean that you cannot use
    ofstream directly. You must open or create the file with the
    necessary options for full transactional integrity (O_SYNC as
    a flag to open under Unix). You must write each record as
    a single write request to the system.

  • It's always a good idea to have a checksum, just in case. If
    you're worried about security, use a cryptographic hash. SHA1
    was the usual choice, but it is no longer considered
    collision-resistant, so something like SHA-256 is safer. But
    keep in mind that if someone has access to the file, and wants
    to intentionally change it, they can recalculate the hash and
    insert the new value as well.

Difference between files written in binary and text mode

I believe that most platforms will ignore the "t" option or the "text-mode" option when dealing with streams. On Windows, however, this is not the case. If you take a look at the description of the fopen() function on MSDN, you will see that specifying the "t" option has the following effects:

  • Line feeds ('\n') will be translated to '\r\n' sequences on output.
  • Carriage return/line feed sequences will be translated to line feeds on input.
  • If the file is opened in append mode, the end of the file will be examined for a ctrl-z character (character 26) and that character removed, if possible. The presence of that character will also be interpreted as the end of file. This is an unfortunate holdover from the days of CP/M (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the ctrl-z character will not be appended.
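Python's own file objects do not treat ctrl-z specially, so the behaviours listed above can only be emulated there. The two helper functions below are hypothetical and merely mimic the rules as described; they are not the C runtime's actual code.

```python
CTRL_Z = b"\x1a"   # character 26


def emulate_text_write(data: bytes) -> bytes:
    """Output translation: every \\n becomes \\r\\n."""
    return data.replace(b"\n", b"\r\n")


def emulate_text_read(raw: bytes) -> bytes:
    """Input translation: stop at the first Ctrl-Z, then fold \\r\\n into \\n."""
    visible, _, _ = raw.partition(CTRL_Z)
    return visible.replace(b"\r\n", b"\n")


# A file with a stray Ctrl-Z in the middle: text mode never sees what follows.
stored = emulate_text_write(b"alpha\nbeta\n") + CTRL_Z + b"hidden\r\n"
seen_in_text_mode = emulate_text_read(stored)

print(stored)
print(seen_in_text_mode)
```

This is why a binary file that happens to contain byte 26 appears truncated when read in text mode on Windows.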

difference between text file and binary file

At the bottom level, they are all bits... true. However, some transmission channels have seven bits per byte, and other transmission channels have eight bits per byte. If you transmit ASCII text over a seven-bit channel, then all is fine. Binary data gets mangled.

Additionally, different systems use different conventions for line endings: LF and CRLF are common, but some systems use CR or NEL. A text transmission mode will convert line endings automatically, which will damage binary files.
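The seven-bit problem is easy to simulate. In this Python sketch, seven_bit_channel is a made-up stand-in for such a channel (it just clears the high bit of every byte), and base64 is shown as one common workaround, not as something the channels themselves require.

```python
import base64


def seven_bit_channel(data: bytes) -> bytes:
    """Simulate a channel that drops the high bit of every byte."""
    return bytes(b & 0x7F for b in data)


ascii_text = b"Hello, world\n"
binary_blob = bytes([0x00, 0x7F, 0x80, 0xC3, 0xA9, 0xFF])

text_survives = seven_bit_channel(ascii_text) == ascii_text
blob_survives = seven_bit_channel(binary_blob) == binary_blob

# Encoding the blob as base64 keeps every transmitted byte below 0x80.
wrapped = base64.b64encode(binary_blob)
unwrapped = base64.b64decode(seven_bit_channel(wrapped))

print(text_survives, blob_survives, unwrapped == binary_blob)
```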

However, this is all mostly of historical interest these days. Most transmission channels are eight bit (such as HTTP) and most users are fine with whatever line ending they get.

Some examples of 7-bit channels: SMTP (nominally, without extensions), SMS, Telnet, some serial connections. The internet wasn't always built on TCP/IP, and it shows.

Additionally, the HTTP spec states that,

When in canonical form, media subtypes of the "text" type use CRLF as the text line break. HTTP relaxes this requirement and allows the transport of text media with plain CR or LF alone representing a line break when it is done consistently for an entire entity-body.

Difference between binary and text I/O in python on Windows

This mode is about conversion of line endings.

When reading in text mode, the platform's native line endings (\r\n on Windows) are converted to Python's Unix-style \n line endings. When writing in text mode, the reverse happens.

In binary mode, no such conversion is done.

Other platforms usually do fine without the conversion, because they store line endings natively as \n. (An exception is classic Mac OS, which used \r.) Code relying on this, however, is not portable.
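Here is a short sketch of both directions with Python's default settings. The file names are arbitrary, and the encoding is pinned to ASCII only to keep the demo independent of the locale.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crlf.txt")
    with open(path, "wb") as f:
        f.write(b"first\r\nsecond\r\n")

    with open(path, "rb") as f:
        as_bytes = f.read()              # binary mode: untouched
    with open(path, "r", encoding="ascii") as f:
        as_text = f.read()               # text mode: \r\n becomes \n

    out_path = os.path.join(tmp, "written.txt")
    with open(out_path, "w", encoding="ascii") as f:
        f.write("x\ny\n")                # \n becomes os.linesep on disk
    with open(out_path, "rb") as f:
        written_bytes = f.read()

# What the default text-mode write should have produced on this platform.
expected_written = "x\ny\n".replace("\n", os.linesep).encode("ascii")

print(as_bytes, repr(as_text), written_bytes)
```

On Linux and macOS the written bytes come out unchanged; on Windows they contain \r\n, which is exactly the portability trap mentioned above.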

C++ Is using files in Binary mode better or worse than text mode?

The only difference between binary and text access is who does the interpretation of a sequence of bytes in a stream.

  • When you use binary mode, the task of interpreting the sequence of bytes is entirely yours: your program gets access to "raw" bytes, and that is that.
  • When you use text mode, the standard library takes on the task of interpreting bytes as a sequence of characters for you. The standard does not guarantee cross-system portability of this interpretation, but it will be correct for the system for which your program is compiled.

Another thing to note is that all text files can be processed in binary mode, while opening binary files in text mode may be problematic.
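The asymmetry can be shown with the 8-byte PNG signature, which deliberately contains \r\n, ctrl-z and \n so that this kind of damage is detectable. The sketch below uses Python rather than C++, and the file name is invented for the demo.

```python
import os
import tempfile

# The PNG signature: 0x89, "PNG", \r\n, Ctrl-Z, \n.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blob.bin")
    with open(path, "wb") as f:
        f.write(PNG_SIGNATURE)

    with open(path, "rb") as f:
        via_binary = f.read()            # binary mode: bytes come back intact

    # latin-1 maps every byte to a character, so decoding itself cannot fail;
    # any difference comes from line-ending translation alone.
    with open(path, "r", encoding="latin-1") as f:
        via_text = f.read().encode("latin-1")

    # Decoding as UTF-8 fails outright because 0x89 is not a valid start byte.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

print(via_binary == PNG_SIGNATURE, via_text, utf8_failed)
```

Binary mode round-trips the data; text mode either silently drops the \r or refuses to decode at all, depending on the encoding.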

In general, if you need a portable text encoding, you should access files in binary mode and do interpretation yourself, or use a custom library that would do it for you.
