How to Detect If a File Is Binary (Non-Text) in Python

How can I detect if a file is binary (non-text) in Python?

You can also use the mimetypes module:

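import mimetypes
...
# guess_type() returns a (type, encoding) tuple, e.g. ('text/plain', None)
# for 'notes.txt', and looks only at the file name's extension
mime_type, encoding = mimetypes.guess_type(file)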

It's fairly easy to compile a list of binary MIME types. For example, Apache ships with a mime.types file that you could parse into two sets, binary and text, and then check whether the guessed MIME type is in your text set or your binary set.
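A minimal sketch of the MIME-type approach, assuming that any type starting with text/ counts as text, plus a small hypothetical set of extra text-like types. Keep in mind that guess_type() works from the file name's extension alone and never opens the file, so it returns None for unknown extensions and can be fooled by a misnamed file.

```python
import mimetypes
from typing import Optional

# Hypothetical extra types treated as text; adjust to your needs
TEXT_LIKE_TYPES = {"application/json", "application/xml"}


def is_text_by_mime(path: str) -> Optional[bool]:
    """Return True/False from the guessed MIME type, or None if it is unknown."""
    # guess_type() only inspects the extension; the file itself is never read
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        return None  # unknown extension: the caller must decide (or sniff contents)
    return mime_type.startswith("text/") or mime_type in TEXT_LIKE_TYPES
```

Returning None rather than guessing lets the caller fall back to a content-based check for files with unfamiliar extensions.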

Python 3: can I tell if a file opened in binary mode contains text?

You might try the binaryornot library.

pip install binaryornot

Then in the code:

from binaryornot.check import is_binary
is_binary(f_path)

Here is their documentation:

https://pypi.org/project/binaryornot/
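If adding a dependency isn't an option, one stdlib-only sketch (a different technique from binaryornot's, and assuming the text you care about is UTF-8) is to read a sample from the binary-mode file and check whether it decodes. An incremental decoder is used so that a multi-byte character cut off at the end of the sample is not counted as an error. The function name and the 4096-byte sample size are arbitrary choices, and the file must be seekable.

```python
import codecs
from typing import BinaryIO


def looks_like_utf8_text(f: BinaryIO, sample_size: int = 4096) -> bool:
    """Guess whether a seekable file opened in binary mode contains UTF-8 text."""
    start = f.tell()
    sample = f.read(sample_size)
    f.seek(start)  # leave the file position where we found it
    if b"\0" in sample:
        return False  # NUL is valid UTF-8 but almost never appears in text
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False: an incomplete sequence at the end is buffered, not an error
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```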

How to identify binary and text files using Python?

Thanks, everybody. I found a solution that suited my problem: I found this code at http://code.activestate.com/recipes/173220/ and modified it slightly to suit my needs.

It works fine.

def istext(filename):
    # Python 3 version: read the first 512 bytes in binary mode
    with open(filename, "rb") as f:
        s = f.read(512)
    # Printable ASCII plus a few common whitespace/control characters
    text_characters = bytes(range(32, 127)) + b"\n\r\t\b"
    if not s:
        # Empty files are considered text
        return True
    if b"\0" in s:
        # Files with null bytes are likely binary
        return False
    # Get the non-text characters: with a table of None, translate()
    # only deletes, so removing every text byte leaves the non-text ones
    t = s.translate(None, text_characters)
    # If more than 30% non-text characters, then
    # this is considered a binary file
    if len(t) / len(s) > 0.30:
        return False
    return True

How to determine if file is opened in binary or text mode?

File objects have a .mode attribute:

def is_binary(f):
    return 'b' in f.mode

This limits the test to real files; in-memory file objects such as io.StringIO and io.BytesIO do not have a mode attribute. You could also test for the appropriate abstract base classes:

import io

def is_binary(f):
    return isinstance(f, (io.RawIOBase, io.BufferedIOBase))

or test the inverse:

def is_binary(f):
    return not isinstance(f, io.TextIOBase)
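For objects that don't inherit from the io base classes (some third-party wrappers only implement read()), a duck-typed alternative, not part of the answer above, is to read zero bytes and check the type of what comes back: binary streams return b"" and text streams return "". This sketch assumes the object is readable; on a write-only file, read() raises io.UnsupportedOperation.

```python
def is_binary_stream(f) -> bool:
    """Duck-typed check: binary streams return bytes from read(), text streams str."""
    # read(0) consumes nothing, so the stream position is unchanged
    return isinstance(f.read(0), (bytes, bytearray))
```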

How to check if the file is a binary file and read all the files which are not?

Use the file utility. Sample usage:

$ file /bin/bash
/bin/bash: Mach-O universal binary with 2 architectures
/bin/bash (for architecture x86_64): Mach-O 64-bit executable x86_64
/bin/bash (for architecture i386): Mach-O executable i386

$ file /etc/passwd
/etc/passwd: ASCII English text

$ file code.c
code.c: ASCII c program text

See the file manual page (man file) for details.
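If shelling out to file isn't convenient, a pure-Python sketch of "read every file that isn't binary" can reuse the null-byte heuristic from the recipe earlier on this page: treat a file as binary if its first 1024 bytes contain a NUL. The function name, the sample size, and decoding as UTF-8 with errors="replace" are all assumptions.

```python
import os
from typing import Dict


def read_text_files(root: str, sample_size: int = 1024) -> Dict[str, str]:
    """Return {path: contents} for every file under root that doesn't look binary."""
    contents = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if b"\0" in f.read(sample_size):
                    continue  # NUL byte in the sample: assume binary and skip it
                f.seek(0)
                data = f.read()
            # errors="replace" keeps going on bytes that aren't valid UTF-8
            contents[path] = data.decode("utf-8", errors="replace")
    return contents
```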

How do I distinguish between 'binary' and 'text' files?

The spreadsheet software my company makes reads a number of binary file formats as well as text files.

We first look at the first few bytes for a magic number that we recognize. If we do not recognize the magic number of any of the binary types we read, we look at up to the first 2K bytes of the file to see whether it appears to be UTF-8, UTF-16, or text encoded in the current code page of the host operating system. If it passes none of these tests, we assume that it is not a file we can deal with and throw an appropriate exception.
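The same sequence of checks can be sketched in Python. This is not the company's code: the magic numbers listed are illustrative, UTF-16 is only recognized when the file starts with a byte-order mark, and locale.getpreferredencoding(False) stands in for "the current code page of the host operating system". The format labels returned are hypothetical.

```python
import codecs
import locale

# Illustrative signatures only; a real reader lists the formats it supports
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",  # also the container for .xlsx/.docx
    b"%PDF-": "pdf",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy .xls/.doc
}


def _decodes(sample: bytes, encoding: str) -> bool:
    """True if sample decodes cleanly, ignoring a character cut off at the end."""
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def classify(path: str, sample_size: int = 2048) -> str:
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    # 1. Known binary formats, recognized by their leading magic number
    for magic, name in MAGIC_NUMBERS.items():
        if sample.startswith(magic):
            return name
    # 2. UTF-16, detected here only when it starts with a byte-order mark
    if sample.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16 text"
    # NUL is legal UTF-8 but practically never appears in 8-bit text
    if b"\0" not in sample:
        # 3. UTF-8, then 4. the host's preferred ("code page") encoding
        if _decodes(sample, "utf-8"):
            return "utf-8 text"
        if _decodes(sample, locale.getpreferredencoding(False)):
            return "code-page text"
    raise ValueError(f"{path}: not a format we can read")
```

Note that some single-byte code pages (Latin-1, for instance) accept every byte value, so on such hosts step 4 will pass almost anything that reaches it.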

How can I determine if a file is binary or text in c#?

I would probably look for an abundance of control characters, which are typically present in a binary file but rare in a text file. Binary files tend to contain so many 0 bytes that simply testing for a high count of 0 bytes would probably catch most of them. If you care about localization, you'd need to test multi-byte patterns as well (UTF-16 text, for example, contains many 0 bytes).

As stated, though, you can always be unlucky and get a binary file that looks like text, or vice versa.
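A sketch of the control-character heuristic, written in Python to match the rest of this page and assuming single-byte or UTF-8 text: count bytes below 0x20 (other than tab, newline, form feed, and carriage return) plus DEL, and call the data binary if they exceed a threshold. The 10% threshold and the function name are arbitrary, hypothetical choices; as noted, UTF-16 text would be misclassified without a separate check.

```python
# Control bytes that commonly appear in ordinary text files
ALLOWED_CONTROL = {0x09, 0x0A, 0x0C, 0x0D}  # \t \n \f \r


def looks_binary(data: bytes, threshold: float = 0.10) -> bool:
    """Guess 'binary' if too large a share of the bytes are control characters."""
    if not data:
        return False  # treat empty input as text
    control = sum(
        1 for b in data
        if (b < 0x20 and b not in ALLOWED_CONTROL) or b == 0x7F
    )
    return control / len(data) > threshold
```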

How to check whether a file is empty or not

>>> import os
>>> os.stat("file").st_size == 0
True

