How can I detect if a file is binary (non-text) in Python?
You can also use the mimetypes module:
import mimetypes
...
# guess_type returns a (type, encoding) tuple based on the file name
mime, encoding = mimetypes.guess_type(file)
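A minimal sketch of classifying by guessed MIME type follows. The helper name looks_like_text_by_name and the small TEXT_LIKE_TYPES set are hypothetical, chosen only for illustration. Keep in mind that mimetypes.guess_type looks only at the file name, never at the contents, so this is a hint rather than a real detection.

```python
import mimetypes

# Hypothetical, deliberately small set of non-"text/" types treated as text.
TEXT_LIKE_TYPES = {"application/json", "application/xml"}


def looks_like_text_by_name(path):
    """Return True or False from the guessed MIME type, or None if unknown."""
    mime, encoding = mimetypes.guess_type(path)
    if mime is None:
        # The extension is not in the MIME tables, so no guess is possible.
        return None
    if encoding is not None:
        # A content encoding such as gzip means the bytes on disk are compressed.
        return False
    return mime.startswith("text/") or mime in TEXT_LIKE_TYPES
```

A result of None means the name gave no information, in which case a content-based check is needed.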
It's fairly easy to compile a list of binary MIME types. For example, Apache ships with a mime.types file that you could parse into two lists, binary and text, and then check whether the guessed MIME type is in your text list or your binary list.
Python 3, can I tell if a file opened in binary mode contains text?
You might try the binaryornot library.
pip install binaryornot
Then in the code:
from binaryornot.check import is_binary
is_binary(f_path)
Here is their documentation: https://pypi.org/project/binaryornot/
How to identify binary and text files using Python?
Thanks everybody, I found a solution that suited my problem. I found this code at http://code.activestate.com/recipes/173220/ and I changed just a little piece to suit me.
It works fine.
# Python 3 version of the recipe: the file is read as bytes
def istext(filename):
    # Read only the first 512 bytes, in binary mode so nothing is decoded
    with open(filename, "rb") as f:
        s = f.read(512)
    # Printable ASCII plus newline, carriage return, tab and backspace
    text_characters = bytes(range(32, 127)) + b"\n\r\t\b"
    if not s:
        # Empty files are considered text
        return True
    if b"\0" in s:
        # Files with null bytes are likely binary
        return False
    # Delete the text characters; what remains are the non-text bytes
    t = s.translate(None, text_characters)
    # If more than 30% non-text characters, then
    # this is considered a binary file
    if len(t) / len(s) > 0.30:
        return False
    return True
How to determine if file is opened in binary or text mode?
File objects have a .mode attribute:
def is_binary(f):
    return 'b' in f.mode
This limits the test to files; in-memory file objects like StringIO and BytesIO do not have that attribute. You could also test for the appropriate abstract base classes:
import io
def is_binary(f):
    return isinstance(f, (io.RawIOBase, io.BufferedIOBase))
or the inverse:
def is_binary(f):
    return not isinstance(f, io.TextIOBase)
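As a quick illustration of the abstract base class test, the sketch below applies it to in-memory streams and to real files. The name is_binary_stream and the temporary file are only for the example.

```python
import io
import os
import tempfile


def is_binary_stream(f):
    # Raw and buffered streams yield bytes; text streams yield str.
    return isinstance(f, (io.RawIOBase, io.BufferedIOBase))


# In-memory streams have no .mode attribute, but the ABC check still works.
print(is_binary_stream(io.BytesIO(b"abc")))   # True
print(is_binary_stream(io.StringIO("abc")))   # False

# Real files: "rb" gives a buffered binary reader, "r" gives a text wrapper.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as f:
        f.write("hello\n")
    with open(path, "rb") as f:
        print(is_binary_stream(f))            # True
    with open(path, "r") as f:
        print(is_binary_stream(f))            # False
```

This tells you how the file object was opened, not what the file contains.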
How to check if the file is a binary file and read all the files which are not?
Use the file utility. Sample usage:
$ file /bin/bash
/bin/bash: Mach-O universal binary with 2 architectures
/bin/bash (for architecture x86_64): Mach-O 64-bit executable x86_64
/bin/bash (for architecture i386): Mach-O executable i386
$ file /etc/passwd
/etc/passwd: ASCII English text
$ file code.c
code.c: ASCII c program text
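From Python, the same utility can be called through subprocess. The sketch below assumes the installed file command supports the -b (brief) and --mime-encoding options, as the common Unix implementation does, and that it reports the encoding "binary" for non-text data. The helper name is_binary_via_file is hypothetical.

```python
import shutil
import subprocess


def is_binary_via_file(path):
    """Ask the external `file` command; return None if it is not installed."""
    exe = shutil.which("file")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-b", "--mime-encoding", path],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: `file` prints "binary" as the encoding of non-text data.
    return result.stdout.strip() == "binary"
```

This only works where the file command is available, so it is not portable to a stock Windows installation.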
See the file manual page.
How do I distinguish between 'binary' and 'text' files?
The spreadsheet software my company makes reads a number of binary file formats as well as text files.
We first look at the first few bytes for a magic number which we recognize. If we do not recognize the magic number of any of the binary types we read, then we look at up to the first 2K bytes of the file to see whether it appears to be a UTF-8, UTF-16 or a text file encoded in the current code page of the host operating system. If it passes none of these tests, we assume that it is not a file we can deal with and throw an appropriate exception.
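The approach described above could be sketched in Python roughly as follows. This is not the company's actual code: the MAGIC_NUMBERS table, the function name sniff, and the rule that UTF-16 is recognized only by its byte order mark are all assumptions made for illustration.

```python
import codecs
import locale

# Hypothetical table; a real application would list the formats it can read.
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
    b"%PDF-": "pdf",
}


def sniff(path, sample_size=2048):
    """Return a format name or a text encoding, or raise ValueError."""
    with open(path, "rb") as f:
        head = f.read(sample_size)
    # Step 1: known binary formats, identified by their leading bytes.
    for magic, name in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return name
    # Step 2: UTF-16, recognized here only by its byte order mark (assumption).
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Step 3: UTF-8, then the host's preferred encoding. Null bytes rule out
    # both. The incremental decoder tolerates a multi-byte character that is
    # cut off at the end of the sample.
    if b"\0" not in head:
        for encoding in ("utf-8", locale.getpreferredencoding(False)):
            try:
                codecs.getincrementaldecoder(encoding)().decode(head, final=False)
                return encoding
            except UnicodeDecodeError:
                pass
    raise ValueError("unrecognized file format: " + str(path))
```

Single-byte code pages accept almost any byte sequence, so the last step is permissive on hosts that use one; a stricter version might also reject samples containing many control characters.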
How can I determine if a file is binary or text in c#?
I would probably look for an abundance of control characters, which are typically present in a binary file but rare in a text file. Binary files tend to contain enough zero bytes that just testing for many of them would probably be sufficient to catch most files. If you care about localization, you'd need to test for multi-byte patterns as well.
As stated though, you can always be unlucky and get a binary file that looks like text or vice versa.
How to check whether a file is empty or not
>>> import os
>>> os.stat("file").st_size == 0
True