How to Find the Mime Type of a File in Python

How to find the mime type of a file in python?

The python-magic method suggested by toivotuo is outdated. python-magic's current trunk is on GitHub, and based on the README there, finding the MIME type is done like this:

# For MIME types
import magic
mime = magic.Magic(mime=True)
mime.from_file("testdata/test.pdf") # 'application/pdf'

python - How to get mimetypes from file, read metadata

(Answered here: How to find the mime type of a file in python?)


The how

Do this:

$ pip install python-magic
>>> import magic
>>> mime = magic.Magic(mime=True)
>>> mime.from_file("testdata/test.pdf")

The why

The "mimetypes" library isn't very reliable, because it guesses the type from the file extension alone. A result of None means that the extension isn't recognized as a known filetype (an extension does not a filetype make).


Hope this solves your issue and answers your question

How to detect mime type of files in python?

You can use the Python built-in mimetypes module for this. This module relies solely on the file name (its extension) and not on the file's contents.
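As a rough illustration of that point, the sketch below calls mimetypes.guess_type on names that do not have to exist on disk. The file names are made up for the example, and the exact results can vary slightly with the MIME tables installed on the system.

```python
import mimetypes

# None of these files need to exist: guess_type only inspects the name.
for name in ("report.pdf", "archive.tar.gz", "README"):
    mime_type, encoding = mimetypes.guess_type(name)
    print(f"{name}: type={mime_type}, encoding={encoding}")
```

A name without a recognized extension, such as "README" here, comes back with a type of None.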

What is the mime type for .py files?

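For Python source files (.py) it should be text/x-python. application/x-python-code is the type that the standard mimetypes module associates with compiled .pyc files.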
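One way to check this is to ask the standard mimetypes module directly. This is only a sketch: the file names are placeholders, and the answer reflects the module's tables rather than any official registry.

```python
import mimetypes

# Look up the types registered for Python source and compiled bytecode.
source_type, _ = mimetypes.guess_type("script.py")
compiled_type, _ = mimetypes.guess_type("script.pyc")
print(source_type, compiled_type)
```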

How do I find mime-type in Python

You have two options. If you're lucky, the client can determine the mimetype of the file and include it in the form post. Usually this is the value of an input element whose name is "filetype" or something similar.

Otherwise you can guess the mimetype from the file extension on the server. This is somewhat dependent on how up-to-date the mimetypes module is. Note that you can add types or override types in the module. You then use the "guess_type" function, which infers the mimetype from the filename's extension.

import mimetypes
mimetypes.add_type('video/webm','.webm')

...

mimetypes.guess_type(filename)

UPDATE: If I remember correctly you can get the client's interpretation of the mimetype from the "Content-Type" header. A lot of the time this turns out to be 'application/octet-stream' which is almost useless.

So assuming you're using the cgi module (deprecated since Python 3.11 and removed in 3.13), and you're uploading files with the usual multipart form, the browser is going to guess the mimetype for you. It seems to do a decent job of it, and it gets passed through to the form.type parameter. So you can do something like this:

import cgi
form = cgi.FieldStorage()
files_types = {}
if form.type == 'multipart/form-data':
    for part in form.keys():
        files_types[form[part].filename] = form[part].type
else:
    files_types[form.filename] = form.type
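The two options in this answer can be combined: trust the client-supplied type unless it is missing or the generic 'application/octet-stream', and fall back to guessing from the extension otherwise. The helper below is a minimal sketch of that idea; resolve_mime_type and its parameters are hypothetical names, not part of any library.

```python
import mimetypes

GENERIC_TYPE = "application/octet-stream"

def resolve_mime_type(client_type, filename):
    """Prefer the client-declared type unless it is missing or generic."""
    if client_type and client_type != GENERIC_TYPE:
        return client_type
    # Fall back to the extension-based guess on the server side.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or GENERIC_TYPE

print(resolve_mime_type("image/png", "upload.bin"))
print(resolve_mime_type(GENERIC_TYPE, "clip.pdf"))
```

Neither source is verified against the file's contents, so the result is still only a hint.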

How to check type of files without extensions?

There are Python libraries that can recognize files based on their content (usually a header / magic number) and that don't rely on the file name or extension.

If you're addressing many different file types, you can use python-magic. That's just a Python binding for the well-established magic library. This has a good reputation and (small endorsement) in the limited use I've made of it, it has been solid.

There are also libraries for more specialized file types. For example, the Python standard library had the imghdr module, which does the same thing just for image file types (it was deprecated in Python 3.11 and removed in 3.13).

If you need dependency-free (pure Python) file type checking, see filetype.
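To make the header / magic number idea concrete, here is a small pure-Python sketch that compares the first bytes of the data against a handful of well-known signatures. It only illustrates the principle and is not how python-magic or filetype are implemented; sniff_mime_type, sniff_file and the SIGNATURES table are names invented for this example.

```python
# A few well-known signatures; real libraries know far more formats.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

def sniff_mime_type(data):
    """Return a MIME type if data starts with a known signature, else None."""
    for signature, mime_type in SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None

def sniff_file(path, header_size=16):
    """Read only the first bytes of a file and sniff them."""
    with open(path, "rb") as handle:
        return sniff_mime_type(handle.read(header_size))

print(sniff_mime_type(b"%PDF-1.7\n"))
print(sniff_mime_type(b"plain text"))
```

Because only the leading bytes are inspected, the file name and extension play no role in the result.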

How to know MIME-type of a file from base64 encoded data in python?

In the general case, there is no way to reliably identify the MIME type of a piece of untagged data.

Many file formats have magic markers which can be used to determine the type of the file with reasonable accuracy, but some magic markers are poorly chosen and might e.g. coincide with text in unrelated files; and of course, a completely random sequence of bits is not in any well-defined file format.

libmagic is the central component of the file command which is commonly used to perform this task. There are several Python bindings but https://pypi.org/project/python-libmagic/ seems to be the most popular and active.

Of course, base64 is just a way to encode untyped binary data. Here's a quick demo with your sample data.

import base64

import magic

encoded_data = '/9j/4AAQSkZJRgABAQEASABIAAD//gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2NjIpLCBxdWFsaXR5ID0gOTUK/9sAQwAGBAUGBQQGBgUGBwcGCAoQCgoJCQoUDg8MEBcUGBgXFB==='
with magic.Magic() as m:
    print(m.from_buffer(base64.b64decode(encoded_data)))

Output:

image/jpeg

(Notice I had to fix the padding at the end of your encoded_data.)
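If libmagic is not available, a rough standard-library check is possible for a single known format. The sketch below decodes only the first few base64 characters, which sidesteps the padding at the end of the string, and looks for the JPEG start marker. looks_like_jpeg is a hypothetical helper, and it assumes the encoded text contains no line breaks or whitespace in its first characters.

```python
import base64

JPEG_SIGNATURE = b"\xff\xd8\xff"

def looks_like_jpeg(encoded, prefix_chars=16):
    """Decode only the first few base64 characters and check the JPEG marker."""
    # Base64 maps every 4 characters to 3 bytes, so a prefix whose length is
    # a multiple of 4 decodes on its own, without the trailing padding.
    usable = (min(prefix_chars, len(encoded)) // 4) * 4
    header = base64.b64decode(encoded[:usable])
    return header.startswith(JPEG_SIGNATURE)

# The first characters of the sample data from the question.
print(looks_like_jpeg("/9j/4AAQSkZJRgABAQEASABIAAD//gA7"))
```

This only answers "is it probably a JPEG?"; identifying arbitrary types still needs a signature database such as libmagic's.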


