Why am I Getting Mime-Type of .CSV File as "Application/Octet-Stream"

Why am I getting the mime-type of a .csv file as application/octet-stream?

In times like these, the official HTTP specification is always helpful. From RFC 2616, section 7.2.1:

Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type header field defining the media type of that body. If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource. If the media type remains unknown, the recipient SHOULD treat it as type "application/octet-stream".
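The fallback order in that excerpt can be sketched in a few lines of Python. This is only an illustration of the rule, not how any particular server implements it: the function name resolve_media_type is hypothetical, and the "guess" step here looks only at the file extension via the standard mimetypes module, whereas a real recipient may also inspect the content.

```python
import mimetypes
from typing import Optional

FALLBACK_TYPE = "application/octet-stream"


def resolve_media_type(content_type_header: Optional[str], filename: str) -> str:
    """Apply the RFC 2616 section 7.2.1 fallback order to one message body."""
    # 1. A Content-Type header, when present, is used as given.
    if content_type_header and content_type_header.strip():
        # Drop parameters such as "; charset=utf-8" and normalise the case.
        return content_type_header.split(";", 1)[0].strip().lower()

    # 2. Without a header the recipient MAY guess; this sketch only uses
    #    the name extension, not the content.
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed:
        return guessed

    # 3. Still unknown: treat the body as a generic stream of bytes.
    return FALLBACK_TYPE
```

Calling resolve_media_type(None, "upload.zzunknownext") falls through to step 3, which is the situation described in the question.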

The cause of your issue is that the server accepting the file upload does not itself know what type of file has been uploaded. Why? Because it relies on the HTTP message that sent the file to specify a Content-Type header, and uses that header to determine the exact mime-type. The browser has likely not sent a Content-Type header, and the server has assumed application/octet-stream as per the official HTTP specification excerpt above. It's also possible that the client uploading the file opted not to determine the mime type of the file it was uploading and sent the Content-Type: application/octet-stream header itself.

Now, when we consider this in conjunction with the PHP manual entry regarding POST file uploads, we see the following:

$_FILES['userfile']['type']

The mime type of the file, if the browser provided this information. An example would be "image/gif". This mime type is however not checked on the PHP side and therefore don't take its value for granted.

So as you can see, even if $_FILES['userfile']['type'] is specified, it only corresponds to the Content-Type header sent by the client. This information can easily be faked and should not be relied upon. If you need to be sure that the uploaded file is of a specific type, you'll have to verify that yourself.

Is application/octet-stream a safe MIME type to accept when accepting CSV files?

That is correct, application/octet-stream is a generic MIME type.

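You could check whether the file has the .csv extension and use the function fgetcsv() to test whether the content of the file can be read as CSV. fgetcsv() returns boolean false when a line cannot be read (including at end of file); before PHP 8.0 it returned NULL when given an invalid file handle. Keep in mind that it will parse almost any text, so a successful read alone does not prove the file is well-formed CSV.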
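The same idea (check the extension, then try to parse the content) can be sketched in Python. This is a heuristic under stated assumptions, not a definitive validator: the function name looks_like_csv is hypothetical, the data is assumed to be UTF-8, and "valid" is taken to mean that every non-empty row has the same number of fields.

```python
import csv
import io
import os


def looks_like_csv(filename: str, data: bytes, delimiter: str = ",") -> bool:
    """Heuristic check that an uploaded file is plausibly CSV."""
    # The client-supplied MIME type is ignored on purpose; start with the name.
    if os.path.splitext(filename)[1].lower() != ".csv":
        return False
    # NUL bytes are a strong hint that the payload is binary, not text.
    if b"\x00" in data:
        return False
    try:
        text = data.decode("utf-8")  # assumption: uploads are UTF-8
    except UnicodeDecodeError:
        return False
    try:
        # strict=True makes the parser raise csv.Error on malformed quoting.
        reader = csv.reader(io.StringIO(text, newline=""),
                            delimiter=delimiter, strict=True)
        rows = [row for row in reader if row]  # skip blank lines
    except csv.Error:
        return False
    if not rows:
        return False
    # Assumption: a well-formed file has the same field count on every row.
    width = len(rows[0])
    return all(len(row) == width for row in rows)
```

A plain text file with one "column" per line still passes this check, so treat the result as a sanity filter rather than proof.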

'application/octet-stream' instead of application/csv?

I assume that I defined something in the wrong way when reading in the data

No, you didn't. The Content-Type header is supposed to indicate what the response body is, but there is nothing you can do to force the server to set that to a value you expect. Some servers are just badly configured and don't play along.

application/octet-stream is the most generic content type of them all - it gives you no more info than "it's a bunch of bytes, have fun".

What's more, there isn't necessarily One True Type for each kind of content, only more-or-less widely agreed-upon conventions. For CSV, a common one would be text/csv.

So if you're sure what the content is, feel free to ignore the Content-Type header.

import requests

url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/data.csv"
response = requests.get(url)

filePath = 'data/data_notebook-1_covid-new.csv'
with open(filePath, "wb") as f:
    f.write(response.content)

Writing to file in binary mode is a good idea in the absence of any further information, because this will retain the original bytes exactly as they were.


In order to convert that to string, it needs to be decoded using a certain encoding. Since the Content-Type did not give any indication here (it could have said Content-Type: text/csv; charset=XYZ), the best first assumption for data from the Internet would be UTF-8:

import csv

filePath = 'data/data_notebook-1_covid-new.csv'
with open(filePath, encoding='utf-8') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(row)

Should that turn out to be wrong (i.e. there are decoding errors or garbled characters), you can try a different encoding until you find one that works. That would not be possible if you had written the file in text mode in the beginning, as any data corruption from wrong decoding would have made it into the file.
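Trying encodings one after another can be automated. The following is a minimal sketch, assuming you have the raw bytes in memory; the helper names charset_from_content_type and decode_with_fallback are hypothetical, and the list of candidate encodings is something you have to choose for your own data.

```python
from email.message import Message
from typing import Iterable, Optional, Tuple


def charset_from_content_type(header_value: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type header, if any."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the value lower-cased, or None if absent.
    return msg.get_content_charset()


def decode_with_fallback(data: bytes, candidates: Iterable[str]) -> Tuple[str, str]:
    """Decode data with the first candidate encoding that succeeds."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: move on to the next candidate.
            continue
    raise ValueError("none of the candidate encodings could decode the data")
```

With the download from above, the declared charset (if there is one) could be tried first, for example by passing charset_from_content_type(response.headers.get("Content-Type")) ahead of "utf-8" in the candidate list. Put single-byte encodings such as cp1252 or latin-1 last: they accept almost any byte sequence, so they rarely fail even when they are wrong.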

Setting content-type, difference between octet-stream and text/csv

Some browsers, particularly IE, completely ignore the content type header, and open the response in an application according to the file extension. In those browsers, it probably doesn't matter.

However, text/csv is the preferred content type, and should work properly with all browsers. "application/octet-stream" is very generic, and does not give any hint as to what type of application should be used to open the result.

If you were returning an MS Excel file, for instance, you would use application/vnd.ms-excel to be more specific. Since CSV is not tied to one particular application, text/csv is preferred.

Why does Laravel's getMimeType() method identify a file as application/octet-stream when the file has the type attribute of audio/mpeg ?

The UploadedFile object ultimately extends Symfony\Component\HttpFoundation\File\UploadedFile, which gets and sets the mimeType from "the type of the file as provided by PHP".

To access that mimeType you would need to call $file->getClientMimeType()

However in the Symfony docblock for the function it suggests:

The client mime type is extracted from the request from which the file was uploaded, so it should not be considered as a safe value.

For a trusted mime type, use getMimeType() instead (which guesses the mime type based on the file content).

In your case, however, $file->getMimeType(), which should be trustworthy because it guesses the mime type from the file contents, returns "application/octet-stream", the value you get when the mime type cannot be determined.

Additional information

To help you decide: getClientMimeType() returns the mime type that was set by the browser.

The getMimeType call guesses the mime type using two different techniques that I can see:

  1. Running the command file -b --mime %s 2>/dev/null and reading its output, if the command is supported.

  2. Using the finfo_open function, if it exists in your PHP installation.

If both 1. and 2. exist on your system, from what I see 2. will take precedence, and 1. will be the fallback.
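To make the first technique concrete, here is a rough Python sketch that shells out to the same file -b --mime command and falls back to application/octet-stream when the command is missing or gives no usable answer. It is not the Symfony implementation: the function name sniff_mime_type is hypothetical, and the fallback behaviour is my assumption about what a guesser reasonably does when it cannot decide.

```python
import os
import subprocess

FALLBACK_TYPE = "application/octet-stream"


def sniff_mime_type(path: str) -> str:
    """Guess a MIME type from file contents using the `file` command."""
    if not os.path.isfile(path):
        return FALLBACK_TYPE
    try:
        result = subprocess.run(
            ["file", "-b", "--mime", path],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # The `file` binary is not installed or cannot be executed.
        return FALLBACK_TYPE
    if result.returncode != 0:
        return FALLBACK_TYPE
    # Output looks like "text/plain; charset=us-ascii"; keep the type only.
    mime_type = result.stdout.split(";", 1)[0].strip()
    return mime_type if "/" in mime_type else FALLBACK_TYPE
```

The point of the fallback is visible here: a result of application/octet-stream can mean "the tool looked and could not tell" just as well as "no tool was available".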

I personally would favour the results from getMimeType() for security. However, it would be another interesting question to ask "How reliable is browser mime type detection, and what techniques are used" :-)

Updated example

I include an example for you.

Checking a file named "DropboxInstaller.dmg" gave me these results:

  1. using file -b --mime DropboxInstaller.dmg from a commandline (terminal) returns application/octet-stream

  2. using finfo_open functionality

$finfo = new \finfo(FILEINFO_MIME_TYPE);
echo $finfo->file('./DropboxInstaller.dmg');

returns application/x-iso9660-image

PHP file upload on server changed mime-type to application/octet-stream

The problem was that the fileinfo extension was not enabled in PHP on the server.
After enabling it, everything works fine.

How to use the CSV MIME-type?

You could try to force the browser to open a "Save As..." dialog by doing something like:

header('Content-type: text/csv');
header('Content-disposition: attachment;filename=MyVerySpecial.csv');
echo "cell 1, cell 2";

This should work across most major browsers.


