.rar, .zip files MIME Type

The answers from freedompeace, Kiyarash and Sam Vloeberghs:

.rar    application/vnd.rar, application/x-rar-compressed, application/octet-stream
.zip    application/zip, application/octet-stream, application/x-zip-compressed, multipart/x-zip

I would do a check on the file name too. Here is how you could check if the file is a RAR or ZIP file. I tested it by creating a quick command line application.

<?php

if (isRarOrZip($argv[1])) {
    echo 'It is probably a RAR or ZIP file.';
} else {
    echo 'It is probably not a RAR or ZIP file.';
}

function isRarOrZip($file) {
    // get the first 7 bytes
    $bytes = file_get_contents($file, false, null, 0, 7);
    if ($bytes === false) {
        return false; // the file could not be read
    }
    $ext = strtolower(substr($file, -4));

    // RAR magic number: Rar!\x1A\x07\x00 (RAR 4.x) or Rar!\x1A\x07\x01\x00 (RAR 5),
    // so only the shared 6-byte prefix is compared
    // http://en.wikipedia.org/wiki/RAR
    if ($ext === '.rar' && strncmp($bytes, "Rar!\x1A\x07", 6) === 0) {
        return true;
    }

    // ZIP magic number: none, though PK\x03\x04, PK\x05\x06 (empty archive),
    // or PK\x07\x08 (spanned archive) are common.
    // http://en.wikipedia.org/wiki/ZIP_(file_format)
    if ($ext === '.zip' && substr($bytes, 0, 2) === 'PK') {
        return true;
    }

    return false;
}

Notice that it still won't be 100% certain, but it is probably good enough.

$ rar.exe l somefile.zip
somefile.zip is not RAR archive

But even WinRAR detects non-RAR files as SFX archives:

$ rar.exe l somefile.srr
SFX Volume somefile.srr

ZIP file content type for HTTP request

.zip    application/zip, application/octet-stream

PHP allowed zip mimetypes

Never trust the MIME type; the client can easily spoof it. They could submit an exe and give it a MIME type of text/plain if they wanted to.

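Most zip files begin with the standard local file header signature 0x04034b50. It is stored little-endian, so the first 4 bytes on disk are 50 4B 03 04 (PK\x03\x04), and you can check that the file starts with them. An empty archive starts with the end-of-central-directory signature PK\x05\x06 instead. See the PKZIP Appnote for more details.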
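
As a rough illustration of that signature check, here is a minimal sketch. The function name hasZipSignature() is made up for this example and is not part of PHP. Accepting the empty-archive and spanned-archive signatures as well is an assumption about what you want to allow.

```php
<?php

/**
 * Hypothetical helper: true if the file starts with a known ZIP signature.
 * It only inspects the first 4 bytes; it does not prove the archive is valid.
 */
function hasZipSignature(string $path): bool
{
    // Read just the first 4 bytes; @ hides the warning for unreadable files.
    $head = @file_get_contents($path, false, null, 0, 4);
    if ($head === false || strlen($head) < 4) {
        return false;
    }

    $signatures = [
        "PK\x03\x04", // local file header (0x04034b50, stored little-endian)
        "PK\x05\x06", // end of central directory: empty archive
        "PK\x07\x08", // spanned archive marker
    ];

    return in_array($head, $signatures, true);
}
```

For an upload you would call it on the temporary file, for example hasZipSignature($_FILES['fileatt']['tmp_name']), before doing anything more expensive.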

If you have the zip extension enabled, you can go even further and attempt to open and read the zip to make sure it is a fully valid zip file.

Something like this works well:

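$zip = zip_open('/path/to/file.zip');
if (is_resource($zip)) {
    echo "Thanks for uploading a valid zip file!";
    zip_close($zip);
} else {
    // on failure zip_open() returns an integer error code or false
    echo "Error " . var_export($zip, true) . " encountered reading the file, is it a valid zip?";
}

zip_open() returns a resource if the file was opened successfully; otherwise it returns false or an integer representing the error that occurred reading the file. Note that the procedural zip_* functions are deprecated as of PHP 8.0 in favor of the ZipArchive class.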

EDIT: To elaborate on some of your questions:

About application/octet-stream: As you said, this is a very generic type. It simply means arbitrary binary data, which covers basically any file. application/zip is the standard MIME type for zip files, but some clients send other values, as you have discovered. And since a client can label any file as application/zip, I wouldn't rely on $_FILES['fileatt']['type'] at all; it can be anything.

AFAIK, mime_content_type() does not just map the file extension to a MIME type. It is part of the Fileinfo extension and guesses the type from the file's contents using a magic-number database, so an exe renamed to .zip should not register as application/zip. It is still only a guess based on byte patterns, though, not proof that the file is a valid archive.

ZipArchive::open() returns TRUE if the file was opened successfully, or an integer error code otherwise. A loose == comparison therefore gives you a false positive on an error, because any non-zero integer is cast to TRUE. If you are going to check the return value of ZipArchive::open(), always use $res === true to check for success. The error codes correspond to the ZipArchive::ER_* constants, for example ZipArchive::ER_NOZIP for a file that is not a zip archive.
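
The comparison pitfall can be sketched as follows. The helper name opensAsZip() is hypothetical, and the sketch assumes the zip extension is loaded (it throws otherwise).

```php
<?php

/**
 * Hypothetical helper: true only if ZipArchive could open the file.
 * Assumes the zip extension is enabled; throws a RuntimeException if not.
 */
function opensAsZip(string $path): bool
{
    if (!class_exists('ZipArchive')) {
        throw new RuntimeException('The zip extension is not loaded.');
    }

    $zip = new ZipArchive();
    $res = $zip->open($path);

    // Strict comparison on purpose: an error code such as
    // ZipArchive::ER_NOZIP is a non-zero integer, and a non-zero
    // integer compared with == true evaluates to true.
    if ($res === true) {
        $zip->close();
        return true;
    }

    return false;
}
```

With a loose check such as if ($res == true), a plain text file renamed to .zip would be reported as a valid archive, because the error code returned for it is a non-zero integer.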

Bottom Line: Since you said you are already extracting the zip, there is little point in validating the MIME type. It is easier to just attempt to open the file and act on the return value of open(). If it returns true, you can assume the file is a valid zip (there could of course be errors later in the file, but they at least uploaded something resembling a zip file).

Hope that helps you out.

Mime type missing for .rar and .tar

jQuery just wraps the underlying File API used in most browsers, so there is no difference in how jQuery and JavaScript handle files and MIME types. Here is the File API spec:

http://www.w3.org/TR/FileAPI/#dfn-type

The File object that you are manipulating inherits the type property from the Blob object, and the browser uses the blob (byte array) to determine the mime type.

To accomplish that task, each browser implements a file sniffing algorithm that "reads" the MIME type from the byte array. If no known pattern matches, it returns an empty string, as in your scenario above.

Here is the full algorithm spec:

https://mimesniff.spec.whatwg.org/

So now you are wondering why it doesn't work for TAR, ZIP and RAR files, and why it works for some people and not for you. The answer is that the file sniffing algorithm is evidently not perfect.

It uses byte pattern matching, which does not seem to be reliable enough.

For example, I used WinRAR on my Windows 8 box to compress a file, and the initial bytes of the created file are:

52 61 72 21 1A 07 00

However, to recognize the file as RAR, the browser's byte pattern matching algorithm expects:

52 61 72 20 1A 07 00

As you can see, there is a slight difference (21 versus 20 in the fourth byte). When I uploaded my RAR file to the browser using your code above, Firefox wasn't able to recognize the MIME type, and I got an empty string in the type property.

However, when I packed a ZIP file using WinRAR on the same machine with default settings, it generated an initial byte sequence of 50 4B 03 04. That matches the zip byte pattern expected by the algorithm, and your code above detected the MIME type correctly as application/zip!

So, as you can see from my explanation, it is a matter of serialization and of the "imperfection" of the algorithm that matches the serialized bytes to MIME types in the browsers.

Based on everything mentioned above, I would recommend NOT relying on MIME sniffing. Instead, use your own code OR an existing library to determine the MIME type. You can use a server-side or a client-side approach.
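
For the server-side route, custom detection can be as small as the sketch below. The function name sniffArchiveMimeType() and the MIME strings it returns are choices made for this example, not an existing API. It matches the 52 61 72 21 1A 07 prefix that WinRAR actually writes (shared by RAR 4.x and RAR 5) and the 50 4B 03 04 ZIP prefix shown above.

```php
<?php

/**
 * Hypothetical helper: guess an archive MIME type from the leading bytes.
 * Returns null when no known signature matches.
 */
function sniffArchiveMimeType(string $path): ?string
{
    // Read the first 8 bytes; @ hides the warning for unreadable files.
    $head = @file_get_contents($path, false, null, 0, 8);
    if ($head === false) {
        return null;
    }

    // Signature prefix => MIME type (type strings chosen for this sketch).
    $signatures = [
        "Rar!\x1A\x07" => 'application/vnd.rar', // 52 61 72 21 1A 07
        "PK\x03\x04"   => 'application/zip',     // 50 4B 03 04
    ];

    foreach ($signatures as $prefix => $mime) {
        if (strncmp($head, $prefix, strlen($prefix)) === 0) {
            return $mime;
        }
    }

    return null;
}
```

A file that starts with 52 61 72 20 1A 07 00 would not match here, since that is not what the archiver writes.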

If you want to stick to the client you could use the following JS library:

https://github.com/rsdoiel/mimetype-js

And then discovering the mime type would be a matter of one line of code:

mimetype.lookup("myfile.rar")

Here is a working Fiddle, upgrading your example to use mimetype js:

http://jsfiddle.net/jd8h7wvs/4/


