DOCX File type in PHP finfo_file is application/zip
As far as I know, the vendor-specific file types (the vnd. tree) are not standardized by an RFC of their own and are therefore not covered by finfo_file(). A .docx file is a zipped XML format, which is why finfo_file() returns application/zip (which is technically correct). You could unzip the file and test the mime-type of its contents, but that would only yield XML (also correct) plus the other files used by the document. To distinguish between different XML formats, finfo_file() would have to analyze the content and know what each format looks like, which goes too far.
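To see what finfo reports for a given file, something like the following minimal sketch can be used. The helper name detectMimeType() is an assumption made for illustration and is not part of PHP; the result for a .docx may vary with the magic database in use.

```php
<?php
// Minimal sketch: report the MIME type that finfo derives from a file's content.
// detectMimeType() is a hypothetical helper name, not a PHP built-in.
function detectMimeType(string $path): string
{
    // FILEINFO_MIME_TYPE asks for the bare MIME type, without the charset.
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);
    if ($mime === false) {
        throw new RuntimeException("Could not inspect $path");
    }
    return $mime;
}

// Example call (the file name is a placeholder):
// echo detectMimeType('hi.docx'), PHP_EOL;
```

The check looks only at the file content, so renaming the file does not change the answer.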
Is it right that PHP's finfo returns application/zip MimeType for a .docx?
A Word document in Microsoft's Office Open XML format consists of a bunch of XML and other files stored in a zip file (unzip it and see). So yes, this is correct.
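Since the container is an ordinary zip archive, one way to tell the Office formats apart is to look at the entry names inside it. The following is only a sketch under assumptions: both function names are hypothetical, the folder-to-type mapping assumes the usual word/, xl/ and ppt/ layout, and reading the archive requires the zip extension (ZipArchive).

```php
<?php
// Hypothetical helper: map the top-level folders found in an OOXML package
// to a MIME type. Assumes the usual word/, xl/ and ppt/ layout.
function classifyOoxmlEntries(array $entryNames): ?string
{
    $map = [
        'word/' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'xl/'   => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
        'ppt/'  => 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
    ];
    // An OOXML package carries a [Content_Types].xml part at its root.
    if (!in_array('[Content_Types].xml', $entryNames, true)) {
        return null;
    }
    foreach ($entryNames as $name) {
        foreach ($map as $prefix => $mime) {
            if (is_string($name) && strncmp($name, $prefix, strlen($prefix)) === 0) {
                return $mime;
            }
        }
    }
    return null;
}

// Hypothetical wrapper: collect the entry names with ZipArchive (ext-zip)
// and hand them to the classifier above.
function detectOoxmlMime(string $path): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null; // not a readable zip archive
    }
    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }
    $zip->close();
    return classifyOoxmlEntries($names);
}
```

Keeping the classification separate from the archive access makes the mapping easy to check without real Office files.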
Correct way to detect mime type in php
Based on this I've ported it to PHP:
function getMicrosoftOfficeMimeInfo($file) {
    // Top-level folder inside the package => description of the matching format.
    $fileInfo = array(
        'word/' => array(
            'type' => 'Microsoft Word 2007+',
            'mime' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
            'extension' => 'docx'
        ),
        'ppt/' => array(
            'type' => 'Microsoft PowerPoint 2007+',
            'mime' => 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
            'extension' => 'pptx'
        ),
        'xl/' => array(
            'type' => 'Microsoft Excel 2007+',
            'mime' => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
            'extension' => 'xlsx'
        )
    );

    // Signature that starts every local file header in a zip archive.
    $pkEscapeSequence = "PK\x03\x04";
    $file = new BinaryFile($file);

    // The archive must begin with a local file header ...
    if (!$file->bytesAre($pkEscapeSequence, 0x00)) {
        return false;
    }
    // ... whose entry name (at offset 30) is [Content_Types].xml.
    if (!$file->bytesAre('[Content_Types].xml', 0x1E)) {
        return false;
    }
    // Skip ahead to the third local file header.
    if (!$file->search($pkEscapeSequence, null, 2000)
        || !$file->search($pkEscapeSequence, null, 1000)) {
        return false;
    }

    // The entry name starts 26 bytes after the 4-byte signature.
    $offset = $file->tell() + 26;
    foreach ($fileInfo as $searchWord => $info) {
        $file->seek($offset);
        if ($file->bytesAre($searchWord)) {
            return $info;
        }
    }

    // An OOXML package, but none of the known document types.
    return array(
        'type' => 'Microsoft OOXML',
        'mime' => null,
        'extension' => null
    );
}
class BinaryFile_Exception extends Exception {}

class BinaryFile_Seek_Method {
    const ABSOLUTE = 1;
    const RELATIVE = 2;
}

class BinaryFile {
    const SEARCH_BUFFER_SIZE = 1024;

    private $handle;

    public function __construct($file) {
        // Open in binary mode so bytes are never translated on any platform.
        $this->handle = fopen($file, 'rb');
        if ($this->handle === false) {
            throw new BinaryFile_Exception('Cannot open file');
        }
    }

    public function __destruct() {
        if (is_resource($this->handle)) {
            fclose($this->handle);
        }
    }

    public function tell() {
        return ftell($this->handle);
    }

    // Moves the file pointer; a null offset leaves it where it is.
    public function seek($offset, $seekMethod = null) {
        if ($offset === null) {
            return true;
        }
        if ($seekMethod === BinaryFile_Seek_Method::RELATIVE) {
            $offset += $this->tell();
        }
        return fseek($this->handle, $offset) === 0;
    }

    public function read($length) {
        return fread($this->handle, $length);
    }

    // Scans forward for $string, reading at most $maxLength bytes. On success the
    // pointer is left directly behind the match. Matches that straddle two
    // buffers are not detected.
    public function search($string, $offset = null, $maxLength = null, $seekMethod = null) {
        $this->seek($offset, $seekMethod);
        $start = $this->tell();
        $bytesRead = 0;
        $bufferSize = ($maxLength !== null ? min(self::SEARCH_BUFFER_SIZE, $maxLength) : self::SEARCH_BUFFER_SIZE);
        while ($bufferSize > 0) {
            $read = $this->read($bufferSize);
            if ($read === false || $read === '') {
                break;
            }
            $search = strpos($read, $string);
            if ($search !== false) {
                // Add the bytes consumed by earlier buffers to get the absolute position.
                $this->seek($start + $bytesRead + $search + strlen($string));
                return true;
            }
            $bytesRead += strlen($read);
            if ($maxLength !== null) {
                $bufferSize = min(self::SEARCH_BUFFER_SIZE, $maxLength - $bytesRead);
            }
        }
        return false;
    }

    public function getBytes($length, $offset = null, $seekMethod = null) {
        $this->seek($offset, $seekMethod);
        return $this->read($length);
    }

    public function bytesAre($string, $offset = null, $seekMethod = null) {
        return $this->getBytes(strlen($string), $offset, $seekMethod) === $string;
    }
}
Usage:
$info = getMicrosoftOfficeMimeInfo('hi.docx');
/*
Array
(
[type] => Microsoft Word 2007+
[mime] => application/vnd.openxmlformats-officedocument.wordprocessingml.document
[extension] => docx
)
*/
$info = getMicrosoftOfficeMimeInfo('hi.xlsx');
/*
Array
(
[type] => Microsoft Excel 2007+
[mime] => application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
[extension] => xlsx
)
*/
$info = getMicrosoftOfficeMimeInfo('hi.pptx');
/*
Array
(
[type] => Microsoft PowerPoint 2007+
[mime] => application/vnd.openxmlformats-officedocument.presentationml.presentation
[extension] => pptx
)
*/
$info = getMicrosoftOfficeMimeInfo('hi.zip');
// bool(false)
Uploading .docx using mime types
If you look at the implementation of CFileValidator::validateFile(), you'll notice that Yii will use either finfo_file() (since PHP 5.3.0) or mime_content_type() to find out the MIME type of your file.
finfo_open() will usually use the magic database bundled with PHP, but you can override this by setting a MAGIC environment variable.
mime_content_type() will use the magic file specified in the mime_magic.magicfile configuration setting.
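Besides the MAGIC environment variable, the finfo constructor also accepts the path of a magic database as its second argument. A minimal sketch, assuming a hypothetical helper name openFinfo() and a placeholder path that will differ per system:

```php
<?php
// Sketch: open finfo with an explicit magic database instead of the default.
// openFinfo() is a hypothetical helper; the path in the comment below is a placeholder.
function openFinfo(?string $magicFile = null): finfo
{
    // With null, PHP falls back to the MAGIC environment variable and,
    // if that is not set, to its bundled magic database.
    return new finfo(FILEINFO_MIME_TYPE, $magicFile);
}

$finfo = openFinfo(); // or e.g. openFinfo('/path/to/magic.mgc')
echo $finfo->buffer("plain text\n"), PHP_EOL;
```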
Different file mime type detected for same file
The same file can have different and multiple mime-types; that is totally normal.
Additionally the mime-type is only meta-information next to the file itself. Theoretically you can give any file any mime-type. That would not be very useful, but it works. It's just a concept.
The finfo library will try to obtain the mime-type of a file "magically" by looking into the file and trying to identify the format. It then returns the mime-type according to its database.
The mime-type within the request is given by the HTTP client. It might guess as well, but often it takes the value from information the underlying operating system provides for that file.
"Why is it not returning the same as while in uploading?"
As you can see from your issue, the more common the file type is, the better it will match (the images). However, as pptx and docx files are actually zip files, the finfo library will identify those as application/zip, because the headers of those files (magic numbers) show that it is technically a zip file.
"Is there something wrong with my code or should I expect this?"
You should not expect the mime-type from finfo to match the mime-type in the request header. Those are two different things.
"How do I decide which file type it is then?"
That depends. You can decide to trust the HTTP header, you can decide to trust finfo, you can decide to compare the file extensions as well, or you can use a combination of all three. Additionally, you can decide to add even more checks. This depends entirely on what you do with the uploaded file.
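One possible way to combine the three signals is sketched below. This is only an illustrative policy, not a recommendation for every application: the function name resolveUploadType() and the rule that the client header may confirm but never decide are assumptions.

```php
<?php
// Hypothetical policy: accept an Office upload only when content sniffing,
// the client-supplied header and the file extension tell a consistent story.
function resolveUploadType(string $detected, string $claimed, string $extension): ?string
{
    $ooxml = [
        'docx' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'xlsx' => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
        'pptx' => 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
    ];
    $extension = strtolower($extension);
    if (!isset($ooxml[$extension])) {
        return null; // this sketch only handles the three Office formats
    }
    $expected = $ooxml[$extension];
    // finfo may report the specific type or just the generic zip container.
    $contentOk = in_array($detected, [$expected, 'application/zip'], true);
    // The header is controlled by the client, so it only confirms the result.
    $headerOk = ($claimed === $expected);
    return ($contentOk && $headerOk) ? $expected : null;
}
```

Here $detected would come from finfo, $claimed from the upload's request header, and $extension from the original file name.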