Alternative to FindMimeFromData method in Urlmon.dll, one which has more MIME types

UPDATE: @GetoX has taken this code and wrapped it in a NuGet package for .NET Core! See below, cheers!!

So I was wondering if anyone could point me to another method with
more MIME types, or alternatively another method / class where I would
be able to include the MIME types I see fit.

I use a hybrid of Winista and URLMon to detect the real format of uploaded files.

Winista MIME Detection

Say someone renames an exe with a jpg extension: you can still determine the "real" file format using binary analysis. It doesn't detect SWFs or FLVs, but it handles pretty much every other well-known format, and you can use a hex editor to find signatures and add more file types for it to detect.

File Magic

Winista detects the real MIME type using an XML file, "mime-type.xml", that contains information about file types and the signatures used to identify the content type, e.g.:

<!--
! Audio primary type
! -->

<mime-type name="audio/basic"
description="uLaw/AU Audio File">
<ext>au</ext><ext>snd</ext>
<magic offset="0" type="byte" value="2e736e64000000"/>
</mime-type>

<mime-type name="audio/midi"
description="Musical Instrument Digital Interface MIDI-sequention Sound">
<ext>mid</ext><ext>midi</ext><ext>kar</ext>
<magic offset="0" value="MThd"/>
</mime-type>

<mime-type name="audio/mpeg"
description="MPEG Audio Stream, Layer III">
<ext>mp3</ext><ext>mp2</ext><ext>mpga</ext>
<magic offset="0" value="ID3"/>
</mime-type>
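
To make the idea concrete, here is a minimal sketch of how rules like the ones above could be matched against a file's leading bytes. It is not Winista's actual code: the MagicRule and MagicSniffer names are made up for illustration, and the three rules are simply copied from the XML sample. Adding a new file type would mean adding one more rule with the signature you found in a hex editor.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical types for illustration; not part of Winista.
public sealed class MagicRule
{
    public string MimeType { get; }
    public int Offset { get; }
    public byte[] Signature { get; }

    public MagicRule(string mimeType, int offset, byte[] signature)
    {
        MimeType = mimeType;
        Offset = offset;
        Signature = signature;
    }

    // True when the signature appears in data at the rule's offset.
    public bool Matches(byte[] data)
    {
        if (data == null || data.Length < Offset + Signature.Length)
            return false;
        for (int i = 0; i < Signature.Length; i++)
        {
            if (data[Offset + i] != Signature[i])
                return false;
        }
        return true;
    }
}

public static class MagicSniffer
{
    // Rules copied from the mime-type.xml sample; add your own here.
    private static readonly List<MagicRule> Rules = new List<MagicRule>
    {
        new MagicRule("audio/basic", 0, new byte[] { 0x2e, 0x73, 0x6e, 0x64, 0x00, 0x00, 0x00 }),
        new MagicRule("audio/midi", 0, Encoding.ASCII.GetBytes("MThd")),
        new MagicRule("audio/mpeg", 0, Encoding.ASCII.GetBytes("ID3")),
    };

    // Returns the first matching MIME type, or null when no rule matches.
    public static string Detect(byte[] data)
    {
        foreach (MagicRule rule in Rules)
        {
            if (rule.Matches(data))
                return rule.MimeType;
        }
        return null;
    }
}
```

Calling MagicSniffer.Detect with the first bytes of an upload returns the MIME type of the first rule that matches, or null, which is the point where a fallback such as URLMon can take over.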

When Winista fails to detect the real file format, I fall back to the URLMon method:

using System;
using System.IO;
using System.Runtime.InteropServices;

public class urlmonMimeDetect
{
    // Pointer parameters are IntPtr (not UInt32) so the call is also correct in 64-bit processes.
    [DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
    private static extern int FindMimeFromData(
        IntPtr pBC,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
        [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1, SizeParamIndex = 3)] byte[] pBuffer,
        int cbSize,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
        int dwMimeFlags,
        out IntPtr ppwzMimeOut,
        int dwReserved);

    public string GetMimeFromFile(string filename)
    {
        if (!File.Exists(filename))
            throw new FileNotFoundException(filename + " not found");

        // Only the first 256 bytes are needed for sniffing.
        byte[] buffer = new byte[256];
        int bytesRead;
        using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            bytesRead = fs.Read(buffer, 0, buffer.Length);
        }

        try
        {
            IntPtr mimeTypePtr;
            // Pass the number of bytes actually read, not the buffer capacity.
            int hr = FindMimeFromData(IntPtr.Zero, null, buffer, bytesRead, null, 0, out mimeTypePtr, 0);
            if (hr != 0 || mimeTypePtr == IntPtr.Zero)
                return "unknown/unknown";

            string mime = Marshal.PtrToStringUni(mimeTypePtr);
            Marshal.FreeCoTaskMem(mimeTypePtr);
            return mime;
        }
        catch (Exception)
        {
            return "unknown/unknown";
        }
    }
}

From inside the Winista method, I fall back on the URLMon here:

public MimeType GetMimeTypeFromFile(string filePath)
{
    sbyte[] fileData = null;
    using (FileStream srcFile = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        byte[] data = new byte[srcFile.Length];
        srcFile.Read(data, 0, (Int32)srcFile.Length);
        fileData = Winista.Mime.SupportUtil.ToSByteArray(data);
    }

    MimeType oMimeType = GetMimeType(fileData);
    if (oMimeType != null) return oMimeType;

    // We haven't found the file using Magic (e.g. a text/plain file),
    // so instead use URLMon to try and get the file's format
    Winista.MimeDetect.URLMONMimeDetect.urlmonMimeDetect urlmonMimeDetect = new Winista.MimeDetect.URLMONMimeDetect.urlmonMimeDetect();
    string urlmonMimeType = urlmonMimeDetect.GetMimeFromFile(filePath);
    if (!string.IsNullOrEmpty(urlmonMimeType))
    {
        foreach (MimeType mimeType in types)
        {
            if (mimeType.Name == urlmonMimeType)
            {
                return mimeType;
            }
        }
    }

    return oMimeType;
}

Wayback Machine link to the Winista utility from netomatix. AFAIK they found some "mime reader utility classes in open source Nutch crawler system" and they did a C# rewrite in the early 2000s.

I've hosted my MimeDetect project using Winista and the URLMon fallback here (please contribute new file types using a hex editor):
https://github.com/MeaningOfLights/MimeDetect

You could also use the Registry method or the .NET 4.5 method mentioned in this post linked to by Paul Zahra, but Winista is the best IMHO.

Enjoy knowing files on your systems are what they claim to be and not laden with malware!


UPDATE:

For desktop applications you may find the WindowsAPICodePack works better:

using Microsoft.WindowsAPICodePack.Shell;
using Microsoft.WindowsAPICodePack.Shell.PropertySystem;

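// PItemTypeTextCanonical is not defined in this snippet; presumably it is a constant
// holding the canonical property name (e.g. "System.ItemTypeText").
private static string GetFilePropertyItemTypeTextValueFromShellFile(string filePathWithExtension)
{
    var shellFile = ShellFile.FromFilePath(filePathWithExtension);
    var prop = shellFile.Properties.GetProperty(PItemTypeTextCanonical);
    return prop.FormatForDisplay(PropertyDescriptionFormatOptions.None);
}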

urlmon.dll FindMimeFromData() works perfectly on 64bit desktop/console but generates errors on ASP.NET

If you used the pinvoke signature from the answer you linked, it's defined there like this:

[DllImport(@"urlmon.dll", CharSet = CharSet.Auto)]
private extern static System.UInt32 FindMimeFromData(
System.UInt32 pBC,
[MarshalAs(UnmanagedType.LPStr)] System.String pwzUrl,
[MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
System.UInt32 cbSize,
[MarshalAs(UnmanagedType.LPStr)] System.String pwzMimeProposed,
System.UInt32 dwMimeFlags,
out System.UInt32 ppwzMimeOut,
System.UInt32 dwReserverd
);

I would rather use the definition from pinvoke.net:

[DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
static extern int FindMimeFromData(IntPtr pBC,
[MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
[MarshalAs(UnmanagedType.LPArray, ArraySubType=UnmanagedType.I1, SizeParamIndex=3)]
byte[] pBuffer,
int cbSize,
[MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
int dwMimeFlags,
out IntPtr ppwzMimeOut,
int dwReserved);

Note the difference in types for the ppwzMimeOut and pBC parameters. In the former signature, System.UInt32 is not a correct type for a pointer on a 64-bit platform, where pointers are 64 bits wide. For pBC, this is probably not an issue (as long as it is NULL), but it matters for ppwzMimeOut.
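
The truncation is easy to illustrate without calling urlmon.dll at all. The sketch below uses a made-up 64-bit address (not a real pointer) and a hypothetical helper, TruncateToUInt32, to show what survives when a pointer-sized value is squeezed into a UInt32:

```csharp
using System;

public static class PointerWidthDemo
{
    // Simulates storing a pointer-sized value in a 32-bit field:
    // only the low 32 bits survive, the high 32 bits are lost.
    public static uint TruncateToUInt32(long address)
    {
        return unchecked((uint)address);
    }

    // Formats both values so the lost high bits are visible.
    public static string Describe(long address)
    {
        uint truncated = TruncateToUInt32(address);
        return "original 0x" + address.ToString("X16") +
               ", truncated 0x" + truncated.ToString("X8");
    }
}
```

For the hypothetical address 0x00007FF612345678, only 0x12345678 is kept, so turning that value back into an IntPtr would point somewhere else entirely. In a 32-bit process the two widths happen to match, which would explain why the UInt32 signature can appear to work on some machines.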

Refer to this implementation which appears to be correct.

Why does the FindMimeFromData function from Urlmon.dll return MIME type “application/octet-stream” for many file types?

Reading the documentation for FindMimeFromData led me to MIME Type Detection in Internet Explorer. According to that information it is hard-coded to find 26 different MIME types, which is quite a small number in today's world. "audio/mp3" is not one of them.

FindMimeFromData contains hard-coded tests for (currently 26) separate MIME types (see Known MIME Types). This means that if a given buffer contains data in the format of one of these MIME types, a test exists in FindMimeFromData that is designed (by scanning through the buffer contents) to recognize the corresponding MIME type. A MIME type is known if it is one of these N MIME types. A MIME type is ambiguous if it is "text/plain," "application/octet-stream," an empty string, or null (that is, the server failed to provide it).

Unfortunately, it looks like FindMimeFromData won't be very useful for determining modern MIME types.

Why does FindMimeFromData recognize image/tiff on one host, but doesn't on another one?

I was unable to reproduce your problem, but I did some research on the subject. I believe it is as you suspect: the problem is with step 2 of MIME Type Detection, because the hard-coded tests in urlmon.dll v9 differ from those in urlmon.dll v8.
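
One way to check whether two hosts really differ is to compare the file version of urlmon.dll on each. The following is only a sketch: the UrlmonVersion class is a made-up helper, and it assumes the DLL sits in the system directory reported for the current process.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class UrlmonVersion
{
    // Returns the file version string of the given DLL, or null if it does not exist.
    public static string GetFileVersion(string path)
    {
        if (!File.Exists(path))
            return null;
        return FileVersionInfo.GetVersionInfo(path).FileVersion;
    }

    // Assumption: urlmon.dll lives in the system directory of the current process.
    public static string GetSystemUrlmonVersion()
    {
        string path = Path.Combine(Environment.SystemDirectory, "urlmon.dll");
        return GetFileVersion(path);
    }
}
```

Logging the result of UrlmonVersion.GetSystemUrlmonVersion() on both machines should show quickly whether they run the same build of the DLL.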

The Wikipedia article on TIFF shows how complex the format is and that it has been a problem from the very beginning:

When TIFF was introduced, its extensibility provoked compatibility problems. The flexibility in encoding gave rise to the joke that TIFF stands for Thousands of Incompatible File Formats.

The TIFF Compression Tag section clearly shows many rare compression schemes that, as I suspect, have been omitted while creating the urlmon.dll hard-coded tests in earlier versions of IE.

So, what can be done to solve this problem? I can think of three solutions, though each of them brings its own new problems:

  1. Update the IE on your dev machine to version 9.
  2. Apply the latest IE 8 updates on your dev machine. It is well known that modified versions of urlmon.dll are introduced frequently (eg. KB974455). One of them may contain the updated MIME hard-coded tests.
  3. Distribute your own copy of urlmon.dll with your application.

It seems that solutions 1 and 2 are the ones you should choose from. There may be a problem, however, with the production environment. In my experience, administrators of production environments often refuse to install some updates, for many reasons. It may be harder to convince an admin to update IE to v9 and easier to get an IE8 KB update installed (as they are supposed to, but we all know how it is). If you're in control of the production environment, I think you should go with solution 1.

The 3rd solution introduces two problems:

  • legal: It may be against Microsoft's policies to distribute your own copy of urlmon.dll.
  • coding: You have to load the DLL dynamically to call the FindMimeFromData function, or at least customize your app's manifest file, because of the Dynamic-Link Library Search Order. I assume you are aware that it is a very bad idea to just manually copy a newer version of urlmon.dll to the system folder, as other apps using it would most likely crash.

Anyway, good luck with solving your urlmon riddle.

Handler (MIME) for multimedia content not working

I think I get it now: when mimeDeclaration is empty or WRONG, you don't get the image download.

This occurs in your code because mime types for images aren't always "image/" plus the file extension:

context.Response.ContentType = "image/" + mimeDeclaration;

Eg for a .jpg image it's

image/jpeg

Otherwise it's probably because it's a TIFF image; in that case your else clause is setting mimeDeclaration back to an empty string.
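
If you do stick with extensions for now, a small lookup table avoids the "image/" + extension trap. This is a sketch only: ImageContentTypes and FromExtension are hypothetical names, the table covers just a handful of formats, and it assumes the value you have in hand is a file extension.

```csharp
using System;
using System.Collections.Generic;

public static class ImageContentTypes
{
    // Includes the common cases where the subtype is not just the extension.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "jpg", "image/jpeg" },
            { "jpeg", "image/jpeg" },
            { "tif", "image/tiff" },
            { "tiff", "image/tiff" },
            { "svg", "image/svg+xml" },
            { "png", "image/png" },
            { "gif", "image/gif" },
        };

    // Returns the content type for an extension (with or without a leading dot),
    // or null when the extension is not in the table.
    public static string FromExtension(string extension)
    {
        if (string.IsNullOrEmpty(extension))
            return null;
        string key = extension.TrimStart('.');
        string contentType;
        return Map.TryGetValue(key, out contentType) ? contentType : null;
    }
}
```

A null result is the signal to refuse the request or fall back to a generic type rather than send a made-up "image/..." header.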

Tip: detecting MIME types by file extension is less than ideal, check the way I do it here: Alternative to FindMimeFromData method in Urlmon.dll one which has more MIME types

Application pool crashes with URLMoniker urlmon.dll during MIME type checking

Your problem is the declaration of the "FindMimeFromData" method!
It works fine in some places and causes the IIS process to crash in others. Take a look at: https://stackoverflow.com/a/18554243/4257500
You need to change the declaration of "FindMimeFromData" to:

[DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
static extern int FindMimeFromData(IntPtr pBC,
    [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
    [MarshalAs(UnmanagedType.LPArray, ArraySubType=UnmanagedType.I1, SizeParamIndex=3)]
    byte[] pBuffer,
    int cbSize,
    [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
    int dwMimeFlags,
    out IntPtr ppwzMimeOut,
    int dwReserved);

You should also make some changes to the call to this function, for example:

IntPtr mimeTypePtr;
FindMimeFromData(IntPtr.Zero, null, file, 256, null, 0, out mimeTypePtr, 0);
var mime = Marshal.PtrToStringUni(mimeTypePtr);

Using .NET, how can you find the mime type of a file based on the file signature not the extension

In Urlmon.dll, there's a function called FindMimeFromData.

From the documentation

MIME type detection, or "data sniffing," refers to the process of determining an appropriate MIME type from binary data. The final result depends on a combination of server-supplied MIME type headers, file extension, and/or the data itself. Usually, only the first 256 bytes of data are significant.

So, read the first (up to) 256 bytes from the file and pass it to FindMimeFromData.
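
Reading "up to" 256 bytes is worth doing carefully, since the file may be shorter than that. A minimal sketch follows; FileHeader and ReadFirstBytes are hypothetical names, not part of any library.

```csharp
using System;
using System.IO;

public static class FileHeader
{
    // Reads at most maxBytes from the start of the file and returns exactly
    // the bytes that were read (fewer if the file is shorter).
    public static byte[] ReadFirstBytes(string path, int maxBytes = 256)
    {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (total < maxBytes &&
                   (read = fs.Read(buffer, total, maxBytes - total)) > 0)
            {
                total += read;
            }
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }
}
```

The returned array and its Length can then be passed as the buffer and size arguments of FindMimeFromData, so the function is never told about bytes that were not actually read.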


