How to identify file type by Base64 encoded string of an image
I solved my problem by using mimeType = URLConnection.guessContentTypeFromStream(inputStream); (the Base64 decoding below uses java.util.Base64 from Java 8+)
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.util.Base64;

// The data looks like "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAoAAAAKAC",
// so split on the comma and keep the Base64 payload after it
String[] parts = base64ImageString.split(",");
String imageString = parts[1];
// Decode the Base64 payload into a byte array
byte[] imageByteArray = Base64.getDecoder().decode(imageString);
InputStream is = new ByteArrayInputStream(imageByteArray);
// Find out the image type from the leading bytes
String mimeType = null;
String fileExtension = null;
try {
    mimeType = URLConnection.guessContentTypeFromStream(is); // e.g. "image/jpeg", or null if unrecognized
    if (mimeType != null) {
        fileExtension = mimeType.split("/")[1]; // e.g. "jpeg"
    }
} catch (IOException ioException) {
    // not expected for an in-memory stream; log or rethrow as needed
}
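For comparison, a rough Python equivalent of the first step (splitting the data URI and decoding the payload) might look like the sketch below. It assumes a well-formed `data:<mime>;base64,<payload>` URI, and the function name `split_data_uri` is hypothetical. The MIME type in the header is only what the sender claims, so it is still worth checking it against the decoded bytes.

```python
import base64


def split_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split 'data:image/png;base64,iVBOR...' into (declared MIME type, payload bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a data URI of the form data:<mime>;base64,<payload>")
    # The MIME type sits between 'data:' and the first ';' (e.g. 'image/png')
    mime_type = header[len("data:"):].split(";")[0]
    return mime_type, base64.b64decode(payload)


mime, data = split_data_uri("data:image/png;base64,iVBORw0KGgo=")
print(mime, data[:4])  # image/png b'\x89PNG'
```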
How can I check whether a base64 string is a file, and of what type?
Many file types have a header (the first few bytes of the file) with fixed content, often called a magic number, by which a file can be identified as gz, png, pdf, etc.
So every base64 encoded gz file also starts with a certain sequence of base64 characters, by which it can be recognized.
A gzip file always starts with the two-byte sequence 0x1f 0x8b, which in base64 encoding is H4 plus a third character in the range s to v.
The reason is that every base64 character represents 6 bits of the original bytes, so the two bytes 0x1f 0x8b (16 bits) are encoded by two full base64 characters (12 bits) plus the first 4 bits of the third character.
Based on that, I would say that what you show there is not a base64 encoded gzip file.
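The bit arithmetic above can be checked directly in Python: encoding 0x1f 0x8b followed by every possible third byte shows which third characters can occur. The variable names are illustrative; the only assumption about real gzip output is that its third byte is the compression method 0x08 (deflate), which `gzip.compress` writes.

```python
import base64
import gzip

# Every gzip file starts with ID1 = 0x1f, ID2 = 0x8b (RFC 1952)
GZIP_MAGIC = bytes([0x1F, 0x8B])

# 16 known bits = two full base64 characters (12 bits) + the top 4 bits of the third,
# so the third character varies only with the top 2 bits of the next byte.
third_chars = {
    base64.b64encode(GZIP_MAGIC + bytes([b])).decode("ascii")[2]
    for b in range(256)
}
print(sorted(third_chars))  # ['s', 't', 'u', 'v']

# Real gzip output: the third byte is the compression method 0x08 (deflate)
print(base64.b64encode(gzip.compress(b"hello")).decode("ascii")[:4])  # H4sI
```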
Other examples are:
png
starts with: 0x89 0x50 0x4e 0x47 0x0d 0x0a 0x1a 0x0a
base64 encoded: iVBORw0KGg...
jpg
starts with: 0xFF 0xD8 0xFF 0xE0
base64 encoded: /9j/4...
gif
starts with: GIF
base64 encoded: R0lG
tif
a) little endian:
starts with: 0x49 0x49 0x2A 0x00
base64 encoded: SUkqA
b) big endian:
starts with: 0x4D 0x4D 0x00 0x2A
base64 encoded: TU0AK
flv
starts with: FLV
base64 encoded: RkxW
wav/avi/webp and others
Several audio, video and image formats are based on RIFF (Resource Interchange File Format).
The common part is that all these files start with RIFF
base64 encoded: UklGR
After the RIFF tag and a 4-byte size field, you'll find the specific format in the 4 bytes starting at the 9th byte.
In the following, _ is used as a placeholder for any character.
wav
starts with: RIFF____WAVE
base64 encoded: UklGR______XQVZF
webp
starts with: RIFF____WEBP
base64 encoded: UklGR______XRUJQ
avi
starts with: RIFF____AVI (followed by a space, i.e. "AVI ")
base64 encoded: UklGR______BVkkg
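The prefixes in the list above can be derived mechanically: a magic number of n bytes fixes the first floor(8n / 6) base64 characters. For RIFF files it is simpler to decode the first 16 characters (12 bytes) and read the form type directly. Both function names below are hypothetical helpers for illustration, and the sketch assumes the base64 text has no line breaks in its first 16 characters.

```python
import base64
from typing import Optional


def fixed_base64_prefix(magic: bytes) -> str:
    """Base64 characters fully determined by `magic` at offset 0 (6 bits per character)."""
    full_chars = len(magic) * 8 // 6
    return base64.b64encode(magic).decode("ascii")[:full_chars]


def riff_form_type(b64: str) -> Optional[str]:
    """Decode only the first 16 base64 characters (12 bytes) and return the RIFF
    form type found in bytes 9-12, e.g. 'WAVE', 'WEBP' or 'AVI '."""
    if len(b64) < 16:
        return None
    head = base64.b64decode(b64[:16])
    if head[:4] != b"RIFF":
        return None
    # bytes 1-4: 'RIFF', bytes 5-8: little-endian size, bytes 9-12: form type
    return head[8:12].decode("latin-1")


print(fixed_base64_prefix(bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])))  # iVBORw0KGg
print(fixed_base64_prefix(b"II*\x00"))  # SUkqA
wav_header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVEfmt "
print(riff_form_type(base64.b64encode(wav_header).decode("ascii")))  # WAVE
```

Decoding the 12 bytes avoids having to match wildcard characters such as `UklGR______XQVZF` against the base64 text.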
Regarding the specific example in the question:
in the updated question there's a hint in the attached picture that the data must first be base32 decoded and then base64 decoded.
When we feed an online base32 decoder with the string given in the question (JA2HGSKBJI4DSZ2WGRAS...), we get:
H4sIAJ89gV4A/+1ZURaEIAi8SkfQ+1/O3f7MtEBfMgz9rC/diXmIA5hSzun3HNdBbgbtVP2v/2+LowM837wFHKxZbmE9pQfsLOaiLAL8kvIk4MBma17ufHQbIJCXoWNZZKGPWB5QljvXIuXOmm0SgLixJw8HRC8Tbmz7x5eIspypaZHSWbj8cAhdjli2WUkR1sv2dZmwXhZlDnIcCl0GyrFX6fKkBEBTBsq+9uY2Ecug2Rf0xtaJlNdYJuxjP9kcd1LOW/fQXtb1sd3fSTGXFTx3UjfGFx6uJGjeIAAA
It starts with H4s, so according to what I wrote above about recognizing file types in base64 encoding, it's a base64 encoded gzip file.
This can be saved in a text file and then uploaded to base64decode.org, where it will be converted into a gzip file. When you download and open that gzip file, it contains a file with text like this:
00110000 00110000 00110001 00110001 00110000 00110001 00110000 00110000 00100000 00110000 00110000 00110001 00110001 00110000 00110001 00110000 00110001 00100000 ...
Conclusion for this case: The original string/file is a gzip file that was first base64 encoded and the base64 encoded part was again encoded with base32.
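Instead of using online decoders, the three layers can be unwrapped locally. This is a sketch assuming the base32 text is upper-case and correctly padded (as `base64.b32decode` requires); the function name is hypothetical, and since the string in the question is truncated, the example uses made-up content for a round trip.

```python
import base64
import gzip


def unwrap_base32_base64_gzip(text: str) -> bytes:
    """Undo the layering: base32 text -> base64 text -> gzip bytes -> original bytes."""
    b64_text = base64.b32decode(text)   # expects upper-case, correctly padded base32
    gz_bytes = base64.b64decode(b64_text)
    if gz_bytes[:2] != b"\x1f\x8b":      # gzip magic number
        raise ValueError("inner payload is not gzip data")
    return gzip.decompress(gz_bytes)


# Round trip with made-up content
original = b"00110000 00110000 00110001"
layered = base64.b32encode(base64.b64encode(gzip.compress(original))).decode("ascii")
print(layered[:6])  # JA2HGS (base32 of the base64 text "H4sI")
print(unwrap_base32_base64_gzip(layered) == original)  # True
```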
How do I know the file type encoded in a base64 string?
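In relation to your switch statement, the prefix for a WAV file would be "UklGR" and the prefix for an MP3 file would be "SUQzB".
These strings are the base64 encoding of the first bytes of the file itself, i.e. of the beginning of the file header. Note that "UklGR" only encodes "RIFF", which WAV shares with AVI, WebP and other RIFF formats, and "SUQz" encodes "ID3", the ID3v2 tag found at the start of many MP3 files ("SUQzB" for ID3v2.4, "SUQzA" for ID3v2.3).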
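A switch on the first characters of the base64 text can be sketched as a simple prefix table. The table below reuses prefixes discussed on this page; its name and the labels are hypothetical, and a match only says what the first bytes look like, not that the whole file is valid.

```python
import base64
from typing import Optional

# Base64 prefixes of common magic numbers. These only reflect the first bytes:
# "UklGR" means "some RIFF file" and "SUQz" means "starts with an ID3v2 tag".
PREFIXES = [
    ("iVBORw0KGg", "png"),
    ("/9j/", "jpg"),
    ("R0lG", "gif"),
    ("SUkqA", "tif"),
    ("TU0AK", "tif"),
    ("RkxW", "flv"),
    ("UklGR", "riff (wav/avi/webp/...)"),
    ("SUQz", "mp3 (ID3v2 tag)"),
    ("H4s", "gz"),
]


def guess_from_prefix(b64: str) -> Optional[str]:
    # Works like a switch on the first characters of the base64 text
    for prefix, kind in PREFIXES:
        if b64.startswith(prefix):
            return kind
    return None


print(guess_from_prefix(base64.b64encode(b"ID3\x04\x00\x00").decode("ascii")))  # mp3 (ID3v2 tag)
```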
Python, can someone guess the type of a file only by its base64 encoding?
You can't, at least not without decoding, because the bytes that help identify the filetype are spread across the base64 characters, which don't directly align with whole bytes. Each character encodes 6 bits, which means that for every 4 characters, there are 3 bytes encoded.
Identifying a file type requires access to those bytes in different block sizes. A JPEG image, for example, can be identified from the bytes FF D8 at the start (and FF D9 at the end), but that's two bytes; the third byte that follows is encoded in the same 4-character block.
What you can do is decode just enough of the base64 string to do your filetype fingerprinting. So you can decode the first 4 characters to get the 3 bytes, and then use the first two to see if the object is a JPEG image. A large number of file formats can be identified from just the first or last series of bytes (a PNG image can be identified by the first 8 bytes, a GIF by the first 6, etc.). Decoding just those bytes from the base64 string is trivial.
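Decoding just enough characters can be sketched as follows: round the number of bytes you need up to whole 3-byte groups, decode that many 4-character blocks, and compare against a small magic-number table. The table and the function name `sniff_base64` are illustrative assumptions, and the sketch assumes the base64 text contains no line breaks near its start.

```python
import base64
from typing import Optional

# Small illustrative table of magic numbers (not exhaustive)
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x1f\x8b", "gzip"),
]


def sniff_base64(b64: str, nbytes: int = 12) -> Optional[str]:
    """Decode only as many base64 characters as needed for `nbytes` bytes."""
    nchars = -(-nbytes // 3) * 4       # round up to whole 4-character blocks
    head = base64.b64decode(b64[:nchars])
    for magic, kind in MAGIC:
        if head.startswith(magic):
            return kind
    return None
```

With the default of 12 bytes only the first 16 characters are decoded, no matter how long the string is.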
Your sample is a PNG image; you can test for image types using the imghdr module (Python 3 version below; note that imghdr was deprecated in Python 3.11 and removed in Python 3.13):
>>> import base64
>>> import imghdr
>>> image_data = "iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="
>>> sample = base64.b64decode(image_data[:44])  # 44 base64 chars decode to 33 bytes (44 / 4 * 3)
>>> for tf in imghdr.tests:
...     res = tf(sample, None)
...     if res:
...         break
...
>>> print(res)
png
I decoded only the first 44 base64 characters (33 bytes), to mirror what the imghdr.what() function reads from a file you pass it (it reads 32 bytes, but 32 is not divisible by 3, so I rounded up to 33).
There is an equivalent sndhdr module for audio files (also removed in Python 3.13), and there is also the python-magic project, which lets you pass in a number of bytes to determine a file type.
How to know MIME-type of a file from base64 encoded data in python?
In the general case, there is no way to reliably identify the MIME type of a piece of untagged data.
Many file formats have magic markers which can be used to determine the type of the file with reasonable accuracy, but some magic markers are poorly chosen and might e.g. coincide with text in unrelated files; and of course, a completely random sequence of bits is not in any well-defined file format.
libmagic is the central component of the file command, which is commonly used to perform this task. There are several Python bindings, but https://pypi.org/project/python-libmagic/ seems to be the most popular and active.
Of course, base64 is just a way to encode untyped binary data, so you decode it first and pass the resulting bytes to libmagic. Here's a quick demo with your sample data.
import base64
import magic
encoded_data = '/9j/4AAQSkZJRgABAQEASABIAAD//gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2NjIpLCBxdWFsaXR5ID0gOTUK/9sAQwAGBAUGBQQGBgUGBwcGCAoQCgoJCQoUDg8MEBcUGBgXFB==='
with magic.Magic() as m:
print(m.from_buffer(base64.b64decode(encoded_data)))
Output:
image/jpeg
(Notice I had to fix the padding at the end of your encoded_data.)
Javascript - get extension from base64 image
For a String (which you can parse out of an image) you can do this:
// Decode the Base64 string with the built-in atob(); the result is a
// "binary string" in which each character holds one byte (0-255)
var encoded = "Base64 encoded image returned from your service";
var decoded = atob(encoded);
// If the file extension is unknown, check the magic bytes at the start
var extension = undefined;
if (decoded.startsWith("\x89PNG")) extension = "png";
else if (decoded.startsWith("\xFF\xD8\xFF")) extension = "jpg";
else if (decoded.startsWith("GIF8")) extension = "gif";
else if (decoded.startsWith("II*\x00") || decoded.startsWith("MM\x00*")) extension = "tiff";
// To display the image, build a data URI from the original Base64 string
// (the decoded binary string itself is not a valid img.src)
if (extension) {
  var img = document.createElement("img");
  img.src = "data:image/" + (extension === "jpg" ? "jpeg" : extension) + ";base64," + encoded;
}
Retrieve MIME type from Base64 encoded String
In general, a base64-encoded string could contain absolutely any data, so there is no way to know its file type from the encoding alone.
To determine if it is an instance of a JPEG image, you'd need to base64-decode it, and then do something like checking its magic number, which is useful in telling you what the file isn't. You'd still need to do more work to determine if it is a valid JPEG image.
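A magic-number check for JPEG could be sketched like this (the function name is hypothetical). It tests for the SOI marker FF D8 FF at the start and the EOI marker FF D9 at the end; as the answer says, passing it only means the data is not obviously something else. Proving validity would need a real decoder, for example an image library.

```python
import base64


def looks_like_jpeg(b64: str) -> bool:
    """Cheap magic-number check: SOI marker (FF D8 FF) at the start and
    EOI marker (FF D9) at the end. It does NOT prove the JPEG is valid."""
    data = base64.b64decode(b64)
    return data[:3] == b"\xff\xd8\xff" and data[-2:] == b"\xff\xd9"


# A fake 'JPEG' made of markers and zero bytes still passes the check
fake = base64.b64encode(b"\xff\xd8\xff\xe0" + bytes(10) + b"\xff\xd9").decode("ascii")
print(looks_like_jpeg(fake))  # True
```

Some real JPEG files carry trailing bytes after the EOI marker, so the end-of-data test may also reject files that decoders accept.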
How to find file extension of base64 encoded image in Python
It is best practice to examine the file's contents rather than rely on something external to the file. Many email attacks, for example, rely on misidentifying the MIME type so that an unsuspecting computer executes a file that it shouldn't. Fortunately, most image file extensions can be determined by looking at the first few bytes (after decoding the base64). Best practice, though, might be to use file magic, which can be accessed via Python packages that wrap libmagic.
Most image file extensions are obvious from the MIME type. For gif, pcx, png, tiff, and jpeg, the file extension is just whatever follows the 'image/' part of the MIME type. To handle the obscure types as well, Python provides a standard module:
>>> from mimetypes import guess_extension
>>> guess_extension('image/x-corelphotopaint')
'.cpt'
>>> guess_extension('image/png')
'.png'
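Putting the two pieces of advice together, you can look at the decoded content first and only then map the detected MIME type to an extension with `mimetypes.guess_extension`. The magic-number table and function name below are illustrative assumptions, not an exhaustive detector.

```python
import base64
import mimetypes
from typing import Optional

# Small illustrative magic-number -> MIME type table (not exhaustive)
MAGIC_TO_MIME = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def extension_from_base64(b64: str) -> Optional[str]:
    # Look at the content first, then map the MIME type to an extension
    head = base64.b64decode(b64[:16])   # 16 base64 chars -> 12 bytes
    for magic, mime in MAGIC_TO_MIME:
        if head.startswith(magic):
            return mimetypes.guess_extension(mime)
    return None
```

The extension returned for some types (notably image/jpeg) can depend on the Python version and the platform's MIME tables, so compare against a set of acceptable extensions rather than a single string.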