C#: Class for decoding Quoted-Printable encoding?
There is functionality in the framework libraries to do this, but it doesn't appear to be cleanly exposed. The implementation is in the internal class System.Net.Mime.QuotedPrintableStream. This class defines a method called DecodeBytes which does what you want. The method appears to be used by only one method, which is used to decode MIME headers. This method is also internal, but is called fairly directly in a couple of places, e.g., the Attachment.Name setter. A demonstration:
using System;
using System.Net.Mail;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Attachment attachment = Attachment.CreateAttachmentFromString("", "=?iso-8859-1?Q?=A1Hola,_se=F1or!?=");
            Console.WriteLine(attachment.Name);
        }
    }
}
Produces the output:
¡Hola,_señor!
You may have to do some testing to ensure carriage returns, etc. are treated correctly, although in a quick test I did they seemed to be. However, it may not be wise to rely on this functionality unless your use case is close enough to decoding a MIME header string that you don't think it will be broken by any changes made to the library. You might be better off writing your own quoted-printable decoder.
How to decode the email quoted printable encoded in C#
Try the snippet below to decode quoted-printable encoding:
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    public static string DecodeQuotedPrintable(string input, string charSet)
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name: fall back to UTF-8.
            enc = new UTF8Encoding();
        }
        // Match each run of consecutive =XX escapes so that a multi-byte
        // character is decoded as one unit. Hex digits are 0-9 and A-F.
        var occurrences = new Regex(@"(=[0-9A-F]{2})+");
        // Regex.Replace decodes each run in place, so text that has
        // already been decoded is never scanned a second time.
        input = occurrences.Replace(input, match =>
        {
            byte[] b = new byte[match.Value.Length / 3];
            for (int i = 0; i < b.Length; i++)
            {
                b[i] = byte.Parse(match.Value.Substring(i * 3 + 1, 2), NumberStyles.AllowHexSpecifier);
            }
            return enc.GetString(b);
        });
        // Strip the "?=" terminator left over from encoded-word headers.
        // Note: this also removes any literal "?=" in ordinary text.
        input = input.Replace("?=", "");
        return input;
    }

    static void Main(string[] args)
    {
        string sData = @"*** Hello, World *** =0D=0AURl: http://www.example.com?id=3D=27a9dca9-5d61-477c-8e73-a76666b5b1bf=0D=0A=0D=0A
Name: Hello World=0D=0A
Phone: 61234567890=0D=0A
Email: hello@test.com=0D=0A=0D=0A";
        Console.WriteLine(DecodeQuotedPrintable(sData, "utf-8"));
        Console.ReadLine();
    }
}
Running code is available on dotnetfiddle.
The snippet is taken from this link.
Decoding Quoted printable message
try this
var str = Decode(inp, "Shift_JIS");
or
var str = Decode(inp, "sjis");
UTF8 (Quoted Printable) conversion in C# question
That's not UTF-8. That's quoted-printable, which isn't quite the same sort of encoding as UTF-8 - it's more an "ASCII text to Unicode text" encoding.
Quoted printable will effectively allow you to convert the ASCII message into a byte array which can then be decoded as UTF-8.
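The two steps described above can be sketched as follows. This is only a minimal illustration, not a complete decoder: the names QuotedPrintableDemo and ToBytes are made up for this example, and it assumes the input contains only ASCII characters, with a malformed escape kept as a literal "=".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedPrintableDemo
{
    // Step 1: turn the ASCII quoted-printable text back into raw bytes.
    public static byte[] ToBytes(string qp)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < qp.Length; i++)
        {
            char c = qp[i];
            if (c != '=')
            {
                bytes.Add((byte)c); // literal ASCII character
                continue;
            }
            // Soft line break: "=" directly followed by LF or CRLF.
            if (i + 1 < qp.Length && qp[i + 1] == '\n') { i += 1; continue; }
            if (i + 2 < qp.Length && qp[i + 1] == '\r' && qp[i + 2] == '\n') { i += 2; continue; }
            // "=XX" escape: two hex digits encode one byte.
            if (i + 2 < qp.Length && Uri.IsHexDigit(qp[i + 1]) && Uri.IsHexDigit(qp[i + 2]))
            {
                bytes.Add(Convert.ToByte(qp.Substring(i + 1, 2), 16));
                i += 2;
                continue;
            }
            bytes.Add((byte)c); // malformed escape: keep "=" as-is
        }
        return bytes.ToArray();
    }

    public static void Main()
    {
        byte[] raw = ToBytes("k=C3=B6nnen f=C3=BCr");
        // Step 2: interpret the bytes using the declared charset (UTF-8 here).
        string text = Encoding.UTF8.GetString(raw); // "können für"
        Console.WriteLine(text);
    }
}
```

Keeping the two steps separate makes it clear that the charset only matters once the bytes have been recovered.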
I'm not sure whether there's any direct support in .NET for quoted printable encoding, which is somewhat bizarre... I may well have missed something.
How to decode a quoted printable e-mail header (with MimeKit)
When you parse a message with MimeMessage.Load(), you don't need to decode headers because MimeKit will do it for you.
Secondly, your example header is not encoded using quoted-printable, it's encoded using rfc2047 tokens which would need to be decoded using Rfc2047.DecodeText():
var decoded = Rfc2047.DecodeText (Encoding.ASCII.GetBytes ("Hello =?UTF-8?B?TmFtZSDDpMO2w7w=?= more words"));
Decode quoted printable correct
The text you’re trying to decode is typically found in MIME headers, and is encoded according to the specification defined in the following Internet standard: RFC 2047: MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text.
There is a sample implementation for such a decoder on GitHub; maybe you can draw some ideas from it: RFC2047 decoder in C#.
You can also use this online tool for comparing your results: Online MIME Headers Decoder.
Note that your sample text is incorrect. The specification declares:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
Per the specification, any encoded word must end in ?=. Thus, your sample must be corrected from:
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt=
…to (scroll to the far right):
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt?=
Strictly speaking, your sample is also invalid because it exceeds the 75-character limit imposed on any encoded word; however, most decoders tend to be tolerant of this non-conformity.
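As a rough way to check both points, the grammar and the length limit can be tested before decoding. The sketch below is an approximation under stated assumptions: the class and method names are invented for this example, and the regular expression is looser than the RFC 2047 grammar (it only forbids "?" and whitespace inside each part).

```csharp
using System;
using System.Text.RegularExpressions;

public static class EncodedWordCheck
{
    // Approximates: "=?" charset "?" encoding "?" encoded-text "?="
    private static readonly Regex Pattern =
        new Regex(@"^=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=$");

    public static bool IsWellFormed(string word)
    {
        return Pattern.IsMatch(word);
    }

    // RFC 2047 limits an encoded word to 75 characters, delimiters included.
    public static bool IsWithinLengthLimit(string word)
    {
        return word.Length <= 75;
    }

    public static void Main()
    {
        // Shortened stand-ins for the two samples above.
        Console.WriteLine(IsWellFormed("=?utf-8?Q?gew=C3=A4hrt="));  // False: ends in "=" only
        Console.WriteLine(IsWellFormed("=?utf-8?Q?gew=C3=A4hrt?=")); // True
    }
}
```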
Decoding Base64 / Quoted Printable encoded UTF8 string
This seems to be MIME Header Encoding. The Q in your second example indicates that it is Quoted Printable.
This question seems to cover the variants fairly well. In a quick search I didn't find any .NET libraries to decode this automatically, but it shouldn't be hard to do manually if you need to.
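A manual decoder along those lines might look like the sketch below. It is a simplified illustration rather than a full RFC 2047 implementation: EncodedWordDecoder and its methods are hypothetical names, whitespace between adjacent encoded words is not collapsed, and an unknown charset name will make Encoding.GetEncoding throw. Note that in the Q encoding an underscore stands for a space, so this sketch emits a space for it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodedWordDecoder
{
    // Groups: 1 = charset, 2 = encoding (B or Q), 3 = encoded text.
    private static readonly Regex Word =
        new Regex(@"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=");

    public static string Decode(string header)
    {
        return Word.Replace(header, m =>
        {
            Encoding enc = Encoding.GetEncoding(m.Groups[1].Value);
            string text = m.Groups[3].Value;
            byte[] bytes = m.Groups[2].Value.ToUpperInvariant() == "B"
                ? Convert.FromBase64String(text)
                : DecodeQ(text);
            return enc.GetString(bytes);
        });
    }

    private static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                bytes.Add(0x20); // "_" represents a space in the Q encoding
            }
            else if (c == '=' && i + 2 < text.Length
                     && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                bytes.Add((byte)c); // literal ASCII character
            }
        }
        return bytes.ToArray();
    }

    public static void Main()
    {
        // "Hello Name äöü more words"
        Console.WriteLine(Decode("Hello =?UTF-8?B?TmFtZSDDpMO2w7w=?= more words"));
        // "¡Hola, señor!"
        Console.WriteLine(Decode("=?iso-8859-1?Q?=A1Hola,_se=F1or!?="));
    }
}
```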
Consecutive control characters in Quoted Printable not decoding correctly
Here is a piece of code I found on SO while looking for quoted-printable decoding:
// Requires: using System; using System.Collections.Generic; using System.Text;
private static string Decode(string input, string bodycharset)
{
    var i = 0;
    var output = new List<byte>();
    while (i < input.Length)
    {
        if (input[i] == '=' && i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
        {
            // Soft line break ("=" followed by CRLF): skip it.
            i += 3;
        }
        else if (input[i] == '=' && i + 2 < input.Length)
        {
            // "=XX" escape: decode the two hex digits into one byte.
            string sHex = input.Substring(i + 1, 2);
            output.Add(Convert.ToByte(sHex, 16));
            i += 3;
        }
        else
        {
            // Literal character (assumed to be ASCII); a "=" too close to
            // the end of the input to form an escape also lands here.
            output.Add((byte)input[i]);
            i++;
        }
    }
    if (String.IsNullOrEmpty(bodycharset))
        return Encoding.UTF8.GetString(output.ToArray());
    else
        return Encoding.GetEncoding(bodycharset).GetString(output.ToArray());
}
Source : Decoding Quoted printable message
Decode("Elke=E2=80=99s motto", "utf-8")
-> Elke’s motto
How to Decode =?utf-8?B?...?= to string in C#
Let's have a look at the meaning of the MIME encoding:
=?utf-8?B?...something...?=
    ^   ^
    |   +--- The bytes are Base64 encoded
    |
    +---- The string is UTF-8 encoded
So, to decode this, take the ...something... out of your string (2LPZhNin2YU= in your case) and then
1. reverse the Base64 encoding:
var bytes = Convert.FromBase64String("2LPZhNin2YU=");
2. interpret the bytes as a UTF-8 string:
var text = Encoding.UTF8.GetString(bytes);
text should now contain the desired result.
A description of this format can be found in Wikipedia:
- http://en.wikipedia.org/wiki/MIME#Encoded-Word