C#: Class for Decoding Quoted-Printable Encoding

There is functionality in the framework libraries to do this, but it doesn't appear to be cleanly exposed. The implementation is in the internal class System.Net.Mime.QuotedPrintableStream, which defines a method called DecodeBytes that does what you want. DecodeBytes appears to be called by only one other method, which decodes MIME headers. That method is also internal, but it is called fairly directly in a couple of places, e.g., the Attachment.Name setter. A demonstration:

using System;
using System.Net.Mail;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Attachment attachment = Attachment.CreateAttachmentFromString("", "=?iso-8859-1?Q?=A1Hola,_se=F1or!?=");
            Console.WriteLine(attachment.Name);
        }
    }
}

Produces the output:

¡Hola,_señor!

Note that the underscore was left as-is, even though the "Q" encoding of RFC 2047 uses _ to represent a space. You may also have to do some testing to ensure that carriage returns, etc., are handled correctly, although in a quick test I did they seemed to be. However, it may not be wise to rely on this functionality unless your use case is close enough to decoding a MIME header string that you don't expect future changes to the library to break it. You might be better off writing your own quoted-printable decoder.
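If you go the do-it-yourself route, a minimal sketch of a quoted-printable body decoder (RFC 2045) might look like the following. The names QuotedPrintableSketch, DecodeToBytes and Decode are made up for illustration. It decodes to bytes first and only then applies the charset, copies malformed "=" sequences through unchanged, and deliberately ignores finer points such as stripping trailing whitespace before line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: a lenient quoted-printable (RFC 2045) body decoder sketch.
public static class QuotedPrintableSketch
{
    // Decodes quoted-printable text to raw bytes:
    //  - "=XX" (two hex digits) becomes the byte 0xXX
    //  - "=" at the end of a line (soft line break) is removed together with the line break
    //  - anything else, including malformed "=" sequences, is copied through unchanged
    public static byte[] DecodeToBytes(string input)
    {
        var output = new List<byte>(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c == '=')
            {
                // Soft line break written as "=\r\n"
                if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
                {
                    i += 3;
                    continue;
                }
                // Soft line break written as "=\n"
                if (i + 1 < input.Length && input[i + 1] == '\n')
                {
                    i += 2;
                    continue;
                }
                // Hex escape "=XX"
                if (i + 2 < input.Length && IsHex(input[i + 1]) && IsHex(input[i + 2]))
                {
                    output.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
                    i += 3;
                    continue;
                }
            }
            // Quoted-printable text should be 7-bit ASCII, so a plain cast is enough here
            output.Add((byte)c);
            i++;
        }
        return output.ToArray();
    }

    // Turns the decoded bytes into text using the charset declared for the MIME part.
    public static string Decode(string input, Encoding charset)
    {
        return charset.GetString(DecodeToBytes(input));
    }

    private static bool IsHex(char c)
    {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }
}
```

For example, QuotedPrintableSketch.Decode("Elke=E2=80=99s motto", Encoding.UTF8) returns "Elke’s motto", because E2 80 99 is the UTF-8 encoding of U+2019.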

How to decode the email quoted printable encoded in C#

Try the snippet below to decode quoted-printable encoding:

using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    public static string DecodeQuotedPrintable(string input, string charSet)
    {
        Encoding enc;

        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name: fall back to UTF-8
            enc = new UTF8Encoding();
        }

        // Remove soft line breaks ("=" at the end of an encoded line)
        input = Regex.Replace(input, @"=\r?\n", "");

        // Each match is a run of consecutive "=XX" escapes, so the bytes of a
        // multi-byte character (e.g. in UTF-8) are decoded together
        var occurrences = new Regex(@"(=[0-9A-Fa-f]{2})+");

        // Replace every run in a single pass so already-decoded text is never decoded again
        return occurrences.Replace(input, match =>
        {
            byte[] b = new byte[match.Value.Length / 3];
            for (int i = 0; i < b.Length; i++)
            {
                b[i] = byte.Parse(match.Value.Substring(i * 3 + 1, 2), NumberStyles.AllowHexSpecifier);
            }
            return enc.GetString(b);
        });
    }

    static void Main(string[] args)
    {
        string sData = @"*** Hello, World *** =0D=0AURl: http://www.example.com?id=3D=27a9dca9-5d61-477c-8e73-a76666b5b1bf=0D=0A=0D=0A
Name: Hello World=0D=0A
Phone: 61234567890=0D=0A
Email: hello@test.com=0D=0A=0D=0A";

        Console.WriteLine(DecodeQuotedPrintable(sData, "utf-8"));
        Console.ReadLine();
    }
}

Running code is available on dotnetfiddle.

The snippet was taken from this link.

Decoding Quoted printable message

Try passing the Shift_JIS charset name to your Decode method:

var str = Decode(inp, "Shift_JIS");

or

var str = Decode(inp, "sjis");
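One likely pitfall: on .NET Core and .NET 5+, Encoding.GetEncoding("Shift_JIS") throws an exception unless the code-pages provider has been registered (on .NET Framework it works out of the box). A minimal sketch, assuming a .NET Core 3.0+ target where CodePagesEncodingProvider ships with the runtime; ShiftJisSketch and GetCharset are hypothetical names:

```csharp
using System;
using System.Text;

// Hypothetical helper: makes legacy code pages such as Shift_JIS available
// before looking an encoding up by name.
public static class ShiftJisSketch
{
    // A static constructor runs exactly once, before the first call to GetCharset.
    static ShiftJisSketch()
    {
        // By default .NET Core only knows a handful of encodings (UTF-8, UTF-16,
        // ASCII, Latin-1, ...); this adds the Windows code pages, including 932.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding GetCharset(string name)
    {
        return Encoding.GetEncoding(name);
    }
}
```

With the provider registered, the bytes 0x82 0xA0 (what "=82=A0" decodes to) read as the hiragana character あ under Shift_JIS.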

UTF8 (Quoted Printable) conversion in C# question

That's not UTF-8. That's quoted printable, which isn't quite the same sort of encoding as UTF-8: rather than turning Unicode text into bytes, it represents arbitrary bytes as printable ASCII text.

Quoted printable will effectively allow you to convert the ASCII message into a byte array which can then be decoded as UTF-8.
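To make the two stages concrete: the escapes only give you bytes, and the same bytes read as different text depending on the charset you apply. A simplified sketch (TwoStepSketch and ToBytes are illustrative names; soft line breaks and malformed escapes are not handled):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class TwoStepSketch
{
    // Step 1: quoted-printable text -> bytes.
    // Simplified: assumes well-formed "=XX" escapes and no soft line breaks.
    public static byte[] ToBytes(string qp)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < qp.Length; i++)
        {
            if (qp[i] == '=' && i + 2 < qp.Length)
            {
                bytes.Add(byte.Parse(qp.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier));
                i += 2; // skip the two hex digits
            }
            else
            {
                bytes.Add((byte)qp[i]);
            }
        }
        return bytes.ToArray();
    }
}
```

ToBytes("caf=C3=A9") yields the five bytes 63 61 66 C3 A9. Step 2 is then Encoding.UTF8.GetString(bytes), which gives "café"; decoding the same bytes as ISO-8859-1 would give "cafÃ©" instead.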

I'm not sure whether there's any direct support in .NET for quoted printable encoding, which is somewhat bizarre... I may well have missed something.

How to decode a quoted printable e-mail header (with MimeKit)

When you parse a message with MimeMessage.Load(), you don't need to decode headers because MimeKit will do it for you.

Secondly, your example header is not encoded using quoted-printable; it is encoded using RFC 2047 encoded-word tokens, which need to be decoded using Rfc2047.DecodeText():

var decoded = Rfc2047.DecodeText (Encoding.ASCII.GetBytes ("Hello =?UTF-8?B?TmFtZSDDpMO2w7w=?= more words"));

Decode quoted printable correct

The text you’re trying to decode is typically found in MIME headers, and is encoded according to the specification defined in the following Internet standard: RFC 2047: MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text.

There is a sample implementation for such a decoder on GitHub; maybe you can draw some ideas from it: RFC2047 decoder in C#.

You can also use this online tool for comparing your results: Online MIME Headers Decoder.

Note that your sample text is incorrect. The specification declares:

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

Per the specification, any encoded word must end in ?=. Thus, your sample must be corrected from:

=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt=

…to the following (note the ?= at the very end):

=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt?=

Strictly speaking, your sample is also invalid because it exceeds the 75-character limit imposed on any encoded word; however, most decoders tend to be tolerant of this non-conformity.
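Both rules are easy to check mechanically. A rough sketch (EncodedWordCheck and its methods are hypothetical names) that tests a single word against the encoded-word grammar above and against the 75-character limit; the charset pattern is looser than the RFC's token rule, so treat it as an approximation:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for sanity-checking one RFC 2047 encoded word.
public static class EncodedWordCheck
{
    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    //  charset:      approximated as one or more chars other than '?' and whitespace
    //  encoding:     B or Q, case-insensitive
    //  encoded-text: one or more printable ASCII chars except '?' and space
    //                ('!' to '>' and '@' to '~' skip exactly '?')
    private static readonly Regex Grammar = new Regex(
        @"^=\?[^?\s]+\?[BbQq]\?[!->@-~]+\?=\z");

    public static bool MatchesGrammar(string word) => Grammar.IsMatch(word);

    // RFC 2047 limits an encoded word to 75 characters.
    public static bool IsWithinLengthLimit(string word) => word.Length <= 75;
}
```

Run against the two samples above, the original fails MatchesGrammar (it ends in a bare =), while the corrected one passes the grammar check but still fails IsWithinLengthLimit.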

Decoding Base64 / Quoted Printable encoded UTF8 string

This seems to be MIME Header Encoding. The Q in your second example indicates that it is Quoted Printable.

This question seems to cover the variants fairly well. In a quick search I didn't find any .NET libraries to decode this automatically, but it shouldn't be hard to do manually if you need to.
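Doing it manually for the Q variant is mostly a matter of pulling out the charset and the encoded text, then applying two rules: _ stands for a space and =XX is a hex-encoded byte. A minimal sketch; QEncodedWordSketch and DecodeHeader are illustrative names, B-encoded words are left untouched, whitespace between adjacent encoded words is not collapsed as RFC 2047 requires, and an unknown charset name makes Encoding.GetEncoding throw:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes RFC 2047 "Q" encoded words inside a header value.
public static class QEncodedWordSketch
{
    // =?charset?Q?encoded-text?=
    private static readonly Regex EncodedWord = new Regex(
        @"=\?(?<charset>[^?\s]+)\?[Qq]\?(?<text>[^?\s]*)\?=");

    public static string DecodeHeader(string header)
    {
        // Replace each encoded word with its decoded text; plain text is kept as-is.
        return EncodedWord.Replace(header, m =>
        {
            Encoding charset = Encoding.GetEncoding(m.Groups["charset"].Value);
            return charset.GetString(DecodeQ(m.Groups["text"].Value));
        });
    }

    // Q encoding: "_" means space (0x20), "=XX" is a hex-encoded byte,
    // every other character stands for itself.
    private static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < text.Length)
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2; // skip the two hex digits
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }
}
```

For instance, DecodeHeader("Re: =?utf-8?Q?k=C3=B6nnen?= ok") returns "Re: können ok".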

Consecutive control characters in Quoted Printable not decoding correctly

Here is a piece of code I found on SO when looking for a quoted-printable decoder:

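using System;
using System.Collections.Generic;
using System.Text;

private static string Decode(string input, string bodycharset)
{
    var i = 0;
    var output = new List<byte>();
    while (i < input.Length)
    {
        if (input[i] == '=' && i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
        {
            // Soft line break "=\r\n": skip it
            i += 3;
        }
        else if (input[i] == '=' && i + 1 < input.Length && input[i + 1] == '\n')
        {
            // Soft line break with a bare "\n": skip it
            i += 2;
        }
        else if (input[i] == '=' && i + 2 < input.Length)
        {
            // "=XX": two hex digits encode one byte, so "=0D=0A=0D=0A"
            // becomes four consecutive control bytes (CR LF CR LF)
            output.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
            i += 3;
        }
        else
        {
            // Literal character; quoted-printable text is expected to be 7-bit ASCII
            output.Add((byte)input[i]);
            i++;
        }
    }
    if (String.IsNullOrEmpty(bodycharset))
        return Encoding.UTF8.GetString(output.ToArray());
    else
        return Encoding.GetEncoding(bodycharset).GetString(output.ToArray());
}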

Source: Decoding Quoted printable message

Decode("Elke=E2=80=99s motto", "utf-8") -> Elke’s motto

How to Decode =?utf-8?B?...?= to string in C#

Let's have a look at the meaning of the MIME encoding:

=?utf-8?B?...something...?=
  ^     ^
  |     +--- The bytes are Base64 encoded
  |
  +--------- The string is UTF-8 encoded

So, to decode this, take the ...something... out of your string (2LPZhNin2YU= in your case) and then

  1. reverse the Base64 encoding

    var bytes = Convert.FromBase64String("2LPZhNin2YU=");
  2. interpret the bytes as a UTF8 string

    var text = Encoding.UTF8.GetString(bytes);

text should now contain the desired result.
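The same two steps can be wrapped around a regular expression that pulls the charset and the Base64 payload out of each encoded word in a header. A minimal sketch (BEncodedWordSketch and DecodeHeader are illustrative names); it uses the charset named in the word rather than assuming UTF-8, and leaves Q-encoded words untouched:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: finds "=?charset?B?...?=" encoded words and decodes them.
public static class BEncodedWordSketch
{
    private static readonly Regex EncodedWord = new Regex(
        @"=\?(?<charset>[^?\s]+)\?[Bb]\?(?<data>[A-Za-z0-9+/=]*)\?=");

    public static string DecodeHeader(string header)
    {
        return EncodedWord.Replace(header, m =>
        {
            // 1. reverse the Base64 encoding
            byte[] bytes = Convert.FromBase64String(m.Groups["data"].Value);
            // 2. interpret the bytes using the declared charset (utf-8 in the example)
            Encoding charset = Encoding.GetEncoding(m.Groups["charset"].Value);
            return charset.GetString(bytes);
        });
    }
}
```

With the example from the question, DecodeHeader("=?utf-8?B?2LPZhNin2YU=?=") returns "سلام".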


A description of this format can be found in Wikipedia:

  • http://en.wikipedia.org/wiki/MIME#Encoded-Word

