How to Transform String to UTF-8 in C#

How to convert a string to UTF8?

This snippet makes an array of bytes with your string encoded in UTF-8:

UTF8Encoding utf8 = new UTF8Encoding();
string unicodeString = "Quick brown fox";
byte[] encodedBytes = utf8.GetBytes(unicodeString);
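
To see what the encoded bytes look like when the text is not plain ASCII, here is a minimal sketch. The class name `Utf8RoundTrip`, the helper `Encode`, and the sample text are assumptions made for illustration; only `Encoding.UTF8` and `BitConverter` come from the framework.

```csharp
using System;
using System.Text;

static class Utf8RoundTrip
{
    // Hypothetical helper: returns the UTF-8 bytes of a .NET (UTF-16) string.
    public static byte[] Encode(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    static void Main()
    {
        // "\u00DC" is 'Ü'; written as an escape so the source file encoding does not matter.
        string text = "\u00DCser";
        byte[] bytes = Encode(text);

        // 'Ü' (U+00DC) needs two bytes in UTF-8, each ASCII letter needs one.
        Console.WriteLine(bytes.Length);                            // 5
        Console.WriteLine(BitConverter.ToString(bytes));            // C3-9C-73-65-72

        // Decoding the bytes gives back the original string.
        Console.WriteLine(Encoding.UTF8.GetString(bytes) == text);  // True
    }
}
```

The byte array is the UTF-8 form; the `string` itself always stays UTF-16 in memory.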

Transforming string to UTF8

Finally solved the problem. The string held UTF-8 code unit values, each stored in its own 16-bit code unit of a C# string. As long as every code unit is within the range of a byte, the fix is to copy those values into a byte array and then decode that UTF-8 byte sequence into a proper UTF-16 string:

byte[] utf8Bytes = new byte[utf8String.Length];
for (int i = 0; i < utf8String.Length; ++i) {
    utf8Bytes[i] = (byte)utf8String[i];
}
var result = Encoding.UTF8.GetString(utf8Bytes, 0, utf8Bytes.Length);

So for this input:

اÙزاÙØ´ تسÙÙÙات \r\n\r\n\r\n<p>باسÙا٠ÙÙÙار گراÙÙ ÙاÙÙ Ø´Ùار٠53018  ÙربÙØ· ب٠د بÙرخاÙ٠ستاد Ù٠باشد ÙØ·Ùا اصÙاح ÙرÙائÙد\r\n\r\n

I get the correct result:

افزايش تسهيلات \r\n\r\n\r\n<p>باسلام همكار گرامي نامه شماره 53018  مربوط به د بيرخانه ستاد مي باشد لطفا اصلاح فرمائيد\r\n\r\n \r\n\r\n

PS: for removing extra characters I use this code:

result = result.Replace('\r', ' ').Replace('\n', ' ');
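
The answer says each code unit should be verified to fit in a byte, but the posted loop casts without checking. A sketch of the same repair with that check added might look like this. The class `MojibakeRepair` and the method `Repair` are hypothetical names, and the code assumes every char of the input carries exactly one UTF-8 byte value.

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    // Hypothetical helper: assumes each char of utf8String holds one UTF-8 byte (0..255).
    public static string Repair(string utf8String)
    {
        byte[] utf8Bytes = new byte[utf8String.Length];
        for (int i = 0; i < utf8String.Length; ++i)
        {
            char c = utf8String[i];
            if (c > 0xFF)
            {
                // A plain (byte) cast would silently drop the high bits here.
                throw new ArgumentException(
                    "Code unit at index " + i + " does not fit in a byte.",
                    nameof(utf8String));
            }
            utf8Bytes[i] = (byte)c;
        }
        // Decode the recovered byte sequence as UTF-8 into a normal UTF-16 string.
        return Encoding.UTF8.GetString(utf8Bytes, 0, utf8Bytes.Length);
    }

    static void Main()
    {
        // "\u00C3\u009C" carries the two UTF-8 bytes of 'Ü' (C3 9C) as separate chars.
        Console.WriteLine(Repair("\u00C3\u009Cser") == "\u00DCser"); // True
    }
}
```

Throwing on an out-of-range value is one possible policy; it makes it obvious when the input was not the kind of string this repair expects.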

How to convert a string to UTF-8 string format in WP7

You can't convert the byte array to a string by using Convert.ToString. You need to decode it, because it is a UTF-8-encoded byte array.

string jsonStringUTF8 = Encoding.UTF8.GetString(encodedBytes, 0, encodedBytes.Length);

Storing a string as UTF8 in C#

As you've found, the CLR uses UTF-16 for character encoding. Your best bet may be to use the Encoding classes and a BitConverter to handle the text. This question has some good examples for converting between the two encodings:

Convert String (UTF-16) to UTF-8 in C#
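
As a rough illustration of converting between the two encodings with the Encoding classes, the sketch below uses `Encoding.Convert` on the raw UTF-16 bytes. The class `Utf16ToUtf8` and the method `ToUtf8` are made-up names for this example.

```csharp
using System;
using System.Text;

static class Utf16ToUtf8
{
    // Hypothetical helper: re-encodes a string's UTF-16 bytes as UTF-8 bytes.
    public static byte[] ToUtf8(string text)
    {
        // Encoding.Unicode is UTF-16 little endian, the in-memory form of .NET strings.
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(text);
        return Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);
    }

    static void Main()
    {
        byte[] utf8 = ToUtf8("Quick brown fox");
        // Pure ASCII text takes one byte per character in UTF-8.
        Console.WriteLine(utf8.Length);                  // 15
        Console.WriteLine(BitConverter.ToString(utf8));  // hex dump of the stored bytes
    }
}
```

When you already hold a `string`, calling `Encoding.UTF8.GetBytes(text)` directly gives the same bytes; `Encoding.Convert` is mainly useful when the input is a byte array in another encoding.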

How to convert string to UTF-8 string representation?

In this specific case, the Base64-encoded text sits in the MIME-encoded string after the "B?" and runs up to the next "?":

string mimed = "=?utf-8?B?Rlc6IELFgsSFZCB6YWvFgmFkYW5pYSBGQS9QQQ==?=";

mimed = mimed.Substring(10, mimed.IndexOf("?", 10) - 10);

string result = Encoding.UTF8.GetString(Convert.FromBase64String(mimed));

The reverse:

result = string.Format("=?utf-8?B?{0}?=", Convert.ToBase64String(Encoding.UTF8.GetBytes(@"FW: Błąd zakładania FA/PA")));
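
The Substring call above relies on the prefix being exactly 10 characters long. A slightly more general sketch splits the encoded word on "?" and reads the charset from it. `MimeWord` and `Decode` are hypothetical names, and the code only handles a single Base64 ("B") encoded word, not the "Q" encoding or several words in one header.

```csharp
using System;
using System.Text;

static class MimeWord
{
    // Hypothetical helper: decodes one "=?charset?B?payload?=" encoded word.
    public static string Decode(string encodedWord)
    {
        // Base64 never contains '?', so a valid word splits into exactly 5 parts:
        // "=", charset, "B", payload, "=".
        string[] parts = encodedWord.Split('?');
        if (parts.Length != 5 || parts[0] != "=" || parts[4] != "=" ||
            !parts[2].Equals("B", StringComparison.OrdinalIgnoreCase))
        {
            throw new FormatException("Not a Base64 MIME encoded word.");
        }

        Encoding charset = Encoding.GetEncoding(parts[1]);
        return charset.GetString(Convert.FromBase64String(parts[3]));
    }

    static void Main()
    {
        string mimed = "=?utf-8?B?Rlc6IELFgsSFZCB6YWvFgmFkYW5pYSBGQS9QQQ==?=";
        // \u0142 is 'ł' and \u0105 is 'ą'.
        Console.WriteLine(Decode(mimed) == "FW: B\u0142\u0105d zak\u0142adania FA/PA"); // True
    }
}
```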

How to convert string from one encoding to another

Code page 1252 is 8-bit. The visible escaping (%DC) looks more like URL encoding (see RFC 3986). You can decode it like this:

using System.Web;

string inputString = "C:/Users/%DCser";
string decoded = HttpUtility.UrlDecode(inputString, Encoding.GetEncoding(1252));
Console.WriteLine(decoded);

The code above should output "C:/Users/Üser" without quotes. The resulting string is UTF-16 encoded, since that is how .NET stores all strings. From there you can convert it to your destination encoding.

How to convert from string ( ASCII type) to UTF-8 in C#

All strings in C# are encoded as UTF-16 little endian. Even if you read a file in UTF-8, its text is converted to UTF-16LE in memory, so don't fight the system. If you need UTF-8, convert at the boundary: when writing to a file (the file APIs let you select the target encoding) or when sending to a web service (where you send raw bytes). Beyond that, we need to know what you are trying to accomplish.
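
As an illustration of choosing the target encoding only at the point of output, here is a small sketch that writes a string to a file as UTF-8 and reads the raw bytes back. The class `Utf8Output`, the method `WriteAndReadBack`, and the temporary file name are assumptions for the example.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8Output
{
    // Hypothetical helper: writes text as UTF-8 without a BOM and returns the bytes on disk.
    public static byte[] WriteAndReadBack(string text)
    {
        // Made-up file name in the temp directory, removed again below.
        string path = Path.Combine(Path.GetTempPath(), "utf8-demo-" + Guid.NewGuid() + ".txt");
        try
        {
            // new UTF8Encoding(false) means: do not emit a byte order mark.
            File.WriteAllText(path, text, new UTF8Encoding(false));
            return File.ReadAllBytes(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        // The string is UTF-16 in memory; only the file content is UTF-8.
        byte[] onDisk = WriteAndReadBack("\u00DCser");
        Console.WriteLine(BitConverter.ToString(onDisk)); // C3-9C-73-65-72
    }
}
```

For a web service, the same idea applies: keep the `string` as it is and send `Encoding.UTF8.GetBytes(text)` as the request body.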


