C# Convert String from UTF-8 to ISO-8859-1 (Latin1)

C# Convert string from UTF-8 to ISO-8859-1 (Latin1)

Use Encoding.Convert to adjust the byte array before attempting to decode it into your destination encoding.

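// Message is the string you want to convert.
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
// Encode the string as UTF-8 bytes, then re-encode those bytes as ISO-8859-1.
byte[] utfBytes = utf8.GetBytes(Message);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
// Decode the ISO-8859-1 bytes back into a string.
string msg = iso.GetString(isoBytes);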
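As a rough, self-contained sketch of the same steps (the class and method names here are made up for illustration), the round trip can be wrapped in a helper. It also shows the main caveat: a character that ISO-8859-1 cannot represent does not survive the conversion and is typically replaced, usually by '?'.

```csharp
using System;
using System.Text;

static class Latin1Demo
{
    // Hypothetical helper: re-encodes the text as ISO-8859-1 and decodes it again.
    // Characters that Latin-1 cannot represent do not survive the trip.
    public static string RoundTripThroughLatin1(string message)
    {
        Encoding iso = Encoding.GetEncoding("ISO-8859-1");
        Encoding utf8 = Encoding.UTF8;
        byte[] utfBytes = utf8.GetBytes(message);
        byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
        return iso.GetString(isoBytes);
    }

    static void Main()
    {
        // U+00E9 (e with acute) exists in Latin-1, so this string is unchanged.
        Console.WriteLine(RoundTripThroughLatin1("caf\u00E9") == "caf\u00E9");
        // U+65E5 (a CJK ideograph) does not exist in Latin-1, so it is lost.
        Console.WriteLine(RoundTripThroughLatin1("\u65E5") == "\u65E5");
    }
}
```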

Converting from UTF8 to ISO-8859-1 doesn't work

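Your string doesn't contain an a-umlaut (ä).

It contains the Unicode replacement character (U+FFFD).

Whatever conversion happened before you got byte[] bytes has already lost your a-umlaut. Once a character has been replaced by U+FFFD, the original cannot be recovered from the string; the earlier decoding step has to be fixed instead.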
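One way this typically happens, shown here as a small sketch with made-up names, is decoding ISO-8859-1 bytes as if they were UTF-8. The byte 0xE4 is 'ä' in ISO-8859-1 but is not a valid UTF-8 sequence on its own, so the decoder substitutes U+FFFD.

```csharp
using System;
using System.Text;

static class ReplacementCharDemo
{
    // Hypothetical helper: true if the text contains U+FFFD,
    // the Unicode replacement character.
    public static bool ContainsReplacementChar(string text)
    {
        return text.IndexOf('\uFFFD') >= 0;
    }

    static void Main()
    {
        // 'a', 0xE4 ('ä' in ISO-8859-1), 'b'
        byte[] latin1Bytes = { 0x61, 0xE4, 0x62 };

        string wrong = Encoding.UTF8.GetString(latin1Bytes);
        string right = Encoding.GetEncoding("ISO-8859-1").GetString(latin1Bytes);

        Console.WriteLine(ContainsReplacementChar(wrong));  // True
        Console.WriteLine(ContainsReplacementChar(right));  // False
    }
}
```

Checking for U+FFFD right after decoding is a cheap way to find out at which step the data was damaged.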

Convert a literal improperly encoded string (e.g., "Ã±" instead of "ñ") to ISO-8859-1 (Latin1)

Ideally this would be fixed in the API you are calling, so that it returns the expected encoding. But you should be able to fix it this way:

byte[] bytes = Encoding.GetEncoding(1252).GetBytes(Name);
var nameFixed = Encoding.UTF8.GetString(bytes);
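To see why this works, here is a minimal sketch (the helper name is hypothetical). The garbled text is what you get when UTF-8 bytes are decoded with a single-byte encoding, so encoding it again with that same single-byte encoding recovers the original UTF-8 bytes. The sketch uses ISO-8859-1 rather than code page 1252 so that it runs without registering an extra encoding provider on newer .NET versions; the two encodings agree on both characters used here.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Hypothetical helper: undoes one round of
    // "UTF-8 bytes decoded with a single-byte encoding".
    public static string Repair(string garbled, Encoding wronglyUsed)
    {
        byte[] bytes = wronglyUsed.GetBytes(garbled);
        return Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        // U+00C3 U+00B1 is how U+00F1 (n with tilde) looks after the bad decode.
        string garbled = "\u00C3\u00B1";

        Console.WriteLine(Repair(garbled, latin1) == "\u00F1");  // True
    }
}
```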

Converting Html utf-8 charset to ISO-8859-1 via C#

A small test shows that the string is not being decoded back to its original form.

Sample test:

var item = "Administração - São Paulo - Diurno";
Console.WriteLine(item);

var buffer = Encoding.UTF8.GetBytes(item);
var item2 = Encoding.Default.GetString(buffer);
Console.WriteLine(item2);

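On a machine where Encoding.Default is Windows-1252 (typical for .NET Framework on a Western European or US Windows install), this prints:

Administração - São Paulo - Diurno
AdministraÃ§Ã£o - SÃ£o Paulo - Diurno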

As you can see, the original string is converted to bytes using UTF-8, but those bytes are then decoded back into a string using Encoding.Default.

This is wrong. Bytes must be decoded with the same encoding that produced them.
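For comparison, a minimal sketch of the same test with a matching and a mismatching decoder. ISO-8859-1 stands in for Encoding.Default here so that the result does not depend on the machine; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Hypothetical helper: decodes a buffer with the given encoding.
    public static string Decode(byte[] buffer, Encoding encoding)
    {
        return encoding.GetString(buffer);
    }

    static void Main()
    {
        string item = "Administra\u00E7\u00E3o - S\u00E3o Paulo - Diurno";
        byte[] buffer = Encoding.UTF8.GetBytes(item);

        // Same encoding on both sides: the text survives.
        string sameEncoding = Decode(buffer, Encoding.UTF8);
        // Different encoding on the way back: every accented letter is garbled.
        string otherEncoding = Decode(buffer, Encoding.GetEncoding("ISO-8859-1"));

        Console.WriteLine(sameEncoding == item);   // True
        Console.WriteLine(otherEncoding == item);  // False
    }
}
```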

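If the string you read from WebRequest.GetResponse() comes back with the wrong value, it is being decoded with the wrong encoding somewhere along the way. You could try the TransferEncoding property on the HttpWebRequest, though note that it sets the Transfer-Encoding HTTP header of the request (for example, chunked) rather than a character encoding, so the StreamReader approach at the end of this answer is more likely to help.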

Before you can set the TransferEncoding property, you must first set the SendChunked property to true. Clearing TransferEncoding by setting it to null has no effect on the value of SendChunked. Values assigned to the TransferEncoding property replace any existing contents.

Or you can try passing Encoding.UTF8 to the StreamReader you open on the response stream. Can I see your code?
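A minimal sketch of the StreamReader approach follows. A MemoryStream stands in for the response stream so the example runs without a network call; in real code the stream would come from response.GetResponseStream(). The helper name is hypothetical, and the sketch assumes the server really does send UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

static class ReaderDemo
{
    // Hypothetical helper: reads the whole stream, decoding it explicitly as UTF-8.
    public static string ReadAllAsUtf8(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Stand-in for the body of an HTTP response encoded as UTF-8.
        byte[] body = Encoding.UTF8.GetBytes("S\u00E3o Paulo");

        using (var stream = new MemoryStream(body))
        {
            Console.WriteLine(ReadAllAsUtf8(stream) == "S\u00E3o Paulo");  // True
        }
    }
}
```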

How to convert a string to ISO-8859-1?

Try:

System.Text.Encoding iso_8859_1 = System.Text.Encoding.GetEncoding("iso-8859-1");
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;

// Unicode string.
string s_unicode = "abcéabc";

// Convert to ISO-8859-1 bytes.
byte[] isoBytes = iso_8859_1.GetBytes(s_unicode);

// Convert to UTF-8.
byte[] utf8Bytes = System.Text.Encoding.Convert(iso_8859_1, utf_8, isoBytes);
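To make the difference between the two byte arrays visible, here is a short sketch that prints them as hex (the Hex helper is made up for illustration). The letter é is a single byte, E9, in ISO-8859-1 and two bytes, C3 A9, in UTF-8.

```csharp
using System;
using System.Text;

static class ByteDemo
{
    // Hypothetical helper: formats bytes as hyphen-separated hex.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        Encoding iso = Encoding.GetEncoding("iso-8859-1");
        string s = "abc\u00E9abc";  // "abcéabc"

        byte[] isoBytes = iso.GetBytes(s);
        byte[] utf8Bytes = Encoding.Convert(iso, Encoding.UTF8, isoBytes);

        Console.WriteLine(Hex(isoBytes));   // 61-62-63-E9-61-62-63
        Console.WriteLine(Hex(utf8Bytes));  // 61-62-63-C3-A9-61-62-63
    }
}
```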

ISO-8859-1 to UTF-8 in C#

The simplest way would be to load it and then save it with one of the XML APIs available. That way any extra transformations (e.g. the XML declaration) will be handled appropriately. For example:

using System;
using System.Text;
using System.Xml.Linq;

class Test
{
    static void Main(string[] args)
    {
        XDocument doc = XDocument.Load("test.xml");
        XDeclaration declaration = doc.Declaration;
        if (declaration != null)
        {
            declaration.Encoding = "utf-8";
        }
        doc.Save("test-utf8.xml");
    }
}

Note that I think this may change some whitespace and indentation unless you specify extra options (for example, LoadOptions.PreserveWhitespace when loading and SaveOptions.DisableFormatting when saving). Is that likely to be a problem for you?

You could potentially just load the whole file as text (using Encoding.GetEncoding(28591)), modify the declaration part yourself and save it again as UTF-8. I suspect there may be some corner cases where that would cause a problem though.
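A rough sketch of that text-based approach, under some assumptions: the class and method names are invented, the regular expression only handles a declaration at the very start of the file, and the output is written without a byte order mark. It is meant as an illustration of the idea, not a robust XML tool.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class XmlTextReencode
{
    // Hypothetical helper: rewrites encoding="..." in a leading XML declaration.
    // Text without such a declaration is returned unchanged.
    public static string PointDeclarationAtUtf8(string xml)
    {
        return Regex.Replace(
            xml,
            @"^(<\?xml[^>]*?encoding\s*=\s*)([""'])[^""']*\2",
            "${1}${2}utf-8${2}");
    }

    // Hypothetical helper: reads the file as ISO-8859-1 (code page 28591)
    // and writes it back out as UTF-8.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        string text = File.ReadAllText(inputPath, Encoding.GetEncoding(28591));
        File.WriteAllText(outputPath, PointDeclarationAtUtf8(text), new UTF8Encoding(false));
    }

    static void Main()
    {
        string before = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>";
        // Prints: <?xml version="1.0" encoding="utf-8"?><a/>
        Console.WriteLine(PointDeclarationAtUtf8(before));
    }
}
```

Calling XmlTextReencode.ConvertFile("test.xml", "test-utf8.xml") would mirror the file names used in the XDocument example.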
