How to Decode HTML Characters in C#

How can I decode HTML characters in C#?

You can use HttpUtility.HtmlDecode, which lives in the System.Web namespace.

If you are using .NET 4.0+, you can also use WebUtility.HtmlDecode, which does not require an extra assembly reference because it is available in the System.Net namespace.
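As a minimal sketch of the WebUtility route, the console program below decodes a short string. The class name HtmlDecodeDemo, the Decode wrapper, and the sample input are illustrative only, not taken from the question.

```csharp
using System;
using System.Net; // WebUtility lives here, so no System.Web reference is needed

public static class HtmlDecodeDemo
{
    // Thin wrapper around the framework call; the name is hypothetical.
    public static string Decode(string html) => WebUtility.HtmlDecode(html);

    public static void Main()
    {
        string decoded = Decode("&lt;b&gt;Tom &amp; Jerry&lt;/b&gt; &copy; Acme");
        // Decoded value: <b>Tom & Jerry</b> © Acme
        Console.WriteLine(decoded);
    }
}
```

If the project already references System.Web, swapping in HttpUtility.HtmlDecode should give the same result for this input.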

How to decode HTML special characters into their actual values?

Use HttpUtility.HtmlDecode or WebUtility.HtmlDecode:

// Verbatim string (@"...") so the sample text can span several lines.
string s = @"Antique bronze of an archer by Franz Iffland Literature:
&ldquo;Bronzes, sculptors and founders&rdquo; by H. Berman, Abage.
&ldquo;Dictionnaire illustr&eacute; des sculpteurs animaliers &amp; fondeurs de l&rsquo;antiquit&eacute; &agrave; nos jours &ldquo; by Jean Charles Hachet. Argus Valentines.
&ldquo;The dictionary of sculptors in bronze&rdquo; by James Mackay. Antique collectors club.
 Fedex shipping: $ 185";
var s2 = HttpUtility.HtmlDecode(s); // needs: using System.Web;
var s3 = WebUtility.HtmlDecode(s);  // needs: using System.Net;

Decoding all HTML Entities

Then maybe you need HttpUtility.HtmlDecode.
It should work; you just need to add a reference to System.Web.
At least, that was the way in .NET Framework versions before 4.

For example the following code:

MessageBox.Show(HttpUtility.HtmlDecode("&amp;&copy;"));

Worked and the output was as expected (ampersand and copyright symbol).
Are you sure the problem is within HtmlDecode and not something else?

UPDATE: Another class capable of doing the job, WebUtility (again through its HtmlDecode method), arrived in newer versions of .NET. However, there seem to be some problems with it. See the HttpUtility vs. WebUtility question.

Decode HTML entities

To decode the string, use WebUtility.HtmlDecode.

Here's a sample LINQPad program that demonstrates:

void Main()
{
    string s = "&#70;&#101;&#101;&#108;";
    string decoded = WebUtility.HtmlDecode(s);
    decoded.Dump();
}

Output:

Feel

Note: You're missing a semicolon from the string you've presented in the question. Without the final semicolon, the last entity is left undecoded and the output will be:

Fee&#108

Decode HTML escaped characters back to normal string in C#

Use System.Web.HttpUtility.HtmlDecode or System.Net.WebUtility.HtmlDecode:

var decoded = HttpUtility.HtmlDecode("&lt; &gt; &amp;"); // "< > &"

C# how to decode html character squared ($sup2)

An HTML entity must end with a ";" character. This should work fine:

System.Net.WebUtility.HtmlDecode("lb/in&sup2;");

If you are getting the string exactly as you posted it, you should fix the side you are getting it from, because "&sup2" is not an HTML entity; it's just the literal string "&sup2".
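If you want to find such broken input so you can report it upstream, a rough sketch is below. It looks for entity-like runs that lack the closing ";" and keeps only those that WebUtility.HtmlDecode would decode once a ";" is appended. The class and method names and the regular expression are assumptions made for illustration; treat it as a heuristic, not a complete validator.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class EntityCheck
{
    // '&' plus a name or numeric reference that is NOT followed by ';'.
    // The lookahead also rejects partial matches inside a longer name.
    private static readonly Regex Unterminated = new Regex(
        @"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*)(?![A-Za-z0-9;])");

    // Hypothetical helper: returns the entity-like fragments missing a ';'.
    public static List<string> FindUnterminatedEntities(string text)
    {
        var found = new List<string>();
        foreach (Match m in Unterminated.Matches(text))
        {
            string withSemicolon = m.Value + ";";
            // Report only fragments the framework recognizes once terminated,
            // so plain text such as "AT&T" is not flagged.
            if (WebUtility.HtmlDecode(withSemicolon) != withSemicolon)
                found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        foreach (string fragment in FindUnterminatedEntities("lb/in&sup2 and AT&T"))
            Console.WriteLine(fragment); // prints: &sup2
    }
}
```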

Decoding string using HtmlDecode or Escape

You can use HtmlEncode to encode the string and then you can use HtmlDecode to return the original value:

string x = "éí&";
string encoded = System.Web.HttpUtility.HtmlEncode(x);
Console.WriteLine(encoded); //&#233;&#237;&amp;

string decoded = System.Web.HttpUtility.HtmlDecode(encoded);
Console.WriteLine(decoded); //éí&

With your update, you just need to decode the string:

String decoded = System.Web.HttpUtility.HtmlDecode("November is Fruit&#39;s Fresh.");
Console.WriteLine(decoded); //November is Fruit's Fresh.

Decode HTML 5 Character set

As Svein commented, this is an issue with the .NET Framework not supporting HTML5 entities.

Since the .NET Framework has gone open source, you can check the code and change it to reflect the necessary changes, as someone did already. If you check out that pull request, you see the problem: there is a breaking change between HTML4 entities and HTML5 entities, and the maintainers didn't agree on how to fix it. That simply means that the .NET Framework will not support HTML5 entities until a design decision is made.

In the meantime, you could take the diff of that commit and create your own HTML5 entity parser (which is simply a string replacement plus a dictionary lookup).
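A minimal sketch of that dictionary-lookup idea follows. It is not the code from the pull request. The class name, the regular expression, and the four-entry table are assumptions for illustration; a real table would be generated from the full HTML5 named character reference list.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class Html5EntityDecoder
{
    // Tiny illustrative subset of names that exist in HTML5 but not in HTML 4.
    private static readonly Dictionary<string, string> Html5Only =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "check", "\u2713" }, // check mark
            { "phone", "\u260E" }, // black telephone
            { "star",  "\u2606" }, // white star
            { "starf", "\u2605" }, // black star
        };

    private static readonly Regex NamedEntity =
        new Regex(@"&([A-Za-z][A-Za-z0-9]*);");

    public static string Decode(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        // Replace the extra names first, leaving every other entity untouched.
        // Doing this before the framework call avoids double-decoding input
        // such as "&amp;check;", which must end up as the text "&check;".
        string pass1 = NamedEntity.Replace(html, m =>
            Html5Only.TryGetValue(m.Groups[1].Value, out string value) ? value : m.Value);

        // Then let the framework decode everything it already knows.
        return WebUtility.HtmlDecode(pass1);
    }

    public static void Main()
    {
        // Result: "\u2713 done & \u2606"
        Console.WriteLine(Decode("&check; done &amp; &star;"));
    }
}
```

The order of the two passes is the main design choice here: the lookup pass runs on the raw input so that an escaped ampersand is never mistaken for the start of a new entity.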

How to decode HTML encoded character embedded in a json string

ok, i have very superficial knowledge about C#, and none about the .NET API, but intuitively HtmlDecode should decode HTML entities (please excuse me if i'm wrong on that one) ... encoding is quite a b*tch, i know, so i will try to clearly explain the differences between what you have, what you tried, and what should work ...

the correct HTML entity would be &#x27; and not \x27 ... \x27 is a hexadecimal ASCII escape-sequence, as accepted by some JSON decoders and many programming languages, but is completely unrelated to HTML ...

and also, it has nothing to do with JSON, which is the problem ... JSON specs for strings do not allow hexadecimal ASCII escape-sequences, but only Unicode escape-sequences, which is why the escape sequence is unrecognized and which is why using \u0027 instead should work ... now you could blindly replace \x with \u00 (this should perfectly work on valid JSON, although some comments may get damaged in theory, but who cares ... :D)

but personally, if you have access to the source, you should modify it, to make it output valid JSON to match the specs ...

greetz

back2dos
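For readers who cannot change the producer, the \x to \u00 replacement described above might look like the sketch below in C#. The class and method names are made up for illustration. Unlike a blind string replace, this version skips escaped backslashes, so a literal "\\x27" in the data is left alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JsonEscapeFixer
{
    // Matches every backslash escape; group 1 is set only for \xNN.
    private static readonly Regex Escape =
        new Regex(@"\\(?:x([0-9A-Fa-f]{2})|.)", RegexOptions.Singleline);

    // Rewrites \xNN as \u00NN and leaves all other escapes unchanged.
    public static string FixHexEscapes(string json) =>
        Escape.Replace(json, m =>
            m.Groups[1].Success ? "\\u00" + m.Groups[1].Value : m.Value);

    public static void Main()
    {
        string broken = @"{""text"":""it\x27s""}";
        Console.WriteLine(FixHexEscapes(broken)); // prints: {"text":"it\u0027s"}
    }
}
```

The result can then be handed to whichever JSON parser you already use.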


