How can I decode HTML characters in C#?
You can use HttpUtility.HtmlDecode.
If you are using .NET 4.0+ you can also use WebUtility.HtmlDecode, which does not require an extra assembly reference because it is available in the System.Net namespace.
How to decode HTML special character into their actual value?
Use HttpUtility.HtmlDecode or WebUtility.HtmlDecode:
string s = "Antique bronze of an archer by Franz Iffland Literature: " +
    "&ldquo;Bronzes, sculptors and founders&rdquo; by H. Berman, Abage. " +
    "&ldquo;Dictionnaire illustr&eacute; des sculpteurs animaliers &amp; fondeurs de l&rsquo;antiquit&eacute; &agrave; nos jours &ldquo; by Jean Charles Hachet. Argus Valentines. " +
    "&ldquo;The dictionary of sculptors in bronze&rdquo; by James Mackay. Antique collectors club. " +
    "Fedex shipping: $ 185";
var s2 = HttpUtility.HtmlDecode(s);
var s3 = WebUtility.HtmlDecode(s);
Decoding all HTML Entities
Then maybe you need HttpUtility.HtmlDecode?
It should work; you just need to add a reference to System.Web.
At least, that was the way in .NET Framework versions before 4.
For example the following code:
MessageBox.Show(HttpUtility.HtmlDecode("&amp;&copy;"));
Worked and the output was as expected (ampersand and copyright symbol).
Are you sure the problem is within HtmlDecode and not something else?
UPDATE: Another class capable of doing the job, WebUtility (again, the HtmlDecode method), arrived in newer versions of .NET. However, there seem to be some problems with it; see the HttpUtility vs. WebUtility question.
Decode HTML entities
To decode the string, use WebUtility.HtmlDecode.
Here's a sample LINQPad program that demonstrates:
void Main()
{
    string s = "&#70;&#101;&#101;&#108;";
    string decoded = WebUtility.HtmlDecode(s);
    decoded.Dump();
}
Output:
Feel
Note: You're missing a semicolon from the string you've presented in the question. Without the final semicolon, the output will be:
Fee&#108
Decode HTML escaped characters back to normal string in C#
Use System.Web.HttpUtility.HtmlDecode or System.Net.WebUtility.HtmlDecode:
var decoded = HttpUtility.HtmlDecode("&lt; &gt; &amp;");
C# how to decode html character squared (&sup2)
An HTML entity must end with a ";" character. This should work fine:
System.Net.WebUtility.HtmlDecode("lb/in&sup2;");
If you are receiving the string exactly as you posted it, you should fix the side that produces it, because "&sup2" is not an HTML entity; it's just the literal string "&sup2".
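A minimal sketch of the difference, assuming the usual System.Net.WebUtility behaviour of leaving unterminated entities alone. The class and method names (EntityTerminatorDemo, DecodeTerminated, DecodeUnterminated) are hypothetical and exist only for illustration:

```csharp
using System;
using System.Net;

// Hypothetical helper class, for illustration only.
public static class EntityTerminatorDemo
{
    // "&sup2;" is a complete entity, so it decodes to the
    // superscript-two character (U+00B2).
    public static string DecodeTerminated()
    {
        return WebUtility.HtmlDecode("lb/in&sup2;");
    }

    // Without the trailing ';' the text is not treated as an entity
    // and comes back unchanged.
    public static string DecodeUnterminated()
    {
        return WebUtility.HtmlDecode("lb/in&sup2");
    }
}
```

Calling Console.WriteLine on both results makes it easy to check which form your input actually has before blaming the decoder.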
Decoding string using HtmlDecode or Escape
You can use HtmlEncode to encode the string and then use HtmlDecode to return the original value:
string x = "éí&";
string encoded = System.Web.HttpUtility.HtmlEncode(x);
Console.WriteLine(encoded); // &#233;&#237;&amp;
string decoded = System.Web.HttpUtility.HtmlDecode(encoded);
Console.WriteLine(decoded); // éí&
With your update, you just need to decode the string:
string decoded = System.Web.HttpUtility.HtmlDecode("November is Fruit&#39;s Fresh.");
Console.WriteLine(decoded); // November is Fruit's Fresh.
Decode HTML 5 Character set
As Svein commented, this is an issue with the .NET Framework not supporting HTML5 entities.
Since the .NET Framework has gone open source, you can check the code and change it to reflect the necessary changes, as someone did already. If you check out that pull request, you see the problem: there is a breaking change between HTML4 entities and HTML5 entities, which they didn't agree on how to fix. That simply means that the .NET Framework will not support HTML5 entities until a design decision is made.
In the meantime, you could take the diff of the commit and create your own HTML5 entity parser (which is simply a string replacement and some dictionary lookup).
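A rough sketch of what such a parser could look like, not the code from the pull request. Html5EntityDecoder and its members are hypothetical names, and the dictionary holds only a tiny illustrative subset of the HTML5 named character references; a real table would have to be generated from the full HTML5 entity list. Anything the dictionary does not know is handed on to WebUtility.HtmlDecode:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of .NET: resolves a few HTML5-only named
// entities through a dictionary, then delegates the rest to WebUtility.
public static class Html5EntityDecoder
{
    // Illustrative subset only; extend from the HTML5 entity list as needed.
    private static readonly Dictionary<string, string> Html5Entities =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "check", "\u2713" },
            { "star", "\u2606" },
            { "phone", "\u260E" },
        };

    // Matches named entities such as "&check;" (the semicolon is required).
    private static readonly Regex NamedEntity =
        new Regex("&([A-Za-z][A-Za-z0-9]*);", RegexOptions.Compiled);

    public static string Decode(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // Replace the entities found in the dictionary; leave every other
        // match untouched so that WebUtility can handle it afterwards.
        string replaced = NamedEntity.Replace(input, match =>
        {
            string value;
            return Html5Entities.TryGetValue(match.Groups[1].Value, out value)
                ? value
                : match.Value;
        });

        return WebUtility.HtmlDecode(replaced);
    }
}
```

The dictionary pass runs first so that an escaped ampersand such as "&amp;check;" still decodes to the literal text "&check;" rather than to the check mark.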
How to decode HTML encoded character embedded in a json string
ok, i have very superficial knowledge about C#, and none about the .NET API, but intuitively HtmlDecode should decode HTML entities (please excuse me if i'm wrong on that one) ... encoding is quite a b*tch, i know, so i will try to clearly explain the differences between what you have, what you tried, and what should work ...
the correct HTML entity would be &#x27; and not \x27 ... \x27 is a hexadecimal ASCII escape-sequence, as accepted by some JSON decoders and many programming languages, but is completely unrelated to HTML ...
and also, it has nothing to do with JSON, which is the problem ... JSON specs for strings do not allow hexadecimal ASCII escape-sequences, but only Unicode escape-sequences, which is why the escape sequence is unrecognized and which is why using \u0027 instead should work ... now you could blindly replace \x with \u00 (this should perfectly work on valid JSON, although some comments may get damaged in theory, but who cares ... :D)
but personally, if you have access to the source, you should modify it, to make it output valid JSON to match the specs ...
greetz
back2dos
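The \x to \u00 replacement described above can be sketched as follows. This is only an illustration under assumptions: JsonHexEscapeFixer and Fix are hypothetical names, and instead of a fully blind replace the pattern skips sequences where the backslash is itself escaped (for example \\x27), which is a small refinement over what the answer proposes:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class JsonHexEscapeFixer
{
    // Group 1: any complete pairs of escaped backslashes before the escape.
    // Group 2: the two hex digits of a "\xNN" sequence.
    // The lookbehind prevents matching in the middle of a backslash run.
    private static readonly Regex HexEscape = new Regex(
        @"(?<!\\)((?:\\\\)*)\\x([0-9A-Fa-f]{2})",
        RegexOptions.Compiled);

    // Rewrites "\xNN" as the JSON-compatible "\u00NN".
    public static string Fix(string json)
    {
        return HexEscape.Replace(json, @"${1}\u00${2}");
    }
}
```

The fixed text can then be passed to whichever JSON parser you already use. Fixing the producer so it emits valid JSON remains the cleaner option.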