What's the right way to decode a string that has special HTML entities in it?
This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.
function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}
Example: http://jsfiddle.net/k65s3/
Input:
Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Output:
Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Unescape HTML entities in JavaScript?
EDIT: You should use the DOMParser API, as Wladimir suggests. I edited my previous answer because the function it posted introduced a security vulnerability.
The following snippet is the old answer's code with a small modification: using a textarea instead of a div reduces the XSS vulnerability, but it is still problematic in IE9 and Firefox.
function htmlDecode(input) {
    var e = document.createElement('textarea');
    e.innerHTML = input;
    // handle case of empty input
    return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}
htmlDecode("&lt;img src='myimage.jpg'&gt;");
// returns "<img src='myimage.jpg'>"
Basically I create a DOM element programmatically, assign the encoded HTML to its innerHTML and retrieve the nodeValue from the text node created on the innerHTML insertion. Since it just creates an element but never adds it, no site HTML is modified.
It will work cross-browser (including older browsers) and accept all the HTML Character Entities.
EDIT: The old version of this code did not work on IE with blank inputs, as evidenced here on jsFiddle (view in IE). The version above works with all inputs.
UPDATE: It appears this doesn't work with large strings, and it also introduces a security vulnerability; see the comments.
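The DOMParser approach recommended in the first EDIT above could look roughly like the sketch below. It assumes a browser environment where a global DOMParser is available, and the function name htmlDecodeSafe is only an illustrative choice, not something taken from the original answer. The input is parsed into a separate document that is never attached to the page, so scripts and event handlers in the input do not run.

```javascript
// Sketch: decode entities by parsing the input as a detached HTML document.
// Assumes a browser (or any environment that provides a global DOMParser).
function htmlDecodeSafe(input) {
    var doc = new DOMParser().parseFromString(input, "text/html");
    // textContent concatenates the text of the parsed document; markup is dropped.
    return doc.documentElement.textContent;
}

// Only run the demo where DOMParser exists (for example, in a browser console).
if (typeof DOMParser !== "undefined") {
    console.log(htmlDecodeSafe("&lt;img src='myimage.jpg'&gt;"));
    // logs: <img src='myimage.jpg'>
}
```

Note that, unlike the textarea version, literal tags in the input are dropped rather than preserved, because only the text content of the parsed document is returned.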
HTML Entity Decode
You could try something like:
var Title = $('<textarea />').html("Chris&apos; corner").text();
console.log(Title);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
How to decode HTML entities using jQuery?
Security note: using this answer (preserved in its original form below) may introduce an XSS vulnerability into your application. You should not use this answer. Read lucascaro's answer for an explanation of the vulnerabilities in this answer, and use the approach from either that answer or Mark Amery's answer instead.
Actually, try
var encodedStr = "This is fun &amp; stuff";
var decoded = $("<div/>").html(encodedStr).text();
console.log(decoded);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
Decode HTML entities in Python string?
Python 3.4+
Use html.unescape():
import html
print(html.unescape('&pound;682m'))
FYI html.parser.HTMLParser.unescape is deprecated, and was supposed to be removed in 3.5, although it was left in by mistake. It was finally removed in Python 3.9.
Python 2.6-3.3
You can use HTMLParser.unescape() from the standard library:
- For Python 2.6-2.7 it's in HTMLParser
- For Python 3 it's in html.parser
>>> try:
...     # Python 2.6-2.7
...     from HTMLParser import HTMLParser
... except ImportError:
...     # Python 3
...     from html.parser import HTMLParser
...
>>> h = HTMLParser()
>>> print(h.unescape('&pound;682m'))
£682m
You can also use the six compatibility library to simplify the import:
>>> from six.moves.html_parser import HTMLParser
>>> h = HTMLParser()
>>> print(h.unescape('&pound;682m'))
£682m
Native JavaScript or ES6 way to encode and decode HTML entities?
There is no native function in the JavaScript API that converts characters to their "html-entities" equivalent.
Here is the beginning of a solution, and an easy trick that you may like.
Encode HTML entities in JavaScript
You can use regex to replace any character in a given unicode range with its html entity equivalent. The code would look something like this:
var encodedStr = rawStr.replace(/[\u00A0-\u9999<>\&]/g, function(i) {
    return '&#'+i.charCodeAt(0)+';';
});
This code will replace all characters in the given range (unicode 00A0 - 9999, as well as ampersand, greater than and less than) with their html entity equivalents, which is simply &#nnn; where nnn is the unicode value we get from charCodeAt.
See it in action here: http://jsfiddle.net/E3EqX/13/ (the fiddle uses jQuery only for its element selectors; the base code above does not use jQuery).
Making these conversions does not solve all the problems -- make sure you're using UTF8 character encoding, make sure your database is storing the strings in UTF8. You still may see instances where the characters do not display correctly, depending on system font configuration and other issues out of your control.
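As a rough, DOM-free illustration of the regex technique above, the sketch below pairs the encoder with a decoder for the decimal numeric entities it produces. The names encodeEntities and decodeNumericEntities are illustrative assumptions, not part of any standard API, and the decoder deliberately handles only the &#nnn; form, leaving named entities untouched.

```javascript
// Encode: same character range as the snippet above, emitting decimal numeric entities.
function encodeEntities(rawStr) {
    return rawStr.replace(/[\u00A0-\u9999<>&]/g, function (ch) {
        return "&#" + ch.charCodeAt(0) + ";";
    });
}

// Decode: reverse only decimal numeric entities (&#nnn;); named entities are left as-is.
function decodeNumericEntities(str) {
    return str.replace(/&#(\d+);/g, function (match, digits) {
        var codePoint = Number(digits);
        // Leave out-of-range values untouched instead of throwing a RangeError.
        return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
    });
}

var encoded = encodeEntities("5 < 6 & £1");
console.log(encoded);                        // 5 &#60; 6 &#38; &#163;1
console.log(decodeNumericEntities(encoded)); // 5 < 6 & £1
```

Because both functions are plain string operations, they can run outside the browser, but they are not a substitute for a full entity decoder when the input may contain named entities.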
Documentation
- String.charCodeAt - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt
- HTML Character entities - http://www.chucke.com/entities.html
How do I decode HTML entities in Swift?
This answer was last revised for Swift 5.2 and iOS 13.4 SDK.
There's no straightforward way to do that, but you can use NSAttributedString magic to make this process as painless as possible (be warned that this method will strip all HTML tags as well).
Remember to initialize NSAttributedString from the main thread only. It uses WebKit to parse HTML underneath, hence the requirement.
// This is a[0]["title"] in your case
let htmlEncodedString = "The Weeknd <em>&lsquo;King Of The Fall&rsquo;</em>"
guard let data = htmlEncodedString.data(using: .utf8) else {
    return
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
    .documentType: NSAttributedString.DocumentType.html,
    .characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
    return
}
// The Weeknd ‘King Of The Fall’
let decodedString = attributedString.string
extension String {
    init?(htmlEncodedString: String) {
        guard let data = htmlEncodedString.data(using: .utf8) else {
            return nil
        }
        let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]
        guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
            return nil
        }
        self.init(attributedString.string)
    }
}
let encodedString = "The Weeknd <em>&lsquo;King Of The Fall&rsquo;</em>"
let decodedString = String(htmlEncodedString: encodedString)
Convert HTML entities in plain text to characters
To decode HTML entities like those in your example, you could use the following code.
html_encoded = 'Motorists could be charged for every mile they drive to raise &euro;35bn'
import html
html_decoded = html.unescape(html_encoded)
print(html_decoded)