Unescape HTML Entities in JavaScript

Unescape HTML entities in JavaScript?

EDIT: You should use the DOMParser API, as Wladimir suggests. I edited my previous answer because the function I originally posted introduced a security vulnerability.

The following snippet is the old answer's code with a small modification: using a textarea instead of a div reduces the XSS vulnerability, but it is still problematic in IE9 and Firefox.

function htmlDecode(input) {
    var e = document.createElement('textarea');
    e.innerHTML = input;
    // handle case of empty input
    return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}

htmlDecode("&lt;img src='myimage.jpg'&gt;");
// returns "<img src='myimage.jpg'>"

Basically, I create a DOM element programmatically, assign the encoded HTML to its innerHTML, and retrieve the nodeValue from the text node created by the innerHTML assignment. Since the element is created but never added to the document, no site HTML is modified.

It will work cross-browser (including older browsers) and accept all the HTML Character Entities.

EDIT: The old version of this code did not work on IE with blank inputs, as evidenced here on jsFiddle (view in IE). The version above works with all inputs.

UPDATE: It appears this doesn't work with large strings, and it also introduces a security vulnerability; see the comments.

HTML Entity Decode

You could try something like:

var Title = $('<textarea />').html("Chris&apos; corner").text();
console.log(Title);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

Unescape html character entities

Just reverse the function:

function unescapeHtml(unsafe) {
    return unsafe
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&quot;/g, "\"")
        .replace(/&#039;/g, "'")
        // decode &amp; last so that "&amp;lt;" becomes "&lt;", not "<"
        .replace(/&amp;/g, "&");
}

DEMO: http://jsfiddle.net/wazXb/
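With chained replace calls the order matters: if `&amp;` is decoded before the other entities, a string such as `&amp;lt;` is decoded twice and ends up as `<`. One way to sidestep this is to decode everything in a single pass. The following is only a sketch; the names `ENTITY_MAP` and `unescapeHtmlOnce` are made up for illustration, and it covers just the five entities handled above.

```javascript
// Hypothetical single-pass variant: each entity is decoded exactly once,
// so the output of one replacement is never scanned again.
const ENTITY_MAP = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": "\"",
  "&#039;": "'"
};

function unescapeHtmlOnce(text) {
  return text.replace(/&(?:amp|lt|gt|quot|#039);/g, (entity) => ENTITY_MAP[entity]);
}

console.log(unescapeHtmlOnce("&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;")); // <b>Tom & Jerry</b>
console.log(unescapeHtmlOnce("&amp;lt;")); // &lt;
```

Anything outside those five entities is left untouched, which is usually what you want when reversing an escape function that only produces those five.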

A plain JavaScript way to decode HTML entities, works on both browsers and Node

There are many similar questions and useful answers on Stack Overflow, but I couldn't find an approach that works in both browsers and Node.js, so I'd like to share mine.

For HTML codes such as &nbsp; &lt; &gt; &#39;, and even numerically encoded Chinese characters, I suggest using this function (inspired by some other answers):

function decodeEntities(encodedString) {
    var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
    var translate = {
        "nbsp": " ",
        "amp" : "&",
        "quot": "\"",
        "lt"  : "<",
        "gt"  : ">"
    };
    return encodedString.replace(translate_re, function(match, entity) {
        return translate[entity];
    }).replace(/&#(\d+);/gi, function(match, numStr) {
        var num = parseInt(numStr, 10);
        return String.fromCharCode(num);
    });
}

This implementation also works in a Node.js environment.

decodeEntities("&#21704;&#21704;&nbsp;&#39;&#36825;&#20010;&#39;&amp;&quot;&#37027;&#20010;&quot;&#22909;&#29609;&lt;&gt;") // 哈哈 '这个'&"那个"好玩<>
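The numeric branch of decodeEntities handles decimal references only, and String.fromCharCode cannot produce a code point above U+FFFF from a single number. If hexadecimal references or characters such as emoji matter, something along these lines may help. It is a sketch, not part of the original answer: the name `decodeNumericEntities` is hypothetical, and out-of-range references are simply left as they are.

```javascript
// Hypothetical helper: decodes decimal (&#39;) and hexadecimal (&#x27;)
// numeric character references, including code points above U+FFFF.
function decodeNumericEntities(encodedString) {
  return encodedString.replace(/&#(x[0-9a-f]+|\d+);/gi, (match, ref) => {
    const isHex = ref[0] === "x" || ref[0] === "X";
    const codePoint = parseInt(isHex ? ref.slice(1) : ref, isHex ? 16 : 10);
    // Leave the reference as-is if it is outside the Unicode range.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities("&#21704;&#x54C8;")); // 哈哈
console.log(decodeNumericEntities("&#128512;") === "\u{1F600}"); // true
```

String.fromCodePoint is part of ES2015, so this runs in Node.js and in current browsers alike.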

As a new user, I only have 1 reputation :(

I can't comment on or answer existing posts, so this is the only way I can contribute for now.

Edit 1

I think this answer works even better than mine, although no one has upvoted it.

What's the right way to decode a string that has special HTML entities in it?

This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.

function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

Example: http://jsfiddle.net/k65s3/

Input:

Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Output:

Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Native JavaScript or ES6 way to encode and decode HTML entities?

There is no native function in the JavaScript API that converts ASCII characters to their "html-entities" equivalent.
Here is a beginning of a solution and an easy trick that you may like

How to unescape html in javascript?

Change your test string to <b><<&&&</b> to get a better handle on what the risk is... (or better, <img src='http://www.spam.com/ASSETS/0EE75B480E5B450F807117E06219CDA6/spamReg.png' onload='alert(document.cookie);'> for cookie-stealing spam)

See the example at http://jsbin.com/uveme/139/ (based on your example, using prototype for the unescaping.) Try clicking the four different buttons to see the different effects. Only the last one is a security risk. (You can view/edit the source at http://jsbin.com/uveme/139/edit) The example doesn't actually steal your cookies...

  1. If your text is coming from a known-safe source and is not based on any user input, then you are safe.
  2. If you are using createTextNode to create a text node and appendChild to insert that unaltered node object directly into your document, you are safe.
  3. Otherwise, you need to take appropriate measures to ensure that unsafe content can't make it to your viewer's browser.
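The second case can be sketched roughly as follows. The helper name `appendAsText` and the `#comments` selector are assumptions for illustration only; the `doc` parameter exists only so the function can be tried with a substitute object outside a browser.

```javascript
// Hypothetical helper: inserts untrusted text as a text node, never as markup.
// doc defaults to the page's document.
function appendAsText(parent, untrustedText, doc = globalThis.document) {
  const node = doc.createTextNode(untrustedText);
  parent.appendChild(node); // the node object goes in unaltered
  return node;
}

// In a browser (the "#comments" selector is an assumption):
// appendAsText(document.querySelector("#comments"), "<img src=x onload='alert(1)'>");
// The markup is shown as literal characters; no img element is created.
```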

Note: As pointed out by Ben Vinegar, using createTextNode is not a magic bullet: using it to escape the string, then using textContent or innerHTML to get the escaped text out and doing other things with it, does not protect you in those subsequent uses. In particular, the escapeHtml method in Peter Brown's answer below is insecure if used to populate attributes.

Decoded string with html entities does not equal string literal

As pointed out by Andreas in the comments, I forgot about the byte representation of the string.

Look at this example:

toBytes('foo bar') -> Uint8Array(7) [102, 111, 111, 32, 98, 97, 114]
toBytes(decodeHtml('foo&nbsp;bar')) -> Uint8Array(8) [102, 111, 111, 194, 160, 98, 97, 114]

In hindsight, it's pretty obvious, because the regular space and the non-breaking space are (of course) different characters.
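The toBytes helper is not shown in the answer. A minimal sketch, assuming it simply UTF-8 encodes the string with TextEncoder (which would match the byte values listed), could look like this. `equalIgnoringNbsp` is a hypothetical helper added to show one way of comparing the two strings once the difference is understood.

```javascript
// Assumed implementation of toBytes: UTF-8 encode with the standard TextEncoder.
function toBytes(text) {
  return new TextEncoder().encode(text);
}

// Hypothetical helper: compare two strings while treating U+00A0 as a plain space.
function equalIgnoringNbsp(a, b) {
  const normalize = (s) => s.replace(/\u00A0/g, " ");
  return normalize(a) === normalize(b);
}

const decoded = "foo\u00A0bar"; // what decoding "foo&nbsp;bar" yields
console.log(Array.from(toBytes("foo bar")).join(" ")); // 102 111 111 32 98 97 114
console.log(Array.from(toBytes(decoded)).join(" "));   // 102 111 111 194 160 98 97 114
console.log(decoded === "foo bar");                    // false
console.log(equalIgnoringNbsp(decoded, "foo bar"));    // true
```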

Decode HTML Entities in JS to Textbox Value

You need to use a DOMParser, as referenced here.
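A minimal sketch of that approach, assuming a browser environment, might look like the following. The name `decodeWithDOMParser`, the optional `Parser` parameter, and the `#myTextbox` selector are all hypothetical; the parameter is there only so a substitute can be supplied where DOMParser does not exist (Node.js, for instance).

```javascript
// Hypothetical helper. Parser defaults to the browser's DOMParser.
function decodeWithDOMParser(input, Parser = globalThis.DOMParser) {
  if (typeof Parser !== "function") {
    throw new Error("DOMParser is not available in this environment");
  }
  // The parsed document is inert: scripts in it are not executed and
  // it is never attached to the page.
  const doc = new Parser().parseFromString(input, "text/html");
  return doc.documentElement.textContent;
}

// In a browser (the "#myTextbox" selector is an assumption):
// document.querySelector("#myTextbox").value = decodeWithDOMParser("Chris&apos; corner");
```

Note that textContent keeps only the text of the parsed document, so any tags in the input are dropped rather than preserved.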