JavaScript Decoding HTML Entities

HTML Entity Decode

You could try something like:

var Title = $('<textarea />').html("Chris' corner").text();console.log(Title);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

What's the right way to decode a string that has special HTML entities in it?

This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.

function decodeHtml(html) {
var txt = document.createElement("textarea");
txt.innerHTML = html;
return txt.value;
}

Example: http://jsfiddle.net/k65s3/

Input:

Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Output:

Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>

A plain JavaScript way to decode HTML entities, works on both browsers and Node

There are many similar questions and useful answers in stackoverflow but I can't find a way works both on browsers and Node.js. So I'd like to share my opinion.

For html codes like   < > ' and even Chinese characters.

I suggest to use this function. (Inspired by some other answers)

function decodeEntities(encodedString) {
var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
var translate = {
"nbsp":" ",
"amp" : "&",
"quot": "\"",
"lt" : "<",
"gt" : ">"
};
return encodedString.replace(translate_re, function(match, entity) {
return translate[entity];
}).replace(/&#(\d+);/gi, function(match, numStr) {
var num = parseInt(numStr, 10);
return String.fromCharCode(num);
});
}

This implement also works in Node.js environment.

decodeEntities("哈哈 '这个'&"那个"好玩<>") //哈哈 '这个'&"那个"好玩<>

As a new user, I only have 1 reputation :(

I can't make comments or answers to existing posts so that's the only way I can do for now.

Edit 1

I think this answer works even better than mine. Although no one gave him up vote.

Encode HTML entities in JavaScript

You can use regex to replace any character in a given unicode range with its html entity equivalent. The code would look something like this:

var encodedStr = rawStr.replace(/[\u00A0-\u9999<>\&]/g, function(i) {
return '&#'+i.charCodeAt(0)+';';
});

This code will replace all characters in the given range (unicode 00A0 - 9999, as well as ampersand, greater & less than) with their html entity equivalents, which is simply &#nnn; where nnn is the unicode value we get from charCodeAt.

See it in action here: http://jsfiddle.net/E3EqX/13/ (this example uses jQuery for element selectors used in the example. The base code itself, above, does not use jQuery)

Making these conversions does not solve all the problems -- make sure you're using UTF8 character encoding, make sure your database is storing the strings in UTF8. You still may see instances where the characters do not display correctly, depending on system font configuration and other issues out of your control.

Documentation

  • String.charCodeAt - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt
  • HTML Character entities - http://www.chucke.com/entities.html

Native JavaScript or ES6 way to encode and decode HTML entities?

There is no native function in the JavaScript API that convert ASCII characters to their "html-entities" equivalent.
Here is a beginning of a solution and an easy trick that you may like

Unescape HTML entities in JavaScript?

EDIT: You should use the DOMParser API as Wladimir suggests, I edited my previous answer since the function posted introduced a security vulnerability.

The following snippet is the old answer's code with a small modification: using a textarea instead of a div reduces the XSS vulnerability, but it is still problematic in IE9 and Firefox.

function htmlDecode(input){
var e = document.createElement('textarea');
e.innerHTML = input;
// handle case of empty input
return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}

htmlDecode("<img src='myimage.jpg'>");
// returns "<img src='myimage.jpg'>"

Basically I create a DOM element programmatically, assign the encoded HTML to its innerHTML and retrieve the nodeValue from the text node created on the innerHTML insertion. Since it just creates an element but never adds it, no site HTML is modified.

It will work cross-browser (including older browsers) and accept all the HTML Character Entities.

EDIT: The old version of this code did not work on IE with blank inputs, as evidenced here on jsFiddle (view in IE). The version above works with all inputs.

UPDATE: appears this doesn't work with large string, and it also introduces a security vulnerability, see comments.

How to decode HTML entities using jQuery?

Security note: using this answer (preserved in its original form below) may introduce an XSS vulnerability into your application. You should not use this answer. Read lucascaro's answer for an explanation of the vulnerabilities in this answer, and use the approach from either that answer or Mark Amery's answer instead.

Actually, try

var encodedStr = "This is fun & stuff";
var decoded = $("<div/>").html(encodedStr).text();
console.log(decoded);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div/>

Javascript decoding html entities

var text = '<p>name</p><p><span style="font-size:xx-small;">ajde</span></p><p><em>da</em></p>';
var decoded = $('<textarea/>').html(text).text();
alert(decoded);

This sets the innerHTML of a new element (not appended to the page), causing jQuery to decode it into HTML, which is then pulled back out with .text().

Live demo.

Decode HTML entities in JavaScript?

I have on my utility belt this tiny function always:

function htmlDecode(input){
var e = document.createElement('div');
e.innerHTML = input;
return e.childNodes[0].nodeValue;
}

htmlDecode("&"); // "&"
htmlDecode(">"); // ">"

It will work for all HTML Entities.

Edit: Since you aren't in a DOM environment, I think you will have to do it by the "hard" way:

function htmlDecode (input) {
return input.replace(/&/g, "&")
.replace(/</g, "<")
.replace(/>/g, ">");
//...
}

If you don't like the chained replacements, you could build an object to store your entities, e.g.:

function htmlDecode (input) {
var entities= {
"&": "&",
"<": "<",
">": ">"
//....
};

for (var prop in entities) {
if (entities.hasOwnProperty(prop)) {
input = input.replace(new RegExp(prop, "g"), entities[prop]);
}
}
return input;
}

Decode HTML Entities in JS to Textbox Value

You need to a use DOMParser as referenced here.

//this function does the unescaping
function htmlDecode(input) {
var doc = new DOMParser().parseFromString(input, "text/html");
return doc.documentElement.textContent;
}

var str="Stars & Stripes"; //arbitrary escaped string

//render the escaped string into the div as-is
document.getElementById("printhere_div").innerHTML = str;

//set the textarea value using the unescaped string
document.getElementById("printhere_textarea").value = htmlDecode(str);
<div id="printhere_div">target div</div>
<textarea id="printhere_textarea">target textarea</textarea>


Related Topics



Leave a reply



Submit