How to Reverse HTMLentities()

How to reverse htmlentities()?

If you use htmlentities() to encode, you can use html_entity_decode() to reverse the process:

html_entity_decode()

Convert all HTML entities to their applicable characters.

html_entity_decode() is the opposite of htmlentities() in that it converts all HTML entities in the string to their applicable characters.

e.g.

$myCaption = 'áéí';

// encode
$myCaptionEncoded = htmlentities($myCaption, ENT_QUOTES, 'UTF-8');

// reverse (decode); pass the same flags and charset that were used for encoding
$myCaptionDecoded = html_entity_decode($myCaptionEncoded, ENT_QUOTES, 'UTF-8');

htmlentities and html_entity_decode do not behave as inverses

htmlentities uses your default_charset php.ini value for its encoding by default. If you aren't using a charset that supports the entities you're converting, it may not behave as expected. Try this and see if you get different results.

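// Pass an explicit flags constant; passing null for the flags argument is deprecated as of PHP 8.1.
htmlentities($str, ENT_QUOTES, 'UTF-8');

html_entity_decode($str, ENT_QUOTES, 'UTF-8');

mb_substr($str, 0, 25, 'UTF-8');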

http://php.net/htmlentities

http://php.net/html_entity_decode

http://php.net/manual/en/function.mb-substr.php

HTML Entity Decode

You could try something like:

var Title = $('<textarea />').html("Chris&#39; corner").text();
console.log(Title);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

HTML entities to normal strings in PHP

Use the function html_entity_decode(). See: http://php.net/manual/en/function.html-entity-decode.php

This function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type — i.e., for XML, this function does not decode named entities that might be defined in some DTD — and b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.

Inverse htmlentities / html_entity_decode

My version using regular expressions:

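$string = '<code> <div> blabla </div> </code>';
// The /e modifier was removed in PHP 7, so use preg_replace_callback() instead.
$new_string = preg_replace_callback(
    '/(.*?)(<.*?>|$)/s',
    function ($m) {
        // $m[1] is the text node, $m[2] is the tag (or end of string) that follows it.
        return html_entity_decode($m[1]) . htmlentities($m[2]);
    },
    $string
);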

It matches each text node together with the tag that follows it, then applies html_entity_decode to the text node and htmlentities to the tag.

Decode HTML entities in Python string?

Python 3.4+

Use html.unescape():

import html
print(html.unescape('&pound;682m'))

FYI html.parser.HTMLParser.unescape is deprecated. It was supposed to be removed in 3.5 but was left in by mistake, and it was finally removed in Python 3.9.
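To tie this back to the original question about reversing an encode step, here is a minimal sketch of a round trip in Python 3.4+. It assumes html.escape() as the encoding side, which only escapes &, <, > and (with quote=True) quotes, so it is a rough counterpart of PHP's htmlspecialchars() rather than htmlentities(). The sample strings are made up for illustration.

```python
import html

# Hypothetical sample text containing quotes, markup and a non-ASCII character.
original = "Chris' corner: £682m & <b>more</b>"

# Encode: escapes &, <, > and, because quote=True, both quote characters.
encoded = html.escape(original, quote=True)

# Decode: html.unescape() reverses the escaping done above.
decoded = html.unescape(encoded)

print(encoded)
print(decoded == original)

# html.unescape() also handles named and numeric character references.
mixed = html.unescape('&pound;682m &#163; &#xA3;')
print(mixed)
```

Because html.escape() leaves characters such as £ untouched, the string has to be written out in an encoding that can represent them (UTF-8, for instance).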


Python 2.6-3.3

You can use HTMLParser.unescape() from the standard library:

  • For Python 2.6-2.7 it's in HTMLParser
  • For Python 3 it's in html.parser

>>> try:
...     # Python 2.6-2.7
...     from HTMLParser import HTMLParser
... except ImportError:
...     # Python 3
...     from html.parser import HTMLParser
...
>>> h = HTMLParser()
>>> print(h.unescape('&pound;682m'))
£682m

You can also use the six compatibility library to simplify the import:

>>> from six.moves.html_parser import HTMLParser
>>> h = HTMLParser()
>>> print(h.unescape('&pound;682m'))
£682m
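If the same code has to run on both old and new interpreters, one possible approach is to prefer html.unescape() and fall back to the parser method only when it is missing. This is a sketch rather than a tested compatibility layer; the name unescape is just a local alias chosen here.

```python
try:
    # Python 3.4+: module-level function
    from html import unescape
except ImportError:
    try:
        # Python 3.0-3.3
        from html.parser import HTMLParser
    except ImportError:
        # Python 2.6-2.7
        from HTMLParser import HTMLParser
    # Fall back to the (since removed) parser method on old interpreters.
    unescape = HTMLParser().unescape

print(unescape('&pound;682m'))
```

Trying the newest location first means the deprecated method is never touched on interpreters where html.unescape() exists.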

How to decode HTML entities using jQuery?

Security note: using this answer (preserved in its original form below) may introduce an XSS vulnerability into your application. You should not use this answer. Read lucascaro's answer for an explanation of the vulnerabilities in this answer, and use the approach from either that answer or Mark Amery's answer instead.

Actually, try

var encodedStr = "This is fun &amp; stuff";
var decoded = $("<div/>").html(encodedStr).text();
console.log(decoded);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

How do I encode/decode HTML entities in Ruby?

HTMLEntities can do it:

: jmglov@laurana; sudo gem install htmlentities
Successfully installed htmlentities-4.2.4
: jmglov@laurana; irb
irb(main):001:0> require 'htmlentities'
=> []
irb(main):002:0> HTMLEntities.new.decode "&iexcl;I&#39;m highly annoyed with character references!"
=> "¡I'm highly annoyed with character references!"

What's the right way to decode a string that has special HTML entities in it?

This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.

function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

Example: http://jsfiddle.net/k65s3/

Input:

Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Output:

Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>

htmlentities() makes Chinese characters unusable

Have you tried using htmlspecialchars?

I currently use that in production and it's fine.

$foo = "我的名字叫萨沙";
echo '<textarea>' . htmlspecialchars($foo) . '</textarea>';

Alternately,

$str = "&#20320;&#22909;";
// Note: the 'HTML-ENTITIES' encoding of mbstring is deprecated as of PHP 8.2.
echo mb_convert_encoding($str, 'UTF-8', 'HTML-ENTITIES');

As found on http://www.techiecorner.com/129/php-how-to-convert-iso-character-htmlentities-to-utf-8/


