Decoding Numeric HTML Entities via PHP

Decoding numeric html entities via PHP

html_entity_decode already does what you're looking for:

$string = '&#146;';

echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');

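It will return a single character (binary hex: c292, the UTF-8 encoding of U+0092).

That character is PRIVATE USE TWO (U+0092). It is a C1 control character, not a printable one. Depending on your PHP version, configuration and the document-type flag you pass, html_entity_decode might not return it at all.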

There is one more quirk to keep in mind:

But in HTML (other than XHTML, which uses XML rules), it's a long-standing browser quirk that character references in the range &#128; to &#159; are misinterpreted to mean the characters associated with bytes 128 to 159 in the Windows Western code page (cp1252) instead of the Unicode characters with those code points. The HTML5 standard finally documents this behaviour.

See: &#146; is getting converted as “\u0092” by nokogiri in ruby on rails
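To get what a browser would display, rather than the raw control character, one option is to apply the HTML5 remapping yourself before decoding. This is a sketch only, assuming PHP 7+ and UTF-8 output. decode_like_browser() is a hypothetical helper name. The table is the HTML5 replacement table for numeric references in the &#128; to &#159; range.

```php
<?php
// Sketch: decode numeric references the way an HTML5 browser does.
// Assumes PHP 7+ and UTF-8 output; decode_like_browser() is a hypothetical name.
function decode_like_browser(string $html): string
{
    // HTML5 replacement table: a reference &#128;..&#159; that names a cp1252
    // byte is remapped to that byte's Unicode character. 0x81, 0x8D, 0x8F,
    // 0x90 and 0x9D have no entry here and are not remapped.
    $map = [
        0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
        0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
        0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
        0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
        0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
        0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
        0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
    ];

    // Rewrite decimal (&#146;) and hex (&#x92;) references found in the table.
    $html = preg_replace_callback(
        '/&#(?:([0-9]{1,7})|[xX]([0-9a-fA-F]{1,6}));/',
        function (array $m) use ($map): string {
            $code = $m[1] !== '' ? (int) $m[1] : hexdec($m[2]);
            return isset($map[$code]) ? '&#' . $map[$code] . ';' : $m[0];
        },
        $html
    );

    // Everything else goes through PHP's normal decoder.
    return html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_like_browser('It&#146;s &#x93;quoted&#148;'), "\n"; // It’s “quoted”
```

Only the 128 to 159 range is rewritten. Named entities and all other numeric references are still handled by html_entity_decode itself.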

convert arabic numeric html entities to chars via PHP

Your PHP version is probably older than 5.4.0, so html_entity_decode is not using UTF-8 by default. From the manual, on the encoding parameter:

Encoding to use. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.

http://php.net/manual/en/function.html-entity-decode.php


Try the following:

$decoded_string = html_entity_decode($string, ENT_COMPAT | ENT_HTML401, "UTF-8");
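To see what the encoding argument changes, the sketch below decodes a short Arabic word twice. Once it uses UTF-8, and once it uses ISO-8859-1, the pre-5.4.0 default. The sample string is made up for illustration and is not taken from the question.

```php
<?php
// Sample input (illustrative): the Arabic word "marhaban" as decimal references,
// i.e. U+0645 U+0631 U+062D U+0628 U+0627.
$string = '&#1605;&#1585;&#1581;&#1576;&#1575;';

// Explicit UTF-8: each reference becomes a multi-byte UTF-8 character.
$utf8 = html_entity_decode($string, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// ISO-8859-1 (the default before PHP 5.4.0) has no Arabic letters, so the
// references cannot be represented and are left exactly as they were.
$latin1 = html_entity_decode($string, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

echo $utf8, "\n";   // مرحبا
echo $latin1, "\n"; // &#1605;&#1585;&#1581;&#1576;&#1575;
```

If the second output is what you see, passing "UTF-8" explicitly is the fix.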


PHP How to encode text to numeric entity?

Your converter is converting your LaTeX into MathML, not HTML entities. You need something that converts directly into HTML character references, or a MathML to HTML character reference converter.

One option is htmlentities:

htmlentities($symbolsToEncode, ENT_XML1, 'UTF-8');

http://pt1.php.net/htmlentities

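Be aware, though, that htmlentities only produces the named entities defined for the chosen document type. With ENT_XML1 those are just &amp;, &lt;, &gt;, &quot; and &apos;, so a symbol such as ∑ comes back unchanged rather than as a numeric reference. ENT_SUBSTITUTE is not a document type. It is a flag you combine with one (ENT_XML1 | ENT_SUBSTITUTE), and it only affects invalid code unit sequences. It replaces them with U+FFFD (UTF-8) or &#xFFFD; (other encodings) instead of returning an empty string.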
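A quick way to check both points is shown below. The sample text is made up for illustration. The second half feeds in a stray 0xFF byte as an example of invalid UTF-8.

```php
<?php
// Sample text (illustrative): a math symbol plus characters XML must escape.
$text = "\u{2211} & <x>";

// With ENT_XML1 only the five XML entities are available, so htmlentities
// behaves like htmlspecialchars and leaves the symbol as a raw character.
echo htmlentities($text, ENT_XML1, 'UTF-8'), "\n"; // ∑ &amp; &lt;x&gt;

// ENT_SUBSTITUTE only changes what happens to invalid UTF-8: without it the
// result is an empty string, with it the bad byte becomes U+FFFD.
var_dump(htmlentities("a\xFFb", ENT_XML1, 'UTF-8') === '');                            // bool(true)
var_dump(htmlentities("a\xFFb", ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8') === "a\u{FFFD}b"); // bool(true)
```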

As an alternative, you could use strtr to convert the characters to something you specify:

$chars = array(
    "\u{8484}" => "&#x8484;", // needs PHP 7+; "\x8484" would be byte 0x84 followed by "84"
    // ...
);

$convertedXML = strtr($xml, $chars);

http://php.net/strtr

Someone has done something similar on GitHub.
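If what you need is a numeric character reference for every non-ASCII character, you do not have to maintain a strtr table by hand. The sketch below is one way to do it with core PHP only (7.4+ for the arrow function). utf8_codepoint() and to_numeric_entities() are hypothetical helper names. Emitting hex references for everything outside ASCII is an assumption about what your XML needs. If the mbstring extension is available, mb_encode_numericentity() is the library route to the same result.

```php
<?php
// Hypothetical helper: code point of a single, valid UTF-8 character.
function utf8_codepoint(string $ch): int
{
    $b = array_values(unpack('C*', $ch)); // byte values, re-indexed from 0
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical helper: replace every non-ASCII character with &#x...;.
function to_numeric_entities(string $utf8): string
{
    // The /u modifier makes PCRE match whole UTF-8 characters.
    $out = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        fn (array $m): string => sprintf('&#x%X;', utf8_codepoint($m[0])),
        $utf8
    );
    if ($out === null) { // preg_* functions return null on error, e.g. invalid UTF-8
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

echo to_numeric_entities("a \u{2211} b \u{8484}"), "\n"; // a &#x2211; b &#x8484;
```

Hex references are valid in both XML and HTML, so the output can be dropped into either.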

HTML entities to normal strings in PHP

See: http://php.net/manual/en/function.html-entity-decode.php

Use html_entity_decode(). From the manual:

This function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type — i.e., for XML, this function does not decode named entities that might be defined in some DTD — and b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.
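Both conditions in that passage can be checked directly. In the sketch below the sample strings are illustrative. &nbsp; survives in XML mode because XML predefines only five named entities. &#8364; survives under ISO-8859-1 because the euro sign is not in that character set.

```php
<?php
// a) Document type: XML 1.0 predefines only &amp; &lt; &gt; &quot; &apos;,
//    so &nbsp; is left as is, while numeric &#65; and &apos; are decoded.
$xml = html_entity_decode('&nbsp;&#65;&apos;', ENT_QUOTES | ENT_XML1, 'UTF-8');
echo $xml, "\n"; // &nbsp;A'

// b) Encoding: e-acute exists in ISO-8859-1 (byte 0xE9), the euro sign does not.
$latin1 = html_entity_decode('&eacute;&#8364;', ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');
var_dump($latin1 === "\xE9&#8364;"); // bool(true)
```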

force html entity to display although encoding is enabled

There is no need to bypass the sanitizing of an HTML entity; it is there for a purpose.

When you need the values on the server side or in other functions, decode them back to their original form.

In JavaScript:

decodeHtml('string1 string2')

Live example: http://jsfiddle.net/pranavq212/xasjyjtk/1/

// Let the browser parse the entities inside a textarea, then read back the plain text
function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

// Decode the input field's value on submit and show the result
document.getElementById('form').onsubmit = function(e) {
    e.preventDefault();
    var input = document.getElementById('input').value;
    var output = decodeHtml(input);
    alert(output);
}

input {
    width: 100%;
    display: block;
}

<form id="form">
    <input type="text" id="input" placeholder="input" value="string1  string2">
    <input type="submit" value="alert(input)">
</form>

Convert HTML entities in Json back to characters

Here is the solution. I needed to

  1. convert &amp; to & to standardize encoding systems;
  2. convert all HTML entities to their corresponding characters.

Here is the final code. Many thanks to all for your comments and suggestions.

Full code and online test here: https://www.tehplayground.com/zythX4MUdF3ric4l

array_walk_recursive($data, function (&$item, $key) {
    if (is_string($item)) {
        $item = str_replace("&amp;", "&", $item); // 1. Replace &amp; by &
        $item = html_entity_decode($item);     // 2. Convert HTML entities to their corresponding characters
    }
});
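Here is a self-contained run of the same two steps. It uses sample JSON, made up for illustration, in which entities were escaped twice (&amp;eacute;, &amp;amp;). Unlike the snippet above, it passes ENT_QUOTES | ENT_HTML5 and 'UTF-8' explicitly. That is an assumption about the data, not something the original answer specifies.

```php
<?php
// Sample input (illustrative): entities that were HTML-escaped twice.
$json = '{"title":"Caf&amp;eacute; &amp;amp; bar","tags":["&lt;b&gt;","plain"],"count":3}';
$data = json_decode($json, true);

array_walk_recursive($data, function (&$item) {
    if (is_string($item)) {                        // leave ints, bools and null alone
        $item = str_replace('&amp;', '&', $item);  // 1. undo the extra &amp; level
        $item = html_entity_decode($item, ENT_QUOTES | ENT_HTML5, 'UTF-8'); // 2. decode
    }
});

echo json_encode($data, JSON_UNESCAPED_UNICODE), "\n";
// {"title":"Café & bar","tags":["<b>","plain"],"count":3}
```

One trade-off of step 1 is that it also removes one legitimate level of escaping. Text that really meant the literal string "&lt;" (stored as "&amp;lt;") ends up as "<".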

