How to Decode Numeric HTML Entities in PHP

Decoding numeric html entities via PHP

html_entity_decode already does what you're looking for:

$string = '&#146;';

echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');

It will return the character:

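U+0092   binary hex: c292

This is PRIVATE USE TWO (U+0092), a C1 control character rather than a printable one. Because of that, your PHP configuration, version, or build might leave the reference undecoded and not return the character at all.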
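To check what your own PHP build does with this reference, one option is to dump the result as hex under a few document-type flags. This is only a sketch: the variable names are made up for illustration, and which of the calls actually decode the reference depends on your PHP version.

```php
<?php
$string = '&#146;';

// Try the same reference under several document types.
$flagSets = [
    'ENT_HTML401' => ENT_COMPAT | ENT_HTML401,
    'ENT_XML1'    => ENT_COMPAT | ENT_XML1,
    'ENT_HTML5'   => ENT_COMPAT | ENT_HTML5,
];

$results = [];
foreach ($flagSets as $name => $flags) {
    $decoded = html_entity_decode($string, $flags, 'UTF-8');
    // "c292" means U+0092 was produced. If the reference was left alone,
    // you get the hex of the original six-character string instead.
    $results[$name] = bin2hex($decoded);
    echo $name, ': ', $results[$name], "\n";
}
```

If a line does not print c292, that document type treats U+0092 as not permitted and leaves the reference as is.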

Also there are some more quirks:

But in HTML (other than XHTML, which uses XML rules), it's a long-standing browser quirk that character references in the range &#128; to &#159; are misinterpreted to mean the characters associated with bytes 128 to 159 in the Windows Western code page (cp1252) instead of the Unicode characters with those code points. The HTML5 standard finally documents this behaviour.

See: &#146; is getting converted as “\u0092” by nokogiri in ruby on rails
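If you want the browser behaviour described in that quote, a rough sketch is to remap the decimal references 128 to 159 through cp1252 before handing the rest to html_entity_decode. The helper name decode_entities_like_browser is hypothetical, the sketch assumes the mbstring extension is loaded, and it only handles decimal references (not hex ones such as &#x92;).

```php
<?php
// Hypothetical helper: treat &#128;..&#159; as cp1252 bytes, the way browsers do,
// then let html_entity_decode handle every other reference.
function decode_entities_like_browser(string $html): string
{
    $remapped = preg_replace_callback(
        '/&#(1[2-5][0-9]);/',
        function (array $m): string {
            $code = (int) $m[1];
            if ($code < 128 || $code > 159) {
                return $m[0]; // outside the quirk range: leave it untouched
            }
            // Some bytes in this range (e.g. 129) are undefined in cp1252;
            // what mbstring returns for those may vary between PHP versions.
            return mb_convert_encoding(chr($code), 'UTF-8', 'Windows-1252');
        },
        $html
    );

    return html_entity_decode($remapped, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo bin2hex(decode_entities_like_browser('&#146;')), "\n"; // e28099, i.e. U+2019
```

With this, &#146; comes out as the right single quotation mark (U+2019) that a browser would show, not as the control character U+0092.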

PHP How to encode text to numeric entity?

Your converter is converting your LaTeX into MathML, not HTML entities. You need something that converts directly into HTML character references, or a MathML to HTML character reference converter.

You should be able to use htmlentities:

htmlentities($symbolsToEncode, ENT_XML1, 'UTF-8');

http://pt1.php.net/htmlentities

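You can also add ENT_SUBSTITUTE to the flags (for example ENT_XML1 | ENT_SUBSTITUTE). Invalid code unit sequences are then replaced with a Unicode Replacement Character (U+FFFD), or with the hex character reference &#xFFFD; for non-Unicode encodings, instead of making the function return an empty string.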
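As far as I can tell, htmlentities with ENT_XML1 only escapes the few characters that XML predefines entities for, so it may not give you numeric references for math symbols. If that is what you see, mb_encode_numericentity from the mbstring extension is one alternative. A minimal sketch, assuming UTF-8 input and PHP 7 or later for the "\u{...}" escape; the sample string is made up:

```php
<?php
// Convert every character outside ASCII to a decimal numeric character reference.
// The convmap is [first code point, last code point, offset, mask].
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$symbolsToEncode = "x \u{221A} y"; // U+221A is the square root sign
$encoded = mb_encode_numericentity($symbolsToEncode, $convmap, 'UTF-8');

echo $encoded, "\n"; // x &#8730; y
```

ASCII characters, including & and <, are left alone by this convmap, so you would still run htmlspecialchars first if the text needs escaping.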

As an alternative, you could use strtr to convert the characters to something you specify:

$chars = array(
    // "\u{8484}" needs PHP 7+; "\x8484" would be byte 0x84 followed by the text "84"
    "\u{8484}" => "&#x8484;",
    // ...
);

$convertedXML = strtr($xml, $chars);

http://php.net/strtr

Someone has done something similar on GitHub.

HTML entities to normal strings in PHP

Use the function html_entity_decode(). See: http://php.net/manual/en/function.html-entity-decode.php

The manual describes it as follows:

This function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type — i.e., for XML, this function does not decode named entities that might be defined in some DTD — and b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.
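The document-type condition in that description is easy to see with a small comparison. The following is a sketch with a made-up input string, assuming PHP 7 or later:

```php
<?php
$input = '&eacute; &#233; &amp;';

// XML document type: only the predefined XML entities and numeric references are decoded.
$asXml = html_entity_decode($input, ENT_QUOTES | ENT_XML1, 'UTF-8');

// HTML5 document type: the named entity &eacute; is decoded as well.
$asHtml5 = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $asXml, "\n";   // &eacute; é &
echo $asHtml5, "\n"; // é é &
```

So if named entities come back untouched, check which ENT_* document-type flag you passed.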

convert arabic numeric html entities to chars via PHP

Your PHP version is probably older than 5.4.0, so html_entity_decode is not using UTF-8 by default. From the manual, on the encoding parameter:

Encoding to use. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.

http://php.net/manual/en/function.html-entity-decode.php


Try the following:

$decoded_string = html_entity_decode($string, ENT_COMPAT | ENT_HTML401, "UTF-8");


Convert html special characters into numeric codes with PHP

Expanding on my comment: Look at http://php.net/manual/en/function.ord.php

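$a = "à";
$b = ord($a); // 224 only if $a is a single byte (e.g. ISO-8859-1); ord() reads just the first byte
$c = mb_ord($a, "UTF-8"); // 224 for UTF-8 input (needs PHP 7.2+ with mbstring)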
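Since ord() only looks at a single byte, a whole string of UTF-8 text needs a multibyte-aware approach. Here is a rough sketch that turns every non-ASCII character into its numeric code. The function name to_numeric_references is hypothetical, and it assumes valid UTF-8 input plus PHP 7.2+ with mbstring for mb_ord.

```php
<?php
// Hypothetical helper: replace each non-ASCII character with &#N;,
// where N is its Unicode code point in decimal.
// Assumes $text is valid UTF-8 (preg_replace_callback fails on invalid input).
function to_numeric_references(string $text): string
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u', // with the u modifier this matches one whole character
        function (array $m): string {
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $text
    );
}

echo to_numeric_references("d\u{E9}j\u{E0} vu"), "\n"; // d&#233;j&#224; vu
```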

