Decoding numeric HTML entities via PHP
html_entity_decode already does what you're looking for:
$string = '&#146;';
echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');
It will return the character:
’ (UTF-8 bytes in hex: c2 92)
which is PRIVATE USE TWO (U+0092). As it is a private-use control character, your PHP configuration, version, or build might not return it at all.
Also there are some more quirks:
But in HTML (other than XHTML, which uses XML rules), it's a long-standing browser quirk that character references in the range &#128; to &#159; are misinterpreted to mean the characters associated with bytes 128 to 159 in the Windows Western code page (cp1252) instead of the Unicode characters with those code points. The HTML5 standard finally documents this behaviour.
See: &#146; is getting converted as “\u0092” by nokogiri in ruby on rails
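The cp1252 remapping described above can be reproduced in PHP before decoding. This is a minimal sketch, not a built-in feature: the helper name decode_with_cp1252_quirk is hypothetical, it only handles decimal references (not hex ones such as &#x92;), and the five byte values that cp1252 leaves undefined are passed through untouched.

```php
<?php
// Hypothetical helper (not part of PHP): rewrites &#128;..&#159; to the
// code points browsers actually render, then decodes as usual.
function decode_with_cp1252_quirk(string $html): string
{
    // cp1252 byte value => Unicode code point. Bytes 129, 141, 143, 144
    // and 157 are undefined in cp1252 and deliberately absent.
    $map = [
        128 => 0x20AC, 130 => 0x201A, 131 => 0x0192, 132 => 0x201E,
        133 => 0x2026, 134 => 0x2020, 135 => 0x2021, 136 => 0x02C6,
        137 => 0x2030, 138 => 0x0160, 139 => 0x2039, 140 => 0x0152,
        142 => 0x017D, 145 => 0x2018, 146 => 0x2019, 147 => 0x201C,
        148 => 0x201D, 149 => 0x2022, 150 => 0x2013, 151 => 0x2014,
        152 => 0x02DC, 153 => 0x2122, 154 => 0x0161, 155 => 0x203A,
        156 => 0x0153, 158 => 0x017E, 159 => 0x0178,
    ];

    // Swap each affected decimal reference for the reference of the
    // mapped code point; anything not in the map is left as it was.
    $fixed = preg_replace_callback(
        '/&#(1[2-5][0-9]);/',
        function (array $m) use ($map) {
            $n = (int) $m[1];
            return isset($map[$n]) ? '&#' . $map[$n] . ';' : $m[0];
        },
        $html
    );

    return html_entity_decode($fixed, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// &#146; becomes U+2019 (right single quotation mark).
echo bin2hex(decode_with_cp1252_quirk('&#146;')), "\n"; // e28099
```

Rewriting the reference first and decoding afterwards keeps all real decoding inside html_entity_decode, so the sketch does not need to build UTF-8 bytes by hand.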
PHP How to encode text to numeric entity?
Your converter is converting your LaTeX into MathML, not HTML entities. You need something that converts directly into HTML character references, or a MathML to HTML character reference converter.
You should be able to use htmlentities:
htmlentities($symbolsToEncode, ENT_XML1, 'UTF-8');
http://pt1.php.net/htmlentities
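You can change ENT_XML1 to ENT_SUBSTITUTE (or combine the two with |), and invalid code unit sequences will be returned as the Unicode Replacement Character U+FFFD (for UTF-8) or the hex character reference &#xFFFD; (for other encodings) instead of the function returning an empty string.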
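To see what ENT_SUBSTITUTE changes, here is a small sketch. The input string is made up for illustration and deliberately contains a byte that is not valid UTF-8; the flags are passed explicitly so the result does not depend on the default flags of the PHP version in use.

```php
<?php
// "\xFF" can never occur in valid UTF-8, so this input is deliberately malformed.
$symbolsToEncode = "caf\xFF";

// Without ENT_SUBSTITUTE (or ENT_IGNORE), invalid input makes htmlentities() return "".
$strict = htmlentities($symbolsToEncode, ENT_QUOTES | ENT_XML1, 'UTF-8');

// With ENT_SUBSTITUTE the bad byte becomes U+FFFD and the rest of the text survives.
$lenient = htmlentities($symbolsToEncode, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');
var_dump($lenient === "caf\u{FFFD}");
```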
As an alternative, you could use strtr to convert the characters to something you specify:
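$chars = array(
    // "\x8484" would be the byte 0x84 followed by the text "84"; "\u{8484}" (PHP 7.0+) is the character U+8484.
    "\u{8484}" => "&#x8484;",
    // ...
);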
$convertedXML = strtr($xml, $chars);
http://php.net/strtr
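If every non-ASCII character should become a numeric reference and a hand-written strtr table is impractical, the mbstring extension's mb_encode_numericentity is one option. This is a sketch that assumes mbstring is available and the input is UTF-8; the sample string is invented for illustration.

```php
<?php
// Requires the mbstring extension. Sample input is made up for this example.
$xml = "<mi>\u{8484}</mi> caf\u{E9}";

// Conversion map: [first code point, last code point, offset, mask].
// This one covers everything outside ASCII and leaves markup alone.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Decimal references.
$decimal = mb_encode_numericentity($xml, $convmap, 'UTF-8');

// Passing true as the fourth argument asks for hexadecimal references instead.
$hex = mb_encode_numericentity($xml, $convmap, 'UTF-8', true);

echo $decimal, "\n"; // <mi>&#33924;</mi> caf&#233;
echo $hex, "\n";
```

Unlike htmlentities, this leaves <, > and & as they are, so it can be run over already well-formed markup.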
Someone has done something similar on GitHub.
HTML entities to normal strings in PHP
See: http://php.net/manual/en/function.html-entity-decode.php
The function is html_entity_decode(). This function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type (i.e., for XML, this function does not decode named entities that might be defined in some DTD) and b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.
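Both conditions from the manual text can be observed directly. The following sketch uses three made-up inputs: a named entity that XML does not predefine, the same entity under HTML 4.01, and a character that does not exist in the target encoding.

```php
<?php
// a) &eacute; is not one of XML's predefined entities, so ENT_XML1 leaves it alone.
$asXml = html_entity_decode('&eacute;', ENT_QUOTES | ENT_XML1, 'UTF-8');

// Under HTML 4.01 the same entity is valid and is decoded to U+00E9.
$asHtml = html_entity_decode('&eacute;', ENT_QUOTES | ENT_HTML401, 'UTF-8');

// b) The euro sign has no representation in ISO-8859-1, so it stays an entity.
$latin1 = html_entity_decode('&euro;', ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');

var_dump($asXml === '&eacute;');
var_dump($asHtml === "\u{E9}");
var_dump($latin1 === '&euro;');
```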
Convert Arabic numeric HTML entities to chars via PHP
Your PHP version is probably older than 5.4.0, so html_entity_decode is not using UTF-8 by default. From the manual:
Encoding to use. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
http://php.net/manual/en/function.html-entity-decode.php
Try the following:
$decoded_string = html_entity_decode($string, ENT_COMPAT | ENT_HTML401, "UTF-8");
View output here on Codepad
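As a concrete check, here is the same call applied to a sample input. The entity string is an assumption chosen for illustration (the decimal references for the Arabic word "marhaba"); it is not taken from the original question.

```php
<?php
// Decimal references for U+0645 U+0631 U+062D U+0628 U+0627 (sample input).
$string = '&#1605;&#1585;&#1581;&#1576;&#1575;';

// Passing 'UTF-8' explicitly makes the result independent of the PHP version's default.
$decoded_string = html_entity_decode($string, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// Each of these Arabic letters takes two bytes in UTF-8: 5 characters, 10 bytes.
var_dump(strlen($decoded_string));
var_dump($decoded_string === "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}");
```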
Convert html special characters into numeric codes with PHP
Expanding on my comment: Look at http://php.net/manual/en/function.ord.php
$a = "à";
$b = ord($a); // 224 only if $a is a single ISO-8859-1 byte; for UTF-8 "à" ord() returns 195, the first byte
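Since ord only looks at the first byte, a multibyte-aware alternative is useful when the string is UTF-8. The sketch below assumes PHP 7.2 or later with the mbstring extension (for mb_ord) and writes the character as an escape so the result does not depend on how the source file is saved.

```php
<?php
// Requires PHP 7.2+ with mbstring for mb_ord().
$a = "\u{E0}"; // "à" encoded as UTF-8, independent of the source file's encoding

// ord() works per byte, so a UTF-8 "à" yields two values.
$bytes = array_map('ord', str_split($a));

// mb_ord() returns the Unicode code point of the first character.
$codePoint = mb_ord($a, 'UTF-8');

// Build the numeric character reference from the code point.
$reference = '&#' . $codePoint . ';';

echo implode(' ', $bytes), "\n"; // 195 160
echo $reference, "\n";           // &#224;
```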