How to convert all characters to their HTML entity equivalent using PHP
Here it goes (assumes UTF-8, but it's trivial to change):
function encode($str) {
    $str = mb_convert_encoding($str, 'UTF-32', 'UTF-8'); // big endian
    $split = str_split($str, 4);
    $res = "";
    foreach ($split as $c) {
        // Reassemble the code point from its four big-endian bytes
        $cur = 0;
        for ($i = 0; $i < 4; $i++) {
            $cur |= ord($c[$i]) << (8 * (3 - $i));
        }
        $res .= "&#" . $cur . ";";
    }
    return $res;
}
EDIT: Recommended alternative using unpack:
function encode2($str) {
    $str = mb_convert_encoding($str, 'UTF-32', 'UTF-8');
    $t = unpack("N*", $str); // one unsigned 32-bit big-endian integer per code point
    $t = array_map(function($n) { return "&#$n;"; }, $t);
    return implode("", $t);
}
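For comparison, here is a minimal sketch of the same idea without the UTF-32 round trip. It assumes PHP 7.2 or later (for mb_ord) and valid UTF-8 input; the function name encode_all_entities is made up for this example and is not part of the answer above.

```php
<?php
// Hypothetical helper: turns every character into a decimal numeric entity.
// Assumes valid UTF-8 input; preg_split() returns false on invalid UTF-8.
function encode_all_entities(string $str): string {
    $res = '';
    // The 'u' flag makes the empty pattern split between code points, not bytes.
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $res .= '&#' . mb_ord($ch, 'UTF-8') . ';';
    }
    return $res;
}

echo encode_all_entities("aé€"), "\n"; // prints &#97;&#233;&#8364;
```

Like encode() and encode2(), this encodes every character, ASCII included, which is what the question asked for.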
PHP: convert all characters to HTML entities
There are no (named) entities for those characters.
You can see the list here. If you want to convert to numerical entities, see this answer.
PHP - Convert Non-ASCII Characters to hex Entities Without mbstring
THIS IS NOT MY CODE.
I did a simple Google check using "php convert unicode to html" and found this:
https://af-design.com/2010/08/17/escaping-unicode-characters-to-html-entities-in-php/
Which had this:
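function unicode_escape_sequences($str)
{
    // json_encode() escapes non-ASCII characters as \uXXXX
    $working = json_encode($str);
    // Rewrite each \uXXXX escape as a hexadecimal HTML entity
    $working = preg_replace('/\\\u([0-9a-z]{4})/', '&#x$1;', $working);
    return json_decode($working);
}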
That web page also had a lot of other examples on it but this one looked like what you were looking for.
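One caveat with the json_encode() approach: characters outside the Basic Multilingual Plane (most emoji, for example) are escaped as two \uXXXX surrogates, so they would come out as two entities that do not name a real character. Below is a rough sketch that decodes the UTF-8 bytes by hand instead, using only PCRE and unpack(), so it needs no mbstring. It assumes valid UTF-8 input, and the name non_ascii_to_hex_entities is invented for this example.

```php
<?php
// Hypothetical helper: replaces each non-ASCII character with a hex entity.
// Assumes valid UTF-8; preg_replace_callback() returns null otherwise.
function non_ascii_to_hex_entities($str) {
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $m) {
        $bytes = array_values(unpack('C*', $m[0]));
        $n = count($bytes); // 2, 3 or 4 bytes for a non-ASCII character
        // Keep only the payload bits of the lead byte (5, 4 or 3 bits).
        $cp = $bytes[0] & (0xFF >> ($n + 1));
        // Each continuation byte contributes its low 6 bits.
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
        }
        return '&#x' . dechex($cp) . ';';
    }, $str);
}

echo non_ascii_to_hex_entities("café"), "\n"; // prints caf&#xe9;
```

ASCII characters pass through untouched, so any markup already in the string is left as it is.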
How to convert HTML entities like &#8211; to their character equivalents?
You need to define the target character set. &#8211; is not a valid character in the ISO-8859-1 character set (the default before PHP 5.4), so it's not decoded. Define UTF-8 as the output charset and it will decode:
echo html_entity_decode('&#8211;', ENT_NOQUOTES, 'UTF-8');
If at all possible, you should avoid HTML entities to begin with. I don't know where that encoded data comes from, but if you're storing it like this in the database or elsewhere, you're doing it wrong. Always store data UTF-8 encoded and only convert to HTML entities or otherwise escape for output when necessary.
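As a small illustration of that workflow, the sketch below decodes entity-laden input once, keeps the plain UTF-8 string, and escapes only at output time. The sample string and the variable names are made up for the example.

```php
<?php
// Decode once, when the data comes in; store the resulting UTF-8 string.
$stored = html_entity_decode('Pages 10&#8211;12 &amp; more', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Escape only when writing into HTML. htmlspecialchars() touches just the
// characters that are special in markup; the en dash stays a literal character.
$out = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $out, "\n";
```

Here $stored holds a real en dash and a bare ampersand, while $out has the ampersand escaped again as &amp;amp; for safe output.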
Convert special characters to HTML entities
Your test HTML page is not encoded in UTF-8; therefore, when mb_convert_encoding sees the copyright character (ordinal value 169) it doesn't know what to do with what it perceives as an invalid UTF-8 sequence.
You should therefore specify the correct input encoding when calling mb_convert_encoding:
$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'ISO-8859-1');
Alternatively, you can use something like
$html = htmlentities($html, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');
Note: I am answering your question directly, but you don't say what you need the conversion for. It's possible that there may be a better way to achieve your goal.
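To make the encoding point concrete, here is a short sketch with a single ISO-8859-1 copyright byte as input. The second half is an assumption about what you might prefer on newer PHP versions, where the 'HTML-ENTITIES' pseudo-encoding is deprecated (as of PHP 8.2): convert to UTF-8 first, then use mb_encode_numericentity(). The variable names are made up.

```php
<?php
// "\xA9" is the copyright sign as a single ISO-8859-1 byte (not valid UTF-8).
$latin1 = "\xA9 2010";

// Telling htmlentities() the real input encoding yields the named entity.
$named = htmlentities($latin1, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

// Alternative: normalise to UTF-8, then encode everything above ASCII
// as decimal numeric entities. The convmap is [start, end, offset, mask].
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$numeric = mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

echo $named, "\n";   // prints &copy; 2010
echo $numeric, "\n"; // prints &#169; 2010
```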