How to Convert All Characters to Their HTML Entity Equivalent Using PHP

Here it goes (assumes UTF-8, but it's trivial to change):

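function encode($str) {
    if ($str === '') {
        return '';
    }
    // Convert to UTF-32 big endian: every character becomes exactly 4 bytes.
    $str = mb_convert_encoding($str, 'UTF-32BE', 'UTF-8');
    $split = str_split($str, 4);

    $res = "";
    foreach ($split as $c) {
        // Rebuild the code point from its 4 bytes, most significant byte first.
        $cur = 0;
        for ($i = 0; $i < 4; $i++) {
            $cur |= ord($c[$i]) << (8 * (3 - $i));
        }
        $res .= "&#" . $cur . ";";
    }
    return $res;
}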

EDIT: Recommended alternative using unpack:

function encode2($str) {
    // UTF-32 big endian matches the "N" (unsigned 32-bit big endian) unpack format.
    $str = mb_convert_encoding($str, 'UTF-32BE', 'UTF-8');
    $t = unpack("N*", $str);
    $t = array_map(function($n) { return "&#$n;"; }, $t);
    return implode("", $t);
}
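
For comparison, here is a minimal sketch of the same idea on newer PHP. It assumes PHP 7.4+ with mbstring enabled, so that mb_str_split() and mb_ord() are available; the function name encode_all is only for illustration and is not part of the answer above.

```php
<?php
// Sketch: turn every character of a UTF-8 string into a decimal numeric entity.
// Assumes PHP 7.4+ with the mbstring extension and valid UTF-8 input.
function encode_all(string $str): string
{
    $out = '';
    // mb_str_split() splits by character, not by byte.
    foreach (mb_str_split($str, 1, 'UTF-8') as $ch) {
        // mb_ord() returns the Unicode code point of the character.
        $out .= '&#' . mb_ord($ch, 'UTF-8') . ';';
    }
    return $out;
}

echo encode_all("a\u{E9}\u{20AC}"), "\n"; // prints &#97;&#233;&#8364;
```

Like encode() and encode2(), this encodes every character, including plain ASCII letters.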

PHP: convert all characters to HTML entities

There are no (named) entities for those characters.

You can see the list here. If you want to convert to numerical entities, see this answer.
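
To check whether a given character has a named entity at all, one option is to see whether htmlentities() changes it. This is only a sketch: has_named_entity is a made-up helper name, and it assumes a single, valid UTF-8 character and the HTML 4.01 entity table.

```php
<?php
// Sketch: a character has a named entity if htmlentities() rewrites it.
// Assumes $char is one valid UTF-8 character; has_named_entity is a hypothetical name.
function has_named_entity(string $char): bool
{
    return htmlentities($char, ENT_QUOTES | ENT_HTML401, 'UTF-8') !== $char;
}

var_dump(has_named_entity("\u{E9}"));   // e with acute accent: bool(true), becomes &eacute;
var_dump(has_named_entity("\u{65E5}")); // CJK character U+65E5: bool(false), left as-is
```

Characters for which this returns false can only be written as numeric entities.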

PHP - Convert Non-ASCII Characters to hex Entities Without mbstring

THIS IS NOT MY CODE.

I did a simple Google check using "php convert unicode to html" and found this:

https://af-design.com/2010/08/17/escaping-unicode-characters-to-html-entities-in-php/

Which had this:

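function unicode_escape_sequences($str)
{
    // json_encode() writes non-ASCII characters as \uXXXX (lowercase hex).
    $working = json_encode($str);
    // Rewrite each \uXXXX escape as a hex entity. Characters outside the BMP
    // are emitted as two surrogate escapes, so they are not handled correctly.
    $working = preg_replace('/\\\u([0-9a-f]{4})/', '&#x$1;', $working);
    return json_decode($working);
}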

That page has a lot of other examples on it, but this one looked like what you were after.
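
If characters outside the Basic Multilingual Plane matter (emoji, for example), a different approach that still avoids mbstring is to match each non-ASCII character with a /u regex and decode its UTF-8 bytes by hand. This is a rough sketch, not the code from the linked page; utf8_codepoint and hex_entities are hypothetical names, and the input is assumed to be valid UTF-8.

```php
<?php
// Decode one UTF-8 encoded character (1 to 4 bytes) to its code point.
function utf8_codepoint(string $ch): int
{
    $bytes = array_values(unpack('C*', $ch));
    $n = count($bytes);
    if ($n === 1) {
        return $bytes[0];
    }
    // Strip the length marker bits from the lead byte (110xxxxx, 1110xxxx, 11110xxx).
    $cp = $bytes[0] & (0xFF >> ($n + 1));
    // Each continuation byte (10xxxxxx) contributes its low 6 bits.
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

// Replace every non-ASCII character with a hex entity; ASCII is left untouched.
function hex_entities(string $str): string
{
    $out = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return sprintf('&#x%X;', utf8_codepoint($m[0]));
        },
        $str
    );
    // preg_replace_callback() returns null on invalid UTF-8; fall back to the input.
    return $out ?? $str;
}

echo hex_entities("a\u{E9}\u{1F600}"), "\n"; // prints a&#xE9;&#x1F600;
```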

How to convert HTML entities like &#8211; to their character equivalents?

You need to define the target character set. The en dash that &#8211; stands for is not a valid character in ISO-8859-1 (the default character set before PHP 5.4), so it's not decoded. Define UTF-8 as the output charset and it will decode:

echo html_entity_decode('&#8211;', ENT_NOQUOTES, 'UTF-8');

If at all possible, you should avoid HTML entities to begin with. I don't know where that encoded data comes from, but if you're storing it like this in the database or elsewhere, you're doing it wrong. Always store data UTF-8 encoded and only convert to HTML entities or otherwise escape for output when necessary.
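
As a rough illustration of that advice, the sketch below decodes entity-laden input once, keeps plain UTF-8, and escapes only when writing HTML. The function names normalize_for_storage and escape_for_output are hypothetical, and the sample string is made up.

```php
<?php
// Sketch: decode entities once so that stored data is plain UTF-8.
function normalize_for_storage(string $input): string
{
    return html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Sketch: escape only at output time, and only what HTML requires.
function escape_for_output(string $utf8): string
{
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

$stored = normalize_for_storage('Caf&eacute; &#8211; R&amp;D');
echo escape_for_output($stored), "\n"; // the accented e and the en dash stay as UTF-8; & becomes &amp;
```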

Convert special characters to HTML entities

Your test HTML page is not encoded in UTF-8; therefore, when mb_convert_encoding sees the copyright character (ordinal value 169) it doesn't know what to do with what it perceives as an invalid UTF-8 sequence.

You should therefore specify the correct input encoding when calling mb_convert_encoding:

$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'ISO-8859-1');

Alternatively (the 'HTML-ENTITIES' target of mb_convert_encoding is deprecated as of PHP 8.2), you can use something like

$html = htmlentities($html, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

Note: I am answering your question directly, but you don't say what you need the conversion for. It's possible that there may be a better way to achieve your goal.
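
If the input encoding is not known in advance, one possible heuristic is to test whether the bytes form valid UTF-8 and fall back to ISO-8859-1 otherwise. This is a sketch under that assumption only: to_entities is a hypothetical name, it needs mbstring for mb_check_encoding(), and text in other encodings would be misdetected. Note that htmlentities() also escapes <, > and &.

```php
<?php
// Sketch: pick the input charset by checking for valid UTF-8, else assume ISO-8859-1.
// to_entities is a hypothetical helper; the detection is a heuristic, not a guarantee.
function to_entities(string $text): string
{
    $charset = mb_check_encoding($text, 'UTF-8') ? 'UTF-8' : 'ISO-8859-1';
    return htmlentities($text, ENT_COMPAT | ENT_HTML401, $charset);
}

echo to_entities("\xA9 2024"), "\n";     // ISO-8859-1 copyright byte: prints &copy; 2024
echo to_entities("\xC2\xA9 2024"), "\n"; // UTF-8 copyright sign: prints &copy; 2024
```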


