PHP: Convert Unicode Codepoint to UTF-8

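Probably the simplest solution is to rewrite each U+XXXX sequence as a hexadecimal HTML entity and let html_entity_decode turn it into UTF-8:

$utf8string = html_entity_decode(preg_replace('/U\+([0-9A-F]{4,6})/i', '&#x$1;', $string), ENT_NOQUOTES, 'UTF-8');

The pattern accepts four to six hex digits, so code points above U+FFFF are covered as well. Note that html_entity_decode also decodes any other HTML entities already present in $string.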
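If the input may already contain HTML entities that should stay untouched, the U+XXXX conversion can be sketched without the entity round trip. This is only a minimal sketch: the function name u_notation_to_utf8 is hypothetical, and it assumes the mbstring extension (mb_chr, PHP 7.2 or later) is available.

```php
<?php
// Hypothetical helper: replaces each U+XXXX sequence with its UTF-8 character.
// Assumes the mbstring extension is loaded (mb_chr needs PHP 7.2+).
function u_notation_to_utf8(string $input): string
{
    return preg_replace_callback(
        '/U\+([0-9A-F]{4,6})/i',
        static function (array $m): string {
            // hexdec() turns the captured hex digits into an integer code point.
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false for values that are not valid code points;
            // in that case the original text is left as it was.
            return $char === false ? $m[0] : $char;
        },
        $input
    );
}
```

Calling u_notation_to_utf8('U+0411 &amp; U+1F600') converts the two U+ sequences and leaves the &amp; entity as it is.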

Convert Unicode special characters to UTF-8

At the very least, your regular expression is looking for an uppercase U, while all your escape sequences use a lowercase u.

But your conversion script goes from JavaScript-escaped Unicode characters, to HTML entities, and back to a PHP string. This might be a saner solution (for this string):

$unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';
echo json_decode('"' . $unicode . '"');

Be careful though: json_decode returns null if the input string contains unescaped double quotes or raw newlines, because the wrapped value is then no longer valid JSON.
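One way around the quote and newline problem is to hand only the escape sequences themselves to json_decode and leave the rest of the string alone. The following is a minimal sketch under that assumption; the function name decode_js_escapes is hypothetical.

```php
<?php
// Hypothetical helper: decodes \uXXXX escapes and leaves all other text untouched.
function decode_js_escapes(string $input): string
{
    return preg_replace_callback(
        // Match a run of consecutive \uXXXX escapes so that surrogate pairs
        // (two escapes forming one character) are decoded together.
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        static function (array $m): string {
            $decoded = json_decode('"' . $m[0] . '"');
            // json_decode() returns null for invalid input such as a lone
            // surrogate; keep the original text in that case.
            return $decoded === null ? $m[0] : $decoded;
        },
        $input
    );
}
```

Because only the matched escapes are wrapped in quotes, quotes and newlines elsewhere in the input never reach the JSON parser.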

How to convert a string to UTF-8 code points in PHP

I found an answer, but that function returns an array.

I edited the function so that it returns a string.

function utf8_to_unicode($str) {
    $unicode = '';
    $values = array();
    $lookingFor = 1;
    $length = strlen($str);

    for ($i = 0; $i < $length; $i++) {
        $thisValue = ord($str[$i]);

        if ($thisValue < 128) {
            // Single-byte (ASCII) character.
            $unicode .= str_pad(strtoupper(dechex($thisValue)), 4, '0', STR_PAD_LEFT);
        } else {
            // The lead byte determines how many bytes the sequence has (2, 3 or 4).
            if (count($values) == 0) {
                $lookingFor = ($thisValue < 224) ? 2 : (($thisValue < 240) ? 3 : 4);
            }
            $values[] = $thisValue;
            if (count($values) == $lookingFor) {
                // Strip the marker bits from the lead byte, then append
                // six payload bits from each continuation byte.
                $number = $values[0] % (1 << (7 - $lookingFor));
                for ($j = 1; $j < $lookingFor; $j++) {
                    $number = ($number * 64) + ($values[$j] % 64);
                }
                // Code points above U+FFFF produce more than four hex digits.
                $unicode .= str_pad(strtoupper(dechex($number)), 4, '0', STR_PAD_LEFT);
                $values = array();
                $lookingFor = 1;
            }
        }
    }

    return $unicode;
} // utf8_to_unicode
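The byte arithmetic above can also be left to the mbstring extension. The sketch below is an alternative, not the original answer's method: the function name string_to_code_points and the U+XXXX output format are assumptions chosen for illustration, and it needs mb_str_split (PHP 7.4 or later) and mb_ord.

```php
<?php
// Hypothetical helper: returns one "U+XXXX" label per character of a UTF-8 string.
// Assumes the mbstring extension is loaded (mb_str_split needs PHP 7.4+).
function string_to_code_points(string $str): array
{
    $labels = array();
    foreach (mb_str_split($str, 1, 'UTF-8') as $char) {
        // mb_ord() returns the code point of the character as an integer.
        $labels[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $labels;
}
```

Returning an array keeps the code points separate, which avoids the ambiguity of concatenated hex digits once a character needs more than four of them; implode(' ', string_to_code_points($str)) gives a single string when one is needed.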

UTF-8 to Unicode Code Points

Converting one character set to another can be done with iconv:

http://php.net/manual/en/function.iconv.php

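Note that UTF-8 is already a Unicode encoding, so getting code points means decoding the bytes (for example to UTF-32, where every 4-byte unit is one code point) rather than switching to a different character set.

Another way is to use htmlentities with the right character set, although it only converts characters that have a named HTML entity: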
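As a rough illustration of the iconv route, the sketch below converts the string to UTF-32BE and reads each 4-byte unit as one code point. The function name utf8_code_points_via_iconv is hypothetical, and the iconv extension is assumed to be available.

```php
<?php
// Hypothetical helper: returns the code points of a UTF-8 string as integers.
// Assumes the iconv extension is loaded.
function utf8_code_points_via_iconv(string $str): array
{
    // UTF-32BE stores every code point as one 4-byte big-endian integer.
    $utf32 = iconv('UTF-8', 'UTF-32BE', $str);
    if ($utf32 === false || $utf32 === '') {
        return array(); // empty input, or input that is not valid UTF-8
    }
    // 'N*' reads unsigned 32-bit big-endian integers; unpack() keys start
    // at 1, so array_values() reindexes the result from 0.
    return array_values(unpack('N*', $utf32));
}
```

Each integer can then be formatted as needed, for example with sprintf('U+%04X', $codePoint).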

http://php.net/manual/en/function.htmlentities.php

PHP Unicode codepoint to character

You don't need to convert the integer to a hexadecimal string. Use IntlChar::chr instead (it requires the intl extension):

echo IntlChar::chr(127468);

Directly from the docs of IntlChar::chr:

Return Unicode character by code point value

How to convert a UTF-8 string to HEX codepoint in PHP?

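I use json_encode for multibyte characters and build the escape sequence by hand for ASCII characters.

function utf8toUnicode($str) {
    $unicode = "";
    $len = mb_strlen($str, 'UTF-8');
    for ($i = 0; $i < $len; $i++) {
        $utf8char = mb_substr($str, $i, 1, 'UTF-8');
        // Multibyte characters: json_encode emits the \uXXXX escape
        // (a surrogate pair for characters above U+FFFF).
        // ASCII characters: build the escape from the single byte.
        $unicode .= strlen($utf8char) > 1
            ? trim(json_encode($utf8char), '"')
            : ('\\u00' . bin2hex($utf8char));
    }
    return $unicode;
}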

$str = 'sÆs';

echo utf8toUnicode($str); // \u0073\u00c6\u0073

PHP - convert Unicode to character

"%uXXXX" is a non-standard scheme for URL-encoding Unicode characters. Apparently it was proposed but never really used. As such, there's hardly any standard function that can decode it into an actual UTF-8 sequence.

It's not too difficult to do it yourself though:

$string = '%u05E1%u05E2';
// Each %u escape carries exactly four hex digits; match them case-insensitively.
$string = preg_replace('/%u([0-9A-F]{4})/i', '&#x$1;', $string);
echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');

This converts the %uXXXX notation to HTML entity notation &#xXXXX;, which can be decoded to actual UTF-8 by html_entity_decode. The above outputs the characters "סע" in UTF-8 encoding.
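JavaScript's escape() writes characters above U+FFFF as two %uXXXX units (a UTF-16 surrogate pair), which the entity approach does not combine into one character. A minimal sketch that treats the escapes as UTF-16 code units instead is shown here; the function name decode_percent_u is hypothetical, and the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: decodes %uXXXX escapes by treating them as UTF-16 code units.
// Assumes the mbstring extension is loaded.
function decode_percent_u(string $input): string
{
    return preg_replace_callback(
        // Match a run of consecutive escapes so surrogate pairs stay together.
        '/(?:%u[0-9A-Fa-f]{4})+/',
        static function (array $m): string {
            // Drop the "%u" prefixes, leaving only the hex digits.
            $hex = str_replace('%u', '', $m[0]);
            // pack('H*') turns the hex digits into raw UTF-16BE bytes.
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $input
    );
}
```

Unlike the html_entity_decode version, this leaves any HTML entities in the input untouched, so decode_percent_u('%u05E1%u05E2 &amp;') keeps the trailing &amp; as it is.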


