How to Get the Character from Unicode Code Point in PHP

PHP Unicode codepoint to character

You don't need to convert integer to hexadecimal string, instead use IntlChar::chr:

echo IntlChar::chr(127468);

Directly from docs of IntlChar::chr:

Return Unicode character by code point value

How to get the character from unicode code point in PHP?

header('Content-Encoding: UTF-8');

function mb_html_entity_decode($string)
{
if (extension_loaded('mbstring') === true)
{
mb_language('Neutral');
mb_internal_encoding('UTF-8');
mb_detect_order(array('UTF-8', 'ISO-8859-15', 'ISO-8859-1', 'ASCII'));

return mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
}

return html_entity_decode($string, ENT_COMPAT, 'UTF-8');
}

function mb_ord($string)
{
if (extension_loaded('mbstring') === true)
{
mb_language('Neutral');
mb_internal_encoding('UTF-8');
mb_detect_order(array('UTF-8', 'ISO-8859-15', 'ISO-8859-1', 'ASCII'));

$result = unpack('N', mb_convert_encoding($string, 'UCS-4BE', 'UTF-8'));

if (is_array($result) === true)
{
return $result[1];
}
}

return ord($string);
}

function mb_chr($string)
{
return mb_html_entity_decode('&#' . intval($string) . ';');
}

var_dump(hexdec('010F'));

var_dump(mb_ord('ó')); // 243
var_dump(mb_chr(243)); // ó

Unicode character in PHP string

Because JSON directly supports the \uxxxx syntax the first thing that comes into my mind is:

$unicodeChar = '\u1000';
echo json_decode('"'.$unicodeChar.'"');

Another option would be to use mb_convert_encoding()

echo mb_convert_encoding('က', 'UTF-8', 'HTML-ENTITIES');

or make use of the direct mapping between UTF-16BE (big endian) and the Unicode codepoint:

echo mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');

How to get code point number for a given character in a utf-8 string?

Scott Reynen wrote a function to convert UTF-8 into Unicode. I found it looking at the PHP documentation.

function utf8_to_unicode( $str ) {

$unicode = array();
$values = array();
$lookingFor = 1;

for ($i = 0; $i < strlen( $str ); $i++ ) {
$thisValue = ord( $str[ $i ] );
if ( $thisValue < ord('A') ) {
// exclude 0-9
if ($thisValue >= ord('0') && $thisValue <= ord('9')) {
// number
$unicode[] = chr($thisValue);
}
else {
$unicode[] = '%'.dechex($thisValue);
}
} else {
if ( $thisValue < 128)
$unicode[] = $str[ $i ];
else {
if ( count( $values ) == 0 ) $lookingFor = ( $thisValue < 224 ) ? 2 : 3;
$values[] = $thisValue;
if ( count( $values ) == $lookingFor ) {
$number = ( $lookingFor == 3 ) ?
( ( $values[0] % 16 ) * 4096 ) + ( ( $values[1] % 64 ) * 64 ) + ( $values[2] % 64 ):
( ( $values[0] % 32 ) * 64 ) + ( $values[1] % 64 );
$number = dechex($number);
$unicode[] = (strlen($number)==3)?"%u0".$number:"%u".$number;
$values = array();
$lookingFor = 1;
} // if
} // if
}
} // for
return implode("",$unicode);

} // utf8_to_unicode

can I get the unicode value of a character or vise versa with php?

function _uniord($c) {
if (ord($c[0]) >=0 && ord($c[0]) <= 127)
return ord($c[0]);
if (ord($c[0]) >= 192 && ord($c[0]) <= 223)
return (ord($c[0])-192)*64 + (ord($c[1])-128);
if (ord($c[0]) >= 224 && ord($c[0]) <= 239)
return (ord($c[0])-224)*4096 + (ord($c[1])-128)*64 + (ord($c[2])-128);
if (ord($c[0]) >= 240 && ord($c[0]) <= 247)
return (ord($c[0])-240)*262144 + (ord($c[1])-128)*4096 + (ord($c[2])-128)*64 + (ord($c[3])-128);
if (ord($c[0]) >= 248 && ord($c[0]) <= 251)
return (ord($c[0])-248)*16777216 + (ord($c[1])-128)*262144 + (ord($c[2])-128)*4096 + (ord($c[3])-128)*64 + (ord($c[4])-128);
if (ord($c[0]) >= 252 && ord($c[0]) <= 253)
return (ord($c[0])-252)*1073741824 + (ord($c[1])-128)*16777216 + (ord($c[2])-128)*262144 + (ord($c[3])-128)*4096 + (ord($c[4])-128)*64 + (ord($c[5])-128);
if (ord($c[0]) >= 254 && ord($c[0]) <= 255) // error
return FALSE;
return 0;
} // function _uniord()

and

function _unichr($o) {
if (function_exists('mb_convert_encoding')) {
return mb_convert_encoding('&#'.intval($o).';', 'UTF-8', 'HTML-ENTITIES');
} else {
return chr(intval($o));
}
} // function _unichr()

PHP: Convert unicode codepoint to UTF-8

$utf8string = html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $string), ENT_NOQUOTES, 'UTF-8');

is probably the simplest solution.

Unicode codepoint escape syntax

You cannot use "\u{}" notation for conversion, use mb_chr() instead.

Example:

$unicode= 0x1f606;
echo mb_chr($unicode);

PHP - convert unicode to character

"%uXXXX" is a non-standard scheme for URL-encoding Unicode characters. Apparently it was proposed but never really used. As such, there's hardly any standard function that can decode it into an actual UTF-8 sequence.

It's not too difficult to do it yourself though:

$string = '%u05E1%u05E2';
$string = preg_replace('/%u([0-9A-F]+)/', '&#x$1;', $string);
echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');

This converts the %uXXXX notation to HTML entity notation &#xXXXX;, which can be decoded to actual UTF-8 by html_entity_decode. The above outputs the characters "סע" in UTF-8 encoding.

convert any string or a character in php to unicode code points

Use this function

function Unicode_decode($text) {
return implode(unpack('H*', iconv("UTF-8", "UCS-4BE", $text)));
}

If you want to have U+0000 use this :

for ($i=0; $i < strlen($word); $i++)
{
$wordConvert = Unicode_decode($word[$i]);
$result .= "U+" . substr($wordConvert, -4, 4) . "<br/>";

}

echo $result;


Related Topics



Leave a reply



Submit