How to Get the Unicode Value of a Character or Vice Versa with PHP

How can I get the Unicode value of a character, or vice versa, with PHP?

function _uniord($c) {
    // Decode the first UTF-8 sequence in $c into its code point.
    $h = ord($c[0]);
    if ($h <= 127) // 1 byte (ASCII)
        return $h;
    if ($h >= 192 && $h <= 223) // 2 bytes
        return ($h-192)*64 + (ord($c[1])-128);
    if ($h >= 224 && $h <= 239) // 3 bytes
        return ($h-224)*4096 + (ord($c[1])-128)*64 + (ord($c[2])-128);
    if ($h >= 240 && $h <= 247) // 4 bytes
        return ($h-240)*262144 + (ord($c[1])-128)*4096 + (ord($c[2])-128)*64 + (ord($c[3])-128);
    // The 5- and 6-byte forms below come from the original UTF-8 design; RFC 3629 no longer allows them.
    if ($h >= 248 && $h <= 251)
        return ($h-248)*16777216 + (ord($c[1])-128)*262144 + (ord($c[2])-128)*4096 + (ord($c[3])-128)*64 + (ord($c[4])-128);
    if ($h >= 252 && $h <= 253)
        return ($h-252)*1073741824 + (ord($c[1])-128)*16777216 + (ord($c[2])-128)*262144 + (ord($c[3])-128)*4096 + (ord($c[4])-128)*64 + (ord($c[5])-128);
    if ($h >= 254) // error: 254 and 255 never occur in UTF-8
        return FALSE;
    return 0; // 128-191: a continuation byte cannot start a character
} // function _uniord()

and

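function _unichr($o) {
    if (function_exists('mb_convert_encoding')) {
        // Build a numeric HTML entity and let mbstring decode it to UTF-8.
        return mb_convert_encoding('&#'.intval($o).';', 'UTF-8', 'HTML-ENTITIES');
    } else {
        // Fallback without mbstring: chr() returns a single byte, so this is only correct for ASCII (0-127).
        return chr(intval($o));
    }
} // function _unichr()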
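
On PHP 7.2 and later, the mbstring extension provides mb_ord() and mb_chr(), which cover both directions without hand-written byte arithmetic. The following is a minimal sketch, assuming mbstring is loaded and the strings are UTF-8. The wrapper names charToCodepoint and codepointToChar are hypothetical and only there for illustration.

```php
<?php
// Hypothetical wrappers around mbstring's built-in conversions (PHP 7.2+).

// Code point of the first character of a UTF-8 string
// (the manual documents false as the failure value).
function charToCodepoint(string $char)
{
    return mb_ord($char, 'UTF-8');
}

// UTF-8 string for a code point (the manual documents false as the failure value).
function codepointToChar(int $codepoint)
{
    return mb_chr($codepoint, 'UTF-8');
}

echo charToCodepoint('€'), "\n";          // 8364
echo codepointToChar(8364), "\n";         // €
echo dechex(charToCodepoint('😀')), "\n"; // 1f600
```

Because these functions work on code points directly, they also handle characters outside the Basic Multilingual Plane.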

How to convert Unicode in PHP?

Use iconv, which converts a string to a requested character encoding:

http://php.net/manual/en/function.iconv.php
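
As a rough illustration of what such a conversion does to the bytes, here is a small sketch. It assumes the iconv extension is available and that the source file itself is saved as UTF-8. The sample character is an arbitrary choice.

```php
<?php
// iconv(from_encoding, to_encoding, string) returns the converted bytes, or false on failure.
$utf8 = 'é'; // U+00E9, stored in this file as the UTF-8 bytes c3 a9

echo bin2hex($utf8), "\n";                               // c3a9
echo bin2hex(iconv('UTF-8', 'UTF-16BE', $utf8)), "\n";   // 00e9
echo bin2hex(iconv('UTF-8', 'ISO-8859-1', $utf8)), "\n"; // e9
```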

Detect Unicode Character Range in PHP

I've worked on something that detects the Unicode range of each character in a string. I've only put in Latin, Cyrillic (for Russian) and Armenian to start with. If anyone else needs this, you'll have to add further character ranges to the detectRanges function, taken from a source like http://jrgraphix.net/r/Unicode/. I'd like to see if there is a better way of doing that part. Make sure any alphabetic characters (the hex digits a-f) in the ranges are lowercase, because the ranges are compared against the lowercase hex strings that dechex produces.

mb_internal_encoding("UTF-8");
header( "Content-Type: text/html;charset=UTF-8", true );

class DetectUnicodeRanges
{
    # Convert a code point (as used in numeric HTML entities) to a UTF-8 string
    function entityToUTF8( $number )
    {
        if( $number < 0 )
            return false;

        # Replace ASCII characters
        if( $number < 128 )
            return chr( $number );

        # Replace illegal Windows characters
        if( $number < 160 )
        {
            switch( $number )
            {
                case 128: $conversion = 8364; break;
                case 129: $conversion = 160; break;
                case 130: $conversion = 8218; break;
                case 131: $conversion = 402; break;
                case 132: $conversion = 8222; break;
                case 133: $conversion = 8230; break;
                case 134: $conversion = 8224; break;
                case 135: $conversion = 8225; break;
                case 136: $conversion = 710; break;
                case 137: $conversion = 8240; break;
                case 138: $conversion = 352; break;
                case 139: $conversion = 8249; break;
                case 140: $conversion = 338; break;
                case 141: $conversion = 160; break;
                case 142: $conversion = 381; break;
                case 143: $conversion = 160; break;
                case 144: $conversion = 160; break;
                case 145: $conversion = 8216; break;
                case 146: $conversion = 8217; break;
                case 147: $conversion = 8220; break;
                case 148: $conversion = 8221; break;
                case 149: $conversion = 8226; break;
                case 150: $conversion = 8211; break;
                case 151: $conversion = 8212; break;
                case 152: $conversion = 732; break;
                case 153: $conversion = 8482; break;
                case 154: $conversion = 353; break;
                case 155: $conversion = 8250; break;
                case 156: $conversion = 339; break;
                case 157: $conversion = 160; break;
                case 158: $conversion = 382; break;
                case 159: $conversion = 376; break;
            }

            # Continue with the mapped code point so that it is UTF-8 encoded below
            # (returning $conversion here would return an integer, not a string)
            $number = $conversion;
        }

        if ( $number < 2048 )
            return chr( ($number >> 6 ) + 192 ) . chr( ( $number & 63 ) + 128 );
        if ( $number < 65536 )
            return chr( ( $number >> 12 ) + 224 ) . chr( ( ( $number >> 6 ) & 63 ) + 128 ) . chr( ( $number & 63 ) + 128 );
        if ( $number < 2097152 )
            return chr( ( $number >> 18 ) + 240 ) . chr( ( ( $number >> 12 ) & 63 ) + 128 ) . chr( ( ( $number >> 6 ) & 63 ) + 128 ) . chr( ( $number & 63 ) + 128 );

        return false;
    }

    # Return the code point of each character as a lowercase hex string of at least 4 digits
    function MBStrToHexes( $str )
    {
        $str = mb_convert_encoding( $str, 'UCS-4BE', 'UTF-8' );
        $hexs = array();
        $len = mb_strlen( $str, 'UCS-4BE' );
        for( $i = 0; $i < $len; $i++ )
        {
            $s2 = mb_substr( $str, $i, 1, 'UCS-4BE' );
            $val = unpack( 'N', $s2 );
            $hexs[] = str_pad( dechex( $val[1] ), 4, '0', STR_PAD_LEFT );
        }
        return $hexs;
    }

    function detectRanges( $str )
    {
        $hexes = $this->MBStrToHexes( $str );
        foreach( $hexes as $hex )
        {
            # The hex strings are compared directly, so range bounds must be 4-digit, lowercase hex
            if( ( $hex >= '0041' ) && ( $hex <= '024f' ) )
                echo $this->entityToUTF8( hexdec($hex) ) . ' - Latin<br />';
            elseif( ( $hex >= '0400' ) && ( $hex <= '04ff' ) )
                echo $this->entityToUTF8( hexdec($hex) ) . ' - Cyrillic<br />';
            elseif( ( $hex >= '0530' ) && ( $hex <= '058f' ) )
                echo $this->entityToUTF8( hexdec($hex) ) . ' - Armenian<br />';
            else
                echo $this->entityToUTF8( hexdec($hex) ) . ' - Some Other Range<br />';
        }
    }

}

#$strB = 'Cornelius Trow';
$strB = 'Cornelius Српски Հայաստանի';
#$strB = 'Հայաստանի Հանրապետություն';
echo 'Testing String: ' . $strB . '<br />';
$dur = new DetectUnicodeRanges();
$dur->detectRanges( $strB );
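
One possible alternative to maintaining hex ranges by hand is to use the Unicode script properties that PCRE (the engine behind PHP's preg_* functions) understands under the /u modifier, such as \p{Latin}, \p{Cyrillic} and \p{Armenian}. The sketch below assumes UTF-8 input. The function name detectScripts and the 'Other' label are hypothetical. Note that a script property is not identical to a block range: \p{Latin} matches Latin letters only, so punctuation inside 0041-024f is reported as 'Other' here.

```php
<?php
// Hypothetical helper: label each character with the first matching script name.
function detectScripts(string $str): array
{
    $scripts = ['Latin', 'Cyrillic', 'Armenian']; // extend with other PCRE script names as needed
    $result = [];

    // Split into single characters; /u makes PCRE treat the input as UTF-8.
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $label = 'Other';
        foreach ($scripts as $script) {
            if (preg_match('/^\p{' . $script . '}$/u', $char)) {
                $label = $script;
                break;
            }
        }
        $result[] = [$char, $label];
    }
    return $result;
}

foreach (detectScripts('Cornelius Српски Հայաստանի') as [$char, $label]) {
    echo $char . ' - ' . $label . "\n";
}
```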

How to convert a UTF-8 string to HEX codepoint in PHP?

I use json_encode for the multibyte characters and build the \u00XX escape by hand for the ASCII characters.

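function utf8toUnicode($str){
    $unicode = "";
    $len = mb_strlen($str, 'UTF-8');
    for($i=0;$i<$len;$i++){
        $utf8char = mb_substr($str,$i,1,'UTF-8');
        // Multibyte characters: let json_encode produce the \uXXXX escape.
        // Single-byte (ASCII) characters: build the escape from the byte's hex value.
        $unicode .= strlen($utf8char)>1
            ?trim(json_encode($utf8char),'"')
            :('\\u00'.bin2hex($utf8char));
    }
    return $unicode;
}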

$str = 'sÆs';

echo utf8toUnicode($str); // \u0073\u00c6\u0073
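
For characters outside the Basic Multilingual Plane, json_encode emits a UTF-16 surrogate pair rather than a single code point. If the actual code point is wanted, a sketch along the following lines may be closer to the goal. It assumes PHP 7.4 or later with mbstring (for mb_str_split and mb_ord). The function name utf8ToCodepoints and the U+XXXX output format are choices made for this example.

```php
<?php
// Hypothetical helper: list the code point of every character as "U+XXXX".
function utf8ToCodepoints(string $str): array
{
    $codepoints = [];
    foreach (mb_str_split($str, 1, 'UTF-8') as $char) {
        // %04X pads to at least four uppercase hex digits.
        $codepoints[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $codepoints;
}

echo implode(' ', utf8ToCodepoints('sÆs')), "\n"; // U+0073 U+00C6 U+0073
echo implode(' ', utf8ToCodepoints('😀')), "\n";  // U+1F600
```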

PHP to detect and convert Special Characters?

$s = "This is a sample string with &oelig; and &scaron;";
echo html_entity_decode($s, ENT_COMPAT, 'UTF-8');

PHP UTF-8 mb_convert_encode and Internet-Explorer

Although I prefer using URL-encoded strings in the address bar, in your case you can try encoding $_GET['c'] as UTF-8, e.g.:

$_GET['c'] = utf8_encode($_GET['c']);
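
Note that utf8_encode() assumes its input is ISO-8859-1 and has been deprecated since PHP 8.2, so applying it to a value that is already UTF-8 double-encodes it. A hedged alternative is to convert only when the value is not valid UTF-8. The sketch below assumes the non-UTF-8 case really is ISO-8859-1, which depends on the browser and page in question. The helper name ensureUtf8 is hypothetical.

```php
<?php
// Hypothetical helper: return $value as UTF-8, converting from $fallback only when needed.
function ensureUtf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

echo bin2hex(ensureUtf8("\xE9")), "\n";     // c3a9 (ISO-8859-1 "é" converted)
echo bin2hex(ensureUtf8("\xC3\xA9")), "\n"; // c3a9 (UTF-8 "é" unchanged)
```

In the situation above this would be used as $_GET['c'] = ensureUtf8($_GET['c']);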

Split string into array based on a unicode character range in PHP

You also have to check, with a lookahead, whether the next character is a Cyrillic one. This code will do the job:

$t = preg_split ('/(?<=[^а-я])(?=[а-я]+)/ius', $text, -1, PREG_SPLIT_NO_EMPTY);

It gives this output:

Array
(
[0] => «
[1] => Добрый
[2] => день!» -
[3] => сказал
[4] => он,
[5] => потянувшись…
)

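
The а-я range does not cover every Cyrillic letter (ё, for example, lies outside it). A variant using the \p{Cyrillic} script property might look like the sketch below. The sample $text is an assumption, reconstructed from the output shown above.

```php
<?php
// Assumed sample input, reconstructed from the printed array.
$text = '«Добрый день!» - сказал он, потянувшись…';

// Split at every position where a non-Cyrillic character is followed by a Cyrillic one.
// \p{Cyrillic} covers the whole script, upper and lower case, so /i is not needed.
$parts = preg_split('/(?<=\P{Cyrillic})(?=\p{Cyrillic})/u', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

For this input the pieces are the same as in the array above, each keeping its trailing spaces and punctuation.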

How to display the (extended) ASCII representation of a special character in PHP 5.6?

You're on the right path with bin2hex; what you're confused about is merely the encoding. Currently you're seeing the hex value of ß in the UTF-8 encoding, because your string is encoded in UTF-8. What you want is the hex value for that string in some other encoding. Let's assume "Extended ASCII" refers to ISO-8859-1, as it colloquially often does (but doesn't have to):

echo bin2hex(iconv('UTF-8', 'ISO-8859-1', 'ß'));

Now, having said that, I have no idea what you'd use that information for. There are many valid "hex values" for the character ß in various different encodings; "Extended ASCII" is just one possible answer, and it's a vague answer to be sure, since "Extended ASCII" has very little practical meaning with hundreds of different "Extended ASCII" charsets available.
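
To make the point about multiple valid hex values concrete, the following sketch prints the bytes of ß in a few encodings. It assumes the iconv extension is available and the source file is saved as UTF-8. The list of encodings is an arbitrary selection.

```php
<?php
// The same character, ß (U+00DF), has different byte values in different encodings.
$char = 'ß'; // this source file is assumed to be saved as UTF-8

$hexByEncoding = [];
foreach (['UTF-8', 'ISO-8859-1', 'UTF-16BE'] as $encoding) {
    $hexByEncoding[$encoding] = bin2hex(iconv('UTF-8', $encoding, $char));
}

// UTF-8 => c39f, ISO-8859-1 => df, UTF-16BE => 00df
print_r($hexByEncoding);
```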


