Can I get the Unicode value of a character, or vice versa, with PHP?
function _uniord($c) {
if (ord($c[0]) >=0 && ord($c[0]) <= 127)
return ord($c[0]);
if (ord($c[0]) >= 192 && ord($c[0]) <= 223)
return (ord($c[0])-192)*64 + (ord($c[1])-128);
if (ord($c[0]) >= 224 && ord($c[0]) <= 239)
return (ord($c[0])-224)*4096 + (ord($c[1])-128)*64 + (ord($c[2])-128);
if (ord($c[0]) >= 240 && ord($c[0]) <= 247)
return (ord($c[0])-240)*262144 + (ord($c[1])-128)*4096 + (ord($c[2])-128)*64 + (ord($c[3])-128);
if (ord($c[0]) >= 248 && ord($c[0]) <= 251)
return (ord($c[0])-248)*16777216 + (ord($c[1])-128)*262144 + (ord($c[2])-128)*4096 + (ord($c[3])-128)*64 + (ord($c[4])-128);
if (ord($c[0]) >= 252 && ord($c[0]) <= 253)
return (ord($c[0])-252)*1073741824 + (ord($c[1])-128)*16777216 + (ord($c[2])-128)*262144 + (ord($c[3])-128)*4096 + (ord($c[4])-128)*64 + (ord($c[5])-128);
if (ord($c[0]) >= 254 && ord($c[0]) <= 255) // error
return FALSE;
return 0;
} // function _uniord()
and
function _unichr($o) {
if (function_exists('mb_convert_encoding')) {
return mb_convert_encoding('&#'.intval($o).';', 'UTF-8', 'HTML-ENTITIES');
} else {
return chr(intval($o));
}
} // function _unichr()
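On newer PHP versions the same round trip can probably be done without hand-written byte arithmetic. The sketch below assumes PHP 7.2 or later with the mbstring extension, where mb_ord() and mb_chr() are available; the variable names are only illustrative. Note also that the 'HTML-ENTITIES' encoding used by _unichr() above is deprecated in mbstring as of PHP 8.2.

```php
<?php
// Assumes PHP 7.2+ with the mbstring extension loaded.

// Character -> code point. "\xC3\xA9" is the UTF-8 encoding of é (U+00E9).
$codePoint = mb_ord("\xC3\xA9", 'UTF-8');

// Code point -> character. 8364 is U+20AC, the euro sign.
$char = mb_chr(8364, 'UTF-8');

printf("U+%04X\n", $codePoint);   // U+00E9
echo bin2hex($char), "\n";        // e282ac (the UTF-8 bytes of the euro sign)
```

Passing the encoding explicitly avoids depending on whatever mb_internal_encoding() happens to be set to.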
How to convert Unicode in PHP?
iconv — Convert string to requested character encoding
http://php.net/manual/en/function.iconv.php
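As a rough illustration of what iconv does, here is a minimal sketch that re-encodes a single UTF-8 character into two other encodings and prints the resulting bytes. The variable names are made up for the example, and it assumes the iconv extension is enabled.

```php
<?php
// "\xC3\xA9" is é encoded as UTF-8 (two bytes).
$utf8 = "\xC3\xA9";

// Re-encode as ISO-8859-1 (one byte) and as UTF-16BE (two bytes).
$latin1  = iconv('UTF-8', 'ISO-8859-1', $utf8);
$utf16be = iconv('UTF-8', 'UTF-16BE', $utf8);

echo bin2hex($latin1), "\n";   // e9
echo bin2hex($utf16be), "\n";  // 00e9
```

iconv() returns false when a character cannot be represented in the target encoding, so real code should check the return value.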
Detect Unicode Character Range in PHP
I've worked on something for this. It detects the Unicode range of each character in a string. I've only put in Armenian, Latin and Russian (Cyrillic) to start with. If anyone else needs this, you'll have to add the character ranges you want to the detectRanges function, from a source like http://jrgraphix.net/r/Unicode/. I'd like to know if there is a better way of doing that part. Make sure any alphabetic characters in the ranges are lower case.
mb_internal_encoding("UTF-8");
header( "Content-Type: text/html;charset=UTF-8", true );
class DetectUnicodeRanges
{
function entityToUTF8( $number )
{
if( $number < 0 )
return false;
# Replace ASCII characters
if( $number < 128 )
return chr( $number );
# Replace illegal Windows characters
if( $number < 160 )
{
switch( $number )
{
case 128: $conversion = 8364; break;
case 129: $conversion = 160; break;
case 130: $conversion = 8218; break;
case 131: $conversion = 402; break;
case 132: $conversion = 8222; break;
case 133: $conversion = 8230; break;
case 134: $conversion = 8224; break;
case 135: $conversion = 8225; break;
case 136: $conversion = 710; break;
case 137: $conversion = 8240; break;
case 138: $conversion = 352; break;
case 139: $conversion = 8249; break;
case 140: $conversion = 338; break;
case 141: $conversion = 160; break;
case 142: $conversion = 381; break;
case 143: $conversion = 160; break;
case 144: $conversion = 160; break;
case 145: $conversion = 8216; break;
case 146: $conversion = 8217; break;
case 147: $conversion = 8220; break;
case 148: $conversion = 8221; break;
case 149: $conversion = 8226; break;
case 150: $conversion = 8211; break;
case 151: $conversion = 8212; break;
case 152: $conversion = 732; break;
case 153: $conversion = 8482; break;
case 154: $conversion = 353; break;
case 155: $conversion = 8250; break;
case 156: $conversion = 339; break;
case 157: $conversion = 160; break;
case 158: $conversion = 382; break;
case 159: $conversion = 376; break;
}
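# $conversion is a code point, so encode it rather than returning the number
return $this->entityToUTF8( $conversion );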
}
if ( $number < 2048 )
return chr( ($number >> 6 ) + 192 ) . chr( ( $number & 63 ) + 128 );
if ( $number < 65536 )
return chr( ( $number >> 12 ) + 224 ) . chr( ( ( $number >> 6 ) & 63 ) + 128 ) . chr( ( $number & 63 ) + 128 );
if ( $number < 2097152 )
return chr( ( $number >> 18 ) + 240 ) . chr( ( ( $number >> 12 ) & 63 ) + 128 ) . chr( ( ( $number >> 6 ) & 63 ) + 128 ) . chr( ( $number & 63 ) + 128 );
return false;
}
function MBStrToHexes( $str )
{
$str = mb_convert_encoding( $str, 'UCS-4BE' );
$hexs = array();
for( $i = 0; $i < mb_strlen( $str, 'UCS-4BE' ); $i++ )
{
$s2 = mb_substr( $str, $i, 1, 'UCS-4BE' );
$val = unpack( 'N', $s2 );
$hexs[] = str_pad( dechex( $val[1] ), 4, '0', STR_PAD_LEFT );
}
return( $hexs );
}
function detectRanges( $str )
{
$hexes = $this->MBStrToHexes( $str );
foreach( $hexes as $hex )
{
# Compare as integers: a hex string such as '04e0' is read by PHP as a number in exponent notation
$code = hexdec( $hex );
if( ( $code >= 0x0041 ) && ( $code <= 0x024f ) )
echo $this->entityToUTF8( $code ) . ' - Latin<br />';
elseif( ( $code >= 0x0400 ) && ( $code <= 0x04ff ) )
echo $this->entityToUTF8( $code ) . ' - Cyrillic<br />';
elseif( ( $code >= 0x0530 ) && ( $code <= 0x058f ) )
echo $this->entityToUTF8( $code ) . ' - Armenian<br />';
else
echo $this->entityToUTF8( $code ) . ' - Some Other Range<br />';
}
}
}
#$strB = 'Cornelius Trow';
$strB = 'Cornelius Српски Հայաստանի';
#$strB = 'Հայաստանի Հանրապետություն';
echo 'Testing String: ' . $strB . '<br />';
$dur = new DetectUnicodeRanges();
$dur->detectRanges( $strB );
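On the question of whether there is a better way to do the range lookup: one option that might avoid maintaining the ranges by hand is PCRE's Unicode script properties (\p{Latin}, \p{Cyrillic}, \p{Armenian}). The sketch below is not the class above; detectScripts() is a hypothetical helper, and it classifies by script rather than by code block, so the boundaries are not identical to the ranges listed on the page linked above.

```php
<?php
// Hypothetical helper: label each character of a UTF-8 string by script.
function detectScripts(string $str): array
{
    // Script names understood by PCRE; extend this list as needed.
    $scripts = ['Latin', 'Cyrillic', 'Armenian'];
    $result = [];

    // Split into characters. The u modifier makes PCRE treat the input as
    // UTF-8; preg_split() returns false if the input is not valid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $result;
    }

    foreach ($chars as $char) {
        $label = 'Other';
        foreach ($scripts as $script) {
            if (preg_match('/\p{' . $script . '}/u', $char) === 1) {
                $label = $script;
                break;
            }
        }
        $result[] = [$char, $label];
    }
    return $result;
}

foreach (detectScripts('Cornelius Српски Հայաստանի') as [$char, $label]) {
    echo $char, ' - ', $label, "\n";
}
```

Spaces and punctuation belong to the Common script, so they come out as "Other" here.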
How to convert a UTF-8 string to HEX codepoint in PHP?
I use json_encode for the multibyte characters and build the \u00XX escape by hand for the single-byte (ASCII) characters.
function utf8toUnicode($str){
$unicode = "";
$len = mb_strlen($str);
for($i=0;$i<$len;$i++){
$utf8char = mb_substr($str,$i,1);
$unicode .= strlen($utf8char)>1
?trim(json_encode($utf8char),'"')
:('\\u00'.bin2hex($utf8char))
;
}
return $unicode;
}
$str = 'sÆs';
echo utf8toUnicode($str); // \u0073\u00c6\u0073
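If the goal is the plain hex code point rather than a JSON-style escape, a variant along these lines may be simpler. It is only a sketch: utf8ToCodePoints() is a made-up name, and it assumes PHP 7.4 or later (for mb_str_split) with mbstring. Unlike json_encode, it reports characters outside the Basic Multilingual Plane as a single code point instead of a surrogate pair.

```php
<?php
// Hypothetical helper: list the code points of a UTF-8 string as U+XXXX.
// Assumes PHP 7.4+ (mb_str_split) with the mbstring extension.
function utf8ToCodePoints(string $str): array
{
    $points = [];
    foreach (mb_str_split($str, 1, 'UTF-8') as $char) {
        $points[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $points;
}

// "\xC3\x86" is Æ encoded as UTF-8.
echo implode(' ', utf8ToCodePoints("s\xC3\x86s")), "\n"; // U+0073 U+00C6 U+0073
```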
PHP to detect and convert Special Characters?
$s = "This is a sample string with &oelig; and &scaron;";
echo html_entity_decode($s, ENT_COMPAT, 'UTF-8');
PHP UTF-8 mb_convert_encode and Internet-Explorer
Although I prefer using URL-encoded strings in the address bar, in your case you can try encoding $_GET['c'] to UTF-8, e.g.:
$_GET['c'] = utf8_encode($_GET['c']);
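One caveat: utf8_encode() always treats its input as ISO-8859-1, so it will double-encode a value that is already UTF-8, and the function is deprecated as of PHP 8.2. A more defensive version might look like the sketch below. ensureUtf8() is a hypothetical helper, and the ISO-8859-1 fallback is an assumption about what the browser sends; adjust it if you know the real source encoding.

```php
<?php
// Hypothetical helper: return $value as UTF-8, assuming that anything that is
// not already valid UTF-8 is in $fallback (ISO-8859-1 by default).
function ensureUtf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// "\xE9" is é in ISO-8859-1; on its own it is not valid UTF-8.
echo bin2hex(ensureUtf8("\xE9")), "\n";      // c3a9
echo bin2hex(ensureUtf8("\xC3\xA9")), "\n";  // c3a9 (unchanged)
```

It would be applied to the request value in the same place, for example $c = ensureUtf8($_GET['c']);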
Split string into array based on a unicode character range in PHP
You also have to check with a lookahead whether the next character is a Cyrillic one. This code will do the job:
$t = preg_split ('/(?<=[^а-я])(?=[а-я]+)/ius', $text, -1, PREG_SPLIT_NO_EMPTY);
It gives this output:
Array
(
[0] => «
[1] => Добрый
[2] => день!» -
[3] => сказал
[4] => он,
[5] => потянувшись…
)
How to display the (extended) ASCII representation of a special character in PHP 5.6?
You're on the right path with bin2hex; what you're confused about is merely the encoding. Currently you're seeing the hex value of ß for the UTF-8 encoding, because your string is encoded in UTF-8. What you want is the hex value for that string in some other encoding. Let's assume "Extended ASCII" refers to ISO-8859-1, as it colloquially often does (but doesn't have to):
echo bin2hex(iconv('UTF-8', 'ISO-8859-1', 'ß'));
Now, having said that, I have no idea what you'd use that information for. There are many valid "hex values" for the character ß in various different encodings; "Extended ASCII" is just one possible answer, and it's a vague answer to be sure, since "Extended ASCII" has very little practical meaning with hundreds of different "Extended ASCII" charsets available.
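To make the point about "many valid hex values" concrete, here is a small sketch that prints the bytes of ß in a few encodings. The choice of encodings and the variable names are just for illustration, and it assumes the iconv extension is enabled.

```php
<?php
// "\xC3\x9F" is ß encoded as UTF-8.
$eszett = "\xC3\x9F";

// Start with the bytes as they are, then re-encode with iconv.
$hexByEncoding = ['UTF-8' => bin2hex($eszett)];
foreach (['ISO-8859-1', 'UTF-16BE', 'UTF-32BE'] as $encoding) {
    $hexByEncoding[$encoding] = bin2hex(iconv('UTF-8', $encoding, $eszett));
}

foreach ($hexByEncoding as $encoding => $hex) {
    echo $encoding, ': ', $hex, "\n";
}
// UTF-8: c39f
// ISO-8859-1: df
// UTF-16BE: 00df
// UTF-32BE: 000000df
```

The character is the same in every case; only the byte representation changes with the encoding.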