Convert emoji character to Unicode code point number in PHP
I found a simple way to solve this, so I will answer my own question. If somebody would like to improve this function, that would be welcome.
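<?php
function emoji_to_unicode($emoji) {
    // Re-encode as UTF-32 so the code point becomes a fixed-width 4-byte value
    $emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
    // Hex-dump the bytes and swap the leading zeros for the "U+" prefix
    $unicode = strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emoji)));
    return $unicode;
}
$var = "😀"; // U+1F600
echo emoji_to_unicode($var);
?>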
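On PHP 7.2 or later, where mb_ord() is available, the same label could probably be produced without the UTF-32 round trip. The following is a minimal sketch under that assumption; the function name codepoint_label is hypothetical and not part of the original answer. Unlike the function above, which strips all leading zeros (so "A" becomes U+41), this variant pads to at least four hex digits.

```php
<?php
// Hypothetical helper: format the first code point of $char as "U+XXXX".
// Assumes PHP 7.2+ with the mbstring extension (for mb_ord).
function codepoint_label(string $char): string {
    // mb_ord returns the code point of the first character as an integer
    $cp = mb_ord($char, 'UTF-8');
    // Pad to at least 4 hex digits, following the usual U+ notation
    return sprintf('U+%04X', $cp);
}

echo codepoint_label("\u{1F600}"), "\n"; // U+1F600
echo codepoint_label("A"), "\n";         // U+0041
```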
PHP convert emojis in string to unicode
Emoji are not 1-byte characters like 123abc@#$^. Most of them are encoded as 4 bytes in UTF-8, so you can't pick them out with a simple single-byte character range. But you can select every 4-byte character:
function to_unicode($text) {
    $str = preg_replace_callback(
        "%(?:\xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2})%xs",
        function($emoji){
            $emojiStr = mb_convert_encoding($emoji[0], 'UTF-32', 'UTF-8');
            return strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emojiStr)));
        },
        $text
    );
    return $str;
}
echo to_unicode( 'hello world 😀' );
Output: hello world U+1F600
How it works
First, match every 4-byte character with this regex, passed to preg_replace_callback:
%(?:\xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2})%xs
Then use a callback function to encode each matched character:
function($emoji){
$emojiStr = mb_convert_encoding($emoji[0], 'UTF-32', 'UTF-8');
return strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emojiStr)));
}
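The byte-level pattern above matches exactly the code points U+10000 through U+10FFFF. With the /u modifier the same range can be written in terms of code points, which may be easier to read. This is only a sketch: the name to_unicode_u is hypothetical, it assumes PHP 7.2+ for mb_ord(), and it assumes the input is valid UTF-8. Like the original, it does not touch emoji that fit in 3 bytes, such as U+263A.

```php
<?php
// Hypothetical variant of to_unicode(): match any code point above U+FFFF,
// i.e. exactly the characters that take 4 bytes in UTF-8.
// Assumes PHP 7.2+ (mb_ord) and valid UTF-8 input.
function to_unicode_u(string $text): string {
    $result = preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        function (array $m): string {
            // Format the matched character's code point as U+XXXXX
            return sprintf('U+%X', mb_ord($m[0], 'UTF-8'));
        },
        $text
    );
    // preg_replace_callback returns null on error (e.g. invalid UTF-8);
    // fall back to the unchanged input in that case
    return $result ?? $text;
}

echo to_unicode_u("hello world \u{1F600}"), "\n"; // hello world U+1F600
```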
Resources: Detect emoji (Stack Overflow); What is emoji?
How to convert Emoji from Unicode in PHP?
PHP 5
JSON's \u escape can only handle one UTF-16 code unit at a time, so you need to write the surrogate pair instead. For U+1F600 this is \uD83D\uDE00, which works:
echo json_decode('"\uD83D\uDE00"');
PHP 7
You no longer need json_decode and can use the \u{...} Unicode code point escape directly in a double-quoted string:
echo "\u{1F30F}";
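For reference, the surrogate pair for a code point above U+FFFF can be computed rather than looked up: subtract 0x10000, then put the top 10 bits into the high surrogate (base 0xD800) and the bottom 10 bits into the low surrogate (base 0xDC00). A small sketch of that arithmetic follows; the helper name surrogate_escape is hypothetical.

```php
<?php
// Hypothetical helper: build the JSON-style \uXXXX\uXXXX escape for a
// code point above U+FFFF (a UTF-16 surrogate pair).
function surrogate_escape(int $cp): string {
    $v    = $cp - 0x10000;           // remaining 20-bit value
    $high = 0xD800 | ($v >> 10);     // top 10 bits go into the high surrogate
    $low  = 0xDC00 | ($v & 0x3FF);   // bottom 10 bits go into the low surrogate
    return sprintf('\u%04X\u%04X', $high, $low);
}

echo surrogate_escape(0x1F600), "\n"; // \uD83D\uDE00
```

The result can be fed to json_decode() in the same way as the hand-written escape.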
Convert emoji code into emoticon character in PHP
You can do a preg_replace() that turns each code into an HTML numeric character reference:
$str = "Hello U+1F600";
$str = preg_replace("/U\+([0-9a-f]{4,5})/mi", '&#x${1};', $str);
echo $str;
In a browser, it will display Hello 😀
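Where the U+XXXX token should become the character itself, independent of how the output is rendered, a callback with mb_chr() is one option. This is a sketch that assumes PHP 7.2+ with mbstring; the function name unicode_labels_to_chars is hypothetical. Note that the hex match is greedy, so a token immediately followed by another hex digit would be read as a longer code point.

```php
<?php
// Hypothetical helper: replace "U+XXXX" tokens with the actual UTF-8 character.
// Assumes PHP 7.2+ with the mbstring extension (for mb_chr).
function unicode_labels_to_chars(string $str): string {
    return preg_replace_callback(
        '/U\+([0-9A-F]{4,6})/i',
        function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr returns false for invalid code points; keep the token then
            return $char === false ? $m[0] : $char;
        },
        $str
    );
}

echo unicode_labels_to_chars("Hello U+1F600"), "\n";
```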
PHP keep emoji in unicode but also keep text as plain text
The Intl extension provides functions to work with unicode codepoints and blocks that will allow you to determine if the current character is an emoticon or not.
function emoji_to_unicode($emoji) {
    $emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
    $unicode = strtoupper(preg_replace("/^[0]{3}/","U+",bin2hex($emoji)));
    return $unicode;
}
$var = ("😀x😀text here");
$out = '';
for ($i = 0; $i < mb_strlen($var); $i++) {
    $char = mb_substr($var, $i, 1);
    // Only characters in the Unicode "Emoticons" block are converted
    $isEmoji = IntlChar::getBlockCode(IntlChar::ord($char)) == IntlChar::BLOCK_CODE_EMOTICONS;
    $out .= $isEmoji ? emoji_to_unicode($char) : $char;
}
echo $out;
Here's the list of predefined constants where you can find all blocks.
PHP emoji to unicode not converting more than one emoji appropriately
One way to do this is to iterate over each character in $var, converting it as you go. Note that to make the function more robust, you should only replace 3 leading zeros (so as not to mess up values that e.g. start with 4). That way the function will work with all characters. I've also added a check (using mb_ord) that the character needs conversion, so that it works with plain text too:
function emoji_to_unicode($emoji) {
    // Leave characters below code point 256 as they are
    if (mb_ord($emoji) < 256) return $emoji;
    $emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
    $unicode = strtoupper(preg_replace("/^[0]{3}/","U+",bin2hex($emoji)));
    return $unicode;
}
$var = ("😀x😀hello");
$out = '';
for ($i = 0; $i < mb_strlen($var); $i++) {
    $out .= emoji_to_unicode(mb_substr($var, $i, 1));
}
echo "$out\n";
Output: U+1F600xU+1F600hello
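Calling mb_substr() once per index has to re-scan the UTF-8 string each time, which can get slow on long input. One possible alternative is to split the string into characters once and loop over the result. The sketch below assumes PHP 7.2+ (for mb_ord) and valid UTF-8 input; the function name convert_non_latin1 is hypothetical.

```php
<?php
// Hypothetical variant: split the string into characters once, then convert.
// Assumes PHP 7.2+ (mb_ord) and valid UTF-8 input.
function convert_non_latin1(string $text): string {
    $out = '';
    // With the /u flag the empty pattern splits between code points, not bytes
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $cp = mb_ord($char, 'UTF-8');
        // Same threshold as the loop above: leave code points below 256 untouched
        $out .= $cp < 256 ? $char : sprintf('U+%X', $cp);
    }
    return $out;
}

echo convert_non_latin1("\u{1F600}x\u{1F600}hello"), "\n"; // U+1F600xU+1F600hello
```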
Demo on 3v4l.org
PHP function to decode UTF-16 Unicode to emoji
This may be one way to work around your %u syntax:
$emoji = '%uD83D%uDE0B';
print 'This is my emoji: '. json_decode('"' . str_replace('%', '\\', $emoji) . '"');
Based on: Print Unicode characters PHP
Improve function to recognize and convert unicode emojis
Your current use of preg_replace_callback() assumes that all regex matches will be replaced with a link. Since the emojis will not be used as part of a link, the simplest solution is to leave the preg_replace_callback() as-is and add an extra step after it where we do the unicode replacement.
function convertAll($str) {
    $regex = "/[@#](\w+)/";
    //type and links
    $hrefs = [
        '#' => 'hashtag?tag',
        '@' => 'profile?username'
    ];
    $result = preg_replace_callback($regex, function($matches) use ($hrefs) {
        return sprintf(
            '<a href="%s=%s">%s</a>',
            $hrefs[$matches[0][0]],
            $matches[1],
            $matches[0]
        );
    }, $str);
    $result = preg_replace("/U\+([A-F0-9]{5})/", '\u{${1}}', $result);
    return($result);
}
The regex part of the preg_replace() is saying to match a literal "U" followed by a literal "+" followed by 5 instances of any of the characters A-F or 0-9. We are capturing those 5 characters and putting them after a literal "\u{" and then following them with a literal "}".
There may be a way to do this within preg_replace_callback(), but that seemed a bit more effort than I was willing to put in right now. If someone comes up with an answer that does that, I'd love to see it.
To replace with HTML entities use this preg_replace instead:
$result = preg_replace("/U\+([A-F0-9]{5})/", "&#x\\1;", $result);
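The single-callback variant that this answer leaves open might look roughly like the sketch below: one pattern with two alternatives, where the callback checks which alternative matched. The function name convertAllSinglePass is hypothetical, and the behaviour is only intended to match the two-step version for ordinary input.

```php
<?php
// Hypothetical single-pass variant of convertAll(): one pattern with two
// alternatives, and the callback decides which replacement applies.
function convertAllSinglePass(string $str): string {
    $hrefs = [
        '#' => 'hashtag?tag',
        '@' => 'profile?username',
    ];
    return preg_replace_callback(
        '/([@#])(\w+)|U\+([A-F0-9]{5})/',
        function (array $m) use ($hrefs): string {
            if (isset($m[3])) {
                // Group 3 is only set when the U+XXXXX alternative matched
                return '\u{' . $m[3] . '}';
            }
            // Otherwise it is a #hashtag or @mention: build the link
            return sprintf('<a href="%s=%s">%s</a>', $hrefs[$m[1]], $m[2], $m[0]);
        },
        $str
    );
}

echo convertAllSinglePass('#php U+1F600 @bob'), "\n";
```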
PHP Convert emojis from UTF-8 to UTF-8 Bytes (UTF-16)
\ud83d\ude01 is an escape sequence made of 16-bit (UTF-16) code units, and what you apparently want is an escape sequence of 8-bit bytes (using hex digits).
As already pointed out, you can use json_decode() to get the actual emoji from your Unicode escape sequence:
$str = "\ud83d\ude01";
$str = json_decode('"' . $str . '"');
echo $str; // 😁
You can then make use of str_split() to get every byte of that emoji in an array, as mentioned in the documentation:
str_split() will split into bytes, rather than characters when dealing with a multi-byte encoded string.
In order to convert every byte to its hex representation, use ord() and dechex():
$bytes = str_split($str);
for ($i = 0; $i < count($bytes); $i++) {
$bytes[$i] = "\x" . dechex(ord($bytes[$i]));
}
$str = implode('',$bytes);
Note that you need to add \x in front of every hex byte yourself to get the desired sequence.
Everything put together:
$str = "\ud83d\ude01";
$str = json_decode('"' . $str . '"');
$bytes = str_split($str);
for ($i = 0; $i < count($bytes); $i++) {
$bytes[$i] = "\x" . dechex(ord($bytes[$i]));
}
$str = implode('',$bytes);
echo $str; // \xf0\x9f\x98\x81
https://3v4l.org/A1PEn
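A shorter route to the same output is to hex-dump the whole string with bin2hex() and split the result into pairs. This also keeps two digits for bytes below 0x10, which dechex() alone would print as a single digit. A minimal sketch, with the hypothetical helper name bytes_to_hex_escapes:

```php
<?php
// Hypothetical helper: render each byte of a string as a \xNN escape.
function bytes_to_hex_escapes(string $bytes): string {
    if ($bytes === '') {
        return '';
    }
    // bin2hex gives two lowercase hex digits per byte; split into pairs
    $pairs = str_split(bin2hex($bytes), 2);
    return '\x' . implode('\x', $pairs);
}

echo bytes_to_hex_escapes(json_decode('"\ud83d\ude01"')), "\n"; // \xf0\x9f\x98\x81
```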