How to Convert Emoji from Unicode in PHP

Convert emoji character to Unicode code point number in PHP

I found a simple solution, so I'm answering my own question, but if somebody would like to improve this function, that would be cool.

<?php

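function emoji_to_unicode($emoji) {
    // Encode the character as UTF-32 (4 bytes, big-endian) and hex-dump it,
    // e.g. "0001f600", then replace the leading zeros with "U+".
    $emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
    $unicode = strtoupper(preg_replace("/^[0]+/", "U+", bin2hex($emoji)));
    return $unicode;
}

$var = "😀"; // example emoji
echo emoji_to_unicode($var); // U+1F600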

?>
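Since PHP 7.2, mbstring also offers mb_ord(), which returns the code point directly, so the UTF-32 hex trick isn't strictly needed. A minimal sketch for a single character, assuming PHP 7.2+ with mbstring (emoji_to_codepoint() is a hypothetical name):

```php
<?php
// Hypothetical helper; requires PHP 7.2+ (mb_ord()).
function emoji_to_codepoint(string $char): string {
    // mb_ord() returns the code point of the first character in the string.
    $cp = mb_ord($char, 'UTF-8');
    // %04X pads to at least 4 uppercase hex digits, the usual U+ notation.
    return sprintf('U+%04X', $cp);
}

echo emoji_to_codepoint('😀'), "\n"; // U+1F600
echo emoji_to_codepoint('A'), "\n";  // U+0041
```

Unlike stripping all leading zeros, the %04X format keeps BMP code points in their conventional 4-digit form.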

PHP convert emojis in string to unicode

Most emojis are not 1-byte characters like 123abc@#$^: they lie outside the Basic Multilingual Plane (U+10000 and above) and take 4 bytes in UTF-8. Instead of listing emoji ranges one by one, you can select every 4-byte character:

function to_unicode($text) {
    $str = preg_replace_callback(
        "%(?:\xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2})%xs",
        function($emoji){
            $emojiStr = mb_convert_encoding($emoji[0], 'UTF-32', 'UTF-8');
            return strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emojiStr)));
        },
        $text
    );
    return $str;
}

echo to_unicode('hello world 😀');

Output: hello world U+1F600

How it works

First, match 4-byte characters with this regex:

%(?:\xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2})%xs

and pass it to preg_replace_callback(). The x modifier allows the spaces between the alternatives, and the three alternatives together cover every valid 4-byte UTF-8 sequence (U+10000 to U+10FFFF).

Then a callback function encodes each matched character:

function($emoji){
    $emojiStr = mb_convert_encoding($emoji[0], 'UTF-32', 'UTF-8');
    return strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emojiStr)));
}
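One consequence of matching only 4-byte sequences: characters below U+10000 take fewer bytes, and some of them are commonly displayed as emoji. A quick check (assuming PHP 7.2+ for mb_ord() and a UTF-8 source file) makes this visible; ☀ (U+2600) is only 3 bytes long, so the regex above leaves it alone:

```php
<?php
// Print the UTF-8 byte length and code point of a few sample characters.
foreach (['a', 'é', '☀', '😀'] as $char) {
    printf("%s: %d byte(s), U+%04X\n", $char, strlen($char), mb_ord($char, 'UTF-8'));
}
```

This prints:

a: 1 byte(s), U+0061
é: 2 byte(s), U+00E9
☀: 3 byte(s), U+2600
😀: 4 byte(s), U+1F600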


How to convert Emoji from Unicode in PHP?

PHP 5

JSON's \u can only handle one UTF-16 code unit at a time, so you need to write the surrogate pair instead. For U+1F600 this is \uD83D\uDE00, which works:

echo json_decode('"\uD83D\uDE00"');
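The surrogate pair doesn't have to be looked up by hand. A small helper (a hypothetical name, valid only for code points from U+10000 to U+10FFFF) can compute it from the code point using the standard UTF-16 split into a high and a low 10-bit half:

```php
<?php
// Hypothetical helper: builds the JSON \uXXXX\uXXXX escape for a code point
// above U+FFFF by splitting it into a UTF-16 surrogate pair.
function json_surrogate_escape(int $cp): string {
    $v    = $cp - 0x10000;         // 20-bit offset into the supplementary planes
    $high = 0xD800 + ($v >> 10);   // top 10 bits -> high surrogate
    $low  = 0xDC00 + ($v & 0x3FF); // bottom 10 bits -> low surrogate
    // '\u' in single quotes is a literal backslash followed by "u".
    return sprintf('\u%04X\u%04X', $high, $low);
}

$escape = json_surrogate_escape(0x1F600);
echo $escape, "\n";                          // \uD83D\uDE00
echo json_decode('"' . $escape . '"'), "\n"; // 😀
```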

PHP 7

You no longer need json_decode(): double-quoted strings support the \u{...} escape, which takes the code point directly:

echo "\u{1F30F}";
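The \u{...} escape is resolved when the script is compiled, so it only works for code points written into a double-quoted literal. For a code point that is only known at runtime (for example, parsed from user input), a sketch with mb_chr(), assuming PHP 7.2+ with mbstring:

```php
<?php
// Code point held in a variable (hypothetical example value).
$hex = '1F30F';
// hexdec() turns the hex string into an int; mb_chr() encodes it as UTF-8.
$char = mb_chr(hexdec($hex), 'UTF-8');

var_dump($char === "\u{1F30F}"); // bool(true)
echo $char, "\n";                // 🌏
```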

Convert emoji code into emoticon character in PHP

You can use preg_replace() to turn each U+ code into an HTML numeric character reference:

$str = "Hello U+1F600";
$str = preg_replace("/U\+([0-9a-f]{4,5})/i", '&#x${1};', $str);
echo $str; // Hello &#x1F600;

When the output is rendered as HTML, it displays:

Hello 😀
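If you need the actual character in the string (for plain-text or JSON output, say) rather than an entity that the browser decodes, one option is to decode the entities afterwards with html_entity_decode(). A sketch under that assumption; codes_to_chars() is a hypothetical name, and note that it would also decode any entities that were already in the input:

```php
<?php
// Hypothetical helper: turns "U+XXXX" notation into real UTF-8 characters.
function codes_to_chars(string $str): string {
    // Same pattern as above, producing &#x...; numeric references.
    $withEntities = preg_replace('/U\+([0-9a-f]{4,5})/i', '&#x${1};', $str);
    // Decode the numeric references into UTF-8 characters.
    return html_entity_decode($withEntities, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo codes_to_chars('Hello U+1F600'), "\n"; // Hello 😀
```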

PHP keep emoji in unicode but also keep text as plain text

The Intl extension provides functions for working with Unicode code points and blocks, which let you determine whether the current character is an emoticon:

function emoji_to_unicode($emoji) {
    $emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
    $unicode = strtoupper(preg_replace("/^[0]{3}/","U+",bin2hex($emoji)));
    return $unicode;
}

$var = ("xtext here");
$out = '';
for ($i = 0; $i < mb_strlen($var); $i++) {
    $char = mb_substr($var, $i, 1);
    $isEmoji = IntlChar::getBlockCode(IntlChar::ord($char)) == IntlChar::BLOCK_CODE_EMOTICONS;
    $out .= $isEmoji ? emoji_to_unicode($char) : $char;
}

echo $out;

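See the IntlChar predefined constants in the PHP manual for the full list of block codes. Note that BLOCK_CODE_EMOTICONS covers only the Emoticons block (U+1F600 to U+1F64F), so emoji from other blocks, such as U+1F30F, are left as they are.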
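If the intl extension is unavailable, or more blocks than Emoticons should count, a sketch along these lines checks code points against a range list instead. The ranges and the helper names (EMOJI_RANGES, is_emoji_cp(), replace_emoji()) are assumptions, and the list is deliberately incomplete: it misses, for example, BMP symbols such as U+2600. Requires PHP 7.2+ for mb_ord():

```php
<?php
// Assumption: a hand-picked, incomplete list of emoji-heavy Unicode blocks.
const EMOJI_RANGES = [
    [0x1F300, 0x1F5FF], // Miscellaneous Symbols and Pictographs
    [0x1F600, 0x1F64F], // Emoticons
    [0x1F680, 0x1F6FF], // Transport and Map Symbols
    [0x1F900, 0x1F9FF], // Supplemental Symbols and Pictographs
];

function is_emoji_cp(int $cp): bool {
    foreach (EMOJI_RANGES as [$start, $end]) {
        if ($cp >= $start && $cp <= $end) {
            return true;
        }
    }
    return false;
}

function replace_emoji(string $text): string {
    $out = '';
    // preg_split() with an empty /u pattern splits UTF-8 text into characters.
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $cp = mb_ord($char, 'UTF-8');
        $out .= is_emoji_cp($cp) ? sprintf('U+%X', $cp) : $char;
    }
    return $out;
}

echo replace_emoji('😀x🌏text here'), "\n"; // U+1F600xU+1F30Ftext here
```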

PHP emoji to unicode not converting more than one emoji appropriately

One way to do this is to iterate over each character in $var, converting it as you go. To make the function more robust, it replaces only the first 3 leading zeros of the 8-digit UTF-32 hex string, so every result has the same 5-digit width (e.g. U+1F600). I've also added a check (using mb_ord(), PHP 7.2+) so that characters with code points below 256 are returned unchanged, which means plain text passes through too:

function emoji_to_unicode($emoji) {
    // Leave ASCII and Latin-1 characters (code points below 256) untouched.
    if (mb_ord($emoji) < 256) return $emoji;
    $emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
    // Strip exactly 3 leading zeros: "0001f600" -> "U+1f600".
    $unicode = strtoupper(preg_replace("/^[0]{3}/", "U+", bin2hex($emoji)));
    return $unicode;
}

$var = "😀x😀hello";
$out = '';
for ($i = 0; $i < mb_strlen($var); $i++) {
    $out .= emoji_to_unicode(mb_substr($var, $i, 1));
}
echo "$out\n";

Output:

U+1F600xU+1F600hello
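The loop calls mb_strlen() and mb_substr() on every iteration. A single preg_replace_callback() pass can apply the same rule (code points of 256 and above are converted and written as U+ plus 5 hex digits). This is an alternative sketch, not the answer's code; the function name is hypothetical and mb_ord() requires PHP 7.2+:

```php
<?php
// Hypothetical helper; assumes valid UTF-8 input and PHP 7.2+.
function emoji_string_to_unicode(string $text): string {
    return preg_replace_callback(
        // With /u the class works on code points: everything from U+0100 up.
        '/[\x{100}-\x{10FFFF}]/u',
        function (array $m): string {
            return sprintf('U+%05X', mb_ord($m[0], 'UTF-8'));
        },
        $text
    );
}

echo emoji_string_to_unicode('😀x😀hello'), "\n"; // U+1F600xU+1F600hello
```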


PHP function to decode UTF-16 Unicode to emoji

This may be one way to work around your %u syntax: replace each % with a backslash so the string becomes a JSON \u escape, then decode it:

$emoji = '%uD83D%uDE0B';
print 'This is my emoji: '. json_decode('"' . str_replace('%', '\\', $emoji) . '"');

Based on "Print Unicode characters PHP".
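Without the JSON detour, the %uXXXX sequences can also be read as raw UTF-16BE bytes and converted with mb_convert_encoding(), which combines the surrogate pair itself. A sketch with a hypothetical helper name; lone or malformed surrogates are not handled specially:

```php
<?php
// Hypothetical helper: decodes JavaScript escape()-style %uXXXX sequences.
function decode_percent_u(string $str): string {
    return preg_replace_callback(
        '/(?:%u[0-9A-Fa-f]{4})+/',
        function (array $m): string {
            // "%uD83D%uDE0B" -> "D83DDE0B" -> raw UTF-16BE bytes.
            $utf16 = hex2bin(str_replace('%u', '', $m[0]));
            // The surrogate pair is combined during the conversion to UTF-8.
            return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');
        },
        $str
    );
}

echo 'This is my emoji: ', decode_percent_u('%uD83D%uDE0B'), "\n"; // 😋
```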

Improve function to recognize and convert unicode emojis

Your current use of preg_replace_callback() assumes that all regex matches will be replaced with a link. Since the emojis will not be used as part of a link, the simplest solution is to leave the preg_replace_callback() as-is and add an extra step after it that does the Unicode replacement.

function convertAll($str) {
    $regex = "/[@#](\w+)/";
    // type and links
    $hrefs = [
        '#' => 'hashtag?tag',
        '@' => 'profile?username'
    ];

    $result = preg_replace_callback($regex, function($matches) use ($hrefs) {
        return sprintf(
            '<a href="%s=%s">%s</a>',
            $hrefs[$matches[0][0]],
            $matches[1],
            $matches[0]
        );
    }, $str);

    // Turn "U+1F600" into the actual character (mb_chr() needs PHP 7.2+).
    $result = preg_replace_callback("/U\+([A-F0-9]{5})/", function($matches) {
        return mb_chr(hexdec($matches[1]), 'UTF-8');
    }, $result);

    return $result;
}

The regex in the second step matches a literal "U" followed by a literal "+" followed by exactly 5 hex digits (A-F or 0-9), and captures those 5 digits. The callback converts them to an integer with hexdec() and then to the corresponding UTF-8 character with mb_chr(). The conversion has to happen at runtime like this because PHP only interprets \u{...} escapes inside double-quoted string literals, not in text produced by a regex replacement.


There may be a way to do both replacements within a single preg_replace_callback() call, but that seemed like more effort than I was willing to put in right now. If someone comes up with an answer that does that, I'd love to see it.

To replace with HTML entities instead, use this preg_replace() for the second step:

$result = preg_replace("/U\+([A-F0-9]{5})/", "&#x\\1;", $result);
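Doing both replacements in one preg_replace_callback() call can be sketched with an alternation: one branch for @/# tags, one for U+ codes, with the callback checking which group matched. This is a hypothetical variant (convertAllSinglePass() is not from the original answer) and needs PHP 7.2+ for mb_chr():

```php
<?php
// Hypothetical single-pass variant; requires PHP 7.2+ (mb_chr()).
function convertAllSinglePass(string $str): string {
    $hrefs = [
        '#' => 'hashtag?tag',
        '@' => 'profile?username',
    ];
    // Branch 1: @user or #tag. Branch 2: "U+" followed by 5 hex digits.
    $regex = '/[@#](\w+)|U\+([A-F0-9]{5})/';

    return preg_replace_callback($regex, function (array $m) use ($hrefs): string {
        // Group 2 is only non-empty when the U+ branch matched.
        if (isset($m[2]) && $m[2] !== '') {
            return mb_chr(hexdec($m[2]), 'UTF-8');
        }
        return sprintf('<a href="%s=%s">%s</a>', $hrefs[$m[0][0]], $m[1], $m[0]);
    }, $str);
}

echo convertAllSinglePass('Hi @bob #php U+1F600'), "\n";
// Hi <a href="profile?username=bob">@bob</a> <a href="hashtag?tag=php">#php</a> 😀
```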

PHP Convert emojis from UTF-8 to UTF-8 Bytes (UTF-16)

\ud83d\ude01 is a UTF-16 escape sequence (a surrogate pair of 16-bit code units), and what you apparently want is the UTF-8 byte sequence written as 8-bit hex escapes.

As already pointed out, you can use json_decode() to get the actual emoji from your unicode escape sequence:

$str = '\ud83d\ude01'; // single quotes keep the backslashes literal
$str = json_decode('"' . $str . '"');
echo $str; // 😁

You can then use str_split() to get every byte of that emoji in an array. As the documentation notes:

str_split() will split into bytes, rather than characters when dealing with a multi-byte encoded string.

To convert every byte to its hex representation, use ord() and dechex():

$bytes = str_split($str);
for ($i = 0; $i < count($bytes); $i++) {
    // '\x' in single quotes is a literal backslash + x; pad to 2 hex digits.
    $bytes[$i] = '\x' . str_pad(dechex(ord($bytes[$i])), 2, '0', STR_PAD_LEFT);
}
$str = implode('', $bytes);

Note that you need to add the \x in front of each byte's hex digits yourself to get the desired sequence.

Everything put together:

$str = '\ud83d\ude01';
$str = json_decode('"' . $str . '"');
$bytes = str_split($str);
for ($i = 0; $i < count($bytes); $i++) {
    $bytes[$i] = '\x' . str_pad(dechex(ord($bytes[$i])), 2, '0', STR_PAD_LEFT);
}
$str = implode('', $bytes);

echo $str; // \xf0\x9f\x98\x81

https://3v4l.org/A1PEn
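An alternative to the per-byte loop, sketched for the same input: bin2hex() already produces two zero-padded hex digits per byte, so the string only needs to be cut into pairs and prefixed with \x:

```php
<?php
// Decode the surrogate-pair escape into a UTF-8 string (single quotes keep
// PHP from touching the backslashes).
$str = json_decode('"\ud83d\ude01"');

// bin2hex() gives "f09f9881"; split it into 2-digit pairs and join with \x.
$escaped = '\x' . implode('\x', str_split(bin2hex($str), 2));

echo $escaped, "\n"; // \xf0\x9f\x98\x81
```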