PHP: Writing a Simple removeEmoji Function

PHP: writing a simple removeEmoji function

I think the preg_replace function is the simplest solution.

As EaterOfCode suggests, I read the wiki page and wrote new regexes, since none of the answers on SO (or other websites) seemed to work for Instagram photo captions (in the format the API returns). Note: the /u modifier is mandatory for matching \x{...} Unicode code points.

// Intended as a static method inside a utility class.
public static function removeEmoji($text) {
    // Match Emoticons
    $regexEmoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $clean_text = preg_replace($regexEmoticons, '', $text);

    // Match Miscellaneous Symbols and Pictographs
    $regexSymbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $clean_text = preg_replace($regexSymbols, '', $clean_text);

    // Match Transport And Map Symbols
    $regexTransport = '/[\x{1F680}-\x{1F6FF}]/u';
    $clean_text = preg_replace($regexTransport, '', $clean_text);

    // Match Miscellaneous Symbols
    $regexMisc = '/[\x{2600}-\x{26FF}]/u';
    $clean_text = preg_replace($regexMisc, '', $clean_text);

    // Match Dingbats
    $regexDingbats = '/[\x{2700}-\x{27BF}]/u';
    $clean_text = preg_replace($regexDingbats, '', $clean_text);

    return $clean_text;
}

The function does not remove every emoji, since many more ranges exist, but it shows the approach.
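One thing to watch when chaining these calls: with the /u modifier, preg_replace() returns null if the input is not valid UTF-8, and the next call would then work on null. A minimal sketch of a guard, assuming you would rather keep the original text than lose it (the helper name stripRangeSafely is made up for this example):

```php
<?php
// Hypothetical helper, not part of the original answer.
// Applies one /u pattern and falls back to the input on failure.
function stripRangeSafely(string $pattern, string $text): string
{
    $result = preg_replace($pattern, '', $text);
    if ($result === null) {
        // With /u, malformed UTF-8 in $text makes preg_replace() return null.
        return $text;
    }
    return $result;
}

$emoticons = '/[\x{1F600}-\x{1F64F}]/u';

$valid  = "ok \u{1F600}";     // ends with U+1F600
$broken = "bad \xFF byte";    // 0xFF can never appear in valid UTF-8

var_dump(stripRangeSafely($emoticons, $valid));
var_dump(stripRangeSafely($emoticons, $broken) === $broken);
```

Whether to return the input unchanged, an empty string, or throw is a design choice; the point is only that the null case should be handled somewhere.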

Please refer to unicode.org - full emoji list (thanks Epoc)

Remove emojis from string

I used this function, customized with additional Unicode ranges (including more emojis and country flags). You can do the same using the details on this page: http://unicode.org/emoji/charts/full-emoji-list.html.

Here is the result:

function remove_emoji($string)
{
    // Match Enclosed Alphanumeric Supplement
    $regex_alphanumeric = '/[\x{1F100}-\x{1F1FF}]/u';
    $clear_string = preg_replace($regex_alphanumeric, '', $string);

    // Match Miscellaneous Symbols and Pictographs
    $regex_symbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $clear_string = preg_replace($regex_symbols, '', $clear_string);

    // Match Emoticons
    $regex_emoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $clear_string = preg_replace($regex_emoticons, '', $clear_string);

    // Match Transport And Map Symbols
    $regex_transport = '/[\x{1F680}-\x{1F6FF}]/u';
    $clear_string = preg_replace($regex_transport, '', $clear_string);

    // Match Supplemental Symbols and Pictographs
    $regex_supplemental = '/[\x{1F900}-\x{1F9FF}]/u';
    $clear_string = preg_replace($regex_supplemental, '', $clear_string);

    // Match Miscellaneous Symbols
    $regex_misc = '/[\x{2600}-\x{26FF}]/u';
    $clear_string = preg_replace($regex_misc, '', $clear_string);

    // Match Dingbats
    $regex_dingbats = '/[\x{2700}-\x{27BF}]/u';
    $clear_string = preg_replace($regex_dingbats, '', $clear_string);

    return $clear_string;
}
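The same ranges can also be merged into one character class so the string is scanned once. This is only a sketch: the function name remove_emoji_single_pass is made up, and adding U+FE0F (variation selector-16) and U+200D (zero width joiner) is my own assumption, so that no invisible leftovers remain after an emoji sequence is removed. Drop U+200D if your text contains scripts that use the joiner legitimately.

```php
<?php
// Hypothetical single-pass variant of remove_emoji().
function remove_emoji_single_pass(string $string): string
{
    $pattern = '/['
        . '\x{1F100}-\x{1F1FF}'   // Enclosed Alphanumeric Supplement (incl. flags)
        . '\x{1F300}-\x{1F5FF}'   // Miscellaneous Symbols and Pictographs
        . '\x{1F600}-\x{1F64F}'   // Emoticons
        . '\x{1F680}-\x{1F6FF}'   // Transport and Map Symbols
        . '\x{1F900}-\x{1F9FF}'   // Supplemental Symbols and Pictographs
        . '\x{2600}-\x{26FF}'     // Miscellaneous Symbols
        . '\x{2700}-\x{27BF}'     // Dingbats
        . '\x{FE0F}\x{200D}'      // assumption: also strip VS-16 and ZWJ
        . ']+/u';

    $clean = preg_replace($pattern, '', $string);

    // preg_replace() returns null on invalid UTF-8; keep the input then.
    return $clean ?? $string;
}

// U+1F600, then U+2764 U+FE0F (red heart), then a two-letter flag sequence.
var_dump(remove_emoji_single_pass("Hi \u{1F600}\u{2764}\u{FE0F} \u{1F1EB}\u{1F1F7}!"));
```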

How to remove ⭕️ and ♛ emoji from the beginning of the string PHP?

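For this particular string there is no need for regex; ltrim() is enough. Keep in mind that ltrim() treats its second argument as a set of bytes, not characters: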

$str = '⭕️ and ♛ emoji from t';
var_dump(ltrim($str, '⭕️♛'));

Result:

string(21) " and ♛ emoji from t"
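Because ltrim() works on bytes, it can cut a different multi-byte character in half when that character shares leading bytes with the mask. A small sketch of the problem and of a code-point-aware alternative (ltrim_chars is a made-up helper name, not a PHP built-in):

```php
<?php
$mask = "\u{2B55}\u{FE0F}\u{265B}";   // the two emoji from the question

// U+2665 is E2 99 A5 in UTF-8; E2 and 99 also occur in the mask bytes,
// so ltrim() strips them and leaves a lone A5 behind.
$damaged = ltrim("\u{2665} hearts", $mask);
var_dump(preg_match('//u', $damaged) === 1);   // is the result still valid UTF-8?

// Hypothetical helper: strip only whole characters listed in $chars.
function ltrim_chars(string $str, string $chars): string
{
    if ($chars === '') {
        return $str;
    }
    $class  = preg_quote($chars, '/');
    $result = preg_replace('/^[' . $class . ']+/u', '', $str);
    return $result ?? $str;   // null means invalid UTF-8; keep the input
}

var_dump(ltrim_chars("\u{2B55}\u{FE0F} and \u{265B} emoji from t", $mask));
var_dump(ltrim_chars("\u{2665} hearts", $mask));
```

For the exact string in the question both approaches give the same result; the difference only shows up with other leading characters.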


Removing emojis from variable

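Standardized emoji are encoded in blocks such as U+1F300–U+1F5FF, but the byte-level pattern below matches a different range: the UTF-8 sequences for U+E000–U+EFFF and U+F040–U+F0FF, which lie in the Private Use Area where carriers used to encode emoji.

$first_name = preg_replace('/\xEE[\x80-\xBF][\x80-\xBF]|\xEF[\x81-\x83][\x80-\xBF]/', '', $first_name);

This strips those legacy Private Use Area characters; it leaves characters in U+1F300–U+1F5FF untouched.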
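To see which code points that byte pattern really covers, it can be compared with a code point range under /u. A quick sketch, where the sample strings and variable names are made up for illustration:

```php
<?php
// Byte-oriented pattern from above (no /u, so \xEE is a raw byte).
$byte_pattern = '/\xEE[\x80-\xBF][\x80-\xBF]|\xEF[\x81-\x83][\x80-\xBF]/';

// The same range written as code points; needs the /u modifier.
$codepoint_pattern = '/[\x{E000}-\x{EFFF}\x{F040}-\x{F0FF}]/u';

$legacy = "Ann\u{E415}";    // a Private Use Area code point (EE 90 95)
$modern = "Ann\u{1F534}";   // U+1F534, encoded as F0 9F 94 B4

var_dump(preg_replace($byte_pattern, '', $legacy));              // PUA char removed
var_dump(preg_replace($codepoint_pattern, '', $legacy));         // same result
var_dump(preg_replace($byte_pattern, '', $modern) === $modern);  // U+1F534 is kept
```

So for standardized emoji you still need one of the /u range functions shown earlier.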

How to remove emoji code using javascript?

The range you have selected is the Private Use Area, containing non-standard characters. Carriers used to encode emoji as different, inconsistent values inside this range.

More recently, the emoji have been given standardised 'unified' codepoints. Many of these are outside of the Basic Multilingual Plane, in the block U+1F300–U+1F5FF, including your example U+1F534 Large Red Circle.

You could detect these characters with [\U0001F300-\U0001F5FF] in a regex engine that supports non-BMP characters, but JavaScript's RegExp, at least without the u flag added in ES2015, is not such a beast. The JS string model is based on UTF-16 code units, so you have to work with the UTF-16 surrogates in the regexp:

return this.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')

However, note that there are other characters in the Basic Multilingual Plane that are used as emoji by phones but which long predate emoji. For example U+2665 is the traditional Heart Suit character ♥, but it may be rendered as an emoji graphic on some devices. It's up to you whether you treat this as emoji and try to remove it. See this list for more examples.
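One possible policy for those older BMP symbols, sketched here in PHP to match the rest of this page: remove a symbol only when it is followed by U+FE0F, the variation selector that requests emoji presentation. This is an assumption about what you want, not a rule; some devices draw a bare U+2665 as an emoji anyway. The function name is made up.

```php
<?php
// Hypothetical policy: a symbol in U+2600-U+27BF counts as emoji only
// when U+FE0F (emoji presentation selector) directly follows it.
function remove_emoji_styled_symbols(string $text): string
{
    $result = preg_replace('/[\x{2600}-\x{27BF}]\x{FE0F}/u', '', $text);
    return $result ?? $text;   // null means invalid UTF-8; keep the input
}

// First heart is plain text style, second one carries U+FE0F.
var_dump(remove_emoji_styled_symbols("A\u{2665}B\u{2665}\u{FE0F}C"));
```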

PHP convert emojis in string to unicode

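Most emojis are not 1-byte characters like 123abc@#$^; the majority lie outside the Basic Multilingual Plane and take 4 bytes in UTF-8, so a byte-oriented pattern (one without the /u modifier) cannot match them by code point range. You can instead select every 4-byte character (note that this also catches non-emoji characters that are encoded in 4 bytes):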

function to_unicode($text) {
    $str = preg_replace_callback(
        "%(?:\xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2})%xs",
        function($emoji){
            $emojiStr = mb_convert_encoding($emoji[0], 'UTF-32', 'UTF-8');
            return strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emojiStr)));
        },
        $text
    );
    return $str;
}

echo to_unicode('hello world 😀');

Output: hello world U+1F600

How it works

First, match every 4-byte character with this regex, passed to preg_replace_callback:

%(?:\xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2})%xs

Then use a callback function to convert each matched character to its U+XXXX notation:

function($emoji){
    $emojiStr = mb_convert_encoding($emoji[0], 'UTF-32', 'UTF-8');
    return strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emojiStr)));
}
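For comparison, roughly the same conversion can be written with the /u modifier and mb_ord() (PHP 7.2+, mbstring), matching everything above U+FFFF by code point instead of by byte pattern. A sketch only; to_unicode_u is a made-up name, and the second sample shows that a non-emoji 4-byte character (a Gothic letter) is converted too:

```php
<?php
// Hypothetical alternative to to_unicode(), assuming PHP 7.2+ with mbstring.
function to_unicode_u(string $text): string
{
    $result = preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',   // every code point outside the BMP
        function (array $m): string {
            // mb_ord() returns the code point of the matched character.
            return sprintf('U+%04X', mb_ord($m[0], 'UTF-8'));
        },
        $text
    );
    return $result ?? $text;   // null means invalid UTF-8; keep the input
}

echo to_unicode_u("hello world \u{1F600}"), "\n";
echo to_unicode_u("gothic \u{10330}"), "\n";
```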

resources:

Detect emoji (stackoverflow)

What is emoji?
