PHP: writing a simple removeEmoji function
I think the preg_replace function is the simplest solution.
As EaterOfCode suggests, I read the wiki page and wrote new regexes, since none of the answers on SO (or other websites) seemed to work for Instagram photo captions (in the format the API returns). Note: the /u modifier is mandatory to match \x{...} Unicode characters.
public static function removeEmoji($text) {
    $clean_text = "";

    // Match Emoticons
    $regexEmoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $clean_text = preg_replace($regexEmoticons, '', $text);

    // Match Miscellaneous Symbols and Pictographs
    $regexSymbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $clean_text = preg_replace($regexSymbols, '', $clean_text);

    // Match Transport And Map Symbols
    $regexTransport = '/[\x{1F680}-\x{1F6FF}]/u';
    $clean_text = preg_replace($regexTransport, '', $clean_text);

    // Match Miscellaneous Symbols
    $regexMisc = '/[\x{2600}-\x{26FF}]/u';
    $clean_text = preg_replace($regexMisc, '', $clean_text);

    // Match Dingbats
    $regexDingbats = '/[\x{2700}-\x{27BF}]/u';
    $clean_text = preg_replace($regexDingbats, '', $clean_text);

    return $clean_text;
}
The function does not remove all emojis since there are many more, but you get the point.
Please refer to unicode.org - full emoji list (thanks Epoc)
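The five passes above can also be merged into one character class, so the string is scanned only once. This is a minimal sketch under that assumption, not the original author's code; the name remove_emoji_single_pass is hypothetical, and the ranges are exactly the five blocks listed in the function above.

```php
<?php
// Hypothetical helper: the same five Unicode blocks, merged into one character class.
function remove_emoji_single_pass(string $text): string
{
    $pattern = '/['
        . '\x{1F600}-\x{1F64F}'  // Emoticons
        . '\x{1F300}-\x{1F5FF}'  // Miscellaneous Symbols and Pictographs
        . '\x{1F680}-\x{1F6FF}'  // Transport and Map Symbols
        . '\x{2600}-\x{26FF}'    // Miscellaneous Symbols
        . '\x{2700}-\x{27BF}'    // Dingbats
        . ']+/u';

    // preg_replace() returns null on failure (for example on invalid UTF-8 input);
    // in that case this sketch returns the text unchanged.
    return preg_replace($pattern, '', $text) ?? $text;
}

// U+2600 (sun) and U+1F3D6 (beach with umbrella) both fall inside the ranges.
echo remove_emoji_single_pass("Sunny day \u{2600} at the beach \u{1F3D6}"), "\n";
```

Like the original, this only covers the listed blocks; any emoji outside them is left in place.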
Remove emojis from string
I used this function, customized with additional Unicode ranges (including more emojis and country flags). You can do the same using the details on this page: http://unicode.org/emoji/charts/full-emoji-list.html.
Here is the result:
function remove_emoji($string)
{
    // Match Enclosed Alphanumeric Supplement
    $regex_alphanumeric = '/[\x{1F100}-\x{1F1FF}]/u';
    $clear_string = preg_replace($regex_alphanumeric, '', $string);

    // Match Miscellaneous Symbols and Pictographs
    $regex_symbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $clear_string = preg_replace($regex_symbols, '', $clear_string);

    // Match Emoticons
    $regex_emoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $clear_string = preg_replace($regex_emoticons, '', $clear_string);

    // Match Transport And Map Symbols
    $regex_transport = '/[\x{1F680}-\x{1F6FF}]/u';
    $clear_string = preg_replace($regex_transport, '', $clear_string);

    // Match Supplemental Symbols and Pictographs
    $regex_supplemental = '/[\x{1F900}-\x{1F9FF}]/u';
    $clear_string = preg_replace($regex_supplemental, '', $clear_string);

    // Match Miscellaneous Symbols
    $regex_misc = '/[\x{2600}-\x{26FF}]/u';
    $clear_string = preg_replace($regex_misc, '', $clear_string);

    // Match Dingbats
    $regex_dingbats = '/[\x{2700}-\x{27BF}]/u';
    $clear_string = preg_replace($regex_dingbats, '', $clear_string);

    return $clear_string;
}
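One thing range-based removal can leave behind is invisible companion characters: many emoji are followed by U+FE0F (variation selector-16) or glued together with U+200D (zero width joiner), and neither lies in the ranges above. The sketch below shows one possible cleanup pass. The helper name strip_emoji_leftovers is hypothetical and not part of the original answer, and removing U+200D everywhere is an assumption that may be wrong for text in scripts that use the joiner legitimately.

```php
<?php
// Hypothetical helper: removes invisible code points that often accompany
// emoji and can remain after the visible symbol has been stripped.
function strip_emoji_leftovers(string $text): string
{
    // U+200D zero width joiner, U+FE0F variation selector-16.
    // Assumption: the text contains no script that needs U+200D.
    return preg_replace('/[\x{200D}\x{FE0F}]/u', '', $text) ?? $text;
}

// U+2764 (heavy black heart) is a Dingbat, but the U+FE0F after it is not.
$input = "I \u{2764}\u{FE0F} PHP";
$withoutDingbats = preg_replace('/[\x{2700}-\x{27BF}]/u', '', $input);

// The selector is invisible, so compare the byte dumps.
echo bin2hex($withoutDingbats), "\n";
echo bin2hex(strip_emoji_leftovers($withoutDingbats)), "\n";
```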
How to remove ⭕️ and ♛ emoji from the beginning of the string PHP?
There is no need for regex here, just use ltrim():
$str = '⭕️ and ♛ emoji from t';
var_dump(ltrim($str, '⭕️♛'));
Result:
string(21) " and ♛ emoji from t"
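One caveat: ltrim() compares single bytes, not whole characters, so the character list is treated as a set of bytes. A string that starts with a different character sharing its first byte with one of the listed emoji can lose that byte and end up as invalid UTF-8. If that matters, an anchored pattern with the /u modifier compares whole code points. This is a minimal sketch; the helper name ltrim_emoji is hypothetical.

```php
<?php
// Hypothetical helper: strips the listed characters from the start of a
// string, comparing whole code points instead of single bytes.
function ltrim_emoji(string $str): string
{
    // U+2B55 heavy large circle, U+FE0F variation selector-16, U+265B black chess queen
    return preg_replace('/^[\x{2B55}\x{FE0F}\x{265B}]+/u', '', $str) ?? $str;
}

$chars = "\u{2B55}\u{FE0F}\u{265B}";

// For the string from the answer, both approaches give the same result.
$str = "\u{2B55}\u{FE0F} and \u{265B} emoji from t";
var_dump(ltrim_emoji($str) === ltrim($str, $chars));

// The euro sign is encoded as E2 82 AC and shares its first byte with U+2B55 (E2 AD 95).
$euro = "\u{20AC}5";
var_dump(bin2hex(ltrim($euro, $chars)));   // byte dump of what ltrim() leaves
var_dump(bin2hex(ltrim_emoji($euro)));     // byte dump of what the regex leaves
```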
Removing emojis from variable
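Standardised emoji are encoded in blocks such as U+1F300–U+1F5FF. The pattern below works on raw UTF-8 bytes instead: it matches U+E000–U+EFFF and U+F040–U+F0FF, part of the Private Use Area where carriers used to encode emoji.
preg_replace('/\xEE[\x80-\xBF][\x80-\xBF]|\xEF[\x81-\x83][\x80-\xBF]/', '', $first_name);
This strips those legacy characters out, but leaves U+1F300–U+1F5FF untouched.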
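To see which characters the byte-level pattern really covers, it can be compared with a version written as code point ranges. This is a minimal sketch: the ranges in strip_code_points were obtained by decoding the UTF-8 byte sequences by hand, and both function names are hypothetical.

```php
<?php
// Byte-level pattern from the answer above (no /u modifier).
function strip_bytes(string $s): string
{
    return preg_replace('/\xEE[\x80-\xBF][\x80-\xBF]|\xEF[\x81-\x83][\x80-\xBF]/', '', $s);
}

// Assumed equivalent written as code point ranges; this form needs the /u modifier.
function strip_code_points(string $s): string
{
    return preg_replace('/[\x{E000}-\x{EFFF}\x{F040}-\x{F0FF}]/u', '', $s);
}

$samples = [
    'private use U+E001' => "Ann\u{E001}",
    'private use U+F0FF' => "Ann\u{F0FF}",
    'red circle U+1F534' => "Ann\u{1F534}",
];

foreach ($samples as $label => $first_name) {
    // bin2hex() makes the surviving bytes visible in any terminal.
    printf(
        "%s: bytes=%s code points=%s\n",
        $label,
        bin2hex(strip_bytes($first_name)),
        bin2hex(strip_code_points($first_name))
    );
}
```

Both versions remove the two private-use samples and keep U+1F534, so stripping the standardised emoji needs one of the range-based patterns instead.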
How to remove emoji code using javascript?
The range you have selected is the Private Use Area, containing non-standard characters. Carriers used to encode emoji as different, inconsistent values inside this range.
More recently, the emoji have been given standardised 'unified' codepoints. Many of these are outside of the Basic Multilingual Plane, in the block U+1F300–U+1F5FF, including your example U+1F534 Large Red Circle.
You could detect these characters with [\U0001F300-\U0001F5FF] in a regex engine that supported non-BMP characters, but JavaScript's RegExp is not such a beast. Unfortunately the JS string model is based on UTF-16 code units, so you'd have to work with the UTF-16 surrogates in a regexp:
return this.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
However, note that there are other characters in the Basic Multilingual Plane that are used as emoji by phones but which long predate emoji. For example U+2665 is the traditional Heart Suit character ♥, but it may be rendered as an emoji graphic on some devices. It's up to you whether you treat this as emoji and try to remove it. See this list for more examples.
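The surrogate values in the JavaScript pattern above follow from the standard UTF-16 encoding arithmetic: subtract 0x10000 from the code point, then split the remaining 20 bits into two halves of 10 bits. The sketch below reproduces that calculation in PHP so the range boundaries can be checked; utf16_surrogates is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the UTF-16 surrogate pair for a code point
// above U+FFFF as two \uXXXX escapes, as used in a JavaScript regex.
function utf16_surrogates(int $codePoint): string
{
    if ($codePoint < 0x10000 || $codePoint > 0x10FFFF) {
        throw new InvalidArgumentException('Code point must be in U+10000..U+10FFFF');
    }
    $offset = $codePoint - 0x10000;      // 20-bit value
    $high = 0xD800 + ($offset >> 10);    // top 10 bits go into the high surrogate
    $low  = 0xDC00 + ($offset & 0x3FF);  // bottom 10 bits go into the low surrogate

    // Single quotes keep the backslashes literal.
    return sprintf('\u%04X\u%04X', $high, $low);
}

// Block boundaries, the point where the high surrogate changes, and the example character.
foreach ([0x1F300, 0x1F3FF, 0x1F400, 0x1F5FF, 0x1F534] as $cp) {
    printf("U+%X => %s\n", $cp, utf16_surrogates($cp));
}
```

The block U+1F300–U+1F5FF spans two high surrogates (D83C and D83D), which is why the pattern needs two alternatives for it.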
PHP convert emojis in string to unicode
Emoji are not 1-byte characters like 123abc@#$^. Most of them lie outside the Basic Multilingual Plane and take 4 bytes in UTF-8, so instead of listing Unicode ranges you can select every 4-byte character (this misses 3-byte emoji such as U+2665 and also matches non-emoji 4-byte characters):
function to_unicode($text) {
    $str = preg_replace_callback(
        "%(?:\xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2})%xs",
        function($emoji){
            $emojiStr = mb_convert_encoding($emoji[0], 'UTF-32', 'UTF-8');
            return strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emojiStr)));
        },
        $text
    );
    return $str;
}
echo to_unicode( 'hello world 😀' );
The output is: hello world U+1F600
How it works
First, match the 4-byte characters with this regex:
%(?:\xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2})%xs
using the preg_replace_callback function. Then use a callback function to encode each matched character:
function($emoji){
    $emojiStr = mb_convert_encoding($emoji[0], 'UTF-32', 'UTF-8');
    return strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emojiStr)));
}
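The same conversion can also be written with the /u modifier and mb_ord(), which avoids the detour through UTF-32 and hex strings. This is a sketch of an alternative, not the answer's method: to_unicode_u is a hypothetical name, and it assumes PHP 7.2 or later with the mbstring extension. In UTF-8, the 4-byte characters are exactly the code points U+10000 to U+10FFFF, so the character class selects the same characters as the byte-level regex.

```php
<?php
// Hypothetical variant of to_unicode(); assumes PHP 7.2+ with mbstring for mb_ord().
function to_unicode_u(string $text): string
{
    return preg_replace_callback(
        // Every code point above the Basic Multilingual Plane (4 bytes in UTF-8).
        '/[\x{10000}-\x{10FFFF}]/u',
        function (array $m): string {
            // mb_ord() returns the code point as an integer; print it as U+XXXX.
            return sprintf('U+%04X', mb_ord($m[0], 'UTF-8'));
        },
        $text
    ) ?? $text; // on invalid UTF-8 input, fall back to the unchanged text
}

echo to_unicode_u("hello world \u{1F600}"), "\n"; // prints "hello world U+1F600"
```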