How to Remove Emoji from String

How do I remove emoji from string

Karol S already provided a solution, but the reason might not be clear:

"\u1F600" is actually "\u1F60" followed by "0":

"\u1F60"    # => "ὠ"
"\u1F600" # => "ὠ0"

You have to use curly braces for code points above FFFF:

"\u{1F600}" #=> "😀"

Therefore the character class [\u1F600-\u1F6FF] is interpreted as [\u1F60 0-\u1F6F F], i.e. it
matches "\u1F60", the range "0".."\u1F6F" and "F".

Using curly braces solves the issue:

/[\u{1F600}-\u{1F6FF}]/

This matches (emoji) characters in these Unicode blocks:

  • U+1F600..U+1F64F Emoticons
  • U+1F650..U+1F67F Ornamental Dingbats
  • U+1F680..U+1F6FF Transport and Map Symbols
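The same pitfall exists in other languages. As a rough Python analogue (not part of the original Ruby answer): \u takes exactly four hex digits, and \U with eight digits is needed for code points above FFFF. The names short_escape, long_escape, EMOJI_RANGE and strip_range are made up for this sketch.

```python
import re

# \u consumes exactly four hex digits, so this is U+1F60 followed by "0".
short_escape = "\u1F600"

# \U consumes eight hex digits and yields the single code point U+1F600.
long_escape = "\U0001F600"

# Character class covering U+1F600..U+1F6FF, the three blocks listed above.
EMOJI_RANGE = re.compile("[\U0001F600-\U0001F6FF]")


def strip_range(text):
    """Remove every character in U+1F600..U+1F6FF from text."""
    return EMOJI_RANGE.sub("", text)


print(len(short_escape), len(long_escape))
print(strip_range("Hi!\U0001F600"))
```

Only the escape syntax differs from Ruby; the character class itself is the same range.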

You can also use unpack, pack, and between? to achieve a similar result. This also works for Ruby 1.8.7, which doesn't support Unicode in regular expressions.

s = 'Hi!😀'
#=> "Hi!\360\237\230\200"

s.unpack('U*').reject{ |e| e.between?(0x1F600, 0x1F6FF) }.pack('U*')
#=> "Hi!"
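The code point approach carries over to languages without regex support for these ranges. A minimal Python sketch of the same idea, assuming the same U+1F600..U+1F6FF range (drop_code_points and its parameters are hypothetical names):

```python
def drop_code_points(text, low=0x1F600, high=0x1F6FF):
    """Return text without the characters whose code point lies in [low, high]."""
    return "".join(ch for ch in text if not low <= ord(ch) <= high)


print(drop_code_points("Hi!\U0001F600"))
```

Here ord plays the role of unpack('U*') and the join plays the role of pack('U*').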

Regarding your Rubular example – Emoji are single characters:

"😀".length #=> 1
"😀".chars #=> ["😀"]

Whereas kaomoji are a combination of multiple characters:

"^_^".length #=> 3
"^_^".chars #=> ["^", "_", "^"]

Matching these is a very different task (and you should ask that in a separate question).

Remove emojis from string

Here is my code that I am currently using in my project to remove all emojis from a string.

function remove_emoji($string) {
    // Match Emoticons
    $regex_emoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $clear_string = preg_replace($regex_emoticons, '', $string);

    // Match Miscellaneous Symbols and Pictographs
    $regex_symbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $clear_string = preg_replace($regex_symbols, '', $clear_string);

    // Match Transport And Map Symbols
    $regex_transport = '/[\x{1F680}-\x{1F6FF}]/u';
    $clear_string = preg_replace($regex_transport, '', $clear_string);

    // Match Miscellaneous Symbols
    $regex_misc = '/[\x{2600}-\x{26FF}]/u';
    $clear_string = preg_replace($regex_misc, '', $clear_string);

    // Match Dingbats
    $regex_dingbats = '/[\x{2700}-\x{27BF}]/u';
    $clear_string = preg_replace($regex_dingbats, '', $clear_string);

    return $clear_string;
}
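For comparison, the five passes above can be folded into a single character class. This is a Python sketch of that idea using the same five ranges, not a drop-in replacement for the PHP function; EMOJI_BLOCKS and remove_emoji are names chosen for illustration.

```python
import re

# One character class covering the same five blocks as the PHP function.
EMOJI_BLOCKS = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # Emoticons
    "\U0001F300-\U0001F5FF"  # Miscellaneous Symbols and Pictographs
    "\U0001F680-\U0001F6FF"  # Transport and Map Symbols
    "\u2600-\u26FF"          # Miscellaneous Symbols
    "\u2700-\u27BF"          # Dingbats
    "]+"
)


def remove_emoji(text):
    """Strip characters from the five blocks listed above in one pass."""
    return EMOJI_BLOCKS.sub("", text)


print(remove_emoji("ok \u2705 fire \U0001F525 plane \u2708"))
```

Like the PHP version, this only covers the listed blocks. Characters outside them, for example those in U+1F900..U+1F9FF, pass through unchanged.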

Remove Emoji from string in Lua

In general I see three things you could do. You're currently asking how to solve 3.

  1. prevent emojis from being entered by ignoring anything you don't want to be entered

  2. if something you don't want has been entered, deny that input with an error message

  3. remove anything you don't want from the string before you process it

To remove something from a Lua string you can simply replace it with an empty string.
Use string.gsub and a pattern that matches all emojis.

I suggest you give this a read http://lua-users.org/wiki/LuaUnicode
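The difference between options 2 and 3 can be sketched as follows. This is Python rather than Lua, and the range in UNWANTED is only an assumed example, not a complete emoji pattern; validate and sanitize are hypothetical names.

```python
import re

# Assumed example range (U+1F300..U+1F6FF); a real pattern needs more blocks.
UNWANTED = re.compile("[\U0001F300-\U0001F6FF]")


def validate(text):
    """Option 2: refuse input that contains an unwanted character."""
    if UNWANTED.search(text):
        raise ValueError("input contains emoji")
    return text


def sanitize(text):
    """Option 3: replace every unwanted character with an empty string."""
    return UNWANTED.sub("", text)


print(sanitize("a\U0001F600b"))
```

In Lua the sanitize step corresponds to a string.gsub call with an empty replacement string.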

How to remove emoji code using javascript?

The range you have selected is the Private Use Area, containing non-standard characters. Carriers used to encode emoji as different, inconsistent values inside this range.

More recently, the emoji have been given standardised 'unified' codepoints. Many of these are outside of the Basic Multilingual Plane, in the block U+1F300–U+1F5FF, including your example U+1F534 Large Red Circle.

You could detect these characters with [\U0001F300-\U0001F5FF] in a regex engine that supported non-BMP characters, but JavaScript's RegExp is not such a beast. Unfortunately the JS string model is based on UTF-16 code units, so you'd have to work with the UTF-16 surrogates in a regexp:

return this.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
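To see where the surrogate ranges in that pattern come from, here is a small Python sketch of the standard UTF-16 encoding arithmetic (utf16_surrogates is a name made up for this example):

```python
def utf16_surrogates(code_point):
    """Return the (high, low) UTF-16 surrogate pair for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# Start of the block, the example character, and the end of the block.
for cp in (0x1F300, 0x1F534, 0x1F5FF):
    high, low = utf16_surrogates(cp)
    print(f"U+{cp:X} -> \\u{high:04X} \\u{low:04X}")
```

U+1F534 comes out as \uD83D \uDD34, which falls inside the \uD83D[\uDC00-\uDDFF] alternative of the pattern above.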

However, note that there are other characters in the Basic Multilingual Plane that are used as emoji by phones but which long predate emoji. For example U+2665 is the traditional Heart Suit character ♥, but it may be rendered as an emoji graphic on some devices. It's up to you whether you treat this as emoji and try to remove it. See this list for more examples.

Remove ✅, 🔥, ✈ , ♛ and other such emojis/images/signs from Java strings

Instead of blacklisting some elements, how about creating a whitelist of the characters you do wish to keep? This way you don't need to worry about every new emoji being added.

String characterFilter = "[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]";
String emotionless = aString.replaceAll(characterFilter,"");

So:

  • [\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s] is a character class covering letters (\\p{L}), marks (\\p{M}), numbers (\\p{N}), punctuation (\\p{P}), separators (\\p{Z}), format characters (\\p{Cf}), surrogates (\\p{Cs}, the code units used for characters above U+FFFF) and whitespace, including newlines (\\s). \\p{L} includes letters from all scripts, such as Cyrillic, Latin, Kanji, etc.
  • The ^ in the regex character set negates the match.

Example:

String str = "hello world _# 皆さん、こんにちは! 私はジョンと申します。🔥";
System.out.print(str.replaceAll("[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]",""));
// Output:
// "hello world _# 皆さん、こんにちは! 私はジョンと申します。"
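The same whitelist idea can be sketched in Python. The standard re module has no \p{...} classes, so this version checks unicodedata.category for each character instead; keep_whitelisted and the two constants are hypothetical names, and the category list simply mirrors the Java class above.

```python
import unicodedata

# Major categories kept: Letter, Mark, Number, Punctuation, Separator.
KEPT_MAJOR = {"L", "M", "N", "P", "Z"}
# Specific categories kept: format characters and surrogates.
KEPT_EXACT = {"Cf", "Cs"}
# Characters matched by \s in the Java pattern.
WHITESPACE = " \t\n\x0b\f\r"


def keep_whitelisted(text):
    """Drop every character that is not in the whitelisted categories."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if category[0] in KEPT_MAJOR or category in KEPT_EXACT or ch in WHITESPACE:
            kept.append(ch)
    return "".join(kept)


print(keep_whitelisted("hello world _# 皆さん、こんにちは! \U0001F525"))
```

Note that a whitelist built from these categories also drops symbol characters such as + and =, because they belong to the Symbol categories rather than Punctuation.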

If you need more information, check out the Java documentation for regexes.

Remove emoji from string doesn't works for some cases

Check out this answer; the emoji Python package seems like the best way to solve this problem.

To see the Unicode escape sequence of any emoji or character, do this:

s = '😀'
print(s.encode('unicode-escape').decode('ASCII'))

This prints \U0001f600.

How to remove Emoji from string using VB

Your current regex matches any char except line breaks and ASCII alphanumeric chars. It does not match emojis because the VBScript regex engine, which is based on ECMA-262 3rd edition, cannot match astral plane chars with a mere . pattern.

If you want to just add the emoji matching support to your current pattern, you can replace the . with (?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]) pattern and use

objRegExp.Pattern = "(?:(?![a-zA-Z0-9])(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]))+"

See the regex demo

If you just want to remove all but ASCII chars, you can use

objRegExp.Pattern = "(?:(?![ -~])[\s\S])+"

The pattern matches one or more (+) chars ([\s\S] matches any char, whitespace or not) that are not printable ASCII chars (the range from space to ~).
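For readers outside VBScript, the ASCII-only variant can be sketched in Python with a negated character class, which behaves the same as the lookahead form above. NON_PRINTABLE_ASCII and keep_printable_ascii are names chosen for this example.

```python
import re

# Anything outside the printable ASCII range (space through ~).
NON_PRINTABLE_ASCII = re.compile(r"[^ -~]+")


def keep_printable_ascii(text):
    """Remove every run of characters that are not printable ASCII."""
    return NON_PRINTABLE_ASCII.sub("", text)


print(keep_printable_ascii("Hi \U0001F600 there"))
```

Be aware that this also removes tabs, newlines, and all non-ASCII letters, not just emoji.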


