How do I remove emoji from string
Karol S already provided a solution, but the reason might not be clear: "\u1F600" is actually "\u1F60" followed by "0":
"\u1F60" # => "ὠ"
"\u1F600" # => "ὠ0"
You have to use curly braces for code points above FFFF:
"\u{1F600}" #=> "😀"
Therefore the character class [\u1F600-\u1F6FF] is interpreted as [\u1F60 0-\u1F6F F], i.e. it matches "\u1F60", the range "0".."\u1F6F", and "F".
Using curly braces solves the issue:
/[\u{1F600}-\u{1F6FF}]/
This matches (emoji) characters in these unicode blocks:
- U+1F600..U+1F64F Emoticons
- U+1F650..U+1F67F Ornamental Dingbats
- U+1F680..U+1F6FF Transport and Map Symbols
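The same pitfall and fix can be sketched in Python for comparison. This is only an illustration, not part of the original Ruby answer: Python's "\u" escape also takes exactly four hex digits, and code points above FFFF need the eight-digit "\U" escape. The names EMOJI_RANGE and strip_emoji_range are made up for this example.

```python
import re

# "\u1F600" is parsed as "\u1F60" followed by "0", just like in Ruby.
assert "\u1F600" == "\u1F60" + "0"

# Code points above U+FFFF need the 8-digit \U escape in Python.
# This class covers U+1F600..U+1F6FF, the same span as the Ruby regex above.
EMOJI_RANGE = re.compile("[\U0001F600-\U0001F6FF]")


def strip_emoji_range(text: str) -> str:
    """Remove every character in U+1F600..U+1F6FF from text."""
    return EMOJI_RANGE.sub("", text)


print(strip_emoji_range("Hi!\U0001F600"))  # prints: Hi!
```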
You can also use unpack, pack, and between? to achieve a similar result. This also works for Ruby 1.8.7, which doesn't support Unicode in regular expressions.
s = 'Hi!😀'
#=> "Hi!\360\237\230\200"
s.unpack('U*').reject{ |e| e.between?(0x1F600, 0x1F6FF) }.pack('U*')
#=> "Hi!"
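The regex-free approach above, filtering by numeric code point, might look like this in Python. It is a rough equivalent rather than a translation of the Ruby code; the function name drop_code_points and its default bounds are assumptions chosen to mirror the range used in this answer.

```python
def drop_code_points(text: str, low: int = 0x1F600, high: int = 0x1F6FF) -> str:
    """Keep only characters whose code point lies outside [low, high]."""
    return "".join(ch for ch in text if not low <= ord(ch) <= high)


print(drop_code_points("Hi!\U0001F600"))  # prints: Hi!
```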
Regarding your Rubular example – Emoji are single characters:
"😀".length #=> 1
"😀".chars #=> ["😀"]
Whereas kaomoji are a combination of multiple characters:
"^_^".length #=> 3
"^_^".chars #=> ["^", "_", "^"]
Matching these is a very different task (and you should ask that in a separate question).
Remove emojis from string
Here is my code that I am currently using in my project to remove all emojis from a string.
function remove_emoji($string) {
    // Match Emoticons
    $regex_emoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $clear_string = preg_replace($regex_emoticons, '', $string);

    // Match Miscellaneous Symbols and Pictographs
    $regex_symbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $clear_string = preg_replace($regex_symbols, '', $clear_string);

    // Match Transport And Map Symbols
    $regex_transport = '/[\x{1F680}-\x{1F6FF}]/u';
    $clear_string = preg_replace($regex_transport, '', $clear_string);

    // Match Miscellaneous Symbols
    $regex_misc = '/[\x{2600}-\x{26FF}]/u';
    $clear_string = preg_replace($regex_misc, '', $clear_string);

    // Match Dingbats
    $regex_dingbats = '/[\x{2700}-\x{27BF}]/u';
    $clear_string = preg_replace($regex_dingbats, '', $clear_string);

    return $clear_string;
}
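The five passes above could also be folded into a single character class. Below is a minimal Python sketch of that idea, using the same five blocks as the PHP function; the names EMOJI_BLOCKS, EMOJI_PATTERN, and remove_emoji are chosen for illustration, and the list of blocks is no more complete than the original's.

```python
import re

# The same five Unicode blocks the PHP function strips, as (first, last) pairs.
EMOJI_BLOCKS = [
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x2600, 0x26FF),    # Miscellaneous Symbols
    (0x2700, 0x27BF),    # Dingbats
]

# Build one character class such as [\U0001F600-\U0001F64F...] from the pairs.
EMOJI_PATTERN = re.compile(
    "[" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in EMOJI_BLOCKS) + "]"
)


def remove_emoji(text: str) -> str:
    """Strip characters from the listed blocks in a single pass."""
    return EMOJI_PATTERN.sub("", text)


print(remove_emoji("a\u2708b\U0001F525c\U0001F600"))  # prints: abc
```

A single compiled pattern scans the string once instead of five times, and adding a block only means adding one pair to the list.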
Remove Emoji from string in Lua
In general I see three things you could do. You're currently asking how to solve the third.
1. Prevent emojis from being entered by ignoring anything you don't want to be entered.
2. If something you don't want has been entered, deny that input with an error message.
3. Remove anything you don't want from the string before you process it.
To remove something from a Lua string you can simply replace it with an empty string.
Use string.gsub and a pattern that matches all emojis.
I suggest you give this a read http://lua-users.org/wiki/LuaUnicode
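To make the difference between denying input and stripping it concrete, here is a small sketch in Python rather than Lua. It assumes "unwanted" means the code points U+1F300..U+1F6FF, which is only an example range; UNWANTED, validate_input, and sanitize_input are hypothetical names.

```python
import re

# Assumption: "unwanted" means U+1F300..U+1F6FF; adjust to your own needs.
UNWANTED = re.compile("[\U0001F300-\U0001F6FF]")


def validate_input(text: str) -> str:
    """Option 2: reject input that contains unwanted characters."""
    if UNWANTED.search(text):
        raise ValueError("input contains unsupported characters")
    return text


def sanitize_input(text: str) -> str:
    """Option 3: replace unwanted characters with an empty string."""
    return UNWANTED.sub("", text)


print(sanitize_input("ok\U0001F600"))  # prints: ok
```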
How to remove emoji code using javascript?
The range you have selected is the Private Use Area, containing non-standard characters. Carriers used to encode emoji as different, inconsistent values inside this range.
More recently, the emoji have been given standardised 'unified' codepoints. Many of these are outside of the Basic Multilingual Plane, in the block U+1F300–U+1F5FF, including your example U+1F534 Large Red Circle.
You could detect these characters with [\U0001F300-\U0001F5FF] in a regex engine that supported non-BMP characters, but JavaScript's RegExp is not such a beast. Unfortunately the JS string model is based on UTF-16 code units, so you'd have to work with the UTF-16 surrogates in a regexp:
return this.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
However, note that there are other characters in the Basic Multilingual Plane that are used as emoji by phones but which long predate emoji. For example U+2665 is the traditional Heart Suit character ♥, but it may be rendered as an emoji graphic on some devices. It's up to you whether you treat this as emoji and try to remove it. See this list for more examples.
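To see where the surrogate ranges in that regexp come from, the standard UTF-16 encoding arithmetic can be worked through in a few lines. The sketch below uses Python purely as a calculator; utf16_surrogates is a hypothetical helper name.

```python
from typing import Tuple


def utf16_surrogates(ch: str) -> Tuple[int, int]:
    """Return the (high, low) UTF-16 surrogate pair for a non-BMP character."""
    offset = ord(ch) - 0x10000
    if offset < 0:
        raise ValueError("character is in the BMP and has no surrogate pair")
    # The top 10 bits go into the high surrogate, the bottom 10 into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


for ch in ("\U0001F300", "\U0001F534", "\U0001F5FF"):
    high, low = utf16_surrogates(ch)
    print(f"U+{ord(ch):04X} -> {high:04X} {low:04X}")
# U+1F300 -> D83C DF00
# U+1F534 -> D83D DD34
# U+1F5FF -> D83D DDFF
```

The block U+1F300–U+1F5FF therefore spans \uD83C[\uDF00-\uDFFF] and \uD83D[\uDC00-\uDDFF], which are the two alternatives used in the regexp.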
Remove ✅, 🔥, ✈ , ♛ and other such emojis/images/signs from Java strings
Instead of blacklisting some elements, how about creating a whitelist of the characters you do wish to keep? This way you don't need to worry about every new emoji being added.
String characterFilter = "[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]";
String emotionless = aString.replaceAll(characterFilter,"");
So:
- [\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s] is a character class covering all letter (\\p{L}), mark (\\p{M}), numeric (\\p{N}), punctuation (\\p{P}), whitespace/separator (\\p{Z}), other formatting (\\p{Cf}), surrogate (\\p{Cs}, the code units UTF-16 uses to encode characters above U+FFFF), and whitespace (\\s, which includes newlines) characters.
- \\p{L} specifically includes the characters from other alphabets such as Cyrillic, Latin, Kanji, etc.
- The ^ in the regex character set negates the match.
Example:
String str = "hello world _# 皆さん、こんにちは! 私はジョンと申します。🔥";
System.out.print(str.replaceAll("[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]",""));
// Output:
// "hello world _# 皆さん、こんにちは! 私はジョンと申します。"
If you need more information, check out the Java documentation for regexes.
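The same whitelist idea can be sketched in Python with the standard unicodedata module, which reports the general category of each character. This is an approximation of the Java filter, not a drop-in equivalent (it leaves out the surrogate category, since Python strings hold whole code points); keep_whitelisted and the two set names are made up for the example.

```python
import unicodedata

# Major categories to keep: Letter, Mark, Number, Punctuation, Separator.
KEPT_MAJOR = {"L", "M", "N", "P", "Z"}
# Exact categories to keep: Cf (format characters).
KEPT_EXACT = {"Cf"}


def keep_whitelisted(text: str) -> str:
    """Drop every character whose category is not on the whitelist."""
    return "".join(
        ch
        for ch in text
        if ch.isspace()
        or unicodedata.category(ch)[0] in KEPT_MAJOR
        or unicodedata.category(ch) in KEPT_EXACT
    )


print(keep_whitelisted("hello world _# 皆さん \U0001F525\u2705"))
# prints: hello world _# 皆さん  (the two symbols are removed)
```

Note that a whitelist like this also drops non-emoji symbols: "+" has category Sm and "$" has category Sc, so neither survives the filter.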
Remove emoji from string doesn't works for some cases
Check out this answer; the emoji Python package seems like the best way to solve this problem.
To convert any emoji/character into its Unicode escape sequence, do this:
import emoji
s = '😀'
print(s.encode('unicode-escape').decode('ASCII'))
This prints \U0001f600.
How to remove Emoji from string using VB
Your current regex matches any char but a line break and ASCII alphanumeric chars. It does not match emojis because the VBScript regex engine, which is based on ECMA-262 3rd edition, cannot match astral plane chars with a mere . pattern.
If you want to just add the emoji matching support to your current pattern, you can replace the . with the (?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]) pattern and use
objRegExp.Pattern = "(?:(?![a-zA-Z0-9])(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]))+"
See the regex demo
If you just want to remove all but ASCII chars, you can use
objRegExp.Pattern = "(?:(?![ -~])[\s\S])+"
The pattern matches one or more (+) chars ([\s\S] matches any whitespace and non-whitespace chars) that are not printable ASCII chars.
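For comparison, the "keep only printable ASCII" idea is much shorter in an engine that supports negated character classes over all code points. Here is a minimal Python sketch; NON_PRINTABLE_ASCII and keep_printable_ascii are illustrative names. Like the VBScript pattern, it also removes tabs and line breaks, because they fall outside the range from space to tilde.

```python
import re

# Anything outside the printable ASCII range (space through tilde).
NON_PRINTABLE_ASCII = re.compile(r"[^ -~]+")


def keep_printable_ascii(text: str) -> str:
    """Remove every run of characters that are not printable ASCII."""
    return NON_PRINTABLE_ASCII.sub("", text)


print(keep_printable_ascii("Hi\U0001F600!\n"))  # prints: Hi!
```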