Get Description of Emoji Character

The Core Foundation function CFStringTransform() has transformations that
determine the Unicode standard name for special characters. Example:

import Foundation

let c: Character = "😄"
let cfstr = NSMutableString(string: String(c)) as CFMutableString
var range = CFRangeMake(0, CFStringGetLength(cfstr))
CFStringTransform(cfstr, &range, kCFStringTransformToUnicodeName, false)
print(cfstr)

Output:

\N{SMILING FACE WITH OPEN MOUTH AND SMILING EYES}

See http://nshipster.com/cfstringtransform/ for more information about
CFStringTransform().
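
As an alternative sketch that needs only the standard library (Swift 5 or later), the name of each scalar can be read from Unicode.Scalar.Properties. The helper name unicodeNames(of:) is hypothetical and not part of any framework; it returns one name per scalar because a Character may be built from several scalars.

```swift
// Sketch: look up Unicode names with the standard library only (Swift 5+).
// `unicodeNames(of:)` is a hypothetical helper, not a framework API.
func unicodeNames(of character: Character) -> [String] {
    // A Character may consist of several scalars, so return one name per scalar.
    // Scalars that have no name (for example control characters) are skipped.
    return character.unicodeScalars.compactMap { $0.properties.name }
}

let smiley: Character = "\u{1F604}" // the smiling face used in the example above
print(unicodeNames(of: smiley))
```

Unlike the CFStringTransform() version, this returns the bare names without the \N{...} wrapper.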

How to replace emoji characters with their descriptions in a Swift string

Simply do not use a Character in the first place but use a String as input:

let cfstr = NSMutableString(string: "This 😄 is my string 😄") as CFMutableString

Once the \N prefixes are stripped, that will finally output

This {SMILING FACE WITH OPEN MOUTH AND SMILING EYES} is my string {SMILING FACE WITH OPEN MOUTH AND SMILING EYES}

Put together:

func transformUnicode(_ input: String) -> String {
    let cfstr = NSMutableString(string: input) as CFMutableString
    var range = CFRangeMake(0, CFStringGetLength(cfstr))
    // Replace each special character with \N{UNICODE NAME}.
    CFStringTransform(cfstr, &range, kCFStringTransformToUnicodeName, false)
    let newStr = "\(cfstr)"
    // Strip the \N prefix so only {UNICODE NAME} remains.
    return newStr.replacingOccurrences(of: "\\N", with: "")
}

transformUnicode("This 😄 is my string 😄")
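
The same kind of replacement can also be sketched without Core Foundation. The version below is an assumption-laden variant, not the answer's method: replacingEmojiWithNames is a hypothetical helper, and it only rewrites scalars whose default presentation is emoji (Unicode.Scalar.Properties.isEmojiPresentation), leaving everything else untouched.

```swift
// Sketch: replace emoji with "{NAME}" using only the standard library (Swift 5+).
// `replacingEmojiWithNames` is a hypothetical helper name.
func replacingEmojiWithNames(_ input: String) -> String {
    var result = ""
    for scalar in input.unicodeScalars {
        // Only scalars that render as emoji by default are replaced;
        // letters, digits, and accented characters pass through unchanged.
        if scalar.properties.isEmojiPresentation, let name = scalar.properties.name {
            result += "{\(name)}"
        } else {
            result.unicodeScalars.append(scalar)
        }
    }
    return result
}

print(replacingEmojiWithNames("This \u{1F604} is my string \u{1F604}"))
```

Because it works scalar by scalar, joiners, variation selectors, and emoji with a text default presentation are left as they are; a fuller version would have to handle those cases.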

Find out if Character in String is emoji?

What I stumbled upon is the difference between characters, Unicode scalars and glyphs.

For example, the glyph 👨‍👩‍👧‍👧 consists of 7 Unicode scalars:

  • Four emoji characters: 👨👩👧👧
  • In between each emoji is a special character, the zero-width joiner (U+200D), which works like character glue; see the specs for more info

Another example, the glyph 👌🏿 consists of 2 Unicode scalars:

  • The regular emoji: 👌
  • A skin tone modifier: 🏿

Last one, the glyph 1️⃣ contains three Unicode scalars:

  • The digit one: 1
  • The variation selector (U+FE0F)
  • The Combining Enclosing Keycap: U+20E3

So when rendering the characters, the resulting glyphs really matter.
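
The gap between glyphs and scalars can be checked directly. This is a small sketch using escape sequences so the scalars are visible in the source; the variable names are illustrative only.

```swift
// Each string below is a single glyph (one Character) built from several scalars.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F467}" // 4 emoji + 3 joiners
let skinTone = "\u{1F44C}\u{1F3FF}"   // base emoji + skin tone modifier
let keycap = "1\u{FE0F}\u{20E3}"      // digit + variation selector + enclosing keycap

for glyph in [family, skinTone, keycap] {
    // `count` counts Characters (glyphs); `unicodeScalars.count` counts scalars.
    print(glyph, "characters:", glyph.count, "scalars:", glyph.unicodeScalars.count)
}
```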

Swift 5.0 and above makes this process much easier and gets rid of some guesswork we needed to do. Unicode.Scalar's new Properties type helps us determine what we're dealing with.
However, those properties only make sense when checking the other scalars within the glyph. This is why we'll be adding some convenience properties to the Character type to help us out.
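
To see why a single property is not enough, here is a small sketch that prints a few of the Unicode.Scalar.Properties flags for hand-picked scalars. The sample list is my own choice for illustration; note that a plain digit also reports isEmoji, because it can start a keycap sequence.

```swift
// Sample scalars: digit three, play button, alarm clock, grinning face, dark skin tone.
let samples: [Unicode.Scalar] = ["3", "\u{25B6}", "\u{23F0}", "\u{1F600}", "\u{1F3FF}"]

for scalar in samples {
    let p = scalar.properties
    // Print the code point in hex followed by three emoji-related flags.
    print("U+" + String(scalar.value, radix: 16, uppercase: true),
          "isEmoji:", p.isEmoji,
          "isEmojiPresentation:", p.isEmojiPresentation,
          "isEmojiModifier:", p.isEmojiModifier)
}
```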

For more detail, I wrote an article explaining how this works.

For Swift 5.0, this leaves you with the following result:

extension Character {
    /// A simple emoji is one scalar and presented to the user as an Emoji
    var isSimpleEmoji: Bool {
        guard let firstScalar = unicodeScalars.first else { return false }
        return firstScalar.properties.isEmoji && firstScalar.value > 0x238C
    }

    /// Checks if the scalars will be merged into an emoji
    var isCombinedIntoEmoji: Bool { unicodeScalars.count > 1 && unicodeScalars.first?.properties.isEmoji ?? false }

    var isEmoji: Bool { isSimpleEmoji || isCombinedIntoEmoji }
}

extension String {
    var isSingleEmoji: Bool { count == 1 && containsEmoji }

    var containsEmoji: Bool { contains { $0.isEmoji } }

    var containsOnlyEmoji: Bool { !isEmpty && !contains { !$0.isEmoji } }

    var emojiString: String { emojis.map { String($0) }.reduce("", +) }

    var emojis: [Character] { filter { $0.isEmoji } }

    var emojiScalars: [UnicodeScalar] { filter { $0.isEmoji }.flatMap { $0.unicodeScalars } }
}

Which will give you the following results:

"A̛͚̖".containsEmoji // false
"3".containsEmoji // false
"A̛͚̖▶️".unicodeScalars // [65, 795, 858, 790, 9654, 65039]
"A̛͚̖▶️".emojiScalars // [9654, 65039]
"3️⃣".isSingleEmoji // true
"3️⃣".emojiScalars // [51, 65039, 8419]
"👌🏿".isSingleEmoji // true
"🙎🏼‍♂️".isSingleEmoji // true
"🇹🇩".isSingleEmoji // true
"⏰".isSingleEmoji // true
"🌶".isSingleEmoji // true
"👨‍👩‍👧‍👧".isSingleEmoji // true
"🏳️‍🌈".isSingleEmoji // true
"🏴‍☠️".containsOnlyEmoji // true
"👨‍👩‍👧‍👧".containsOnlyEmoji // true
"Hello 👨‍👩‍👧‍👧".containsOnlyEmoji // false
"Hello 👨‍👩‍👧‍👧".containsEmoji // true
"👫 Héllo 👨‍👩‍👧‍👧".emojiString // "👫👨‍👩‍👧‍👧"
"👨‍👩‍👧‍👧".count // 1

"👫 Héllœ 👨‍👩‍👧‍👧".emojiScalars // [128107, 128104, 8205, 128105, 8205, 128103, 8205, 128103]
"👫 Héllœ 👨‍👩‍👧‍👧".emojis // ["👫", "👨‍👩‍👧‍👧"]
"👫 Héllœ 👨‍👩‍👧‍👧".emojis.count // 2

"👫👨‍👩‍👧‍👧👨‍👨‍👦".isSingleEmoji // false
"👫👨‍👩‍👧‍👧👨‍👨‍👦".containsOnlyEmoji // true

For older Swift versions, check out this gist containing my old code.

How can I get the Unicode of an emoji in iOS Swift

Use Swift's Unicode escape sequence:

let emojiString = "\u{1F4C4}"

To print the characters in several emoji ranges (these ranges are not exhaustive), try this:

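let emojiRanges = [
    0x1F601...0x1F64F,
    0x2702...0x27B0,
    0x1F680...0x1F6C0,
    0x1F170...0x1F251
]

for range in emojiRanges {
    for i in range {
        // Unicode.Scalar(_:) is failable; skip values that are not valid scalars.
        guard let scalar = Unicode.Scalar(i) else { continue }
        print(String(scalar))
    }
}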
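
Going the other way, from an emoji to its code points, can be sketched by walking the string's unicodeScalars. The helper name codePoints(of:) is hypothetical; it formats each scalar value in the usual U+XXXX notation.

```swift
// Sketch: list the code points that make up a string, formatted as U+XXXX.
// `codePoints(of:)` is a hypothetical helper name.
func codePoints(of string: String) -> [String] {
    return string.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Pad to at least four hex digits.
        return "U+" + String(repeating: "0", count: max(0, 4 - hex.count)) + hex
    }
}

print(codePoints(of: "\u{1F4C4}"))          // ["U+1F4C4"]
print(codePoints(of: "1\u{FE0F}\u{20E3}"))  // ["U+0031", "U+FE0F", "U+20E3"]
```

A glyph made of several scalars yields several code points, so the result is an array rather than a single value.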

How to find a textual description of emoticons, unicode characters and emoji in a string (python, perl)?

Perl example using charnames:

use 5.014;
use strict;
use warnings;
use utf8;
use open qw(:std :utf8);
use charnames ':full';

my @faces = split //, '😄😀😈';
for (@faces) {
    say sprintf "U+%05X %s %s",
        ord($_), $_, charnames::viacode(ord($_));
}

prints

U+1F604 😄 SMILING FACE WITH OPEN MOUTH AND SMILING EYES
U+1F600 😀 GRINNING FACE
U+1F608 😈 SMILING FACE WITH HORNS

How could I get Apple emoji name instead of Unicode name?

The first one is the Unicode name, though the correct name is:

SMILING FACE WITH OPEN MOUTH AND SMILING EYES

The fact that it's uppercase matters. It's a Unicode identifier. It's permanent and it's unique. It's really permanent: even when a word is misspelled, as with "BRAKCET" in "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET", that name is forever.

The second name is the "Apple Name." These are localized names. On Mac, the English version is stored in:

/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/en.lproj/AppleName.strings

You can dump this file with plutil, or read it using PropertyListDecoder.

$ plutil -p AppleName.strings
{
"〰" => "wavy dash"
"‼️" => "red double exclamation mark"
"⁉️" => "red exclamation mark and question mark"
"*️⃣" => "keycap asterisk"
"#️⃣" => "number sign"
"〽️" => "part alternation mark"
"©" => "copyright sign"
...
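
Reading the file with PropertyListDecoder could look like the sketch below. It assumes the file decodes as a flat string-to-string dictionary, as the plutil dump suggests; decodeEmojiNames(from:) is a hypothetical helper, and the commented-out usage relies on the private framework path quoted above, which is not guaranteed to exist on every system.

```swift
import Foundation

// Sketch: decode a property list that maps emoji to names.
// Assumes a flat [String: String] dictionary; `decodeEmojiNames(from:)` is a hypothetical helper.
func decodeEmojiNames(from data: Data) throws -> [String: String] {
    return try PropertyListDecoder().decode([String: String].self, from: data)
}

// Possible usage on macOS (path taken from the text above; it may change between releases):
// let url = URL(fileURLWithPath: "/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/en.lproj/AppleName.strings")
// let names = try decodeEmojiNames(from: Data(contentsOf: url))
// print(names["\u{00A9}"] ?? "not found")
```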

That said, unless you absolutely need to match Apple, I'd recommend the CLDR (Common Locale Data Repository) annotation short name. That's the Unicode source for localized names. They're not promised to be unique, though. Their biggest purpose is for supporting text-to-speech.

For the current list in XML, it's most convenient on GitHub. Or you can browse the v37 table or download the raw data.


