How to Know If Two Emojis Will Be Displayed as One Emoji

How to know if two emojis will be displayed as one emoji?

Update for Swift 4 (Xcode 9)

As of Swift 4, an "emoji sequence" is treated as a single grapheme
cluster (according to the Unicode 9 standard):

let s = "ab👨‍❤️‍💋‍👨"
print(s.count) // 3

so the other workarounds are not needed anymore.
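As a quick check built on that behavior (a minimal sketch, not from the original answer; combinesIntoSingleCharacter is a name made up here): concatenate the two strings and test whether the result is a single Character, i.e. a single grapheme cluster.

func combinesIntoSingleCharacter(_ a: String, _ b: String) -> Bool {
    // One Character means one grapheme cluster; for two emoji, that
    // means they will render as one (provided the platform supports
    // the resulting sequence).
    return (a + b).count == 1
}

combinesIntoSingleCharacter("👨", "\u{200D}❤️\u{200D}💋\u{200D}👨") // true
combinesIntoSingleCharacter("a", "b")                               // false

Note that a count of 1 only reflects grapheme clustering; whether the font actually draws a single glyph is a separate question (see the Core Text checks further below).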


(Old answer for Swift 3 and earlier:)

A possible option is to enumerate and count the
"composed character sequences" in the string:

let s = "ab👨‍❤️‍💋‍👨"
var count = 0
s.enumerateSubstringsInRange(s.startIndex..<s.endIndex,
                             options: .ByComposedCharacterSequences) {
    (char, _, _, _) in
    if char != nil {
        count += 1
    }
}
print(count) // 3

Another option is to find the range of the composed character
sequences at a given index:

let s = "👨‍❤️‍💋‍👨"
if s.rangeOfComposedCharacterSequenceAtIndex(s.startIndex) == s.characters.indices {
    print("This is a single composed character")
}

As String extension methods:

// Swift 2.2:
extension String {
    var composedCharacterCount: Int {
        var count = 0
        enumerateSubstringsInRange(characters.indices, options: .ByComposedCharacterSequences) {
            (_, _, _, _) in count += 1
        }
        return count
    }

    var isSingleComposedCharacter: Bool {
        return rangeOfComposedCharacterSequenceAtIndex(startIndex) == characters.indices
    }
}

// Swift 3:
extension String {
    var composedCharacterCount: Int {
        var count = 0
        enumerateSubstrings(in: startIndex..<endIndex, options: .byComposedCharacterSequences) {
            (_, _, _, _) in count += 1
        }
        return count
    }

    var isSingleComposedCharacter: Bool {
        return rangeOfComposedCharacterSequence(at: startIndex) == startIndex..<endIndex
    }
}

Examples:

"👍🏿".composedCharacterCount // 1
"👍🏿".characters.count      // 2

"👨‍❤️‍💋‍👨".composedCharacterCount // 1
"👨‍❤️‍💋‍👨".characters.count      // 4

"🇩🇪🇺🇸".composedCharacterCount // 2
"🇩🇪🇺🇸".characters.count      // 1

As you can see, the number of Swift characters (extended grapheme clusters) can be more or less than
the number of composed character sequences.
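Putting the Swift 3 workaround to work on the original question (a sketch using the extension above; combinesIntoOneEmoji is a hypothetical helper name):

func combinesIntoOneEmoji(_ a: String, _ b: String) -> Bool {
    // Two strings are displayed as one emoji if their concatenation
    // forms a single composed character sequence.
    return (a + b).isSingleComposedCharacter
}

combinesIntoOneEmoji("\u{1F1E9}", "\u{1F1EA}") // true – 🇩 + 🇪 form the flag 🇩🇪
combinesIntoOneEmoji("a", "b")                 // false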

How can you verify a multiple code point emoji is supported?

According to Emoji 14.0's data files, an emoji is either a basic emoji, a keycap sequence, a flag, a modifier sequence, or a ZWJ sequence. In each of those cases, there will be at least one code point in the sequence for which isEmoji returns true, and the sequence will form a single glyph.

So, you should make a string out of the unicode scalars first:

let scalars = [UnicodeScalar(0x1F415)!, UnicodeScalar(0x200D)!, UnicodeScalar(0x1F9BA)!] // 🐕, ZWJ, 🦺
let scalarView = String.UnicodeScalarView(scalars)
let string = String(scalarView) // "🐕‍🦺" – the service dog ZWJ sequence

Then, you can check if it is an emoji like this:

CTLineGetGlyphCount(CTLineCreateWithAttributedString(
    NSAttributedString(string: string)
)) == 1 &&
string.unicodeScalars.contains { $0.properties.isEmoji }

Alternatively, since you just want to check if the emoji can be displayed properly, you can use CTFontGetGlyphsForCharacters to see if Apple Color Emoji supports the characters.

let font = UIFont(name: "AppleColorEmoji", size: 20)! as CTFont
var text = Array(string.utf16)
var glyphs = Array(repeating: 0 as CGGlyph, count: text.count)
let isEmoji = CTFontGetGlyphsForCharacters(font, &text, &glyphs, text.count) &&
    CTLineGetGlyphCount(CTLineCreateWithAttributedString(
        NSAttributedString(string: string)
    )) == 1

Note that both of these methods will return false positives (non-emojis like ASCII letters being reported as emojis), but will not return false negatives.
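If you prefer, the two checks can be wrapped into one helper (a sketch under the same assumptions; rendersAsSingleEmoji is a name made up here, and the same false-positive caveat applies):

import CoreText
import Foundation

// True if Core Text lays the string out as exactly one glyph and at
// least one of its scalars carries the Unicode isEmoji property.
func rendersAsSingleEmoji(_ string: String) -> Bool {
    let line = CTLineCreateWithAttributedString(NSAttributedString(string: string))
    return CTLineGetGlyphCount(line) == 1
        && string.unicodeScalars.contains { $0.properties.isEmoji }
}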

Find out if Character in String is emoji?

What I stumbled upon is the difference between characters, unicode scalars and glyphs.

For example, the glyph 👨‍👩‍👧‍👧 consists of 7 unicode scalars:

  • Four emoji characters: 👨👩👧👧
  • In between each emoji is a special character, which works like character glue; see the specs for more info

Another example, the glyph 👌🏿 consists of 2 unicode scalars:

  • The regular emoji: 👌
  • A skin tone modifier: 🏿

Last one, the glyph 1️⃣ consists of three unicode scalars:

  • The digit one: 1
  • The variation selector (U+FE0F, invisible on its own)
  • The Combining Enclosing Keycap: ⃣ (U+20E3)

So when rendering the characters, the resulting glyphs really matter.

Swift 5.0 and above makes this process much easier and gets rid of some guesswork we needed to do. Unicode.Scalar's new Properties type helps us determine what we're dealing with.
However, those properties only make sense when checking the other scalars within the glyph. This is why we'll be adding some convenience methods to the Character type to help us out.

For more detail, I wrote an article explaining how this works.

For Swift 5.0, this leaves you with the following result:

extension Character {
    /// A simple emoji is one scalar and presented to the user as an Emoji
    var isSimpleEmoji: Bool {
        guard let firstScalar = unicodeScalars.first else { return false }
        return firstScalar.properties.isEmoji && firstScalar.value > 0x238C
    }

    /// Checks if the scalars will be merged into an emoji
    var isCombinedIntoEmoji: Bool { unicodeScalars.count > 1 && unicodeScalars.first?.properties.isEmoji ?? false }

    var isEmoji: Bool { isSimpleEmoji || isCombinedIntoEmoji }
}

extension String {
    var isSingleEmoji: Bool { count == 1 && containsEmoji }

    var containsEmoji: Bool { contains { $0.isEmoji } }

    var containsOnlyEmoji: Bool { !isEmpty && !contains { !$0.isEmoji } }

    var emojiString: String { emojis.map { String($0) }.reduce("", +) }

    var emojis: [Character] { filter { $0.isEmoji } }

    var emojiScalars: [UnicodeScalar] { filter { $0.isEmoji }.flatMap { $0.unicodeScalars } }
}

Which will give you the following results:

"A̛͚̖".containsEmoji // false
"3".containsEmoji // false
"A̛͚̖▶️".unicodeScalars // [65, 795, 858, 790, 9654, 65039]
"A̛͚̖▶️".emojiScalars // [9654, 65039]
"3️⃣".isSingleEmoji // true
"3️⃣".emojiScalars // [51, 65039, 8419]
"👌🏿".isSingleEmoji // true
"🙇🏽‍♂️".isSingleEmoji // true
"🇵🇹".isSingleEmoji // true
"⏰".isSingleEmoji // true
"🌟".isSingleEmoji // true
"👨‍👩‍👧‍👧".isSingleEmoji // true
"🏴󠁧󠁢󠁥󠁮󠁧󠁿".isSingleEmoji // true
"🏴󠁧󠁢󠁥󠁮󠁧󠁿".containsOnlyEmoji // true
"👨‍👩‍👧‍👧".containsOnlyEmoji // true
"Hello 👨‍👩‍👧‍👧".containsOnlyEmoji // false
"Hello 👨‍👩‍👧‍👧".containsEmoji // true
"👫 Héllo 👨‍👩‍👧‍👧".emojiString // "👫👨‍👩‍👧‍👧"
"👨‍👩‍👧‍👧".count // 1

"👫 Héllœ 👨‍👩‍👧‍👧".emojiScalars // [128107, 128104, 8205, 128105, 8205, 128103, 8205, 128103]
"👫 Héllœ 👨‍👩‍👧‍👧".emojis // ["👫", "👨‍👩‍👧‍👧"]
"👫 Héllœ 👨‍👩‍👧‍👧".emojis.count // 2

"👫👨‍👩‍👧‍👧👨‍👨‍👦".isSingleEmoji // false
"👫👨‍👩‍👧‍👧👨‍👨‍👦".containsOnlyEmoji // true

For older Swift versions, check out this gist containing my old code.

Two different eye emojis?

f0 9f 91 80 is the UTF-8 encoded form of codepoint U+1F440 (👀 EYES).

f0 9f 91 81 is the UTF-8 encoded form of codepoint U+1F441 (👁 EYE).

f0 9f 91 81 ef b8 8f is the UTF-8 encoded form of the codepoint sequence U+1F441 U+FE0F (👁️).

U+FE0F is a Variation Selector:

Variation Selectors is a Unicode block containing 16 Variation Selector format characters (designated VS1 through VS16). They are used to specify a specific glyph variant for a Unicode character. They are currently used to specify standardized variation sequences for mathematical symbols, emoji symbols, 'Phags-pa letters, and CJK unified ideographs corresponding to CJK compatibility ideographs. At present only standardized variation sequences with VS1, VS15 and VS16 have been defined.

Where U+FE0F is VARIATION SELECTOR-16:

U+FE0F was added to Unicode in version 3.2 (2002). It belongs to the block Variation Selectors in the Basic Multilingual Plane.

This character is a Nonspacing Mark and inherits its script property from the preceding character.

The glyph is not a composition. It has an Ambiguous East Asian Width. In bidirectional context it acts as Nonspacing Mark and is not mirrored. In text U+FE0F behaves as Combining Mark regarding line breaks. It has type Extend for sentence and Extend for word breaks. The Grapheme Cluster Break is Extend.

This codepoint may change the appearance of the preceding character. If that is a symbol, dingbat or emoji, U+FE0F forces it to be rendered as a colorful image as compared to a monochrome text variant. The Unicode standard defines some standardized variants. See also “Unicode symbol as text or emoji” for a discussion of this codepoint.

In other words, U+FE0F tells VS-aware software to render U+1F441 as a colorful emoji instead of as monochromatic text.
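You can observe the same byte sequences from Swift (a quick standalone check, not part of the original question):

let eye      = "\u{1F441}"          // 👁 text presentation by default
let eyeEmoji = "\u{1F441}\u{FE0F}"  // 👁️ emoji presentation forced by VS16

Array(eye.utf8).map      { String($0, radix: 16) } // ["f0", "9f", "91", "81"]
Array(eyeEmoji.utf8).map { String($0, radix: 16) } // ["f0", "9f", "91", "81", "ef", "b8", "8f"]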

Shown emojis are different from the chosen ones, why?

The TextView you create in your XML layout should be an EmojiconTextView instead of a normal TextView. Likewise, make your EditText an EmojiconEditText so that it can read the emoji as well.

If you want to understand why this is happening: emojis are nothing but Unicode characters, and your TextViews and EditTexts need to be taught how to read them. If you use a normal EditText, it will read them the Android way; if you use EmojiconEditText, it will read them the library way.

Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings?

This has to do with how the String type works in Swift, and how the contains(_:) method works.

The '👩‍👩‍👧‍👦' is what's known as an emoji sequence, which is rendered as one visible character in a string. The sequence is made up of Character objects, and at the same time it is made up of UnicodeScalar objects.

If you check the character count of the string, you'll see that it is made up of four characters, while if you check the unicode scalar count, it will show you a different result:

print("👩‍👩‍👧‍👦".characters.count)     // 4
print("👩‍👩‍👧‍👦".unicodeScalars.count) // 7

Now, if you parse through the characters and print them, you'll see what seems like normal characters, but in fact the first three characters contain both an emoji as well as a zero-width joiner in their UnicodeScalarView:

for char in "👩‍👩‍👧‍👦".characters {
    print(char)

    let scalars = String(char).unicodeScalars.map({ String($0.value, radix: 16) })
    print(scalars)
}

// 👩‍
// ["1f469", "200d"]
// 👩‍
// ["1f469", "200d"]
// 👧‍
// ["1f467", "200d"]
// 👦
// ["1f466"]

As you can see, only the last character does not contain a zero-width joiner, so when using the contains(_:) method, it works as you'd expect. Since you aren't comparing against emoji containing zero-width joiners, the method won't find a match for any but the last character.
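For example, with the family emoji from the question (using Foundation's contains(_:)):

import Foundation

let family = "👩‍👩‍👧‍👦"
family.contains("👦") // true  – the last character carries no zero-width joiner
family.contains("👩") // false – every 👩 in the sequence is followed by a joiner
family.contains("👧") // false – likewise followed by a joiner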

To expand on this, if you create a String which is composed of an emoji character ending with a zero-width joiner, and pass it to the contains(_:) method, it will also evaluate to false. This has to do with contains(_:) being the exact same as range(of:) != nil, which tries to find an exact match to the given argument. Since characters ending with a zero-width joiner form an incomplete sequence, the method tries to find a match for the argument while combining characters ending with a zero-width joiner into a complete sequence. This means that the method won't ever find a match if:

  1. the argument ends with a zero-width joiner, and
  2. the string to parse doesn't contain an incomplete sequence (i.e. ending with a zero-width joiner and not followed by a compatible character).

To demonstrate:

let s = "\u{1f469}\u{200d}\u{1f469}\u{200d}\u{1f467}\u{200d}\u{1f466}" // 👩‍👩‍👧‍👦
s.range(of: "\u{1f469}\u{200d}") != nil // false
s.range(of: "\u{1f469}\u{200d}\u{1f469}") != nil // false

However, since the comparison only looks ahead, you can find several other complete sequences within the string by working backwards:

s.range(of: "\u{1f466}") != nil                                   // true
s.range(of: "\u{1f467}\u{200d}\u{1f466}") != nil                  // true
s.range(of: "\u{1f469}\u{200d}\u{1f467}\u{200d}\u{1f466}") != nil // true

// Same as the above:
s.contains("\u{1f469}\u{200d}\u{1f467}\u{200d}\u{1f466}") // true

The easiest solution would be to provide a specific compare option to the range(of:options:range:locale:) method. The option String.CompareOptions.literal performs the comparison on an exact character-by-character equivalence. As a side note, what's meant by character here is not the Swift Character, but the UTF-16 representation of both the instance and comparison string – however, since String doesn't allow malformed UTF-16, this is essentially equivalent to comparing the Unicode scalar representation.

Here I've overloaded the Foundation method, so if you need the original one, rename this one or something:

extension String {
    func contains(_ string: String) -> Bool {
        return self.range(of: string, options: String.CompareOptions.literal) != nil
    }
}

Now the method works as it "should" with each character, even with incomplete sequences:

s.contains("👩")         // true
s.contains("👩\u{200d}") // true
s.contains("\u{200d}")   // true

