Why Are Emoji Characters Like 👩‍👩‍👧‍👦 Treated So Strangely in Swift Strings?

This has to do with how the String type works in Swift, and how the contains(_:) method works.

The "👩‍👩‍👧‍👦" is what's known as an emoji sequence, which is rendered as one visible character in a string. The sequence is made up of Character objects, and at the same time it is made up of UnicodeScalar objects.

If you check the character count of the string, you'll see that it is made up of four characters, while if you check the Unicode scalar count, it will show you a different result:

print("‍‍‍.characters.count)     // 4
print("‍‍‍.unicodeScalars.count) // 7

Now, if you iterate over the characters and print them, you'll see what seem like normal characters, but in fact the first three characters contain both an emoji as well as a zero-width joiner in their UnicodeScalarView:

for char in "👩‍👩‍👧‍👦".characters {
    print(char)

    let scalars = String(char).unicodeScalars.map({ String($0.value, radix: 16) })
    print(scalars)
}

// 👩‍
// ["1f469", "200d"]
// 👩‍
// ["1f469", "200d"]
// 👧‍
// ["1f467", "200d"]
// 👦
// ["1f466"]

As you can see, only the last character does not contain a zero-width joiner, so when using the contains(_:) method, it works as you'd expect. Since you aren't comparing against emoji containing zero-width joiners, the method won't find a match for any but the last character.
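
For instance, a quick sketch of this behaviour (assuming the Swift version discussed here, where the sequence counts as four characters; family is a name introduced just for illustration):

let family = "👩‍👩‍👧‍👦"

family.contains("👩") // false – stored here as "👩" followed by a zero-width joiner
family.contains("👧") // false – likewise followed by a joiner
family.contains("👦") // true  – the last character has no joiner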

To expand on this, if you create a String composed of an emoji character ending with a zero-width joiner and pass it to the contains(_:) method, it will also evaluate to false. This has to do with contains(_:) being exactly equivalent to range(of:) != nil, which tries to find an exact match to the given argument. Since characters ending with a zero-width joiner form an incomplete sequence, the method tries to find a match for the argument while combining characters ending with a zero-width joiner into a complete sequence. This means that the method won't ever find a match if:

  1. the argument ends with a zero-width joiner, and
  2. the string to parse doesn't contain an incomplete sequence (i.e. ending with a zero-width joiner and not followed by a compatible character).

To demonstrate:

let s = "\u{1f469}\u{200d}\u{1f469}\u{200d}\u{1f467}\u{200d}\u{1f466}" // 👩‍👩‍👧‍👦

s.range(of: "\u{1f469}\u{200d}") != nil          // false
s.range(of: "\u{1f469}\u{200d}\u{1f469}") != nil // false

However, since the comparison only looks ahead, you can find several other complete sequences within the string by working backwards:

s.range(of: "\u{1f466}") != nil                                   // true
s.range(of: "\u{1f467}\u{200d}\u{1f466}") != nil                  // true
s.range(of: "\u{1f469}\u{200d}\u{1f467}\u{200d}\u{1f466}") != nil // true

// Same as the above:
s.contains("\u{1f469}\u{200d}\u{1f467}\u{200d}\u{1f466}") // true

The easiest solution would be to provide a specific compare option to the range(of:options:range:locale:) method. The option String.CompareOptions.literal performs the comparison on an exact character-by-character equivalence. As a side note, what's meant by character here is not the Swift Character, but the UTF-16 representation of both the instance and comparison string – however, since String doesn't allow malformed UTF-16, this is essentially equivalent to comparing the Unicode scalar representation.
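
For instance, the earlier searches that failed now succeed when this option is passed (continuing with the same s as above):

s.range(of: "\u{1f469}\u{200d}", options: .literal) != nil          // true
s.range(of: "\u{1f469}\u{200d}\u{1f469}", options: .literal) != nil // true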

Here I've overloaded the Foundation method, so if you need the original one, rename this one or something:

import Foundation // range(of:options:) is a Foundation method

extension String {
    func contains(_ string: String) -> Bool {
        return self.range(of: string, options: String.CompareOptions.literal) != nil
    }
}

Now the method works as it "should" with each character, even with incomplete sequences:

s.contains(")          // true
s.contains("\u{200d}") // true
s.contains("\u{200d}") // true

How is the 🇩🇪 character represented in Swift strings?

let flag = "\u{1f1e9}\u{1f1ea}"

then flag is 🇩🇪.
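
To illustrate, the flag is one visible character built from two regional indicator scalars:

let flag = "\u{1f1e9}\u{1f1ea}"
print(flag)                      // 🇩🇪
print(flag.unicodeScalars.count) // 2 – REGIONAL INDICATOR SYMBOL LETTER D and E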

For more regional indicator symbols, see:

http://en.wikipedia.org/wiki/Regional_Indicator_Symbol

Swift Prevent Emoji Representation of Unicode Character

Pick a font containing the glyph that you want, like Lucida Grande or Menlo.
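
For example, a minimal sketch assuming UIKit (the font name and character are just for illustration), forcing a font so the character is drawn with a text-style glyph instead of its emoji presentation:

import UIKit

let label = UILabel()
// Assumes Menlo ships a plain (non-emoji) glyph for U+2665 BLACK HEART SUIT.
label.attributedText = NSAttributedString(
    string: "\u{2665}",
    attributes: [.font: UIFont(name: "Menlo", size: 17)!]
)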

Swift String contains "," or "." but not both

You can use SetAlgebra:

func validate(_ string: String) -> Bool {
    let allowed = Set(",.")
    return !allowed.isDisjoint(with: string) && !allowed.isSubset(of: string)
}

validate("1,2922.3") // false
validate("1,29223") // true
validate("12922.3") // true
validate("129223") // false

To explain a bit:

  • !allowed.isDisjoint(with: string) because you want to exclude strings that contain neither . nor ,.
  • !allowed.isSubset(of: string) because you want to exclude strings that contain both . and ,.
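
An equivalent formulation (a hypothetical alternative, not part of the original answer) makes the "exactly one of the two" logic explicit:

func validateAlternative(_ string: String) -> Bool {
    let hasComma = string.contains(",")
    let hasDot = string.contains(".")
    return hasComma != hasDot // true only when exactly one of the separators is present
}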

If the text contains emoji, Range is nil in Swift

"❣️" is a single “extended grapheme cluster”, but two UTF-16 code units:

print("❣️".count)       // 1
print("❣️".utf16.count) // 2

NSRange counts UTF-16 code units (which are the "characters" in an NSString), therefore the correct way to create an NSRange comprising the complete range of a Swift string is

let range = NSRange(location: 0, length: testStr.utf16.count)

or better (since Swift 4):

let range = NSRange(testStr.startIndex..., in: testStr)

Explanation: In your code (simplified here)

let testStr = "❣️"
let range = NSRange(location: 0, length: testStr.count)
print(range) // {0, 1}

creates an NSRange describing a single UTF-16 code unit. This cannot be converted to a Range<String.Index> in testStr because its first Character consists of two UTF-16 code units:

let wrapRange = Range(range, in: testStr)
print(wrapRange) // nil
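
With the UTF-16-based length from above, the conversion succeeds (a quick check on the same testStr):

let fixedRange = NSRange(location: 0, length: testStr.utf16.count)
print(Range(fixedRange, in: testStr) != nil) // true – the range now covers the whole character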

