Swift countElements() Return Incorrect Value When Count Flag Emoji

Swift countElements() return incorrect value when count flag emoji

Update for Swift 4 (Xcode 9)

As of Swift 4 (tested with Xcode 9 beta) grapheme clusters break after every second regional indicator symbol, as mandated by the Unicode 9
standard:

let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"
print(str1.count) // 5
print(Array(str1)) // ["🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪"]

Also String is a collection of its characters (again), so one can
obtain the character count with str1.count.
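
Since String is a Collection of its Characters again, the other collection APIs are available directly on the string as well. A small illustrative sketch (Swift 4; the concrete flags are only an example):

let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"   // same string as above
print(str1.first!)             // 🇩🇪 (the first Character)
print(str1.prefix(2))          // 🇩🇪🇩🇪 (a Substring with the first two flags)
for flag in str1.dropFirst(3) {
    print(flag)                // iterates over the remaining two Characters
}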


(Old answer for Swift 3 and older:)

From "3 Grapheme Cluster Boundaries"
in the "Standard Annex #29 UNICODE TEXT SEGMENTATION":
(emphasis added):

A legacy grapheme cluster is defined as a base (such as A or カ)
followed by zero or more continuing characters. One way to think of
this is as a sequence of characters that form a “stack”.

The base can be single characters, or be any sequence of Hangul Jamo
characters that form a Hangul Syllable, as defined by D133 in The
Unicode Standard, or be any sequence of Regional_Indicator (RI) characters. The RI characters are used in pairs to denote Emoji
national flag symbols corresponding to ISO country codes. Sequences of
more than two RI characters should be separated by other characters,
such as U+200B ZWSP.

(Thanks to @rintaro for the link).

A Swift Character represents an extended grapheme cluster, so it is (according
to this reference) correct that any sequence of regional indicator symbols
is counted as a single character.
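
In code, the behaviour described above looks like this (Swift 3 syntax; the concrete flags are only an illustration):

let flags = "🇩🇪🇦🇹🇫🇷"          // three flags, i.e. six regional indicator symbols
print(flags.characters.count)  // 1 (the whole sequence forms one extended grapheme cluster)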

You can separate the "flags" with a ZERO WIDTH NON-JOINER (U+200C):

let str1 = "🇩🇪\u{200C}🇩🇪"
print(str1.characters.count) // 2

or insert a ZERO WIDTH SPACE (U+200B):

let str2 = "🇩🇪\u{200B}🇩🇪"
print(str2.characters.count) // 3

This also resolves possible ambiguities: a run of more than two regional indicator symbols could otherwise be paired into flags in more than one way (for example, the four indicators F, R, U, S can be read as 🇫🇷 🇺🇸, or as a stray F, 🇷🇺, and a stray S).

See also How to know if two emojis will be displayed as one emoji? about a possible method
to count the number of "composed characters" in a Swift string,
which would return 5 for your let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪".

How to remove a flag from the end of a string?

Update for Swift 4 (Xcode 9)

As of Swift 4 (tested with Xcode 9 beta), flags (i.e. pairs of regional
indicators) are treated as single grapheme clusters, as mandated by
the Unicode 9 standard. So counting flags and removing the last
character (whether it is a flag or not) is now as simple as:

var flags = "🇩🇪🇦🇹🇫🇷🇮🇹🇺🇸🇬🇧"
print(flags.count) // 6

flags.removeLast()
print(flags.count) // 5
print(flags) // 🇩🇪🇦🇹🇫🇷🇮🇹🇺🇸
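
Since String is a Collection in Swift 4, a non-mutating alternative is dropLast(), which returns a Substring (this variant is my addition, not part of the original answer):

let flags = "🇩🇪🇦🇹🇫🇷🇮🇹🇺🇸🇬🇧"
let allButLast = String(flags.dropLast()) // convert the Substring back to a String
print(allButLast.count) // 5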

(Old answer for Swift 3 and earlier:)

There is no bug. A sequence of "Regional Indicator" characters is
a single "extended grapheme cluster", that is why

var flag = "🇩🇪🇦🇹🇫🇷🇮🇹🇺🇸🇬🇧"
print(flag.characters.count)

prints 1 (compare Swift countElements() return incorrect value when count flag emoji).

On the other hand, the above string consists of 12 Unicode scalars (each flag is a pair of regional indicator symbols, e.g. 🇩🇪 is 🇩 + 🇪), and each of them needs two UTF-16 code points.
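
Those numbers can be verified directly (Swift 3 syntax, continuing with the flag string from the snippet above):

print(flag.unicodeScalars.count) // 12 (two regional indicator scalars per flag)
print(flag.utf16.count)          // 24 (each scalar needs a surrogate pair in UTF-16)
print(flag.characters.count)     // 1  (but only a single extended grapheme cluster)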

To separate the string into "visible entities" you have to
consider "composed character sequences", compare How to know if two emojis will be displayed as one emoji?.

I do not have an elegant solution (perhaps someone has a better one).
But one option would be to separate the string into an array
of composed characters, remove elements from the array if necessary,
and then combine the strings again.

Example:

extension String {

    func composedCharacters() -> [String] {
        var result: [String] = []
        enumerateSubstringsInRange(characters.indices, options: .ByComposedCharacterSequences) {
            (subString, _, _, _) in
            if let s = subString { result.append(s) }
        }
        return result
    }
}

var flags = "🇩🇪🇦🇹🇫🇷🇮🇹🇺🇸🇬🇧"
var chars = flags.composedCharacters()
print(chars.count) // 6
chars.removeLast()
flags = chars.joinWithSeparator("")
print(flags) // 🇩🇪🇦🇹🇫🇷🇮🇹🇺🇸

How to separate emojis entered (through default keyboard) on textfield

Update for Swift 4 (Xcode 9)

As of Swift 4 (tested with Xcode 9 beta), an "Emoji ZWJ Sequence" is
treated as a single Character, as mandated by the Unicode 9 standard:

let str = "👨‍👨‍👧‍👧😍"
print(str.count) // 2
print(Array(str)) // ["👨‍👨‍👧‍👧", "😍"]

Also String is a collection of its characters (again), so we can
call str.count to get the length, and Array(str) to get all
characters as an array.


(Old answer for Swift 3 and earlier)

This is only a partial answer which may help in this particular case.

"‍‍‍ is indeed a combination of four separate characters:

let str = "👨‍👨‍👧‍👧😍"
print(Array(str.characters))

// Output: ["👨‍", "👨‍", "👧‍", "👧", "😍"]

which are glued together with U+200D (ZERO WIDTH JOINER):

for c in str.unicodeScalars {
    print(String(c.value, radix: 16))
}

/* Output:
1f468
200d
1f468
200d
1f467
200d
1f467
1f60d
*/

Enumerating the string with the .ByComposedCharacterSequences
option combines these characters correctly:

var chars : [String] = []
str.enumerateSubstringsInRange(str.characters.indices, options: .ByComposedCharacterSequences) {
    (substring, _, _, _) -> () in
    chars.append(substring!)
}
print(chars)

// Output: ["👨‍👨‍👧‍👧", "😍"]

But there are other cases where this does not work, e.g. the "flags"
which are a sequence of "Regional Indicator" characters (compare
Swift countElements() return incorrect value when count flag emoji):
for a string made up of flag emoji, the above loop does not return
one entry per flag, which is not the desired result.

The full rules are defined in "3 Grapheme Cluster Boundaries"
in the "Standard Annex #29 UNICODE TEXT SEGMENTATION" in the
Unicode standard.
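
For the flag case specifically, here is a minimal workaround sketch (my addition, not part of the original answer): since every flag consists of exactly two regional indicator scalars, a string that contains only regional indicators can be split into flags by pairing its Unicode scalars manually:

func pairedFlags(_ str: String) -> [String] {
    // Assumes str contains only regional indicator symbols (U+1F1E6 ... U+1F1FF).
    var result: [String] = []
    var current = ""
    for scalar in str.unicodeScalars {
        current.unicodeScalars.append(scalar)
        if current.unicodeScalars.count == 2 {
            result.append(current)
            current = ""
        }
    }
    if !current.isEmpty { result.append(current) } // a trailing unpaired indicator, if any
    return result
}

print(pairedFlags("🇩🇪🇦🇹🇫🇷")) // ["🇩🇪", "🇦🇹", "🇫🇷"]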

String.join freezes when one of the array elements is a flag emoji

Definitely a bug in Swift. I just created a brand-new single view app using Xcode 6.2, 6.4, and 7 beta 2, and all of them show the same effect. You should file a bug report with Apple. I just did, and filing duplicate reports helps raise the issue's priority.

How to know if two emojis will be displayed as one emoji?

Update for Swift 4 (Xcode 9)

As of Swift 4, an "Emoji sequence" is treated as a single grapheme
cluster (according to the Unicode 9 standard):

let s = "ab‍❤️‍‍br>print(s.count) // 4

so the other workarounds are not needed anymore.


(Old answer for Swift 3 and earlier:)

A possible option is to enumerate and count the
"composed character sequences" in the string:

let s = "ab‍❤️‍‍br>var count = 0
s.enumerateSubstringsInRange(s.startIndex..<s.endIndex,
options: .ByComposedCharacterSequences) {
(char, _, _, _) in
if let char = char {
count += 1
}
}
print(count) // 4

Another option is to find the range of the composed character
sequences at a given index:

let s = "👨‍❤️‍💋‍👨"
if s.rangeOfComposedCharacterSequenceAtIndex(s.startIndex) == s.characters.indices {
    print("This is a single composed character")
}

As String extension methods:

// Swift 2.2:
extension String {
    var composedCharacterCount: Int {
        var count = 0
        enumerateSubstringsInRange(characters.indices, options: .ByComposedCharacterSequences) {
            (_, _, _, _) in count += 1
        }
        return count
    }

    var isSingleComposedCharacter: Bool {
        return rangeOfComposedCharacterSequenceAtIndex(startIndex) == characters.indices
    }
}

// Swift 3:
extension String {
    var composedCharacterCount: Int {
        var count = 0
        enumerateSubstrings(in: startIndex..<endIndex, options: .byComposedCharacterSequences) {
            (_, _, _, _) in count += 1
        }
        return count
    }

    var isSingleComposedCharacter: Bool {
        return rangeOfComposedCharacterSequence(at: startIndex) == startIndex..<endIndex
    }
}

Examples:

".composedCharacterCount // 1
".characters.count // 2

"‍❤️‍‍.composedCharacterCount // 1
"‍❤️‍‍.characters.count // 4

".composedCharacterCount // 2
".characters.count // 1

As you see, the number of Swift characters (extended grapheme clusters) can be more or less than
the number of composed character sequences.

Emoji skin-tone detect

A "hack" would be to compare the visual representation of a correct emoji (like ") and a wanna-be emoji (like ").

I've modified your code here and there to make it work:

import UIKit

extension String {
    static let emojiSkinToneModifiers: [String] = ["🏻", "🏼", "🏽", "🏾", "🏿"]

    // Number of composed character sequences, i.e. visually separate "emoji".
    var emojiVisibleLength: Int {
        var count = 0
        let nsstr = self as NSString
        let range = NSRange(location: 0, length: nsstr.length)
        nsstr.enumerateSubstrings(in: range,
                                  options: .byComposedCharacterSequences) { (_, _, _, _) in
            count = count + 1
        }
        return count
    }

    // The emoji with any modifier removed (only the first Unicode scalar is kept).
    var emojiUnmodified: String {
        if isEmpty {
            return self
        }
        return String(self.unicodeScalars.first!)
    }

    private static let emojiReferenceSize: CGSize = {
        let size = CGSize(width : CGFloat.greatestFiniteMagnitude,
                          height: CGFloat.greatestFiniteMagnitude)
        let rect = ("👍🏻" as NSString).boundingRect(with: size,
                                                    options: .usesLineFragmentOrigin,
                                                    attributes: nil,
                                                    context: nil)
        return rect.size
    }()

    // An emoji can take a skin tone if appending a modifier still renders as a
    // single glyph, i.e. its bounding rect matches the reference emoji size.
    var canHaveSkinToneModifier: Bool {
        if isEmpty {
            return false
        }

        let modified = self.emojiUnmodified + String.emojiSkinToneModifiers[0]

        let size = (modified as NSString)
            .boundingRect(with: CGSize(width : CGFloat.greatestFiniteMagnitude,
                                       height: .greatestFiniteMagnitude),
                          options: .usesLineFragmentOrigin,
                          attributes: nil,
                          context: nil).size

        return size == String.emojiReferenceSize
    }
}

Let's try it out:

let emojis = [ "👌", "👨", "😜" ]
for emoji in emojis {
    if emoji.canHaveSkinToneModifier {
        let unmodified = emoji.emojiUnmodified
        print(unmodified)
        for modifier in String.emojiSkinToneModifiers {
            print(unmodified + modifier)
        }
    } else {
        print(emoji)
    }
    print("\n")
}

And voila!

👌
👌🏻
👌🏼
👌🏽
👌🏾
👌🏿

👨
👨🏻
👨🏼
👨🏽
👨🏾
👨🏿

😜

Get the length of a String

As of Swift 4+

It's just:

test1.count

because String became a collection of its characters again.

(Thanks to Martin R)

As of Swift 2:

With Swift 2, Apple moved many global functions into protocol extensions (extensions that apply to any type conforming to a protocol). The new syntax is:

test1.characters.count

(Thanks to JohnDifool for the heads up)

As of Swift 1

Use the global count() function:

let unusualMenagerie = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"
println("unusualMenagerie has \(count(unusualMenagerie)) characters")
// prints "unusualMenagerie has 40 characters"

right from the Apple Swift Guide

(note, for versions of Swift earlier than 1.2, this would be countElements(unusualMenagerie) instead)

For your variable, it would be:

length = count(test1) // was countElements in earlier versions of Swift

Or you can use test1.utf16Count (which counts UTF-16 code units rather than characters).
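
For emoji and other characters outside the Basic Multilingual Plane, the character count and the UTF-16 count differ. A quick sketch in current Swift (4 and later):

let test1 = "Koala 🐨"
print(test1.count)       // 7 (Characters, i.e. extended grapheme clusters)
print(test1.utf16.count) // 8 (🐨 is encoded as a surrogate pair in UTF-16)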


