Which Swift Character Count Should I Use When Interacting with NSString APIs

Which Swift character count should I use when interacting with NSString APIs?


TL;DR

The documentation for NSString.length specifies:

The number of UTF-16 code units in the receiver.

Thus, if you want to interop between String and NSString:

  • You should use string.utf16.count, and it will match up perfectly with (string as NSString).length.

If you want to count the number of visible characters:

  • You should use string.count, and it will match the number of times you need to press the → (right arrow) key on your keyboard to get from the beginning of the string to the end.

    Note: This is not always 100% accurate, but it appears Apple is constantly improving the implementation to make it more and more accurate.
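
For example, here is a quick sketch (the family emoji is just an illustrative multi-scalar grapheme cluster) showing how the two counts diverge and which one NSString agrees with:

import Foundation

let family = "👩‍👩‍👧‍👦"

family.count                   // 1  – visible characters (grapheme clusters)
family.utf16.count             // 11 – UTF-16 code units
(family as NSString).length    // 11 – always equals utf16.count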


Here's a Swift 4.0 playground to test a bunch of strings and functions:

let header = "NSString   .utf16❔   encodedOffset❔   NSRange❔   .count❔   .characters❔   distance❔   .unicodeScalars❔   .utf8❔   Description"
var format = " %3d %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %@"
format = format.replacingOccurrences(of: "❓", with: "%@") // "❓" acts as a placeholder for "%@" to align the text perfectly

print(header)

test("")
test("abc")
test("❌")
test(")
test("☾test")
test("‍‍‍)
test("\u{200d}\u{200d}\u{200d})
test(")
test("\u{1F468}")
test("‍♀️‍♂️)
test("你好吗")
test("مرحبا", "Arabic word")
test("م", "Arabic letter")
test("שלום", "Hebrew word")
test("ם", "Hebrew letter")

func test(_ s: String, _ description: String? = nil) {
    func icon(for length: Int) -> String {
        return length == (s as NSString).length ? "✅" : "❌"
    }

    let description = description ?? "'" + s + "'"
    let string = String(
        format: format,
        (s as NSString).length,
        s.utf16.count, icon(for: s.utf16.count),
        s.endIndex.encodedOffset, icon(for: s.endIndex.encodedOffset),
        NSRange(s.startIndex..<s.endIndex, in: s).upperBound, icon(for: NSRange(s.startIndex..<s.endIndex, in: s).upperBound),
        s.count, icon(for: s.count),
        s.characters.count, icon(for: s.characters.count),
        s.distance(from: s.startIndex, to: s.endIndex), icon(for: s.distance(from: s.startIndex, to: s.endIndex)),
        s.unicodeScalars.count, icon(for: s.unicodeScalars.count),
        s.utf8.count, icon(for: s.utf8.count),
        description)
    print(string)
}

And here is the output:

NSString   .utf16❔   encodedOffset❔   NSRange❔   .count❔   .characters❔   distance❔   .unicodeScalars❔   .utf8❔   Description
0 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ ''
3 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 'abc'
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 3 ❌ '❌'
4 4 ✅ 4 ✅ 4 ✅ 1 ❌ 1 ❌ 1 ❌ 2 ❌ 8 ❌ ''
5 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 7 ❌ '☾test'
11 11 ✅ 11 ✅ 11 ✅ 1 ❌ 1 ❌ 1 ❌ 7 ❌ 25 ❌ '‍‍‍'
11 11 ✅ 11 ✅ 11 ✅ 1 ❌ 1 ❌ 1 ❌ 7 ❌ 25 ❌ '‍‍‍'
8 8 ✅ 8 ✅ 8 ✅ 4 ❌ 4 ❌ 4 ❌ 4 ❌ 16 ❌ ''
2 2 ✅ 2 ✅ 2 ✅ 1 ❌ 1 ❌ 1 ❌ 1 ❌ 4 ❌ ''
58 58 ✅ 58 ✅ 58 ✅ 13 ❌ 13 ❌ 13 ❌ 32 ❌ 122 ❌ '‍♀️‍♂️'
3 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 9 ❌ '你好吗'
5 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 10 ❌ Arabic word
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 2 ❌ Arabic letter
4 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 8 ❌ Hebrew word
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 2 ❌ Hebrew letter

Conclusions:

  • To get a length that is compatible with NSString/NSRange, use either (s as NSString).length, s.utf16.count (preferred), s.endIndex.encodedOffset, or NSRange(s.startIndex..<s.endIndex, in: s).
  • To get the number of visible characters, use either s.count (preferred), s.characters.count (deprecated), or s.distance(from: s.startIndex, to: s.endIndex).

A helpful extension to get the full range of a String:

public extension String {

    var nsrange: NSRange {
        return NSRange(startIndex..<endIndex, in: self)
    }
}
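
For example, a quick sketch of using it with an NSString API (assuming Foundation is imported):

import Foundation

let s = "Hello 👩‍👩‍👧‍👦"
let range = s.nsrange                    // an NSRange covering the whole string
(s as NSString).substring(with: range)   // "Hello 👩‍👩‍👧‍👦"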

Thus, you can call the replace(_:characterAtIndex:) method from the original question like so:

replace("‍‍‍, characterAtIndex: "‍‍‍.utf16.count - 1) // ‍‍‍�!

Which Swift character count should I use with removeFirst(_:)?

removeFirst(_:) is a method of the RangeReplaceableCollection protocol, and for any collection, count gives the number of its elements. So for any instance a of a RangeReplaceableCollection type, the argument k passed to

a.removeFirst(k)

must be greater than or equal to zero, and less than or equal to a.count.

This applies to Array, String (which is a collection of Character), and all other “range-replaceable collection” types:

let k = 1 // any value with 0 <= k <= the collection's count

// Array:
var arr = [1, 2, 3, 4]
arr.removeFirst(k) // 0 <= k <= arr.count

// String:
var str = "👩‍👩‍👧‍👦"
str.removeFirst(k) // 0 <= k <= str.count

// Unicode scalar view of a string:
var u = "👩‍👩‍👧‍👦".unicodeScalars
u.removeFirst(k) // 0 <= k <= u.count
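
To see why each view's count matters, here is a small sketch (using the same family emoji): removing one element from a String removes a whole Character, while removing one element from the unicode scalar view removes a single scalar and can split up a grapheme cluster:

var str = "👩‍👩‍👧‍👦abc"
str.removeFirst(1)       // removes the entire family emoji, i.e. one Character
print(str)               // "abc"

var scalars = "👩‍👩‍👧‍👦abc".unicodeScalars
scalars.removeFirst(1)   // removes only the first scalar (👩)
print(String(scalars))   // the remaining scalars no longer form the original grapheme clusters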

Having said that, I would implement the method as

import Foundation

extension String {
    func removingPrefix(_ prefix: String) -> String? {
        guard let range = range(of: prefix, options: .anchored) else {
            return nil
        }
        return String(self[range.upperBound...])
    }
}

to avoid unnecessary index/offset conversions.
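
For example, a quick sketch of how the helper behaves:

"👩‍👩‍👧‍👦 rocks".removingPrefix("👩‍👩‍👧‍👦")   // " rocks"
"hello".removingPrefix("hell")                     // "o"
"hello".removingPrefix("x")                        // nil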

What is the correct length: argument to provide to NSRange for NSRegularExpression using a (Swift) String?

The utf16 count is correct, not the utf8 count. Or, best, use the convenience initializers, which convert a Range of String.Index to an NSRange:

let range = NSRange(str.startIndex..., in: str)

And to convert an NSRange back to a Range<String.Index>:

let range = Range(nsRange, in: str)

Thus, putting that together:

let str = "#tweak #wow #gaming" 
if let regex = try? NSRegularExpression(pattern: "#[a-z0-9]+", options: .caseInsensitive) {
let nsRange = NSRange(str.startIndex..., in: str)
let strings = regex.matches(in: str, range: nsRange).compactMap {
Range($0.range, in: str).map { str[$0] }
}
print(strings)
}

See the WWDC 2017 session "Efficient Interactions with Frameworks", which talks about (a) our historical use of UTF-16 when dealing with ranges; and (b) the fact that we don't have to do that any more.

Swift - which types to use? NSString or String

You should use the Swift native types whenever possible. The language is optimized to use them, and most of the functionality is bridged between the native types and the Foundation types.

While String and NSString are mostly interchangeable, i.e., you can pass String variables into methods that take NSString parameters and vice versa, some methods do not appear to be automatically bridged at the moment. See this answer for a discussion on how to get a String's length and this answer for a discussion on using containsString() to check for substrings. (Disclaimer: I'm the author of both of those answers.)

I haven't fully explored other data types, but I assume some version of what was stated above will also hold true for Array/NSArray, Dictionary/NSDictionary, and the various Swift number types and NSNumber.

Whenever you need to use one of the Foundation types, you can either declare variables/constants with an explicit type, as in var str: NSString = "An NSString", or cast a String to an NSString using str as NSString. (The bridgeToObjectiveC() method from early Swift betas, as in str.bridgeToObjectiveC().length, is no longer available in current Swift.)
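
For example, a short sketch of both directions of the bridge (with Foundation imported):

import Foundation

let swiftString = "Hello"
let nsString = swiftString as NSString   // String -> NSString
nsString.length                          // 5 – NSString API

let backToSwift = nsString as String     // NSString -> String
backToSwift.count                        // 5 – Swift String API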

However, the need for these explicit techniques, or at least some of them, may become obsolete over time, since according to the language reference the String/NSString bridge, for example, should be completely seamless.

For a thorough discussion on the subject, refer to Using Swift with Cocoa and Objective-C: Working with Cocoa Data Types

Number of occurrences of a substring in an NSString?

This isn't tested, but should be a good start.

NSUInteger count = 0, length = [str length];
NSRange range = NSMakeRange(0, length);
while (range.location != NSNotFound)
{
    range = [str rangeOfString:@"cake" options:0 range:range];
    if (range.location != NSNotFound)
    {
        range = NSMakeRange(range.location + range.length, length - (range.location + range.length));
        count++;
    }
}
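
If you need the same count from Swift, here is a rough equivalent (a sketch; the occurrences(of:in:) helper name is mine, and it counts non-overlapping matches using Foundation's range(of:options:range:)):

import Foundation

func occurrences(of needle: String, in haystack: String) -> Int {
    var count = 0
    var searchRange = haystack.startIndex..<haystack.endIndex
    while let found = haystack.range(of: needle, range: searchRange) {
        count += 1
        searchRange = found.upperBound..<haystack.endIndex
    }
    return count
}

occurrences(of: "cake", in: "cake pancake cupcake")   // 3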

What is the meaning of 'Swift are Unicode correct and locale insensitive' in Swift's String document?

To expand on @matt's answer a little:

The Unicode Consortium maintains certain standards for the interoperation of data, and one of the most well-known is the Unicode string standard. This standard defines a huge list of characters and their properties, along with rules for how those characters interact with one another (as Matt notes: letters, emoji, combining characters [letters with diacritics, like é], and so on).

Swift strings being "Unicode-correct" means that they conform to this Unicode standard, offering the same characters, rules, and interactions as any other string implementation that conforms to the same standard. Since Unicode is the main standard that most string implementations already conform to these days, this largely means that Swift strings will "just work" the way you expect.

However, along with the character definitions, Unicode also defines many rules for how to perform certain common string actions, such as uppercasing and lowercasing strings, or sorting them. These rules can be very specific, and in many cases, depend entirely on context (e.g., the locale, or the language and region the text might belong to, or be displayed in). For instance:

  • Case conversion:
    • In English, the uppercase form of i ("LATIN SMALL LETTER I" in Unicode) is I ("LATIN CAPITAL LETTER I"), and vice versa
    • In Turkish, however, the uppercase form of i is actually İ ("LATIN CAPITAL LETTER I WITH DOT ABOVE"), and the lowercase form of I ("LATIN CAPITAL LETTER I") is ı ("LATIN SMALL LETTER DOTLESS I")
  • Collation (sorting):
    • In English, the letter Å ("LATIN CAPITAL LETTER A WITH RING ABOVE") is largely considered the same as the letter A ("LATIN CAPITAL LETTER A"), just with a modifier on it. Sorted in a list, words starting with Å would appear along with other A words, but before B words
    • In certain Scandinavian languages, however, Å is its own letter, distinct from A. In Danish and Norwegian, Å comes at the end of the alphabet: ... X, Y, Z, Æ, Ø, Å. In Swedish and Finnish, the alphabet ends with: ... X, Y, Z, Å, Ä, Ö. For these languages, words starting with Å would come after Z words in a list

In order to perform many string operations in a way that makes sense to users in various languages, those operations need to be performed within the context of their language and locale.

In the context of the documentation's description, "locale-insensitive" means that Swift strings do not offer locale-specific rules like these, and default to Unicode's default case conversion, case folding, and collation rules (effectively: English). So, in contexts where correct handling of these is needed (e.g. you are writing a localized app), you'll want to use the Foundation extensions to String, which do take a Locale, for correct handling:

  • localizedUppercase/uppercased(with locale: Locale?) over just uppercased()
  • localizedLowercase/lowercased(with locale: Locale?) over just lowercased()
  • localizedStandardCompare(_:)/compare(_:options:range:locale:) over just <

among others.
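
For example, a small sketch of the Turkish casing difference and of locale-aware ordering (Foundation APIs; "tr_TR" is simply the Turkish locale identifier):

import Foundation

let turkish = Locale(identifier: "tr_TR")

"i".uppercased()                 // "I" – Unicode default rules
"i".uppercased(with: turkish)    // "İ" – Turkish dotted capital I

"I".lowercased()                 // "i"
"I".lowercased(with: turkish)    // "ı" – Turkish dotless lowercase i

// Locale-aware, Finder-style ordering instead of plain `<`:
["file10", "file2"].sorted { $0.localizedStandardCompare($1) == .orderedAscending }
// ["file2", "file10"]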

Extension to Test if NSString is Numeric in Swift

You would need to check that the length is greater than 0 and that the location of the range of non-digit characters is NSNotFound (i.e., the string contains nothing but decimal digits):

extension NSString {
    var isNumber: Bool {
        return length > 0 && rangeOfCharacter(from: CharacterSet.decimalDigits.inverted).location == NSNotFound
    }
}

("1" as NSString).isNumber  // true

("a" as NSString).isNumber // false

Converting from Swift string to const char*

Not sure why, but this code is working. It passes a string to a C function expecting a const char *, which Swift imports as an UnsafePointer<Int8>.

internal func setupArtnode(ipAddress: String) -> NSInteger {
    let cString = self.ipAddress.cString(using: String.defaultCStringEncoding)!
    let newString: String = NSString(bytes: cString, length: Int(ipAddress.characters.count), encoding: String.Encoding.ascii.rawValue)! as String
    let key2Pointer = UnsafePointer<Int8>(newString)

    node = artnet_new(key2Pointer, Int32(verbose)) // VERBOSE : true(1) , false(0)
    ...
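
A more predictable way to do the same thing (a sketch reusing node, verbose, and artnet_new from the snippet above) is to let Swift manage the C string's lifetime, either by passing the String directly, which Swift bridges to a temporary const char * for the duration of the call, or by scoping the pointer explicitly with withCString:

// Option 1: implicit bridging – the pointer is valid only for the duration of the call
node = artnet_new(ipAddress, Int32(verbose))

// Option 2: explicit – withCString makes the pointer's lifetime obvious
node = ipAddress.withCString { cString in
    artnet_new(cString, Int32(verbose))
}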

