Which Swift character count should I use when interacting with NSString APIs?
TL;DR
The documentation for NSString.length specifies:
The number of UTF-16 code units in the receiver.
Thus, if you want to interop between String and NSString:
- You should use string.utf16.count, and it will match up perfectly with (string as NSString).length.
If you want to count the number of visible characters:
- You should use string.count, and it will match the number of times you need to press the → (right) key on your keyboard to get from the beginning of the string to the end.
Note: This is not always 100% accurate, but it appears Apple is constantly improving the implementation to make it more and more accurate.
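A minimal sketch of the difference, assuming a flag emoji as the sample string (my choice, not part of the original answer). NSString(string:) is used instead of the as NSString cast only to keep the snippet self-contained:

```swift
import Foundation

let plain = "abc"
let flag = "🇺🇸"   // one visible character made of two regional-indicator scalars

// Length as NSString APIs see it: UTF-16 code units.
let plainUnits = plain.utf16.count
let flagUnits = flag.utf16.count
let flagNSLength = NSString(string: flag).length

// Length as a reader sees it: Characters (grapheme clusters).
let plainCharacters = plain.count
let flagCharacters = flag.count

print(plainUnits, plainCharacters)               // same number for plain ASCII
print(flagUnits, flagNSLength, flagCharacters)   // UTF-16 units and Characters diverge
```

For ASCII-only text the two counts agree, which is why mixing them up often goes unnoticed until an emoji shows up.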
Here's a Swift 4.0 playground to test a bunch of strings and functions:
import Foundation

// Written for Swift 4.0: encodedOffset and .characters are no longer recommended in later Swift versions.
let header = "NSString .utf16❔ encodedOffset❔ NSRange❔ .count❔ .characters❔ distance❔ .unicodeScalars❔ .utf8❔ Description"
var format = " %3d %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %@"
format = format.replacingOccurrences(of: "❓", with: "%@") // "❓" acts as a placeholder for "%@" to align the text perfectly
print(header)

test("")
test("abc")
test("❌")
test("🇺🇸") // stand-in flag emoji; the original literal was not preserved
test("☾test")
test("👨‍👩‍👧‍👦") // stand-in family emoji (ZWJ sequence); original literal not preserved
test("👨\u{200d}👩\u{200d}👧\u{200d}👦") // stand-in: the same sequence with explicit ZWJ escapes
test("👨👩👧👦") // stand-in: four separate emoji, no joiners
test("\u{1F468}")
// test("…") // sequence of gendered emoji (the 58-unit row in the output); literal not preserved
test("你好吗")
test("مرحبا", "Arabic word")
test("م", "Arabic letter")
test("שלום", "Hebrew word")
test("ם", "Hebrew letter")

func test(_ s: String, _ description: String? = nil) {
    // ✅ when a count equals the NSString length, ❌ otherwise
    func icon(for length: Int) -> String {
        return length == (s as NSString).length ? "✅" : "❌"
    }
    let description = description ?? "'" + s + "'"
    let string = String(
        format: format,
        (s as NSString).length,
        s.utf16.count, icon(for: s.utf16.count),
        s.endIndex.encodedOffset, icon(for: s.endIndex.encodedOffset),
        NSRange(s.startIndex..<s.endIndex, in: s).upperBound, icon(for: NSRange(s.startIndex..<s.endIndex, in: s).upperBound),
        s.count, icon(for: s.count),
        s.characters.count, icon(for: s.characters.count),
        s.distance(from: s.startIndex, to: s.endIndex), icon(for: s.distance(from: s.startIndex, to: s.endIndex)),
        s.unicodeScalars.count, icon(for: s.unicodeScalars.count),
        s.utf8.count, icon(for: s.utf8.count),
        description)
    print(string)
}
And here is the output:
NSString .utf16❔ encodedOffset❔ NSRange❔ .count❔ .characters❔ distance❔ .unicodeScalars❔ .utf8❔ Description
0 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ ''
3 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 'abc'
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 3 ❌ '❌'
4 4 ✅ 4 ✅ 4 ✅ 1 ❌ 1 ❌ 1 ❌ 2 ❌ 8 ❌ ''
5 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 7 ❌ '☾test'
11 11 ✅ 11 ✅ 11 ✅ 1 ❌ 1 ❌ 1 ❌ 7 ❌ 25 ❌ ''
11 11 ✅ 11 ✅ 11 ✅ 1 ❌ 1 ❌ 1 ❌ 7 ❌ 25 ❌ ''
8 8 ✅ 8 ✅ 8 ✅ 4 ❌ 4 ❌ 4 ❌ 4 ❌ 16 ❌ ''
2 2 ✅ 2 ✅ 2 ✅ 1 ❌ 1 ❌ 1 ❌ 1 ❌ 4 ❌ ''
58 58 ✅ 58 ✅ 58 ✅ 13 ❌ 13 ❌ 13 ❌ 32 ❌ 122 ❌ '♀️♂️'
3 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 9 ❌ '你好吗'
5 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 10 ❌ Arabic word
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 2 ❌ Arabic letter
4 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 8 ❌ Hebrew word
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 2 ❌ Hebrew letter
Conclusions:
- To get a length that is compatible with NSString/NSRange, use either (s as NSString).length, s.utf16.count (preferred), s.endIndex.encodedOffset, or NSRange(s.startIndex..<s.endIndex, in: s).
- To get the number of visible characters, use either s.count (preferred), s.characters.count (deprecated), or s.distance(from: s.startIndex, to: s.endIndex).
A helpful extension to get the full range of a String:
public extension String {
    var nsrange: NSRange {
        return NSRange(startIndex..<endIndex, in: self)
    }
}
Thus, you can call the original method with a UTF-16 based index, like so (str being the string you pass in):
replace(str, characterAtIndex: str.utf16.count - 1)
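As a rough illustration of what the nsrange extension gives you, here is a sketch that hands the full range to an NSString API. The sample string is taken from the test table above; the substring(with:) call is just one NSRange-based API picked for the example:

```swift
import Foundation

extension String {
    /// Full range of the string, measured in UTF-16 code units.
    var nsrange: NSRange {
        return NSRange(startIndex..<endIndex, in: self)
    }
}

let text = "☾test"
let full = text.nsrange

// The NSRange covers exactly utf16.count units, starting at 0.
print(full.location, full.length, text.utf16.count)

// An NSString API that takes an NSRange accepts it directly.
let copy = NSString(string: text).substring(with: full)
print(copy == text)
```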
Which Swift character count should I use with remove(first:)?
removeFirst(_:) is a method of the RangeReplaceableCollection protocol, and for any collection, count gives the number of its elements. So for any instance a of a RangeReplaceableCollection type, the argument k passed to a.removeFirst(k) must be greater than or equal to zero, and less than or equal to a.count.
This applies to Array, String (which is a collection of Character) and all other “range replaceable collection” types:
// Array:
var arr = [1, 2, 3, 4]
arr.removeFirst(k) // 0 <= k <= arr.count
// String:
var str = "abc" // stand-in literal; any string works
str.removeFirst(k) // 0 <= k <= str.count
// Unicode scalar view of a string:
var u = "abc".unicodeScalars // stand-in literal; any string works
u.removeFirst(k) // 0 <= k <= u.count
Having said that, I would implement the method as
extension String {
    func removingPrefix(_ prefix: String) -> String? {
        guard let range = range(of: prefix, options: .anchored) else {
            return nil
        }
        return String(self[range.upperBound...])
    }
}
to avoid unnecessary index/offset conversions.
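To make both points concrete, here is a small sketch. The sample strings and the value of k are made up for illustration, and the local variable inside removingPrefix is renamed to prefixRange purely for readability:

```swift
import Foundation

extension String {
    /// Returns the string without `prefix`, or nil if it does not start with `prefix`.
    func removingPrefix(_ prefix: String) -> String? {
        guard let prefixRange = self.range(of: prefix, options: .anchored) else {
            return nil
        }
        return String(self[prefixRange.upperBound...])
    }
}

// removeFirst(_:) counts Characters, so the bound to check against is `count`.
var greeting = "🇺🇸 hello"
let k = 2                              // the flag and the space
precondition(k >= 0 && k <= greeting.count)
greeting.removeFirst(k)
print(greeting)

// removingPrefix needs no counting at all.
let stripped = "prefix-body".removingPrefix("prefix-")
let missing = "prefix-body".removingPrefix("body")   // "body" is not at the start
print(stripped as Any, missing as Any)
```

Had k been computed from utf16.count (5 for the flag plus space), removeFirst would have removed more Characters than intended.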
What is the correct length: argument to provide to NSRange for NSRegularExpression using a (Swift) String?
The utf16 count is correct, not the utf8 count. Or, best, use the convenience initializers, which convert a Range of String.Index to an NSRange:
let range = NSRange(str.startIndex..., in: str)
And to convert an NSRange back to a Range of String.Index (the result is optional):
let range = Range(nsRange, in: str)
Thus, putting that together:
let str = "#tweak #wow #gaming"
if let regex = try? NSRegularExpression(pattern: "#[a-z0-9]+", options: .caseInsensitive) {
    let nsRange = NSRange(str.startIndex..., in: str)
    let strings = regex.matches(in: str, range: nsRange).compactMap {
        Range($0.range, in: str).map { str[$0] }
    }
    print(strings)
}
See WWDC 2017 Efficient Interactions with Frameworks, which talks about (a) our historical use of UTF16 when dealing with ranges; and (b) the fact that we don’t have to do that any more.
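To see why the choice of count matters, here is a sketch with a string that contains an emoji, so the three counts all differ. The hashtags(in:) helper and the sample text are hypothetical names introduced for this example:

```swift
import Foundation

/// Hypothetical helper: returns every "#tag" found in `text`.
func hashtags(in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: "#[a-z0-9]+", options: .caseInsensitive) else {
        return []
    }
    // The convenience initializer yields a range measured in UTF-16 units.
    let nsRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: nsRange).compactMap { match in
        // Convert each NSRange back to a Range<String.Index> before slicing.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

let tagged = "🎮 #gaming #wow"
print(tagged.count, tagged.utf16.count, tagged.utf8.count)
print(hashtags(in: tagged))
```

Building the NSRange by hand with tagged.count would cut the search short, and tagged.utf8.count would run past the end of the string.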
Swift - which types to use? NSString or String
You should use the Swift native types whenever possible. The language is optimized to use them, and most of the functionality is bridged between the native types and the Foundation types.
While String and NSString are mostly interchangeable, i.e., you can pass String variables into methods that take NSString parameters and vice versa, some methods seem not to be automatically bridged as of this moment. See this answer for a discussion on how to get a String's length and this answer for a discussion on using containsString() to check for substrings. (Disclaimer: I'm the author of both of these answers.)
I haven't fully explored other data types, but I assume some version of what was stated above will also hold true for Array/NSArray, Dictionary/NSDictionary, and the various number types in Swift and NSNumber.
Whenever you need to use one of the Foundation types, you can either use them to type variables/constants explicitly, as in var str: NSString = "An NSString", or use bridgeToObjectiveC() on an existing variable/constant of a Swift type, as in str.bridgeToObjectiveC().length. You can also cast a String to an NSString by using str as NSString. Note that bridgeToObjectiveC() existed only in early Swift releases, so the cast is the form to use today.
However, the necessity for these techniques to explicitly use the Foundation types, or at least some of them, may be obsolete in the future, since from what is stated in the language reference, the String/NSString bridge, for example, should be completely seamless.
For a thorough discussion on the subject, refer to Using Swift with Cocoa and Objective-C: Working with Cocoa Data Types.
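A short sketch of dropping down to NSString only where it is needed. The file name is invented for the example, and pathExtension is just one NSString member picked to show the pattern; on Apple platforms, fileName as NSString is the cast form of the same conversion:

```swift
import Foundation

let fileName: String = "archive.tar.gz"

// Stay with the native type for ordinary work.
let isArchive = fileName.hasSuffix(".gz")

// Convert explicitly when an NSString-only member is wanted.
let bridged = NSString(string: fileName)
let ext = bridged.pathExtension     // an NSString API, returns a Swift String
let units = bridged.length          // UTF-16 code units, same as fileName.utf16.count

print(isArchive, ext, units)
```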
Number of occurrences of a substring in an NSString?
This isn't tested, but should be a good start.
NSUInteger count = 0, length = [str length];
NSRange range = NSMakeRange(0, length);
while(range.location != NSNotFound)
{
    range = [str rangeOfString: @"cake" options:0 range:range];
    if(range.location != NSNotFound)
    {
        // Continue the search just past the match that was found.
        range = NSMakeRange(range.location + range.length, length - (range.location + range.length));
        count++;
    }
}
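For comparison, the same loop can be sketched in Swift with String.Index ranges instead of NSRange arithmetic. The function name and the sample strings are my own, and like the loop above it counts non-overlapping matches:

```swift
import Foundation

/// Counts non-overlapping occurrences of `needle` in `haystack`.
func occurrences(of needle: String, in haystack: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    var count = 0
    var searchRange = haystack.startIndex..<haystack.endIndex
    while let found = haystack.range(of: needle, options: [], range: searchRange) {
        count += 1
        // Continue the search just past the match that was found.
        searchRange = found.upperBound..<haystack.endIndex
    }
    return count
}

let cakes = occurrences(of: "cake", in: "cake, cupcake, pancake")
let pairs = occurrences(of: "aa", in: "aaaa")
print(cakes, pairs)
```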
What is the meaning of 'Swift are Unicode correct and locale insensitive' in Swift's String document?
To expand on @matt's answer a little:
The Unicode Consortium maintains certain standards for interoperation of data, and one of the most well-known standards is the Unicode string standard. This standard defines a huge list of characters and their properties, along with rules for how those characters interact with one another. (Like Matt notes: letters, emoji, combining characters [letters with diacritics, like é], etc.)
Swift strings being "Unicode-correct" means that Swift strings conform to this Unicode standard, offering the same characters, rules, and interactions as any other string implementation which conforms to the same standard. These days, being the main standard that many string implementations already conform to, this largely means that Swift strings will "just work" the way that you expect.
However, along with the character definitions, Unicode also defines many rules for how to perform certain common string actions, such as uppercasing and lowercasing strings, or sorting them. These rules can be very specific, and in many cases, depend entirely on context (e.g., the locale, or the language and region the text might belong to, or be displayed in). For instance:
- Case conversion:
  - In English, the uppercase form of i ("LATIN SMALL LETTER I" in Unicode) is I ("LATIN CAPITAL LETTER I"), and vice versa
  - In Turkish, however, the uppercase form of i is actually İ ("LATIN CAPITAL LETTER I WITH DOT ABOVE"), and the lowercase form of I ("LATIN CAPITAL LETTER I") is ı ("LATIN SMALL LETTER DOTLESS I")
- Collation (sorting):
  - In English, the letter Å ("LATIN CAPITAL LETTER A WITH RING ABOVE") is largely considered the same as the letter A ("LATIN CAPITAL LETTER A"), just with a modifier on it. Sorted in a list, words starting with Å would appear along with other A words, but before B words
  - In certain Scandinavian languages, however, Å is its own letter, distinct from A. In Danish and Norwegian, Å comes at the end of the alphabet: ... X, Y, Z, Æ, Ø, Å. In Swedish and Finnish, the alphabet ends with: ... X, Y, Z, Å, Ä, Ö. For these languages, words starting with Å would come after Z words in a list
In order to perform many string operations in a way that makes sense to users in various languages, those operations need to be performed within the context of their language and locale.
In the context of the documentation's description, "locale-insensitive" means that Swift strings do not offer locale-specific rules like these, and default to Unicode's default case conversion, case folding, and collation rules (effectively: English). So, in contexts where correct handling of these are needed (e.g. you are writing a localized app), you'll want to use the Foundation extensions to String methods which do take a Locale for correct handling:
- localizedUppercase / uppercased(with locale: Locale?) over just uppercased()
- localizedLowercase / lowercased(with locale: Locale?) over just lowercased()
- localizedStandardCompare(_:) / compare(_:options:range:locale:) over just <
among others.
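The Turkish case-conversion example can be sketched like this. The "tr_TR" locale identifier and the sample word are my choices for the illustration:

```swift
import Foundation

let turkish = Locale(identifier: "tr_TR")
let word = "istanbul"

// Locale-insensitive: the standard library's default mapping.
let plainUpper = word.uppercased()

// Locale-sensitive: Foundation applies the Turkish dotted/dotless i rules.
let turkishUpper = word.uppercased(with: turkish)
let turkishLower = "I".lowercased(with: turkish)

print(plainUpper, turkishUpper, turkishLower)
```

The first call produces a plain capital I, while the Turkish variants use İ and ı as described in the list above.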
Extension to Test if NSString is Numeric in Swift
You would need to check if the length is greater than 0 and if the range location is equal to NSNotFound:
extension NSString {
    var isNumber: Bool {
        return length > 0 && rangeOfCharacter(from: CharacterSet.decimalDigits.inverted).location == NSNotFound
    }
}
("1" as NSString).isNumber // true
("a" as NSString).isNumber // false
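One thing worth knowing about this check: CharacterSet.decimalDigits covers every Unicode decimal digit, not only 0 through 9. The sketch below contrasts it with an ASCII-only variant. Both helper names are hypothetical, and it is written against String rather than NSString to keep it short:

```swift
import Foundation

/// Same rule as the extension above, expressed on String.
func isDecimalDigits(_ s: String) -> Bool {
    return !s.isEmpty && s.rangeOfCharacter(from: CharacterSet.decimalDigits.inverted) == nil
}

/// Stricter variant: only the ASCII digits 0 through 9 are accepted.
func isASCIIDigits(_ s: String) -> Bool {
    let asciiDigits: ClosedRange<Unicode.Scalar> = "0"..."9"
    return !s.isEmpty && s.unicodeScalars.allSatisfy { asciiDigits.contains($0) }
}

let western = "123"
let arabicIndic = "١٢٣"   // Arabic-Indic digits one, two, three

print(isDecimalDigits(western), isASCIIDigits(western))
print(isDecimalDigits(arabicIndic), isASCIIDigits(arabicIndic))
```

Which variant you want depends on whether the string is later parsed by something that only understands ASCII digits.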
Converting from Swift string to const char*
Not sure why, but this code works. It passes a string to a C function expecting a const char*, which seems to be the same as an UnsafePointer<Int8>.
internal func setupArtnode(ipAddress:String) -> NSInteger{
    let cString = self.ipAddress.cString(using: String.defaultCStringEncoding)!
    let newString:String = NSString(bytes: cString, length: Int(ipAddress.characters.count), encoding:String.Encoding.ascii.rawValue)! as String
    let key2Pointer = UnsafePointer<Int8>(newString)
    node = artnet_new(key2Pointer, Int32(verbose)) // VERBOSE : true(1) , false(0)
...
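A simpler route that may be worth trying is withCString, or passing the String directly, since Swift converts a String argument to a temporary C string when the parameter is a const char *. The sketch below uses strlen as a stand-in C function because artnet_new is not available here; the address is made up:

```swift
import Foundation

let ipAddress = "192.168.0.10"

// withCString hands the closure an UnsafePointer<CChar> (a const char *)
// that is only valid for the duration of the closure.
let byteCount = ipAddress.withCString { pointer -> Int in
    // A call such as artnet_new(pointer, ...) would go here.
    return strlen(pointer)
}

// Passing the String itself to a const char * parameter also works;
// the temporary C string lives only for the duration of the call.
let sameCount = strlen(ipAddress)

print(byteCount, sameCount)
```

Either way, the pointer must not be stored and used after the call returns.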