How does String.Index work in Swift
All of the following examples use
var str = "Hello, playground"
startIndex and endIndex
- startIndex is the index of the first character.
- endIndex is the index after the last character.
Example
// character
str[str.startIndex] // H
str[str.endIndex] // error: after last character
// range
let range = str.startIndex..<str.endIndex
str[range] // "Hello, playground"
With Swift 4's one-sided ranges, the range can be simplified to one of the following forms.
let range = str.startIndex...
let range = ..<str.endIndex
I will use the full form in the following examples for the sake of clarity, but for readability you will probably want to use the one-sided ranges in your own code.
after
As in: index(after: String.Index)
- after refers to the index of the character directly after the given index.
Examples
// character
let index = str.index(after: str.startIndex)
str[index] // "e"
// range
let range = str.index(after: str.startIndex)..<str.endIndex
str[range] // "ello, playground"
before
As in: index(before: String.Index)
- before refers to the index of the character directly before the given index.
Examples
// character
let index = str.index(before: str.endIndex)
str[index] // d
// range
let range = str.startIndex..<str.index(before: str.endIndex)
str[range] // Hello, playgroun
offsetBy
As in: index(String.Index, offsetBy: String.IndexDistance)
- The offsetBy value can be positive or negative and starts from the given index. Although it is of the type String.IndexDistance, you can give it an Int.
Examples
// character
let index = str.index(str.startIndex, offsetBy: 7)
str[index] // p
// range
let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(str.endIndex, offsetBy: -6)
let range = start..<end
str[range] // play
limitedBy
As in: index(String.Index, offsetBy: String.IndexDistance, limitedBy: String.Index)
- The limitedBy parameter is useful for making sure that the offset does not cause the index to go out of bounds. It is a bounding index. Since it is possible for the offset to exceed the limit, this method returns an Optional. It returns nil if the index is out of bounds.
Example
// character
if let index = str.index(str.startIndex, offsetBy: 7, limitedBy: str.endIndex) {
    str[index] // p
}
If the offset had been 77 instead of 7, then the if statement would have been skipped.
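One detail worth spelling out is what happens at the boundary itself. The following is a minimal sketch, assuming the same str as above (the constant names inside, beyond and atLimit are my own). An offset that lands exactly on the limit still returns a non-nil index, so a successful result is not automatically safe to subscript when the limit is endIndex.

```swift
let str = "Hello, playground"

// Within bounds: a valid index is returned.
let inside = str.index(str.startIndex, offsetBy: 7, limitedBy: str.endIndex)

// Past the limit: nil is returned.
let beyond = str.index(str.startIndex, offsetBy: 77, limitedBy: str.endIndex)

// Exactly on the limit: endIndex is returned, which must not be subscripted.
let atLimit = str.index(str.startIndex, offsetBy: str.count, limitedBy: str.endIndex)

print(inside.map { str[$0] } as Any)   // Optional("p")
print(beyond == nil)                   // true
print(atLimit == str.endIndex)         // true
```

If you want every non-nil result to be subscriptable, one option is to limit by str.index(before: str.endIndex) on a non-empty string instead.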
Why is String.Index needed?
It would be much easier to use an Int index for Strings. The reason that you have to create a new String.Index for every String is that Characters in Swift are not all the same length under the hood. A single Swift Character might be composed of one, two, or even more Unicode code points. Thus each unique String must calculate the indexes of its Characters.
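To make the varying length concrete, here is a small sketch (the strings cafe and flag are my own examples, not from the original answer) that compares the number of Characters, Unicode code points, and UTF-8 bytes in the same string.

```swift
let cafe = "cafe\u{301}"              // "café": the last Character is "e" plus a combining accent
print(cafe.count)                     // 4 Characters
print(cafe.unicodeScalars.count)      // 5 Unicode code points
print(cafe.utf8.count)                // 6 bytes in UTF-8

let flag = "\u{1F1EF}\u{1F1F5}"       // the Japanese flag: one Character made of two code points
print(flag.count)                     // 1
print(flag.unicodeScalars.count)      // 2
print(flag.utf8.count)                // 8
```

Because the three counts disagree, an Int alone cannot say which unit it is counting, and that ambiguity is what String.Index removes.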
It is possible to hide this complexity behind an Int index extension, but I am reluctant to do so. It is good to be reminded of what is actually happening.
How does String substring work in Swift
All of the following examples use
var str = "Hello, playground"
Swift 4
Strings got a pretty big overhaul in Swift 4. When you get some substring from a String now, you get a Substring type back rather than a String. Why is this? Strings are value types in Swift. That means if you use one String to make a new one, then it has to be copied over. This is good for stability (no one else is going to change it without your knowledge) but bad for efficiency.
A Substring, on the other hand, is a reference back to the original String from which it came. The Swift documentation includes a diagram illustrating this.
No copying is needed, so it is much more efficient to use. However, imagine you got a ten-character Substring from a million-character String. Because the Substring is referencing the String, the system would have to hold on to the entire String for as long as the Substring is around. Thus, whenever you are done manipulating your Substring, convert it to a String.
let myString = String(mySubstring)
This will copy just the substring over, and the memory holding the old String can be reclaimed. Substrings (as a type) are meant to be short-lived.
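As a rough illustration of the relationship between the two types, the sketch below (the name expectedStart is my own) checks that a Substring reports the same indices as the String it was sliced from, and that String(...) produces an independent String.

```swift
let str = "Hello, playground"

let mySubstring = str.suffix(10)           // a Substring; no characters are copied
print(type(of: mySubstring))               // Substring

// A Substring uses the same indices as the String it came from.
let expectedStart = str.index(str.startIndex, offsetBy: 7)
print(mySubstring.startIndex == expectedStart)   // true

let myString = String(mySubstring)         // copies only "playground"
print(type(of: myString))                  // String
```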
Another big improvement in Swift 4 is that Strings are Collections (again). That means that whatever you can do to a Collection, you can do to a String (use subscripts, iterate over the characters, filter, etc).
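A few of those Collection operations, sketched on the same example string (the names vowels and reversed are my own):

```swift
let str = "Hello, playground"

// Iterate over Characters.
for character in str.prefix(5) {
    print(character)
}

// filter on a String returns a String.
let vowels = str.filter { "aeiou".contains($0) }

// reversed() returns a lazy wrapper, so wrap it to get a String again.
let reversed = String(str.reversed())

print(str.count)      // 17
print(vowels)         // eoaou
print(reversed)       // dnuorgyalp ,olleH
```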
The following examples show how to get a substring in Swift.
Getting substrings
You can get a substring from a string by using subscripts or a number of other methods (for example, prefix, suffix, split). You still need to use String.Index and not an Int index for the range, though. (See my other answer if you need help with that.)
Beginning of a string
You can use a subscript (note the Swift 4 one-sided range):
let index = str.index(str.startIndex, offsetBy: 5)
let mySubstring = str[..<index] // Hello
or prefix:
let index = str.index(str.startIndex, offsetBy: 5)
let mySubstring = str.prefix(upTo: index) // Hello
or even easier:
let mySubstring = str.prefix(5) // Hello
End of a string
Using subscripts:
let index = str.index(str.endIndex, offsetBy: -10)
let mySubstring = str[index...] // playground
or suffix:
let index = str.index(str.endIndex, offsetBy: -10)
let mySubstring = str.suffix(from: index) // playground
or even easier:
let mySubstring = str.suffix(10) // playground
Note that when using suffix(from: index) I had to count back from the end by using -10. That is not necessary when just using suffix(x), which just takes the last x characters of a String.
Range in a string
Again we simply use subscripts here.
let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(str.endIndex, offsetBy: -6)
let range = start..<end
let mySubstring = str[range] // play
Converting Substring to String
Don't forget, when you are ready to save your substring, you should convert it to a String so that the old string's memory can be cleaned up.
let myString = String(mySubstring)
Using an Int index extension?
I'm hesitant to use an Int-based index extension after reading the article Strings in Swift 3 by Airspeed Velocity and Ole Begemann. Although in Swift 4, Strings are collections, the Swift team purposely hasn't used Int indexes. It is still String.Index. This has to do with Swift Characters being composed of varying numbers of Unicode code points. The actual index has to be uniquely calculated for every string.
I have to say, I hope the Swift team finds a way to abstract away String.Index in the future. But until then, I am choosing to use their API. It helps me to remember that String manipulations are not just simple Int index lookups.
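For reference, this is roughly what such an extension tends to look like. It is a hypothetical sketch, not a standard library API; the characterAt label is my own choice to keep it from being mistaken for a built-in subscript.

```swift
extension StringProtocol {
    // Hypothetical convenience subscript; not part of the standard library.
    // Every call walks from startIndex, so each lookup costs O(offset).
    // An offset outside 0..<count traps at runtime, like any invalid index.
    subscript(characterAt offset: Int) -> Character {
        self[index(startIndex, offsetBy: offset)]
    }
}

let str = "Hello, playground"
print(str[characterAt: 7])   // p
```

The call site reads like an array lookup, which is exactly the concern: nothing at the point of use hints that the string is traversed on every access.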
Why should we use String.Index instead of Int as index of Character in String?
First, you can't use Int as an index for a string. The interface requires String.Index.
Why? We are using Unicode, not ASCII. The unit for Swift strings is a Character, which is a "grapheme cluster". A character can consist of multiple Unicode code points, and each Unicode code point can take 1 to 4 bytes in UTF-8.
Now let's say you have a 10-megabyte string and did a search to find the substring "Wysteria". Would you want to return which character number the substring starts at? If it's character 123,456, then to find the same substring again, we have to start at the beginning of the string and analyze 123,456 characters to find it. That is madly inefficient.
Instead we get a String.Index which is something that allows Swift to locate that substring quickly. It is most likely the byte offset, so it can be accessed very quickly.
Now adding "1" to that byte offset is nonsense, because you don't know how long the first character is. (It's quite possible that Unicode has another character that equals the ASCII 'W'). So you need to call a function that returns the index of the next character.
You can write code that returns the second Character from a string. To return the one millionth Character takes significant time. Swift doesn't allow you to do things that are enormously inefficient.
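A small sketch of the difference, using a short stand-in string of my own rather than a 10-megabyte one. It assumes Foundation for range(of:).

```swift
import Foundation

let text = "Some long text containing Wysteria somewhere"

if let found = text.range(of: "Wysteria") {
    // Going back to the match through its indices needs no rescan.
    print(text[found])                                  // Wysteria

    // Moving one Character forward is a function call, not "+ 1".
    let second = text.index(after: found.lowerBound)
    print(text[second])                                 // y

    // Converting the position to a Character count walks from the start: O(n).
    let offset = text.distance(from: text.startIndex, to: found.lowerBound)
    print(offset)                                       // 26
}
```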
Index of a substring in a string with Swift
edit/update:
Xcode 11.4 • Swift 5.2 or later
import Foundation
extension StringProtocol {
    func index<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> Index? {
        range(of: string, options: options)?.lowerBound
    }
    func endIndex<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> Index? {
        range(of: string, options: options)?.upperBound
    }
    func indices<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> [Index] {
        ranges(of: string, options: options).map(\.lowerBound)
    }
    func ranges<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var startIndex = self.startIndex
        while startIndex < endIndex,
            let range = self[startIndex...]
                .range(of: string, options: options) {
                result.append(range)
                startIndex = range.lowerBound < range.upperBound ? range.upperBound :
                    index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
}
usage:
let str = "abcde"
if let index = str.index(of: "cd") {
    let substring = str[..<index] // ab
    let string = String(substring)
    print(string) // "ab\n"
}
let str = "Hello, playground, playground, playground"
str.index(of: "play") // 7
str.endIndex(of: "play") // 11
str.indices(of: "play") // [7, 19, 31]
str.ranges(of: "play") // [{lowerBound 7, upperBound 11}, {lowerBound 19, upperBound 23}, {lowerBound 31, upperBound 35}]
case insensitive sample
let query = "Play"
let ranges = str.ranges(of: query, options: .caseInsensitive)
let matches = ranges.map { str[$0] }
print(matches) // ["play", "play", "play"]
regular expression sample
let query = "play"
let escapedQuery = NSRegularExpression.escapedPattern(for: query)
let pattern = "\\b\(escapedQuery)\\w+" // matches any word that starts with "play" prefix
let ranges = str.ranges(of: pattern, options: .regularExpression)
let matches = ranges.map { str[$0] }
print(matches) // ["playground", "playground", "playground"]
How can I give a custom Index to String.Index working with unicode in Swift?
You can create an extension on String where you take the index as either a String.Index or an Int, then use that to subscript unicodeScalars.
extension String {
    func unicodeScalarValue(at index: String.Index) -> UInt32 {
        unicodeScalars[index].value
    }
    // The Int here is a Character offset; the result is the first scalar of that Character.
    func unicodeScalarValue(at index: Int) -> UInt32 {
        unicodeScalars[self.index(startIndex, offsetBy: index)].value
    }
}
"ABC".unicodeScalarValue(at: 0) // 65
"ABC".unicodeScalarValue(at: 1) // 66
"ABC".unicodeScalarValue(at: 2) // 67
Swift String.Index vs transforming the String to an Array
In a String, the byte representation is packed, so there's no way to know where the character boundaries are without traversing the whole string from the start.
When converting to an array, this traversal is done once, and the result is an array of characters that are equidistantly spaced out in memory, which is what allows constant-time subscripting by an Int index. Importantly, the array is preserved, so many subscripting operations can be done upon the same array, requiring only one traversal of the String's bytes, for the initial unpacking.
It is possible to extend String with a subscript that indexes it by an Int, and you see it come up often on SO, but that's ill-advised. The standard library programmers could have added it, but they purposely chose not to, because it obscures the fact that every indexing operation requires a separate traversal of the String's bytes, which is O(string.count). All of a sudden, innocuous code like this:
// Assumes the hypothetical Int subscript described above.
for i in 0..<string.count {
    print(string[i]) // Looks O(1), but is actually O(string.count)!
}
becomes quadratic.
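The array approach described earlier can be sketched as follows (the names characters and everyOther are my own). The String is traversed once, and every Int lookup after that is constant time.

```swift
let string = "Hello, playground"

// One O(string.count) pass unpacks the Characters into an array.
let characters = Array(string)

// Each Int subscript on the array is O(1).
print(characters[7])                       // p

// Many lookups, still only the one initial traversal of the String.
let everyOther = stride(from: 0, to: characters.count, by: 2).map { characters[$0] }
print(String(everyOther))                  // Hlo lyrud
```

The trade-off is memory: the array holds a separate copy of every Character, so this is mainly worthwhile when you need many random accesses.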
Finding index of character in Swift String
You are not the only one who couldn't find the solution.
String doesn't implement RandomAccessIndexType, probably because characters can have different byte lengths. That's why we have to use string.characters.count (count or countElements in Swift 1.x) to get the number of characters. That also applies to positions. The _position is probably an index into the raw array of bytes and they don't want to expose that. The String.Index is meant to protect us from accessing bytes in the middle of characters.
That means that any index you get must be created from String.startIndex or String.endIndex (String.Index implements BidirectionalIndexType). Any other indices can be created using successor or predecessor methods.
Now to help us with indices, there is a set of methods (functions in Swift 1.x):
Swift 4.x
let text = "abc"
let index2 = text.index(text.startIndex, offsetBy: 2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!
let characterIndex2 = text.index(text.startIndex, offsetBy: 2)
let lastChar2 = text[characterIndex2] //will do the same as above
let range: Range<String.Index> = text.range(of: "b")!
let index: Int = text.distance(from: text.startIndex, to: range.lowerBound)
Swift 3.0
let text = "abc"
let index2 = text.index(text.startIndex, offsetBy: 2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!
let characterIndex2 = text.characters.index(text.characters.startIndex, offsetBy: 2)
let lastChar2 = text.characters[characterIndex2] //will do the same as above
let range: Range<String.Index> = text.range(of: "b")!
let index: Int = text.distance(from: text.startIndex, to: range.lowerBound)
Swift 2.x
let text = "abc"
let index2 = text.startIndex.advancedBy(2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!
let lastChar2 = text.characters[index2] //will do the same as above
let range: Range<String.Index> = text.rangeOfString("b")!
let index: Int = text.startIndex.distanceTo(range.startIndex) //will call successor/predecessor several times until the indices match
Swift 1.x
let text = "abc"
let index2 = advance(text.startIndex, 2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!
let range = text.rangeOfString("b")
let index: Int = distance(text.startIndex, range.startIndex) //will call succ/pred several times
Working with String.Index is cumbersome, but using a wrapper to index by integers (see https://stackoverflow.com/a/25152652/669586) is dangerous because it hides the inefficiency of real indexing.
Note that Swift indexing implementation has the problem that indices/ranges created for one string cannot be reliably used for a different string, for example:
Swift 2.x
let text: String = "abc"
let text2: String = "🎾🏇🏈"
let range = text.rangeOfString("b")!
//can randomly return a bad substring or throw an exception
let substring: String = text2[range]
//the correct solution
let intIndex: Int = text.startIndex.distanceTo(range.startIndex)
let startIndex2 = text2.startIndex.advancedBy(intIndex)
let range2 = startIndex2...startIndex2
let substring: String = text2[range2]
Swift 1.x
let text: String = "abc"
let text2: String = "🎾🏇🏈"
let range = text.rangeOfString("b")
//can randomly return nil or a bad substring
let substring: String = text2[range]
//the correct solution
let intIndex: Int = distance(text.startIndex, range.startIndex)
let startIndex2 = advance(text2.startIndex, intIndex)
let range2 = startIndex2...startIndex2
let substring: String = text2[range2]
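The same idea in current Swift syntax might look like the sketch below. The emoji string is my own stand-in for any string whose characters have a different byte length than those in text, and Foundation is assumed for range(of:). The position is carried across as Character offsets, and limitedBy guards against the second string being shorter.

```swift
import Foundation

let text = "abc"
let text2 = "🎾🏇🏈"   // assumption: any string with multi-byte characters will do

let range = text.range(of: "b")!

// Translate the range into Character offsets, which are meaningful for any string.
let offset = text.distance(from: text.startIndex, to: range.lowerBound)
let length = text.distance(from: range.lowerBound, to: range.upperBound)

// Rebuild the indices inside text2, bailing out if text2 is too short.
if let start = text2.index(text2.startIndex, offsetBy: offset, limitedBy: text2.endIndex),
   let end = text2.index(start, offsetBy: length, limitedBy: text2.endIndex) {
    print(text2[start..<end])   // 🏇
}
```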
Range<String.Index> Versus String.Index
String indices aren't integers. They're opaque values (of type String.Index) which can be used to subscript into a String to obtain a character.
Ranges aren't limited to only Range<Int>. If you look at the declaration of Range, you can see it's generic over any Bound, so long as the Bound is Comparable (which String.Index is).
So a Range<String.Index> is just that. It's a range of string indices, and just like any other range, it has a lowerBound and an upperBound.
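A short sketch of both kinds of range side by side (the variable names are my own):

```swift
let str = "Hello, playground"

let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(str.endIndex, offsetBy: -6)

// A range whose bounds are string indices.
let range: Range<String.Index> = start..<end
print(str[range.lowerBound])                   // p
print(str[range])                              // play
print(range.contains(str.startIndex))          // false

// The same generic type with Int bounds.
let numbers: Range<Int> = 0..<3
print(numbers.lowerBound, numbers.upperBound)  // 0 3
```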