How Does String.Index Work in Swift

How does String.Index work in Swift

All of the following examples use

var str = "Hello, playground"

startIndex and endIndex

  • startIndex is the index of the first character
  • endIndex is the index after the last character.

Example

// character
str[str.startIndex] // H
str[str.endIndex] // error: after last character

// range
let range = str.startIndex..<str.endIndex
str[range] // "Hello, playground"

With Swift 4's one-sided ranges, the range can be simplified to one of the following forms.

let range = str.startIndex...
let range = ..<str.endIndex

I will use the full form in the following examples for the sake of clarity, but for readability you will probably want to use the one-sided ranges in your own code.
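As a quick check, the sketch below compares the full range with the two one-sided forms and shows one way to guard against subscripting with endIndex. The variable names are only illustrative.

```swift
let str = "Hello, playground"

// The full range and both one-sided forms cover the whole string.
let full = str[str.startIndex..<str.endIndex]
let fromStart = str[str.startIndex...]
let upToEnd = str[..<str.endIndex]

print(full == fromStart && fromStart == upToEnd) // true

// endIndex is a valid range bound but not a valid subscript position,
// so compare against it before reading a single character.
let i = str.endIndex
if i < str.endIndex {
    print(str[i])
} else {
    print("endIndex is past the last character")
}
```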

after

As in: index(after: String.Index)

  • after refers to the index of the character directly after the given index.

Examples

// character
let index = str.index(after: str.startIndex)
str[index] // "e"

// range
let range = str.index(after: str.startIndex)..<str.endIndex
str[range] // "ello, playground"

before

As in: index(before: String.Index)

  • before refers to the index of the character directly before the given index.

Examples

// character
let index = str.index(before: str.endIndex)
str[index] // d

// range
let range = str.startIndex..<str.index(before: str.endIndex)
str[range] // Hello, playgroun

offsetBy

As in: index(String.Index, offsetBy: String.IndexDistance)

  • The offsetBy value can be positive or negative and starts from the given index. String.IndexDistance is a type alias for Int, so you can pass an Int directly.

Examples

// character
let index = str.index(str.startIndex, offsetBy: 7)
str[index] // p

// range
let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(str.endIndex, offsetBy: -6)
let range = start..<end
str[range] // play

limitedBy

As in: index(String.Index, offsetBy: String.IndexDistance, limitedBy: String.Index)

  • limitedBy is useful for making sure that the offset does not cause the index to go out of bounds. It is a bounding index. Since it is possible for the offset to exceed the limit, this method returns an Optional. It returns nil if the offset would carry the index past the limit.

Example

// character
if let index = str.index(str.startIndex, offsetBy: 7, limitedBy: str.endIndex) {
    str[index] // p
}

If the offset had been 77 instead of 7, then the if statement would have been skipped.
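One subtlety worth seeing in code: when the limit is endIndex, an offset equal to the character count lands exactly on the limit, so the result is endIndex rather than nil, and endIndex cannot be subscripted. The sketch below shows this and wraps the check in a helper. The function character(atOffset:in:) is a hypothetical name, not part of the standard library.

```swift
let str = "Hello, playground" // 17 characters

// An offset equal to the character count stops exactly at the limit.
let atLimit = str.index(str.startIndex, offsetBy: 17, limitedBy: str.endIndex)
print(atLimit == str.endIndex) // true

// One step further would pass the limit, so the result is nil.
let pastLimit = str.index(str.startIndex, offsetBy: 18, limitedBy: str.endIndex)
print(pastLimit == nil) // true

// Hypothetical helper: returns nil instead of trapping when the offset
// does not name a character in the string.
func character(atOffset offset: Int, in s: String) -> Character? {
    guard offset >= 0,
          let i = s.index(s.startIndex, offsetBy: offset, limitedBy: s.endIndex),
          i < s.endIndex else { return nil }
    return s[i]
}

if let p = character(atOffset: 7, in: str) {
    print(p) // p
}
```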

Why is String.Index needed?

It would be much easier to use an Int index for Strings. The reason that you have to create a new String.Index for every String is that Characters in Swift are not all the same length under the hood. A single Swift Character might be composed of one, two, or even more Unicode code points. Thus each unique String must calculate the indexes of its Characters.
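To make the point about varying lengths concrete, here is a small sketch that counts the same strings three ways: as Characters, as Unicode scalars, and as UTF-8 bytes. The sample strings are my own and are written with escapes so the composition is visible.

```swift
// "é" written as "e" plus a combining acute accent:
// two code points, but a single Character.
let cafe = "cafe\u{301}"
print(cafe.count)                // 4
print(cafe.unicodeScalars.count) // 5
print(cafe.utf8.count)           // 6

// A flag emoji is built from two regional-indicator scalars.
let flag = "\u{1F1EF}\u{1F1F5}"
print(flag.count)                // 1
print(flag.unicodeScalars.count) // 2
print(flag.utf8.count)           // 8
```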

It is possible to hide this complexity behind an Int index extension, but I am reluctant to do so. It is good to be reminded of what is actually happening.

How does String substring work in Swift

All of the following examples use

var str = "Hello, playground"

Swift 4

Strings got a pretty big overhaul in Swift 4. When you get some substring from a String now, you get a Substring type back rather than a String. Why is this? Strings are value types in Swift. That means if you use one String to make a new one, then it has to be copied over. This is good for stability (no one else is going to change it without your knowledge) but bad for efficiency.

A Substring, on the other hand, is a reference back to the original String from which it came. Here is an image from the documentation illustrating that.

No copying is needed so it is much more efficient to use. However, imagine you got a ten character Substring from a million character String. Because the Substring is referencing the String, the system would have to hold on to the entire String for as long as the Substring is around. Thus, whenever you are done manipulating your Substring, convert it to a String.

let myString = String(mySubstring)

This will copy just the substring over, and the memory holding the old String can be reclaimed. Substrings (as a type) are meant to be short-lived.
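A minimal sketch of that workflow, assuming the same sample string as the other examples: do the short-lived work on the Substring, then copy it into a String before storing it.

```swift
let str = "Hello, playground"

// Slicing yields a Substring that shares storage with str.
let mySubstring = str.prefix(5)
print(type(of: mySubstring)) // Substring

// Substring and String share the StringProtocol API, so short-lived
// work can be done on the slice directly.
print(mySubstring.uppercased()) // HELLO

// Copy into an independent String before keeping it around.
let myString = String(mySubstring)
print(type(of: myString)) // String
```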

Another big improvement in Swift 4 is that Strings are Collections (again). That means that whatever you can do to a Collection, you can do to a String (use subscripts, iterate over the characters, filter, etc).
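A few Collection operations applied directly to a String, as a rough illustration (the variable names are mine):

```swift
let str = "Hello, playground"

// Iterate over the Characters.
for ch in str.prefix(3) {
    print(ch)
}

// filter on a String gives back a String.
let vowels = str.filter { "aeiou".contains($0) }
print(vowels) // eoaou

// Other Collection algorithms work as well.
let letterCount = str.filter { $0.isLetter }.count
let reversed = String(str.reversed())
print(letterCount) // 15
print(reversed)    // dnuorgyalp ,olleH
```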

The following examples show how to get a substring in Swift.

Getting substrings

You can get a substring from a string by using subscripts or a number of other methods (for example, prefix, suffix, split). You still need to use String.Index and not an Int index for the range, though. (See my other answer if you need help with that.)
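Of the methods just mentioned, split is not covered in the examples that follow, so here is a brief sketch. It assumes you want to split on anything that is not a letter; adjust the closure to taste.

```swift
let str = "Hello, playground"

// split returns [Substring]; each piece still refers to str's storage.
let pieces = str.split(whereSeparator: { !$0.isLetter })
print(pieces) // ["Hello", "playground"]

// prefix(while:) also returns a Substring.
let firstWord = str.prefix(while: { $0.isLetter })
print(firstWord) // Hello

// Convert to String when the results need to outlive the original.
let words = pieces.map { String($0) }
```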

Beginning of a string

You can use a subscript (note the Swift 4 one-sided range):

let index = str.index(str.startIndex, offsetBy: 5)
let mySubstring = str[..<index] // Hello

or prefix:

let index = str.index(str.startIndex, offsetBy: 5)
let mySubstring = str.prefix(upTo: index) // Hello

or even easier:

let mySubstring = str.prefix(5) // Hello

End of a string

Using subscripts:

let index = str.index(str.endIndex, offsetBy: -10)
let mySubstring = str[index...] // playground

or suffix:

let index = str.index(str.endIndex, offsetBy: -10)
let mySubstring = str.suffix(from: index) // playground

or even easier:

let mySubstring = str.suffix(10) // playground

Note that when using the suffix(from: index) I had to count back from the end by using -10. That is not necessary when just using suffix(x), which just takes the last x characters of a String.

Range in a string

Again we simply use subscripts here.

let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(str.endIndex, offsetBy: -6)
let range = start..<end

let mySubstring = str[range] // play

Converting Substring to String

Don't forget, when you are ready to save your substring, you should convert it to a String so that the old string's memory can be cleaned up.

let myString = String(mySubstring)

Using an Int index extension?

I'm hesitant to use an Int based index extension after reading the article Strings in Swift 3 by Airspeed Velocity and Ole Begemann. Although in Swift 4, Strings are collections, the Swift team purposely hasn't used Int indexes. It is still String.Index. This has to do with Swift Characters being composed of varying numbers of Unicode codepoints. The actual index has to be uniquely calculated for every string.

I have to say, I hope the Swift team finds a way to abstract away String.Index in the future. But until then, I am choosing to use their API. It helps me to remember that String manipulations are not just simple Int index lookups.

Why should we use String.Index instead of Int as index of Character in String?

First, you can't use Int as an index for a string. The interface requires String.Index.

Why? We are using Unicode, not ASCII. The unit for Swift strings is a Character, which is a "grapheme cluster". A character can consist of multiple Unicode code points, and each Unicode code point can take 1 to 4 bytes in UTF-8.

Now let's say you have a 10-megabyte string and search it for the substring "Wysteria". Would you want the result to be the character number at which the substring starts? If that is character 123,456, then to find the same substring again we have to start at the beginning of the string and analyze 123,456 characters. That is madly inefficient.

Instead we get a String.Index which is something that allows Swift to locate that substring quickly. It is most likely the byte offset, so it can be accessed very quickly.

Now adding "1" to that byte offset is nonsense, because you don't know how long the first character is. (It's quite possible that Unicode has another character that equals the ASCII 'W'). So you need to call a function that returns the index of the next character.
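The sketch below walks a short string with index(after:) and records where each Character starts, measured in UTF-8 bytes. It does not claim anything about how String.Index is stored internally; it only shows that the byte distance between consecutive characters is not constant. The sample string is my own.

```swift
let s = "W\u{E9}\u{1F642}!" // "W", "é", a smiley emoji, "!"

var offsets: [Int] = []
var i = s.startIndex
while i < s.endIndex {
    // Position of this Character measured in UTF-8 code units (bytes).
    offsets.append(s.utf8.distance(from: s.utf8.startIndex, to: i))
    // index(after:) steps over the whole Character, however wide it is.
    i = s.index(after: i)
}
print(offsets) // [0, 1, 3, 7]
```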

You can write code that returns the second Character from a string. To return the one millionth Character takes significant time. Swift doesn't allow you to do things that are enormously inefficient.

Index of a substring in a string with Swift

edit/update:

Xcode 11.4 • Swift 5.2 or later

import Foundation

extension StringProtocol {
    func index<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> Index? {
        range(of: string, options: options)?.lowerBound
    }
    func endIndex<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> Index? {
        range(of: string, options: options)?.upperBound
    }
    func indices<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> [Index] {
        ranges(of: string, options: options).map(\.lowerBound)
    }
    func ranges<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var startIndex = self.startIndex
        while startIndex < endIndex,
            let range = self[startIndex...]
                .range(of: string, options: options) {
                result.append(range)
                startIndex = range.lowerBound < range.upperBound ? range.upperBound :
                    index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
}

usage:

let str = "abcde"
if let index = str.index(of: "cd") {
    let substring = str[..<index] // ab
    let string = String(substring)
    print(string) // "ab\n"
}


let str = "Hello, playground, playground, playground"
str.index(of: "play") // 7
str.endIndex(of: "play") // 11
str.indices(of: "play") // [7, 19, 31]
str.ranges(of: "play") // [{lowerBound 7, upperBound 11}, {lowerBound 19, upperBound 23}, {lowerBound 31, upperBound 35}]

case insensitive sample

let query = "Play"
let ranges = str.ranges(of: query, options: .caseInsensitive)
let matches = ranges.map { str[$0] }
print(matches) // ["play", "play", "play"]

regular expression sample

let query = "play"
let escapedQuery = NSRegularExpression.escapedPattern(for: query)
let pattern = "\\b\(escapedQuery)\\w+" // matches any word that starts with "play" prefix

let ranges = str.ranges(of: pattern, options: .regularExpression)
let matches = ranges.map { str[$0] }

print(matches) // ["playground", "playground", "playground"]

How can I give a custom Index to String.Index working with Unicode in Swift?

You can create an extension on String where you take the index as either a String.Index or an Int, then use that to subscript unicodeScalars.

extension String {
    func unicodeScalarValue(at index: String.Index) -> UInt32 {
        unicodeScalars[index].value
    }

    func unicodeScalarValue(at index: Int) -> UInt32 {
        // Offset within the unicodeScalars view so the Int counts scalars, not Characters
        let scalars = unicodeScalars
        return scalars[scalars.index(scalars.startIndex, offsetBy: index)].value
    }
}

"ABC".unicodeScalarValue(at: 0) // 65
"ABC".unicodeScalarValue(at: 1) // 66
"ABC".unicodeScalarValue(at: 2) // 67

Swift String.Index vs transforming the String to an Array

In a String, the byte representation is packed, so there's no way to know where the character boundaries are without traversing the whole string from the start.

When converting to an array, this traversal is done once, and the result is an array of characters that are equidistantly spaced out in memory, which is what allows constant time subscripting by an Int index. Importantly, the array is preserved, so many subscripting operations can be done upon the same array, requiring only one traversal of the String's bytes, for the initial unpacking.

It is possible to extend String with a subscript that indexes it by an Int, and you see it come up often on SO, but that's ill-advised. The standard library programmers could have added it, but they purposely chose not to, because it obscures the fact that every indexing operation requires a separate traversal of the String's bytes, which is O(string.count). All of a sudden, innocuous code like this:

for i in 0..<string.count {
    print(string[i]) // Int subscript from such an extension: looks O(1), but is actually O(string.count)!
}

becomes quadratic.
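To make the cost visible, here is a sketch that puts the three approaches side by side. The subscript(offset:) extension is a hypothetical example of the kind of Int subscript discussed above, labeled so it cannot be mistaken for a standard library API. It is shown to illustrate the problem, not as a recommendation.

```swift
// Hypothetical Int-based subscript. Each call walks from startIndex,
// so a single lookup costs O(offset).
extension String {
    subscript(offset offset: Int) -> Character {
        self[index(startIndex, offsetBy: offset)]
    }
}

let string = "Hello, playground"

// Quadratic: every iteration re-walks the string from the start.
var slow = ""
for i in 0..<string.count {
    slow.append(string[offset: i])
}

// Linear: string.indices yields String.Index values, and subscripting
// with a String.Index does not re-walk the string.
var fast = ""
for i in string.indices {
    fast.append(string[i])
}

// Linear: unpack once into an Array, then index by Int in constant time.
let chars = Array(string)
print(chars[7]) // p
```

All three loops produce the same characters; only the amount of work differs, and the difference grows with the length of the string.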

Finding index of character in Swift String

You are not the only one who couldn't find the solution.

String doesn't implement RandomAccessIndexType. Probably because they enable characters with different byte lengths. That's why we have to use string.characters.count (count or countElements in Swift 1.x) to get the number of characters. That also applies to positions. The _position is probably an index into the raw array of bytes and they don't want to expose that. The String.Index is meant to protect us from accessing bytes in the middle of characters.

That means that any index you get must be created from String.startIndex or String.endIndex (String.Index implements BidirectionalIndexType). Any other indices can be created using successor or predecessor methods.

Now to help us with indices, there is a set of methods (functions in Swift 1.x):

Swift 4.x

import Foundation // range(of:) comes from Foundation

let text = "abc"
let index2 = text.index(text.startIndex, offsetBy: 2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!

let characterIndex2 = text.index(text.startIndex, offsetBy: 2)
let lastChar2 = text[characterIndex2] //will do the same as above

let range: Range<String.Index> = text.range(of: "b")!
let index: Int = text.distance(from: text.startIndex, to: range.lowerBound)
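For a single Character, current Swift versions also offer firstIndex(of:), which needs no Foundation import. A minimal sketch, with the Int offset derived the same way as above:

```swift
let text = "abc"

if let i = text.firstIndex(of: "b") {
    // Convert the String.Index to a character offset when an Int is needed.
    let offset = text.distance(from: text.startIndex, to: i)
    print(offset)     // 1
    print(text[i...]) // bc
}
```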

Swift 3.0

let text = "abc"
let index2 = text.index(text.startIndex, offsetBy: 2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!

let characterIndex2 = text.characters.index(text.characters.startIndex, offsetBy: 2)
let lastChar2 = text.characters[characterIndex2] //will do the same as above

let range: Range<String.Index> = text.range(of: "b")!
let index: Int = text.distance(from: text.startIndex, to: range.lowerBound)

Swift 2.x

let text = "abc"
let index2 = text.startIndex.advancedBy(2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!
let lastChar2 = text.characters[index2] //will do the same as above

let range: Range<String.Index> = text.rangeOfString("b")!
let index: Int = text.startIndex.distanceTo(range.startIndex) //will call successor/predecessor several times until the indices match

Swift 1.x

let text = "abc"
let index2 = advance(text.startIndex, 2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!

let range = text.rangeOfString("b")
let index: Int = distance(text.startIndex, range.startIndex) //will call succ/pred several times

Working with String.Index is cumbersome but using a wrapper to index by integers (see https://stackoverflow.com/a/25152652/669586) is dangerous because it hides the inefficiency of real indexing.

Note that Swift indexing implementation has the problem that indices/ranges created for one string cannot be reliably used for a different string, for example:

Swift 2.x

let text: String = "abc"
let text2: String = "🎾🏇🏈" // any string with multi-byte characters
let range = text.rangeOfString("b")!

//can randomly return a bad substring or throw an exception
let substring: String = text2[range]

//the correct solution
let intIndex: Int = text.startIndex.distanceTo(range.startIndex)
let startIndex2 = text2.startIndex.advancedBy(intIndex)
let range2 = startIndex2...startIndex2

let substring: String = text2[range2]

Swift 1.x

let text: String = "abc"
let text2: String = "🎾🏇🏈" // any string with multi-byte characters
let range = text.rangeOfString("b")

//can randomly return nil or a bad substring
let substring: String = text2[range]

//the correct solution
let intIndex: Int = distance(text.startIndex, range.startIndex)
let startIndex2 = advance(text2.startIndex, intIndex)
let range2 = startIndex2...startIndex2

let substring: String = text2[range2]
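The same idea in current Swift syntax, as a sketch: convert the index to a character offset in its own string, then rebuild an index inside the other string. The helper transfer(_:from:to:) is a hypothetical name, and the emoji string is only an example of a string with multi-byte characters.

```swift
let text = "abc"
let text2 = "\u{1F3BE}\u{1F3C7}\u{1F3C8}" // three emoji, each 4 bytes in UTF-8

// Hypothetical helper: carries a position from one string to another by
// going through a character offset. Returns nil if `target` is too short.
func transfer(_ index: String.Index, from source: String, to target: String) -> String.Index? {
    let offset = source.distance(from: source.startIndex, to: index)
    guard let moved = target.index(target.startIndex, offsetBy: offset, limitedBy: target.endIndex),
          moved < target.endIndex else { return nil }
    return moved
}

if let b = text.firstIndex(of: "b"), let j = transfer(b, from: text, to: text2) {
    print(text2[j]) // the second emoji
}
```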

Range<String.Index> Versus String.Index

String indices aren't integers. They're opaque objects (of type String.Index) which can be used to subscript into a String to obtain a character.

Ranges aren't limited to only Range<Int>. If you look at the declaration of Range, you can see it's generic over any Bound, so long as the Bound is Comparable (which String.Index is).

So a Range<String.Index> is just that. It's a range of string indices, and just like any other range, it has a lowerBound, and an upperBound.
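A short sketch of that point: Range works the same way over Int, Double, and String.Index, and a Range<String.Index> can be used to subscript the string it was built from.

```swift
let str = "Hello, playground"

// Range is generic over any Comparable bound.
let intRange: Range<Int> = 0..<5
let doubleRange: Range<Double> = 0.0..<1.0
print(intRange.contains(3), doubleRange.contains(0.5)) // true true

// A range of string indices has the same shape.
let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(start, offsetBy: 4)
let indexRange: Range<String.Index> = start..<end

print(str[indexRange])            // play
print(str[indexRange.lowerBound]) // p
print(indexRange.contains(str.index(after: start))) // true
```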


