Big O of Accessing a String with an Index in Swift 3.0

Big O of accessing a string with an index in Swift 3.0

Unless documented otherwise, an implementation of Collection's subscript requirement should always have an O(1) time complexity. It's part of the contract with Collection itself.

As the documentation states (emphasis mine):

Types that conform to Collection are expected to provide the start​Index and end​Index properties and subscript access to elements as O(1) operations. Types that are not able to guarantee that expected performance must document the departure [...]

The potentially expensive part happens when you come to advance indices, such as with index(_:offsetBy:). The complexity for this is O(1) for a RandomAccessCollection, O(n) otherwise, where n is the magnitude of the offset.

String.CharacterView isn't a RandomAccessCollection, therefore advancing indices is O(n). As you say, the reason for this is that characters can have different byte lengths. But once you have the index (which internally is just an offset value for the string's unicode scalars along with the length of the given extended grapheme cluster in UTF-16 code units), you can subscript in constant time.

Therefore

for index in greeting.characters.indices {
print("\(greeting[index]) ", terminator: "")
}

is O(n). Each iteration of the loop simply advances the current index by 1, and the subscript uses the index's offset to jump to the start of the extended grapheme cluster and therefore get the character for that given index.

If however we said:

for offset in 0..<greeting.characters.count {
let index = greeting.index(greeting.startIndex, offsetBy: offset)
print("\(greeting[index]) ", terminator: "")
}

That would be O(n2), because we're now advancing the index from the start index at each iteration of the loop (not to mention doing an O(n) walk just to get the count of the characters to begin with).

What is the BigO of Swift's String.count?

It is definitely O(n). From the Swift Book:

As a result, the number of characters in a string can't be calculated without iterating through the string to determine its extended grapheme cluster boundaries. If you are working with particularly long string values, be aware that the count property must iterate over the Unicode scalars in the entire string in order to determine the characters for that string.

This has a few implications, the biggest of which is integer subscripting (i.e. str[5]) is not available through the standard library. Internally, String uses ASCII or UTF-16 encoding (from Swift 5, it uses UTF-8 only). If the string only uses ASCII characters then count can be O(1) but ASCII only has 127 characters so consider this an exception rather than the rule.

NSString, on the other hand, always uses UTF-16 so accessing its length is O(1). And also keep in mind that NSString.length != String.count (try strings with emojis and you'll see).

As for your second question, it does not cache count for subsequent calls. Every call to count is thus O(n), even if the string has not changed. The code in the Foundation repo also confirms that.

How does String substring work in Swift

Sample Image

All of the following examples use

var str = "Hello, playground"

Swift 4

Strings got a pretty big overhaul in Swift 4. When you get some substring from a String now, you get a Substring type back rather than a String. Why is this? Strings are value types in Swift. That means if you use one String to make a new one, then it has to be copied over. This is good for stability (no one else is going to change it without your knowledge) but bad for efficiency.

A Substring, on the other hand, is a reference back to the original String from which it came. Here is an image from the documentation illustrating that.

No copying is needed so it is much more efficient to use. However, imagine you got a ten character Substring from a million character String. Because the Substring is referencing the String, the system would have to hold on to the entire String for as long as the Substring is around. Thus, whenever you are done manipulating your Substring, convert it to a String.

let myString = String(mySubstring)

This will copy just the substring over and the memory holding old String can be reclaimed. Substrings (as a type) are meant to be short lived.

Another big improvement in Swift 4 is that Strings are Collections (again). That means that whatever you can do to a Collection, you can do to a String (use subscripts, iterate over the characters, filter, etc).

The following examples show how to get a substring in Swift.

Getting substrings

You can get a substring from a string by using subscripts or a number of other methods (for example, prefix, suffix, split). You still need to use String.Index and not an Int index for the range, though. (See my other answer if you need help with that.)

Beginning of a string

You can use a subscript (note the Swift 4 one-sided range):

let index = str.index(str.startIndex, offsetBy: 5)
let mySubstring = str[..<index] // Hello

or prefix:

let index = str.index(str.startIndex, offsetBy: 5)
let mySubstring = str.prefix(upTo: index) // Hello

or even easier:

let mySubstring = str.prefix(5) // Hello

End of a string

Using subscripts:

let index = str.index(str.endIndex, offsetBy: -10)
let mySubstring = str[index...] // playground

or suffix:

let index = str.index(str.endIndex, offsetBy: -10)
let mySubstring = str.suffix(from: index) // playground

or even easier:

let mySubstring = str.suffix(10) // playground

Note that when using the suffix(from: index) I had to count back from the end by using -10. That is not necessary when just using suffix(x), which just takes the last x characters of a String.

Range in a string

Again we simply use subscripts here.

let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(str.endIndex, offsetBy: -6)
let range = start..<end

let mySubstring = str[range] // play

Converting Substring to String

Don't forget, when you are ready to save your substring, you should convert it to a String so that the old string's memory can be cleaned up.

let myString = String(mySubstring)

Using an Int index extension?

I'm hesitant to use an Int based index extension after reading the article Strings in Swift 3 by Airspeed Velocity and Ole Begemann. Although in Swift 4, Strings are collections, the Swift team purposely hasn't used Int indexes. It is still String.Index. This has to do with Swift Characters being composed of varying numbers of Unicode codepoints. The actual index has to be uniquely calculated for every string.

I have to say, I hope the Swift team finds a way to abstract away String.Index in the future. But until then, I am choosing to use their API. It helps me to remember that String manipulations are not just simple Int index lookups.

Get nth character of a string in Swift

Attention: Please see Leo Dabus' answer for a proper implementation for Swift 4 and Swift 5.

Swift 4 or later

The Substring type was introduced in Swift 4 to make substrings
faster and more efficient by sharing storage with the original string, so that's what the subscript functions should return.

Try it out here

extension StringProtocol {
subscript(offset: Int) -> Character { self[index(startIndex, offsetBy: offset)] }
subscript(range: Range<Int>) -> SubSequence {
let startIndex = index(self.startIndex, offsetBy: range.lowerBound)
return self[startIndex..<index(startIndex, offsetBy: range.count)]
}
subscript(range: ClosedRange<Int>) -> SubSequence {
let startIndex = index(self.startIndex, offsetBy: range.lowerBound)
return self[startIndex..<index(startIndex, offsetBy: range.count)]
}
subscript(range: PartialRangeFrom<Int>) -> SubSequence { self[index(startIndex, offsetBy: range.lowerBound)...] }
subscript(range: PartialRangeThrough<Int>) -> SubSequence { self[...index(startIndex, offsetBy: range.upperBound)] }
subscript(range: PartialRangeUpTo<Int>) -> SubSequence { self[..<index(startIndex, offsetBy: range.upperBound)] }
}

To convert the Substring into a String, you can simply
do String(string[0..2]), but you should only do that if
you plan to keep the substring around. Otherwise, it's more
efficient to keep it a Substring.

It would be great if someone could figure out a good way to merge
these two extensions into one. I tried extending StringProtocol
without success, because the index method does not exist there.
Note: This answer has been already edited, it is properly implemented and now works for substrings as well. Just make sure to use a valid range to avoid crashing when subscripting your StringProtocol type. For subscripting with a range that won't crash with out of range values you can use this implementation


Why is this not built-in?

The error message says "see the documentation comment for discussion". Apple provides the following explanation in the file UnavailableStringAPIs.swift:

Subscripting strings with integers is not available.

The concept of "the ith character in a string" has
different interpretations in different libraries and system
components. The correct interpretation should be selected
according to the use case and the APIs involved, so String
cannot be subscripted with an integer.

Swift provides several different ways to access the character
data stored inside strings.

  • String.utf8 is a collection of UTF-8 code units in the
    string. Use this API when converting the string to UTF-8.
    Most POSIX APIs process strings in terms of UTF-8 code units.

  • String.utf16 is a collection of UTF-16 code units in
    string. Most Cocoa and Cocoa touch APIs process strings in
    terms of UTF-16 code units. For example, instances of
    NSRange used with NSAttributedString and
    NSRegularExpression store substring offsets and lengths in
    terms of UTF-16 code units.

  • String.unicodeScalars is a collection of Unicode scalars.
    Use this API when you are performing low-level manipulation
    of character data.

  • String.characters is a collection of extended grapheme
    clusters, which are an approximation of user-perceived
    characters.


Note that when processing strings that contain human-readable text,
character-by-character processing should be avoided to the largest extent
possible. Use high-level locale-sensitive Unicode algorithms instead, for example,
String.localizedStandardCompare(),
String.localizedLowercaseString,
String.localizedStandardRangeOfString() etc.

Why is the cost of fetching a value in an array, O(1)

Simple Answer

Locating a value in an array by index is O(1).

Searching for a value in an unsorted array is O(n).


Big O notation

Array is an abstract data structure in computer programming. When discussing Big O notation for an abstract data structure, what we’re asking is: how long does an algorithm take when applied to the abstract data structure given any number of elements. Put another way: how fast is your algorithm as a function of the data within the abstract structure.


Swift Context

This isn’t Machine Code. Swift is a high-level programming language. Meaning Swift code is itself an abstraction from the internal details of the computer it’s running on. So, when we discuss Big O notation for an array in Swift, we aren’t really talking about how it’s stored in memory because we’re assuming the way it’s stored in memory is a reasonable representation of an Array data structure. I think this is important to understand, and it’s why an array in Ruby will be slower than an Array in C for example, because one language is a greater abstraction of the computer it’s running on. We can now move forward knowing the discussion about Big O in the context of a Swift Array is indeed affected by it’s underlying implementation in the computer, but we’re still referring to a perfect version of an instantiated Array when using Big O.


Searching for a value in an unsorted array is O(n)

| 2 | 6 | 10 | 0 |

The above text is a visual representation of an array containing 4 elements.

If I said every time you look at an element, that is a single action with a complexity of 1.

Then I said, I want you to look at each element from left to right only once. What’s the complexity?

4. Good.



STOP.



| 2 | 6 | 10 | 0 |

in Swift is:

let array: Array<Int> = [2, 6, 10, 0]


I want you to look at each element from left to right only once.

I am a computer programmer writing Swift, and you are the computer.

for element in array {
print(element)
}

Lets make it more general. Lets say we didn’t know what was inside our array and we named it something… How about book. I now ask: I want you to look at each element from left to right only once. What’s the complexity?

Well, it’s the number of elements in book! Good.


OK Computer. Look at each page in book from left to right until the page has an exclamation point on it. If you find it, show me the page it’s on, if you don’t find it, tell me you didn’t find it. What’s the complexity?
Since neither you or I know what page, if any, contains an exclamation point, it’s entirely possible that the very last page is the only page which contains an exclamation point. It’s also possible that none of the pages contain an exclamation point.

The number of pages in our book is n.

In the worst case, whether we find the exclamation point or not, we have to look, at least once, at every single page in book until we find the page with !

O(n)


Bringing it all together in an example

struct Page {
let content: String
func contains(_ character: Character) -> Bool {
return content.contains(character)
}
init(_ content: String) {
self.content = content
}
}

typealias Book = Array<Page>

let emotionalBook: Book = [Page("this"),
Page("is"),
Page("a"),
Page("simple"),
Page("book!")]

let unemotionalBook: Book = [Page("this"),
Page("book"),
Page("lacks"),
Page("emotion.")]

enum EmotionalBook: Error {
case lacking
}

func findExclamationPoint(within book: Book) throws -> Page {
//Look at every single page once.
for page in book {
if page.contains("!") {
return page
}
}
//Didn't find a page with ! inside it
throw EmotionalBook.lacking
}

do {
//Discovers the last page!
let emotionalPage = try findExclamationPoint(within: emotionalBook)
print(emotionalPage.content)
} catch {
print("This book is \(error) in emotion!")
}

do {
//Doesn't discover a page, but still checks all of them!
let emotionalPage = try findExclamationPoint(within: unemotionalBook)
print(emotionalPage.content)
} catch {
print("This book is \(error) in emotion!")
}


I hope this helps!

How do you use String.substringWithRange? (or, how do Ranges work in Swift?)

You can use the substringWithRange method. It takes a start and end String.Index.

var str = "Hello, playground"
str.substringWithRange(Range<String.Index>(start: str.startIndex, end: str.endIndex)) //"Hello, playground"

To change the start and end index, use advancedBy(n).

var str = "Hello, playground"
str.substringWithRange(Range<String.Index>(start: str.startIndex.advancedBy(2), end: str.endIndex.advancedBy(-1))) //"llo, playgroun"

You can also still use the NSString method with NSRange, but you have to make sure you are using an NSString like this:

let myNSString = str as NSString
myNSString.substringWithRange(NSRange(location: 0, length: 3))

Note: as JanX2 mentioned, this second method is not safe with unicode strings.

Find all indices of a search term in a string

Instead of checking for the search term at each position of the string
you could use rangeOfString() to find the next occurrence (hoping
that rangeOfString() uses more advanced algorithms):

extension String {

func indicesOf(searchTerm:String) -> [Int] {
var indices = [Int]()
var pos = self.startIndex
while let range = self.rangeOfString(searchTerm, range: pos ..< self.endIndex) {
indices.append(distance(self.startIndex, range.startIndex))
pos = range.startIndex.successor()
}
return indices
}
}

Generally, it depends on the size of the input string and the size
of the search string which algorithm is "the fastest". You'll find
an overview with links to various algorithms in
String searching algorithm.

Update for Swift 3:

extension String {

func indices(of searchTerm:String) -> [Int] {
var indices = [Int]()
var pos = self.startIndex
while let range = range(of: searchTerm, range: pos ..< self.endIndex) {
indices.append(distance(from: startIndex, to: range.lowerBound))
pos = index(after: range.lowerBound)
}
return indices
}
}

How does String.Index work in Swift

Sample Image

All of the following examples use

var str = "Hello, playground"

startIndex and endIndex

  • startIndex is the index of the first character
  • endIndex is the index after the last character.

Example

// character
str[str.startIndex] // H
str[str.endIndex] // error: after last character

// range
let range = str.startIndex..<str.endIndex
str[range] // "Hello, playground"

With Swift 4's one-sided ranges, the range can be simplified to one of the following forms.

let range = str.startIndex...
let range = ..<str.endIndex

I will use the full form in the follow examples for the sake of clarity, but for the sake of readability, you will probably want to use the one-sided ranges in your code.

after

As in: index(after: String.Index)

  • after refers to the index of the character directly after the given index.

Examples

// character
let index = str.index(after: str.startIndex)
str[index] // "e"

// range
let range = str.index(after: str.startIndex)..<str.endIndex
str[range] // "ello, playground"

before

As in: index(before: String.Index)

  • before refers to the index of the character directly before the given index.

Examples

// character
let index = str.index(before: str.endIndex)
str[index] // d

// range
let range = str.startIndex..<str.index(before: str.endIndex)
str[range] // Hello, playgroun

offsetBy

As in: index(String.Index, offsetBy: String.IndexDistance)

  • The offsetBy value can be positive or negative and starts from the given index. Although it is of the type String.IndexDistance, you can give it an Int.

Examples

// character
let index = str.index(str.startIndex, offsetBy: 7)
str[index] // p

// range
let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(str.endIndex, offsetBy: -6)
let range = start..<end
str[range] // play

limitedBy

As in: index(String.Index, offsetBy: String.IndexDistance, limitedBy: String.Index)

  • The limitedBy is useful for making sure that the offset does not cause the index to go out of bounds. It is a bounding index. Since it is possible for the offset to exceed the limit, this method returns an Optional. It returns nil if the index is out of bounds.

Example

// character
if let index = str.index(str.startIndex, offsetBy: 7, limitedBy: str.endIndex) {
str[index] // p
}

If the offset had been 77 instead of 7, then the if statement would have been skipped.

Why is String.Index needed?

It would be much easier to use an Int index for Strings. The reason that you have to create a new String.Index for every String is that Characters in Swift are not all the same length under the hood. A single Swift Character might be composed of one, two, or even more Unicode code points. Thus each unique String must calculate the indexes of its Characters.

It is possible to hide this complexity behind an Int index extension, but I am reluctant to do so. It is good to be reminded of what is actually happening.

Swift String's count property time complexity

Array conforms to RandomAccessCollection. String conforms to BidirectionalCollection. If you look at the documentation for the Collection protocol you'll find this:

The performance of some collection operations depends on the type of
index that the collection provides. For example, a random-access
collection, which can measure the distance between two indices in O(1)
time, can calculate its count property in O(1) time. Conversely,
because a forward or bidirectional collection must traverse the entire
collection to count the number of contained elements, accessing its
count property is an O(n) operation.

So, you can expect String.count to be more computationally expensive than Array.count.



Related Topics



Leave a reply



Submit