Slow Swift String Performance

Slow Swift String Performance

A Swift String is a collection of Characters, and a Character represents a single extended grapheme cluster, that can be one or more
Unicode scalars. That makes some index operations like "skip the first N characters" slow.

But the first improvement is to "short-circuit" the isPalindrome()
function. Instead of building the reversed string completely, compare
the character sequence with its reversed sequence and stop as soon
as a difference is found:

func isPalindrome(_ s: String) -> Bool {
return !zip(s.characters, s.characters.reversed()).contains { $0 != $1 }
}

s.characters.reversed() does not create a new collection in reverse
order, it just enumerates the characters from back to front.
With String(s.characters.reversed()) as in your method however,
you force the creation of a new collection for the reversed string,
that makes it slow.

For the 110-character string

let string = String(repeating: "Hello world", count: 10)

this reduces the computation time from about 6 sec to 1.2 sec in my test.

Next, avoid index calculations like

let endIndex = string.index(string.startIndex, offsetBy: length-1)

and iterate over the character index itself instead:

func partition(_ s: String) -> [[String]] {

var result: [[String]] = []

func dfs(string: String, partiton: [String]) {
if string.isEmpty {
result.append(partiton)
return
}

var idx = string.startIndex
repeat {
string.characters.formIndex(after: &idx)
let part = string.substring(to: idx)
if isPalindrome(part) {
let leftPart = string.substring(from: idx)
dfs(string: leftPart, partiton: partiton + [part])
}
} while idx != string.endIndex
}

func isPalindrome(_ s: String) -> Bool {
return !zip(s.characters, s.characters.reversed()).contains { $0 != $1 }
}

dfs(string: s, partiton: [])
return result
}

Computation time is now 0.7 sec.

The next step is to avoid string indexing totally, and work with
an array of characters, because array indexing is fast. Even better,
use array slices which are fast to create and reference the original
array elements:

func partition(_ s: String) -> [[String]] {

var result: [[String]] = []

func dfs(chars: ArraySlice<Character>, partiton: [String]) {

if chars.isEmpty {
result.append(partiton)
return
}

for length in 1...chars.count {
let part = chars.prefix(length)
if isPalindrome(part) {
let leftPart = chars.dropFirst(length)
dfs(chars: leftPart, partiton: partiton + [String(part)])
}
}
}

func isPalindrome(_ c: ArraySlice<Character>) -> Bool {
return !zip(c, c.reversed()).contains { $0 != $1 }
}

dfs(chars: ArraySlice(s.characters), partiton: [])
return result
}

Computation time is now 0.08 sec.

If your string contains only characters in the "basic multilingual plane" (i.e. <= U+FFFF) then you can work with UTF-16 code points instead:

func partition(_ s: String) -> [[String]] {

var result: [[String]] = []

func dfs(chars: ArraySlice<UInt16>, partiton: [String]) {

if chars.isEmpty {
result.append(partiton)
return
}

for length in 1...chars.count {
let part = chars.prefix(length)
if isPalindrome(part) {
let leftPart = chars.dropFirst(length)
part.withUnsafeBufferPointer {
dfs(chars: leftPart, partiton: partiton + [String(utf16CodeUnits: $0.baseAddress!, count: length)])
}
}
}
}

func isPalindrome(_ c: ArraySlice<UInt16>) -> Bool {
return !zip(c, c.reversed()).contains { $0 != $1 }
}

dfs(chars: ArraySlice(s.utf16), partiton: [])
return result
}

Computation time is now 0.04 sec for the 110 character test string.


So some tips which potentially can improve the performance when working with Swift strings are

  • Iterate over the characters/indices sequentially. Avoid "jumping"
    to the n'th position.
  • If you need "random" access to all characters, convert the string
    to an array first.
  • Working with the UTF-16 view of a string can be faster than working
    with the characters view.

Of course it depends on the actual use-case. In this application,
we were able to reduce the computation time from 6 sec to 0.04 sec,
that is a factor of 150.

Slow Swift Arrays and Strings performance

There are multiple reasons why the Swift code is slower than the Objective-C code.
I made a very simple test case by comparing two fixed strings 100 times.

  • Objective-C code: 0.026 seconds
  • Swift code: 3.14 seconds

The first reason is that a Swift Character represents an "extended grapheme cluster",
which can contain several Unicode code points (e.g. "flags"). This makes the
decomposition of a string into characters slow. On the other hand, Objective-C
NSString stores the strings as a sequence of UTF-16 code points.

If you replace

let a = Array(aStr)
let b = Array(bStr)

by

let a = Array(aStr.utf16)
let b = Array(bStr.utf16)

so that the Swift code works on UTF-16 sequences as well then the time goes down
to 1.88 seconds.

The allocation of the 2-dimensional array is also slow. It is faster to allocate
a single one-dimensional array. I found a simple Array2D class here:
http://blog.trolieb.com/trouble-multidimensional-arrays-swift/

class Array2D {
var cols:Int, rows:Int
var matrix: [Int]

init(cols:Int, rows:Int) {
self.cols = cols
self.rows = rows
matrix = Array(count:cols*rows, repeatedValue:0)
}

subscript(col:Int, row:Int) -> Int {
get {
return matrix[cols * row + col]
}
set {
matrix[cols*row+col] = newValue
}
}

func colCount() -> Int {
return self.cols
}

func rowCount() -> Int {
return self.rows
}
}

Using that class in your code

func levenshtein(aStr: String, bStr: String) -> Int {
let a = Array(aStr.utf16)
let b = Array(bStr.utf16)

var dist = Array2D(cols: a.count + 1, rows: b.count + 1)

for i in 1...a.count {
dist[i, 0] = i
}

for j in 1...b.count {
dist[0, j] = j
}

for i in 1...a.count {
for j in 1...b.count {
if a[i-1] == b[j-1] {
dist[i, j] = dist[i-1, j-1] // noop
} else {
dist[i, j] = min(
dist[i-1, j] + 1, // deletion
dist[i, j-1] + 1, // insertion
dist[i-1, j-1] + 1 // substitution
)
}
}
}

return dist[a.count, b.count]
}

the time in the test case goes down to 0.84 seconds.

The last bottleneck that I found in the Swift code is the min() function.
The Swift library has a built-in min() function which is faster. So just removing
the custom function from the Swift code reduces the time for the test case to
0.04 seconds, which is almost as good as the Objective-C version.

Addendum: Using Unicode scalars seems to be even slightly faster:

let a = Array(aStr.unicodeScalars)
let b = Array(bStr.unicodeScalars)

and has the advantage that it works correctly with surrogate pairs such
as Emojis.

Extremely slow parsing text with Swift

You Int indexing extension is the problem. To get a substring at position n, it needs to go through all the characters 0..n. Therefore your algorithm has O(n^2) (quadratic) complexity instead of the expected O(n) (linear) complexity.

Don't use that extension.

To search for a substring, there is a native method

if let range = html.range(of: SEARCH_START) {
let integerIndex = html.distance(from: html.startIndex, to: range.upperBound)
print(integerIndex)
}

If you really want to work with integers, you should convert your string to an array of characters first:

let chars = Array(html.characters)

and work with subarrays instead of substrings.

Edit:

To better understand what happens in your extension:

self.substring(with: self.index(self.startIndex, offsetBy: start)..<self.index(self.startIndex, offsetBy: realEnd))

In Java, where String is an array and supports random indexing this would be a constant (fast) operation. However, in Swift this is composed from 3 steps:

  1. self.index(self.startIndex, offsetBy: start) iterates from the first character until it finds the character at index start.
  2. self.index(self.startIndex, offsetBy: realEnd)) iterates from the first character until it finds the character at index realEnd.
  3. Gets the substring (fast)

In short, for every substring at start position n, the algorithm has to iterate over 2n characters. To get a single substring at index 20000, you need 40000 operations!

Why `String(describing: Class.self)` is slower than `NSStringFromClass(Class.self)`?

You’re comparing apples with motorcars.

  • NSStringFromClass does a highly specific job, involving only class types, and is used for a conversion related to the dynamic nature of Objective-C programs.

  • String(describing:) in Swift turns anything into a string representation and is intended (and suitable) only for debugging. It will not be used for any user-facing code and no meaningful program action will depend upon its output. Thus it will not appear in a released app and its speed is of no account. Your question therefore optimizes not only prematurely but unnecessarily; the speed of String(describing:) is unimportant.

Array Contains Too Slow Swift

i think you need the realPosition to link from a tap on a row in the tableview to the source array?

1) make a second array with data only for the tableViewDataSource

copy all visible elements to this new array. create a special ViewModel as class or better struct which only has the nessesary data to display in the tableview. save in this new ViewModel the realdataposition also as value. now you have a backlink to the source array

2) then populate this TableView only from the new datasource

3) look more into the functional programming in swift - there you can nicer go over arrays for example:

var array1 = ["a", "b", "c", "d", "e"]
let array2 = ["a", "c", "d"]
array1 = array1.filter { !array2.contains($0) }

or in your case:

let newArray = comments.filter{ !hidden.contains($0.getId()) }

or enumerated to create the viewmodel

struct CommentViewModel {
var id: Int
var text: String
var realPosition: Int
}

let visibleComments: [CommentViewModel] = comments
.enumerated()
.map { (index, element) in
return CommentViewModel(id: element.getId(), text: element.getText(), realPosition: index)
}
.filter{ !hidden.contains($0.id) }


Related Topics



Leave a reply



Submit