Is Swift Dictionary of Indexed for Performance? Even for Exotic Types (Uuid)

Is Swift dictionary of indexed for performance? Even for exotic types (UUID)?

Swift's Dictionary is implemented with a hash table, therefore lookups will typically be done in O(1) time, assuming a good hashing algorithm. As @rmaddy says, this therefore means that the type of key is mainly irrelevant – the only thing that really matters is whether it has a good implementation of hashValue, which you shouldn't have to worry about at all for standard library types.

The worst case time complexity for a hash table lookup (assuming maximal hash collisions) depends on how the hash table resolves the collisions. In the case of Dictionary, two storage schemes can be used, native or Cocoa.

From HashedCollections.swift.gyb:

In normal usage, you can expect the backing storage of a Dictionary to be a
NativeStorage.

[...]

Native storage is a hash table with open addressing and linear probing. The
bucket array forms a logical ring (e.g., a chain can wrap around the end of
buckets array to the beginning of it).

Therefore the worst case lookup time complexity is O(n), as if each element has the same hash, potentially every element will have to be searched through in order to determine if a given element exists in the table.

For Cocoa storage, a _CocoaDictionaryBuffer wrapper is used in order to wrap an NSDictionary. Unfortunately, I cannot find any documentation on how it's implemented, although it's Core Foundation counterpart CFDictionary's header file states that:

The access time for a value in the dictionary is guaranteed to be at
worst O(lg N) for any implementation, current and future

(Which sounds like it uses a balanced binary tree in order to handle collisions.)

What is meaning of Dictionary uses two storage schemes: native storage and Cocoa storage ?

Ideally, Swift's collection types are built from the ground up to take advantage of the strengths of Swift's features, optimized for its quirks, and so on.

But Swift's types don't exist in a vacuum. They need to routinely interoperate with existing Foundation types. If Swift has super performant awesome data structures, but each bridging to ObjC land and back caused a full copy to massage the data into whatever shape ObjC needed, then the overall performance would be totally unworkable.

So most of Swift's data structures, including strings, dictionaries, and arrays, are implemented as wrappers over one of several different storage strategies. One of them is optimal for Swift-only use, (which is used by default, or exclusively on a platform that doesn't have Objective C support, like Linux). The other is a wrapper around the Foundation counterpart, which is meant to bridge cheaply.

The easiest to explain is Swift.String. Foundation.NSString was introduced around the 80s-90s. UTF-16 was all the rage, and like the other languages of the era (Java, C#), Objective C used UTF-16 for String storage. Things progressed, and eventually we learned that UTF-8 was a much more memory dense (better for cache) and thus performant encoding, for most strings that people use. The ship had sailed for the other languages: UTF-16 was far too ingrained to migrate to UTF-8. Swift being a new language, had a chance to embrace it, which is did in Swift 5.

But now we have two worlds. Swift.String is using UTF-8 where possible, but all the existing Foundation/AppKit/UIKit/etc. APIs use Foundation.NSString, which expects UTF-16. The conversion between the two is a linear time traversal/copy of the whole string. This would be prohibitively expensive if it were done on any passing of a Swift.String where Foundation.NSString was expected (overwhelming any benefit UTF-8 might have had).

To get around this, Swift.String have one storage strategy that uses UTF-16, which allows it to really cheaply bridge to Objective C APIs. It also has a storage strategy for "small" strings (less than 15 UTF-8 code units), which can be packed into the String's buffer pointer directly, as a tagged pointer, without actually allocating any heap buffer at all.

You might be interested in https://swift.org/blog/utf8-string/

Swift - Stored values order is completely changed in Dictionary

This is because of the definition of Dictionaries:

Dictionary

A dictionary stores associations between keys of the same type and values of the same type in an collection with no defined ordering.

There is no order, they might come out differently than they were put in.
This is comparable to NSSet.


Edit:

NSDictionary

Dictionaries Collect Key-Value Pairs. Rather than simply maintaining an ordered or unordered collection of objects, an NSDictionary stores objects against given keys, which can then be used for retrieval.

There is also no order, however there is sorting on print for debugging purposes.

The most efficient way to build tree from dictionary data

While the builder pattern is useful, I think in this case it just complicates the straight-forward solution, although as you'll see, I'll present a couple that are more complicated, but those are based on increased performance optimizations on the first simple solution.

As you you noted, initializing and deinitializing classes is kind of slow. Actually the worst part is the dynamic memory allocation. Classes are powerful and definitely have their uses, but they're not the fastest tool in the Swift toolbox. Unless you make methods final, calling them can require a virtual dispatch. That can happen with protocols too depending on the particulars of their declaration, though in that case it's called "witness table thunking". But the worst part about classes is that their instances can be littered pretty much anywhere in memory. That's hell on the processor's on-chip cache. So for performance try to avoid dynamic dispatch, and reference types (ie, classes), and when you do need to allocate memory (such as Array or Dictionary), try to allocate all you need at once, and reuse it as much as possible. In Swift that requres some thought because of copy-on-write. You can easily end up allocating memory when you didn't intend to.

I present three solutions. Each one is more complicated, but also (hopefully) faster than the previous one. I'll explain why, and what to look out for. I should mention that I am not including the simplest solution. It is much like my first solution but with local variable arrays. The performance would not be especially good, and you're question makes it clear that performance is an issue.

First there's some boiler plate. To code it up and test it, I needed to define your Point and Line types, plus a few others I use for convenience, as well as some extensions purely for generating output. This code is common to all three solutions. Substitue your own definitions for Point and Line.

struct Point: Hashable
{
// This is just to give the points uniqueness, and letter names
static private let pointNames = [Character]("abcdefghijklmnopqrstuvwxyz")
static private var curNameIndex = 0
static private var nextID: Character
{
defer { curNameIndex += 1 }
return pointNames[curNameIndex]
}

let id = nextID
}
typealias Line = (Point, Point)
typealias Graph = [Point: [Line]]
typealias Path = [Line]

// Now we add some extensions for convenient output
extension Point: CustomStringConvertible {
var description: String { "\(id)" }
}

extension String.StringInterpolation
{
mutating func appendInterpolation(_ line: Line) {
appendLiteral("(\(line.0),\(line.1))")
}
mutating func appendInterpolation(_ lines: [Line]) {
appendLiteral(lines.map { "\($0)" }.joined(separator: "->"))
}
}

You mention that Point has an X and Y, but for this problem it doesn't matter. You just need unique "things" to serve as end-points for your Line instances.

Then I declared the inputs from your example:

let (a, b, c, d) = (Point(), Point(), Point(), Point())
let input = [ a: [(a,b),(a,c)], b: [(b,c), (b,d)], c: [(c,d)] ]

With the common code out of the way, here are the actual solutions:

Solution 1

Although the recursion introduces overhead all by itself, the main problem with the most straight-forward solution is that it requres local arrays that are allocated and deallocated up and down the call stack. The dynamic memory allocations and deallocations for them are actually the main performance problem with the simple recursive solution.

The solution is to attempt to pre-allocate working storage, and re-use it all through-out the recursion, or at least make reallocations rare.

One way would be to allocate the working arrays at the top level and pass them in as inout parameters. Another is to make them mutable properties of a struct (actually, a class wouldn't be too bad in this case, because you only allocate one instance). I chose the latter approach.

As I mentioned in my comment, I think of this problem as a graph theory problem.. Point = node. Line = edge. Though it's not explicitly stated, I assume there are no cycles. Putting in code to detect cycles isn't that hard, but would complicate the solution slightly. I also assume that the output should not contain entries with just one Line, because your example doesn't include any examples of that.

struct PathFinderVersion1
{
private let graph: Graph
private var pathList = [Path]()
private var currentPath = Path()

private init(for graph: Graph)
{
self.graph = graph
self.pathList.reserveCapacity(graph.count)
self.currentPath.reserveCapacity(graph.count)
}

static func pathList(for graph: Graph) -> [Path]
{
var pathFinder = Self(for: graph)
return pathFinder.makePathLists()
}

private mutating func makePathLists() -> [Path]
{
for src in graph.keys
{
for edge in graph[src]!
{
assert(edge.0 == src, "sanity check failed")
currentPath.append(edge)
appendAllPaths()
currentPath.removeLast()
}
}

return pathList
}

private mutating func appendAllPaths()
{
assert(currentPath.count > 0, "currentPath must not be empty on entry")

guard let src = currentPath.last?.1 else { return }

guard let edges = graph[src] else
{
if currentPath.count > 1 {
pathList.append(currentPath)
}
return
}

for edge in edges
{
assert(edge.0 == src, "sanity check failed")
currentPath.append(edge)
appendAllPaths()
currentPath.removeLast()
}
}
}

Apart from the init, and static wrapper function, pathList(for:), the algorithm is really just two functions. The initializer is where I pre-allocate the working storage. Assuming there is an entry in the graph Dictionary for each Point, no path can ever be longer than there entries keys in the graph ... at least not without cycles, so currentPath is initialized with that much capacity. Similar thinking applies to the other working arrays. The pathList is likely to be be larger than graph.count, but unless there are a lot of unconnected Lines, it will need to be at least as big as graph is.

makePathLists() is the part that gets thing started, extracting the Point and array of Line for each of its entries. It initializes the first entry in a currentPath, then calls appendAllPaths() to recursively append new Line instances to currentPath on the way down. When it reaches the end, it has found a complete path, so it adds the currentPath to the pathList. On the way back up, it removes the entry it added from currentPath so that it's in a state that it can be reused again to go down another path in the graph.

Once makePathLists() has iterated over all its keys and appendAllPaths() has recursed for each one, pathList contains the result. I'm not 100% sure it's in the format you want. It's basically a flat list of all the paths it found. So it's kind of one data structure. But all the paths will be grouped together according to the starting point in the line, so splitting it into smaller lists is easy enough, if that's what you actually want.

In any case, the re-use of existing storage is where this version gets most of its performance.

You can call it like this:

print("Version 1")
for path in PathFinderVersion1.pathList(for: input) {
print("\(path)")
}

And here's the output for the data I set up as input:

Version 1
(b,c)->(c,d)
(a,b)->(b,c)->(c,d)
(a,b)->(b,d)
(a,c)->(c,d)

The exact order changes slightly from run-to-run because Dictionary doesn't necessarily hand out its keys in an order that is consistent from run to run, even if they are inserted exactly the same way every time. I tested each version to verify they all emit the same output (and so should you), but I won't include the output again for the other versions.

Solution 2

This version is based on solution 1, so everything about it applies to this version too. What's different is the addition of the dynamic programming technique of caching intermediate values. Basically I cache paths along the way, so that when I encounter them again, I don't have do all that recursion. I can just used the cached path instead.

There is one snag with this caching, it requres allocating some local caches, which introduces dynamic memory allocation again. However hope is not lost. For starters, assuming lots of nodes in the graph have multiple input edges (ie, lots of different lines connect to the same line), the result should be a win overall from being able to avoid a vast amount of recursion. Additionally i, re-use the local caches, so I only ever have to actually allocate a new one when I recurse deeper than the previous maximum depth reached. So while some allocation does happen, it's minimized.

All that cache handling makes the code longer, but faster.

To make the code more readable I put the local caching in nested struct. Here's the code for version 2:

struct PathFinderVersion2
{
private typealias CachedPath = Path.SubSequence
private typealias CachedPaths = Array<CachedPath>

private let graph: Graph
private var pathList = [Path]()
private var currentPath = Path()
private var pathCache = [Point: CachedPaths]()

private struct LocalPathCache
{
var cache = [CachedPath]()
var pathListIndex: Int
var curPathLength: Int

mutating func updateLocalPathCache(from pathList: [Path])
{
while pathListIndex < pathList.endIndex
{
let pathToCache =
pathList[pathListIndex][curPathLength...]
cache.append(pathToCache)
pathListIndex += 1
}
}

mutating func update(
mainCache: inout [Point: CachedPaths],
for src: Point)
{
if cache.count > 0 {
mainCache[src] = cache
}
}
}

private var localCaches = [LocalPathCache]()
private mutating func getLocalCache(
pathListIndex: Int,
curPathLength: Int) -> LocalPathCache
{
if var cache = localCaches.last
{
localCaches.removeLast()
cache.cache.removeAll(keepingCapacity: true)
cache.pathListIndex = pathListIndex
cache.curPathLength = curPathLength
return cache
}

return LocalPathCache(
pathListIndex: pathListIndex,
curPathLength: curPathLength
)
}

private mutating func freeLocalCache(_ cache: LocalPathCache) {
localCaches.append(cache)
}

private init(for graph: Graph)
{
self.graph = graph
self.pathList.reserveCapacity(graph.count)
self.currentPath.reserveCapacity(graph.count)
self.pathCache.reserveCapacity(graph.count)
}

static func pathList(for graph: Graph) -> [Path]
{
var pathFinder = Self(for: graph)
return pathFinder.makePathLists()
}

private mutating func makePathLists() -> [Path]
{
for src in graph.keys
{
for edge in graph[src]!
{
assert(edge.0 == src, "sanity check failed")

currentPath.append(edge)
appendAllPaths()
currentPath.removeLast()
}
}

return pathList
}

private mutating func appendAllPaths()
{
assert(currentPath.count > 0, "currentPath must not be empty on entry")

guard let src = currentPath.last?.1 else { return }

if updatePathListFromCache(for: src) { return }

guard let edges = graph[src] else
{
if currentPath.count > 1 {
pathList.append(currentPath)
}
return
}

var localCache = getLocalCache(
pathListIndex: pathList.endIndex,
curPathLength: currentPath.endIndex
)
defer { freeLocalCache(localCache) }

for edge in edges
{
assert(edge.0 == src, "sanity check failed")


currentPath.append(edge)
appendAllPaths()
currentPath.removeLast()

localCache.updateLocalPathCache(from: pathList)
}

localCache.update(mainCache: &pathCache, for: src)
}

mutating func updatePathListFromCache(for src: Point) -> Bool
{
if let cachedPaths = pathCache[src]
{
let curPathIndex = currentPath.endIndex
for path in cachedPaths
{
currentPath.append(contentsOf: path)
pathList.append(currentPath)
currentPath.removeSubrange(curPathIndex...)
}
return true
}
return false
}
}

Solution 3

Solutions 1 and 2 still use recursion. Recursion is elegant, and nice to think about, because you can express a problem as a slightly simpler problem plus a bit. But unless it's tail-recursive, compilers can't optimize it particularly well. So the solution to that problem is to use iteration instead of recursion.

Turning a recursive algorithm into an iterative one is not always so easy, and can result in ugly code. That is the definitely the case here. There might be a simpler iterative algorithm, but I don't know it. So I basically replaced recursion with a state machine + stack. I've used this technique before for recursive code the desperately needed to be faster, and it does work. The code is a total pain for a human to read and maintain, but compilers can optimize the hell out of it.

This version still uses the cached intermediate solutions from version 2.

struct PathFinderVersion3
{
private typealias CachedPath = Path.SubSequence
private typealias CachedPaths = Array<CachedPath>

private let graph: Graph
private var pathList = [Path]()
private var currentPath = Path()
private var pathCache = [Point: CachedPaths]()

private struct LocalPathCache
{
var cache = [CachedPath]()
var pathListIndex: Int
var curPathLength: Int

mutating func updateLocalPathCache(from pathList: [Path])
{
while pathListIndex < pathList.endIndex
{
let pathToCache =
pathList[pathListIndex][curPathLength...]
cache.append(pathToCache)
pathListIndex += 1
}
}

mutating func update(
mainCache: inout [Point: CachedPaths],
for src: Point)
{
if cache.count > 0 {
mainCache[src] = cache
}
}
}

private var localCaches = [LocalPathCache]()
private mutating func getLocalCache(
pathListIndex: Int,
curPathLength: Int) -> LocalPathCache
{
if var cache = localCaches.last
{
localCaches.removeLast()
cache.cache.removeAll(keepingCapacity: true)
cache.pathListIndex = pathListIndex
cache.curPathLength = curPathLength
return cache
}

return LocalPathCache(
pathListIndex: pathListIndex,
curPathLength: curPathLength
)
}

private mutating func freeLocalCache(_ cache: LocalPathCache) {
localCaches.append(cache)
}

private init(for graph: Graph)
{
self.graph = graph
self.pathList.reserveCapacity(graph.count)
self.currentPath.reserveCapacity(graph.count)
self.pathCache.reserveCapacity(graph.count)
}

static func pathList(for graph: Graph) -> [Path]
{
var pathFinder = Self(for: graph)
return pathFinder.makePathLists()
}

private mutating func makePathLists() -> [Path]
{
for src in graph.keys
{
for edge in graph[src]!
{
assert(edge.0 == src, "sanity check failed")

currentPath.append(edge)
appendAllPaths()
currentPath.removeLast()
}
}

return pathList
}

struct Stack<T>
{
var storage: [T] = []
var isEmpty: Bool { storage.isEmpty }
var count: Int { storage.count }

init(capacity: Int) { storage.reserveCapacity(capacity) }
mutating func push(_ element: T) { storage.append(element) }
mutating func pop() -> T? { storage.popLast() }
}

private mutating func appendAllPaths()
{
assert(currentPath.count > 0, "currentPath must not be empty on entry")

enum State
{
case entry
case inLoopPart1
case inLoopPart2
case exit
}
var state: State = .entry

typealias StackElement =
(Point, Int, Line, [Line], LocalPathCache, State)
var stack = Stack<StackElement>(capacity: graph.count)

var src: Point! = nil
var edges: [Line]! = nil
var edgeIndex: Int = 0
var edge: Line! = nil
var localCache: LocalPathCache! = nil

outer: while true
{
switch state
{
case .entry:
if let s = currentPath.last?.1 {
src = s
}
else
{
state = .exit
continue outer
}

if updatePathListFromCache(for: src)
{
state = .exit
continue outer
}

if let e = graph[src] { edges = e }
else
{
if currentPath.count > 1 {
pathList.append(currentPath)
}
state = .exit
continue outer
}

localCache = getLocalCache(
pathListIndex: pathList.endIndex,
curPathLength: currentPath.endIndex
)
edgeIndex = edges.startIndex
state = .inLoopPart1
continue outer

case .inLoopPart1:
if edgeIndex < edges.endIndex
{
edge = edges[edgeIndex]
assert(edge.0 == src, "sanity check failed")


currentPath.append(edge)

// Simulate function call
stack.push((src, edgeIndex, edge, edges, localCache, .inLoopPart2))
state = .entry
continue outer
}
localCache.update(mainCache: &pathCache, for: src)
state = .exit

case .inLoopPart2:
localCache.updateLocalPathCache(from: pathList)
edgeIndex += 1
state = .inLoopPart1 // Simulate goto top of inner loop

case .exit: // Simulate return
if let c = localCache { freeLocalCache(c) }
if let savedState = stack.pop()
{
(src, edgeIndex, edge, edges, localCache, state) = savedState
currentPath.removeLast()
}
else { break outer }
}
}

assert(stack.isEmpty)
}

mutating func updatePathListFromCache(for src: Point) -> Bool
{
if let cachedPaths = pathCache[src]
{
let curPathIndex = currentPath.endIndex
for path in cachedPaths
{
currentPath.append(contentsOf: path)
pathList.append(currentPath)
currentPath.removeSubrange(curPathIndex...)
}
return true
}
return false
}
}

Ummm... yeah, there it is in all its ugly glory. It works. It's fast. Good luck maintaining it.

The question is whether the speed is worth it when weighed against the readability issues and maintenance headaches, and that all depends on the application requirements. In my opinion that speed would have to be pretty darn important to put up with maintaining this version, and I'm normally fine with putting up with some ugliness to get speed in critical code. Somehow, this version, written in a language that doesn't even have a goto statement is nonetheless a poster-child for why goto is considered bad in the first place.

Solution 4 (Bonus)

Originally I just mentioned that you could parallelize it, but I didn't implement it. I decided that for completeness, a parallelization example really should be included.

I chose to parallelize solution 1, but all three of the previous solutions can be parallelized in exactly the same way. The changes that have to be made are to add the split method, and modify the static pathList(for:) method as well as the private makePathLists instance method. You also need a concurrent DispatchQueue. I create a global one for this example, but you can use an existing one. Don't use DispatchQueue.main for this, if you want your app to be responsive while processing.

Here's the code:

import Foundation

let dispatchQueue =
DispatchQueue(label: "PathFinder-\(UUID())",attributes: .concurrent)


struct PathFinderVersion4
{
private let graph: Graph
private var pathList = [Path]()
private var currentPath = Path()

private init(for graph: Graph)
{
self.graph = graph
self.pathList.reserveCapacity(graph.count)
self.currentPath.reserveCapacity(graph.count)
}

public static func pathList(for graph: Graph) -> [Path]
{
let concurrency = min(4, graph.count)

let pointGroups = split(.init(graph.keys), numberOfGroups: concurrency)
var pathLists = [[Path]](repeating: [], count: concurrency)

let waitSems = Array(
repeating: DispatchSemaphore(value: 0),
count: concurrency
)

for groupIndex in pointGroups.indices
{
dispatchQueue.async
{
defer { waitSems[groupIndex].signal() }

var pathFinder = Self(for: graph)
pathLists[groupIndex] =
pathFinder.makePathLists(for: pointGroups[groupIndex])
}
}

// Need to signal each semaphore after waiting or will crash on return.
// See Apple documentation for DispatchSemaphore
waitSems.forEach { $0.wait(); $0.signal() }

var result = [Path]()
result.reserveCapacity(pathLists.reduce(0) { $0 + $1.count })
pathLists.forEach { result.append(contentsOf: $0) }

return result
}

private static func split<Value>(
_ values: [Value],
numberOfGroups: Int) -> [[Value]]
{
var groups = [[Value]]()
groups.reserveCapacity(numberOfGroups)


Related Topics



Leave a reply



Submit