How to Handle Hash Collisions for Dictionaries in Swift

How to handle hash collisions for Dictionaries in Swift


func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
return lhs.hashValue == rhs.hashValue
}

Note the global function to overload the equality operator (==) in order to conform to the Equatable Protocol, which is required by the Hashable Protocol.

Your problem is an incorrect equality implementation.

A hash table (such as a Swift Dictionary or Set) requires separate equality and hash implementations.

hash gets you close to the object you're looking for; equality gets you the exact object you're looking for.

Your code uses the same implementation for hash and equality, and this will guarantee a collision.

To fix the problem, implement equality to match exact object values (however your model defines equality). E.g.:

func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
return lhs.id == rhs.id
}

How are hash collisions handled?

Fundamentally, there are two major ways of handling hash collisions - separate chaining, when items with colliding hash codes are stored in a separate data structure, and open addressing, when colliding data is stored in another available bucket that was selected using some algorithm.

Both strategies have numerous sub-strategies, described in Wikipedia. The exact strategy used by a particular implementation is, not surprisingly, implementation-specific, so the authors can change it at any time for something more efficient without breaking the assumptions of their users.

A this point, the only way to find out how Swift handles collisions would be disassembling the library (that is, unless you work for Apple, and have access to the source code). Curious people did that to NSDictionary, and determined that it uses linear probing, the simplest variation of the open addressing technique.

How does Dictionary use the Equatable protocol in Swift?

Well, there's your answer:

https://bugs.swift.org/browse/SR-3330?focusedCommentId=19980&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-19980

What's actually happening:

  • We hash a value only once on insertion.
  • We don't use hashes for comparison of elements, only ==. Using hashes for comparison is only reasonable if you store the hashes, but
    that means more memory usage for every Dictionary. A compromise that
    needs evaluation.
  • We try to insert the element before evaluating if the Dictionary can fit that element. This is because the element might already be in the
    Dictionary, in which case we don't need any more capacity.
  • When we resize the Dictionary, we have to rehash everything, because we didn't store the hashes.

So what you're seeing is:

  • one hash of the search key
  • some =='s (searching for a space)
  • hashes of every element in the collection (resize)
  • one hash of the search key (actually totally wasteful, but not a big deal considering it only happens after an O reallocation)
  • some =='s (searching for a space in the new buffer)

We all had it totally wrong. They don't use the hashes at all — only == — to decide whether this is a distinct key. And then there is a second round of calls in the situation where the collection is grown.

Swift: Hashable struct with dictionary property

I think you need to review your data model if you have to use a whole struct as a dictionary key. Anyhow, here's one way to do it:

internal struct MapKey: Hashable {
internal let id: String
internal let values: [String:String]

var hashValue: Int {
get {
var hashString = self.id + ";"
for key in values.keys.sort() {
hashString += key + ";" + values[key]!
}

return hashString.hashValue
}
}
}

func ==(lhs: MapKey, rhs: MapKey) -> Bool {
return lhs.id == rhs.id && lhs.values == rhs.values
}

This assumes that you don't have semicolon (;) in id or in the keys and values of values. Hasable implies Equatable so you don't need to declare it conforming to Equatable again.

How to handle hash collisions?

Typically hash collisions are handled in two ways:

  1. Use a larger hash, so that collisions are practically impossible.

  2. Consider hash codes to be non-unique, and use an equality comparer for the actual data to determine uniqueness.

A 128 bit GUID uses the first method. The HashSet<T> class in .NET is an example of the second method.

How can I comprehend this sentence two instances with the same hash value don’t necessarily compare equally.

Think of the hash value as a quick, compact, non-unique identifier for a given object instance. The only hard condition is this: if two objects compare equally, according to the == operator, than both instances must have the exact same hash value. That’s all there is to it ;)

In particular, given that hash values aren’t unique — and how they could be given Int limited range? — we can’t safely assume that two instances with the same hash value will compare equally.

Do swift hashable protocol hash functions need to return unique values?

Collisions are not "generally OK". The underlying assumption is that the hash value of x is the hash value of y if and only if x == y. If you consider column 2, row 1 the same as column 1, row 2, then fine. But I don't think you do! The application may appear to work, but presumably you have done nothing that requires hashability - yet.

Hashable struct with interchangeable properties?

This is how Hasher works

https://developer.apple.com/documentation/swift/hasher

However, the underlying hash algorithm is designed to exhibit
avalanche effects: slight changes to the seed or the input byte
sequence will typically produce drastic changes in the generated hash
value.

So the problem here in hash(into:) func

Since the sequence matters combine operation is not commutative. You should find some other function to be a hash for this struct. In your case the best option is

    func hash(into hasher: inout Hasher) {
hasher.combine(leftOperand & rightOperand)
}

As @Martin R pointed out to have less collisions it's better to use ^

    func hash(into hasher: inout Hasher) {
hasher.combine(leftOperand ^ rightOperand)
}


Related Topics



Leave a reply



Submit