Which Equality Test Does Ruby's Hash Use When Comparing Keys

Which equality test does Ruby's Hash use when comparing keys?

Hash uses key.eql? to test keys for equality. If you need to use
instances of your own classes as keys in a Hash, it is recommended
that you define both the eql? and hash methods. The hash method must
have the property that a.eql?(b) implies a.hash == b.hash.

So...

class A
attr_reader :x
def initialize(inner)
@inner=inner
end
def x; @inner.x; end
def ==(other)
@inner.x==other.x
end

def eql?(other)
self == other
end

def hash
x.hash
end
end

How exactly does #eql? rely on #hash?

This seems to be actually the other way around. eql? is expected to return true for objects returning the same hash value, but it is not defined to compare these values. You are simply expected to override both.

The eql? method returns true if obj and other refer to the same hash key. This is used by Hash to test members for equality. For any pair of objects where eql? returns true, the hash value of both objects must be equal. So any subclass that overrides eql? should also override hash appropriately.

What is the big-O complexity to test for the equality of two hashes?

Hash comparisons are, on average, O(keys) + ∑Cost(value).

If the values are simple then comparing each value is O(1). The hash comparison is O(keys) + ∑O(1) which is O(keys).

Ruby does a few extra things, but the basic algorithm is simple.

  1. If they're the same hash, they're equal.
  2. If they have a different number of keys, they're not equal.
  3. If any pair in hash1 is not in hash2, they're not equal.
  4. They're equal.
def hash_eq(hash1, hash2)
# O(1)
return true if hash1.equal?(hash2)

# O(1)
return false if hash1.size != hash2.size

# O(keys)
hash1.each {|k,v|
# O(1) O(value)
return false if hash2[k] != v
}

return true
end

Checking object equality is just comparing two pointers, so O(1).

Ruby hashes store their size in num_entries, so comparing their size is O(1).

Iterating through the keys and values of a hash is O(keys), you're basically walking a array.

Looking up a value in a hash is O(1). Worst case is O(n), but this is exceedingly rare in modern hashes especially after it was found to be a security issue.

Finally, comparing the values have their own cost. For simple values like numbers this is O(1). Anything else has its own cost which I'll call O(val). This can be large strings, which are O(n) where n is the size of the string, or complex nested data structures. We need to do this for each value and each value carries its own cost. I'll write this as the sum of all those costs: ∑Cost(value).

Thus O(1) + O(1) + O(keys) + ∑Cost(value). The O(1)s are swamped by O(keys) and disappear, thus O(keys) + ∑Cost(value).


A simple benchmark bears this out.

require 'benchmark'

def benchmark_hash_eq(&block)
# We can't use h2 = h1.clone because that will not clone
# the values. Then Ruby can cheat and use object equality to
# compare the values.
h1 = block.call
h2 = block.call

puts "Size: #{h1.size}"
puts Benchmark.measure {
10_000.times { h1 == h2 }
}
end

puts "With number values"
benchmark_hash_eq do
Hash[(1..100).collect { |i| [i, i] }]
end
benchmark_hash_eq do
Hash[(1..1_000).collect { |i| [i, i] }]
end
benchmark_hash_eq do
Hash[(1..10_000).collect { |i| [i, i] }]
end

puts "With large, equal size strings"
benchmark_hash_eq do
Hash[(1..100).collect { |i| [i, "a" * 1000] }]
end
benchmark_hash_eq do
Hash[(1..1_000).collect { |i| [i, "a" * 1000] }]
end
benchmark_hash_eq do
Hash[(1..10_000).collect { |i| [i, "a" * 1000] }]
end
With number values
Size: 100
0.019237 0.000043 0.019280 ( 0.019306)
Size: 1000
0.195047 0.000515 0.195562 ( 0.196367)
Size: 10000
1.913112 0.003115 1.916227 ( 1.920030)

With large, equal size strings
Size: 100
0.065414 0.000225 0.065639 ( 0.065839)
Size: 1000
0.669863 0.001145 0.671008 ( 0.672279)
Size: 10000
10.041987 0.013201 10.055188 ( 10.066840)

Perfectly linear, except that last one which could be due to the hash size reaching some memory threshold.

Note that Ruby can do the comparison in parallel. This doesn't change the complexity, but might make it perform faster.

How to test order-conscious equality of hashes

Probably the easiest is to compare the corresponding arrays.

h1.to_a == h2.to_a

How can you easily test hash equality in Ruby when you only care about intersecting keys?

def compare_intersecting_keys(a, b)
(a.keys & b.keys).all? {|k| a[k] == b[k]}
end

Use like this:

compare_intersecting_keys(hash_x, hash_y)  # => true
compare_intersecting_keys(hash_p, hash_q) # => false

If you want it monkey-patched:

class Hash
def compare_intersection(other)
(self.keys & other.keys).all? {|k| self[k] == other[k]}
end
end

Hash lookup fails when set key and lookup pass an equality test

Based on those links, is hash for both objects identical?

We've had issues in a multithreaded JRuby environment where keys in two separate runtimes satisfy == but fail to retrieve keys due to differing object_ids of the underlying keys.
It could be a similar class of problem - two objects A and B which satisfy A.eql? B but do not satisfy A is B

How do I compare two hashes?

You can compare hashes directly for equality:

hash1 = {'a' => 1, 'b' => 2}
hash2 = {'a' => 1, 'b' => 2}
hash3 = {'a' => 1, 'b' => 2, 'c' => 3}

hash1 == hash2 # => true
hash1 == hash3 # => false

hash1.to_a == hash2.to_a # => true
hash1.to_a == hash3.to_a # => false


You can convert the hashes to arrays, then get their difference:

hash3.to_a - hash1.to_a # => [["c", 3]]

if (hash3.size > hash1.size)
difference = hash3.to_a - hash1.to_a
else
difference = hash1.to_a - hash3.to_a
end
Hash[*difference.flatten] # => {"c"=>3}

Simplifying further:

Assigning difference via a ternary structure:

  difference = (hash3.size > hash1.size) \
? hash3.to_a - hash1.to_a \
: hash1.to_a - hash3.to_a
=> [["c", 3]]
Hash[*difference.flatten]
=> {"c"=>3}

Doing it all in one operation and getting rid of the difference variable:

  Hash[*(
(hash3.size > hash1.size) \
? hash3.to_a - hash1.to_a \
: hash1.to_a - hash3.to_a
).flatten]
=> {"c"=>3}

Ruby Compare 2 hashes

Two hashes with the same key/value pairs will be equal regardless of key order:

a = {:x => 1, :y => 2}
b = {:y => 2, :x => 1}

a == b
# => true


Related Topics



Leave a reply



Submit