Ruby: Why Does '#hash' Need to Be Overridden Whenever '#eql?' Is Overridden

Ruby: Why does `#hash` need to be overridden whenever `#eql?` is overridden?

Firstly, what is the #hash method? I can see it returns an integer.

The #hash method is supposed to return a hash of the receiver. (The name of the method is a bit of a giveaway).

Using pry I can see that an integer responds to #hash but I cannot see where it inherits the method from.

There are dozens of questions of the type "Where does this method come from" on Stack Overflow, and the answer is always the same: the best way to know where a method comes from is to simply ask it:

hash_method = 1.method(:hash)
hash_method.owner #=> Kernel

So, #hash is inherited from Kernel. Note however, that there is a bit of a peculiar relationship between Object and Kernel, in that some methods that are implemented in Kernel are documented in Object or vice versa. This probably has historic reasons, and is now an unfortunate fact of life in the Ruby community.

Unfortunately, for reasons I don't understand, the documentation for Object#hash was deleted in 2017 in a commit ironically titled "Add documents". It is, however, still available in Ruby 2.4 (bold emphasis mine):

hash → integer

Generates an Integer hash value for this object. This function must have the property that a.eql?(b) implies a.hash == b.hash.

The hash value is used along with eql? by the Hash class to determine if two objects reference the same hash key. […]

So, as you can see, there is a deep and important relationship between #eql? and #hash, and in fact the correct behavior of methods that use #eql? and #hash depends on the fact that this relationship is maintained.

So, we know that the method is called #hash and thus likely computes a hash. We know it is used together with eql?, and we know that it is used in particular by the Hash class.

What does it do, exactly? Well, we all know what a hash function is: it is a function that maps a larger, potentially infinite, input space into a smaller, finite, output space. In particular, in this case, the input space is the space of all Ruby objects, and the output space is the "fast integers" (i.e. the ones that used to be called Fixnum).

And we know how a hash table works: values are placed in buckets based on the hash of their keys. If I want to find a value, I only need to compute the hash of the key (which is fast) to know which bucket the value is in (in constant time), as opposed to e.g. an array of key-value pairs, where I need to compare the key against every key in the array (linear search) to find the value.

However, there is a problem: Since the output space of a hash is smaller than the input space, there are different objects which have the same hash value and thus end up in the same bucket. Thus, when two objects have different hash values, I know for a fact that they are different, but if they have the same hash value, then they could still be different, and I need to compare them for equality to be sure – and that's where the relationship between hash and equality comes from. Also note that when many keys end up in the same bucket, I will again have to compare the search key against every key in the bucket (linear search) to find the value.
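The two-step lookup described above can be sketched as a toy hash table. This is only an illustration, not how Ruby's Hash is really implemented; the class name ToyTable and the bucket count are made up for the example.

```ruby
# A toy hash table: an illustration only, not Ruby's real Hash implementation.
# ToyTable and BUCKET_COUNT are hypothetical names made up for this sketch.
class ToyTable
  BUCKET_COUNT = 8

  def initialize
    @buckets = Array.new(BUCKET_COUNT) { [] }
  end

  def store(key, value)
    bucket = bucket_for(key)
    pair = bucket.find { |k, _| k.eql?(key) }
    if pair
      pair[1] = value          # key already present: overwrite its value
    else
      bucket << [key, value]   # new key: append to its bucket
    end
    value
  end

  def fetch(key)
    # Step 1: #hash narrows the search down to a single bucket.
    # Step 2: #eql? picks the right key among those sharing that bucket.
    pair = bucket_for(key).find { |k, _| k.eql?(key) }
    pair && pair[1]
  end

  private

  def bucket_for(key)
    @buckets[key.hash % BUCKET_COUNT]
  end
end

table = ToyTable.new
table.store("apple", 1)
table.store("pear", 2)
table.fetch("apple")  # => 1
table.fetch("plum")   # => nil
```

With only eight buckets, unequal keys regularly share a bucket, which is exactly why the eql? check in the second step is needed.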

From all this we can conclude the following properties of the #hash method:

  • It must return an Integer.
  • Not only that, it must return a "fast integer" (equivalent to the old Fixnums).
  • It must return the same integer for two objects that are considered equal.
  • It may return the same integer for two objects that are considered unequal.
  • However, it only should do so with low probability. (Otherwise, a Hash may degenerate into a linked list with highly degraded performance.)
  • It also should be hard to construct objects that are unequal but have the same hash value deliberately. (Otherwise, an attacker can force a Hash to degenerate into a linked list as a form of Degradation-of-Service attack.)
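To see what goes wrong when the third property is violated, here is a small sketch. BrokenKey and GoodKey are hypothetical classes invented for this example; the only difference between them is that GoodKey also overrides #hash.

```ruby
# Hypothetical key classes, made up for this sketch.
class BrokenKey
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Overrides #eql? but keeps the default, identity-based #hash.
  def eql?(other)
    other.class == self.class && name == other.name
  end
end

class GoodKey
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def eql?(other)
    other.class == self.class && name == other.name
  end

  # Equal names give equal hash values, so a.eql?(b) implies a.hash == b.hash.
  def hash
    name.hash
  end
end

broken = (1..20).map { |i| [BrokenKey.new(i), i] }.to_h
good   = (1..20).map { |i| [GoodKey.new(i), i] }.to_h

BrokenKey.new(7).eql?(BrokenKey.new(7))  # => true
broken[BrokenKey.new(7)]                 # => nil, the lookup goes to the wrong bucket
good[GoodKey.new(7)]                     # => 7
```

The broken lookup fails even though eql? says the keys are equal, because the Hash never gets as far as calling eql? on the stored key.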

Does overriding hash and eql in ruby affect performance?

Overriding a method per se does not affect performance, but the implementation of the method matters. Your method is bad because it has redundant things. It could be better written as:

def eql?(other)
  url = self.url and other and url == other.url
end

The url = self.url assignment stores the value in a local variable, so self.url is only called once.


You originally have five conditions to make it true:

  • not other == false
  • not url == nil
  • not other == nil
  • not other.url == nil
  • url == other.url

Among them,

  • No1 and No3 can be put together by putting other in the condition.
  • No4 is redundant under No2 and No5 because if url is not nil, and other.url is url, then other.url is not nil.
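As a sanity check, the short version can be compared against a spelled-out version of the five conditions. Note the assumptions: Page is a hypothetical class, and verbose_eql? is reconstructed from the list above, not taken from the asker's actual code.

```ruby
# Hypothetical Page class. verbose_eql? is a reconstruction of the five
# conditions listed above, not the original method from the question.
class Page
  attr_reader :url

  def initialize(url)
    @url = url
  end

  def verbose_eql?(other)
    other && !url.nil? && !other.nil? && !other.url.nil? && url == other.url
  end

  def eql?(other)
    url = self.url and other and url == other.url
  end

  # Kept consistent with eql?: pages with equal URLs get equal hash values.
  def hash
    url.hash
  end
end

page       = Page.new("http://example.com/")
candidates = [Page.new("http://example.com/"), Page.new("http://example.org/"),
              Page.new(nil), nil, false]

# Both versions agree (in truthiness) for every candidate.
candidates.all? { |c| !!page.eql?(c) == !!page.verbose_eql?(c) }  # => true
```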

Unique objects in the Ruby array

Unless specified otherwise, no two objects are the same:

Object.new.eql?(Object.new)
# => false

Thus, where #uniq is concerned, all 150 Airplane instances are unique, with no duplicates.

The easiest way to fix this is to provide the uniqueness criterion to #uniq:

planes.uniq(&:model)

The other way is to define what "duplicate" means for the Airplane class:

class Airplane
  attr_accessor :model

  def initialize(model)
    @model = model
  end

  def ==(other)
    other.class == self.class && other.model == self.model
  end

  alias_method :eql?, :==

  def hash
    self.model.hash
  end
end

However, this solution will make two airplanes of the same model the same airplane, in all cases, which might have unintended consequences in other places.
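A short sketch of what those consequences can look like. The Airplane class is repeated so the snippet runs on its own; the model names and the seats hash are made up for illustration.

```ruby
# Repeats the Airplane class from above so this snippet is self-contained.
class Airplane
  attr_accessor :model

  def initialize(model)
    @model = model
  end

  def ==(other)
    other.class == self.class && other.model == model
  end

  alias_method :eql?, :==

  def hash
    model.hash
  end
end

first  = Airplane.new("A320")
second = Airplane.new("A320")
other  = Airplane.new("B737")
fleet  = [first, second, other]

fleet.uniq.size           # => 2
fleet.uniq(&:model).size  # => 2 (this form works even without the overrides)

# The side effect: distinct airplanes of one model are now interchangeable.
seats = { first => 150 }
seats[second] = 180       # overwrites the entry stored under `first`
seats.size                # => 1
(fleet - [first]).size    # => 1, `second` is removed as well
```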

Deleting a modified object from a set is a no-op?

Yes, this is a bug or at least I'd call it a bug. Some would call this "an implementation detail accidentally leaking to the outside world" but that's just fancy pants city-boy talk for bug.

The problem has two main causes:

  1. You're modifying elements of the Set without Set knowing about it.
  2. The standard Ruby Set is implemented as a Hash.

The result is that you're modifying the internal Hash's keys without the Hash knowing about it and that confuses the poor Hash into not really knowing what keys it has anymore. The Hash class has a rehash method:

rehash → hsh

Rebuilds the hash based on the current hash values for each key. If values of key objects have changed since they were inserted, this method will reindex hsh.

a = [ "a", "b" ]
c = [ "c", "d" ]
h = { a => 100, c => 300 }
h[a] #=> 100
a[0] = "z"
h[a] #=> nil
h.rehash #=> {["z", "b"]=>100, ["c", "d"]=>300}
h[a] #=> 100

Notice the interesting behavior in the example included with the rehash documentation. Hashes keep track of things using the k.hash values for the key k. If you have an array as a key and you change the array, you can change the array's hash value as well; the result is that the Hash still has that array as a key but the Hash won't be able to find that array as a key because it will be looking in the bucket for the new hash value but the array will be in the bucket for the old hash value. But, if you rehash the Hash, it will all of a sudden be able to find all of its keys again and the senility goes away. Similar problems will occur with non-array keys: you just have to change the key in such a way that its hash value changes and the Hash containing that key will get confused and wander around lost until you rehash it.

The Set class uses a Hash internally to store its members and the members are used as the hash's keys. So, if you change a member, the Set will get confused. If Set had a rehash method then you could kludge around the problem by slapping the Set upside the head with rehash to knock some sense into it; alas, there is no such method in Set (although Ruby 2.6 and later provide Set#reset, which does the same job). On older versions, you can monkey patch your own in:

class Set
  def rehash
    @hash.rehash
  end
end

Then you can change the keys, call rehash on the Set, and your delete (and various other methods such as member?) will work properly.
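Here is the whole problem in miniature. This sketch assumes Ruby 2.6 or later, where Set#reset reindexes the members in the same way the rehash patch does; the one-element arrays are just convenient mutable members.

```ruby
require "set"

# Twenty one-element arrays as members; any mutable objects would do.
members = (1..20).map { |i| [i] }
set = Set.new(members)

target = members.first   # the array [1], stored in the set by reference
target << :changed       # the mutation changes target.hash

set.include?(target)     # => false, the set looks in the wrong bucket
set.reset                # assumes Ruby 2.6+: reindexes the members
set.include?(target)     # => true
set.delete(target)
set.size                 # => 19
```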

What's the difference between equal?, eql?, ===, and ==?

I'm going to heavily quote the Object documentation here, because I think it has some great explanations. I encourage you to read it, and also the documentation for these methods as they're overridden in other classes, like String.

Side note: if you want to try these out for yourself on different objects, use something like this:

class Object
  def all_equals(o)
    ops = [:==, :===, :eql?, :equal?]
    Hash[ops.map(&:to_s).zip(ops.map {|s| send(s, o) })]
  end
end

"a".all_equals "a" # => {"=="=>true, "==="=>true, "eql?"=>true, "equal?"=>false}


== — generic "equality"

At the Object level, == returns true only if obj and other are the same object. Typically, this method is overridden in descendant classes to provide class-specific meaning.

This is the most common comparison, and thus the most fundamental place where you (as the author of a class) get to decide if two objects are "equal" or not.

=== — case equality

For class Object, effectively the same as calling #==, but typically overridden by descendants to provide meaningful semantics in case statements.

This is incredibly useful. Examples of things which have interesting === implementations:

  • Range
  • Regexp
  • Proc (since Ruby 1.9)

So you can do things like:

case some_object
when /a regex/
  # The regex matches
when 2..4
  # some_object is in the range 2..4
when lambda {|x| some_crazy_custom_predicate }
  # the lambda returned true
end

See my answer here for a neat example of how case+Regex can make code a lot cleaner. And of course, by providing your own === implementation, you can get custom case semantics.
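A minimal sketch of a custom === implementation. MultipleOf and classify are hypothetical names invented for this example; the thing to notice is that case calls pattern === value, so the pattern object is the receiver.

```ruby
# Hypothetical pattern class: matches any Integer divisible by a given number.
class MultipleOf
  def initialize(divisor)
    @divisor = divisor
  end

  # `case` evaluates pattern === value, so the pattern is the receiver.
  def ===(value)
    value.is_a?(Integer) && (value % @divisor).zero?
  end
end

def classify(value)
  case value
  when MultipleOf.new(15) then "fizzbuzz"
  when MultipleOf.new(5)  then "buzz"
  when MultipleOf.new(3)  then "fizz"
  when 1..2               then "tiny"
  when /\Aa/              then "starts with a"
  else "something else"
  end
end

classify(30)       # => "fizzbuzz"
classify(9)        # => "fizz"
classify(2)        # => "tiny"
classify("apple")  # => "starts with a"
classify(7)        # => "something else"
```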

eql? — Hash equality

The eql? method returns true if obj and other refer to the same hash key. This is used by Hash to test members for equality. For objects of class Object, eql? is synonymous with ==. Subclasses normally continue this tradition by aliasing eql? to their overridden == method, but there are exceptions. Numeric types, for example, perform type conversion across ==, but not across eql?, so:

1 == 1.0     #=> true
1.eql? 1.0   #=> false

So you're free to override this for your own uses, or you can override == and use alias :eql? :== so the two methods behave the same way.
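The difference between == and eql? shows up as soon as the objects are used as Hash keys or passed through Array#uniq. A small example using only core classes:

```ruby
labels = { 1 => "integer key" }

1 == 1.0        # => true
1.eql?(1.0)     # => false

labels[1]       # => "integer key"
labels[1.0]     # => nil, Hash compares keys with eql?, not ==

[1, 1.0].uniq   # => [1, 1.0], uniq also relies on hash and eql?
```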

equal? — identity comparison

Unlike ==, the equal? method should never be overridden by subclasses: it is used to determine object identity (that is, a.equal?(b) iff a is the same object as b).

This is effectively pointer comparison.

Return object after performing intersection of two arrays based on attribute

You can:

1:

Override the eql?(other) method; then the array intersection will work:

class Link < ApplicationRecord
  def eql?(other)
    self.class == other.class && self.id == other&.id # the class comparison acts as a guard here
  end

  # You should always override #hash when you override #eql?: https://stackoverflow.com/a/54961965/5872935
  def hash
    self.id.hash
  end
end

2:

Use Array#select:

array_links.flat_map {|i| selected_links.select {|k|  k.user_id == i.user_id }}
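Both options can be tried outside Rails with a plain-Ruby stand-in for the model. In this sketch, Link is an ordinary class rather than an ApplicationRecord, and the id and user_id values are made up.

```ruby
# Plain-Ruby stand-in for the Link model above (no Rails needed here).
# The attributes id and user_id are assumed from the answer's code.
class Link
  attr_reader :id, :user_id

  def initialize(id, user_id)
    @id = id
    @user_id = user_id
  end

  def eql?(other)
    self.class == other.class && id == other.id
  end

  def hash
    id.hash
  end
end

array_links    = [Link.new(1, 10), Link.new(2, 20)]
selected_links = [Link.new(2, 20), Link.new(3, 10)]

# Option 1: Array#& relies on #hash and #eql?.
(array_links & selected_links).map(&:id)  # => [2]

# Option 2: match on an attribute explicitly; no overrides required.
array_links.flat_map { |i| selected_links.select { |k| k.user_id == i.user_id } }.map(&:id)
# => [3, 2]
```

Note that the two options answer slightly different questions: the first matches on id, the second on user_id.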

What does a singleton method belong to if the metaclass method is wrongly overridden?

Still the metaclass, you've just removed your ability to access it directly...

foo.instance_eval { class << self; self; end.instance_methods.include?(:shout) }
=> true

How to declare what a Ruby module needs from the class that includes it

There are some existing mixins, classes, and methods in the Ruby core library that have the exact same problem, e.g. Enumerable, Comparable, Range, Hash, Array#uniq: they require certain behavior from other objects in order to work. Some examples are:

  • Enumerable:

    The class must provide a method each, which yields successive members of the collection. If Enumerable#max, #min, or #sort is used, the objects in the collection must also implement a meaningful <=> operator […]

  • Comparable:

    The class must define the <=> operator, which compares the receiver against another object, returning -1, 0, or +1 depending on whether the receiver is less than, equal to, or greater than the other object. If the other object is not comparable then the <=> operator should return nil.

  • Range:

    Ranges can be constructed using any objects that can be compared using the <=> operator. Methods that treat the range as a sequence (#each and methods inherited from Enumerable) expect the begin object to implement a succ method to return the next object in sequence. The step and include? methods require the begin object to implement succ or to be numeric.

  • Hash:

    A user-defined class may be used as a hash key if the hash and eql? methods are overridden to provide meaningful behavior.

    And in order to define what "meaningful behavior" means, the documentation of Hash further links to the documentation of Object#hash and Object#eql?:

  • Object#hash:

    […] This function must have the property that a.eql?(b) implies a.hash == b.hash. […]

  • Object#eql?:

    […] The eql? method returns true if obj and other refer to the same hash key. […]

So, as you can see, your question is a quite common one, and the answer is: documentation.
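Applied to your own mixin, that could look something like the following sketch. Discountable, Book, and price_cents are hypothetical names invented for the example; the requirement lives in the comment above the module, just as it does in the Comparable and Enumerable documentation.

```ruby
# Hypothetical mixin, for illustration only.
#
# Requirement (stated in documentation, as Comparable and Enumerable do):
# the including class must define #price_cents, returning an Integer.
module Discountable
  def discounted_cents(percent)
    price_cents - (price_cents * percent / 100)
  end
end

class Book
  include Discountable

  attr_reader :price_cents

  def initialize(price_cents)
    @price_cents = price_cents
  end
end

Book.new(2000).discounted_cents(25)  # => 1500
```

A class that includes Discountable without defining price_cents only fails with a NameError when discounted_cents is actually called, which is the same behavior you get from including Comparable without defining <=>.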

Why does IRB not ignore EOF as per IRB.conf hash

The docs for irb say about that configuration:

**conf.ignore_eof = true/false**
Whether ^D (control-d) will be ignored or not. If false is set, ^D means quit.

So, no, that setting isn't meant to do what you're looking for. As far as I can tell, there isn't a way to do what you want with irb. The closest would be to start irb without an argument, then use require './foo.rb' to run that file.


