Ruby: Why does `#hash` need to be overridden whenever `#eql?` is overridden?
Firstly, what is the #hash method? I can see it returns an integer.
The #hash method is supposed to return a hash of the receiver. (The name of the method is a bit of a giveaway.)
Using pry I can see that an integer responds to #hash but I cannot see where it inherits the method from.
There are dozens of questions of the type "Where does this method come from" on Stack Overflow, and the answer is always the same: the best way to find out where a method comes from is simply to ask it:
hash_method = 1.method(:hash)
hash_method.owner #=> Kernel
So, #hash is inherited from Kernel. Note, however, that there is a somewhat peculiar relationship between Object and Kernel, in that some methods that are implemented in Kernel are documented in Object or vice versa. This probably has historical reasons, and is now an unfortunate fact of life in the Ruby community.
Unfortunately, for reasons I don't understand, the documentation for Object#hash was deleted in 2017 in a commit ironically titled "Add documents". It is, however, still available in Ruby 2.4 (bold emphasis mine):
hash → integer
Generates an Integer hash value for this object. This function must have the property that a.eql?(b) implies a.hash == b.hash.
The hash value is used along with eql? by the Hash class to determine if two objects reference the same hash key. […]
So, as you can see, there is a deep and important relationship between #eql? and #hash, and in fact the correct behavior of methods that use #eql? and #hash depends on the fact that this relationship is maintained.
So, we know that the method is called #hash and thus likely computes a hash. We know it is used together with eql?, and we know that it is used in particular by the Hash class.
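To make the contract concrete, here is a minimal sketch of what happens when it is broken. BrokenPoint and Point are hypothetical classes made up for this illustration; the only difference between them is that Point also overrides hash.

```ruby
# Hypothetical class: eql? (and ==) are overridden, hash is NOT.
class BrokenPoint
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.class == self.class && x == other.x && y == other.y
  end
  alias_method :eql?, :==
end

# Same comparison, plus a hash built from the fields that eql? compares.
class Point < BrokenPoint
  def hash
    [x, y].hash
  end
end

a = BrokenPoint.new(1, 2)
b = BrokenPoint.new(1, 2)
a.eql?(b)         # => true
a.hash == b.hash  # => false (the inherited hash is identity-based)

# With equal hash values, an equal key finds the stored entry.
lookup = { Point.new(1, 2) => "found" }
lookup[Point.new(1, 2)]  # => "found"
```

With BrokenPoint keys, the same lookup will usually come back as nil, because the Hash searches under a different hash value and never gets as far as calling eql? on the right entry.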
What does it do, exactly? Well, we all know what a hash function is: it is a function that maps a larger, potentially infinite, input space into a smaller, finite, output space. In particular, in this case, the input space is the space of all Ruby objects, and the output space is the "fast integers" (i.e. the ones that used to be called Fixnum).
And we know how a hash table works: values are placed in buckets based on the hash of their keys. If I want to find a value, I only need to compute the hash of the key (which is fast) to know which bucket the value is in (in constant time), as opposed to e.g. an array of key-value pairs, where I need to compare the key against every key in the array (linear search) to find the value.
However, there is a problem: since the output space of a hash is smaller than the input space, there are different objects which have the same hash value and thus end up in the same bucket. Thus, when two objects have different hash values, I know for a fact that they are different, but if they have the same hash value, they could still be different, and I need to compare them for equality to be sure. That is where the relationship between hash and equality comes from. Also note that when many keys end up in the same bucket, I will again have to compare the search key against every key in the bucket (linear search) to find the value.
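The bucket idea can be sketched as a toy table. This is an illustration only, under the assumption of simple chained buckets; ToyTable is a made-up name and Ruby's real Hash is implemented differently. The point is the division of labor: hash picks the bucket, eql? confirms the key.

```ruby
# Toy hash table for illustration; NOT how Ruby's Hash is implemented.
class ToyTable
  def initialize(bucket_count = 8)
    @buckets = Array.new(bucket_count) { [] }
  end

  def []=(key, value)
    bucket = bucket_for(key)
    pair = bucket.find { |k, _| k.eql?(key) }  # linear search inside one bucket
    if pair
      pair[1] = value          # existing key: overwrite the value
    else
      bucket << [key, value]   # new key: append to the bucket
    end
  end

  def [](key)
    pair = bucket_for(key).find { |k, _| k.eql?(key) }
    pair && pair[1]
  end

  private

  # hash narrows the search down to a single bucket.
  def bucket_for(key)
    @buckets[key.hash % @buckets.size]
  end
end

table = ToyTable.new(1)  # one bucket, so every key collides
table["a"] = 1
table["b"] = 2
table["a"]  # => 1
table["b"]  # => 2
table["c"]  # => nil
```

Even with every key in the same bucket the lookups stay correct, because eql? has the final say; only the speed degrades to a linear search.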
From all this we can conclude the following properties of the #hash method:
- It must return an Integer.
- Not only that, it must return a "fast integer" (equivalent to the old Fixnums).
- It must return the same integer for two objects that are considered equal.
- It may return the same integer for two objects that are considered unequal.
- However, it only should do so with low probability. (Otherwise, a Hash may degenerate into a linked list with highly degraded performance.)
- It also should be hard to construct objects that are unequal but have the same hash value deliberately. (Otherwise, an attacker can force a Hash to degenerate into a linked list as a form of Degradation-of-Service attack.)
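One common way to satisfy these properties is to delegate to Array#hash over exactly the fields that eql? compares. The Money class below is hypothetical and only a sketch of that approach, not the one true way to write hash.

```ruby
# Hypothetical value class used only for illustration.
class Money
  attr_reader :amount, :currency

  def initialize(amount, currency)
    @amount = amount
    @currency = currency
  end

  def eql?(other)
    other.is_a?(Money) && amount.eql?(other.amount) && currency.eql?(other.currency)
  end
  alias_method :==, :eql?

  # Built from the same fields that eql? compares, so equal objects
  # always produce equal hash values.
  def hash
    [self.class, amount, currency].hash
  end
end

five_a = Money.new(5, "EUR")
five_b = Money.new(5, "EUR")
ten    = Money.new(10, "EUR")

five_a.hash == five_b.hash     # => true
[five_a, five_b, ten].uniq.size  # => 2
```

Including self.class in the array is a design choice: it keeps a Money from sharing a hash value with a plain [5, "EUR"] array by construction.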
Does overriding hash and eql in ruby affect performance?
Overriding a method per se does not affect performance, but the implementation of the method matters. Your method is bad because it contains redundant checks. It could be better written as:
def eql?(other)
  url = self.url and other and url == other.url
end
The url = self.url assignment caches the value in a local variable, so self.url is only called once.
You originally have five conditions to make it true:
- not other == false
- not url == nil
- not other == nil
- not other.url == nil
- url == other.url
Among them,
- No1 and No3 can be put together by putting other in the condition.
- No4 is redundant under No2 and No5 because if url is not nil, and other.url is url, then other.url is not nil.
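Here is a small sketch that exercises the shortened eql? against the five conditions. The Page class and its url attribute are assumptions standing in for the asker's class, and the hash override is added only to keep the eql?/hash contract intact.

```ruby
# Hypothetical stand-in for the asker's class.
class Page
  attr_reader :url

  def initialize(url)
    @url = url
  end

  # Truthy only if our url is set, other is neither nil nor false,
  # and both urls are equal.
  def eql?(other)
    url = self.url and other and url == other.url
  end

  # Keeps the contract: eql? objects share a hash value.
  def hash
    url.hash
  end
end

Page.new("a").eql?(Page.new("a"))  # => true
Page.new("a").eql?(Page.new("b"))  # => false
Page.new(nil).eql?(Page.new(nil))  # => nil (falsy)
Page.new("a").eql?(nil)            # => nil (falsy)
```

Note that the method returns nil rather than false in some cases. That is fine wherever only truthiness matters, but it is worth knowing if a caller compares the result against false.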
Unique objects in the ruby array
Unless specified otherwise, no two objects are the same:
Object.new.eql?(Object.new)
# => false
Thus, where #uniq is concerned, all 150 Airplane instances are unique, with no duplicates.
The easiest way to fix this is to provide the uniqueness criterion to #uniq:
planes.uniq(&:model)
The other way is to define what "duplicate" means for the Airplane class:
class Airplane
  attr_accessor :model
  def initialize(model)
    @model = model
  end
  def ==(other)
    other.class == self.class && other.model == self.model
  end
  alias_method :eql?, :==
  def hash
    self.model.hash
  end
end
However, this solution will make two airplanes of the same model the same airplane, in all cases, which might have unintended consequences in other places.
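Both approaches can be compared side by side. In this sketch PlainAirplane is a hypothetical class without any overrides, Airplane repeats the definition above, and the model names are made up.

```ruby
# Hypothetical class with default (identity-based) equality.
class PlainAirplane
  attr_reader :model

  def initialize(model)
    @model = model
  end
end

# The class from the answer, with == / eql? / hash based on the model.
class Airplane
  attr_accessor :model

  def initialize(model)
    @model = model
  end

  def ==(other)
    other.class == self.class && other.model == model
  end
  alias_method :eql?, :==

  def hash
    model.hash
  end
end

models = %w[A320 B737 A320]

plain = models.map { |m| PlainAirplane.new(m) }
plain.uniq.size                      # => 3
plain.uniq(&:model).map(&:model)     # => ["A320", "B737"]

planes = models.map { |m| Airplane.new(m) }
planes.uniq.size                     # => 2
Airplane.new("A320") == Airplane.new("A320")  # => true, everywhere
```

The last line is the trade-off mentioned above: with the overrides in place, two distinct airplanes of the same model are equal in every comparison, not just inside uniq.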
Deleting a modified object from a set is a no-op?
Yes, this is a bug or at least I'd call it a bug. Some would call this "an implementation detail accidentally leaking to the outside world" but that's just fancy pants city-boy talk for bug.
The problem has two main causes:
- You're modifying elements of the Set without Set knowing about it.
- The standard Ruby Set is implemented as a Hash.
The result is that you're modifying the internal Hash's keys without the Hash knowing about it and that confuses the poor Hash into not really knowing what keys it has anymore. The Hash class has a rehash method:
rehash → hsh
Rebuilds the hash based on the current hash values for each key. If values of key objects have changed since they were inserted, this method will reindex hsh.
a = [ "a", "b" ]
c = [ "c", "d" ]
h = { a => 100, c => 300 }
h[a] #=> 100
a[0] = "z"
h[a] #=> nil
h.rehash #=> {["z", "b"]=>100, ["c", "d"]=>300}
h[a] #=> 100
Notice the interesting behavior in the example included with the rehash documentation. Hashes keep track of things using the k.hash values for the key k. If you have an array as a key and you change the array, you can change the array's hash value as well; the result is that the Hash still has that array as a key but the Hash won't be able to find that array as a key because it will be looking in the bucket for the new hash value but the array will be in the bucket for the old hash value. But, if you rehash the Hash, it will all of a sudden be able to find all of its keys again and the senility goes away. Similar problems will occur with non-array keys: you just have to change the key in such a way that its hash value changes and the Hash containing that key will get confused and wander around lost until you rehash it.
The Set class uses a Hash internally to store its members and the members are used as the hash's keys. So, if you change a member, the Set will get confused. If Set had a rehash method then you could kludge around the problem by slapping the Set upside the head with rehash to knock some sense into it; alas, there is no such method in Set. However, you can monkey patch your own in:
class Set
  def rehash
    @hash.rehash
  end
end
Then you can change the keys, call rehash on the Set, and your delete (and various other methods such as member?) will work properly.
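On Ruby 2.6 and later, Set also ships with a built-in reset method that re-indexes the members after they have been modified, so the monkey patch should not be needed there. A minimal sketch, assuming array members as in the rehash example:

```ruby
require "set"

member = ["a", "b"]
set = Set[member, ["c", "d"]]

# Mutate a member behind the Set's back; its hash value changes,
# so the Set may no longer be able to find it.
member[0] = "z"

# Set#reset re-indexes (and deduplicates) the members in place.
set.reset

set.include?(member)  # => true
set.delete(member)
set.size              # => 1
```

Relying on reset avoids reaching into the @hash instance variable, which is an implementation detail of Set and not something the patch above can count on across Ruby versions.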
What's the difference between equal?, eql?, ===, and ==?
I'm going to heavily quote the Object documentation here, because I think it has some great explanations. I encourage you to read it, and also the documentation for these methods as they're overridden in other classes, like String.
Side note: if you want to try these out for yourself on different objects, use something like this:
class Object
  def all_equals(o)
    ops = [:==, :===, :eql?, :equal?]
    Hash[ops.map(&:to_s).zip(ops.map {|s| send(s, o) })]
  end
end
"a".all_equals "a" # => {"=="=>true, "==="=>true, "eql?"=>true, "equal?"=>false}
== - generic "equality"
At the Object level, == returns true only if obj and other are the same object. Typically, this method is overridden in descendant classes to provide class-specific meaning.
This is the most common comparison, and thus the most fundamental place where you (as the author of a class) get to decide if two objects are "equal" or not.
=== - case equality
For class Object, effectively the same as calling #==, but typically overridden by descendants to provide meaningful semantics in case statements.
This is incredibly useful. Examples of things which have interesting === implementations:
- Range
- Regexp
- Proc (since Ruby 1.9)
So you can do things like:
case some_object
when /a regex/
  # The regex matches
when 2..4
  # some_object is in the range 2..4
when lambda {|x| some_crazy_custom_predicate }
  # the lambda returned true
end
See my answer here for a neat example of how case + Regexp can make code a lot cleaner. And of course, by providing your own === implementation, you can get custom case semantics.
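As a sketch of custom case semantics, the hypothetical Even matcher below defines its own === and is then used next to the built-in Range, Regexp, and Proc matchers. The describe method is likewise made up for this example.

```ruby
# Hypothetical matcher: the class itself responds to ===.
class Even
  def self.===(value)
    value.is_a?(Integer) && value.even?
  end
end

# Each `when` clause calls pattern === value, top to bottom.
def describe(value)
  case value
  when Even    then "even integer"
  when 1..9    then "small number"
  when /\Aab/  then "starts with ab"
  when ->(v) { v.respond_to?(:empty?) && v.empty? } then "empty"
  else "something else"
  end
end

describe(4)      # => "even integer"
describe(3)      # => "small number"
describe("abc")  # => "starts with ab"
describe([])     # => "empty"
describe(nil)    # => "something else"
```

The order of the when clauses matters: 4 is also in 1..9, but Even is checked first.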
eql? - Hash equality
The eql? method returns true if obj and other refer to the same hash key. This is used by Hash to test members for equality. For objects of class Object, eql? is synonymous with ==. Subclasses normally continue this tradition by aliasing eql? to their overridden == method, but there are exceptions. Numeric types, for example, perform type conversion across ==, but not across eql?, so:
1 == 1.0 #=> true
1.eql? 1.0 #=> false
So you're free to override this for your own uses, or you can override == and use alias :eql? :== so the two methods behave the same way.
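The practical consequence of the Numeric exception shows up as soon as numbers are used as hash keys. A short sketch:

```ruby
lookup = { 1 => "integer key" }

1 == 1.0      # => true
1.eql?(1.0)   # => false

# Hash uses eql?, not ==, so 1 and 1.0 are different keys.
lookup[1]     # => "integer key"
lookup[1.0]   # => nil

lookup[1.0] = "float key"
lookup.size   # => 2
```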
equal? - identity comparison
Unlike ==, the equal? method should never be overridden by subclasses: it is used to determine object identity (that is, a.equal?(b) iff a is the same object as b).
This is effectively pointer comparison.
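A quick sketch of the difference between value equality and identity, using a string and a copy of it:

```ruby
original = "ruby"
copy     = original.dup   # same contents, different object
same_ref = original       # second name for the same object

original == copy          # => true
original.eql?(copy)       # => true
original.equal?(copy)     # => false
original.equal?(same_ref) # => true
```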
Return object after performing intersection of two arrays based on attribute
You can:
1: Override the eql?(other) method, and then the array intersection will work:
class Link < ApplicationRecord
  def eql?(other)
    # The class comparison guards against unrelated objects (including nil)
    self.class == other.class && id == other.id
  end
  # Always override hash when you override eql?: https://stackoverflow.com/a/54961965/5872935
  def hash
    id.hash
  end
end
2: Use array.select:
array_links.flat_map {|i| selected_links.select {|k| k.user_id == i.user_id }}
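To try the first option without a Rails app, here is a sketch with a plain-Ruby stand-in for the model. This Link class, its title attribute, and the sample data are assumptions made for the example; the real class would inherit from ApplicationRecord.

```ruby
# Plain-Ruby stand-in for the Link model (hypothetical, no ActiveRecord).
class Link
  attr_reader :id, :title

  def initialize(id, title)
    @id = id
    @title = title
  end

  def eql?(other)
    self.class == other.class && id == other.id
  end
  alias_method :==, :eql?

  def hash
    id.hash
  end
end

array_links    = [Link.new(1, "a"), Link.new(2, "b")]
selected_links = [Link.new(2, "b copy"), Link.new(3, "c")]

common = array_links & selected_links
common.map(&:id)     # => [2]
common.first.title   # => "b" (elements come from the receiver)
```

Array#& returns the objects from the left-hand array, so which side you put first decides which of the "equal" objects you get back.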
What does a singleton method belong to if the metaclass method is wrongly overridden?
Still the metaclass, you've just removed your ability to access it directly...
foo.instance_eval { class << self; self; end.instance_methods.include?(:shout) }
=> true
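The same check can be written with the built-in singleton_class method (available since Ruby 1.9.2), which does not depend on any user-defined metaclass helper. In this sketch the foo object and the deliberately wrong metaclass override are made up:

```ruby
foo = Object.new

def foo.shout
  "HEY"
end

# A hypothetical "wrong" override: it no longer returns the metaclass.
def foo.metaclass
  :oops
end

foo.metaclass                                   # => :oops
foo.method(:shout).owner == foo.singleton_class # => true
foo.singleton_class.instance_methods(false).include?(:shout)  # => true
foo.singleton_methods.sort                      # => [:metaclass, :shout]
```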
How to declare what a Ruby module needs in order to be included
There are some existing mixins, classes, and methods in the Ruby core library that have the exact same problem, e.g. Enumerable, Comparable, Range, Hash, Array#uniq: they require certain behavior from other objects in order to work. Some examples are:
Enumerable:
The class must provide a method each, which yields successive members of the collection. If Enumerable#max, #min, or #sort is used, the objects in the collection must also implement a meaningful <=> operator […]
Comparable:
The class must define the <=> operator, which compares the receiver against another object, returning -1, 0, or +1 depending on whether the receiver is less than, equal to, or greater than the other object. If the other object is not comparable then the <=> operator should return nil.
Range:
Ranges can be constructed using any objects that can be compared using the <=> operator. Methods that treat the range as a sequence (#each and methods inherited from Enumerable) expect the begin object to implement a succ method to return the next object in sequence. The step and include? methods require the begin object to implement succ or to be numeric.
Hash:
A user-defined class may be used as a hash key if the hash and eql? methods are overridden to provide meaningful behavior.
And in order to define what "meaningful behavior" means, the documentation of Hash further links to the documentation of Object#hash and Object#eql?:
Object#hash:
[…] This function must have the property that a.eql?(b) implies a.hash == b.hash. […]
Object#eql?:
[…] The eql? method returns true if obj and other refer to the same hash key. […]
So, as you can see, your question is a quite common one, and the answer is: documentation.
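As an illustration of such a documented requirement in practice, here is a sketch of a class that fulfils Comparable's contract. The Version class is hypothetical; the only thing Comparable asks of it is <=>, and the documentation is what tells you so.

```ruby
# Hypothetical class for illustration.
class Version
  include Comparable

  attr_reader :parts

  def initialize(string)
    @parts = string.split(".").map(&:to_i)
  end

  # The one method Comparable documents as required.
  # Returns nil when the other object is not comparable.
  def <=>(other)
    return nil unless other.is_a?(Version)
    parts <=> other.parts
  end
end

Version.new("1.2.0") < Version.new("1.10.0")   # => true
Version.new("2.0") == Version.new("2.0")       # => true
Version.new("1.5").between?(Version.new("1.0"), Version.new("2.0"))  # => true

versions = %w[1.2.0 1.10.0 1.9.3].map { |s| Version.new(s) }
versions.max.parts                             # => [1, 10, 0]
```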
Why does IRB not ignore EOF as per IRB.conf hash
The docs for irb say about that configuration:
**conf.ignore_eof = true/false**
Whether ^D (control-d) will be ignored or not. If false is set, ^D means quit.
So, no, that setting isn't meant to do what you're looking for. As far as I can tell, there isn't a way to do what you want with irb. The closest would be to start irb without an argument, then use require './foo.rb' to run that file.