What Hash Function Does Ruby Use

How does Ruby's String#hash method work?

The hash method is defined for all objects. See documentation:

Generates a Fixnum hash value for this object. This function must have the
property that a.eql?(b) implies a.hash == b.hash. The hash value is used by
class Hash. Any hash value that exceeds the capacity of a Fixnum will be
truncated before being used.

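So String#hash is implemented in C. Saying that it "sums up the characters" is an over-simplification: it walks over the bytes of the string and mixes each one into a running hash value, so the result depends on the order of the characters as well as on which characters are present.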
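The contract from the documentation (a.eql?(b) implies a.hash == b.hash) can be checked directly. The following is only a minimal sketch: the Point class and its fields are hypothetical and exist purely for illustration.

```ruby
# Hypothetical value class, used only to illustrate the hash/eql? contract.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Two points are eql? when their coordinates are eql?.
  def eql?(other)
    other.is_a?(Point) && x.eql?(other.x) && y.eql?(other.y)
  end
  alias_method :==, :eql?

  # Derive the hash from the same fields that eql? compares, so that
  # a.eql?(b) implies a.hash == b.hash.
  def hash
    [self.class, x, y].hash
  end
end

a = Point.new(1, 2)
b = Point.new(1, 2)

same_string_hash = "ruby".hash == "ruby".dup.hash # equal strings, equal hashes
same_point_hash  = a.hash == b.hash               # equal points, equal hashes
lookup           = { a => "found" }[b]            # b finds the entry stored under a
```

Delegating to Array#hash is a common way to combine several fields without writing the mixing logic by hand.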

Store functions in hash

Not exactly like this, no. You need to get a hold of the Method proxy object for the method and store that in the Hash:

hash = { 'v' => method(:show_version) }

And you need to call the Method object:

hash['v'].()

Method duck-types Proc, so you could even store simple Procs alongside Methods in the Hash and there would be no need to distinguish between them, because both of them are called the same way:

hash['h'] = -> { puts 'Hello' }
hash['h'].()
# Hello
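Put together, this gives a small dispatch table. The sketch below is one possible arrangement, not the only one: run_command, the return strings, and the handling of unknown keys are assumptions added for illustration.

```ruby
# Placeholder implementation; the return value is made up for this sketch.
def show_version
  "version 1.0"
end

# Method objects and lambdas sit side by side; both respond to #call.
commands = {
  'v' => method(:show_version),
  'h' => -> { "Hello" }
}

# Hypothetical helper: look up a handler and call it, with a fallback
# message for keys that are not in the table.
def run_command(commands, key)
  handler = commands[key]
  return "unknown command: #{key}" unless handler

  handler.call
end

results = %w[v h x].map { |key| run_command(commands, key) }
```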

Ruby function as value of hash

Functions/methods are one of the few things in Ruby that are not objects, so you can't use them as keys or values in hashes. The closest thing to a function that is an object would be a proc. So you are best off using these...

The other answers pretty much listed all possible ways of how to put a proc into a hash as value, but I'll summarize it nonetheless ;)

hash = {}

hash['variant1'] = Proc.new {|var| var + 2}
hash['variant2'] = proc {|var| var + 2}
hash['variant3'] = lambda {|var| var + 2}

def func(var)
  var + 2
end

hash['variant4'] = method(:func) # the *method* method returns a Method object
                                 # that wraps func and is called like a proc

There are also different ways to call procs:

hash['variant1'].call(2) # => 4
hash['variant1'][2] # => 4
hash['variant1'].(2) # => 4
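Strictly speaking, method(:func) returns a Method object rather than a Proc, although the two are called the same way. A short sketch of the difference, reusing func from above (the variable names are just for illustration):

```ruby
def func(var)
  var + 2
end

m = method(:func)

m_class   = m.class          # Method, not Proc
as_proc   = m.to_proc        # explicit conversion to a Proc
is_lambda = as_proc.lambda?  # the converted Proc has lambda semantics

# The & operator performs the same conversion implicitly.
mapped = [1, 2, 3].map(&m)

# All three call styles work on a Method object as well.
calls = [m.call(2), m[2], m.(2)]
```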

What hash function does Ruby use?

The standard Ruby implementation (MRI) used the Murmur hash for some types (integer, string) at the time this was written; later MRI releases switched string hashing to SipHash.

From string.c:1901:

/* MurmurHash described in http://murmurhash.googlepages.com/ */
static unsigned int
hash(const unsigned char * data, int len, unsigned int h)

(note that this function seems to be renamed to st_hash in the SVN trunk)

Search for rb_memhash in the source code if you want to know where it gets used. I have used the Murmur2 hash in a project of my own before; it is very fast and distributes its output well, but it is not a cryptographic hash function and should not be used as one.

Ruby internals and how to guarantee unique hash values

Can you share with us how you came to the conclusion that Ruby uses only the hash value to determine equality?

The text below is to explain to others your excellent point that the probability of computing the same hash value for two different keys is not zero, so how can the Hash class rely on just the hash value to determine equality?

For the purpose of this discussion I will refer to Ruby hashes as maps, so as not to confuse the 2 uses of the term hash in the Ruby language (1, a computed value on an object, and 2, a map/dictionary of pairs of values and unique keys).

As I understand it, hash values in maps, sets, etc. are used as a quick first pass at determining possible equality. That is, if the hashes of 2 objects are equal, then it is possible that the 2 objects are equal; but it's also possible that the 2 objects are not equal, but coincidentally produce the same hash value.

In other words, the only sure thing you can tell about equality from the hash values of the objects being compared is that if hash1 != hash2 then the objects are definitely not equal.

If the 2 hashes are equal, then the 2 objects must be compared by their content (in Ruby, Hash does this by calling the eql? method on the keys, not ==).

So comparing hashes is not a substitute for comparing the objects themselves, it's just a quick first pass used to optimize performance.
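This can be demonstrated by forcing a collision. In the sketch below, the CollidingKey class is hypothetical and deliberately returns the same hash value for every instance; the map still keeps the keys apart because it falls back to eql? when hash values match.

```ruby
# Hypothetical key class whose instances all share one hash value.
class CollidingKey
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Deliberately terrible: every key lands on the same hash value.
  def hash
    42
  end

  # Content comparison, used by Hash once the hash values match.
  def eql?(other)
    other.is_a?(CollidingKey) && name.eql?(other.name)
  end
end

k1 = CollidingKey.new("a")
k2 = CollidingKey.new("b")

map = { k1 => 1, k2 => 2 }

collide      = k1.hash == k2.hash           # same hash value...
entry_count  = map.size                     # ...but still two separate entries
found_by_eql = map[CollidingKey.new("a")]   # a fresh, eql? key finds k1's value
missing      = map[CollidingKey.new("c")]   # same hash value, no eql? match
```

A constant hash like this is still correct, only slow: every lookup degrades to comparing against all colliding keys.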

What is `hash` in ruby?

TL;DR – it's the hash value for Ruby's top-level object, equivalent to self.hash.

Here's a little debugging help:

irb(main):001:0> hash
#=> 3220857809431415791

irb(main):002:0> defined? hash
#=> "method"

irb(main):003:0> method(:hash)
#=> #<Method: Object(Kernel)#hash>

You can now look up Object#hash [1] online:

http://ruby-doc.org/core-2.3.1/Object.html#method-i-hash

Or in IRB:

irb(main):004:0> help "Object#hash"
= Object#hash

(from ruby core)
------------------------------------------------------------------------------
obj.hash -> fixnum

------------------------------------------------------------------------------

Generates a Fixnum hash value for this object. This function must have the
property that a.eql?(b) implies a.hash == b.hash.

The hash value is used along with #eql? by the Hash class to determine if two
objects reference the same hash key. Any hash value that exceeds the capacity
of a Fixnum will be truncated before being used.

The hash value for an object may not be identical across invocations or
implementations of Ruby. If you need a stable identifier across Ruby
invocations and implementations you will need to generate one with a custom
method.

#=> nil
irb(main):005:0>

[1] Object(Kernel)#hash actually means that hash is defined in Kernel, but as stated in the documentation for Object:

Although the instance methods of Object are defined by the Kernel module, we have chosen to document them here for clarity.

Storing a function in the value for a key within a hash

You can use the send method (it is an ordinary method, not a keyword):

send h[step]

Because you wrote the method name directly in the value part of the hash, the method is called while the hash is being built. If you instead store the method names as strings and invoke them with send, as shown below, it works.

def hi
  puts 'hi'
end

def hello
  puts 'hello'
end

h = {
  1 => 'hi',
  2 => 'hello',
}

send h[1]
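Since send will call whatever method name it is given, it can help to keep the set of allowed names in the hash itself and to reject anything else. A minimal sketch under that assumption: STEPS and run_step are hypothetical names, and the methods return strings here instead of printing so the result is easy to inspect.

```ruby
def hi
  'hi'
end

def hello
  'hello'
end

# Symbols are the usual choice for method names; send accepts strings too.
STEPS = { 1 => :hi, 2 => :hello }.freeze

# Hypothetical helper: only names listed in STEPS can ever be dispatched.
def run_step(step)
  name = STEPS[step]
  raise ArgumentError, "unknown step: #{step.inspect}" unless name

  send(name)
end

first  = run_step(1)
second = run_step(2)
```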

Why does Ruby's hash method vary across runs?

According to page 23 of http://patshaughnessy.net/Ruby-Under-a-Microscope-Rough-Draft-May.pdf

Here’s how Ruby’s hash function actually works ... [snip] ... For string and arrays it works differently. In this case, Ruby actually iterates through all of the characters in the string or elements in the array and calculates a cumulative hash value; this guarantees that the hash value will always be the same for any instance of a string or array, and will always change if any of the values in that string or array change.

And:

Also, Ruby 1.9 and Ruby 2.0 initialize MurmurHash using a random seed value which is
reinitialized each time you restart Ruby. This means that if you stop and restart Ruby you’ll
get different hash values for the same input data. It also means if you try this yourself
you’ll get different values than I did above. However, the hash values will always be the
same within the same Ruby process.
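If you need an identifier that survives a restart, use a digest instead of #hash. The sketch below uses Digest::SHA256 from the standard library; choosing SHA-256 is just one reasonable option, not something the quoted text prescribes.

```ruby
require 'digest'

# Within one process, equal strings always produce equal hash values.
within_process_stable = "abc".hash == "abc".dup.hash

# The actual number depends on a per-process seed, so it should not be
# persisted or compared between runs.
process_local = "abc".hash

# A digest depends only on the input bytes and is the same in every run.
stable_id = Digest::SHA256.hexdigest("abc")
```

Running the script twice should print the same stable_id each time, while process_local will normally differ.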
