When Is a Block or Object That Is Passed to Hash.New Created or Run

When is a block or object that is passed to Hash.new created or run?

For the benefit of those new to Ruby, I have discussed alternative approaches to the problem, including the one that is the substance of this question.

The task

Suppose you are given an array

arr = [[:dog, "fido"], [:car, "audi"], [:cat, "lucy"], [:dog, "diva"], [:cat, "bo"]]  

and wish to create the hash

{ :dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"] }

First solution

h = {}
arr.each do |k,v|
  h[k] = [] unless h.key?(k)
  h[k] << v
end
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

This is quite straightforward.

Second solution

More Ruby-like is to write:

h = {}
arr.each { |k,v| (h[k] ||= []) << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

When Ruby sees (h[k] ||= []) << v she treats it, in effect, as

(h[k] = h[k] || []) << v

If h does not have a key k, h[k] #=> nil, so the expression becomes

(h[k] = nil || []) << v

which becomes

(h[k] = []) << v

so

h[k] #=> [v]

Note that h[k] on the left of the assignment uses the method Hash#[]=, whereas h[k] on the right employs Hash#[].

This solution requires that none of the hash values equal nil or false, since either would be replaced by the empty array.
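To make that caveat concrete, here is a minimal sketch. The hash flags and its keys are hypothetical and are not part of the original problem. It shows that ||= replaces any stored value that is nil or false, not only values for missing keys.

```ruby
# Hypothetical hash, used only to illustrate the caveat.
flags = { verbose: false, level: nil }

flags[:verbose] ||= true   # the stored false is falsy, so it is replaced
flags[:level]   ||= 3      # the stored nil is falsy, so it is replaced
flags[:missing] ||= []     # no such key: flags[:missing] is nil, so [] is assigned

flags == { verbose: true, level: 3, missing: [] } #=> true
```

In the task above every stored value is an array, which is always truthy, so the idiom is safe there.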

Third solution

A third approach is to give the hash a default value. If a hash h does not have a key k, h[k] returns the default value. There are two types of default values.

Passing the default value as an argument to Hash::new

If an empty array is passed as an argument to Hash::new, that value becomes the default value:

a = []
a.object_id
#=> 70339916855860
g = Hash.new(a)
#=> {}

g[k] returns the default array (initially []) when g does not have a key k. (The hash is not altered, however.) This construct has important uses, but it is inappropriate here. To see why, suppose we write

x = g[:cat] << "bo"
#=> ["bo"]
y = g[:dog] << "diva"
#=> ["bo", "diva"]
x #=> ["bo", "diva"]

This is because g[:cat] and g[:dog] both return the same object, the default array a, which << then mutates in place. Neither key is ever added to g. We can confirm that x and y are the same object by examining object_ids:

x.object_id
#=> 70339916855860
y.object_id
#=> 70339916855860
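The same point can be checked without comparing object ids. The sketch below repeats the steps above and then inspects the hash using Hash#default; the key :bird is an arbitrary illustration.

```ruby
a = []
g = Hash.new(a)

g[:cat] << "bo"     # mutates the shared default array, stores nothing in g
g[:dog] << "diva"   # mutates the same array again

g.empty?             #=> true, no key was ever assigned
g.default            #=> ["bo", "diva"]
g.default.equal?(a)  #=> true, the default is the very object we passed in
g[:bird]             #=> ["bo", "diva"], any missing key returns that array
```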

Giving Hash::new a block which returns the default value

The second form of default value is to perform a block calculation. If we define the hash with a block:

h = Hash.new { |h,k| h[k] = [] }

then if h does not have a key k, h[k] will be set equal to the value returned by the block, in this case an empty array. Note that the block variable h is the hash on which the missing key was looked up (it shadows the outer variable h, which refers to the same hash). This allows us to write

h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

As the first element passed to the block is arr.first, the block variables are assigned values by evaluating

k, v = arr.first
#=> [:dog, "fido"]
k #=> :dog
v #=> "fido"

The block calculation is therefore

h[k] << v
#=> h[:dog] << "fido"

but since h does not (yet) have a key :dog, the block is triggered, setting h[k] equal to [], and then "fido" is appended to that empty array, so that

h #=> { :dog=>["fido"] }

Similarly, after the next two elements of arr are passed to the block we have

h #=> { :dog=>["fido"], :car=>["audi"], :cat=>["lucy"] }

When the next (fourth) element of arr is passed to the block, we evaluate

h[:dog] << "diva"

but now h does have a key :dog, so the default block is not called and we end up with

h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy"]} 

The last element of arr is processed similarly.

Note that, when using Hash::new with a block, we could write something like this:

h = Hash.new { launch_missiles("any time now") }

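in which case h[k] would return the return value of launch_missiles whenever h does not have a key k. (Because this block does not assign to the hash, nothing is stored, so the block runs again on every lookup of a missing key.) In other words, anything can be done in the block.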
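To answer the title question directly, here is a small sketch that counts how often the block runs. The counter calls and the snapshot variables are introduced purely for illustration.

```ruby
calls = 0
h = Hash.new do |hash, key|
  calls += 1          # record every invocation of the default block
  hash[key] = []
end

calls_after_new = calls       #=> 0, creating the hash does not run the block

h[:dog] << "fido"             # missing key: the block runs and stores []
h[:dog] << "diva"             # key now exists: the block is not run
calls_after_lookups = calls   #=> 1

h.key?(:cat)                  # key? never consults the default block
h.fetch(:cat, nil)            # neither does fetch
calls_after_checks = calls    #=> 1
```

So the block object is created once, when Hash.new is called, and it is run only when Hash#[] is called with a key the hash does not have.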

Even more Ruby-like

Lastly, the more Ruby-like way of writing

h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

is to use Enumerable#each_with_object:

arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |(k,v),h| h[k] << v }
#=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

which eliminates two lines of code.

Which is best?

Personally, I am indifferent between the second and third solutions. Both are used in practice.

Ruby Hash.new with a block needs in-depth explanation

arr = [1, 2, 1, 3, 2, 1]

You could write your code with no bells or whistles

def duped_index(arr)
  result = {}
  arr.each_with_index do |ele, idx|
    result[ele] = [] unless result.key?(ele)
    result[ele] << idx
  end
  result.select { |ele, indices| indices.length > 1 }
end

duped_index(arr)
#=> {1=>[0, 2, 5], 2=>[1, 4]}

Another way is to create empty arrays on the fly, as needed

def duped_index(arr)
  result = {}
  arr.each_with_index { |ele, idx| (result[ele] ||= []) << idx }
  result.select { |ele, indices| indices.length > 1 }
end

duped_index(arr)
#=> {1=>[0, 2, 5], 2=>[1, 4]}

Ruby's parser expands the abbreviated assignment result[ele] ||= [] to:

result[ele] = result[ele] || []

If result does not have a key ele, result[ele] #=> nil, so

result[ele] = nil || []
#=> []

If result has a key ele, result[ele] remains unchanged. Therefore,

(result[ele] ||= []) << idx

causes idx to be appended to the array (empty or otherwise) that is the value of result for the key ele.

This method would more commonly be written as follows:

def duped_index(arr)
  arr.each_with_index.with_object({}) { |(ele, idx), result|
    (result[ele] ||= []) << idx }.
    select { |ele, indices| indices.length > 1 }
end

A third way is to create a hash with a default proc, as in the question

Suppose:

result = Hash.new { |hash, key| hash[key] = [] }
#=> {}

Now perform the following operation:

result['dog'] << 'woof'
#=> ["woof"]
result
#=> {"dog"=>["woof"]}

When result['dog'] is executed Ruby sees that result.key?('dog') #=> false, so she executes the block, first by assigning values to the block variables:

hash, key = [result, 'dog']
#=> [{}, 'dog']
hash
#=> {}
key
#=> 'dog'

She then executes:

hash[key] = []

resulting in:

result
#=> { 'dog'=>[] }

She then executes:

result['dog'] << 'woof'
result
#=> {"dog"=>["woof"]}

Now suppose we execute:

result['dog'] << 'I love kibble!'
result
#=> {"dog"=>["woof", "I love kibble!"]}

This time Ruby sees that result has a key 'dog', so she simply appends "I love kibble!" to the array result['dog'], without referencing the block.

Let's take another example:

result = Hash.new do |hash, key|
  puts "I just launched the missiles...just kidding"
  hash[key] = []
end

result['dog'] << 'woof'
I just launched the missiles...just kidding
#=> ["woof"]

The behaviour is the same as before except that a message is displayed as well. The point is that you can put any code you like in the block, extracting data from a database being an example (though I don't think that's a common use of default procs).

The method using this form of Hash::new would commonly be written:

def duped_index(arr)
  arr.each_with_index.
    with_object(Hash.new { |h,k| h[k]=[] }) { |(ele,idx), result|
      result[ele] << idx }.
    select { |ele, indices| indices.length > 1 }
end

The choice of which approach to take is mainly a matter of taste, but I expect most Rubyists would elect #2 or #3.

Working with Hashes that have a default value

With Hash.new(0), 0 is the fallback value returned when you access a key that doesn't exist in the hash.

For example:

count = Hash.new
count['key'] #=> nil

vs

count = Hash.new(0)
count['key'] #=> 0
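A typical use of a 0 default is tallying. The following is a minimal sketch; the words array is made-up sample data.

```ruby
# Made-up sample data for illustration.
words = %w[apple pear apple fig pear apple]

count = Hash.new(0)
# count[word] += 1 expands to count[word] = count[word] + 1, so the first
# time a word is seen the right-hand side reads the default 0.
words.each { |word| count[word] += 1 }

count #=> {"apple"=>3, "pear"=>2, "fig"=>1}
```

Because += reassigns rather than mutates, the shared default 0 is never changed.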

Hash.new([]) does not behave as expected

When you use the default argument for a Hash, the same object is used for all keys that have not been explicitly set. This means that only one array is being used here, the one you passed into Hash.new. See below for evidence of that.

>> h = Hash.new([])
=> {}
>> h[:foo] << :bar
=> [:bar]
>> h[:bar] << :baz
=> [:bar, :baz]
>> h[:foo].object_id
=> 2177177000
>> h[:bar].object_id
=> 2177177000

The weird thing is that, as you found, if you inspect the hash, you'll find that it is empty! This is because only the default object has been modified; no keys have yet been assigned.

Fortunately, there is another way to do default values for hashes. You can provide a default block instead:

>> h = Hash.new { |h,k| h[k] = [] }
=> {}
>> h[:foo] << :bar
=> [:bar]
>> h[:bar] << :baz
=> [:baz]
>> h[:foo].object_id
=> 2176949560
>> h[:bar].object_id
=> 2176921940

When you use this approach, the block gets executed every time an unassigned key is used, and it is provided the hash itself and the key as arguments. By assigning the default value within the block, you can be sure that a new object will get created for each distinct key, and that the assignment will happen automatically. This is the idiomatic way of creating a "Hash of Arrays" in Ruby, and is generally safer to use than the default argument approach.

That said, if you're working with immutable values (like numbers), doing something like Hash.new(0) is safe, as you'll only change those values by re-assignment. But because I prefer to keep fewer concepts in my head, I pretty much use the block form exclusively.

Ruby Hash: creating a default value for non-existing elements

Hashes have a thing called a default_proc, which is simply a proc that Ruby runs when you try to access a hash key that doesn't exist. This proc receives both the hash itself and the target key as parameters.

You can set a Hash's default_proc at any time. Passing a block parameter to Hash.new simply allows you to initialize a Hash and set its default_proc in one step:

h = Hash.new
h.default_proc = proc{ |hash, key| hash[key] = 'foo' }

# The above is equivalent to:

h = Hash.new{ |hash, key| hash[key] = 'foo' }

We can also access the default proc for a hash by calling h.default_proc. Knowing this, and knowing that the ampersand (&) allows a proc passed as a normal parameter to be treated as a block parameter, we can now explain how this code works:

cool_hash = Hash.new{ |h, k| h[k] = Hash.new(&h.default_proc) }

The block passed to Hash.new will be called when we try to access a key that doesn't exist. This block will receive the hash itself as h, and the key we tried to access as k. We respond by setting h[k] (that is, the value of the key we're trying to access) to a new hash. Into the constructor of this new hash, we pass the "parent" hash's default_proc, using an ampersand to force it to be interpreted as a block parameter. This is the equivalent of doing the following, to an infinite depth:

cool_hash = Hash.new{ |h, k| h[k] = Hash.new{ |h, k| h[k] = Hash.new{ ... } } }

The end result is that the key we tried to access was initialized to a new Hash, which itself will initialize any "not found" keys to a new Hash, which itself will have the same behavior, etc. It's hashes all the way down.
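Here is a short usage sketch of that self-propagating default proc. The keys (:pets, :dogs, :fido, :cars, :audi) are arbitrary examples.

```ruby
cool_hash = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }

# Intermediate hashes are created on demand at every level.
cool_hash[:pets][:dogs][:fido] = "woof"
cool_hash == { pets: { dogs: { fido: "woof" } } } #=> true

# Each nested hash carries the same default proc as its parent.
cool_hash[:pets].default_proc.equal?(cool_hash.default_proc) #=> true

# Caveat: merely reading a missing path also creates it.
cool_hash[:cars][:audi]
cool_hash.key?(:cars)   #=> true
cool_hash[:cars]        #=> {:audi=>{}}
```

If lookups should not create keys, Hash#key?, Hash#fetch with a default, or Hash#dig on plain hashes are possible alternatives.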

How to dispose() an object that is created in a SELECT...CASE statement?

Can you do something like this...

' Requires: Imports System.Text and Imports System.Security.Cryptography
Public Shared Function HashMe(ByVal plainText As String, ByVal hash2use As String) As Byte()
    Dim returnHash As Byte() = Nothing
    Select Case hash2use.ToUpper()
        Case "SHA1"
            ' Using disposes the algorithm object when the block exits.
            Using hashAlgorithm As HashAlgorithm = SHA1.Create()
                returnHash = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(plainText))
            End Using
        ' ... do the same for the other cases ...
    End Select
    Return returnHash
End Function

Ruby hash default value behavior

The other answers seem to indicate that the difference in behavior is due to Integers being immutable and Arrays being mutable. But that is misleading. The difference is not that the creator of Ruby decided to make one immutable and the other mutable. The difference is that you, the programmer, decided to mutate one but not the other.

The question is not whether Arrays are mutable; the question is whether you mutate them.

You can get both the behaviors you see above, just by using Arrays. Observe:

One default Array with mutation

hsh = Hash.new([])

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => ['one', 'two']
# Because we mutated the default value, nonexistent keys return the changed value

hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!

One default Array without mutation

hsh = Hash.new([])

hsh[:one] += ['one']
hsh[:two] += ['two']
# This is syntactic sugar for hsh[:two] = hsh[:two] + ['two']

hsh[:nonexistent]
# => []
# We didn't mutate the default value, it is still an empty array

hsh
# => { :one => ['one'], :two => ['two'] }
# This time, we *did* mutate the hash.

A new, different Array every time with mutation

hsh = Hash.new { [] }
# This time, instead of a default *value*, we use a default *block*

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.

hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!

hsh = Hash.new {|hsh, key| hsh[key] = [] }
# This time, instead of a default *value*, we use a default *block*
# And the block not only *returns* the default value, it also *assigns* it

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.

hsh
# => { :one => ['one'], :two => ['two'], :nonexistent => [] }

Why does this block not run when it is stored in a proc?

You have an error in the line x = calculation(5,6) *ankh. To pass a proc as a block, you use the &-operator.

x = calculation(5,6,&ankh)
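The question's definitions of calculation and ankh are not shown here, so the bodies below are assumptions, written only to give a runnable sketch of passing a proc as a block with &.

```ruby
# Assumed definition: a method that yields its two arguments to a block.
def calculation(a, b)
  yield(a, b)
end

# Assumed proc body, for illustration only.
ankh = proc { |a, b| a + b }

# & converts the proc into the block that calculation yields to.
x = calculation(5, 6, &ankh)
x #=> 11
```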

Need a simple explanation of the inject method

You can think of the first block argument as an accumulator: the result of each run of the block is stored in the accumulator and then passed to the next execution of the block. In the case of the code shown above, you are defaulting the accumulator, result, to 0. Each run of the block adds the given number to the current total and then stores the result back into the accumulator. The next block call has this new value, adds to it, stores it again, and repeats.

At the end of the process, inject returns the accumulator, which in this case is the sum of all the values in the array, or 10.
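The code being described is not reproduced here. A sketch consistent with the description (an accumulator named result, an initial value of 0, and a final sum of 10) might look like this; the array contents and the name element are assumptions.

```ruby
# Assumed input; any array summing to 10 would match the description.
numbers = [1, 2, 3, 4]

sum = numbers.inject(0) do |result, element|
  # The block's return value becomes result on the next iteration.
  result + element
end

sum #=> 10
```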

Here's another simple example to create a hash from an array of objects, keyed by their string representation:

[1,"a",Object.new,:hi].inject({}) do |hash, item|
  hash[item.to_s] = item
  hash
end

In this case, we are defaulting our accumulator to an empty hash, then populating it each time the block executes. Notice we must return the hash as the last line of the block, because the result of the block will be stored back in the accumulator.


