When is a block or object that is passed to Hash.new created or run?
For the benefit of those new to Ruby, I have discussed alternative approaches to the problem, including the one that is the substance of this question.
The task
Suppose you are given an array
arr = [[:dog, "fido"], [:car, "audi"], [:cat, "lucy"], [:dog, "diva"], [:cat, "bo"]]
and wish to create the hash
{ :dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"] }
First solution
h = {}
arr.each do |k,v|
  h[k] = [] unless h.key?(k)
  h[k] << v
end
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
This is quite straightforward.
Second solution
More Ruby-like is to write:
h = {}
arr.each { |k,v| (h[k] ||= []) << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
When Ruby sees
(h[k] ||= []) << v
the first thing she does is, in effect, expand it to
(h[k] = h[k] || []) << v
If h does not have a key k, h[k] #=> nil, so the expression becomes
(h[k] = nil || []) << v
which becomes
(h[k] = []) << v
so
h[k] #=> [v]
Note that h[k] on the left of the equality uses the method Hash#[]=, whereas h[k] on the right employs Hash#[].
This solution requires that none of the hash values equal nil or false.
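The caveat about nil values can be seen in a small sketch. The hash below is hypothetical data, chosen so that a key exists but holds nil:

```ruby
# Hypothetical data: the key :dog exists, but its stored value is nil.
h = { dog: nil }

# h[:dog] is nil, so ||= assigns a fresh array even though the key exists.
(h[:dog] ||= []) << "fido"

# Now h[:dog] holds an array, so ||= leaves it in place and << appends.
(h[:dog] ||= []) << "diva"

h #=> {:dog=>["fido", "diva"]}
```

If nil (or false) were a meaningful value for some key, the first solution's h.key?(k) test would be the safer choice.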
Third solution
A third approach is to give the hash a default value. If a hash h does not have a key k, h[k] returns the default value. There are two types of default values.
Passing the default value as an argument to Hash::new
If an empty array is passed as an argument to Hash::new, that value becomes the default value:
a = []
a.object_id
#=> 70339916855860
g = Hash.new(a)
#=> {}
g[k] returns [] when g does not have a key k. (The hash is not altered, however.) This construct has important uses, but it is inappropriate here. To see why, suppose we write
x = g[:cat] << "bo"
#=> ["bo"]
y = g[:dog] << "diva"
#=> ["bo", "diva"]
x #=> ["bo", "diva"]
This is because g[:cat] and g[:dog] both return the same object, the default array a, which << mutates in place; no keys are added to g. We can see this by examining object_ids:
x.object_id
#=> 70339916855860
y.object_id
#=> 70339916855860
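As a compact check of the same point, the sketch below repeats the steps above and then inspects the hash and its default. It uses only Hash#default, Hash#empty? and BasicObject#equal?:

```ruby
a = []
g = Hash.new(a)

g[:cat] << "bo"      # mutates the shared default array, not the hash
g[:dog] << "diva"    # the very same array is mutated again

g.empty?             #=> true   (no key was ever assigned)
g.default            #=> ["bo", "diva"]
g.default.equal?(a)  #=> true   (the default is the object a itself)
g[:anything]         #=> ["bo", "diva"]
```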
Giving Hash::new a block which returns the default value
The second form of default value is to perform a block calculation. If we define the hash with a block:
h = Hash.new { |h,k| h[k] = [] }
then if h does not have a key k, evaluating h[k] calls the block, which sets h[k] equal to a new empty array and returns it. Note that the block variable h is the hash itself. This allows us to write
h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
As the first element passed to the block is arr.first, the block variables are assigned values by evaluating
k, v = arr.first
#=> [:dog, "fido"]
k #=> :dog
v #=> "fido"
The block calculation is therefore
h[k] << v
#=> h[:dog] << "fido"
but since h does not (yet) have a key :dog, the default block is triggered, setting h[k] equal to []; "fido" is then appended to that empty array, so that
h #=> { :dog=>["fido"] }
Similarly, after the next two elements of arr are passed to the block we have
h #=> { :dog=>["fido"], :car=>["audi"], :cat=>["lucy"] }
When the next (fourth) element of arr is passed to the block, we evaluate
h[:dog] << "diva"
but now h does have a key :dog, so the default does not apply and we end up with
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy"]}
The last element of arr is processed similarly.
Note that, when using Hash::new with a block, we could write something like this:
h = Hash.new { launch_missiles("any time now") }
in which case h[k] would return the return value of launch_missiles for any missing key k (nothing is stored in h unless the block itself assigns it). In other words, anything can be done in the block.
Even more Ruby-like
Lastly, the more Ruby-like way of writing
h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
is to use Enumerable#each_with_object:
arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |(k,v),h| h[k] << v }
#=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
which eliminates two lines of code.
Which is best?
Personally, I am indifferent between the second and third solutions. Both are used in practice.
Ruby Hash.new with a block need in-depth explanation
arr = [1, 2, 1, 3, 2, 1]
You could write your code with no bells or whistles:
def duped_index(arr)
  result = {}
  arr.each_with_index do |ele, idx|
    result[ele] = [] unless result.key?(ele)
    result[ele] << idx
  end
  result.select { |ele, indices| indices.length > 1 }
end
duped_index(arr)
#=> {1=>[0, 2, 5], 2=>[1, 4]}
Another way is to create empty arrays on the fly, as needed:
def duped_index(arr)
  result = {}
  arr.each_with_index { |ele, idx| (result[ele] ||= []) << idx }
  result.select { |ele, indices| indices.length > 1 }
end
duped_index(arr)
#=> {1=>[0, 2, 5], 2=>[1, 4]}
Ruby's parser expands the abbreviated assignment result[ele] ||= [] to, in effect:
result[ele] = result[ele] || []
If result does not have a key ele, result[ele] #=> nil, so
result[ele] = nil || []
#=> []
If result has a key ele, result[ele] remains unchanged. Therefore,
(result[ele] ||= []) << idx
causes idx to be appended to the array (empty or otherwise) that is the value of result for the key ele.
This method would more commonly be written as follows:
def duped_index(arr)
  arr.each_with_index.with_object({}) { |(ele, idx), result|
    (result[ele] ||= []) << idx }.
    select { |ele, indices| indices.length > 1 }
end
A third way is to create a hash with a default proc, as in the question
Suppose:
result = Hash.new { |hash, key| hash[key] = [] }
#=> {}
Now perform the following operation:
result['dog'] << 'woof'
#=> ["woof"]
result
#=> {"dog"=>["woof"]}
When result['dog'] is executed Ruby sees that result.key?('dog') #=> false, so she executes the block, first by assigning values to the block variables:
hash, key = [result, 'dog']
#=> [{}, 'dog']
hash
#=> {}
key
#=> 'dog'
Then she executes:
hash[key] = []
resulting in:
result
#=> { 'dog'=>[] }
She then executes:
result['dog'] << 'woof'
result
#=> {"dog"=>["woof"]}
Now suppose we execute:
result['dog'] << 'I love kibble!'
result
#=> {"dog"=>["woof", "I love kibble!"]}
This time Ruby sees that result has a key 'dog', so she simply appends "I love kibble!" to the array result['dog'], without referencing the block.
Let's take another example:
result = Hash.new do |hash, key|
puts "I just launched the missiles...just kidding"
hash[key] = []
end
result['dog'] << 'woof'
I just launched the missiles...just kidding
#=> ["woof"]
The behaviour is the same as before except a message is displayed as well. The point is that you can put any code you like in the block, extracting data from a database being an example (though I don't think that's a common use of default procs).
The method using this form of Hash::new would commonly be written:
def duped_index(arr)
  arr.each_with_index.
    with_object(Hash.new { |h,k| h[k]=[] }) { |(ele,idx), result|
      result[ele] << idx }.
    select { |ele, indices| indices.length > 1 }
end
The choice of which approach to take is mainly a matter of taste, but I expect most Rubyists would elect #2 or #3.
Working with Hashes that have a default value
With Hash.new(0), 0 is the fallback value returned when you access a key that doesn't exist in the hash.
For example:
count = Hash.new
-> count['key'] => nil
vs
count = Hash.new(0)
-> count['key'] => 0
Hash.new([]) does not behave as expected
When you use the default argument for a Hash, the same object is used for all keys that have not been explicitly set. This means that only one array is being used here, the one you passed into Hash.new. See below for evidence of that.
>> h = Hash.new([])
=> {}
>> h[:foo] << :bar
=> [:bar]
>> h[:bar] << :baz
=> [:bar, :baz]
>> h[:foo].object_id
=> 2177177000
>> h[:bar].object_id
=> 2177177000
The weird thing is that, as you found, if you inspect the hash, you'll find that it is empty! This is because only the default object has been modified; no keys have yet been assigned.
Fortunately, there is another way to do default values for hashes. You can provide a default block instead:
>> h = Hash.new { |h,k| h[k] = [] }
=> {}
>> h[:foo] << :bar
=> [:bar]
>> h[:bar] << :baz
=> [:baz]
>> h[:foo].object_id
=> 2176949560
>> h[:bar].object_id
=> 2176921940
When you use this approach, the block gets executed every time an unassigned key is accessed, and it is passed the hash itself and the key as arguments. By assigning the default value within the block, you can be sure that a new object will get created for each distinct key, and that the assignment will happen automatically. This is the idiomatic way of creating a "Hash of Arrays" in Ruby, and is generally safer to use than the default argument approach.
That said, if you're working with immutable values (like numbers), doing something like Hash.new(0) is safe, as you'll only change those values by re-assignment. But because I prefer to keep fewer concepts in my head, I pretty much use the block form exclusively.
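A minimal sketch of the safe Hash.new(0) case, using a hypothetical word list. Each update is a re-assignment (counts[word] = counts[word] + 1), so the shared default 0 is never mutated:

```ruby
# Hypothetical input: count how often each word occurs.
words = %w[dog cat dog bird cat dog]

counts = Hash.new(0)
words.each { |word| counts[word] += 1 }  # re-assignment, not mutation

counts               #=> {"dog"=>3, "cat"=>2, "bird"=>1}
counts["fish"]       #=> 0      (the default, still untouched)
counts.key?("fish")  #=> false  (reading a missing key stores nothing)
```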
Ruby Hash: creating a default value for non-existing elements
Hashes have a thing called a default_proc, which is simply a proc that Ruby runs when you try to access a hash key that doesn't exist. This proc receives both the hash itself and the target key as parameters.
You can set a Hash's default_proc at any time. Passing a block parameter to Hash.new simply allows you to initialize a Hash and set its default_proc in one step:
h = Hash.new
h.default_proc = proc{ |hash, key| hash[key] = 'foo' }
# The above is equivalent to:
h = Hash.new{ |hash, key| hash[key] = 'foo' }
We can also access the default proc for a hash by calling h.default_proc. Knowing this, and knowing that the ampersand (&) allows a proc passed as a normal parameter to be treated as a block parameter, we can now explain how this code works:
cool_hash = Hash.new{ |h, k| h[k] = Hash.new(&h.default_proc) }
The block passed to Hash.new will be called when we try to access a key that doesn't exist. This block will receive the hash itself as h, and the key we tried to access as k. We respond by setting h[k] (that is, the value of the key we're trying to access) to a new hash. Into the constructor of this new hash, we pass the "parent" hash's default_proc, using an ampersand to force it to be interpreted as a block parameter. This is the equivalent of doing the following, to an infinite depth:
cool_hash = Hash.new{ |h, k| h[k] = Hash.new{ |h, k| h[k] = Hash.new{ ... } } }
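A short usage sketch shows the intermediate levels being created on demand. The keys are hypothetical and chosen only for illustration:

```ruby
cool_hash = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }

# Neither :animals nor :dogs exists yet; each missing level is created
# by the shared default proc as the chain is evaluated.
cool_hash[:animals][:dogs][:fido] = "woof"

cool_hash                #=> {:animals=>{:dogs=>{:fido=>"woof"}}}

# Merely reading a missing key also stores a new, empty hash.
cool_hash[:plants]       #=> {}
cool_hash.key?(:plants)  #=> true
```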
The end result is that the key we tried to access was initialized to a new Hash, which itself will initialize any "not found" keys to a new Hash, which itself will have the same behavior, etc. It's hashes all the way down.
How to dispose() an object that is created in a SELECT...CASE statement?
Can you do something like this...
Public Shared Function HashMe(ByVal plainText As String, ByVal hash2use As String) As Byte()
Dim returnHash As Byte()
Select Case hash2use.ToUpper
Case "SHA1"
Using hashAlgorithm As HashAlgorithm = SHA1.Create()
returnHash = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(plainText))
End Using
... do the same for the other cases ...
End Select
Return returnHash
End Function
Ruby hash default value behavior
The other answers seem to indicate that the difference in behavior is due to Integers being immutable and Arrays being mutable. But that is misleading. The difference is not that the creator of Ruby decided to make one immutable and the other mutable. The difference is that you, the programmer, decided to mutate one but not the other.
The question is not whether Arrays are mutable; the question is whether you mutate them.
You can get both the behaviors you see above, just by using Arrays. Observe:
One default Array with mutation
hsh = Hash.new([])
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => ['one', 'two']
# Because we mutated the default value, nonexistent keys return the changed value
hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
One default Array without mutation
hsh = Hash.new([])
hsh[:one] += ['one']
hsh[:two] += ['two']
# This is syntactic sugar for hsh[:two] = hsh[:two] + ['two']
hsh[:nonexistent]
# => []
# We didn't mutate the default value, it is still an empty array
hsh
# => { :one => ['one'], :two => ['two'] }
# This time, we *did* mutate the hash.
A new, different Array every time with mutation
hsh = Hash.new { [] }
# This time, instead of a default *value*, we use a default *block*
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.
hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
hsh = Hash.new {|hsh, key| hsh[key] = [] }
# This time, instead of a default *value*, we use a default *block*
# And the block not only *returns* the default value, it also *assigns* it
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.
hsh
# => { :one => ['one'], :two => ['two'], :nonexistent => [] }
Why does this block not run when it is stored in a proc?
You have a syntax error in the line x = calculation(5,6) *ankh. To pass a proc as a block, you use the & operator:
x = calculation(5,6,&ankh)
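The question's definitions of calculation and ankh are not shown here, so the following is only a sketch with hypothetical stand-ins: a method that yields its two arguments, and a proc that adds them.

```ruby
# Hypothetical stand-in for the question's method: yields both arguments
# to whatever block it is given.
def calculation(a, b)
  yield(a, b)
end

# Hypothetical stand-in for the question's proc.
ankh = proc { |a, b| a + b }

# The & converts the proc into the block that calculation yields to.
x = calculation(5, 6, &ankh)
x #=> 11
```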
Need a simple explanation of the inject method
You can think of the first block argument as an accumulator: the result of each run of the block is stored in the accumulator and then passed to the next execution of the block. In the case of the code shown above, you are defaulting the accumulator, result, to 0. Each run of the block adds the given number to the current total and then stores the result back into the accumulator. The next block call has this new value, adds to it, stores it again, and repeats.
At the end of the process, inject returns the accumulator, which in this case is the sum of all the values in the array, or 10.
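The code from the question is not reproduced here. A minimal sketch consistent with the description above, assuming the array is [1, 2, 3, 4] (any array summing to 10 would do):

```ruby
# Assumed input; the original array is not shown in this answer.
numbers = [1, 2, 3, 4]

# result starts at 0 and takes the block's return value after each step:
# 0 + 1 = 1, 1 + 2 = 3, 3 + 3 = 6, 6 + 4 = 10
sum = numbers.inject(0) do |result, element|
  result + element
end

sum #=> 10
```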
Here's another simple example to create a hash from an array of objects, keyed by their string representation:
[1,"a",Object.new,:hi].inject({}) do |hash, item|
hash[item.to_s] = item
hash
end
In this case, we are defaulting our accumulator to an empty hash, then populating it each time the block executes. Notice we must return the hash as the last line of the block, because the result of the block will be stored back in the accumulator.