Strange, unexpected behavior (disappearing/changing values) when using Hash default value, e.g. Hash.new([])
First, note that this behavior applies to any default value that is subsequently mutated (e.g. hashes and strings), not just arrays. It also applies similarly to the populated elements in Array.new(3, []).
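As a short sketch of the Array.new case: the two-argument form fills every slot with one shared object, while the block form builds a fresh object per slot. The variable names are illustrative only.

```ruby
# Two-argument form: every slot refers to the same array object.
shared = Array.new(3, [])
shared[0] << 'a'
shared   #=> [["a"], ["a"], ["a"]]

# Block form: the block runs once per slot, so each array is distinct.
distinct = Array.new(3) { [] }
distinct[0] << 'a'
distinct #=> [["a"], [], []]
```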
TL;DR: Use Hash.new { |h, k| h[k] = [] } if you want the most idiomatic solution and don’t care why.
What doesn’t work
Why Hash.new([]) doesn’t work
Let’s look more in-depth at why Hash.new([]) doesn’t work:
h = Hash.new([])
h[0] << 'a' #=> ["a"]
h[1] << 'b' #=> ["a", "b"]
h[1] #=> ["a", "b"]
h[0].object_id == h[1].object_id #=> true
h #=> {}
We can see that our default object is being reused and mutated (because it is passed as the one and only default value, the hash has no way of getting a fresh, new default value), but why are there no keys or values in the hash, despite h[1] still giving us a value? Here’s a hint:
h[42] #=> ["a", "b"]
The array returned by each [] call is just the default value, which we’ve been mutating all this time so now contains our new values. Since << doesn’t assign to the hash (there can never be assignment in Ruby without an = present†), we’ve never put anything into our actual hash. Instead we have to use <<= (which is to << as += is to +):
h[2] <<= 'c' #=> ["a", "b", "c"]
h #=> {2=>["a", "b", "c"]}
This is the same as:
h[2] = (h[2] << 'c')
Why Hash.new { [] } doesn’t work
Using Hash.new { [] } solves the problem of reusing and mutating the original default value (as the block given is called each time, returning a new array), but not the assignment problem:
h = Hash.new { [] }
h[0] << 'a' #=> ["a"]
h[1] <<= 'b' #=> ["b"]
h #=> {1=>["b"]}
What does work
The assignment way
If we remember to always use <<=, then Hash.new { [] } is a viable solution, but it’s a bit odd and non-idiomatic (I’ve never seen <<= used in the wild). It’s also prone to subtle bugs if << is inadvertently used.
The mutable way
The documentation for Hash.new states (emphasis my own):
If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.
So we must store the default value in the hash from within the block if we wish to use << instead of <<=:
h = Hash.new { |h, k| h[k] = [] }
h[0] << 'a' #=> ["a"]
h[1] << 'b' #=> ["b"]
h #=> {0=>["a"], 1=>["b"]}
This effectively moves the assignment from our individual calls (which would use <<=) to the block passed to Hash.new, removing the burden of unexpected behavior when using <<.
Note that there is one functional difference between this method and the others: this way assigns the default value upon reading (as the assignment always happens inside the block). For example:
h1 = Hash.new { |h, k| h[k] = [] }
h1[:x]
h1 #=> {:x=>[]}
h2 = Hash.new { [] }
h2[:x]
h2 #=> {}
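If that read-time assignment is unwanted, one option is to check for a key without going through Hash#[]. The sketch below uses Hash#key? and Hash#fetch, neither of which consults the default block; the variable names are illustrative.

```ruby
h = Hash.new { |hash, key| hash[key] = [] }

present = h.key?(:x)        #=> false; key? never consults the default
fetched = h.fetch(:x, nil)  #=> nil; fetch ignores the default block too
before  = h.keys            #=> []

h[:x]                       # Hash#[] runs the block, which stores []
after = h.keys              #=> [:x]
```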
The immutable way
You may be wondering why Hash.new([]) doesn’t work while Hash.new(0) works just fine. The key is that Numerics in Ruby are immutable, so we naturally never end up mutating them in-place. If we treated our default value as immutable, we could use Hash.new([]) just fine too:
h = Hash.new([].freeze)
h[0] += ['a'] #=> ["a"]
h[1] += ['b'] #=> ["b"]
h[2] #=> []
h #=> {0=>["a"], 1=>["b"]}
However, note that ([].freeze + [].freeze).frozen? == false. So, if you want to ensure that the immutability is preserved throughout, then you must take care to re-freeze the new object.
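A minimal sketch of re-freezing after each update, assuming Ruby 2.5 or later (where mutating a frozen object raises FrozenError; older versions raise RuntimeError instead):

```ruby
h = Hash.new([].freeze)

# Array#+ returns a new, unfrozen array, so freeze the result before storing it.
h[0] = (h[0] + ['a']).freeze
h[0] = (h[0] + ['b']).freeze

h[0]          #=> ["a", "b"]
h[0].frozen?  #=> true
h[1]          #=> [] (the shared default is still empty)

# An accidental in-place mutation now fails loudly instead of silently
# changing shared state.
error =
  begin
    h[1] << 'oops'
    nil
  rescue FrozenError => e
    e
  end
error.class   #=> FrozenError
```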
Conclusion
Of all the ways, I personally prefer “the immutable way”: immutability generally makes reasoning about things much simpler. It is, after all, the only method that has no possibility of hidden or subtle unexpected behavior. However, the most common and idiomatic way is “the mutable way”.
As a final aside, this behavior of Hash default values is noted in Ruby Koans.
† This isn’t strictly true: methods like instance_variable_set bypass this, but they must exist for metaprogramming since the l-value in = cannot be dynamic.
Why `Hash.new { |h, k| [] }` causes `hash[char] << idx` to stop working
I can fix the issue by either initializing the hash using Hash.new {|h,k| h[k] = []} or [...]
That's actually the correct way to use the block variant. From the docs for Hash.new:
"It is the block’s responsibility to store the value in the hash if required."
The block is called when accessing missing values via Hash#[]. If you don't store the value, the hash remains empty. So the next time you attempt to access this value, it will still be missing and the block will be called again:
hash = Hash.new { rand }
hash[:foo] #=> 0.9548960551853385
hash[:foo] #=> 0.7535154376706064
hash[:foo] #=> 0.007113200178872958
hash[:foo] #=> 0.07621008793193496
hash #=> {}
The same happens for your Hash.new { [] } attempt: you'll get a fresh array every time you call hash[char] because the array is never stored in the hash. Your code is equivalent to:
def dupe_indices(arr)
  hash = {}
  arr.each.with_index do |char, idx|
    [] << idx # <- obviously doesn't do anything useful
  end
  return hash
end
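For contrast, here is a sketch of the version with the storing block. The original question's expected output is not shown here, so the goal (map every element to the list of indices where it occurs) is an assumption.

```ruby
# Assumed goal: map every element to the list of indices where it occurs.
def dupe_indices(arr)
  # The block stores the fresh array, so later << calls reach the same object.
  hash = Hash.new { |h, k| h[k] = [] }
  arr.each_with_index do |char, idx|
    hash[char] << idx
  end
  hash
end

dupe_indices(%w[a b a c b]) #=> {"a"=>[0, 2], "b"=>[1, 4], "c"=>[3]}
```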
When Hash's default value is set as Hash, it is giving unexpected output
When you use the default value, no key/value is set. The default is simply returned instead of nil.
I think you're imagining that it works like this, where the default is assigned to the key being accessed, as with ||=.
default = {x: 0, y: 0}
foo = Hash.new
foo['bar'] ||= default
foo['bar'][:x] += 1
Instead, it works like this, where the default is returned when there is no key.
default = {x: 0, y: 0}
foo = Hash.new
val = foo['bar'] || default
val[:x] += 1
Put another way, you're expecting this.
def [](key)
@data[key] ||= default
end
But it works like this.
def [](key)
@data[key] || default
end
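To make that model concrete, here is a toy sketch. TinyHash is a hypothetical class written for illustration, not how Hash is actually implemented, and it uses key? rather than || so that stored nil or false values are still returned.

```ruby
# TinyHash is a hypothetical toy class, not Ruby's real Hash implementation.
class TinyHash
  def initialize(default = nil)
    @data = {}
    @default = default
  end

  # Reading never stores anything; a missing key just yields the default.
  def [](key)
    @data.key?(key) ? @data[key] : @default
  end

  # Only explicit assignment adds a key.
  def []=(key, value)
    @data[key] = value
  end

  def keys
    @data.keys
  end
end

foo = TinyHash.new({ x: 0, y: 0 })
foo['bar'][:x] += 1   # mutates the shared default, stores nothing
foo.keys              #=> []
foo['baz']            #=> {:x=>1, :y=>0}
```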
But this behaviour appears to change if I provide, say, an integer instead of a Hash as the default value. For instance, if I do foo = Hash.new(1), then foo['bar'] += 1 the behaviour is what I would expect. foo is not empty, and the default value has not changed. – aardvarkk
foo['bar'] += 1
is really shorthand for
default = foo['bar'] # fetch the default
foo['bar'] = default + 1 # sets 'bar' on foo
Note that it calls []= on foo.
foo['bar'][:x] += 1
is shorthand for...
default = foo['bar'] # fetch the default value
val = default[:x] # fetch :x from the default
default[:x] = val + 1 # set :x on the default value
Note that it calls []= on the default value, not foo.
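The two expansions can be checked side by side. In this sketch the names counts and points are illustrative:

```ruby
counts = Hash.new(1)
counts['bar'] += 1        # counts['bar'] = counts['bar'] + 1, calls []= on counts
counts                    #=> {"bar"=>2}
counts.default            #=> 1

points = Hash.new({ x: 0, y: 0 })
points['bar'][:x] += 1    # calls []= on the default hash, never on points
points                    #=> {}
points.default            #=> {:x=>1, :y=>0}
```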
strange Hash behavior for nested assignments with defaults
new(obj) → new_hash
If obj is specified, this single object will be used for all default values.
So Hash.new([]) holds a single default Array object. With b[:a][:b] << 'hello', you are appending "hello" to that default Array. The default value is what gets returned when a key doesn't exist in the Hash.
Don't think you are adding keys to the Hash objects with the line b[:a][:b] << 'hello'.
b[:a] returns the default Hash object, which is Hash.new([]). On this Hash object you then call Hash#[] with the key :b, but since :b is a nonexistent key, it returns the default Array object.
That's why b, b.size and b.keys all show that the Hash is empty.
Finally:
Why is that I can access the value stored at b[:a][:b] yet b has a size of 0 and no keys?
Because you added the value "hello" to the default Array, as I mentioned above. That value is what you get back when you evaluate b[:a][:b].
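A sketch of the situation described above, assuming b was created as Hash.new(Hash.new([])) (inferred from this answer, since the question's code is not shown), followed by one possible fix that uses a storing block at both levels:

```ruby
b = Hash.new(Hash.new([]))
b[:a][:b] << 'hello'
b[:a][:b]   #=> ["hello"]
b.size      #=> 0
b[:x][:y]   #=> ["hello"] (same default array, reached via the same default hash)

# Block form at both levels: each missing key stores its own fresh object.
nested = Hash.new { |outer, k| outer[k] = Hash.new { |inner, k2| inner[k2] = [] } }
nested[:a][:b] << 'hello'
nested[:a]      #=> {:b=>["hello"]}
nested[:x][:y]  #=> []
```

Note that reading nested[:x][:y] also stores :x and :y, which is the usual trade-off of the storing block.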
Hash default-value creates all of the same instance
If you give Hash.new an object (like another Hash.new), this very object is the default value. It is shared across different keys.
default = []
hash = Hash.new(default)
hash[:one] << 1
# now default is [1] !
You want to use Hash.new with a block, so that something new happens each time a key is not found. For example:
h = Hash.new { |hash,new_key| hash[new_key] = {} }
There is a great explanation of this, e.g. in "Ruby hash default value behavior". Your question is kind of a duplicate.
Also, as we figured out in the comments, you probably wanted to compare the object_id and not the hash value.
first_hash = {}
second_hash = {}
# .hash same, but different objects!
puts "initial, hash:"
puts first_hash.hash == second_hash.hash ? " - same" : " - differs"
puts "initial, object_id"
puts first_hash.object_id == second_hash.object_id ? " - same" : " - differs"
puts
# Change the world
# .hash different, and still different objects.
first_hash[:for] = "better"
puts "better world now, hash:"
puts first_hash.hash == second_hash.hash ? " - same" : " - differs"
puts "better world now, object_id"
puts first_hash.object_id == second_hash.object_id ? " - same" : " - differs"
Hash default value not being used
When you did a[:key] << 2, you slipped that empty array default value out and added 2 to it (modifying the actual array, not the reference) without letting the hash object a know that you had changed anything. You modified the object that a was using as a default, so you will see this as well:
p a[:wat] #=> [2]
p a[:anything] #=> [2]
In the second example, you made a new array and used b[:key]=, which tells b that it has a value under that key.
Try this if you want the best of both worlds:
c = Hash.new([])
c[:key] += [2]
This will access c[:key], make a new array with +, and assign it back to c[:key].
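A quick check that the default stays untouched with this approach (c is the hash from the snippet above; the extra key names are illustrative):

```ruby
c = Hash.new([])
c[:key] += [2]    # c[:key] = c[:key] + [2]
c[:key] += [3]

c                 #=> {:key=>[2, 3]}
c.default         #=> [] (Array#+ built new arrays; the default was never mutated)
c[:other]         #=> []
```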
Running Hash.new([]) does what you expect but not in the way you expect it
foo = Hash.new([]) sets the default value for a nonexistent key to an array. Here, in foo[:bar] << 'Item 1', the :bar key doesn't exist, so foo hands back the default array, to which you add a new element. By doing so you mutate the default value, because the array is given to you by reference.
> foo = Hash.new([])
=> {}
> foo.default
=> []
If you access any undefined key on your hash, you'll get this array:
> foo[:bar] << 'Item 1'
=> ["Item 1"]
> foo[:foo]
=> ["Item 1"]
To achieve your goal you should return a new array every time. You can do that by passing a block to Hash.new; the block is executed every time you access an undefined key:
> foo = Hash.new { |hash, key| hash[key] = [] }
=> {}
> foo[:bar] << 'Item 1'
=> ["Item 1"]
> foo[:bar]
=> ["Item 1"]
> foo.keys
=> [:bar]
> foo[:foo]
=> []
> foo[:foo] << 'Item 2'
=> ["Item 2"]
> foo
=> {:bar=>["Item 1"], :foo=>["Item 2"]}
Here is the documentation.
Modifying the default hash value
Hash's default value doesn't work like you're expecting it to. When you say h[k], the process goes like this:
1. If we have a k key, return its value.
2. If we have a default value for the Hash, return that default value.
3. If we have a block for providing default values, execute the block and return its return value.
Note that (2) and (3) say nothing at all about inserting k into the Hash. The default value essentially turns h[k] into this:
h.has_key?(k) ? h[k] : the_default_value
So simply accessing a non-existent key and getting the default value back won't add the missing key to the Hash.
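The lookup steps above can be observed directly. In this sketch, with_value and with_block are illustrative names:

```ruby
with_value = Hash.new(:fallback)
with_value[:missing]        #=> :fallback
with_value.key?(:missing)   #=> false

with_block = Hash.new { |_hash, key| "no #{key}" }
with_block[:missing]        #=> "no missing"
with_block.key?(:missing)   #=> false

# A stored key always wins over the default, even when its value is nil.
with_value[:present] = nil
with_value[:present]        #=> nil
```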
Furthermore, anything of the form:
Hash.new([ ... ])
# or
Hash.new({ ... })
is almost always a mistake as you'll be sharing exactly the same default Array or Hash for all default values. For example, if you do this:
h = Hash.new(['a'])
h[:k].push('b')
Then h[:i], h[:j], ... will all return ['a', 'b'] and that's rarely what you want.
I think you're looking for the block form of the default value:
h = Hash.new { |h, k| h[k] = [ 'alright' ] }
That will do two things:
- Accessing a non-existent key will add that key to the Hash and it will have the provided Array as its value.
- All of the default values will be distinct objects so altering one will not alter the rest.