Initializing a Hash with Empty Array Unexpected Behaviour

Use the block form, and store the new array from inside the block:

a = Hash.new { |h, k| h[k] = [] }
a[1] << "asd"
a # => {1=>["asd"]}

The following excerpt from the Hash::new documentation explains why you didn't get the desired result.

new(obj) → new_hash

If obj is specified, this single object will be used for all default values.

new {|hash, key| block } → new_hash

If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.

You can verify this by hand:

a = Hash.new([])
a[1].object_id # => 2160424560
a[2].object_id # => 2160424560

With this style of Hash creation, every access to an unknown key returns the same default object. Now the other way, the block form:

b = Hash.new { |h, k| [] }
b[2].object_id # => 2168989980
b[1].object_id # => 2168933180

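So, with the block form, every access to an unknown key returns a new Array object. Note that this particular block does not store the array in the hash, so nothing is kept between accesses.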

Strange, unexpected behavior (disappearing/changing values) when using Hash default value, e.g. Hash.new([])

First, note that this behavior applies to any default value that is subsequently mutated (e.g. hashes and strings), not just arrays. It also applies similarly to the populated elements in Array.new(3, []).
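To make that generalization concrete, here is a small sketch with a String default and with Array.new(3, []). The variable names (labels, shared, distinct) are just illustrative, and the unary + on the string literal is only there so the example still runs where string literals are frozen.

```ruby
# A String default is shared and mutated just like an Array default.
labels = Hash.new(+"")   # unary + gives an unfrozen String
labels[:a] << "x"
labels[:b] << "y"
labels[:c]               # => "xy" (the one shared default)
labels.size              # => 0    (nothing was ever assigned)

# Array.new(size, obj) fills every slot with the same object.
shared = Array.new(3, [])
shared[0] << 1
shared                   # => [[1], [1], [1]]

# The block form builds a separate object for each slot.
distinct = Array.new(3) { [] }
distinct[0] << 1
distinct                 # => [[1], [], []]
```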

TL;DR: Use Hash.new { |h, k| h[k] = [] } if you want the most idiomatic solution and don’t care why.



What doesn’t work

Why Hash.new([]) doesn’t work

Let’s look more in-depth at why Hash.new([]) doesn’t work:

h = Hash.new([])
h[0] << 'a' #=> ["a"]
h[1] << 'b' #=> ["a", "b"]
h[1] #=> ["a", "b"]

h[0].object_id == h[1].object_id #=> true
h #=> {}

We can see that our default object is being reused and mutated. It was passed as the one and only default value, so the hash has no way of getting a fresh one. But why are there no keys or values in the hash, despite h[1] still giving us a value? Here’s a hint:

h[42]  #=> ["a", "b"]

The array returned by each [] call is just the default value, which we’ve been mutating all this time so now contains our new values. Since << doesn’t assign to the hash (there can never be assignment in Ruby without an = present), we’ve never put anything into our actual hash. Instead we have to use <<= (which is to << as += is to +):

h[2] <<= 'c'  #=> ["a", "b", "c"]
h #=> {2=>["a", "b", "c"]}

This is the same as:

h[2] = (h[2] << 'c')

Why Hash.new { [] } doesn’t work

Using Hash.new { [] } solves the problem of reusing and mutating the original default value (as the block given is called each time, returning a new array), but not the assignment problem:

h = Hash.new { [] }
h[0] << 'a' #=> ["a"]
h[1] <<= 'b' #=> ["b"]
h #=> {1=>["b"]}


What does work

The assignment way

If we remember to always use <<=, then Hash.new { [] } is a viable solution, but it’s a bit odd and non-idiomatic (I’ve never seen <<= used in the wild). It’s also prone to subtle bugs if << is inadvertently used.

The mutable way

The documentation for Hash.new states (the last sentence is the part that matters here):

If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.

So we must store the default value in the hash from within the block if we wish to use << instead of <<=:

h = Hash.new { |h, k| h[k] = [] }
h[0] << 'a' #=> ["a"]
h[1] << 'b' #=> ["b"]
h #=> {0=>["a"], 1=>["b"]}

This effectively moves the assignment from our individual calls (which would use <<=) to the block passed to Hash.new, removing the burden of unexpected behavior when using <<.

Note that there is one functional difference between this method and the others: this way assigns the default value upon reading (as the assignment always happens inside the block). For example:

h1 = Hash.new { |h, k| h[k] = [] }
h1[:x]
h1 #=> {:x=>[]}

h2 = Hash.new { [] }
h2[:x]
h2 #=> {}
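If the insert-on-read side effect is unwanted in a particular spot, one option (not part of the original answer) is to check for the key without going through []. As far as I know, key? and fetch never run the default block, so a sketch could look like this:

```ruby
h = Hash.new { |hash, key| hash[key] = [] }

h.key?(:x)         # => false, key? does not run the default block
h.fetch(:x, nil)   # => nil, fetch uses its own fallback instead
h.size             # => 0, still empty

h[:x]              # => [], and only now is :x stored
h.keys             # => [:x]
```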

The immutable way

You may be wondering why Hash.new([]) doesn’t work while Hash.new(0) works just fine. The key is that Numerics in Ruby are immutable, so we naturally never end up mutating them in-place. If we treated our default value as immutable, we could use Hash.new([]) just fine too:

h = Hash.new([].freeze)
h[0] += ['a'] #=> ["a"]
h[1] += ['b'] #=> ["b"]
h[2] #=> []
h #=> {0=>["a"], 1=>["b"]}

However, note that ([].freeze + [].freeze).frozen? == false. So, if you want to ensure that the immutability is preserved throughout, then you must take care to re-freeze the new object.
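A rough sketch of what re-freezing might look like, assuming Ruby 2.5 or later (where FrozenError exists). It also shows that the frozen default turns an accidental << into an error instead of a silent shared mutation:

```ruby
h = Hash.new([].freeze)

begin
  h[0] << 'a'                    # mutating the frozen default raises
rescue FrozenError => e          # FrozenError assumes Ruby 2.5+
  error_class = e.class
end

h[0] = (h[0] + ['a']).freeze     # re-freeze the newly built array
h[0].frozen?                     # => true

h[1] += ['b']                    # plain += leaves the new array unfrozen
h[1].frozen?                     # => false

h[2]                             # => [] (the default is untouched)
```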



Conclusion

Of all the ways, I personally prefer “the immutable way”—immutability generally makes reasoning about things much simpler. It is, after all, the only method that has no possibility of hidden or subtle unexpected behavior. However, the most common and idiomatic way is “the mutable way”.

As a final aside, this behavior of Hash default values is noted in Ruby Koans.


A note on the earlier claim that there can never be assignment in Ruby without an = present: this isn’t strictly true. Methods like instance_variable_set bypass it, but they must exist for metaprogramming, since the l-value in = cannot be dynamic.

Can't use an array as default values for Ruby Hash?

Try the following instead:

hash = Hash.new{|h, k| h[k] = []}
hash['a'] << 1 # => [1]
hash['b'] << 2 # => [2]

You got unexpected results because you specified an empty array as the default value, and that same array is returned for every missing key; no copy is made. The right way is to store a new empty array for each missing key, as in the code above.

Creating a Hash with values as arrays and default value as empty array

Lakshmi is right. When you created the Hash using Hash.new([]), you created one array object.

Hence, the same array is returned for every missing key in the Hash.

That is why, if the shared array is edited, the change is reflected across all the missing keys.
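One way to see the shared array directly is Hash#default, which returns the default object itself. A minimal illustration:

```ruby
h = Hash.new([])
h[:a] << 1
h[:b] << 2

h.default                     # => [1, 2]
h.default.equal?(h[:zzz])     # => true, same object for any missing key
h.size                        # => 0, no key was ever stored
```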

Using:

Hash.new { |h, k| h[k] = [] }

creates and assigns a new array for each missing key in the Hash, so that every key gets its own unique object.

How can I initialize an Array inside a Hash in Ruby

@my_hash = Hash.new(Array.new)

This creates exactly one array object, which is returned every time a key is not found. Since you only ever mutate that array and never create a new one, all your keys map to the same array.

What you want to do is:

@my_hash = Hash.new {|h,k| h[k] = Array.new }

or simply

@my_hash = Hash.new {|h,k| h[k] = [] }

Passing a block to Hash.new differs from simply passing an argument in 2 ways:

  1. The block is executed every time a key is not found. Thus you'll get a new array each time. In the version with an argument, that argument is evaluated once (before new is called) and the result of that is returned every time.

  2. By doing h[k] = you actually insert the key into the hash. If you don't do this just accessing @my_hash[some_key] won't actually cause some_key to be inserted in the hash.
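Both differences can be observed with a small counter. This is only a sketch; make_array and calls are hypothetical helpers introduced for the demonstration.

```ruby
calls = 0
make_array = lambda { calls += 1; [] }   # counts how often an array is built

# Argument form: make_array.call runs once, before Hash.new is called.
with_arg = Hash.new(make_array.call)
with_arg[:a]
with_arg[:b]
calls_with_arg = calls                   # => 1

# Block form: the block runs on each miss and stores the result.
calls = 0
with_block = Hash.new { |h, k| h[k] = make_array.call }
with_block[:a]
with_block[:b]
with_block[:a]                           # already stored, block is not run again
calls_with_block = calls                 # => 2

with_arg.key?(:a)                        # => false
with_block.key?(:a)                      # => true
```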

Hash default value not being used

When you did a[:key] << 2, you pulled out the default empty array and appended 2 to it (mutating the array object itself, not just a reference to it) without the hash a ever learning that anything had changed. You modified the object that a uses as its default, so you will see this as well:

p a[:wat] #=> [2]
p a[:anything] #=> [2]

In the second example, you made a new array and used b[:key]=, which tells b that it has a value under that key.

Try this if you want the best of both worlds:

c = Hash.new([])
c[:key] += [2]

This will access c[:key] and make a new array with + and reassign it.
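Extending that snippet a little (the extra keys are only for illustration) shows that each key ends up with its own array while the default stays empty:

```ruby
c = Hash.new([])
c[:key]   += [2]     # builds a new array and assigns it to :key
c[:other] += [3]

c[:key]                        # => [2]
c[:other]                      # => [3]
c[:missing]                    # => [] (default was never mutated)
c.keys                         # => [:key, :other]
c[:key].equal?(c[:other])      # => false
```

This only holds as long as every write goes through += (or another assigning form); a stray c[:missing] << 9 would still mutate the shared default.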

What's going on with this hash?

Note that all arguments, unlike blocks, are evaluated only once and prior to the method call.

  • In step 1, you assign a particular array as the default value. That one array instance serves as the default for h. Because the default was not set with a block, reading a missing key does not store anything in the hash.
  • In step 2, the default array instance is returned because 'a' is not a key of the hash. You are modifying this array instance.
  • Steps 3 and 5 show the same thing: since the default of h was not set with a block, a missing key that is read is not added to the hash.
  • In step 4, you are just reading the default value.

Compare your code with this:

h = Hash.new{|h, k| h[k] = []}

which will generate a new array each time a missing key is read, and will assign that key-value pair to the hash.


