Why Does Hash.new({}) Hide Hash Members

Why does Hash.new({}) hide hash members?

It is expected behaviour (across all Ruby versions). If you experiment a bit further with a = Hash.new({}), you'll see that you always access the same hash, no matter which key you use:

>> a[:a][:b] = 1
=> 1
>> a[:c][:d] = 2
=> 2
>> a[:d]
=> {:b=>1, :d=>2}

The way Hash.new with a default argument works is: if you do hash[key], it checks whether that key exists in the hash. If it does, it returns the value for that key. If not, it returns the default value. It does not add the key to the hash, and it returns the same default object (not a copy) every time.
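A minimal sketch of that lookup rule, using a hypothetical variable named default so the shared object can be inspected directly:

```ruby
default = {}
a = Hash.new(default)

# Reading a missing key hands back the default object itself, not a copy.
same_object = a[:x].equal?(default) && a[:y].equal?(default)

# Mutating what a[:x] returns mutates the shared default...
a[:x][:b] = 1

# ...but nothing is ever stored under :x, so the hash still looks empty.
puts same_object                # true
puts a.key?(:x)                 # false
puts a.size                     # 0
puts a.default == { b: 1 }      # true
```

The data is not lost; it lives in a.default, which is why it shows up under every missing key.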

To get what you want, specify a default block instead. The block is executed every time you access a key that is not in the hash. Inside the block you can create a new Hash and set the key to "point" to that hash. Like so:

Hash.new { |h, k| h[k] = {} }
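For comparison, here is a short sketch of the same experiment as above run against the block form. Each missing key now gets its own stored hash:

```ruby
a = Hash.new { |h, k| h[k] = {} }

a[:a][:b] = 1
a[:c][:d] = 2

puts a[:a] == { b: 1 }      # true
puts a[:c] == { d: 2 }      # true
puts a.keys.inspect         # [:a, :c]
puts a[:a].equal?(a[:c])    # false
```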

Strange, unexpected behavior (disappearing/changing values) when using Hash default value, e.g. Hash.new([])

First, note that this behavior applies to any default value that is subsequently mutated (e.g. hashes and strings), not just arrays. It also applies similarly to the populated elements in Array.new(3, []).
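The Array.new case can be seen in a few lines. This is only an illustration, with the block form of Array.new shown as the usual way to get independent elements:

```ruby
# One array object, referenced three times.
shared = Array.new(3, [])
shared[0] << 'x'
p shared    # [["x"], ["x"], ["x"]]

# The block runs once per slot, so each element is a separate array.
separate = Array.new(3) { [] }
separate[0] << 'x'
p separate  # [["x"], [], []]
```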

TL;DR: Use Hash.new { |h, k| h[k] = [] } if you want the most idiomatic solution and don’t care why.

What doesn’t work

Why `Hash.new([])` doesn’t work

Let’s look more in-depth at why Hash.new([]) doesn’t work:

h = Hash.new([])
h[0] << 'a'  #=> ["a"]
h[1] << 'b'  #=> ["a", "b"]
h[1]         #=> ["a", "b"]

h[0].object_id == h[1].object_id  #=> true
h  #=> {}

We can see that our default object is being reused and mutated (it is passed as the one and only default value, so the hash has no way of getting a fresh one), but why are there no keys or values in the hash, despite h[1] still giving us a value? Here’s a hint:

h[42]  #=> ["a", "b"]

The array returned by each [] call is just the default value, which we’ve been mutating all this time so now contains our new values. Since << doesn’t assign to the hash (there can never be assignment in Ruby without an = present†), we’ve never put anything into our actual hash. Instead we have to use <<= (which is to << as += is to +):

h[2] <<= 'c'  #=> ["a", "b", "c"]
h             #=> {2=>["a", "b", "c"]}

This is the same as:

h[2] = (h[2] << 'c')

Why `Hash.new { [] }` doesn’t work

Using Hash.new { [] } solves the problem of reusing and mutating the original default value (as the block given is called each time, returning a new array), but not the assignment problem:

h = Hash.new { [] }
h[0] << 'a'   #=> ["a"]
h[1] <<= 'b'  #=> ["b"]
h             #=> {1=>["b"]}

What does work

The assignment way

If we remember to always use <<=, then Hash.new { [] } is a viable solution, but it’s a bit odd and non-idiomatic (I’ve never seen <<= used in the wild). It’s also prone to subtle bugs if << is inadvertently used.

The mutable way

The documentation for Hash.new states (emphasis my own):

If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.

So we must store the default value in the hash from within the block if we wish to use << instead of <<=:

h = Hash.new { |h, k| h[k] = [] }
h[0] << 'a'  #=> ["a"]
h[1] << 'b'  #=> ["b"]
h            #=> {0=>["a"], 1=>["b"]}

This effectively moves the assignment from our individual calls (which would use <<=) to the block passed to Hash.new, removing the burden of unexpected behavior when using <<.

Note that there is one functional difference between this method and the others: this way assigns the default value upon reading (as the assignment always happens inside the block). For example:

h1 = Hash.new { |h, k| h[k] = [] }
h1[:x]
h1  #=> {:x=>[]}

h2 = Hash.new { [] }
h2[:x]
h2  #=> {}

The immutable way

You may be wondering why Hash.new([]) doesn’t work while Hash.new(0) works just fine. The key is that Numerics in Ruby are immutable, so we naturally never end up mutating them in-place. If we treated our default value as immutable, we could use Hash.new([]) just fine too:

h = Hash.new([].freeze)
h[0] += ['a']  #=> ["a"]
h[1] += ['b']  #=> ["b"]
h[2]           #=> []
h              #=> {0=>["a"], 1=>["b"]}

However, note that ([].freeze + [].freeze).frozen? == false. So, if you want to ensure that the immutability is preserved throughout, then you must take care to re-freeze the new object.
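A rough sketch of what re-freezing could look like. Freezing each new array before storing it is my own illustration here, not something the original requires:

```ruby
h = Hash.new([].freeze)

# + builds a new, unfrozen array, so freeze it again before storing it.
h[0] = (h[0] + ['a']).freeze
h[0] = (h[0] + ['b']).freeze

puts h[0].inspect   # ["a", "b"]
puts h[0].frozen?   # true

# An accidental << on a missing key now fails loudly instead of silently
# mutating the shared default. FrozenError (Ruby 2.5+) is a RuntimeError.
rejected = false
begin
  h[5] << 'oops'
rescue RuntimeError
  rejected = true
end
puts rejected       # true
```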

Conclusion

Of all the ways, I personally prefer “the immutable way”—immutability generally makes reasoning about things much simpler. It is, after all, the only method that has no possibility of hidden or subtle unexpected behavior. However, the most common and idiomatic way is “the mutable way”.

As a final aside, this behavior of Hash default values is noted in Ruby Koans.

† This isn’t strictly true: methods like instance_variable_set bypass this, but they must exist for metaprogramming since the l-value in = cannot be dynamic.

Hash default value not being used

When you did a[:key] << 2, you slipped that empty array default value out and added 2 to it (modifying the actual array, not the reference) without letting the hash object a know that you had changed anything. You modified the object that a was using as a default, so you will see this as well:

p a[:wat] #=> [2]
p a[:anything] #=> [2]

In the second example, you made a new array and used b[:key] =, which tells b that it has a value under that key.

Try this if you want the best of both worlds:

c = Hash.new([])
c[:key] += [2]

This will access c[:key] and make a new array with + and reassign it.
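Spelled out a little further, as a quick check that the default itself is left alone:

```ruby
c = Hash.new([])

c[:key] += [2]   # shorthand for c[:key] = c[:key] + [2]
c[:key] += [3]

puts c[:key].inspect    # [2, 3]
puts c[:other].inspect  # []
puts c.keys.inspect     # [:key]
puts c.default.empty?   # true
```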

How to remove a key from Hash and get the remaining hash in Ruby/Rails?

Rails has an except/except! method that returns the hash with those keys removed. If you're already using Rails, there's no sense in creating your own version of this.

class Hash
  # Returns a hash that includes everything but the given keys.
  #   hash = { a: true, b: false, c: nil}
  #   hash.except(:c) # => { a: true, b: false}
  #   hash # => { a: true, b: false, c: nil}
  #
  # This is useful for limiting a set of parameters to everything but a few known toggles:
  #   @person.update(params[:person].except(:admin))
  def except(*keys)
    dup.except!(*keys)
  end

  # Replaces the hash without the given keys.
  #   hash = { a: true, b: false, c: nil}
  #   hash.except!(:c) # => { a: true, b: false}
  #   hash # => { a: true, b: false }
  def except!(*keys)
    keys.each { |key| delete(key) }
    self
  end
end
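If you are not on Rails, note that Hash#except (the non-bang version) is built into Ruby itself since 3.0. For older versions, a sketch using only long-standing core methods might look like this:

```ruby
hash = { a: true, b: false, c: nil }

# reject returns a new hash and leaves the receiver untouched.
remaining = hash.reject { |key, _value| key == :c }
puts remaining == { a: true, b: false }   # true
puts hash.size                            # 3

# Hash#delete, by contrast, returns the removed value, not the remaining hash.
copy = hash.dup
removed = copy.delete(:a)
puts removed.inspect     # true
puts copy.keys.inspect   # [:b, :c]
```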

Is this correct behaviour for a Ruby hash with a default value?

What's going on? Ruby's hiding data (1.9.3p125)

Ruby hides neither data nor its docs.

The default value you pass into the Hash constructor is returned whenever the key is not found in the hash. But this default value is never actually stored in the hash on its own.

To get what you want, use the Hash constructor with a block and store the default value in the hash yourself (on both levels of your nested hash):

hash = Hash.new { |outer, key| outer[key] = Hash.new { |inner, k| inner[k] = [] } }

hash[1][2] << 3

p hash[1][2]  #=> [3]
p hash        #=> {1=>{2=>[3]}}
p hash.keys   #=> [1]
p hash.values #=> [{2=>[3]}]
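If the nesting depth is not known in advance (an assumption that goes beyond the two-level case above), one common idiom is to reuse the hash's own default_proc so every level autovivifies the same way:

```ruby
# Each missing key gets a new hash that inherits the same default block.
nested = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }

nested[:a][:b][:c] = 1

puts nested == { a: { b: { c: 1 } } }   # true
puts nested.keys.inspect                # [:a]
```

The trade-off is the same as before: merely reading a missing key creates an entry for it.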

How to optimize mapping hash that contains similar keys and values?

Use Symbols instead of constants.
Don't expose the mapping.

Constants in Ruby are mostly about information hiding. For example, if the key changes from consumer1 to consumer_1, you're ok as long as everything accesses the Hash with CONSUMER_1_TYPE. Why risk it?

Instead, fully hide the Hash. Now that it's hidden, constants are not necessary. Use Symbols.

If all the values are going to be the same, put them into their own methods.

def classification_attributes(product_type)
  product_type_mapping[product_type]
end

private def consumer_config
  { abc: abc, vpn: vpn, lbc: lbc }
end

private def industrial_config
  { vpn: vpn, htt: htt, bnn: bnn }
end

private def services_config
  { dhy: dhy, rtt: rtt, abc: abc }
end

private def product_type_mapping
  {
     consumer1: consumer_config,
     consumer2: consumer_config,
     consumer3: consumer_config,
     industrial1: industrial_config,
     industrial2: industrial_config,
     industrial3: industrial_config,
     services1: services_config,
     services2: services_config,
     services3: services_config
  }
end

That's about as far as I can say without more context. If there's that much redundancy you may be able to split product_type into type and subtype.
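A purely hypothetical sketch of the type/subtype split. The ProductClassifier class, the split helper, the letters-then-digits key format and the placeholder config values are all assumptions, since the question's real data isn't shown:

```ruby
class ProductClassifier
  # Looks up attributes by type only; the subtype is parsed but unused here.
  def classification_attributes(product_type)
    type, _subtype = split(product_type)
    config_by_type[type]
  end

  private

  # Assumes keys look like :consumer1, i.e. letters followed by digits.
  def split(product_type)
    match = product_type.to_s.match(/\A([a-z]+)(\d+)\z/)
    return [nil, nil] unless match

    [match[1].to_sym, match[2].to_i]
  end

  # Placeholder values standing in for the real configuration.
  def config_by_type
    {
      consumer:   { abc: 'abc', vpn: 'vpn', lbc: 'lbc' },
      industrial: { vpn: 'vpn', htt: 'htt', bnn: 'bnn' },
      services:   { dhy: 'dhy', rtt: 'rtt', abc: 'abc' }
    }
  end
end

classifier = ProductClassifier.new
puts classifier.classification_attributes(:consumer2) == { abc: 'abc', vpn: 'vpn', lbc: 'lbc' }  # true
puts classifier.classification_attributes(:unknown).inspect                                      # nil
```

With this shape, adding a consumer4 needs no new mapping entry, but it only works if every subtype of a type really shares one config.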

Consider moving product_type_mapping into config/application.rb, plus any other related configurations. This keeps the application configuration in one place, not scattered around in various classes.

module YourApp
  class Application < Rails::Application
    config.x.consumer_config = { abc: abc, vpn: vpn, lbc: lbc }.freeze
    config.x.industrial_config = { vpn: vpn, htt: htt, bnn: bnn }.freeze
    config.x.services_config = { dhy: dhy, rtt: rtt, abc: abc }.freeze

    config.x.product_type_mapping = {
      consumer1: config.x.consumer_config,
      consumer2: config.x.consumer_config,
      consumer3: config.x.consumer_config,
      industrial1: config.x.industrial_config,
      industrial2: config.x.industrial_config,
      industrial3: config.x.industrial_config,
      services1: config.x.services_config,
      services2: config.x.services_config,
      services3: config.x.services_config
    }.freeze
  end
end

# in your class...

def classification_attributes(product_type)
  Rails.configuration.x.product_type_mapping[product_type]
end

How to avoid NoMethodError for missing elements in nested hashes, without repeated nil checks?

Ruby 2.3.0 introduced a new method called dig on both Hash and Array that solves this problem entirely.

name = params.dig(:company, :owner, :name)

It returns nil if the key is missing at any level.
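A small illustration with a made-up params hash (the keys and values are invented for the example). Note that dig only protects against missing keys: if an intermediate value exists but cannot be dug into, it raises TypeError.

```ruby
params = { company: { owner: { name: 'Ada' }, offices: [{ city: 'Oslo' }] } }

puts params.dig(:company, :owner, :name)         # Ada
puts params.dig(:company, :offices, 0, :city)    # Oslo (dig also walks arrays)
puts params.dig(:company, :ceo, :name).inspect   # nil

# A String sits where a Hash was expected, so dig raises instead of returning nil.
raised = false
begin
  { a: 'x' }.dig(:a, :b)
rescue TypeError
  raised = true
end
puts raised                                      # true
```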

If you are using a version of Ruby older than 2.3, you can use the ruby_dig gem or implement it yourself:

module RubyDig
  def dig(key, *rest)
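    # Look the key up, treating lookup errors (e.g. a bad index type) as nil.
    value = (self[key] rescue nil)
    # Test for nil explicitly so that a stored false is returned, not dropped.
    if value.nil? || rest.empty?
      value
    elsif value.respond_to?(:dig)
      value.dig(*rest)
    end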
  end
end

if RUBY_VERSION < '2.3'
  Array.send(:include, RubyDig)
  Hash.send(:include, RubyDig)
end
