How to Copy a Hash in Ruby

How do I copy a hash in Ruby?

The clone method is Ruby's standard, built-in way to make a shallow copy:

h0 = {"John" => "Adams", "Thomas" => "Jefferson"}
# => {"John"=>"Adams", "Thomas"=>"Jefferson"}
h1 = h0.clone
# => {"John"=>"Adams", "Thomas"=>"Jefferson"}
h1["John"] = "Smith"
# => "Smith"
h1
# => {"John"=>"Smith", "Thomas"=>"Jefferson"}
h0
# => {"John"=>"Adams", "Thomas"=>"Jefferson"}

Note that the behavior may be overridden:

This method may have class-specific behavior. If so, that behavior will be documented under the #initialize_copy method of the class.
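As a rough illustration of that note, here is a minimal sketch of a class that customizes copying through initialize_copy. The Registry class and its entries attribute are hypothetical names invented for this example; only initialize_copy, dup and clone are Ruby's own.

```ruby
# Hypothetical class, used only to illustrate initialize_copy.
class Registry
  attr_reader :entries

  def initialize
    @entries = {}
  end

  # Ruby calls this on the new copy for both dup and clone;
  # `source` is the object being copied.
  def initialize_copy(source)
    super
    @entries = source.entries.dup # give the copy its own hash
  end
end

original = Registry.new
original.entries["John"] = "Adams"

copy = original.clone
copy.entries["John"] = "Smith"

puts original.entries.inspect # the original's hash is untouched
puts copy.entries.inspect
```

Without the initialize_copy override, both objects would share one @entries hash, because instance variables are copied by reference.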

Duplicating a Hash in Ruby

To @pjs's point, Hash#dup will 'do the right thing' for the top level of a hash. For nested hashes, however, it is not enough: the copy is shallow, so the inner hashes are still shared with the original.
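At the top level, dup and clone behave the same for an ordinary hash. One difference that can matter is the frozen state: clone carries it over, dup does not. A small sketch (variable names are just for illustration):

```ruby
h0 = { "John" => "Adams" }.freeze

cloned = h0.clone # keeps the frozen state of the original
dupped = h0.dup   # returns an unfrozen copy

puts cloned.frozen? # true
puts dupped.frozen? # false

dupped["Thomas"] = "Jefferson" # allowed, and h0 is not affected
puts h0.size                   # 1
```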

If you're open to using a gem, consider using deep_enumerable, a gem I wrote for exactly this purpose (among others).

DEFAULT_HASH = { a:{a:1, b:2}, b:{a:2, b:1} }
dupped = DEFAULT_HASH.dup

dupped[:a][:a] = 'updated'

puts "dupped: #{dupped.inspect}"
puts "DEFAULT_HASH: #{DEFAULT_HASH.inspect}"


require 'deep_enumerable'
DEFAULT_HASH = { a:{a:1, b:2}, b:{a:2, b:1} }

deep_dupped = DEFAULT_HASH.deep_dup
deep_dupped[:a][:a] = 'updated'

puts "deep_dupped: #{deep_dupped.inspect}"
puts "DEFAULT_HASH: #{DEFAULT_HASH.inspect}"

Output:

dupped:       {:a=>{:a=>"updated", :b=>2}, :b=>{:a=>2, :b=>1}}
DEFAULT_HASH: {:a=>{:a=>"updated", :b=>2}, :b=>{:a=>2, :b=>1}}

deep_dupped: {:a=>{:a=>"updated", :b=>2}, :b=>{:a=>2, :b=>1}}
DEFAULT_HASH: {:a=>{:a=>1, :b=>2}, :b=>{:a=>2, :b=>1}}

Alternatively, you could try something along the lines of:

def deep_dup(h)
  Hash[h.map{|k, v| [k,
    if v.is_a?(Hash)
      deep_dup(v)
    else
      v.dup rescue v
    end
  ]}]
end

Note, this last function is nowhere near as well tested as deep_enumerable.
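To make that caveat concrete, here is a quick check of the function (restated compactly so the snippet runs on its own). It handles hashes nested directly inside hashes, but a hash that sits inside an array is only shallow-copied. The sample data is made up for the illustration.

```ruby
# Same logic as the function above, written on one line.
def deep_dup(h)
  Hash[h.map { |k, v| [k, v.is_a?(Hash) ? deep_dup(v) : (v.dup rescue v)] }]
end

original = { a: { a: 1, b: 2 }, list: [{ x: 1 }] }
copy = deep_dup(original)

copy[:a][:a] = "updated"       # nested hash: independent of the original
copy[:list][0][:x] = "updated" # hash inside an array: still shared

puts original[:a][:a].inspect       # 1
puts original[:list][0][:x].inspect # "updated"
```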

Cloning a Hash in Ruby2

A Hash is a collection of keys and values, where the values are references to objects. Duplicating a hash creates a new hash, but only the object references are copied, so the new hash contains the very same value objects. That is why the following behaves as shown:

hash = {1 => 'Some string'} # Strings are mutable
hash2 = hash.clone

hash2[1] #=> 'Some string'
hash2[1].upcase! # modifying the shared object
hash[1] #=> 'SOME STRING' # so it appears modified in both hashes
hash2[1] = 'Other string' # pointing the second hash's key at another object
hash[1] #=> 'SOME STRING' # the original object has not been changed

hash2[2] = 'new value' # adding an object to the copy only
hash[2] #=> nil

If you want to duplicate the referenced objects as well, you need to perform a deep duplication. Rails provides this as the deep_dup method (in the activesupport gem). If you are not using Rails and don't want to install the gem, you can write it like this:

class Hash
  def deep_dup
    Hash[map {|key, value| [key, value.respond_to?(:deep_dup) ? value.deep_dup : begin
      value.dup
    rescue
      value
    end]}]
  end
end

hash = {1 => 'Some string'} #Strings are mutable
hash2 = hash.deep_dup

hash2[1] #=> 'Some string'
hash2[1].upcase! # modifying referenced object
hash2[1] #=> 'SOME STRING'
hash[1] #=> 'Some string' # the other hash still points to the original object; hash2 holds a copy

You should probably write something similar for arrays. I would also think about writing it for the whole Enumerable module, but that might be slightly trickier.
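One possible way to cover arrays as well is a standalone recursive helper rather than a monkey patch. This is only a sketch: deep_copy is a hypothetical name (it is not part of Ruby or ActiveSupport), it leaves hash keys uncopied, and it ignores things like default procs and custom collection classes.

```ruby
# Hypothetical helper: recursively copies hashes and arrays and
# dups every other value where that is possible.
def deep_copy(obj)
  case obj
  when Hash
    obj.each_with_object({}) { |(k, v), copy| copy[k] = deep_copy(v) }
  when Array
    obj.map { |element| deep_copy(element) }
  else
    begin
      obj.dup
    rescue TypeError
      obj # older Rubies raise when dup-ing nil, symbols or integers
    end
  end
end

original = { 1 => "Some string", list: [{ name: "cat" }] }
copy = deep_copy(original)

copy[1].upcase!              # mutate a copied string
copy[:list][0][:name] << "s" # mutate a string nested inside an array

puts original[1]               # Some string
puts original[:list][0][:name] # cat
```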

Copy hash without pointing to the same object

You can use 'Marshal' to deep copy.

h1 = {:key_1 => {:sub_1 => "sub_1", :sub_2 => "sub_2"}}

h2 = Marshal.load(Marshal.dump(h1))

h2[:key_1][:sub_1] = "SUB_1"
h2[:key_1].delete(:sub_2)

p h1
# => {:key_1=>{:sub_1=>"sub_1", :sub_2=>"sub_2"}}
p h2
# => {:key_1=>{:sub_1=>"SUB_1"}}
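One caveat with this approach: Marshal can only serialize plain data. The sketch below (variable names are illustrative) shows that a hash holding a Proc, or a hash created with a default block, cannot be dumped and raises TypeError.

```ruby
h1 = { key_1: { sub_1: "sub_1" } }
h2 = Marshal.load(Marshal.dump(h1)) # fine: only plain data inside

with_proc    = { callback: proc { |x| x * 2 } }
with_default = Hash.new { |hash, key| hash[key] = [] }

# Collect the error class raised for each object that cannot be dumped.
errors = [with_proc, with_default].map do |obj|
  begin
    Marshal.dump(obj)
    nil
  rescue TypeError => e
    e.class
  end
end

puts errors.inspect # [TypeError, TypeError]
```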

How to clone array of hashes and add key value using each loop

Let's see what is happening.

arr = [{a: "one", b: "two"}, {a: "uno", b: "due"}]
arr.object_id
#=> 4557280

arr1 = arr
arr1.object_id
#=> 4557280

As you see, the variables arr and arr1 hold the same object, because they report the same object id (see note 1 at the end of this answer). Therefore, if that object is modified, arr and arr1 will still both hold that object. Let's try it.

arr[0] = {a: "cat", b: "dog"}
arr
#=> [{:a=>"cat", :b=>"dog"}, {:a=>"uno", :b=>"due"}]
arr.object_id
#=> 4557280

arr1
#=> [{:a=>"cat", :b=>"dog"}, {:a=>"uno", :b=>"due"}]
arr1.object_id
#=> 4557280

If we want to be able to modify arr in this way without it affecting arr1, we use the method Kernel#dup.

arr
#=> [{:a=>"cat", :b=>"dog"}, {:a=>"uno", :b=>"due"}]
arr1 = arr.dup
#=> [{:a=>"cat", :b=>"dog"}, {:a=>"uno", :b=>"due"}]

arr.object_id
#=> 4557280
arr1.object_id
#=> 3693480

arr.map(&:object_id)
#=> [2631980, 4557300]
arr1.map(&:object_id)
#=> [2631980, 4557300]

As you see, arr and arr1 now hold different objects. Those objects, however, are arrays whose corresponding elements (hashes) are the same objects. Let's modify one of arr's elements.

arr[1][:a] = "owl"
arr
#=> [{:a=>"cat", :b=>"dog"}, {:a=>"owl", :b=>"due"}]
arr.map(&:object_id)
#=> [2631980, 4557300]

arr still contains the same objects, but we have modified one. Let's look at arr1.

arr1
#=> [{:a=>"cat", :b=>"dog"}, {:a=>"owl", :b=>"due"}]
arr1.map(&:object_id)
#=> [2631980, 4557300]

Should we be surprised that arr1 has changed as well?

We need to dup both arr and the elements of arr.

arr = [{a: "one", b: "two"}, {a: "uno", b: "due"}]
arr1 = arr.dup.map(&:dup)
#=> [{:a=>"one", :b=>"two"}, {:a=>"uno", :b=>"due"}]

arr.object_id
#=> 4149120
arr1.object_id
#=> 4182360

arr.map(&:object_id)
#=> [4149200, 4149140]
arr1.map(&:object_id)
#=> [4182340, 4182280]

Now arr and arr1 are different objects and they contain different (hash) objects, so any change to one will not affect the other. (Try it.)
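Taking up that invitation, a short check along these lines confirms the independence. Note that map already returns a new array, so the sketch drops the extra dup on the array itself.

```ruby
arr  = [{ a: "one", b: "two" }, { a: "uno", b: "due" }]
arr1 = arr.map(&:dup) # new array holding a copy of each hash

arr[1][:a]  = "owl" # change a hash through arr
arr1[0][:c] = "tre" # add a key through arr1

puts arr.inspect  # arr has "owl" but no :c key
puts arr1.inspect # arr1 has :c but still "uno"
```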

Now suppose arr were as follows.

arr = [{a: "cat", b: [1,2]}]

Let's make the copy.

arr1 = arr.dup.map(&:dup)
#=> [{:a=>"cat", :b=>[1, 2]}]

Now modify arr[0][:b].

arr[0][:b] << 3
#=> [1, 2, 3]
arr1
#=> [{:a=>"cat", :b=>[1, 2, 3]}]

Drat! arr1 changed. We can again look at object ids to see why that happened.

arr.object_id
#=> 4488500
arr1.object_id
#=> 4503140

arr.map(&:object_id)
#=> [4488520]
arr1.map(&:object_id)
#=> [4503100]

arr[0][:b].object_id
#=> 4488560
arr1[0][:b].object_id
#=> 4488560

We see that arr and arr1 are different objects and their respective hashes are different objects, but the array stored under :b is the same object in both hashes. We therefore need to do something like this:

arr1[0][:b] = arr[0][:b].dup

but that's still not good enough if arr were:

arr = [{a: "cat", b: [1,[2,3]]}]
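A brief sketch of why the one-level fix falls short here: duplicating the array under :b copies only its outer layer, so the inner array [2, 3] is still shared.

```ruby
arr = [{ a: "cat", b: [1, [2, 3]] }]

arr1 = arr.map(&:dup)
arr1[0][:b] = arr[0][:b].dup # copies the outer array only

arr[0][:b][1] << 4 # mutate the inner array through arr

puts arr1[0][:b].inspect                   # [1, [2, 3, 4]]
puts arr[0][:b][1].equal?(arr1[0][:b][1])  # true: same inner array
```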

What we need is a method that will make a deep copy. A common solution for that is to use the methods Marshal::dump and Marshal::load.

arr = [{a: "cat", b: [1,2]}]
str = Marshal.dump(arr)
#=> "\x04\b[\x06{\a:\x06aI\"\bcat\x06:\x06ET:\x06b[\ai\x06i\a"
arr1 = Marshal.load(str)
#=> [{:a=>"cat", :b=>[1, 2]}]

arr[0][:b] << 3
#=> [1, 2, 3]
arr
#=> [{:a=>"cat", :b=>[1, 2, 3]}]
arr1
#=> [{:a=>"cat", :b=>[1, 2]}]

Note we could write:

arr1 = Marshal.load(Marshal.dump(arr))

As explained in the doc, the serialization used by the Marshal methods is not necessarily the same for different Ruby versions. If, for example, dump were used to produce a string that was saved to a file, and load were later invoked on the contents of that file using a different version of Ruby, the contents might not be readable. Of course that's not a problem in this application of the methods, where dump and load run back to back in the same process.

1. To make it easier to see differences in object ids I've only shown the last seven digits. In all cases they are preceded by the digits 4877798.

Duplicate Hash Key unique Pair

When I look at the provided scenario I see the following solution:

data = [{:mobile=>21, :web=>43},{:mobile=>23, :web=>543},{:mobile=>23, :web=>430},{:mobile=>34, :web=>13},{:mobile=>26, :web=>893}]
keys = [:mobile, :web]
result = keys.zip(data.map { |hash| hash.values_at(*keys) }.transpose).to_h
#=> {:mobile=>[21, 23, 23, 34, 26], :web=>[43, 543, 430, 13, 893]}

This first extracts the values of the keys from each hash, then transposes the resulting array. This changes [[21, 43], [23, 543], [23, 430], ...] into [[21, 23, 23, ...], [43, 543, 430, ...]]. This result can be zipped back to the keys and converted into a hash.

To get rid of duplicates you could add .each(&:uniq!) after the transpose call, or map the collections to a set .map(&:to_set) (you need to require 'set') if you don't mind the values being sets instead of arrays.

result = keys.zip(data.map { |hash| hash.values_at(*keys) }.transpose.each(&:uniq!)).to_h
#=> {:mobile=>[21, 23, 34, 26], :web=>[43, 543, 430, 13, 893]}

require 'set'
result = keys.zip(data.map { |hash| hash.values_at(*keys) }.transpose.map(&:to_set)).to_h
#=> {:mobile=>#<Set: {21, 23, 34, 26}>, :web=>#<Set: {43, 543, 430, 13, 893}>}
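For comparison, the same grouping can be sketched as a single pass with each_with_object. This is an alternative to the zip/transpose approach above, not a replacement for it, and it uses a shortened, made-up version of the data.

```ruby
data = [{ mobile: 21, web: 43 }, { mobile: 23, web: 543 }, { mobile: 23, web: 430 }]

# Build one array per key, skipping values that were already collected.
result = data.each_with_object({}) do |hash, acc|
  hash.each do |key, value|
    values = (acc[key] ||= [])
    values << value unless values.include?(value)
  end
end

puts result.inspect
```

Unlike values_at(*keys), this version does not need the list of keys up front, and it tolerates hashes that lack some of the keys.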

References:

  • Array#map
  • Hash#values_at
  • Splat operator * (in hash.values_at(*keys))
  • Array#zip
  • Array#transpose
  • Array#to_h
  • Array#each
  • Array#uniq!
  • Enumerable#to_set

Understanding how hash copy is behaving under the hood

I just needed a little push to understand, and thanks to @Stefan I think I can answer my own question. Breaking it down we have:

root = {}
base = root
puts root.object_id
=> 47193371579760
puts base.object_id
=> 47193371579760

So both root and base became a reference for the same object.

base[:a] = {}
base[:a].object_id
=> 47193372751820
base = base[:a]
puts base.object_id
=> 47193372751820
puts root.object_id
=> 47193371579760
puts root
=> {:a=>{}}

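base[:a] = {} stores a new hash object inside the hash that root and base both refer to. The assignment base = base[:a] then rebinds the variable base to that new inner hash; it does not modify any object. root keeps its reference to the outer hash, which now holds {:a=>{}}. That's why root still refers to the same object at the end.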
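The same mechanics can be used on purpose to build a nested hash by walking a cursor variable down from the root. A small sketch, where the keys are arbitrary examples:

```ruby
root = {}
base = root

%i[a b c].each do |key|
  base[key] = {}   # mutates the hash that base currently refers to
  base = base[key] # rebinds the variable only; no hash is modified
end

puts root.inspect                   # three levels of nesting under :a
puts base.equal?(root[:a][:b][:c])  # true: base ends at the innermost hash
```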


