What's the Most Efficient Way to Deep Copy an Object in Ruby

What's the most efficient way to deep copy an object in Ruby?

I was wondering the same thing, so I benchmarked a few different techniques against each other. I was primarily concerned with Arrays and Hashes; I didn't test any complex objects. Perhaps unsurprisingly, a custom deep-clone implementation proved to be the fastest. If you are looking for a quick and easy implementation, Marshal appears to be the way to go.

I also benchmarked an XML solution with Rails 3.0.7, not shown below. It was much, much slower, ~10 seconds for only 1000 iterations (the solutions below all ran 10,000 times for the benchmark).

Two notes regarding my JSON solution. First, I used the C variant, version 1.4.3. Second, it doesn't actually work 100%, as symbols will be converted to Strings.
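The symbol caveat is easy to reproduce. Here is a minimal sketch, assuming only the json library that ships with Ruby; the sample hash is made up for illustration:

```ruby
require 'json'

original = { :x => [1, 2], 'name' => :admin }

# Round-trip through JSON text, which is what the JSON-based copy does.
copy = JSON.parse(JSON.generate(original))

p original  # still has a Symbol key and a Symbol value
p copy      # the key :x and the value :admin come back as Strings
```

So a JSON round trip is only a faithful deep copy when the data consists of types JSON can represent directly (strings, numbers, booleans, nil, arrays, and string-keyed hashes).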

This was all run with ruby 1.9.2p180.

#!/usr/bin/env ruby
require 'benchmark'
require 'yaml'
require 'json/ext'
require 'msgpack'

# Marshal round trip
def dc1(value)
  Marshal.load(Marshal.dump(value))
end

# YAML round trip
def dc2(value)
  YAML.load(YAML.dump(value))
end

# JSON round trip (symbols come back as strings)
def dc3(value)
  JSON.load(JSON.dump(value))
end

# Custom recursive copy of Hashes and Arrays
def dc4(value)
  if value.is_a?(Hash)
    result = value.clone
    value.each { |k, v| result[k] = dc4(v) }
    result
  elsif value.is_a?(Array)
    result = value.clone
    result.clear
    value.each { |v| result << dc4(v) }
    result
  else
    value
  end
end

# MessagePack round trip
def dc5(value)
  MessagePack.unpack(value.to_msgpack)
end

value = {'a' => {:x => [1, [nil, 'b'], {'a' => 1}]}, 'b' => ['z']}

Benchmark.bm do |x|
  iterations = 10000
  x.report { iterations.times { dc1(value) } }
  x.report { iterations.times { dc2(value) } }
  x.report { iterations.times { dc3(value) } }
  x.report { iterations.times { dc4(value) } }
  x.report { iterations.times { dc5(value) } }
end

results in:

      user     system      total        real
  0.230000   0.000000   0.230000 (  0.239257)  (Marshal)
  3.240000   0.030000   3.270000 (  3.262255)  (YAML)
  0.590000   0.010000   0.600000 (  0.601693)  (JSON)
  0.060000   0.000000   0.060000 (  0.067661)  (Custom)
  0.090000   0.010000   0.100000 (  0.097705)  (MessagePack)

How to create a deep copy of an object in Ruby?

Deep copy isn't built into vanilla Ruby, but you can hack it by marshalling and unmarshalling the object:

Marshal.load(Marshal.dump(@object))

This isn't perfect though, and won't work for all objects. A more robust method:

class Object
  def deep_clone
    # Already cloning this object (cyclic reference): return the copy in progress
    return @deep_cloning_obj if @deep_cloning
    @deep_cloning_obj = clone
    @deep_cloning_obj.instance_variables.each do |var|
      val = @deep_cloning_obj.instance_variable_get(var)
      begin
        @deep_cloning = true
        val = val.deep_clone
      rescue TypeError
        # Value cannot be cloned: keep the shared reference
        next
      ensure
        @deep_cloning = false
      end
      @deep_cloning_obj.instance_variable_set(var, val)
    end
    deep_cloning_obj = @deep_cloning_obj
    @deep_cloning_obj = nil
    deep_cloning_obj
  end
end

Source:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-list/43424

Ruby: object deep copying

This is a really thin, very specific implementation of a "deep copy". What it's demonstrating is creating an independent @name instance variable in the clone so that modifying the name of one with an in-place operation won't have the side-effect of changing the clone.

Normally deep-copy operations are important for things like nested arrays or hashes, but they're also applicable to any object with attributes that refer to things of that sort.

In your case, to make an object with a more robust dup method, you should call dup on each of the attributes in question, but I think this example is a bit broken. What it does is replace the @name in the original with a copy, which may break any references you have.

A better version is:

def dup
  copy = super
  copy.make_independent!
  copy
end

def make_independent!
  instance_variables.each do |var|
    value = instance_variable_get(var)

    if (value.respond_to?(:dup))
      instance_variable_set(var, value.dup)
    end
  end
end

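This should have the effect of duplicating any instance variable whose value supports the dup method. Note that on Ruby 2.4 and later, numbers, booleans, and nil also respond to dup and simply return themselves, so they pass through unchanged. On older versions they respond to dup but raise a TypeError when it is called, so the call would need a rescue there.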
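To see what this buys you and where it stops, here is a small sketch that puts the two methods into a class. The Person class and its name and tags attributes are hypothetical, chosen only for illustration:

```ruby
class Person
  attr_accessor :name, :tags

  def initialize(name, tags)
    @name = name
    @tags = tags
  end

  def dup
    copy = super
    copy.make_independent!
    copy
  end

  def make_independent!
    instance_variables.each do |var|
      value = instance_variable_get(var)
      instance_variable_set(var, value.dup) if value.respond_to?(:dup)
    end
  end
end

anna = Person.new('Anna', [['ruby']])
lisa = anna.dup

lisa.name << ' Lisa'      # in-place change to the copy's own String
lisa.tags << ['rails']    # the copy's outer Array is independent too
lisa.tags[0] << 'nested'  # but the inner Array is still shared

p anna.name  # "Anna"
p anna.tags  # [["ruby", "nested"]]
```

The copy is independent one level down only: each instance variable gets its own dup, but anything nested inside those values is still shared with the original.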

Methods to create deep copy of objects without the help of Marshal

This solution works:

class CashRegister
  attr_accessor :bills

  def initialize
    @bills = []
  end

  def clone
    cloned = super
    cloned.bills = @bills.map { |bill| bill.clone }
    cloned
  end
end

class Bill
  attr_accessor :positions

  def initialize(nr)
    @nr = nr
    @positions = []
  end

  def clone
    cloned = super
    cloned.positions = @positions.map { |pos| pos.clone }
    cloned
  end
end

class Position
  attr_reader :price
  attr_writer :product

  # this method is given
  def product
    @product.clone
  end

  def initialize(product, price)
    @product = product
    @price = price
  end

  def clone
    cloned = super
    cloned.product = product
    cloned
  end
end

Why isn't there a deep copy method in Ruby?

I'm not sure why there's no deep copy method in Ruby, but I'll try to make an educated guess based on the information I could find (see links and quotes below the line).

Judging from this information, I could only infer that the reason Ruby does not have a deep copy method is because it's very rarely necessary and, in the few cases where it truly is necessary, there are other, relatively simple ways to accomplish the same task:

As you already know, using Marshal.dump and Marshal.load is currently the recommended way to do this. This is also the approach recommended by The Ruby Programming Language (see excerpts below).

Alternatively, there are at least 3 available implementations found in these gems: deep_cloneable, deep_clone and ruby_deep_clone; the first being the most popular.


Related Information

Here's a discussion over at comp.lang.ruby which might shed some light on this. There's another answer here with some associated discussions, but it all comes back to using Marshal.

There weren't any mentions of deep copying in Programming Ruby, but there were a few mentions in The Ruby Programming Language. Here are a few related excerpts:

[…]

Another use for Marshal.dump and Marshal.load is to create deep copies
of objects:

def deepcopy(o)
  Marshal.load(Marshal.dump(o))
end

[…]

… the binary format used by Marshal.dump and Marshal.load is
version-dependent, and newer versions of Ruby are not guaranteed to be
able to read marshalled objects written by older versions of Ruby.

[…]

Note that files and I/O streams, as well as Method and Binding
objects, are too dynamic to be marshalled; there would be no reliable
way to restore their state.

[…]

Instead of making a defensive deep copy of the array, just call
to_enum on it, and pass the resulting enumerator instead of the array
itself. In effect, you’re creating an enumerable but immutable proxy
object for your array.
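As a rough illustration of that last excerpt, the following sketch hands out an enumerator instead of the array. The scores array is a made-up example:

```ruby
scores = [3, 1, 2]
view = scores.to_enum

# Enumerable methods are available on the proxy.
p view.map { |n| n * 2 }

# Array mutators such as << and push are not defined on an Enumerator.
p view.respond_to?(:<<)
p view.respond_to?(:push)
```

Keep in mind that the enumerator only protects the array's structure. The elements themselves are not copied, so a mutable element (a String, say) can still be modified in place by whoever receives the enumerator.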

Provide simplest example where deep copy is needed in ruby

The example you have shown does not describe the difference between a deep and a shallow copy. Instead, consider this example:

class Klass
  attr_accessor :name
end

anna = Klass.new
anna.name = 'Anna'

anna_lisa = anna.dup
anna_lisa.name << ' Lisa'
# => "Anna Lisa"

anna.name
# => "Anna Lisa"

Generally, dup and clone are both expected to duplicate only the object you call the method on. Other referenced objects, like the name String in the above example, are not duplicated. Thus, after the duplication, both the original and the duplicated object point to the very same name string.

With a deep_dup, typically all (relevant) referenced objects are duplicated too, often to an arbitrary depth. Since this is rather hard to achieve for all possible object references, people often rely on implementations for specific objects like hashes and arrays.
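A collection-specific implementation of that kind might look like the sketch below. The helper name deep_dup_collections is hypothetical, and the choice to copy Strings but share every other leaf object is an assumption made for this example:

```ruby
# Hypothetical helper: recursively copies Hashes, Arrays and Strings.
# Hash keys and all other objects are reused, not copied.
def deep_dup_collections(value)
  case value
  when Hash
    value.each_with_object({}) { |(k, v), copy| copy[k] = deep_dup_collections(v) }
  when Array
    value.map { |v| deep_dup_collections(v) }
  when String
    value.dup
  else
    value
  end
end

original = { 'a' => { :x => [1, [nil, 'b']] }, 'b' => ['z'] }
copy = deep_dup_collections(original)

copy['b'][0] << '!'         # mutate a leaf String in the copy
copy['a'][:x][1] << :extra  # mutate a nested Array in the copy

p original  # unchanged
p copy
```

This sketch builds plain Hash and Array instances, so subclasses, default values, and frozen state are not preserved, and cyclic structures would recurse forever.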

A common workaround for a rather generic deep-dup is to use Ruby's Marshal module to serialize an object graph and immediately deserialize it again.

anna_lena = Marshal.load(Marshal.dump(anna))

This creates new objects and is effectively a deep_dup. Since most objects support marshaling out of the box, this is a rather powerful mechanism. Note, though, that you should never unmarshal (i.e. load) user-provided data, since this can lead to a remote code execution vulnerability.
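The following sketch shows both the Marshal round trip and one case where it fails. The callback attribute is a hypothetical addition to Klass, used here only to hold a Proc:

```ruby
class Klass
  attr_accessor :name, :callback
end

anna = Klass.new
anna.name = 'Anna'

# Round-trip through Marshal: the name String is copied as well.
anna_lena = Marshal.load(Marshal.dump(anna))
anna_lena.name << ' Lena'

p anna.name       # "Anna"
p anna_lena.name  # "Anna Lena"

# An object that references a Proc cannot be dumped.
anna.callback = proc { |x| x }
begin
  Marshal.dump(anna)
rescue TypeError => e
  puts "cannot marshal: #{e.message}"
end
```

For objects like this, a hand-written copy method that duplicates only the relevant attributes is usually the more practical route.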

Ruby on Rails deep copy/ deep clone of object and its attributes

You should clone every trial and assign them to the cloned experiment:

@experiment_new = @experiment_old.clone
@experiment_old.trials.each do |trial|
  @experiment_new.trials << trial.clone
end

