How to Override the Copy/Deepcopy Operations for a Python Object

How to override the copy/deepcopy operations for a Python object?

The recommendations for customizing copying are at the very end of the copy module's docs page:

Classes can use the same interfaces to control copying that they use to control pickling. See the description of module pickle for information on these methods. The copy module does not use the copy_reg registration module.

In order for a class to define its own copy implementation, it can define special methods __copy__() and __deepcopy__(). The former is called to implement the shallow copy operation; no additional arguments are passed. The latter is called to implement the deep copy operation; it is passed one argument, the memo dictionary. If the __deepcopy__() implementation needs to make a deep copy of a component, it should call the deepcopy() function with the component as first argument and the memo dictionary as second argument.

Since you appear not to care about pickling customization, defining __copy__ and __deepcopy__ definitely seems like the right way to go for you.

Specifically, __copy__ (the shallow copy) is pretty easy in your case...:

def __copy__(self):
    newone = type(self)()
    newone.__dict__.update(self.__dict__)
    return newone

__deepcopy__ would be similar (accepting a memo arg too), but before the return it would have to call newone.foo = deepcopy(self.foo, memo) for any attribute self.foo that needs deep copying (essentially attributes that are containers: lists, dicts, and non-primitive objects that hold other objects through their __dict__s).
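As a minimal sketch of both methods together, assume a hypothetical class Basket (the name and attributes are invented for illustration) whose constructor takes no arguments, which the type(self)() call requires, and whose items attribute is a list that needs deep copying:

```python
import copy


class Basket:
    """Hypothetical example class; the name and attributes are illustrative."""

    def __init__(self):
        self.owner = "nobody"
        self.items = []

    def __copy__(self):
        # Shallow copy: a new instance whose attributes are shared with the original.
        newone = type(self)()
        newone.__dict__.update(self.__dict__)
        return newone

    def __deepcopy__(self, memo):
        newone = type(self)()
        # Register the copy first so cyclic references resolve to it.
        memo[id(self)] = newone
        newone.__dict__.update(self.__dict__)
        # Deep-copy only the attributes that hold other objects.
        newone.items = copy.deepcopy(self.items, memo)
        return newone


original = Basket()
original.items.append(["apple"])

shallow = copy.copy(original)      # shares the items list with original
deep = copy.deepcopy(original)     # gets its own items list
```

After this, shallow.items is the same list object as original.items, while deep.items is an independent copy.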

deepcopy override clarification

You are right that you could do this in the way you presented. However, that would be specific to your object's implementation. The example you posted is much more generic and can handle copying many different classes of objects. It could also be turned into a mixin so it is easy to add to your class.

The code presented does this as follows:

def __deepcopy__(self, memo):
    cls = self.__class__  # Extract the class of the object
    result = cls.__new__(cls)  # Create a new instance of that class without calling __init__
    memo[id(self)] = result  # Record the copy so cyclic references resolve to it
    for k, v in self.__dict__.items():
        # Deep-copy each attribute. Simple values are copied directly; complex
        # objects such as lists are copied recursively, and objects that define
        # __deepcopy__() have it called, so the whole object tree is copied.
        setattr(result, k, deepcopy(v, memo))
    return result

Note also that with the hand-written approach, if your class has complex attributes such as lists, you need to call deepcopy on them directly; otherwise you end up with a shallow copy of those attributes.

EDIT

memo is a dict that maps the id() of each original object to its copy, so that shared and cyclic references are reconstructed correctly in the copied object graph.
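To illustrate the role of memo, here is a sketch that packages the generic __deepcopy__ shown above as a mixin and applies it to two objects that reference each other. The names DeepCopyMixin and Node are hypothetical and not from the original question:

```python
from copy import deepcopy


class DeepCopyMixin:
    """Hypothetical mixin providing the generic __deepcopy__ shown above."""

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result  # registered before recursing into attributes
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result


class Node(DeepCopyMixin):
    def __init__(self, name):
        self.name = name
        self.peer = None


a = Node("a")
b = Node("b")
a.peer, b.peer = b, a  # a and b form a reference cycle

a_copy = deepcopy(a)

print(a_copy.peer is b)            # False: the peer was copied as well
print(a_copy.peer.peer is a_copy)  # True: the cycle is preserved in the copy
```

Because the copy is stored in memo before its attributes are processed, the recursive call that reaches a again finds the existing copy instead of recursing forever.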

Override deepcopy with copy

Since nobody else has ventured a solution to this, I'll post my own solution. I ended up modifying the constructor for the test class so that it could take in arguments and then passing the shallow copies of the dictionaries, like so:

import copy

class test(object):
    def __init__(self, test_dict1=None, test_dict2=None):
        # Default to None: a mutable default ({}) would be shared by every instance
        self.test_dict1 = {} if test_dict1 is None else test_dict1
        self.test_dict2 = {} if test_dict2 is None else test_dict2
        ...

a = test()

# Set up initial values for a
...

for i in range(1000):
    b = test(copy.copy(a.test_dict1), copy.copy(a.test_dict2))

    # do stuff
    ...
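A small sketch of what passing shallow copies does and does not protect against, using made-up dictionary contents: the top level of each dictionary becomes independent, but nested mutable values are still shared with the original.

```python
import copy

original = {"count": 1, "tags": ["x"]}
clone = copy.copy(original)

clone["count"] = 2          # rebinding a key affects only the clone
clone["tags"].append("y")   # mutating a nested list affects both

print(original)  # {'count': 1, 'tags': ['x', 'y']}
print(clone)     # {'count': 2, 'tags': ['x', 'y']}
```

If the dictionaries hold nested containers that must not be shared, copy.deepcopy would be needed for those values instead.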

How to copy a Python class instance if deepcopy() does not work?

I have mostly figured it out. The only problem I cannot overcome is knowing an acceptable set of initialization arguments (arguments for __init__) for all classes. So I have to make the following two assumptions:

1) I have a set of default arguments for class C which I call argsC.
2) All objects in C can be initialized with empty arguments.

In which case I can proceed as follows.
First:
Initialize a new instance of class C from the instance c that I want to copy:

c_copy = c.__class__(**argsC)

Second:
Go through all the attributes of c and set the attributes of c_copy to be copies of the attributes of c:

for att in c.__dict__:
    setattr(c_copy, att, object_copy(getattr(c, att)))

where object_copy is a recursive application of the function we are building.

Last:
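Delete all attributes that are in c_copy but not in c:

# Iterate over a snapshot of the keys; deleting while iterating the dict itself raises RuntimeError
for att in list(c_copy.__dict__):
    if not hasattr(c, att):
        delattr(c_copy, att)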

Putting this all together we have:

import copy

def object_copy(instance, init_args=None):
    if init_args:
        new_obj = instance.__class__(**init_args)
    else:
        new_obj = instance.__class__()
    if hasattr(instance, '__dict__'):
        for k in instance.__dict__:
            try:
                attr_copy = copy.deepcopy(getattr(instance, k))
            except Exception:
                attr_copy = object_copy(getattr(instance, k))
            setattr(new_obj, k, attr_copy)

        new_attrs = list(new_obj.__dict__.keys())
        for k in new_attrs:
            if not hasattr(instance, k):
                delattr(new_obj, k)
        return new_obj
    else:
        return instance

Using it, we get:

argsC = {'a':1, 'b':1}
c = C(4,5,r=[[1],2,3])
c.a = 11
del c.b
c_copy = object_copy(c, argsC)
c.__dict__

{'a': 11, 'r': [[1], 2, 3]}

c_copy.__dict__

{'a': 11, 'r': [[1], 2, 3]}

c.r[0].append(33)  # mutate the nested list in the original only
c.__dict__

{'a': 11, 'r': [[1, 33], 2, 3]}

c_copy.__dict__

{'a': 11, 'r': [[1], 2, 3]}

Which is the desired outcome. It uses deepcopy if it can, but for the cases where it would raise an exception, it can do without.
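The fallback path can be exercised with a sketch like the following. The classes Stubborn and C here are hypothetical stand-ins (the original C is not shown), and Stubborn deliberately raises from __deepcopy__ so that copy.deepcopy fails on it:

```python
import copy


def object_copy(instance, init_args=None):
    # Same approach as above: build a fresh instance, then copy attributes.
    if init_args:
        new_obj = instance.__class__(**init_args)
    else:
        new_obj = instance.__class__()
    if not hasattr(instance, "__dict__"):
        return instance
    for k in instance.__dict__:
        try:
            attr_copy = copy.deepcopy(getattr(instance, k))
        except Exception:
            # deepcopy refused; fall back to copying attribute by attribute.
            attr_copy = object_copy(getattr(instance, k))
        setattr(new_obj, k, attr_copy)
    for k in list(new_obj.__dict__):
        if not hasattr(instance, k):
            delattr(new_obj, k)
    return new_obj


class Stubborn:
    """Hypothetical class that refuses to be deep-copied."""

    def __init__(self):
        self.label = "default"

    def __deepcopy__(self, memo):
        raise TypeError("deepcopy is not supported")


class C:
    """Hypothetical container class with default-constructible arguments."""

    def __init__(self, a=0, b=0):
        self.a = a
        self.b = b
        self.helper = Stubborn()


c = C(4, 5)
c.helper.label = "custom"
del c.b

c_copy = object_copy(c, {"a": 1, "b": 1})
```

Here copy.deepcopy(c) would raise TypeError because of the helper attribute, while object_copy rebuilds the helper from a fresh Stubborn() and copies its label across.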

Preventing reference re-use during deepcopy

If your whole point is to copy data that could come from JSON, i.e. list, dict, string, numbers, bool, then you can trivially implement your own function:

def copy_jsonlike(data):
    if isinstance(data, list):
        return [copy_jsonlike(x) for x in data]
    elif isinstance(data, dict):
        return {k: copy_jsonlike(v) for k, v in data.items()}
    else:
        return data

It has the added bonus of probably being faster than copy.deepcopy.

Alternatively, your original solution, json.loads(json.dumps(data)), isn't a bad idea either.
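The difference in how references are treated can be seen in a short sketch, with made-up sample data in which the same list is referenced twice:

```python
import copy


def copy_jsonlike(data):
    if isinstance(data, list):
        return [copy_jsonlike(x) for x in data]
    elif isinstance(data, dict):
        return {k: copy_jsonlike(v) for k, v in data.items()}
    else:
        return data


shared = [1, 2]
data = {"a": shared, "b": shared}  # the same list object appears twice

via_deepcopy = copy.deepcopy(data)
via_jsonlike = copy_jsonlike(data)

print(via_deepcopy["a"] is via_deepcopy["b"])  # True: deepcopy's memo keeps the sharing
print(via_jsonlike["a"] is via_jsonlike["b"])  # False: each reference becomes its own list
```

Since copy_jsonlike keeps no memo, it would hit the recursion limit on self-referencing data; that is not a concern for data that really came from JSON, which cannot contain cycles.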

How to exclude specific references from deepcopy?

Your class can implement a __deepcopy__ method to control how it is copied. From the copy module documentation:

In order for a class to define its own copy implementation, it can define special methods __copy__() and __deepcopy__(). The former is called to implement the shallow copy operation; no additional arguments are passed. The latter is called to implement the deep copy operation; it is passed one argument, the memo dictionary. If the __deepcopy__() implementation needs to make a deep copy of a component, it should call the deepcopy() function with the component as first argument and the memo dictionary as second argument.

Simply return a new instance of your class, with the reference you don't want to be deep-copied just taken across as-is. Use the deepcopy() function to copy other objects:

from copy import deepcopy

class Foo:
    def __init__(self, content, linked_to):
        self.content = content
        self.linked_to = linked_to

    def __deepcopy__(self, memo):
        # create a copy with self.linked_to *not copied*, just referenced.
        return Foo(deepcopy(self.content, memo), self.linked_to)

Demo:

>>> a1 = Foo([[1, 2], [3, 4]], None)
>>> a2 = Foo([[5, 6], [7, 8]], a1)
>>> a3 = deepcopy(a2)
>>> a3.linked_to.content.append([9, 10]) # still linked to a1
>>> a1.content
[[1, 2], [3, 4], [9, 10]]
>>> a1 is a3.linked_to
True
>>> a2.content is a3.content # content is no longer shared
False
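For classes with many attributes, one possible generalization is to name the attributes that should be carried across by reference and deep-copy everything else. The sketch below assumes a hypothetical class attribute _shared_attrs for that purpose; it is an illustrative convention, not something the copy module provides:

```python
from copy import deepcopy


class Foo:
    # Hypothetical convention: attributes named here are shared, not copied.
    _shared_attrs = frozenset({"linked_to"})

    def __init__(self, content, linked_to):
        self.content = content
        self.linked_to = linked_to

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)  # bypass __init__; attributes are set below
        memo[id(self)] = result
        for name, value in self.__dict__.items():
            if name in self._shared_attrs:
                setattr(result, name, value)  # keep the same reference
            else:
                setattr(result, name, deepcopy(value, memo))
        return result


a1 = Foo([[1, 2], [3, 4]], None)
a2 = Foo([[5, 6], [7, 8]], a1)
a3 = deepcopy(a2)

print(a3.linked_to is a1)        # True: still linked to a1
print(a3.content is a2.content)  # False: content was deep-copied
```

Registering result in memo before copying the attributes also keeps this version safe if content ever refers back to the Foo instance.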

If I want non-recursive deep copy of my object, should I override copy or deepcopy in Python?

The correct magic method for you to implement here is __copy__.

The behaviour you've described is deeper than what the default behaviour of copy would do (i.e. for an object which hasn't bothered to implement __copy__), but it is not deep enough to be called a deepcopy. Therefore you should implement __copy__ to get the desired behaviour.

Do not make the mistake of thinking that simply assigning another name makes a "copy":

t1 = T('google.com', 123)
t2 = t1 # this does *not* use __copy__

That just binds another name to the same instance. Rather, the __copy__ method is invoked by the copy.copy() function:

import copy
t2 = copy.copy(t1)
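A sketch of what such a __copy__ might look like, assuming a hypothetical T whose constructor takes a host and a port and which also keeps a list of aliases. The attribute names are invented for illustration, since the question's actual class is not shown here. The list itself is copied, while the objects inside it stay shared:

```python
import copy


class T:
    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.aliases = []  # hypothetical mutable attribute

    def __copy__(self):
        new = type(self)(self.host, self.port)
        # Copy the list itself, but not the objects inside it.
        new.aliases = list(self.aliases)
        return new


t1 = T('google.com', 123)
t1.aliases.append(['www'])

t2 = copy.copy(t1)           # calls T.__copy__
t2.aliases.append(['mail'])  # does not affect t1.aliases

print(t1.aliases)  # [['www']]
print(t2.aliases)  # [['www'], ['mail']]
```

This is one level deeper than the default copy.copy behaviour, which would have shared the aliases list between t1 and t2, yet shallower than copy.deepcopy.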

