Python Deepcopy and Shallow Copy and Pass Reference

Is pass-by-value/reference equivalent to making a deep/shallow copy, respectively?

No. Those two things are completely unrelated.

Shallow copy/deep copy is talking about object copying; whereas pass-by-value/pass-by-reference is talking about the passing of variables.

In many modern languages, like Python (which you mentioned that you're most familiar with) and Java, "objects" are not values in the language, so "objects" cannot be assigned or passed. Rather, objects are always manipulated through pointers to objects (references), which are values and can be assigned or passed.

Python and Java are pass-by-value only. When you pass a reference, it copies the pointer and you end up with two pointers to the same object. No object copying happens. In these languages, object copying is not done through assigning or passing, but rather is done by calling special methods like .clone() or by passing an object to a constructor to construct a new object. (There is in fact no general way to copy an object in Java.)
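As a minimal sketch of what "it copies the pointer" means in Python, compare rebinding a parameter with mutating through it. The function names rebind and mutate are made up for this illustration.

```python
def rebind(items):
    # Assigning to the parameter only changes the function's own
    # copy of the reference; the caller's variable is untouched.
    items = [0]


def mutate(items):
    # Mutating through the parameter changes the one shared object.
    items.append(4)


numbers = [1, 2, 3]

rebind(numbers)
after_rebind = list(numbers)  # snapshot taken before the mutation below

mutate(numbers)

print(after_rebind)  # [1, 2, 3]
print(numbers)       # [1, 2, 3, 4]
```

Neither call copies the list. The only thing copied is the reference, which is why the append is visible to the caller and the reassignment is not.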

There are some languages, like C++, where objects can be values (of course, you can also have pointers to objects, which work similarly to references in other languages). C++ also has both pass-by-value and pass-by-reference. If you pass an object by reference, no copying happens. If you assign or pass an object by value, the object is copied. But by default this is a shallow copy.

The distinction between shallow copy and deep copy is how they deal with members of the object which are pointers to another object. Members which are not pointers are simply copied; there is no concept of "shallow" or "deep". "Shallow" or "deep" only talks about members which are pointers -- whether to copy the object pointed to or not. The default assignment operator and copy constructor in C++ simply copy each member. For members that are pointers, that means the pointer is copied and thus you have two pointers to the same object. That is a shallow copy.

You might want a "deep" copy in cases where the member that is a pointer to another object is really pointing to a "sub-object" that is really "a part of" your object (whether a pointer to another object means a sub-object or not depends on the design of your objects), so you don't want multiple of your objects pointing to the same sub-object, and that's why you want to copy the sub-object when copying the main object. To implement this in C++, you would need to implement a custom assignment operator and custom copy constructor to copy the "sub-objects" in the process of copying. In other languages, you would similarly need to customize whatever copying method is used.
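The same "sub-object" idea can be sketched in Python, where the copy module already provides both behaviours. The Car and Engine classes here are hypothetical and exist only to give the object a member that refers to another object.

```python
import copy


class Engine:
    def __init__(self, power):
        self.power = power


class Car:
    def __init__(self, engine):
        self.engine = engine  # reference to a sub-object


original = Car(Engine(100))
shallow = copy.copy(original)      # new Car, same Engine
deep = copy.deepcopy(original)     # new Car, new Engine

# Change the sub-object through the original.
original.engine.power = 250

print(shallow.engine is original.engine)  # True
print(deep.engine is original.engine)     # False
print(shallow.engine.power, deep.engine.power)  # 250 100
```

Whether the shared Engine is a bug or a feature depends on whether the engine is really "part of" the car in your design, which is the judgment call described above.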

Reason why deep and shallow copy differences between primitive and non-primitive objects?

The catch here is mutability and immutability.

There is no such thing as primitive and non-primitive in Python; everything is an object of some type, and some types are just built in.

You need to understand how Python stores data in variables. Assuming you come from a C background, you can think of all Python variables as pointers.

All python variables store the reference to the location where the value of the variable actually is.

The builtin id function somewhat lets us look into where the value of a variable is actually stored.

>>> x = 12345678
>>> id(x)
1886797010128
>>> y = x
>>> id(y)
1886797010128
>>> y += 1
>>> y
12345679
>>> x
12345678
>>> id(y)
1886794729648

The variable x points to location 1886797010128, and location 1886797010128 holds the value 12345678. int is an immutable type in Python, which means that the data stored at location 1886797010128 can't be changed.

When we assign y = x, y now also points to the same address, since it's not necessary to allocate more memory for the same value.

When y is changed (remember that int is an immutable type and its value can't be changed), a new int is created at the new location 1886794729648, and y now points to this new int object at the new address.

The same happens when you try to update the value of a variable that holds immutable data.

>>> id(x)
140707671077744
>>> x = 30
>>> id(x)
140707671078064

Changing the value of a variable that has immutable data simply makes the variable point to a new object with the updated value.


This is not the case with mutable types like list.

>>> a = [1, 2, 3]
>>> b = a
>>> id(a), id(b)
(1886794896456, 1886794896456)
>>> b.append(4)
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>>

a is a list, which is mutable; changing it with methods like append mutates the value at address 1886794896456 in place. Since a and b point to the same address, the change made through b is visible through a as well.


A deepcopy creates a new object at a different memory location with the same value as its parameter, i.e. the object passed to it.
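A small sketch to check that claim, with one caveat worth knowing: as far as the standard copy module goes, atomic immutable objects such as int are returned unchanged rather than duplicated, since sharing them is harmless. The variable names are illustrative only.

```python
import copy

nested = [[1, 2], [3, 4]]
clone = copy.deepcopy(nested)

same_value = clone == nested                # equal contents
distinct_outer = clone is not nested        # different outer list
distinct_inner = clone[0] is not nested[0]  # different inner lists too

# For an immutable int there is nothing to protect, so deepcopy
# hands back the very same object.
number = 12345678
int_is_shared = copy.deepcopy(number) is number

print(same_value, distinct_outer, distinct_inner, int_is_shared)
```

So "a new object at a different memory location" holds for mutable containers; for immutable values the copy and the original may be one and the same object.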

I would like to know the reason why this design choice was made

This is simply because of how Python is designed as an object-oriented language. Similar behavior can be seen with Java objects.

I personally find this behavior counter-intuitive

Intuition comes from practice. Practice in one language does not tell you how other languages work; different languages have different design patterns and conventions, and I think some effort should go into learning them for the language we are about to use.

What is the difference between shallow copy, deepcopy and normal assignment operation?

Normal assignment operations will simply point the new variable towards the existing object. The docs explain the difference between shallow and deep copies:

The difference between shallow and deep copying is only relevant for
compound objects (objects that contain other objects, like lists or
class instances):

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.

  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the
    original.

Here's a little demonstration:

import copy

a = [1, 2, 3]
b = [4, 5, 6]
c = [a, b]

Using normal assignment operations to copy:

d = c

print(id(c) == id(d))        # True - d is the same object as c
print(id(c[0]) == id(d[0]))  # True - d[0] is the same object as c[0]

Using a shallow copy:

d = copy.copy(c)

print(id(c) == id(d))        # False - d is now a new object
print(id(c[0]) == id(d[0]))  # True - d[0] is the same object as c[0]

Using a deep copy:

d = copy.deepcopy(c)

print(id(c) == id(d))        # False - d is now a new object
print(id(c[0]) == id(d[0]))  # False - d[0] is now a new object

python deepcopy and shallow copy and pass reference

A shallow copy makes a new top-level container but fills it with references to the same objects the original holds. A deep copy makes a new instance of every mutable container in the data structure.

"e.g. 2" results in 10 because you copy the dict on the outside, but the two lists inside are still the old lists, and lists can be changed in-place (they're mutable).

A deep copy effectively runs aList.copy() and bList.copy() and replaces the values in your dict with those copies.


e.g. 1 explained:

kvps = {'1': 1, '2': 2}
theCopy = kvps.copy()

# the above is equivalent to:
kvps = {'1': 1, '2': 2}
theCopy = {'1': 1, '2': 2}

When you apply this to e.g. 2:

kvps = {'1': aList, '2': bList}
theCopy = {'1': aList, '2': bList}

The list objects in both dicts are the same objects, so modifying one of the lists will be reflected in both dicts.


Doing a deep copy (e.g. 3) results in this:

kvps = {'1': aList, '2': bList}
theCopy = {'1': [1, 2], '2': [3, 4]}

This means both dicts have entirely different contents, and modifying one won't modify the other.


e.g. 4 via dict() is equivalent to a shallow copy.
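The numbered examples from the question are not reproduced here, so the following is only a sketch of the same three situations under assumed values: aList and bList are taken to be [1, 2] and [3, 4], matching the deep-copy illustration above, and the mutation is made up.

```python
import copy

aList = [1, 2]
bList = [3, 4]
kvps = {'1': aList, '2': bList}

shallow = kvps.copy()        # new dict, same list objects
via_dict = dict(kvps)        # also a shallow copy
deep = copy.deepcopy(kvps)   # new dict, new list objects

# Mutate one of the inner lists in place through the original dict.
kvps['1'][0] = 5

print(shallow['1'])   # [5, 2]
print(via_dict['1'])  # [5, 2]
print(deep['1'])      # [1, 2]
```

Only the deep copy is unaffected, because it is the only one whose lists are different objects from aList and bList.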

Understanding dict.copy() - shallow or deep?

"Shallow copying" means that the contents of the dictionary are not copied by value; the new dictionary just holds new references to the same objects.

>>> a = {1: [1,2,3]}
>>> b = a.copy()
>>> a, b
({1: [1, 2, 3]}, {1: [1, 2, 3]})
>>> a[1].append(4)
>>> a, b
({1: [1, 2, 3, 4]}, {1: [1, 2, 3, 4]})

In contrast, a deep copy will copy all contents by value.

>>> import copy
>>> c = copy.deepcopy(a)
>>> a, c
({1: [1, 2, 3, 4]}, {1: [1, 2, 3, 4]})
>>> a[1].append(5)
>>> a, c
({1: [1, 2, 3, 4, 5]}, {1: [1, 2, 3, 4]})

So:

  1. b = a: Reference assignment. a and b point to the same object.

    Illustration of 'b = a': 'a' and 'b' both point to '{1: L}', 'L' points to '[1, 2, 3]'.

  2. b = a.copy(): Shallow copying. a and b become two separate objects, but their contents still share the same references.

    Illustration of 'b = a.copy()': 'a' points to '{1: L}', 'b' points to '{1: M}', 'L' and 'M' both point to '[1, 2, 3]'.

  3. b = copy.deepcopy(a): Deep copying, a and b's structure and content become completely isolated.

    Illustration of 'b = copy.deepcopy(a)': 'a' points to '{1: L}', 'L' points to '[1, 2, 3]'; 'b' points to '{1: M}', 'M' points to a different instance of '[1, 2, 3]'.
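One detail of case 2 that is easy to miss: the two dictionaries themselves really are separate, so adding a key or rebinding an existing key in one does not touch the other. Only in-place mutation of a shared value leaks across. A short sketch, with arbitrary example values:

```python
a = {1: [1, 2, 3]}
b = a.copy()

b[2] = 'new'       # new key: only b changes
b[1] = b[1] + [4]  # rebinding: b[1] now refers to a brand-new list

print(a)  # {1: [1, 2, 3]}
print(b)  # {1: [1, 2, 3, 4], 2: 'new'}
```

Here b[1] + [4] builds a new list, unlike b[1].append(4), which would have modified the list that a[1] also refers to.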

Preventing reference re-use during deepcopy

If your whole point is to copy data that could come from JSON, i.e. list, dict, string, numbers, bool, then you can trivially implement your own function:

def copy_jsonlike(data):
    if isinstance(data, list):
        return [copy_jsonlike(x) for x in data]
    elif isinstance(data, dict):
        return {k: copy_jsonlike(v) for k, v in data.items()}
    else:
        return data

It has the added bonus of probably being faster than copy.deepcopy.

Or, your original solution, json.loads(json.dumps(data)) isn't a bad idea either.
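To make the difference in reference handling concrete, here is a sketch comparing the three approaches on a structure in which the same list appears twice. The sample data is invented; copy_jsonlike is the function from above, repeated so the block runs on its own.

```python
import copy
import json


def copy_jsonlike(data):
    if isinstance(data, list):
        return [copy_jsonlike(x) for x in data]
    elif isinstance(data, dict):
        return {k: copy_jsonlike(v) for k, v in data.items()}
    else:
        return data


shared = [1, 2]
data = {'a': shared, 'b': shared}  # the same list under two keys

deep = copy.deepcopy(data)
plain = copy_jsonlike(data)
roundtrip = json.loads(json.dumps(data))

# deepcopy tracks what it has already copied, so the aliasing survives.
deep_keeps_alias = deep['a'] is deep['b']
# The other two approaches build an independent list for each occurrence.
plain_keeps_alias = plain['a'] is plain['b']
roundtrip_keeps_alias = roundtrip['a'] is roundtrip['b']

print(deep_keeps_alias, plain_keeps_alias, roundtrip_keeps_alias)

# A caveat of the JSON round trip: non-string keys come back as strings.
int_key_roundtrip = json.loads(json.dumps({1: 'x'}))
print(int_key_roundtrip)  # {'1': 'x'}
```

If the goal is for every occurrence to become its own object, the hand-written copier or the JSON round trip does that; the round trip is only safe when the data really is JSON-shaped to begin with.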

What is the difference between a deep copy and a shallow copy?

Shallow copies duplicate as little as possible. A shallow copy of a collection is a copy of the collection structure, not the elements. With a shallow copy, two collections now share the individual elements.

Deep copies duplicate everything. A deep copy of a collection is two collections with all of the elements in the original collection duplicated.

When a key has a list value, using deep copy and shallow copy will have some differences

From the docs:

The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):

1. A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.

2. A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

so..

  1. Yes.
  2. Yes.
  3. No.
    When you use a shallow copy on a string or a tuple, it does not create a new object.
    Example:
>>> import copy
>>> s1 = "string1"
>>> s2 = copy.copy(s1)
>>> id(s2) == id(s1)
True
>>> id(s2[0]) == id(s1[0])
True

Both names point to the same object.


  4. Yes. If we use a deep copy, new objects are created for dictionaries and lists, as they are compound objects.
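The string example extends to tuples. As a sketch based on the behaviour of the standard copy module, a tuple is returned as-is by copy.copy, and even by copy.deepcopy when nothing inside it needed copying; a tuple that contains a list does get rebuilt by deepcopy. The sample values are arbitrary.

```python
import copy

t = (1, 2, 3)
shallow_same = copy.copy(t) is t    # immutable: same object returned
deep_same = copy.deepcopy(t) is t   # nothing mutable inside: same object

mixed = (1, [2, 3])
mixed_deep = copy.deepcopy(mixed)
mixed_rebuilt = mixed_deep is not mixed         # new tuple was needed
inner_rebuilt = mixed_deep[1] is not mixed[1]   # because the list was copied

print(shallow_same, deep_same, mixed_rebuilt, inner_rebuilt)
```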

check this link for more info:
What exactly is the difference between shallow copy, deepcopy and normal assignment operation?

Python deepcopy changes the reference to an object

Is there such a function in copy module?

No, I'm afraid there is not.

It sounds like you want some sort of hybrid deep/shallow behavior, and that's not what exists in the copy module. You either get a shallow copy, or a deep one.

Sorry for the bad news. :) You may just want to write your own copier, or see if you can accomplish your task without copying per se.
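If you do write your own copier, one possible shape is a __deepcopy__ method that deep-copies some attributes and deliberately shares others. This is only a sketch: the Job class and its settings and connection attributes are hypothetical, since the original question's classes are not shown here.

```python
import copy


class Job:
    def __init__(self, settings, connection):
        self.settings = settings      # should be copied deeply
        self.connection = connection  # should stay shared between copies

    def __deepcopy__(self, memo):
        cls = type(self)
        clone = cls.__new__(cls)
        # Register the clone first so reference cycles back to this
        # object resolve to the clone instead of recursing forever.
        memo[id(self)] = clone
        clone.settings = copy.deepcopy(self.settings, memo)
        clone.connection = self.connection  # shared on purpose
        return clone


connection = object()
job = Job({'retries': [1, 2]}, connection)
job_copy = copy.deepcopy(job)

print(job_copy.settings is job.settings)      # False
print(job_copy.connection is job.connection)  # True
```

copy.deepcopy calls __deepcopy__ when a class defines it, so the hybrid behaviour also applies when a Job sits inside a larger structure being deep-copied.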

(I see you edited your question, so I've answered again to see if I can help. (Please see my first answer for a response to your original Q.))


