What is the difference between a deep copy and a shallow copy?
Shallow copies duplicate as little as possible. A shallow copy of a collection is a copy of the collection structure, not the elements. With a shallow copy, two collections now share the individual elements.
Deep copies duplicate everything. A deep copy of a collection is two collections with all of the elements in the original collection duplicated.
Why is a deep copy so much slower than a shallow copy for lists of the same size?
deepcopy
isn't copying the ints. There's no way it could do that anyway.
deepcopy
is slow because it needs to handle the full complexity of a deep copy, even if that turns out to be unnecessary. That includes dispatching to the appropriate copier for every object it finds, even if the copier turns out to basically just be lambda x: x
. That includes maintaining a memo dict and keeping track of every object copied, to handle duplicate references to the same objects, even if there are none. That includes special copy handling for data structures like lists and dicts, so it doesn't go into an infinite recursion when trying to copy a data structure with recursive references.
All of that has to be done no matter whether it pays off. All of it is expensive.
Also, deepcopy
is pure-Python. That doesn't help. Comparing deepcopy
to pickle.loads(pickle.dumps(whatever))
, which performs a very similar job, pickle
wins handily due to the C implementation. (On Python 2, replace pickle
with cPickle
.) pickle
still loses hard to an implementation that takes advantage of the known structure of the input, though:
In [15]: x = [[0]*1000 for i in range(1000)]
In [16]: %timeit copy.deepcopy(x)
1.05 s ± 5.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [17]: %timeit pickle.loads(pickle.dumps(x))
78 ms ± 4.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [18]: %timeit [l[:] for l in x]
4.56 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
What is the difference between shallow copy, deepcopy and normal assignment operation?
Normal assignment operations will simply point the new variable towards the existing object. The docs explain the difference between shallow and deep copies:
The difference between shallow and deep copying is only relevant for
compound objects (objects that contain other objects, like lists or
class instances):
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the
original.
Here's a little demonstration:
import copy
a = [1, 2, 3]
b = [4, 5, 6]
c = [a, b]
Using normal assignment operatings to copy:
d = c
print id(c) == id(d) # True - d is the same object as c
print id(c[0]) == id(d[0]) # True - d[0] is the same object as c[0]
Using a shallow copy:
d = copy.copy(c)
print id(c) == id(d) # False - d is now a new object
print id(c[0]) == id(d[0]) # True - d[0] is the same object as c[0]
Using a deep copy:
d = copy.deepcopy(c)
print id(c) == id(d) # False - d is now a new object
print id(c[0]) == id(d[0]) # False - d[0] is now a new object
Why there is no difference between shallow copy and deep copy for a list of immutables
Since you cannot change the immutable objects, there is no point in creating copies of the same while copying.
Shallow Copy
As per copy
's source code, shallow copy of immutable types is done like this
def _copy_immutable(x):
return x
for t in (type(None), int, long, float, bool, str, tuple,
frozenset, type, xrange, types.ClassType,
types.BuiltinFunctionType, type(Ellipsis),
types.FunctionType, weakref.ref):
d[t] = _copy_immutable
For all the immutable types, the _copy_immutable
function returns the object as it is, during shallow copy.
Deep Copy
The same way, during the deepcopy of tuples, the object is returned as is, as per the _deepcopy_tuple
function,
d = id(x)
try:
return memo[d]
Is pass-by-value/reference equivalent to making a deep/shallow copy, respectively?
No. Those two things are completely unrelated.
Shallow copy/deep copy is talking about object copying; whereas pass-by-value/pass-by-reference is talking about the passing of variables.
In many modern languages, like Python (which you mentioned that you're most familiar with) and Java, "objects" are not values in the language, so "objects" cannot be assigned or passed. Rather, objects are always manipulated through pointers to objects (references), which are values and can be assigned or passed.
Python and Java are pass-by-value only. When you pass a reference, it copies the pointer and you end up with two pointers to the same object. No object copying happens. In these languages, object copying is not done through assigning or passing, but rather is done by calling special methods like .clone()
or by passing an object to a constructor to construct a new object. (There is in fact no general way to copy an object in Java.)
There are some languages, like C++, where objects can be values (of course, you can also have pointers to objects, which work similarly to references in other languages). C also has both pass-by-value and pass-by-reference. If you pass an object by reference, no copying happens. If you assign or pass an object by value, the object is copied. But by default this is a shallow copy.
The distinction between shallow copy and deep copy is how they deal with members of the object which are pointers to another object. Members which are not pointers are simply copied; there is no concept of "shallow" or "deep". "Shallow" or "deep" only talks about members which are pointers -- whether to copy the object pointed to or not. The default assignment operator and copy constructor in C++ simply copies each member. For members that are pointers, that means the pointer is copied and thus you have two pointers to the same object. That is a shallow copy.
You might want a "deep" copy in cases where the member that is a pointer to another object is really pointing to a "sub-object" that is really "a part of" your object (whether a pointer to another object means a sub-object or not depends on the design of your objects), so you don't want multiple of your objects pointing to the same sub-object, and that's why you want to copy the sub-object when copying the main object. To implement this in C++, you would need to implement a custom assignment operator and custom copy constructor to copy the "sub-objects" in the process of copying. In other languages, you would similarly need to customize whatever copying method is used.
When does javascript shallow copy vs. deep copy
https://www.freecodecamp.org/news/copying-stuff-in-javascript-how-to-differentiate-between-deep-and-shallow-copies-b6d8c1ef09cd/
Hi everyone. Here is the answer to this question. Thanks!
Related Topics
How Can My C/C++ Application Determine If the Root User Is Executing the Command
Segmentation Fault When Sending Struct Having Std::Vector Member
How to Increase Thread Priority in Pthreads
Qt - Updating Main Window with Second Thread
How to Compare Multiple Strings Inside an If Statement
Why Is Stack Memory Size So Limited
Is Passing a C++ Object into Its Own Constructor Legal
Size of Array Passed to C++ Function
C++ Auto Keyword. Why Is It Magic
Fast String Hashing Algorithm with Low Collision Rates with 32 Bit Integer
Changing the Current Directory in Linux Using C++
Missing Lboost_Thread-Mt in Mongodb Cpp Driver (Ubuntu Server X64)
Overriding a Base's Overloaded Function in C++
Why Is Std::Function Not Equality Comparable
How to Read a Binary File into a Vector of Unsigned Chars
How to Include Header Files in Gcc Search Path
Undefined Reference to Vtable. Trying to Compile a Qt Project