Python - Using the Multiply Operator to Create Copies of Objects in Lists
The multiplication operator on a sequence means repetition of the item(s) -- NOT creation of copies (shallow or deep ones) of the items. Nothing stops you from going crazy, à la:
import copy
class Crazy(object):
    def __init__(self, body, weird=copy.copy):
        self.gomez = body
        self.cousinitt = weird
    def __mul__(self, n):
        return [self.cousinitt(x) for x in (self.gomez * n)]
a = Crazy([[]]) * 3
...except your sanity and common sense, if any. Checking on those: how DID you dream that operator * could be made to mean something utterly different from what it's intended to mean, other than by defining another class that overloads __mul__ in weird ways...?-)
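A minimal sketch contrasting plain repetition with the Crazy class as defined above (the class is repeated so the snippet runs on its own; the names plain and crazy are illustrative). Plain repetition shares one inner list, while Crazy.__mul__ passes every repeated item through copy.copy:

```python
import copy

class Crazy(object):
    def __init__(self, body, weird=copy.copy):
        self.gomez = body
        self.cousinitt = weird

    def __mul__(self, n):
        # Repeat the body, then pass each repeated item through the copier.
        return [self.cousinitt(x) for x in (self.gomez * n)]

plain = [[]] * 3          # three references to ONE inner list
crazy = Crazy([[]]) * 3   # three shallow copies of the inner list

plain[0].append("x")
crazy[0].append("x")

print(plain)  # [['x'], ['x'], ['x']]
print(crazy)  # [['x'], [], []]
```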
Is creating a new list object via multiplication by one with an existing list equivalent to making a deep copy?
No, they aren't equivalent. The multiplication operator makes only a shallow copy. A deep copy recursively copies the objects the list refers to, so the new list refers to new objects, while a shallow copy only creates a new top-level list that holds the same references as the original, as demonstrated below:
import copy
a = [[],[]]
b = copy.deepcopy(a)
c = a * 1
for i, v in enumerate(a):
    print(id(v), id(b[i]), id(c[i]))
This outputs something like the following (the exact ids vary from run to run):
31231832 31261480 31231832
31260800 31261400 31260800
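Because id() values change between runs, here is a sketch that makes the same point with the is operator instead of printed addresses (same variable names as above; the checks themselves are additions):

```python
import copy

a = [[], []]
b = copy.deepcopy(a)  # new outer list AND new inner lists
c = a * 1             # new outer list, same inner lists

print(c is a)                              # False: the outer list is new
print(all(x is y for x, y in zip(a, c)))   # True: inner lists are shared
print(any(x is y for x, y in zip(a, b)))   # False: deepcopy copied them

a[0].append(1)
print(c[0], b[0])  # [1] []
```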
Why does using multiplication operator on list create list of pointers?
The behaviour is not specific to the repetition operator (*). For example, if you concatenate two lists using +, the behaviour is the same:
In [1]: a = [[1]]
In [2]: b = a + a
In [3]: b
Out[3]: [[1], [1]]
In [4]: b[0][0] = 10
In [5]: b
Out[5]: [[10], [10]]
This has to do with the fact that lists are objects, and a list stores references to its elements rather than the elements themselves. When you use * et al., it is the reference that gets repeated, hence the behaviour that you're seeing.
The following demonstrates that all elements of rows have the same identity (i.e. the same memory address in CPython):
In [6]: rows = [['']*5]*5
In [7]: for row in rows:
   ...:     print(id(row))
   ...:
   ...:
15975992
15975992
15975992
15975992
15975992
The following is equivalent to your example except it creates five distinct lists for the rows:
rows = [['']*5 for i in range(5)]
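A short check of both versions, using is rather than printed ids; shared and distinct are illustrative names:

```python
shared = [[''] * 5] * 5                  # one row object, repeated 5 times
distinct = [[''] * 5 for i in range(5)]  # a new row built on each iteration

shared[0][0] = 'X'
distinct[0][0] = 'X'

print([row[0] for row in shared])    # ['X', 'X', 'X', 'X', 'X']
print([row[0] for row in distinct])  # ['X', '', '', '', '']
# Number of distinct row objects in each outer list.
print(len({id(row) for row in shared}), len({id(row) for row in distinct}))  # 1 5
```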
Multiply operator applied to list (data structure)
EVERYTHING in Python is an object, and Python never makes copies unless explicitly asked to do so.
When you do
innerList = [0] * 10
you create a list with 10 elements, all of them referring to the same int object, 0.
Since integer objects are immutable, when you do
innerList[1] = 15
you are changing the second element of the list so that it refers to another integer, 15. That is always safe because int objects are immutable, so no other element is affected.
Likewise,
outerList = [innerList] * 5
will create a list object with 5 elements, each one a reference to the same innerList, just as above. But since list objects are mutable:
outerList[2].append('something')
is the same as:
innerList.append('something')
because they are two references to the same list object. So the element ends up in that single list. It appears to be duplicated, but in fact there is only one list object, with many references to it.
By contrast, if you do
outerList[1] = outerList[1] + ['something']
you are creating another list object (using + with lists always builds a new list) and assigning a reference to it into the second position of outerList. If you "append" the element this way (not really appending, but creating another list), innerList will be unaffected.
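The cases above can be checked in a few lines. This sketch assumes the outer list is built as [innerList] * 5 (repeating a one-element list that holds innerList), which is what yields five references to the same list:

```python
innerList = [0] * 10
innerList[1] = 15            # rebinds slot 1 to another int; no int is mutated

outerList = [innerList] * 5  # five references to the same list object
outerList[2].append('something')
print(innerList[-1])                               # something
print(all(row is innerList for row in outerList))  # True

outerList[1] = outerList[1] + ['something']  # + builds a NEW list
print(outerList[1] is innerList)             # False
print(innerList.count('something'))          # 1: innerList was not touched again
```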
Generating sublists using multiplication ( * ) unexpected behavior
My best guess is that using multiplication in the form
[[]] * x
causes Python to store a reference to a single cell...?
Yes. And you can test this yourself:
>>> lst = [[]] * 3
>>> print([id(x) for x in lst])
[11124864, 11124864, 11124864]
This shows that all three references refer to the same object. And note that it really makes perfect sense that this happens.[1] Repetition just copies the values, and in this case, the values are references. And that's why you see the same reference repeated three times.
It is interesting to note that if I do
lst = [[]]*3
lst[0] = [5]
lst[0].append(3)
then the 'linkage' of cell 0 is broken and I get [[5,3],[],[]], but lst[1].append(0) still causes [[5,3],[0],[0]].
You changed the reference that occupies lst[0]; that is, you assigned a new value to lst[0]. But you didn't change the value of the other elements; they still refer to the same object that they referred to. And lst[1] and lst[2] still refer to exactly the same instance, so of course appending an item to lst[1] causes lst[2] to also see that change.
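The sequence from the question, replayed as a runnable sketch with the intermediate results printed:

```python
lst = [[]] * 3
lst[0] = [5]        # rebinds slot 0 to a brand-new list
lst[0].append(3)
print(lst)          # [[5, 3], [], []]

lst[1].append(0)    # lst[1] and lst[2] still share one list
print(lst)          # [[5, 3], [0], [0]]
print(lst[1] is lst[2], lst[0] is lst[1])  # True False
```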
This is a classic mistake people make with pointers and references. Here's the simple analogy. You have a piece of paper. On it, you write the address of someone's house. You now take that piece of paper, and photocopy it twice so you end up with three pieces of paper with the same address written on them. Now, take the first piece of paper, scribble out the address written on it, and write a new address to someone else's house. Did the address written on the other two pieces of paper change? No. That's exactly what your code did, though. That's why the other two items don't change. Further, imagine that the owner of the house whose address is still on the second piece of paper builds an add-on garage to their house. Now I ask you, does the house whose address is on the third piece of paper have an add-on garage? Yes, it does, because it's exactly the same house as the one whose address is written on the second piece of paper. This explains everything about your second code example.
[1] You didn't expect Python to invoke a "copy constructor", did you? Puke.
When does list multiplication create multiple copies of the object and when does it reference to the same object?
In fact, creating [0] * m also gives m references to the same integer:
>>> lst = [0] * 3
>>> [id(i) for i in lst]
[2751327895760, 2751327895760, 2751327895760]
However, for immutable types this is not harmful, because they cannot be modified in place. An operation that looks like an in-place update, such as +=, only produces a new object and rebinds the name to it, so there is no way to modify the shared integer through one list element and have the change show up in the others:
>>> x = 0
>>> id(x)
2751327895760
>>> x += 1
>>> id(x)
2751327895792
Therefore, for objects such as strings, integers and so on, there is no harm in using list multiplication. (Tuples are fine in most cases too, but be careful if a tuple contains mutable objects! Thanks to @Kelly Bundy for the reminder.) When you want to create a list containing the same objects by multiplication, the operation is safe as long as you can ensure that you will not modify any of them in place.
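A sketch of both halves of that advice; the values and the names nums and pairs are illustrative:

```python
nums = [0] * 3
nums[0] += 1             # builds a new int and stores it in slot 0 only
print(nums)              # [1, 0, 0]

pairs = [(0, [])] * 3    # three references to one tuple holding one list
pairs[0][1].append('x')  # the tuple is immutable, but its inner list is not
print(pairs)             # [(0, ['x']), (0, ['x']), (0, ['x'])]
```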
How do I multiply each element in a list by a number?
You can just use a list comprehension:
my_list = [1, 2, 3, 4, 5]
my_new_list = [i * 5 for i in my_list]
>>> print(my_new_list)
[5, 10, 15, 20, 25]
Note that a list comprehension is generally more efficient than the equivalent for loop:
my_new_list = []
for i in my_list:
    my_new_list.append(i * 5)
>>> print(my_new_list)
[5, 10, 15, 20, 25]
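To check the efficiency claim on your own machine, a rough timeit sketch. The function names with_listcomp and with_loop and the 1000-element input are made up for this example, and the printed timings will vary by interpreter and hardware:

```python
import timeit

my_list = list(range(1000))

def with_listcomp():
    return [i * 5 for i in my_list]

def with_loop():
    result = []
    for i in my_list:
        result.append(i * 5)
    return result

# Both approaches must produce the same list before timing them is meaningful.
assert with_listcomp() == with_loop()

for func in (with_listcomp, with_loop):
    seconds = timeit.timeit(func, number=2000)
    print(f"{func.__name__}: {seconds:.3f} s for 2000 calls")
```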
As an alternative, here is a solution using the popular Pandas package:
import pandas as pd
s = pd.Series(my_list)
>>> s * 5
0 5
1 10
2 15
3 20
4 25
dtype: int64
Or, if you just want the list:
>>> (s * 5).tolist()
[5, 10, 15, 20, 25]
Finally, one could use map, although this is generally frowned upon (on Python 3, map returns a lazy iterator, so it has to be wrapped in list() to get a list):
my_new_list = list(map(lambda x: x * 5, my_list))
Using map, however, is generally less efficient. Per a comment from ShadowRanger on a deleted answer to this question:
The reason "no one" uses it is that, in general, it's a performance pessimization. The only time it's worth considering map in CPython is if you're using a built-in function implemented in C as the mapping function; otherwise, map is going to run equal to or slower than the more Pythonic listcomp or genexpr (which are also more explicit about whether they're lazy generators or eager list creators; on Py3, your code wouldn't work without wrapping the map call in list). If you're using map with a lambda function, stop, you're doing it wrong.
And another one of his comments posted to this reply:
Please don't teach people to use map with lambda; the instant you need a lambda, you'd have been better off with a list comprehension or generator expression. If you're clever, you can make map work without lambdas a lot, e.g. in this case, map((5).__mul__, my_list), although in this particular case, thanks to some optimizations in the byte code interpreter for simple int math, [x * 5 for x in my_list] is faster, as well as being more Pythonic and simpler.
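The points in these comments can be checked directly. A sketch comparing the list comprehension with the two map spellings mentioned, which also shows that on Python 3 map returns a lazy, single-use iterator:

```python
my_list = [1, 2, 3, 4, 5]

listcomp = [x * 5 for x in my_list]
lazy = map((5).__mul__, my_list)   # a map object; nothing computed yet
print(type(lazy).__name__)         # map

via_bound_method = list(lazy)      # forces evaluation
via_lambda = list(map(lambda x: x * 5, my_list))

print(listcomp == via_bound_method == via_lambda)  # True
print(list(lazy))                  # []: the iterator is already exhausted
```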
Creating a list in Python with multiple copies of a given object in a single line
itertools.repeat() is your friend.
import itertools
L = list(itertools.repeat("a", 20)) # 20 copies of "a"
L = list(itertools.repeat(10, 20)) # 20 copies of 10
L = list(itertools.repeat(['x','y'], 20)) # 20 copies of ['x','y']
Note that in the third case, the list holds 20 references to one ['x','y'] list, so mutating it through any element of L changes what every element shows, since they all refer to the same list.
To avoid referencing the same item, you can use a comprehension instead to create new objects for each list element:
L = [['x','y'] for i in range(20)]
(For Python 2.x, use xrange() instead of range() for performance.)
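A sketch contrasting the two approaches for a mutable item (the names shared and fresh are illustrative, and 3 copies are used instead of 20 to keep the output short). For immutable items, itertools.repeat gives the same list as plain repetition:

```python
import itertools

shared = list(itertools.repeat(['x', 'y'], 3))  # one list, referenced 3 times
fresh = [['x', 'y'] for i in range(3)]           # list display re-evaluated each time

shared[0].append('z')
fresh[0].append('z')

print(shared)  # [['x', 'y', 'z'], ['x', 'y', 'z'], ['x', 'y', 'z']]
print(fresh)   # [['x', 'y', 'z'], ['x', 'y'], ['x', 'y']]
print(list(itertools.repeat("a", 20)) == ["a"] * 20)  # True
```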