Python - Using the Multiply Operator to Create Copies of Objects in Lists

The multiplication operator on a sequence means repetition of the item(s) -- NOT creation of copies (shallow or deep ones) of the items. Nothing stops you from going crazy, a la:

import copy

class Crazy(object):
    def __init__(self, body, weird=copy.copy):
        self.gomez = body
        self.cousinitt = weird
    def __mul__(self, n):
        return [self.cousinitt(x) for x in (self.gomez * n)]

a = Crazy([[]]) * 3

...except your sanity and common sense, if any. Checking on those, how DID you dream operator * could be made to mean something utterly different than it's intended to mean, except by defining another class overloading __mul__ in weird ways...?-)

Is creating a new list object via multiplication by one with an existing list equivalent to making a deep copy?

No, they aren't equivalent. Multiplying a list produces a new list object, but only a shallow one: the new list holds references to the very same elements as the original. A deep copy also recursively copies the objects that the list refers to, so the result shares no mutable state with the original. The ids printed below show the difference:

import copy
a = [[],[]]
b = copy.deepcopy(a)
c = a * 1
for i, v in enumerate(a):
    print(id(v), id(b[i]), id(c[i]))

This outputs:

31231832 31261480 31231832
31260800 31261400 31260800
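
The matching ids in the first and third columns mean that a and c share their inner lists, while b has its own. A minimal sketch of the practical consequence, reusing the names from the snippet above:

```python
import copy

a = [[], []]
b = copy.deepcopy(a)   # new outer list AND new inner lists
c = a * 1              # new outer list, but the same inner lists as a

a[0].append("x")       # mutate an inner list through a

print(a)  # [['x'], []]
print(b)  # [[], []]      -> the deep copy is unaffected
print(c)  # [['x'], []]   -> the shallow copy sees the change
print(c is a, c[0] is a[0])  # False True
```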

Why does using multiplication operator on list create list of pointers?

The behaviour is not specific to the repetition operator (*). For example, if you concatenate two lists using +, the behaviour is the same:

In [1]: a = [[1]]

In [2]: b = a + a

In [3]: b
Out[3]: [[1], [1]]

In [4]: b[0][0] = 10

In [5]: b
Out[5]: [[10], [10]]

This has to do with the fact that lists are objects, and objects are stored by reference. When you use * et al, it is the reference that gets repeated, hence the behaviour that you're seeing.

The following demonstrates that all elements of rows have the same identity (i.e. memory address in CPython):

In [6]: rows = [['']*5]*5

In [7]: for row in rows:
   ...:     print(id(row))
   ...:
15975992
15975992
15975992
15975992
15975992

The following is equivalent to your example except it creates five distinct lists for the rows:

rows = [['']*5 for i in range(5)]
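
A quick way to check the difference between the two constructions is to count distinct row identities. This is only a sketch; the names shared and distinct are made up for illustration:

```python
shared = [[''] * 5] * 5                  # one row object, referenced five times
distinct = [[''] * 5 for _ in range(5)]  # the row expression is evaluated five times

shared[0][0] = 'x'
distinct[0][0] = 'x'

print(len({id(row) for row in shared}))    # 1
print(len({id(row) for row in distinct}))  # 5
print(shared[1][0])           # x  (every "row" is the same list)
print(repr(distinct[1][0]))   # '' (the other rows are untouched)
```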

Multiply operator applied to list(data structure)

Everything in Python is an object, and Python never makes copies unless explicitly asked to do so.

When you do

innerList = [0] * 10

you create a list with 10 elements, all of them referring to the same int object 0.

Since integer objects are immutable, when you do

innerList[1] = 15

You are rebinding the second element of the list so that it refers to a different int object, 15. The object 0 itself is never modified (ints are immutable), so the other nine elements are unaffected.

That's why

outerList = [innerList] * 5

will create a list object with 5 elements, each one a reference to the same innerList, just as above. But since list objects are mutable:

outerList[2].append('something')

Is the same as:

innerList.append('something')

Because they are two references to the same list object. So the element ends up in that single list. It appears to be duplicated, but the fact is that there is only one list object, and many references to it.

By contrast if you do

outerList[1] = outerList[1] + ['something']

Here you are creating another list object (+ on two lists always builds a new list) and storing a reference to it in the second position of outerList. If you "append" the element this way (not really appending, but creating another list), innerList is unaffected.
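
The whole walkthrough can be run end to end. The sketch below assumes the outer list is built as [inner_list] * 5 (a list of references to the inner list); the snake_case names and the 'else' value are only illustrative:

```python
inner_list = [0] * 10
inner_list[1] = 15                 # rebinds slot 1; the int object 0 is untouched

outer_list = [inner_list] * 5      # five references to the one inner list
outer_list[2].append('something')  # mutates the shared list in place

print(len(inner_list))                               # 11
print(all(row is inner_list for row in outer_list))  # True

outer_list[1] = outer_list[1] + ['else']    # + builds a brand-new list
print(outer_list[1] is inner_list)          # False
print(len(inner_list), len(outer_list[1]))  # 11 12
```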

Generating sublists using multiplication ( * ) unexpected behavior

My best guess is that using multiplication in the form [[]] * x causes Python to store a reference to a single cell...?

Yes. And you can test this yourself

>>> lst = [[]] * 3
>>> print([id(x) for x in lst])
[11124864, 11124864, 11124864]

This shows that all three references refer to the same object. And note that it really makes perfect sense that this happens [1]. Multiplication just copies the values, and in this case the values are references. That's why you see the same reference repeated three times.

It is interesting to note that if I do

lst = [[]]*3
lst[0] = [5]
lst[0].append(3)

then the 'linkage' of cell 0 is broken and I get [[5,3],[],[]], but lst[1].append(0) still causes [[5,3],[0],[0]].

You changed the reference that occupies lst[0]; that is, you assigned a new value to lst[0]. But you didn't change the value of the other elements, they still refer to the same object that they referred to. And lst[1] and lst[2] still refer to exactly the same instance, so of course appending an item to lst[1] causes lst[2] to also see that change.

This is a classic mistake people make with pointers and references. Here's the simple analogy. You have a piece of paper. On it, you write the address of someone's house. You now take that piece of paper, and photocopy it twice so you end up with three pieces of paper with the same address written on them. Now, take the first piece of paper, scribble out the address written on it, and write a new address to someone else's house. Did the address written on the other two pieces of paper change? No. That's exactly what your code did, though. That's why the other two items don't change. Further, imagine that the owner of the house with address that is still on the second piece of paper builds an add-on garage to their house. Now I ask you, does the house whose address is on the third piece of paper have an add-on garage? Yes, it does, because it's exactly the same house as the one whose address is written on the second piece of paper. This explains everything about your second code example.
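
In code, the two halves of the analogy look roughly like this (a small sketch using the same lst as in the question):

```python
lst = [[]] * 3

lst[0] = [5]        # rewrite the address on the first piece of paper
lst[0].append(3)    # changes only the new "house"
lst[1].append(0)    # build the garage on the house that papers 2 and 3 point to

print(lst)                                 # [[5, 3], [0], [0]]
print(lst[1] is lst[2], lst[0] is lst[1])  # True False
```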

1: You didn't expect Python to invoke a "copy constructor" did you? Puke.

When does list multiplication create multiple copies of the object and when does it reference to the same object?

In fact, creating [0] * m also refers to the same integer:

>>> lst = [0] * 3
>>> [id(i) for i in lst]
[2751327895760, 2751327895760, 2751327895760]

For immutable types, however, this sharing is harmless, because they have no in-place operations. An operation that looks in-place, such as +=, actually produces a new object and rebinds the name (or list slot) to it. You therefore cannot modify the shared integer through one list element, and the other elements never change:

>>> x = 0
>>> id(x)
2751327895760
>>> x += 1
>>> id(x)
2751327895792

Therefore, for immutable objects such as strings and integers, there is no harm in using list multiplication. Tuples are fine in most cases too, but be careful if a tuple contains mutable objects (thanks to @Kelly Bundy for the reminder). More generally, building a list of the same object by multiplication is safe as long as you never modify that object in place.
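
To make the tuple caveat concrete, here is a small sketch contrasting a shared string with a shared tuple that happens to hold a list (the names words, row and grid are just for illustration):

```python
words = ['a'] * 3
words[0] += 'b'        # str is immutable: += rebinds slot 0 to a new string
print(words)           # ['ab', 'a', 'a']

row = ([],)            # an immutable tuple holding a mutable list
grid = [row] * 3       # three references to the same tuple
grid[0][0].append(1)   # mutates the list inside the shared tuple
print(grid)            # [([1],), ([1],), ([1],)]
```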

How do I multiply each element in a list by a number?

You can just use a list comprehension:

my_list = [1, 2, 3, 4, 5]
my_new_list = [i * 5 for i in my_list]

>>> print(my_new_list)
[5, 10, 15, 20, 25]

Note that a list comprehension is generally more efficient than the equivalent for loop:

my_new_list = []
for i in my_list:
    my_new_list.append(i * 5)

>>> print(my_new_list)
[5, 10, 15, 20, 25]

As an alternative, here is a solution using the popular Pandas package:

import pandas as pd

s = pd.Series(my_list)

>>> s * 5
0     5
1    10
2    15
3    20
4    25
dtype: int64

Or, if you just want the list:

>>> (s * 5).tolist()
[5, 10, 15, 20, 25]

Finally, one could use map, although this is generally frowned upon.

my_new_list = list(map(lambda x: x * 5, my_list))  # list() is needed on Python 3, where map is lazy

Using map, however, is generally less efficient. Per a comment from ShadowRanger on a deleted answer to this question:

The reason "no one" uses it is that, in general, it's a performance
pessimization. The only time it's worth considering map in CPython is
if you're using a built-in function implemented in C as the mapping
function; otherwise, map is going to run equal to or slower than the
more Pythonic listcomp or genexpr (which are also more explicit about
whether they're lazy generators or eager list creators; on Py3, your
code wouldn't work without wrapping the map call in list). If you're
using map with a lambda function, stop, you're doing it wrong.

And another one of his comments posted to this reply:

Please don't teach people to use map with lambda; the instant you
need a lambda, you'd have been better off with a list comprehension
or generator expression. If you're clever, you can make map work
without lambdas a lot, e.g. in this case, map((5).__mul__, my_list),
although in this particular case, thanks to some optimizations in the
byte code interpreter for simple int math, [x * 5 for x in my_list] is
faster, as well as being more Pythonic and simpler.
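
For reference, the lambda-free variants mentioned in the comment can be written as below. This is only a sketch for comparing results, not a benchmark; pairing operator.mul with itertools.repeat is one possible spelling and is not taken from the comment itself:

```python
from itertools import repeat
from operator import mul

my_list = [1, 2, 3, 4, 5]

# map with a bound method of the int 5, as suggested in the comment
via_method = list(map((5).__mul__, my_list))

# map with a C-implemented function and a second, constant iterable
via_operator = list(map(mul, my_list, repeat(5)))

# the list comprehension that the comment recommends
via_listcomp = [x * 5 for x in my_list]

print(via_method == via_operator == via_listcomp)  # True
```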

Creating a list in Python with multiple copies of a given object in a single line

itertools.repeat() is your friend.

import itertools

L = list(itertools.repeat("a", 20)) # 20 copies of "a"

L = list(itertools.repeat(10, 20)) # 20 copies of 10

L = list(itertools.repeat(['x','y'], 20)) # 20 references to the same ['x','y'] list

Note that in the third case, the list holds 20 references to a single ['x','y'] object, so modifying it through any one position changes what every position shows.

To avoid referencing the same item, you can use a comprehension instead to create new objects for each list element:

L = [['x','y'] for i in range(20)]

(For Python 2.x, use xrange() instead of range() for performance.)
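
When the object to repeat already exists, rather than being written as a literal inside the comprehension, one option is to copy it once per element. A minimal sketch, assuming a hypothetical template object and using copy.deepcopy so that nested mutable parts are not shared either:

```python
import copy

template = {'name': 'x', 'tags': ['y']}   # hypothetical object to replicate

# one independent deep copy per element
copies = [copy.deepcopy(template) for _ in range(3)]

copies[0]['tags'].append('z')   # only the first copy changes

print(copies[0]['tags'])   # ['y', 'z']
print(copies[1]['tags'])   # ['y']
print(template['tags'])    # ['y']
```

If the elements contain no nested mutable objects, copy.copy (a shallow copy) would be enough.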


