Generating Sublists Using Multiplication ( * ) Unexpected Behavior

Generating sublists using multiplication ( * ) unexpected behavior

My best guess is that using multiplication in the form [[]] * x causes Python to store a reference to a single cell...?

Yes. And you can test this yourself

>>> lst = [[]] * 3
>>> print [id(x) for x in lst]
[11124864, 11124864, 11124864]

This shows that all three references refer to the same object. And note that it really makes perfect sense that this happens1. It just copies the values, and in this case, the values are references. And that's why you see the same reference repeated three times.

It is interesting to note that if I do

lst = [[]]*3
lst[0] = [5]
lst[0].append(3)

then the 'linkage' of cell 0 is broken and I get [[5,3],[],[]], but lst[1].append(0) still causes [[5,3],[0],[0].

You changed the reference that occupies lst[0]; that is, you assigned a new value to lst[0]. But you didn't change the value of the other elements, they still refer to the same object that they referred to. And lst[1] and lst[2] still refer to exactly the same instance, so of course appending an item to lst[1] causes lst[2] to also see that change.

This is a classic mistake people make with pointers and references. Here's the simple analogy. You have a piece of paper. On it, you write the address of someone's house. You now take that piece of paper, and photocopy it twice so you end up with three pieces of paper with the same address written on them. Now, take the first piece of paper, scribble out the address written on it, and write a new address to someone else's house. Did the address written on the other two pieces of paper change? No. That's exactly what your code did, though. That's why the other two items don't change. Further, imagine that the owner of the house with address that is still on the second piece of paper builds an add-on garage to their house. Now I ask you, does the house whose address is on the third piece of paper have an add-on garage? Yes, it does, because it's exactly the same house as the one whose address is written on the second piece of paper. This explains everything about your second code example.

1: You didn't expect Python to invoke a "copy constructor" did you? Puke.

List of lists changes reflected across sublists unexpectedly

When you write [x]*3 you get, essentially, the list [x, x, x]. That is, a list with 3 references to the same x. When you then modify this single x it is visible via all three references to it:

x = [1] * 4
xs = [x] * 3
print(f"id(x): {id(x)}")
# id(x): 140560897920048
print(
f"id(xs[0]): {id(xs[0])}\n"
f"id(xs[1]): {id(xs[1])}\n"
f"id(xs[2]): {id(xs[2])}"
)
# id(xs[0]): 140560897920048
# id(xs[1]): 140560897920048
# id(xs[2]): 140560897920048

x[0] = 42
print(f"x: {x}")
# x: [42, 1, 1, 1]
print(f"xs: {xs}")
# xs: [[42, 1, 1, 1], [42, 1, 1, 1], [42, 1, 1, 1]]

To fix it, you need to make sure that you create a new list at each position. One way to do it is

[[1]*4 for _ in range(3)]

which will reevaluate [1]*4 each time instead of evaluating it once and making 3 references to 1 list.


You might wonder why * can't make independent objects the way the list comprehension does. That's because the multiplication operator * operates on objects, without seeing expressions. When you use * to multiply [[1] * 4] by 3, * only sees the 1-element list [[1] * 4] evaluates to, not the [[1] * 4 expression text. * has no idea how to make copies of that element, no idea how to reevaluate [[1] * 4], and no idea you even want copies, and in general, there might not even be a way to copy the element.

The only option * has is to make new references to the existing sublist instead of trying to make new sublists. Anything else would be inconsistent or require major redesigning of fundamental language design decisions.

In contrast, a list comprehension reevaluates the element expression on every iteration. [[1] * 4 for n in range(3)] reevaluates [1] * 4 every time for the same reason [x**2 for x in range(3)] reevaluates x**2 every time. Every evaluation of [1] * 4 generates a new list, so the list comprehension does what you wanted.

Incidentally, [1] * 4 also doesn't copy the elements of [1], but that doesn't matter, since integers are immutable. You can't do something like 1.value = 2 and turn a 1 into a 2.

Python multiply sequence trick

A = [ [] ] * 2 creates a list with two references to the same list:

>>> A = [ [] ] * 2
>>> id(A[0])
24956880
>>> id(A[1])
24956880
>>> id(A[0]) == id(A[1])
True
>>>

Instead, you need to use a list comprehension:

>>> A = [[] for _ in xrange(2)]
>>> A
[[], []]
>>> A[0].append(1)
>>> A
[[1], []]
>>>

Note that, if you are on Python 3.x, you will need to replace xrange with range.

Python - Using the Multiply Operator to Create Copies of Objects in Lists

The multiplication operator on a sequence means repetition of the item(s) -- NOT creation of copies (shallow or deep ones) of the items. Nothing stops you from going crazy, a la:

import copy

class Crazy(object):
def __init__(self, body, weird=copy.copy):
self.gomez = body
self.cousinitt = weird
def __mul__(self, n):
return [self.cousinitt(x) for x in (self.gomez * n)]

a = Crazy([[]]) * 3

...except your sanity and common sense, if any. Checking on those, how DID you dream operator * could be made to mean something utterly different than it's intended to mean, except by defining another class overloading __mul__ in weird ways...?-)

Nested List Indices

The problem is caused by the fact that python chooses to pass lists around by reference.

Normally variables are passed "by value", so they operate independently:

>>> a = 1
>>> b = a
>>> a = 2
>>> print b
1

But since lists might get pretty large, rather than shifting the whole list around memory, Python chooses to just use a reference ('pointer' in C terms). If you assign one to another variable, you assign just the reference to it. This means that you can have two variables pointing to the same list in memory:

>>> a = [1]
>>> b = a
>>> a[0] = 2
>>> print b
[2]

So, in your first line of code you have 4 * [0]. Now [0] is a pointer to the value 0 in memory, and when you multiply it, you get four pointers to the same place in memory. BUT when you change one of the values then Python knows that the pointer needs to change to point to the new value:

>>> a = 4 * [0]
>>> a
[0, 0, 0, 0]
>>> [id(v) for v in a]
[33302480, 33302480, 33302480, 33302480]
>>> a[0] = 1
>>> a
[1, 0, 0, 0]

The problem comes when you multiply this list - you get four copies of the list pointer. Now when you change one of the values in one list, all four change together:

>>> a[0][0] = 1
>>> a
[[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

The solution is to avoid the second multiplication. A loop does the job:

>>> some_list = [(4 * [0]) for _ in range(4)]

Difference between [[]]*3 and [[] for i in range(3)]

[[]]*3 creates a list with three references to the same list object:

>>> lst = [[]]*3
>>> # The object ids of the lists in 'lst' are the same
>>> id(lst[0])
25130048
>>> id(lst[1])
25130048
>>> id(lst[2])
25130048
>>>

[[] for i in range(3)] creates a list with three unique list objects:

>>> lst = [[] for i in range(3)]
>>> # The object ids of the lists in 'lst' are different
>>> id(lst[0])
25131768
>>> id(lst[1])
25130008
>>> id(lst[2])
25116064
>>>

Nested List Indices

The problem is caused by the fact that python chooses to pass lists around by reference.

Normally variables are passed "by value", so they operate independently:

>>> a = 1
>>> b = a
>>> a = 2
>>> print b
1

But since lists might get pretty large, rather than shifting the whole list around memory, Python chooses to just use a reference ('pointer' in C terms). If you assign one to another variable, you assign just the reference to it. This means that you can have two variables pointing to the same list in memory:

>>> a = [1]
>>> b = a
>>> a[0] = 2
>>> print b
[2]

So, in your first line of code you have 4 * [0]. Now [0] is a pointer to the value 0 in memory, and when you multiply it, you get four pointers to the same place in memory. BUT when you change one of the values then Python knows that the pointer needs to change to point to the new value:

>>> a = 4 * [0]
>>> a
[0, 0, 0, 0]
>>> [id(v) for v in a]
[33302480, 33302480, 33302480, 33302480]
>>> a[0] = 1
>>> a
[1, 0, 0, 0]

The problem comes when you multiply this list - you get four copies of the list pointer. Now when you change one of the values in one list, all four change together:

>>> a[0][0] = 1
>>> a
[[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

The solution is to avoid the second multiplication. A loop does the job:

>>> some_list = [(4 * [0]) for _ in range(4)]

How to append an element of a sublist in python

By doing x = y = [] you are creating x and y and referencing them to the same list, hence the erroneous output. (The Object IDs are same below)

>>> x = y = []
>>> id(x)
43842656
>>> id(y)
43842656

If you fix that, you get the correct result.

>>> x = []
>>> y = []
>>> for sublist in lst:
x.append(sublist[0])
y.append(sublist[1])

>>> x
[3, 3, 3]
>>> y
[1, 2, 3]

Although, this could be made pretty easier by doing.

x,y = zip(*lst)

P.S. - Please don't use list as a variable name, it shadows the builtin.



Related Topics



Leave a reply



Submit