Slicing a List in Python Without Generating a Copy

Slicing a list in Python without generating a copy

The short answer

Slicing lists does not generate copies of the objects in the list; it just copies the references to them. That is the answer to the question as asked.

The long answer

Testing on mutable and immutable values

First, let's test the basic claim. We can show that even in the case of immutable objects like integers, only the reference is copied. Here are three different integer objects, each with the same value:

>>> a = [1000 + 1, 1000 + 1, 1000 + 1]

They have the same value, but you can see they are three distinct objects because they have different ids:

>>> map(id, a)
[140502922988976, 140502922988952, 140502922988928]

When you slice them, the references remain the same. No new objects have been created:

>>> b = a[1:3]
>>> map(id, b)
[140502922988952, 140502922988928]

Using different objects with the same value shows that the copy process doesn't bother with interning -- it just directly copies the references.

Testing with mutable values gives the same result:

>>> a = [{0: 'zero', 1: 'one'}, ['foo', 'bar']]
>>> map(id, a)
[4380777000, 4380712040]
>>> map(id, a[1:]
... )
[4380712040]

Examining remaining memory overhead

Of course the references themselves are copied. Each one costs 8 bytes on a 64-bit machine. And each list has its own memory overhead of 72 bytes:

>>> for i in range(len(a)):
... x = a[:i]
... print('len: {}'.format(len(x)))
... print('size: {}'.format(sys.getsizeof(x)))
...
len: 0
size: 72
len: 1
size: 80
len: 2
size: 88

As Joe Pinsonault reminds us, that overhead adds up. And integer objects themselves are not very large -- they are three times larger than references. So this saves you some memory in an absolute sense, but asymptotically, it might be nice to be able to have multiple lists that are "views" into the same memory.

Saving memory by using views

Unfortunately, Python provides no easy way to produce objects that are "views" into lists. Or perhaps I should say "fortunately"! It means you don't have to worry about where a slice comes from; changes to the original won't affect the slice. Overall, that makes reasoning about a program's behavior much easier.

If you really want to save memory by working with views, consider using numpy arrays. When you slice a numpy array, the memory is shared between the slice and the original:

>>> a = numpy.arange(3)
>>> a
array([0, 1, 2])
>>> b = a[1:3]
>>> b
array([1, 2])

What happens when we modify a and look again at b?

>>> a[2] = 1001
>>> b
array([ 1, 1001])

But this means you have to be sure that when you modify one object, you aren't inadvertently modifying another. That's the trade-off when you use numpy: less work for the computer, and more work for the programmer!

Slicing a list without copying

If you want to modify just a part of the list, you can send in only that slice, and assign the modifications to that same section using return:

def myFunc(aList):
for i in range(0,len(aList)):
aList[i] = 0
return aList

wholeList = range(0,10)
wholeList[3:6] = myFunc(wholeList[3:6])

Otherwise as you said before, you are just creating new lists that happen to reference what is contained in the slice of the original, you'll have to reassign any modifications to the new list back into the old one.

To further clarify, the elements of wholeList[3:6] and partOfList are the same objects. But the lists are just references to those objects. So id(wholeList[3]) == id(partOfList[0]), but if we change one of those references, eg partOfList[0] = 10, then we change the reference of partOfList[0] to 10 not the actual object (3) that was being pointed to. Thus there is no action in this modification that would reflect any changes in wholeList, since it is still referencing the original object..

If Python slice copy the reference, why can't I use it to modify the original list?

You are right that slicing doesn't copy the items in the list. However, it does create a new list object.

Your comment suggests a misunderstanding:

# Attempting to modify the element at index 1
l[0:2][-1] = 10

This is not a modification of the element, it's a modification of the list. In other words it is really "change the list so that index 1 now points to the number 10". Since your slice created a new list, you are just changing that new list to point at some other object.

In your comment to oldrinb's answer, you said:

Why are l[0:1] and l[0:1][0] different? Shouldn't they both refer to the same object, i.e. the first item of l?

Aside from the fact that l[0:1] is a list while l[0:1][0] is a single element, there is again the same misunderstanding here. Suppose that some_list is a list and the object at index ix is obj. This:

some_list[ix] = blah

. . . is an operation on some_list. The object obj is not involved. This can be confusing because it means some_list[ix] has slightly different semantics depending on which side of the assignment it is on. If you do

blah = some_list[ix] + 2

. . .then you are indeed operating on the object inside the list (i.e., it is the same as obj + 2). But when the indexing operation is on the left of the assignment, it no longer involves the contained object at all, only the list itself.

When you assign to a list index you are modifying the list, not the object inside it. So in your example l[0] is the same as l[0:2][0], but that doesn't matter; because your indexing is an assignment target, it's modifying the list and doesn't care what object was in there already.

Does Python copy references to objects when slicing a list?

Slicing will copy the references. If you have a list of 100 million things:

l = [object() for i in xrange(100000000)]

and you make a slice:

l2 = l[:-1]

l2 will have its own backing array of 99,999,999 pointers, rather than sharing l's array. However, the objects those pointers refer to are not copied:

>>> l2[0] is l[0]
True

If you want to iterate over overlapping pairs of elements of a list without making a copy, you can zip the list with an iterator that has been advanced one position:

second_items = iter(l)
next(second_items, None) # Avoid exception on empty input
for thing1, thing2 in itertools.izip(l, second_items):
whatever()

This takes advantage of the fact that zip stops when any input iterator stops. This can be extended to cases where you're already working with an iterator using itertools.tee

i1, i2 = itertools.tee(iterator)
next(i2, None)
for thing1, thing2 in itertools.izip(i1, i2):
whatever()

Does a slicing operation give me a deep or shallow copy?

You are creating a shallow copy, because nested values are not copied, merely referenced. A deep copy would create copies of the values referenced by the list too.

Demo:

>>> lst = [{}]
>>> lst_copy = lst[:]
>>> lst_copy[0]['foo'] = 'bar'
>>> lst_copy.append(42)
>>> lst
[{'foo': 'bar'}]
>>> id(lst) == id(lst_copy)
False
>>> id(lst[0]) == id(lst_copy[0])
True

Here the nested dictionary is not copied; it is merely referenced by both lists. The new element 42 is not shared.

Remember that everything in Python is an object, and names and list elements are merely references to those objects. A copy of a list creates a new outer list, but the new list merely receives references to the exact same objects.

A proper deep copy creates new copies of each and every object contained in the list, recursively:

>>> from copy import deepcopy
>>> lst_deepcopy = deepcopy(lst)
>>> id(lst_deepcopy[0]) == id(lst[0])
False

Is there a way to not make a copy when a numpy array is sliced?

In Python slice is a well defined class, with start, stop, step values. It is used when we index a list with alist[1: 10: 2]. This makes a new list with copies of the pointers from the original. In numpy these are used in basic indexing, e.g. arr[:3, -3:]. This creates a view of the original. The view shares the data buffer, but has its own shape and strides.

But when we index arrays with lists, arrays or boolean arrays (mask), it has to make a copy, an array with its own data buffer. The selection of elements is too complex or irregular to express in terms of the shape and strides attributes.

In some cases the index array is small (compared to the original) and copy is also small. But if we are permuting the whole array, then the index array, and copy will both be as large as the original.

Understanding slicing

The syntax is:

a[start:stop]  # items start through stop-1
a[start:] # items start through the rest of the array
a[:stop] # items from the beginning through stop-1
a[:] # a copy of the whole array

There is also the step value, which can be used with any of the above:

a[start:stop:step] # start through not past stop, by step

The key point to remember is that the :stop value represents the first value that is not in the selected slice. So, the difference between stop and start is the number of elements selected (if step is 1, the default).

The other feature is that start or stop may be a negative number, which means it counts from the end of the array instead of the beginning. So:

a[-1]    # last item in the array
a[-2:] # last two items in the array
a[:-2] # everything except the last two items

Similarly, step may be a negative number:

a[::-1]    # all items in the array, reversed
a[1::-1] # the first two items, reversed
a[:-3:-1] # the last two items, reversed
a[-3::-1] # everything except the last two items, reversed

Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.

Relationship with the slice object

A slice object can represent a slicing operation, i.e.:

a[start:stop:step]

is equivalent to:

a[slice(start, stop, step)]

Slice objects also behave slightly differently depending on the number of arguments, similarly to range(), i.e. both slice(stop) and slice(start, stop[, step]) are supported.
To skip specifying a given argument, one might use None, so that e.g. a[start:] is equivalent to a[slice(start, None)] or a[::-1] is equivalent to a[slice(None, None, -1)].

While the :-based notation is very helpful for simple slicing, the explicit use of slice() objects simplifies the programmatic generation of slicing.

Python list slice syntax used for no obvious reason

Like NXC said, Python variable names actually point to an object, and not a specific spot in memory.

newList = oldList would create two different variables that point to the same object, therefore, changing oldList would also change newList.

However, when you do newList = oldList[:], it "slices" the list, and creates a new list. The default values for [:] are 0 and the end of the list, so it copies everything. Therefore, it creates a new list with all the data contained in the first one, but both can be altered without changing the other.



Related Topics



Leave a reply



Submit