Slicing a list in Python without generating a copy
The short answer
Slicing lists does not generate copies of the objects in the list; it just copies the references to them. That is the answer to the question as asked.
The long answer
Testing on mutable and immutable values
First, let's test the basic claim. We can show that even in the case of immutable objects like integers, only the reference is copied. Here are three different integer objects, each with the same value:
>>> a = [1000 + 1, 1000 + 1, 1000 + 1]
They have the same value, but you can see they are three distinct objects because they have different id
s:
>>> map(id, a)
[140502922988976, 140502922988952, 140502922988928]
When you slice them, the references remain the same. No new objects have been created:
>>> b = a[1:3]
>>> map(id, b)
[140502922988952, 140502922988928]
Using different objects with the same value shows that the copy process doesn't bother with interning -- it just directly copies the references.
Testing with mutable values gives the same result:
>>> a = [{0: 'zero', 1: 'one'}, ['foo', 'bar']]
>>> map(id, a)
[4380777000, 4380712040]
>>> map(id, a[1:]
... )
[4380712040]
Examining remaining memory overhead
Of course the references themselves are copied. Each one costs 8 bytes on a 64-bit machine. And each list has its own memory overhead of 72 bytes:
>>> for i in range(len(a)):
... x = a[:i]
... print('len: {}'.format(len(x)))
... print('size: {}'.format(sys.getsizeof(x)))
...
len: 0
size: 72
len: 1
size: 80
len: 2
size: 88
As Joe Pinsonault reminds us, that overhead adds up. And integer objects themselves are not very large -- they are three times larger than references. So this saves you some memory in an absolute sense, but asymptotically, it might be nice to be able to have multiple lists that are "views" into the same memory.
Saving memory by using views
Unfortunately, Python provides no easy way to produce objects that are "views" into lists. Or perhaps I should say "fortunately"! It means you don't have to worry about where a slice comes from; changes to the original won't affect the slice. Overall, that makes reasoning about a program's behavior much easier.
If you really want to save memory by working with views, consider using numpy
arrays. When you slice a numpy
array, the memory is shared between the slice and the original:
>>> a = numpy.arange(3)
>>> a
array([0, 1, 2])
>>> b = a[1:3]
>>> b
array([1, 2])
What happens when we modify a
and look again at b
?
>>> a[2] = 1001
>>> b
array([ 1, 1001])
But this means you have to be sure that when you modify one object, you aren't inadvertently modifying another. That's the trade-off when you use numpy
: less work for the computer, and more work for the programmer!
Slicing a list without copying
If you want to modify just a part of the list, you can send in only that slice, and assign the modifications to that same section using return
:
def myFunc(aList):
for i in range(0,len(aList)):
aList[i] = 0
return aList
wholeList = range(0,10)
wholeList[3:6] = myFunc(wholeList[3:6])
Otherwise as you said before, you are just creating new lists that happen to reference what is contained in the slice of the original, you'll have to reassign any modifications to the new list back into the old one.
To further clarify, the elements of wholeList[3:6]
and partOfList
are the same objects. But the lists are just references to those objects. So id(wholeList[3]) == id(partOfList[0])
, but if we change one of those references, eg partOfList[0] = 10
, then we change the reference of partOfList[0]
to 10
not the actual object (3
) that was being pointed to. Thus there is no action in this modification that would reflect any changes in wholeList
, since it is still referencing the original object..
If Python slice copy the reference, why can't I use it to modify the original list?
You are right that slicing doesn't copy the items in the list. However, it does create a new list object.
Your comment suggests a misunderstanding:
# Attempting to modify the element at index 1
l[0:2][-1] = 10
This is not a modification of the element, it's a modification of the list. In other words it is really "change the list so that index 1 now points to the number 10". Since your slice created a new list, you are just changing that new list to point at some other object.
In your comment to oldrinb's answer, you said:
Why are
l[0:1]
andl[0:1][0]
different? Shouldn't they both refer to the same object, i.e. the first item ofl
?
Aside from the fact that l[0:1]
is a list while l[0:1][0]
is a single element, there is again the same misunderstanding here. Suppose that some_list
is a list and the object at index ix
is obj
. This:
some_list[ix] = blah
. . . is an operation on some_list
. The object obj
is not involved. This can be confusing because it means some_list[ix]
has slightly different semantics depending on which side of the assignment it is on. If you do
blah = some_list[ix] + 2
. . .then you are indeed operating on the object inside the list (i.e., it is the same as obj + 2
). But when the indexing operation is on the left of the assignment, it no longer involves the contained object at all, only the list itself.
When you assign to a list index you are modifying the list, not the object inside it. So in your example l[0]
is the same as l[0:2][0]
, but that doesn't matter; because your indexing is an assignment target, it's modifying the list and doesn't care what object was in there already.
Does Python copy references to objects when slicing a list?
Slicing will copy the references. If you have a list of 100 million things:
l = [object() for i in xrange(100000000)]
and you make a slice:
l2 = l[:-1]
l2
will have its own backing array of 99,999,999 pointers, rather than sharing l
's array. However, the objects those pointers refer to are not copied:
>>> l2[0] is l[0]
True
If you want to iterate over overlapping pairs of elements of a list without making a copy, you can zip
the list with an iterator that has been advanced one position:
second_items = iter(l)
next(second_items, None) # Avoid exception on empty input
for thing1, thing2 in itertools.izip(l, second_items):
whatever()
This takes advantage of the fact that zip
stops when any input iterator stops. This can be extended to cases where you're already working with an iterator using itertools.tee
i1, i2 = itertools.tee(iterator)
next(i2, None)
for thing1, thing2 in itertools.izip(i1, i2):
whatever()
Does a slicing operation give me a deep or shallow copy?
You are creating a shallow copy, because nested values are not copied, merely referenced. A deep copy would create copies of the values referenced by the list too.
Demo:
>>> lst = [{}]
>>> lst_copy = lst[:]
>>> lst_copy[0]['foo'] = 'bar'
>>> lst_copy.append(42)
>>> lst
[{'foo': 'bar'}]
>>> id(lst) == id(lst_copy)
False
>>> id(lst[0]) == id(lst_copy[0])
True
Here the nested dictionary is not copied; it is merely referenced by both lists. The new element 42
is not shared.
Remember that everything in Python is an object, and names and list elements are merely references to those objects. A copy of a list creates a new outer list, but the new list merely receives references to the exact same objects.
A proper deep copy creates new copies of each and every object contained in the list, recursively:
>>> from copy import deepcopy
>>> lst_deepcopy = deepcopy(lst)
>>> id(lst_deepcopy[0]) == id(lst[0])
False
Is there a way to not make a copy when a numpy array is sliced?
In Python slice
is a well defined class, with start
, stop
, step
values. It is used when we index a list with alist[1: 10: 2]
. This makes a new list with copies of the pointers from the original. In numpy
these are used in basic indexing
, e.g. arr[:3, -3:]
. This creates a view
of the original. The view
shares the data buffer, but has its own shape
and strides
.
But when we index arrays with lists, arrays or boolean arrays (mask), it has to make a copy, an array with its own data buffer. The selection of elements is too complex or irregular to express in terms of the shape
and strides
attributes.
In some cases the index array is small (compared to the original) and copy is also small. But if we are permuting the whole array, then the index array, and copy will both be as large as the original.
Understanding slicing
The syntax is:
a[start:stop] # items start through stop-1
a[start:] # items start through the rest of the array
a[:stop] # items from the beginning through stop-1
a[:] # a copy of the whole array
There is also the step
value, which can be used with any of the above:
a[start:stop:step] # start through not past stop, by step
The key point to remember is that the :stop
value represents the first value that is not in the selected slice. So, the difference between stop
and start
is the number of elements selected (if step
is 1, the default).
The other feature is that start
or stop
may be a negative number, which means it counts from the end of the array instead of the beginning. So:
a[-1] # last item in the array
a[-2:] # last two items in the array
a[:-2] # everything except the last two items
Similarly, step
may be a negative number:
a[::-1] # all items in the array, reversed
a[1::-1] # the first two items, reversed
a[:-3:-1] # the last two items, reversed
a[-3::-1] # everything except the last two items, reversed
Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2]
and a
only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.
Relationship with the slice
object
A slice
object can represent a slicing operation, i.e.:
a[start:stop:step]
is equivalent to:
a[slice(start, stop, step)]
Slice objects also behave slightly differently depending on the number of arguments, similarly to range()
, i.e. both slice(stop)
and slice(start, stop[, step])
are supported.
To skip specifying a given argument, one might use None
, so that e.g. a[start:]
is equivalent to a[slice(start, None)]
or a[::-1]
is equivalent to a[slice(None, None, -1)]
.
While the :
-based notation is very helpful for simple slicing, the explicit use of slice()
objects simplifies the programmatic generation of slicing.
Python list slice syntax used for no obvious reason
Like NXC said, Python variable names actually point to an object, and not a specific spot in memory.
newList = oldList
would create two different variables that point to the same object, therefore, changing oldList
would also change newList
.
However, when you do newList = oldList[:]
, it "slices" the list, and creates a new list. The default values for [:]
are 0 and the end of the list, so it copies everything. Therefore, it creates a new list with all the data contained in the first one, but both can be altered without changing the other.
Related Topics
Generate 'N' Unique Random Numbers Within a Range
How to Round a Number to Significant Figures in Python
How to Select Rows in a Dataframe Between Two Values, in Python Pandas
Circular Import Dependency in Python
Passing an Integer by Reference in Python
Python3 --Version Shows "Nameerror: Name 'Python3' Is Not Defined"
How to Upgrade All Python Packages with Pip
Why Does Random.Shuffle Return None
Python Subprocess Get Children's Output to File and Terminal
Getting the Class Name of an Instance
What Is a Cross-Platform Way to Get the Home Directory
Apply VS Transform on a Group Object
Django Media_Url and Media_Root
Split a String by Spaces -- Preserving Quoted Substrings -- in Python
What Are Logits? Differencebetween Softmax and Softmax_Cross_Entropy_With_Logits