Python Copy a List of Lists

Python copy a list of lists

From the docs for the copy module:

The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

When you call regular copy.copy() you are performing a shallow copy. This means that in the case of a list of lists, you will get a new copy of the outer list, but it will contain the original inner lists as its elements. Instead, you should use copy.deepcopy(), which will create new copies of both the outer and the inner lists.

The reason you didn't notice this with your first example, copy([1, 2]), is that primitives like int are immutable, and thus it is impossible to change their value without creating a new instance. If the contents of the list had instead been mutable objects (such as lists, or any user-defined object with mutable members), any mutation of those objects would have been visible in both copies of the list.
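Here is a minimal sketch of that difference on a list of lists (the variable names are only illustrative):

import copy

outer = [[1, 2], [3, 4]]

shallow = copy.copy(outer)    # new outer list, same inner list objects
deep = copy.deepcopy(outer)   # new outer list, new inner list objects

outer[0].append(99)           # mutate one of the inner lists

print(shallow)  # [[1, 2, 99], [3, 4]] -> the inner list is shared
print(deep)     # [[1, 2], [3, 4]]     -> the inner list was copied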

How to copy a list of lists in Python?

As quamrana mentions, you can use deepcopy.

import copy

a = [[1,2],[3,4]]
b = copy.deepcopy(a)


How do I clone a list so that it doesn't change unexpectedly after assignment?

new_list = my_list doesn't actually create a second list. The assignment just copies the reference to the list, not the actual list, so both new_list and my_list refer to the same list after the assignment.
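A quick way to see this aliasing (a minimal sketch; the names are illustrative):

my_list = [1, 2, 3]
new_list = my_list          # no copy: both names point to the same object

print(new_list is my_list)  # True
my_list.append(4)
print(new_list)             # [1, 2, 3, 4] -> the "copy" changed too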

To actually copy the list, you have several options:

  • You can use the built-in list.copy() method (available since Python 3.3):

    new_list = old_list.copy()
  • You can slice it:

    new_list = old_list[:]

    Alex Martelli's opinion (at least back in 2007) is that this is a weird syntax and that it never makes sense to use it. ;) (In his opinion, the next one is more readable.)

  • You can use the built-in list() constructor:

    new_list = list(old_list)
  • You can use generic copy.copy():

    import copy
    new_list = copy.copy(old_list)

    This is a little slower than list() because it has to find out the datatype of old_list first.

  • If you need to copy the elements of the list as well, use generic copy.deepcopy():

    import copy
    new_list = copy.deepcopy(old_list)

    This is obviously the slowest and most memory-hungry method, but it is sometimes unavoidable. It operates recursively and will handle any number of levels of nested lists (or other containers).

Example:

import copy

class Foo(object):
    def __init__(self, val):
        self.val = val

    def __repr__(self):
        return f'Foo({self.val!r})'

foo = Foo(1)

a = ['foo', foo]
b = a.copy()
c = a[:]
d = list(a)
e = copy.copy(a)
f = copy.deepcopy(a)

# edit the original list and instance
a.append('baz')
foo.val = 5

print(f'original: {a}\nlist.copy(): {b}\nslice: {c}\nlist(): {d}\ncopy: {e}\ndeepcopy: {f}')

Result:

original: ['foo', Foo(5), 'baz']
list.copy(): ['foo', Foo(5)]
slice: ['foo', Foo(5)]
list(): ['foo', Foo(5)]
copy: ['foo', Foo(5)]
deepcopy: ['foo', Foo(1)]

How to deep copy a list?

E0_copy is not a deep copy. You don't make a deep copy using list(). (Both list(...) and testList[:] are shallow copies.)

You use copy.deepcopy(...) for deep copying a list.

deepcopy(x, memo=None, _nil=[])
Deep copy operation on arbitrary Python objects.

See the following snippet -

>>> a = [[1, 2, 3], [4, 5, 6]]
>>> b = list(a)
>>> a
[[1, 2, 3], [4, 5, 6]]
>>> b
[[1, 2, 3], [4, 5, 6]]
>>> a[0][1] = 10
>>> a
[[1, 10, 3], [4, 5, 6]]
>>> b # b changes too -> Not a deepcopy.
[[1, 10, 3], [4, 5, 6]]

Now see the deepcopy operation

>>> import copy
>>> b = copy.deepcopy(a)
>>> a
[[1, 10, 3], [4, 5, 6]]
>>> b
[[1, 10, 3], [4, 5, 6]]
>>> a[0][1] = 9
>>> a
[[1, 9, 3], [4, 5, 6]]
>>> b # b doesn't change -> Deep Copy
[[1, 10, 3], [4, 5, 6]]

To explain: list(...) does not recursively make copies of the inner objects. It only makes a copy of the outermost list, while still referencing the same inner lists; hence, when you mutate the inner lists, the change is reflected in both the original list and the shallow copy. You can verify that shallow copying references the inner lists by checking that id(a[0]) == id(b[0]), where b = list(a).
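For example, continuing with the same kind of data (a small sketch of that id check):

import copy

a = [[1, 2, 3], [4, 5, 6]]
b = list(a)            # shallow copy: inner lists are shared
c = copy.deepcopy(a)   # deep copy: inner lists are copied

print(id(a[0]) == id(b[0]))  # True
print(id(a[0]) == id(c[0]))  # False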

Removing duplicates from a list of lists

>>> k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
>>> import itertools
>>> k.sort()
>>> list(k for k,_ in itertools.groupby(k))
[[1, 2], [3], [4], [5, 6, 2]]

itertools often offers the fastest and most powerful solutions to this kind of problem, and is well worth getting intimately familiar with!-)

Edit: as I mention in a comment, normal optimization efforts are focused on large inputs (the big-O approach) because it's so much easier that it offers good returns on effort. But sometimes (essentially for "tragically crucial bottlenecks" in deep inner loops of code that's pushing the boundaries of performance limits) one may need to go into much more detail: providing probability distributions, deciding which performance measures to optimize (maybe the upper bound or the 90th percentile matters more than an average or median, depending on one's apps), performing possibly-heuristic checks at the start to pick different algorithms depending on input data characteristics, and so forth.

Careful measurement of "point" performance (code A vs code B for a specific input) is a part of this extremely costly process, and the standard library module timeit helps here. However, it's easier to use it at a shell prompt. For example, here's a short module showcasing the general approach for this problem; save it as nodup.py:

import itertools

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

def doset(k, map=map, list=list, set=set, tuple=tuple):
    return list(map(list, set(map(tuple, k))))

def dosort(k, sorted=sorted, range=range, len=len):
    ks = sorted(k)
    return [ks[i] for i in range(len(ks)) if i == 0 or ks[i] != ks[i-1]]

def dogroupby(k, sorted=sorted, groupby=itertools.groupby):
    ks = sorted(k)
    return [i for i, _ in groupby(ks)]

def donewk(k):
    newk = []
    for i in k:
        if i not in newk:
            newk.append(i)
    return newk

# sanity check that all functions compute the same result and don't alter k
if __name__ == '__main__':
    savek = list(k)
    for f in doset, dosort, dogroupby, donewk:
        resk = f(k)
        assert k == savek
        print('%10s %s' % (f.__name__, sorted(resk)))

Note the sanity check (performed when you just do python nodup.py) and the basic hoisting technique (make constant global names local to each function for speed) to put things on equal footing.

Now we can run checks on the tiny example list:

$ python -mtimeit -s'import nodup' 'nodup.doset(nodup.k)'
100000 loops, best of 3: 11.7 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.dosort(nodup.k)'
100000 loops, best of 3: 9.68 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.dogroupby(nodup.k)'
100000 loops, best of 3: 8.74 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.donewk(nodup.k)'
100000 loops, best of 3: 4.44 usec per loop

confirming that the quadratic approach has small-enough constants to make it attractive for tiny lists with few duplicated values. With a short list without duplicates:

$ python -mtimeit -s'import nodup' 'nodup.donewk([[i] for i in range(12)])'
10000 loops, best of 3: 25.4 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.dogroupby([[i] for i in range(12)])'
10000 loops, best of 3: 23.7 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.doset([[i] for i in range(12)])'
10000 loops, best of 3: 31.3 usec per loop
$ python -mtimeit -s'import nodup' 'nodup.dosort([[i] for i in range(12)])'
10000 loops, best of 3: 25 usec per loop

the quadratic approach isn't bad, but the sort and groupby ones are better. Etc, etc.

If (as the obsession with performance suggests) this operation is at a core inner loop of your pushing-the-boundaries application, it's worth trying the same set of tests on other representative input samples, possibly detecting some simple measure that could heuristically let you pick one or the other approach (but the measure must be fast, of course).
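As one minimal sketch of such a dispatcher, reusing the functions from nodup.py above (the cut-off of 10 items is purely a hypothetical threshold that you would tune from your own measurements):

from nodup import donewk, dogroupby

def dedup(k, smallcutoff=10):
    # hypothetical heuristic: the quadratic donewk wins on tiny inputs,
    # while sort + groupby scales much better for anything larger
    if len(k) < smallcutoff:
        return donewk(k)
    return dogroupby(k)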

It's also well worth considering keeping a different representation for k -- why does it have to be a list of lists rather than a set of tuples in the first place? If the duplicate-removal task is frequent, and profiling shows it to be the program's performance bottleneck, then keeping a set of tuples all the time and getting a list of lists from it only if and where needed might be faster overall, for example.
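A minimal sketch of that alternative representation (the data here is just the running example):

# keep the data as a set of tuples: duplicates never accumulate
k = {(1, 2), (4,), (5, 6, 2), (3,)}

k.add((1, 2))        # re-adding a duplicate is a cheap no-op

# convert to a list of lists only where that shape is actually required
k_as_lists = [list(t) for t in k]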

Join a list of lists in Python

import itertools
a = [['a','b'], ['c']]
print(list(itertools.chain.from_iterable(a)))
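itertools.chain.from_iterable lazily concatenates the inner lists; an equivalent without the import is a nested list comprehension:

a = [['a', 'b'], ['c']]
print([x for sub in a for x in sub])   # ['a', 'b', 'c']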

How to copy a list of lists in Python without the last element of each list

To do it in one step, use a list comprehension. Assuming your list is only two-dimensional and the sublists are composed of scalars, you can use the slicing syntax to create the copy.

>>> [x[:-1] for x in test2]
[['A', 'A'], ['C'], ['A', 'B', 'C']]

If your sublists contain mutable/custom objects, call copy.deepcopy inside the expression.

>>> [copy.deepcopy(x[:-1]) for x in test2]
[['A', 'A'], ['C'], ['A', 'B', 'C']]

