Efficient Ways to Duplicate Array/List in Python

Use the timeit module in Python to measure timings. (The snippet below is Python 2: print is a statement and range() returns a list.)

from copy import copy

a = range(1000)

def cop():
    b = copy(a)

def func1():
    b = list(a)

def slice_copy():  # renamed from slice() to avoid shadowing the built-in
    b = a[:]

def slice_len():
    b = a[0:len(a)]

if __name__ == "__main__":
    import timeit
    print "copy(a)", timeit.timeit("cop()", setup="from __main__ import cop")
    print "list(a)", timeit.timeit("func1()", setup="from __main__ import func1")
    print "a[:]", timeit.timeit("slice_copy()", setup="from __main__ import slice_copy")
    print "a[0:len(a)]", timeit.timeit("slice_len()", setup="from __main__ import slice_len")

Results:

copy(a) 3.98940896988
list(a) 2.54542589188
a[:] 1.96630120277 #winner
a[0:len(a)] 10.5431251526

The extra steps involved in a[0:len(a)] are surely the reason for its slowness.

Here's the bytecode comparison of the two (func1 copies with a[:], func2 with a[0:len(a)]):

In [19]: dis.dis(func1)
2 0 LOAD_GLOBAL 0 (range)
3 LOAD_CONST 1 (100000)
6 CALL_FUNCTION 1
9 STORE_FAST 0 (a)

3 12 LOAD_FAST 0 (a)
15 SLICE+0
16 STORE_FAST 1 (b)
19 LOAD_CONST 0 (None)
22 RETURN_VALUE

In [20]: dis.dis(func2)
2 0 LOAD_GLOBAL 0 (range)
3 LOAD_CONST 1 (100000)
6 CALL_FUNCTION 1
9 STORE_FAST 0 (a)

3 12 LOAD_FAST 0 (a) #same up to here
15 LOAD_CONST 2 (0) #loads 0
18 LOAD_GLOBAL 1 (len) # loads the builtin len(),
# so it might take some lookup time
21 LOAD_FAST 0 (a)
24 CALL_FUNCTION 1
27 SLICE+3
28 STORE_FAST 1 (b)
31 LOAD_CONST 0 (None)
34 RETURN_VALUE

How do I clone a list so that it doesn't change unexpectedly after assignment?

new_list = my_list doesn't actually create a second list. The assignment just copies the reference to the list, not the actual list, so both new_list and my_list refer to the same list after the assignment.
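A quick REPL demonstration of the shared reference:

>>> my_list = [1, 2, 3]
>>> new_list = my_list      # copies the reference, not the list
>>> new_list.append(4)
>>> my_list                 # the "original" changed too
[1, 2, 3, 4]
>>> new_list is my_list     # both names point to the same object
True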

To actually copy the list, you have several options:

  • You can use the built-in list.copy() method (available since Python 3.3):

    new_list = old_list.copy()
  • You can slice it:

    new_list = old_list[:]

    Alex Martelli's opinion (at least back in 2007) was that this is a weird syntax that never makes sense to use. ;) (In his opinion, the next one is more readable.)

  • You can use the built-in list() constructor:

    new_list = list(old_list)
  • You can use generic copy.copy():

    import copy
    new_list = copy.copy(old_list)

    This is a little slower than list() because it has to find out the datatype of old_list first.

  • If you need to copy the elements of the list as well, use generic copy.deepcopy():

    import copy
    new_list = copy.deepcopy(old_list)

    Obviously the slowest and most memory-hungry method, but sometimes unavoidable. It operates recursively and will handle any number of levels of nested lists (or other containers).

Example:

import copy

class Foo(object):
    def __init__(self, val):
        self.val = val

    def __repr__(self):
        return f'Foo({self.val!r})'

foo = Foo(1)

a = ['foo', foo]
b = a.copy()
c = a[:]
d = list(a)
e = copy.copy(a)
f = copy.deepcopy(a)

# edit original list and instance
a.append('baz')
foo.val = 5

print(f'original: {a}\nlist.copy(): {b}\nslice: {c}\nlist(): {d}\ncopy: {e}\ndeepcopy: {f}')

Result:

original: ['foo', Foo(5), 'baz']
list.copy(): ['foo', Foo(5)]
slice: ['foo', Foo(5)]
list(): ['foo', Foo(5)]
copy: ['foo', Foo(5)]
deepcopy: ['foo', Foo(1)]

How to deep copy a list?

A copy made with list() is not a deep copy. (Both list(...) and a[:] are shallow copies.)

You use copy.deepcopy(...) for deep copying a list.

deepcopy(x, memo=None, _nil=[])
Deep copy operation on arbitrary Python objects.

See the following snippet:

>>> a = [[1, 2, 3], [4, 5, 6]]
>>> b = list(a)
>>> a
[[1, 2, 3], [4, 5, 6]]
>>> b
[[1, 2, 3], [4, 5, 6]]
>>> a[0][1] = 10
>>> a
[[1, 10, 3], [4, 5, 6]]
>>> b # b changes too -> Not a deepcopy.
[[1, 10, 3], [4, 5, 6]]

Now see the deepcopy operation:

>>> import copy
>>> b = copy.deepcopy(a)
>>> a
[[1, 10, 3], [4, 5, 6]]
>>> b
[[1, 10, 3], [4, 5, 6]]
>>> a[0][1] = 9
>>> a
[[1, 9, 3], [4, 5, 6]]
>>> b # b doesn't change -> Deep Copy
[[1, 10, 3], [4, 5, 6]]

To explain: list(...) does not recursively copy the inner objects. It only copies the outermost list, which still references the same inner lists. Hence, when you mutate an inner list, the change is reflected in both the original list and the shallow copy. You can see that shallow copying shares the inner lists by checking that id(a[0]) == id(b[0]) where b = list(a).
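You can verify this directly in the REPL:

>>> a = [[1, 2, 3], [4, 5, 6]]
>>> b = list(a)
>>> a is b            # the outer lists are distinct objects...
False
>>> a[0] is b[0]      # ...but the inner lists are shared
True
>>> id(a[0]) == id(b[0])
True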

Efficiently finding duplicates in a list

The idea here is to do a single sweep in linear time using a Counter: count each element, then return all those that were counted more than once.

from collections import Counter

def get_duplicates(array):
    c = Counter(array)
    return [k for k in c if c[k] > 1]

The approach above is linear in complexity, but it makes two passes over the input: one to count (abstracted away by the Counter constructor), and one to collect the non-unique values in the list comprehension. You can optimise this a bit with a yield statement, returning duplicates as you find them.

def get_duplicates(array):
    c = Counter()
    seen = set()
    for i in array:
        c[i] += 1
        # yield each duplicated value once, the first time it repeats
        if c[i] > 1 and i not in seen:
            seen.add(i)
            yield i

This results in a compulsory if check and extra space in the form of a set, but you reduce two passes to one.
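Since this version is a generator, materialise it with list() when you need the full result; for example:

>>> list(get_duplicates([1, 2, 2, 3, 3, 3, 4]))
[2, 3]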

What is the best way to copy a list?

If you want a shallow copy (elements aren't copied) use:

lst2 = lst1[:]

If you want to make a deep copy then use the copy module:

import copy
lst2 = copy.deepcopy(lst1)

What is the most time efficient way to remove unordered duplicates in a 2D array?

Since you want to find unordered duplicates, the best way to go is conversion to a set. A set can only contain hashable (immutable) elements, so each inner list is sorted and turned into a tuple first.

Note:
The simplest way to eliminate duplicates is to make a set of the given elements.

>>> set(map(tuple,map(sorted,x)))
{(-3, -2, 4, 5), (-5, 0, 4, 5)}
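A self-contained sketch of the same idea (the input x is hypothetical, chosen to reproduce the output above):

# x is a hypothetical 2D list; rows that contain the same values
# in any order count as duplicates
x = [[4, 5, -2, -3], [-5, 4, 5, 0], [-2, -3, 5, 4]]

# sort each row so unordered duplicates compare equal, then convert
# to tuples so the rows are hashable and can live in a set
unique = set(map(tuple, map(sorted, x)))
print(unique)  # {(-3, -2, 4, 5), (-5, 0, 4, 5)} (set order may vary)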

How do I find the duplicates in a list and create another list with them?

To remove duplicates, use set(a). To print the duplicates, something like:

a = [1,2,3,2,1,5,6,5,5,5]

import collections
print([item for item, count in collections.Counter(a).items() if count > 1])

## [1, 2, 5]
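If you only need the duplicates removed (original order lost), a plain set is enough:

print(sorted(set(a)))
## [1, 2, 3, 5, 6]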

Note that Counter is not particularly efficient and probably overkill here; a set will perform better. This code computes a list of unique elements in source order:

seen = set()
uniq = []
for x in a:
    if x not in seen:
        uniq.append(x)
        seen.add(x)

or, more concisely:

seen = set()
uniq = [x for x in a if x not in seen and not seen.add(x)]

I don't recommend the latter style, because it is not obvious what not seen.add(x) is doing (the set add() method always returns None, hence the need for not).
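Either version produces the unique elements in first-seen order; with the a above:

print(uniq)
## [1, 2, 3, 5, 6]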

To compute the list of duplicated elements without libraries:

seen = set()
dupes = []

for x in a:
    if x in seen:
        dupes.append(x)
    else:
        seen.add(x)

or, more concisely:

seen = set()
dupes = [x for x in a if x in seen or seen.add(x)]
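Note that either version reports a value once per extra occurrence; with the a above:

print(dupes)
## [2, 1, 5, 5, 5]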

If list elements are not hashable, you cannot use sets/dicts and have to resort to a quadratic time solution (compare each with each). For example:

a = [[1], [2], [3], [1], [5], [3]]

no_dupes = [x for n, x in enumerate(a) if x not in a[:n]]
print(no_dupes)  # [[1], [2], [3], [5]]

dupes = [x for n, x in enumerate(a) if x in a[:n]]
print(dupes)  # [[1], [3]]

Efficient way to remove half of the duplicate items in a list

If order isn't important, one way is to sort the list and take only the elements at odd (or even) indexes. As long as every value occurs an even number of times, those two lists will be the same, so you only need one of them.

l = [1,8,8,8,1,3,3,8]
l.sort()

# Get all odd indexes
odd = l[1::2]

# Get all even indexes
even = l[::2]

print(odd)
print(odd == even)

Result:

[1, 3, 8, 8]
True

