Efficient ways to duplicate array/list in Python
Use the timeit
module in python for testing timings.
from copy import *
a=range(1000)
def cop():
b=copy(a)
def func1():
b=list(a)
def slice():
b=a[:]
def slice_len():
b=a[0:len(a)]
if __name__=="__main__":
import timeit
print "copy(a)",timeit.timeit("cop()", setup="from __main__ import cop")
print "list(a)",timeit.timeit("func1()", setup="from __main__ import func1")
print "a[:]",timeit.timeit("slice()", setup="from __main__ import slice")
print "a[0:len(a)]",timeit.timeit("slice_len()", setup="from __main__ import slice_len")
Results:
copy(a) 3.98940896988
list(a) 2.54542589188
a[:] 1.96630120277 #winner
a[0:len(a)] 10.5431251526
It's surely the extra steps involved in a[0:len(a)]
are the reason for it's slowness.
Here's the byte code comparison of the two:
In [19]: dis.dis(func1)
2 0 LOAD_GLOBAL 0 (range)
3 LOAD_CONST 1 (100000)
6 CALL_FUNCTION 1
9 STORE_FAST 0 (a)
3 12 LOAD_FAST 0 (a)
15 SLICE+0
16 STORE_FAST 1 (b)
19 LOAD_CONST 0 (None)
22 RETURN_VALUE
In [20]: dis.dis(func2)
2 0 LOAD_GLOBAL 0 (range)
3 LOAD_CONST 1 (100000)
6 CALL_FUNCTION 1
9 STORE_FAST 0 (a)
3 12 LOAD_FAST 0 (a) #same up to here
15 LOAD_CONST 2 (0) #loads 0
18 LOAD_GLOBAL 1 (len) # loads the builtin len(),
# so it might take some lookup time
21 LOAD_FAST 0 (a)
24 CALL_FUNCTION 1
27 SLICE+3
28 STORE_FAST 1 (b)
31 LOAD_CONST 0 (None)
34 RETURN_VALUE
How do I clone a list so that it doesn't change unexpectedly after assignment?
new_list = my_list
doesn't actually create a second list. The assignment just copies the reference to the list, not the actual list, so both new_list
and my_list
refer to the same list after the assignment.
To actually copy the list, you have several options:
You can use the builtin
list.copy()
method (available since Python 3.3):new_list = old_list.copy()
You can slice it:
new_list = old_list[:]
Alex Martelli's opinion (at least back in 2007) about this is, that it is a weird syntax and it does not make sense to use it ever. ;) (In his opinion, the next one is more readable).
You can use the built in
list()
constructor:new_list = list(old_list)
You can use generic
copy.copy()
:import copy
new_list = copy.copy(old_list)This is a little slower than
list()
because it has to find out the datatype ofold_list
first.If you need to copy the elements of the list as well, use generic
copy.deepcopy()
:import copy
new_list = copy.deepcopy(old_list)Obviously the slowest and most memory-needing method, but sometimes unavoidable. This operates recursively; it will handle any number of levels of nested lists (or other containers).
Example:
import copy
class Foo(object):
def __init__(self, val):
self.val = val
def __repr__(self):
return f'Foo({self.val!r})'
foo = Foo(1)
a = ['foo', foo]
b = a.copy()
c = a[:]
d = list(a)
e = copy.copy(a)
f = copy.deepcopy(a)
# edit orignal list and instance
a.append('baz')
foo.val = 5
print(f'original: {a}\nlist.copy(): {b}\nslice: {c}\nlist(): {d}\ncopy: {e}\ndeepcopy: {f}')
Result:
original: ['foo', Foo(5), 'baz']
list.copy(): ['foo', Foo(5)]
slice: ['foo', Foo(5)]
list(): ['foo', Foo(5)]
copy: ['foo', Foo(5)]
deepcopy: ['foo', Foo(1)]
How to deep copy a list?
E0_copy
is not a deep copy. You don't make a deep copy using list()
. (Both list(...)
and testList[:]
are shallow copies.)
You use copy.deepcopy(...)
for deep copying a list.
deepcopy(x, memo=None, _nil=[])
Deep copy operation on arbitrary Python objects.
See the following snippet -
>>> a = [[1, 2, 3], [4, 5, 6]]
>>> b = list(a)
>>> a
[[1, 2, 3], [4, 5, 6]]
>>> b
[[1, 2, 3], [4, 5, 6]]
>>> a[0][1] = 10
>>> a
[[1, 10, 3], [4, 5, 6]]
>>> b # b changes too -> Not a deepcopy.
[[1, 10, 3], [4, 5, 6]]
Now see the deepcopy
operation
>>> import copy
>>> b = copy.deepcopy(a)
>>> a
[[1, 10, 3], [4, 5, 6]]
>>> b
[[1, 10, 3], [4, 5, 6]]
>>> a[0][1] = 9
>>> a
[[1, 9, 3], [4, 5, 6]]
>>> b # b doesn't change -> Deep Copy
[[1, 10, 3], [4, 5, 6]]
To explain, list(...)
does not recursively make copies of the inner objects. It only makes a copy of the outermost list, while still referencing the same inner lists, hence, when you mutate the inner lists, the change is reflected in both the original list and the shallow copy. You can see that shallow copying references the inner lists by checking that id(a[0]) == id(b[0])
where b = list(a)
.
Efficiently finding duplicates in a list
The idea here is to do a single sweep in linear time. You can use a counter to do this. The idea is to count each element and then return all those which were counted multiple times.
from collections import Counter
def get_duplicates(array):
c = Counter(array)
return [k for k in c if c[k] > 1]
The approach above is linear in complexity, but makes two passes over the input - once, to count (this is abstracted away by the Counter
constructor) and the other is to return non-unique values in the list comp. You can optimise this a bit using a yield
statement, and return duplicates as you find them.
def get_duplicates(array):
c = Counter()
seen = set()
for i in array:
c[i] += 1
if c[i] > 1 and i not in seen:
seen.add(i)
yield i
This results in a compulsory if
check and extra space in the form of a set
, but you reduce two passes to one.
What is the best way to copy a list?
If you want a shallow copy (elements aren't copied) use:
lst2=lst1[:]
If you want to make a deep copy then use the copy module:
import copy
lst2=copy.deepcopy(lst1)
What is the most time efficient way to remove unordered duplicates in a 2D array?
Since you want to find unordered duplicates the best way to go is by typecasting. Typecast them as set
. Since set only contains immutable elements. So, I made a set of tuples
.
Note:
The best way to eliminate duplicates is by making aset
of the given elements.
>>> set(map(tuple,map(sorted,x)))
{(-3, -2, 4, 5), (-5, 0, 4, 5)}
How do I find the duplicates in a list and create another list with them?
To remove duplicates use set(a)
. To print duplicates, something like:
a = [1,2,3,2,1,5,6,5,5,5]
import collections
print([item for item, count in collections.Counter(a).items() if count > 1])
## [1, 2, 5]
Note that Counter
is not particularly efficient (timings) and probably overkill here. set
will perform better. This code computes a list of unique elements in the source order:
seen = set()
uniq = []
for x in a:
if x not in seen:
uniq.append(x)
seen.add(x)
or, more concisely:
seen = set()
uniq = [x for x in a if x not in seen and not seen.add(x)]
I don't recommend the latter style, because it is not obvious what not seen.add(x)
is doing (the set add()
method always returns None
, hence the need for not
).
To compute the list of duplicated elements without libraries:
seen = set()
dupes = []
for x in a:
if x in seen:
dupes.append(x)
else:
seen.add(x)
or, more concisely:
seen = set()
dupes = [x for x in a if x in seen or seen.add(x)]
If list elements are not hashable, you cannot use sets/dicts and have to resort to a quadratic time solution (compare each with each). For example:
a = [[1], [2], [3], [1], [5], [3]]
no_dupes = [x for n, x in enumerate(a) if x not in a[:n]]
print no_dupes # [[1], [2], [3], [5]]
dupes = [x for n, x in enumerate(a) if x in a[:n]]
print dupes # [[1], [3]]
Efficient way to remove half of the duplicate items in a list
If order isn't important, a way would be to get the odd or even indexes only after a sort. Those lists will be the same so you only need one of them.
l = [1,8,8,8,1,3,3,8]
l.sort()
# Get all odd indexes
odd = l[1::2]
# Get all even indexes
even = l[::2]
print(odd)
print(odd == even)
Result:
[1, 3, 8, 8]
True
Related Topics
How to Link Pycharm with Pyspark
How to Return a Value from _Init_ in Python
Is Python's Sorted() Function Guaranteed to Be Stable
Python Giving Filenotfounderror for File Name Returned by Os.Listdir
How to Save a Dictionary to a File
What Does "While True" Mean in Python
Is This Bad Programming Practice in Tkinter
Alternative Way to Split a List into Groups of N
How to Add Sum to Zero Constraint to Glm in Python
How to Convert a Dictionary into a List of Tuples
Check If Any Alert Exists Using Selenium with Python
Python Sockets Error Typeerror: a Bytes-Like Object Is Required, Not 'Str' with Send Function
How to Refer to Relative Paths of Resources When Working with a Code Repository
Driving Excel from Python in Windows