[] and {} vs list() and dict(), which is better?
In terms of speed, it's no competition for empty lists/dicts:
>>> from timeit import timeit
>>> timeit("[]")
0.040084982867934334
>>> timeit("list()")
0.17704233359267718
>>> timeit("{}")
0.033620194745424214
>>> timeit("dict()")
0.1821558326547077
and for non-empty:
>>> timeit("[1,2,3]")
0.24316302770330367
>>> timeit("list((1,2,3))")
0.44744206316727286
>>> timeit("list(foo)", setup="foo=(1,2,3)")
0.446036018543964
>>> timeit("{'a':1, 'b':2, 'c':3}")
0.20868602015059423
>>> timeit("dict(a=1, b=2, c=3)")
0.47635635255323905
>>> timeit("dict(bar)", setup="bar=[('a', 1), ('b', 2), ('c', 3)]")
0.9028228448029267
Also, using the bracket notation lets you use list and dictionary comprehensions, which may be reason enough.
In Python, when to use a Dictionary, List or Set?
A list
keeps order, dict
and set
don't: when you care about order, therefore, you must use list
(if your choice of containers is limited to these three, of course ;-) ).
dict
associates each key with a value, while list
and set
just contain values: very different use cases, obviously.
set
requires items to be hashable, list
doesn't: if you have non-hashable items, therefore, you cannot use set
and must instead use list
.
set
forbids duplicates, list
does not: also a crucial distinction. (A "multiset", which maps duplicates into a different count for items present more than once, can be found in collections.Counter
-- you could build one as a dict
, if for some weird reason you couldn't import collections
, or, in pre-2.7 Python as a collections.defaultdict(int)
, using the items as keys and the associated value as the count).
Checking for membership of a value in a set
(or dict
, for keys) is blazingly fast (taking about a constant, short time), while in a list it takes time proportional to the list's length in the average and worst cases. So, if you have hashable items, don't care either way about order or duplicates, and want speedy membership checking, set
is better than list
.
Should I use dict or list?
You really want a set
. Sets are faster than lists because they can only contain unique elements, which allows them to be implemented as hash tables. Hash tables allow membership testing (if element in my_set
) in O(1)
time. This contrasts with lists, where the only way to check if an element is in the list is to check every element of the list in turn (in O(n)
time.)
A dict
is similar to a set
in that both allow unique keys only, and both are implemented as hash tables. They both allow O(1)
membership testing. The difference is that a set
only has keys, while a dict
has both keys and values (which is extra overhead you don't need in this application.)
Using a set
, and replacing the nested for loop with an itertools.chain()
to flatten the 2D list to a 1D list:
import itertools
seen = set()
for author in itertools.chain(*authors):
seen.add(author)
Or shorter:
import itertools
seen = set( itertools.chain(*authors) )
Edit (thanks, @jamylak) more memory efficient for large lists:
import itertools
seen = set( itertools.chain.from_iterable(authors) )
Example on a list of lists:
>>> a = [[1,2],[1,2],[1,2],[3,4]]
>>> set ( itertools.chain(*a) )
set([1, 2, 3, 4])
P.S. : If, instead of finding all the unique authors, you want to count the number of times you see each author, use a collections.Counter
, a special kind of dictionary optimised for counting things.
Here's an example of counting characters in a string:
>>> a = "DEADBEEF CAFEBABE"
>>> import collections
>>> collections.Counter(a)
Counter({'E': 5, 'A': 3, 'B': 3, 'D': 2, 'F': 2, ' ': 1, 'C': 1})
Difference between using [] and list() in Python
When you convert a dict
object to a list, it only takes the keys.
However, if you surround it with square brackets, it keeps everything the same, it just makes it a list of dict
s, with only one item in it.
>>> obj = {1: 2, 3: 4, 5: 6, 7: 8}
>>> list(obj)
[1, 3, 5, 7]
>>> [obj]
[{1: 2, 3: 4, 5: 6, 7: 8}]
>>>
This is because, when you loop over with a for
loop, it only takes the keys as well:
>>> for k in obj:
... print k
...
1
3
5
7
>>>
But if you want to get the keys and the values, use .items()
:
>>> list(obj.items())
[(1, 2), (3, 4), (5, 6), (7, 8)]
>>>
Using a for
loop:
>>> for k, v in obj.items():
... print k, v
...
1 2
3 4
5 6
7 8
>>>
However, when you type in list.__doc__
, it gives you the same as [].__doc__
:
>>> print list.__doc__
list() -> new list
list(sequence) -> new list initialized from sequence's items
>>>
>>> print [].__doc__
list() -> new list
list(sequence) -> new list initialized from sequence's items
>>>
Kind of misleading :)
Python Typing List[Dict] vs List[dict]
Since Python 3.9, the standard collections can be subscripted. The typing
variants are now deprecated as a result:
tuple # typing.Tuple
list # typing.List
dict # typing.Dict
set # typing.Set
...
Importing those from
typing
is deprecated. Due to PEP 563 and the intention to minimize the runtime impact of typing, this deprecation will not generateDeprecationWarning
s. Instead, type checkers may warn about such deprecated usage when the target version of the checked program is signalled to be Python 3.9 or newer. It's recommended to allow for those warnings to be silenced on a project-wide basis.The deprecated functionality will be removed from the
typing
module in the first Python version released 5 years after the release of Python 3.9.0.
Python inserting to list vs dict
Update after new code posted and more back and forth between Mark and Antti:
I believe the reason why your original test case didn't show the problem is that, in your original test case, you only showed using len()
for the dict case, where in your full example, you always use len()
.
After playing around with timing some more, I found that if I modify the list example below to match your full example more closely, e.g. coll.append(len(coll))
-- because you are calculating len() in both cases, then the dict case is faster. Your original small example was incorrectly attributing the cost of len() to the dict version, even though len() is called for both versions.
However, (assuming that you are executing this code in a function, and not at script level where all lookups are costly), I found that I could recover that time and make lists faster than dicts again by predefining a = coll.append
outside the loop and using that instead.
If this is the issue that you are seeing and it is that time-critical, then you will probably want to move some other lookups outside of your loop, such as the ones for len
and g.AddNode
.
** Original answer below **
I cannot verify your claim that dicts are faster than lists in this use-case, so perhaps you changed something else. You didn't specify a version, so I mostly used Python 2, but used 3 to determine it was comparable in some instances. Here is the code:
def usedict(times, size):
cursor = list(range(size))
for iteration in range(times):
coll = {}
for i in cursor:
coll[len(coll)] = i
def uselist(times, size):
cursor = list(range(size))
for iteration in range(times):
coll = []
for i in cursor:
coll.append(i)
And here are some results:
$ python -m timeit "import listdict;listdict.uselist(10, 100000)"
10 loops, best of 3: 49.7 msec per loop
$ python -m timeit "import listdict;listdict.usedict(10, 100000)"
10 loops, best of 3: 103 msec per loop
$ python3 -m timeit "import listdict;listdict.uselist(10, 100000)"
10 loops, best of 3: 65.9 msec per loop
$ python3 -m timeit "import listdict;listdict.usedict(10, 100000)"
10 loops, best of 3: 131 msec per loop
$ python -m timeit "import listdict;listdict.uselist(1, 10000000)"
10 loops, best of 3: 666 msec per loop
$ python -m timeit "import listdict;listdict.usedict(1, 10000000)"
10 loops, best of 3: 3.66 sec per loop
$ python -m timeit "import listdict;listdict.uselist(100000, 10)"
10 loops, best of 3: 69.6 msec per loop
$ python -m timeit "import listdict;listdict.usedict(100000, 10)"
10 loops, best of 3: 82.1 msec per loop
$ python -m timeit "import listdict;listdict.uselist(100000, 4)"
10 loops, best of 3: 29.4 msec per loop
$ python -m timeit "import listdict;listdict.usedict(100000, 4)"
10 loops, best of 3: 34.4 msec per loop
$ python -m timeit "import listdict;listdict.uselist(1000000, 1)"
10 loops, best of 3: 151 msec per loop
$ python -m timeit "import listdict;listdict.usedict(1000000, 1)"
10 loops, best of 3: 148 msec per loop
$ python3 -m timeit "import listdict;listdict.uselist(1000000, 1)"
10 loops, best of 3: 177 msec per loop
$ python3 -m timeit "import listdict;listdict.usedict(1000000, 1)"
10 loops, best of 3: 213 msec per loop
$ python -m timeit "import listdict;listdict.uselist(1000000, 3)"
10 loops, best of 3: 245 msec per loop
$ python -m timeit "import listdict;listdict.usedict(1000000, 3)"
10 loops, best of 3: 272 msec per loop
For larger containers, lists were a clear winner -- 5 times faster with 10e6 items. The only place where dicts came out ahead was Python 2.7 with a single entry per container, and that was by a trivial amount.
This was on a 32 bit Ubuntu 14.04 system.
What's the difference between dict() and {}?
>>> def f():
... return {'a' : 1, 'b' : 2}
...
>>> def g():
... return dict(a=1, b=2)
...
>>> g()
{'a': 1, 'b': 2}
>>> f()
{'a': 1, 'b': 2}
>>> import dis
>>> dis.dis(f)
2 0 BUILD_MAP 0
3 DUP_TOP
4 LOAD_CONST 1 ('a')
7 LOAD_CONST 2 (1)
10 ROT_THREE
11 STORE_SUBSCR
12 DUP_TOP
13 LOAD_CONST 3 ('b')
16 LOAD_CONST 4 (2)
19 ROT_THREE
20 STORE_SUBSCR
21 RETURN_VALUE
>>> dis.dis(g)
2 0 LOAD_GLOBAL 0 (dict)
3 LOAD_CONST 1 ('a')
6 LOAD_CONST 2 (1)
9 LOAD_CONST 3 ('b')
12 LOAD_CONST 4 (2)
15 CALL_FUNCTION 512
18 RETURN_VALUE
dict() is apparently some C built-in. A really smart or dedicated person (not me) could look at the interpreter source and tell you more. I just wanted to show off dis.dis. :)
Is a list or dictionary faster in Python?
The timeit
module in the standard library is designed just to answer such questions! Forget the print
(which would have the nasty side effect of spewing stuff to your terminal;-) and compare:
$ python -mtimeit 'tmp=[]; tmp.append(True); x=tmp[0]'
1000000 loops, best of 3: 0.716 usec per loop
$ python -mtimeit 'tmp={}; tmp[0]=True; x=tmp[0]'
1000000 loops, best of 3: 0.515 usec per loop
So, the dict is the winner -- by 0.2 microseconds...!-)
Related Topics
How to Call Python Code from C Code
Reload Flask App When Template File Changes
Python Matplotlib Figure Title Overlaps Axes Label When Using Twiny
Pygame 2 Dimensional Movement of an Enemy Towards the Player, How to Calculate X and Y Velocity
Paging/Scrolling Through Set of 2D Heat Maps in Matplotlib
Finding Index of Nearest Point in Numpy Arrays of X and Y Coordinates
How to Find Tags with Only Certain Attributes - Beautifulsoup
Reading Data from a CSV File in Python
How to Create a Slug in Django
Python Module to Change System Date and Time
Install MySQL-Python (Windows)
Cannot Use Geometry Manager Pack Inside
How to Get Ftp File's Modify Time Using Python Ftplib
Pandas: Convert Dtype 'Object' to Int
If Two Variables Point to the Same Object, Why Doesn't Reassigning One Variable Affect the Other