[] and {} VS List() and Dict(), Which Is Better

[] and {} vs list() and dict(), which is better?

In terms of speed, it's no competition for empty lists/dicts:

>>> from timeit import timeit
>>> timeit("[]")
0.040084982867934334
>>> timeit("list()")
0.17704233359267718
>>> timeit("{}")
0.033620194745424214
>>> timeit("dict()")
0.1821558326547077

and for non-empty:

>>> timeit("[1,2,3]")
0.24316302770330367
>>> timeit("list((1,2,3))")
0.44744206316727286
>>> timeit("list(foo)", setup="foo=(1,2,3)")
0.446036018543964
>>> timeit("{'a':1, 'b':2, 'c':3}")
0.20868602015059423
>>> timeit("dict(a=1, b=2, c=3)")
0.47635635255323905
>>> timeit("dict(bar)", setup="bar=[('a', 1), ('b', 2), ('c', 3)]")
0.9028228448029267

Also, using the bracket notation lets you use list and dictionary comprehensions, which may be reason enough.

In Python, when to use a Dictionary, List or Set?

A list keeps order, dict and set don't: when you care about order, therefore, you must use list (if your choice of containers is limited to these three, of course ;-) ).

dict associates each key with a value, while list and set just contain values: very different use cases, obviously.

set requires items to be hashable, list doesn't: if you have non-hashable items, therefore, you cannot use set and must instead use list.

set forbids duplicates, list does not: also a crucial distinction. (A "multiset", which maps duplicates into a different count for items present more than once, can be found in collections.Counter -- you could build one as a dict, if for some weird reason you couldn't import collections, or, in pre-2.7 Python as a collections.defaultdict(int), using the items as keys and the associated value as the count).

Checking for membership of a value in a set (or dict, for keys) is blazingly fast (taking about a constant, short time), while in a list it takes time proportional to the list's length in the average and worst cases. So, if you have hashable items, don't care either way about order or duplicates, and want speedy membership checking, set is better than list.

Should I use dict or list?

You really want a set. Sets are faster than lists because they can only contain unique elements, which allows them to be implemented as hash tables. Hash tables allow membership testing (if element in my_set) in O(1) time. This contrasts with lists, where the only way to check if an element is in the list is to check every element of the list in turn (in O(n) time.)

A dict is similar to a set in that both allow unique keys only, and both are implemented as hash tables. They both allow O(1) membership testing. The difference is that a set only has keys, while a dict has both keys and values (which is extra overhead you don't need in this application.)


Using a set, and replacing the nested for loop with an itertools.chain() to flatten the 2D list to a 1D list:

import itertools
seen = set()
for author in itertools.chain(*authors):
seen.add(author)

Or shorter:

import itertools
seen = set( itertools.chain(*authors) )

Edit (thanks, @jamylak) more memory efficient for large lists:

import itertools
seen = set( itertools.chain.from_iterable(authors) )

Example on a list of lists:

>>> a = [[1,2],[1,2],[1,2],[3,4]]
>>> set ( itertools.chain(*a) )
set([1, 2, 3, 4])

P.S. : If, instead of finding all the unique authors, you want to count the number of times you see each author, use a collections.Counter, a special kind of dictionary optimised for counting things.

Here's an example of counting characters in a string:

>>> a = "DEADBEEF CAFEBABE"
>>> import collections
>>> collections.Counter(a)
Counter({'E': 5, 'A': 3, 'B': 3, 'D': 2, 'F': 2, ' ': 1, 'C': 1})

Difference between using [] and list() in Python

When you convert a dict object to a list, it only takes the keys.

However, if you surround it with square brackets, it keeps everything the same, it just makes it a list of dicts, with only one item in it.

>>> obj = {1: 2, 3: 4, 5: 6, 7: 8}
>>> list(obj)
[1, 3, 5, 7]
>>> [obj]
[{1: 2, 3: 4, 5: 6, 7: 8}]
>>>

This is because, when you loop over with a for loop, it only takes the keys as well:

>>> for k in obj:
... print k
...
1
3
5
7
>>>

But if you want to get the keys and the values, use .items():

>>> list(obj.items())
[(1, 2), (3, 4), (5, 6), (7, 8)]
>>>

Using a for loop:

>>> for k, v in obj.items():
... print k, v
...
1 2
3 4
5 6
7 8
>>>

However, when you type in list.__doc__, it gives you the same as [].__doc__:

>>> print list.__doc__
list() -> new list
list(sequence) -> new list initialized from sequence's items
>>>
>>> print [].__doc__
list() -> new list
list(sequence) -> new list initialized from sequence's items
>>>

Kind of misleading :)

Python Typing List[Dict] vs List[dict]

Since Python 3.9, the standard collections can be subscripted. The typing variants are now deprecated as a result:

tuple # typing.Tuple

list # typing.List

dict # typing.Dict

set # typing.Set

...

Importing those from typing is deprecated. Due to PEP 563 and the intention to minimize the runtime impact of typing, this deprecation will not generate DeprecationWarnings. Instead, type checkers may warn about such deprecated usage when the target version of the checked program is signalled to be Python 3.9 or newer. It's recommended to allow for those warnings to be silenced on a project-wide basis.

The deprecated functionality will be removed from the typing module in the first Python version released 5 years after the release of Python 3.9.0.

Python inserting to list vs dict

Update after new code posted and more back and forth between Mark and Antti:

I believe the reason why your original test case didn't show the problem is that, in your original test case, you only showed using len() for the dict case, where in your full example, you always use len().

After playing around with timing some more, I found that if I modify the list example below to match your full example more closely, e.g. coll.append(len(coll)) -- because you are calculating len() in both cases, then the dict case is faster. Your original small example was incorrectly attributing the cost of len() to the dict version, even though len() is called for both versions.

However, (assuming that you are executing this code in a function, and not at script level where all lookups are costly), I found that I could recover that time and make lists faster than dicts again by predefining a = coll.append outside the loop and using that instead.

If this is the issue that you are seeing and it is that time-critical, then you will probably want to move some other lookups outside of your loop, such as the ones for len and g.AddNode.

** Original answer below **

I cannot verify your claim that dicts are faster than lists in this use-case, so perhaps you changed something else. You didn't specify a version, so I mostly used Python 2, but used 3 to determine it was comparable in some instances. Here is the code:

def usedict(times, size):
cursor = list(range(size))
for iteration in range(times):
coll = {}
for i in cursor:
coll[len(coll)] = i

def uselist(times, size):
cursor = list(range(size))
for iteration in range(times):
coll = []
for i in cursor:
coll.append(i)

And here are some results:

$ python -m timeit "import listdict;listdict.uselist(10, 100000)"
10 loops, best of 3: 49.7 msec per loop
$ python -m timeit "import listdict;listdict.usedict(10, 100000)"
10 loops, best of 3: 103 msec per loop
$ python3 -m timeit "import listdict;listdict.uselist(10, 100000)"
10 loops, best of 3: 65.9 msec per loop
$ python3 -m timeit "import listdict;listdict.usedict(10, 100000)"
10 loops, best of 3: 131 msec per loop
$ python -m timeit "import listdict;listdict.uselist(1, 10000000)"
10 loops, best of 3: 666 msec per loop
$ python -m timeit "import listdict;listdict.usedict(1, 10000000)"
10 loops, best of 3: 3.66 sec per loop
$ python -m timeit "import listdict;listdict.uselist(100000, 10)"
10 loops, best of 3: 69.6 msec per loop
$ python -m timeit "import listdict;listdict.usedict(100000, 10)"
10 loops, best of 3: 82.1 msec per loop
$ python -m timeit "import listdict;listdict.uselist(100000, 4)"
10 loops, best of 3: 29.4 msec per loop
$ python -m timeit "import listdict;listdict.usedict(100000, 4)"
10 loops, best of 3: 34.4 msec per loop
$ python -m timeit "import listdict;listdict.uselist(1000000, 1)"
10 loops, best of 3: 151 msec per loop
$ python -m timeit "import listdict;listdict.usedict(1000000, 1)"
10 loops, best of 3: 148 msec per loop
$ python3 -m timeit "import listdict;listdict.uselist(1000000, 1)"
10 loops, best of 3: 177 msec per loop
$ python3 -m timeit "import listdict;listdict.usedict(1000000, 1)"
10 loops, best of 3: 213 msec per loop
$ python -m timeit "import listdict;listdict.uselist(1000000, 3)"
10 loops, best of 3: 245 msec per loop
$ python -m timeit "import listdict;listdict.usedict(1000000, 3)"
10 loops, best of 3: 272 msec per loop

For larger containers, lists were a clear winner -- 5 times faster with 10e6 items. The only place where dicts came out ahead was Python 2.7 with a single entry per container, and that was by a trivial amount.

This was on a 32 bit Ubuntu 14.04 system.

What's the difference between dict() and {}?

>>> def f():
... return {'a' : 1, 'b' : 2}
...
>>> def g():
... return dict(a=1, b=2)
...
>>> g()
{'a': 1, 'b': 2}
>>> f()
{'a': 1, 'b': 2}
>>> import dis
>>> dis.dis(f)
2 0 BUILD_MAP 0
3 DUP_TOP
4 LOAD_CONST 1 ('a')
7 LOAD_CONST 2 (1)
10 ROT_THREE
11 STORE_SUBSCR
12 DUP_TOP
13 LOAD_CONST 3 ('b')
16 LOAD_CONST 4 (2)
19 ROT_THREE
20 STORE_SUBSCR
21 RETURN_VALUE
>>> dis.dis(g)
2 0 LOAD_GLOBAL 0 (dict)
3 LOAD_CONST 1 ('a')
6 LOAD_CONST 2 (1)
9 LOAD_CONST 3 ('b')
12 LOAD_CONST 4 (2)
15 CALL_FUNCTION 512
18 RETURN_VALUE

dict() is apparently some C built-in. A really smart or dedicated person (not me) could look at the interpreter source and tell you more. I just wanted to show off dis.dis. :)

Is a list or dictionary faster in Python?

The timeit module in the standard library is designed just to answer such questions! Forget the print (which would have the nasty side effect of spewing stuff to your terminal;-) and compare:

$ python -mtimeit 'tmp=[]; tmp.append(True); x=tmp[0]'
1000000 loops, best of 3: 0.716 usec per loop
$ python -mtimeit 'tmp={}; tmp[0]=True; x=tmp[0]'
1000000 loops, best of 3: 0.515 usec per loop

So, the dict is the winner -- by 0.2 microseconds...!-)



Related Topics



Leave a reply



Submit