Use cases for the 'setdefault' dict method
You could say defaultdict is useful for setting defaults before filling the dict, and setdefault is useful for setting defaults while or after filling the dict.
Probably the most common use case is grouping items (in unsorted data; otherwise use itertools.groupby):
# really verbose
new = {}
for (key, value) in data:
    if key in new:
        new[key].append(value)
    else:
        new[key] = [value]
# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, [])  # key might exist already
    group.append(value)
# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append(value)  # all keys have a default already
Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn't work in this case, because it only creates keys on explicit access. Suppose you are handling something HTTP-like with many headers: some are optional, but you want defaults for them:
headers = parse_headers(msg)  # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
    headers.setdefault(headername, defaultvalue)
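A small self-contained sketch of the same idea, with made-up header names and defaults (a literal dict stands in for the parse_headers call), shows that setdefault fills in only the missing keys and leaves parsed values alone:

```python
# Hypothetical parsed headers; above, these would come from parse_headers(msg).
headers = {"Content-Type": "text/html", "Connection": "close"}

# Hypothetical optional headers with their default values.
optional_headers = [
    ("Connection", "keep-alive"),     # already present, must not be overwritten
    ("Accept-Encoding", "identity"),  # missing, will be added
]

for headername, defaultvalue in optional_headers:
    headers.setdefault(headername, defaultvalue)

print(headers)
```

The parsed "Connection" value survives, and only "Accept-Encoding" is added.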
python dict: get vs setdefault
Your two examples do the same thing, but that doesn't mean get and setdefault do.
The difference between the two is that with get you set d[key] to point to the list yourself, every time, whereas setdefault sets d[key] to the list automatically, and only when the key is unset.
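As a minimal illustration of that difference (the key name and values are arbitrary), get hands back the fallback without storing it, while setdefault stores it and returns the stored object:

```python
d = {}

# get() returns the fallback but does not store it in the dict.
fallback = d.get("a", [])
fallback.append(1)
print(d)  # {}

# setdefault() stores the default under the key and returns that same object.
stored = d.setdefault("a", [])
stored.append(1)
print(d)  # {'a': [1]}
```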
Making the two methods as similar as possible, I ran
from timeit import timeit
print(timeit("c = d.get(0, []); c.extend([1]); d[0] = c", "d = {1: []}", number=1000000))
print(timeit("c = d.get(1, []); c.extend([1]); d[0] = c", "d = {1: []}", number=1000000))
print(timeit("d.setdefault(0, []).extend([1])", "d = {1: []}", number=1000000))
print(timeit("d.setdefault(1, []).extend([1])", "d = {1: []}", number=1000000))
and got
0.794723378711
0.811882272256
0.724429205999
0.722129751973
So setdefault is around 10% faster than get for this purpose.
The get method allows you to do less than you can with setdefault. You can use it to avoid getting a KeyError when the key doesn't exist (if that's something that's going to happen frequently) even if you don't want to set the key.
See "Use cases for the 'setdefault' dict method" and "dict.get() method returns a pointer" for some more info about the two methods.
The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow, and often you're better off (speed-wise) doing a double lookup, using a defaultdict, or handling the error (depending on the size of the dictionary and your use case).
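For reference, the three alternatives to get mentioned there look roughly like this. This is only a sketch with an arbitrary dict and a fallback of 0, and which one is fastest depends on your data, so measure before choosing:

```python
from collections import defaultdict

d = {"a": 1}

# 1. Double lookup: test membership first, then index.
value_lookup = d["b"] if "b" in d else 0

# 2. Handle the error: try the lookup and fall back on KeyError.
try:
    value_except = d["b"]
except KeyError:
    value_except = 0

# 3. defaultdict: a missing key is created on access (note the side effect).
dd = defaultdict(int, d)
value_dd = dd["b"]

# For comparison, the plain get() call.
value_get = d.get("b", 0)

print(value_lookup, value_except, value_dd, value_get)
```

Only the defaultdict variant changes the mapping: after the access, "b" is a key of dd.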
How is setdefault() method working in this invert dictionary implementation?
Print statements are a very useful and easy way to understand what's happening in a program:
def invert_dict(d):
    inverse = {}
    for key in d:
        new_key = d[key]
        print('key:', key)
        print('new_key:', new_key)
        print('inverse before:', inverse)
        value = inverse.setdefault(new_key, [])
        print('inverse in the middle:', inverse)
        print('value before:', value)
        value.append(key)
        print('value after:', value)
        print('inverse after:', inverse)
    return inverse

letters_in_word = {"mine": 4, "yours": 5, "ours": 4, "sunday": 6, "friend": 6, "fun": 3, "happy": 5, "beautiful": 8}
print(invert_dict(letters_in_word))
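Output (from an older Python 3 release, where dict iteration order was arbitrary; since Python 3.7 the keys would appear in insertion order):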
key: beautiful
new_key: 8
inverse before: {}
inverse in the middle: {8: []}
value before: []
value after: ['beautiful']
inverse after: {8: ['beautiful']}
key: yours
new_key: 5
inverse before: {8: ['beautiful']}
inverse in the middle: {8: ['beautiful'], 5: []}
value before: []
value after: ['yours']
inverse after: {8: ['beautiful'], 5: ['yours']}
key: ours
new_key: 4
inverse before: {8: ['beautiful'], 5: ['yours']}
inverse in the middle: {8: ['beautiful'], 4: [], 5: ['yours']}
value before: []
value after: ['ours']
inverse after: {8: ['beautiful'], 4: ['ours'], 5: ['yours']}
key: sunday
new_key: 6
inverse before: {8: ['beautiful'], 4: ['ours'], 5: ['yours']}
inverse in the middle: {8: ['beautiful'], 4: ['ours'], 5: ['yours'], 6: []}
value before: []
value after: ['sunday']
inverse after: {8: ['beautiful'], 4: ['ours'], 5: ['yours'], 6: ['sunday']}
key: happy
new_key: 5
inverse before: {8: ['beautiful'], 4: ['ours'], 5: ['yours'], 6: ['sunday']}
inverse in the middle: {8: ['beautiful'], 4: ['ours'], 5: ['yours'], 6: ['sunday']}
value before: ['yours']
value after: ['yours', 'happy']
inverse after: {8: ['beautiful'], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
key: fun
new_key: 3
inverse before: {8: ['beautiful'], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
inverse in the middle: {8: ['beautiful'], 3: [], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
value before: []
value after: ['fun']
inverse after: {8: ['beautiful'], 3: ['fun'], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
key: mine
new_key: 4
inverse before: {8: ['beautiful'], 3: ['fun'], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
inverse in the middle: {8: ['beautiful'], 3: ['fun'], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
value before: ['ours']
value after: ['ours', 'mine']
inverse after: {8: ['beautiful'], 3: ['fun'], 4: ['ours', 'mine'], 5: ['yours', 'happy'], 6: ['sunday']}
key: friend
new_key: 6
inverse before: {8: ['beautiful'], 3: ['fun'], 4: ['ours', 'mine'], 5: ['yours', 'happy'], 6: ['sunday']}
inverse in the middle: {8: ['beautiful'], 3: ['fun'], 4: ['ours', 'mine'], 5: ['yours', 'happy'], 6: ['sunday']}
value before: ['sunday']
value after: ['sunday', 'friend']
inverse after: {8: ['beautiful'], 3: ['fun'], 4: ['ours', 'mine'], 5: ['yours', 'happy'], 6: ['sunday', 'friend']}
{8: ['beautiful'], 3: ['fun'], 4: ['ours', 'mine'], 5: ['yours', 'happy'], 6: ['sunday', 'friend']}
Also very useful is a good debugger such as the one in PyCharm. Try that out.
Python dict.setdefault uses more memory?
Your original code creates a new list on every pass through the loop, and most of those lists are thrown away immediately. It also makes multiple dictionary lookups: looking up the method setdefault is a dictionary lookup, and then the method itself does a dictionary lookup to see whether the key is set and, if it isn't, another to store the value. .name and .append() are also dictionary lookups, but they are still present in the revised code.
for element in iterable:
    values.setdefault(element.name, []).append(element)
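To make the first point concrete, here is a small sketch in which a hypothetical helper, make_list, stands in for the [] literal and records every list it builds. Because the default argument is evaluated before setdefault is called, a list is built on every iteration, whether or not the key already exists:

```python
built = []  # records every list the default expression creates

def make_list():
    """Stand-in for the [] literal that remembers each list it builds."""
    new_list = []
    built.append(new_list)
    return new_list

names = ["a", "a", "a", "b", "b"]  # made-up data, already grouped by name
values = {}
for name in names:
    values.setdefault(name, make_list()).append(name)

print(len(built))   # 5 lists were built
print(len(values))  # only 2 of them ended up in the dict
```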
The revised code only looks up the dictionary when the name changes, so it removes two dictionary lookups and a method call from every loop iteration. That's why it's faster.
As for the memory use, when the list grows it may sometimes have to copy the data, but it can avoid that if the memory block can just be expanded. My guess would be that creating all of those unused temporary lists may be fragmenting the memory more and forcing more copies. In other words, Python isn't actually using more memory, but it may have more memory that is allocated but unused.
When you feel a need for setdefault, consider using collections.defaultdict instead. That avoids creating the list except when it's needed:
from collections import defaultdict
values = defaultdict(list)
for element in iterable:
    values[element.name].append(element)
That will probably still be slower than your second version, because it doesn't take advantage of your knowledge that the names are all grouped, but for the general case it is better than setdefault.
Another way would be to use itertools.groupby. Something like this:
from itertools import groupby
from operator import attrgetter
# iterable must already be grouped (or sorted) by name
values = {name: list(group)
          for name, group in groupby(iterable, attrgetter('name'))}
That takes advantage of the ordering and simplifies everything down to a single dictionary comprehension.
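The dependence on ordering matters: groupby only merges adjacent items with the same key. The sketch below uses a hypothetical Element record and made-up data to show what happens on ungrouped input, and how sorting by the same key first avoids it:

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

Element = namedtuple("Element", ["name", "payload"])  # hypothetical record type

ungrouped = [Element("a", 1), Element("b", 2), Element("a", 3)]

# "a" appears as two separate runs, so the later run overwrites the earlier one.
lossy = {name: list(group)
         for name, group in groupby(ungrouped, attrgetter("name"))}

# Sorting by the same key first gives exactly one run per name.
by_name = sorted(ungrouped, key=attrgetter("name"))
complete = {name: list(group)
            for name, group in groupby(by_name, attrgetter("name"))}

print(lossy["a"])     # only the last "a" element
print(complete["a"])  # both "a" elements
```

If the data is not already grouped, the sort costs O(n log n), at which point defaultdict is usually the simpler choice.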
python3: understand usage of `setdefault` dictionary method
It works as you understood it. As to why a list is used as the value instead of an integer directly: the one-line increment does not work with a "pure" integer and setdefault(..).
Reason: setdefault() returns the value stored in your dictionary for this key. If that value is a list, you get a reference to that list, and if you modify the list, the change is visible inside the dictionary (because it is the same object). If the value is an integer, you get it returned, but incrementing it does not change the value that is assigned to the key inside the dict.
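A minimal sketch of that reference behaviour, using arbitrary key names:

```python
d = {}

# A list is mutable: the returned object is the very one stored in the dict.
counter = d.setdefault("hits", [0])
counter[0] += 1
print(d["hits"])  # [1]

# An int is immutable: += only rebinds the local name n.
n = d.setdefault("misses", 0)
n += 1
print(d["misses"])  # 0
```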
# list representing sequence of states
states = ['a','b','c','d','a','a','a','b','c','b','b','b']
# matrix of transitions
M = {}
for i in range(len(states)-1):
    M.setdefault((states[i], states[i+1]), [0])[0] += 1
print(M)
Output:
{('a', 'b'): [2], ('b', 'c'): [2], ('c', 'd'): [1], ('d', 'a'): [1],
('a', 'a'): [2], ('c', 'b'): [1], ('b', 'b'): [2]}
If you do not want to use a list containing a single counter integer, you could do:
M = {}  # start again from an empty dict
for i in range(len(states)-1):
    # does not work, error: M.setdefault((states[i], states[i+1]), 0) += 1
    M.setdefault((states[i], states[i+1]), 0)
    M[(states[i], states[i+1])] += 1
print(M)
Output:
{('a', 'b'): 2, ('b', 'c'): 2, ('c', 'd'): 1, ('d', 'a'): 1,
('a', 'a'): 2, ('c', 'b'): 1, ('b', 'b'): 2}
but that takes two lines, because you cannot increment the integer returned by setdefault in place.
I personally would probably do:
# list representing sequence of states
states = ['a','b','c','d','a','a','a','b','c','b','b','b']
# matrix of transitions
from collections import defaultdict
M = defaultdict(int)
for a, b in zip(states, states[1:]):
    M[(a, b)] += 1
print(dict(M))  # or print M directly, although its str output is less tidy
which should be more performant than using a plain dictionary's setdefault.
Replace collections' defaultdict by a normal dict with setdefault
collections.defaultdict is generally more performant; it is optimised exactly for this task and implemented in C. However, you should use dict.setdefault if you want accessing an absent key in your resulting dictionary to raise a KeyError rather than insert an empty list. This is the most important practical difference.
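A short sketch of that difference, using made-up pairs and an arbitrary key name "missing":

```python
from collections import defaultdict

pairs = [("a", 1), ("a", 2), ("b", 3)]  # made-up data

plain = {}
auto = defaultdict(list)
for key, value in pairs:
    plain.setdefault(key, []).append(value)
    auto[key].append(value)

# Reading an absent key from the plain dict raises KeyError.
try:
    plain["missing"]
    plain_raised = False
except KeyError:
    plain_raised = True

# The same read on the defaultdict silently inserts an empty list.
auto["missing"]

print(plain_raised, "missing" in auto)
```

If you want the convenience of defaultdict while building but the strictness of a plain dict afterwards, converting with dict(auto) before handing the result on is one option.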