Use Cases for the 'Setdefault' Dict Method

Use cases for the 'setdefault' dict method

You could say defaultdict is useful for settings defaults before filling the dict and setdefault is useful for setting defaults while or after filling the dict.

Probably the most common use case: Grouping items (in unsorted data, else use itertools.groupby)

# really verbose
new = {}
for (key, value) in data:
    if key in new:
        new[key].append( value )
    else:
        new[key] = [value]

# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, []) # key might exist already
    group.append( value )

# even simpler with defaultdict 
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append( value ) # all keys have a default already

Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn't work in this case, because it only creates keys on explicit access. Think you use something HTTP-ish with many headers -- some are optional, but you want defaults for them:

headers = parse_headers( msg ) # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
    headers.setdefault( headername, defaultvalue )

python dict: get vs setdefault

Your two examples do the same thing, but that doesn't mean get and setdefault do.

The difference between the two is basically manually setting d[key] to point to the list every time, versus setdefault automatically setting d[key] to the list only when it's unset.

Making the two methods as similar as possible, I ran

from timeit import timeit

print timeit("c = d.get(0, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("c = d.get(1, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(0, []).extend([1])", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(1, []).extend([1])", "d = {1: []}", number = 1000000)

and got

0.794723378711
0.811882272256
0.724429205999
0.722129751973

So setdefault is around 10% faster than get for this purpose.

The get method allows you to do less than you can with setdefault. You can use it to avoid getting a KeyError when the key doesn't exist (if that's something that's going to happen frequently) even if you don't want to set the key.

See Use cases for the 'setdefault' dict method and dict.get() method returns a pointer for some more info about the two methods.

The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow, and often you're better off (speed wise) doing a double lookup, using a defaultdict, or handling the error (depending on the size of the dictionary and your use case).

How is setdefault() method working in this invert dictionary implementation?

Print statements are a very useful and easy way to understand what's happening in a program:

def invert_dict(d):
    inverse = {}
    for key in d:
        new_key = d[key]
        print('key:', key)
        print('new_key:', new_key)
        print('inverse before:', inverse)
        value = inverse.setdefault(new_key, [])
        print('inverse in the middle:', inverse)
        print('value before:', value)
        value.append(key)
        print('value after:', value)
        print('inverse after:', inverse)
    return inverse

letters_in_word = {"mine": 4, "yours": 5, "ours": 4, "sunday": 6, "friend": 6, "fun": 3, "happy": 5, "beautiful": 8}

print(invert_dict(letters_in_word))

Output:

key: beautiful
new_key: 8
inverse before: {}
inverse in the middle: {8: []}
value before: []
value after: ['beautiful']
inverse after: {8: ['beautiful']}
key: yours
new_key: 5
inverse before: {8: ['beautiful']}
inverse in the middle: {8: ['beautiful'], 5: []}
value before: []
value after: ['yours']
inverse after: {8: ['beautiful'], 5: ['yours']}
key: ours
new_key: 4
inverse before: {8: ['beautiful'], 5: ['yours']}
inverse in the middle: {8: ['beautiful'], 4: [], 5: ['yours']}
value before: []
value after: ['ours']
inverse after: {8: ['beautiful'], 4: ['ours'], 5: ['yours']}
key: sunday
new_key: 6
inverse before: {8: ['beautiful'], 4: ['ours'], 5: ['yours']}
inverse in the middle: {8: ['beautiful'], 4: ['ours'], 5: ['yours'], 6: []}
value before: []
value after: ['sunday']
inverse after: {8: ['beautiful'], 4: ['ours'], 5: ['yours'], 6: ['sunday']}
key: happy
new_key: 5
inverse before: {8: ['beautiful'], 4: ['ours'], 5: ['yours'], 6: ['sunday']}
inverse in the middle: {8: ['beautiful'], 4: ['ours'], 5: ['yours'], 6: ['sunday']}
value before: ['yours']
value after: ['yours', 'happy']
inverse after: {8: ['beautiful'], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
key: fun
new_key: 3
inverse before: {8: ['beautiful'], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
inverse in the middle: {8: ['beautiful'], 3: [], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
value before: []
value after: ['fun']
inverse after: {8: ['beautiful'], 3: ['fun'], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
key: mine
new_key: 4
inverse before: {8: ['beautiful'], 3: ['fun'], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
inverse in the middle: {8: ['beautiful'], 3: ['fun'], 4: ['ours'], 5: ['yours', 'happy'], 6: ['sunday']}
value before: ['ours']
value after: ['ours', 'mine']
inverse after: {8: ['beautiful'], 3: ['fun'], 4: ['ours', 'mine'], 5: ['yours', 'happy'], 6: ['sunday']}
key: friend
new_key: 6
inverse before: {8: ['beautiful'], 3: ['fun'], 4: ['ours', 'mine'], 5: ['yours', 'happy'], 6: ['sunday']}
inverse in the middle: {8: ['beautiful'], 3: ['fun'], 4: ['ours', 'mine'], 5: ['yours', 'happy'], 6: ['sunday']}
value before: ['sunday']
value after: ['sunday', 'friend']
inverse after: {8: ['beautiful'], 3: ['fun'], 4: ['ours', 'mine'], 5: ['yours', 'happy'], 6: ['sunday', 'friend']}
{8: ['beautiful'], 3: ['fun'], 4: ['ours', 'mine'], 5: ['yours', 'happy'], 6: ['sunday', 'friend']}

Also very useful is a good debugger such as the one in PyCharm. Try that out.

Python dict.setdefault uses more memory?

Your original code every time round the loop will create a list that mostly then just gets thrown away. It also makes multiple dictionary lookups (looking up the method setdefault is a dictionary lookup and then the method itself does a dictionary lookup to see whether the object was set and if it isn't does another to store the value). .name and .append() are also dictionary lookups but they are still present in the revised code.

for element in iterable:
    values.setdefault(element.name, []).append(element)

The revised code only looks up the dictionary when the name changes, so it it removes two dictionary lookups and a method call from every loop. That's why it's faster.

As for the memory use, when the list grows it may sometimes have to copy the data but can avoid that if the memory block can just be expanded. My guess would be that creating all of those unused temporary lists may be fragmenting the memory more and forcing more copies. In other words Python isn't actually using more memory, but it may have more allocated but unused memory.

When you feel a need for setdefault consider using collections.defaultdict instead. That avoids creating the list except when it's needed:

from collections import defaultdict
values = defaultdict(list)
for element in iterable:
    values[element.name].append(element)

That will probably still be slower than your second code because it doesn't take advantage of your knowledge that names are all grouped, but for the general case it is better than setdefault.

Another way would be to use itertools.groupby. Something like this:

from itertools import groupby
from operator import attrgetter
values = { name: list(elements) for name,elements in
    groupby(elements, attrgetter('name')) }

That takes advantage of the ordering and simplifies everything down to a single dictionary comprehension.

python3: understand usage of `setdefault` dictionary method

It works as you understood it - as to why a list-value is used instead of directly using an integer - it does not work using a "pure" integer and setdefault(..).

Reason: The setdefault() returns the value of your dictionary for this key. If you return a list you get a reference to that list. If you modify the list, the change reflects inside the dictionary (because: reference). If you use a integer you get it returned, but modifying it does not change the value that is assigned to the key inside the dict.

# list representing sequence of states
states = ['a','b','c','d','a','a','a','b','c','b','b','b']

# matrix of transitions
M = {}

for i in range(len(states)-1):
    M.setdefault((states[i], states[i+1]), [0])[0] += 1

print(M)

Output:

{('a', 'b'): 2, ('b', 'c'): 2, ('c', 'd'): 1, ('d', 'a'): 1, 
 ('a', 'a'): 2, ('c', 'b'): 1, ('b', 'b'): 2}

If you want to not use a list containing a single counter integer, you could do:

for i in range(len(states)-1):

    # does not work, error:   M.setdefault((states[i], states[i+1]), 0) += 1 
    M.setdefault((states[i], states[i+1]), 0)  
    M[(states[i], states[i+1])] += 1
print(M)

Output:

{('a', 'b'): 2, ('b', 'c'): 2, ('c', 'd'): 1, ('d', 'a'): 1, 
 ('a', 'a'): 2, ('c', 'b'): 1, ('b', 'b'): 2}

but that takes two lines - you cannot directly assign to a integer.

I personally would probably do:

# list representing sequence of states
states = ['a','b','c','d','a','a','a','b','c','b','b','b']

# matrix of transitions 
from collections import defaultdict
M = defaultdict(int)

for a, b in zip(states,states[1:]):
    M[(a,b)] += 1 

print(dict(M))  # or use M directly - its str-output is not that nice though

which should be more performant then using a base dictionaries setdefault.

Replace collections' defaultdict by a normal dict with setdefault

collections.defaultdict is generally more performant, it is optimised exactly for this task and C-implemented. However, you should use dict.setdefault if you want accessing an absent key in your resulting dictionary to result in a KeyError rather than inserting an empty list. This is the most important practical difference.

Use Cases for the 'Setdefault' Dict Method