Why dict.get(key) Instead of dict[key]

Why dict.get(key) instead of dict[key]?

It allows you to provide a default value if the key is missing:

dictionary.get("bogus", default_value)

returns default_value (whatever you choose it to be), whereas

dictionary["bogus"]

would raise a KeyError.

If omitted, default_value is None, such that

dictionary.get("bogus")  # <-- No default specified -- defaults to None

returns None just like

dictionary.get("bogus", None)

would.
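
Putting that together in a quick interactive session (the prices dictionary here is made up for the example):

>>> prices = {"apple": 1.50, "banana": 0.25}
>>> prices.get("cherry", 0.0)     # missing key, explicit default
0.0
>>> prices.get("cherry") is None  # missing key, default defaults to None
True
>>> prices["cherry"]              # subscription raises instead
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'cherry'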

Why is key in dict() faster than dict.get(key) in Python3?

Using dis.dis to disassemble both expressions:

>>> import dis
>>> dis.dis(compile('d.get(key)', '', 'eval'))
  1           0 LOAD_NAME                0 (d)
              2 LOAD_METHOD              1 (get)
              4 LOAD_NAME                2 (key)
              6 CALL_METHOD              1
              8 RETURN_VALUE
>>> dis.dis(compile('key in d', '', 'eval'))
  1           0 LOAD_NAME                0 (key)
              2 LOAD_NAME                1 (d)
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE

we can clearly see that d.get(key) has to run one extra step, LOAD_METHOD. Additionally, d.get has more work to do: it has to:

  1. check for the presence of the key
  2. if it was found, return the value
  3. otherwise, return the specified default value (or None if no default was specified).
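
To make those three steps concrete, here is a rough pure-Python sketch of the same logic (the name get_sketch is made up; the real implementation is the C function shown further down):

def get_sketch(d, key, default=None):
    # 1. check for the presence of the key
    if key in d:
        # 2. if it was found, return the value
        return d[key]
    # 3. otherwise, return the specified default value (None if no default was given)
    return default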

Also, from looking at the C code for in and the C code for .get, we can see that they are very similar.

int
PyDict_Contains(PyObject *op, PyObject *key)
{
    Py_hash_t hash;
    Py_ssize_t ix;
    PyDictObject *mp = (PyDictObject *)op;
    PyObject *value;

    if (!PyUnicode_CheckExact(key) ||
        (hash = ((PyASCIIObject *) key)->hash) == -1) {
        hash = PyObject_Hash(key);
        if (hash == -1)
            return -1;
    }
    ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &value);
    if (ix == DKIX_ERROR)
        return -1;
    return (ix != DKIX_EMPTY && value != NULL);
}

static PyObject *
dict_get_impl(PyDictObject *self, PyObject *key, PyObject *default_value)
{
    PyObject *val = NULL;
    Py_hash_t hash;
    Py_ssize_t ix;

    if (!PyUnicode_CheckExact(key) ||
        (hash = ((PyASCIIObject *) key)->hash) == -1) {
        hash = PyObject_Hash(key);
        if (hash == -1)
            return NULL;
    }
    ix = (self->ma_keys->dk_lookup) (self, key, hash, &val);
    if (ix == DKIX_ERROR)
        return NULL;
    if (ix == DKIX_EMPTY || val == NULL) {
        val = default_value;
    }
    Py_INCREF(val);
    return val;
}

In fact, they are almost the same, but .get has more overhead and must return a value.

However, it seems that key in d can use a faster path when the hash is already known, while d.get recalculates the hash every time. Additionally, CALL_METHOD and LOAD_METHOD have much higher overhead than COMPARE_OP, which performs one of the built-in comparison operations; for in, COMPARE_OP simply dispatches to the containment check (PyDict_Contains, shown above).
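
If you want to measure the difference yourself, a small timeit comparison along these lines will do (the dictionary and key are arbitrary; the absolute numbers vary by Python version and machine):

import timeit

setup = "d = {str(i): i for i in range(1000)}; key = '500'"
print(timeit.timeit("key in d", setup=setup))    # membership test
print(timeit.timeit("d.get(key)", setup=setup))  # method call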

Why did dict.get(key) work but not dict[key]?

The problem is mutability:

one_groups = dict.fromkeys(range(5), []) - this passes the same list as the value for every key. So if you mutate that list through one key, the change shows up for all of them.

It's basically the same as saying:

tmp = []
one_groups = dict.fromkeys(range(5), tmp)
del tmp

If you want to use a new list, you need to do it in a loop - either an explicit for loop or in a dict comprehension:

one_groups = {key: [] for key in range(5)}

This evaluates [] (which is equivalent to list()) anew for every key, so each value is a different list.


Why does get work? Because you explicitly read the current list, and + builds a new result list. It doesn't matter whether you write one_groups[x.count('1')] = one_groups.get(x.count('1')) + [x] or one_groups[x.count('1')] = one_groups[x.count('1')] + [x] - what matters is the +.

People often say a += b is just a = a + b, but the implementation can differ for optimisation - for lists, += behaves like .extend, mutating the existing list in place, because the result is wanted in the current variable and building a new list would waste memory.
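
To see both effects side by side, here is a small self-contained demonstration (the keys and values are arbitrary):

shared = dict.fromkeys(range(3), [])       # one list shared by every key
shared[0] += ["x"]                         # += extends that shared list in place
print(shared)    # {0: ['x'], 1: ['x'], 2: ['x']}

separate = {key: [] for key in range(3)}   # a fresh list per key
separate[0] = separate[0] + ["x"]          # + builds a brand-new list for key 0
print(separate)  # {0: ['x'], 1: [], 2: []}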

Python dictionaries - difference between dict.get(key) and dict.get(key, {})

The second parameter to dict.get is optional: it's what's returned if the key isn't found. If you don't supply it, it will return None.

So:

>>> d = {'a': 1, 'b': 2}
>>> print(d.get('c'))
None
>>> d.get('c', {})
{}
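
Passing {} as the default is mostly useful when the values are themselves dictionaries, because the lookup can then be chained without blowing up on None (the users dictionary below is invented for the example):

>>> users = {'alice': {'email': 'alice@example.com'}}
>>> users.get('bob', {}).get('email') is None   # missing user, chained lookup still works
True
>>> users.get('bob').get('email')               # without the {} default this fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'get'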

Why do `key in dict` and `key in dict.keys()` have the same output?

To understand why key in dct returns the same result as key in dct.keys(), one needs to look into the past. Historically, in Python 2 one would test for the existence of a key in dictionary dct with dct.has_key(key). This changed in Python 2.2, when the preferred way became key in dct, which basically did the same thing:

In a minor related change, the in operator now works on dictionaries, so key in dict is now equivalent to dict.has_key(key)

The behaviour of in is implemented internally in terms of the __contains__ dunder method. Its behaviour is documented in the Python language reference - 3 Data Model:

object.__contains__(self, item)

Called to implement membership test operators. Should return true if item is in self, false otherwise. For mapping objects, this should consider the keys of the mapping rather than the values or the key-item pairs.
For objects that don’t define __contains__(), the membership test first tries iteration via __iter__(), then the old sequence iteration protocol via __getitem__(), see this section in the language reference.

(emphasis mine; dictionaries in Python are mapping objects)

In Python 3, the has_key method was removed altogether, and now the correct way to test for the existence of a key is solely key in dct, as documented.


In contrast with the 2 above, key in dct.keys() has never been the correct way of testing whether a key exists in a dictionary.
The result of both your examples is indeed the same, however key in dct.keys() is slightly slower on Python 3 and is abysmally slow on Python 2.

key in dct returns True if the key is found as a key in dct, in a near-constant-time operation - it does not matter whether there are two keys or a million - its average-case time complexity is constant (O(1)).

dct.keys() in Python 2 creates a list of all keys; in Python 3 it returns a view of the keys; both of these objects support key in x. In Python 2 this works like for any iterable: the values are iterated over and True is returned as soon as one of them equals the given value (here key).

In practice, in Python 2 you'd find key in dct.keys() much slower than key in dct (key in dct.keys() scales linearly with the number of keys - its time complexity is O(n) - since both building the list of all keys with dct.keys() and searching that list with key in key_list are O(n)).

In Python 3, key in dct.keys() won't be much slower than key in dct, as the view does not build a list of the keys and the lookup is still O(1); in practice, however, it is slower by at least a constant factor, and it is 7 characters longer, so there is usually no reason to use it, even on Python 3.
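
If you want to verify the Python 3 difference yourself, a quick timeit comparison along these lines will do (the dictionary is arbitrary; the absolute numbers depend on your machine and interpreter version):

import timeit

setup = "dct = {str(i): i for i in range(10000)}; key = '9999'"
print(timeit.timeit("key in dct", setup=setup))
print(timeit.timeit("key in dct.keys()", setup=setup))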

dict.get(key, default) vs dict.get(key) or default

There is a huge difference if your value is false-y:

>>> d = {'foo': 0}
>>> d.get('foo', 'bar')
0
>>> d.get('foo') or 'bar'
'bar'

You should not use or default if your values can be false-y.

On top of that, using or adds additional bytecode; a test and a jump have to be performed. Just use dict.get(); there is no advantage to using or default here.
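
If you want to see the extra work, you can disassemble both spellings yourself (the exact opcodes differ between Python versions):

import dis

dis.dis(compile("d.get(key, default)", "", "eval"))
dis.dis(compile("d.get(key) or default", "", "eval"))
# in current CPython versions, the `or` spelling emits an extra
# conditional-jump opcode on top of the method call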

Why does dict.get(key) run slower than dict[key]

Python has to do more work for dict.get():

  • get is an attribute, so Python has to look this up, and then bind the descriptor found to the dictionary instance.
  • () is a call, so the current frame has to be pushed on the stack, a call has to be made, then the frame has to be popped again from the stack to continue.

The [...] notation, used with a dict, doesn't require a separate attribute step or frame push and pop.

You can see the difference when you use the Python bytecode disassembler dis:

>>> import dis
>>> dis.dis(compile('d[key]', '', 'eval'))
  1           0 LOAD_NAME                0 (d)
              3 LOAD_NAME                1 (key)
              6 BINARY_SUBSCR
              7 RETURN_VALUE
>>> dis.dis(compile('d.get(key)', '', 'eval'))
  1           0 LOAD_NAME                0 (d)
              3 LOAD_ATTR                1 (get)
              6 LOAD_NAME                2 (key)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE

so the d[key] expression only has to execute a BINARY_SUBSCR opcode, while d.get(key) adds a LOAD_ATTR opcode. CALL_FUNCTION is a lot more expensive than BINARY_SUBSCR on a built-in type (custom types with __getitem__ methods still end up doing a function call).

If the majority of your keys exist in the dictionary, you could use try...except KeyError to handle missing keys:

try:
    return mydict['name']
except KeyError:
    return None

Exception handling is cheap if there are no exceptions.
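
For completeness, a rough timeit comparison of the three approaches on a key that exists (the numbers are illustrative only and vary by interpreter version):

import timeit

setup = "d = {'name': 'value'}"
stmt_try = """\
try:
    d['name']
except KeyError:
    pass
"""
print(timeit.timeit("d['name']", setup=setup))
print(timeit.timeit("d.get('name')", setup=setup))
print(timeit.timeit(stmt_try, setup=setup))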

Why use dict.keys?

On Python 3, use dct.keys() to get a dictionary view object, which lets you do set operations on just the keys:

>>> for sharedkey in dct1.keys() & dct2.keys():  # intersection of two dictionaries
...     print(dct1[sharedkey], dct2[sharedkey])

In Python 2.7, you'd use dct.viewkeys() for that.

In Python 2, dct.keys() returns a list, a copy of the keys in the dictionary. This can be passed around as a separate object that can be manipulated in its own right, including removing elements without affecting the dictionary itself; however, you can create the same list with list(dct), which works in both Python 2 and 3.

You indeed don't want any of these for iteration or membership testing; always use for key in dct and key in dct for those, respectively.
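
Here is a slightly fuller illustration of the set-like behaviour of key views (the two dictionaries are invented for the example; sorted() is only used to make the output order deterministic):

>>> dct1 = {'a': 1, 'b': 2, 'c': 3}
>>> dct2 = {'b': 20, 'c': 30, 'd': 40}
>>> sorted(dct1.keys() & dct2.keys())   # keys present in both
['b', 'c']
>>> sorted(dct1.keys() - dct2.keys())   # keys only in dct1
['a']
>>> sorted(dct1.keys() | dct2.keys())   # keys in either
['a', 'b', 'c', 'd']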

Python dict.get() or None scenario

You don't need or None at all. dict.get returns None by default when it can't find the provided key in the dictionary.

It's a good idea to consult the documentation in these cases:

get(key[, default])

Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
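
A minimal check (the dictionary is a placeholder):

>>> d = {'present': 1}
>>> d.get('missing') is None
True
>>> d.get('missing', None) is None   # spelling out None as the default changes nothing
True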


