What Does "Hashable" Mean in Python

What does hashable mean in Python?

From the Python glossary:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().

Hashable, immutable

Hashing is the process of converting some large amount of data into a much smaller amount (typically a single integer) in a repeatable way so that it can be looked up in a table in constant-time (O(1)), which is important for high-performance algorithms and data structures.

Immutability is the idea that an object will not change in some important way after it has been created, especially in any way that might change the hash value of that object.

The two ideas are related because objects which are used as hash keys must typically be immutable so their hash value doesn't change. If it was allowed to change then the location of that object in a data structure such as a hashtable would change and then the whole purpose of hashing for efficiency is defeated.

To really grasp the idea you should try to implement your own hashtable in a language like C/C++, or read the Java implementation of the HashMap class.

In Python, why is a tuple hashable but not a list?

Dicts and other objects use hashes to store and retrieve items really quickly. The mechanics of this all happens "under the covers" - you as the programmer don't need to do anything and Python handles it all internally. The basic idea is that when you create a dictionary with {key: value}, Python needs to be able to hash whatever you used for key so it can store and look up the value quickly.

Immutable objects, or objects that can't be altered, are hashable. They have a single unique value that never changes, so python can "hash" that value and use it to look up dictionary values efficiently. Objects that fall into this category include strings, tuples, integers and so on. You may think, "But I can change a string! I just go mystr = mystr + 'foo'," but in fact what this does is create a new string instance and assigns it to mystr. It doesn't modify the existing instance. Immutable objects never change, so you can always be sure that when you generate a hash for an immutable object, looking up the object by its hash will always return the same object you started with, and not a modified version.

You can try this for yourself: hash("mystring"), hash(('foo', 'bar')), hash(1)

Mutable objects, or objects that can be modified, aren't hashable. A list can be modified in-place: mylist.append('bar') or mylist.pop(0). You can't safely hash a mutable object because you can't guarantee that the object hasn't changed since you last saw it. You'll find that list, set, and other mutable types don't have a __hash__() method. Because of this, you can't use mutable objects as dictionary keys:

>>> hash([1,2,3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Eric Duminil's answer provides a great example of the unexpected behaviour that arises from using mutable objects as dictionary keys

Why and how are Python functions hashable?

It's nothing special. As you can see if you examine the unbound __hash__ method of the function type:

>>> def f(): pass
...
>>> type(f).__hash__
<slot wrapper '__hash__' of 'object' objects>

the of 'object' objects part means it just inherits the default identity-based __hash__ from object. Function == and hash work by identity. The difference between id and hash is normal for any type that inherits object.__hash__:

>>> x = object()
>>> id(x)
40145072L
>>> hash(x)
2509067

You might think __hash__ is only supposed to be defined for immutable objects, and you'd be almost right, but that's missing a key detail. __hash__ should only be defined for objects where everything involved in == comparisons is immutable. For objects whose == is based on identity, it's completely standard to base hash on identity as well, since even if the objects are mutable, they can't possibly be mutable in a way that would change their identity. Files, modules, and other mutable objects with identity-based == all behave this way.

What does Tuple[Hashable] mean in Python?

A tuple is only hashable if the values in the tuple are hashable themselves.

>>> hash((1,2))
-3550055125485641917
>>> hash(([1],2))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

How to judge an object is hashable or unhashable in python?

Your function is correct. Here are some general guidelines:

  • Basic built-in types that are not containers are hashable.
  • Immutable containers (like frozenset and tuple) are hashable if (and only if) all the items they contain are also hashable.
  • User-defined classes are usually hashable.
  • Mutable built-in containers are unhashable.

See also the Python glossary.

What the difference between hash-able and immutable?

  1. Immutable means that the object, the top-level container for the item, cannot be changed. Note that this applies only to the top level; it may contain references to sub-objects that are mutable.

  2. Hashable has a functional definition: Python's built-in hash function returns a value. This generally means that the object's closure (following all references to their leaf-node values) consists of immutable objects.

  3. Your premise is incorrect: a tuple can contain mutable items. Any such reference renders the tuple un-hashable.

For instance:

>>> b = [7]
>>> a = (b, 5)
>>> a
([7], 5)
>>> type(a)
<class 'tuple'>
>>> set(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> b[0] = 'b'
>>> a
(['b'], 5)

b is a mutable list. The reference to b can be held in the tuple a. We can change the value of b[0] (not it's object handle), and the display value of a changes. However, we cannot make a set including a, because the mutability of b renders a unhashable.

Continuing the example:

>>> b = False
>>> a
(['b'], 5)
>>> b = [14]
>>> a
(['b'], 5)

a is immutable. Thus, when we change b, only b gets the reference to the new object. a retains the original object handle, still pointing to ['b'].

Asking is hashable about a Python value

Since Python 2.6 you can use the abstract base class collections.Hashable:

>>> import collections
>>> isinstance({}, collections.Hashable)
False
>>> isinstance(0, collections.Hashable)
True

This approach is also mentioned briefly in the documentation for __hash__.

Doing so means that not only will instances of the class raise an appropriate TypeError when a program attempts to retrieve their hash value, but they will also be correctly identified as unhashable when checking isinstance(obj, collections.Hashable) (unlike classes which define their own __hash__() to explicitly raise TypeError).

Why aren't Python sets hashable?

Generally, only immutable objects are hashable in Python. The immutable variant of set() -- frozenset() -- is hashable.



Related Topics



Leave a reply



Submit