Making a Python User-Defined Class Sortable, Hashable

Making a python user-defined class sortable, hashable

I almost posted this as a comment to the other answers but it's really an answer in and of itself.

To make your items sortable, they only need to implement __lt__. That's the only method used by the built in sort.

The other comparisons or functools.total_ordering are only needed if you actually want to use the comparison operators with your class.

To make your items hashable, you implement __hash__ as others noted. You should also implement __eq__ in a compatible way -- items that are equivalent should hash the same.

How to make an object properly hashable?

You also need to define __eq__() in a compatible way with __hash__() – otherwise, equality will be based on object identity.

On Python 2, it is recommended you also define __ne__ to make != consistent with ==. On Python 3, the default __ne__ implementation will delegate to __eq__ for you.

What is the default hash of user defined classes?

The relevant function appears to be:

Py_hash_t
_Py_HashPointer(void *p)
{
Py_hash_t x;
size_t y = (size_t)p;
/* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
excessive hash collisions for dicts and sets */
y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
x = (Py_hash_t)y;
if (x == -1)
x = -2;
return x;
}

(that code comes from here, and is then used to be the tp_hash slot in type here.) The comment there seems to give a reason for not using the pointer (which is the same thing as the id) directly. Indeed, the commit that introduced that change to the function is here, and states that the reason for the change is:

Issue #5186: Reduce hash collisions for objects with no hash
method by rotating the object pointer by 4 bits to the right.

which refers to this issue, which explains more why the change was made.

user-defined class: hash() and id() and doc

According to the documentation for __hash__()

User-defined classes have __cmp__() and __hash__() methods by default;
with them, all objects compare unequal (except with themselves) and
x.__hash__() returns a result derived from id(x).

It seems like the glossary is either outdated or inaccurate.


The documentation for python3's __hash__ is slightly different:

User-defined classes have __eq__() and __hash__() methods by default;
with them, all objects compare unequal (except with themselves) and
x.__hash__() returns an appropriate value such that x == y implies
both that x is y and hash(x) == hash(y).

So they even removed the fact that such value should depend on id.

What makes a user-defined class unhashable?

Simply setting the __hash__ method to that of the tuple class is not enough. You haven't actually told it how to hash any differently. tuples are hashable because they are immutable. If you really wanted to make you specific example work, it might be like this:

class X2(list):
def __hash__(self):
return hash(tuple(self))

In this case you are actually defining how to hash your custom list subclass. You just have to define exactly how it can generate a hash. You can hash on whatever you want, as opposed to using the tuple's hashing method:

def __hash__(self):
return hash("foobar"*len(self))


Related Topics



Leave a reply



Submit