Making a python user-defined class sortable, hashable
I almost posted this as a comment to the other answers but it's really an answer in and of itself.
To make your items sortable, they only need to implement __lt__
. That's the only method used by the built in sort.
The other comparisons or functools.total_ordering
are only needed if you actually want to use the comparison operators with your class.
To make your items hashable, you implement __hash__
as others noted. You should also implement __eq__
in a compatible way -- items that are equivalent should hash the same.
How to make an object properly hashable?
You also need to define __eq__()
in a compatible way with __hash__()
– otherwise, equality will be based on object identity.
On Python 2, it is recommended you also define __ne__
to make !=
consistent with ==
. On Python 3, the default __ne__
implementation will delegate to __eq__
for you.
What is the default hash of user defined classes?
The relevant function appears to be:
Py_hash_t
_Py_HashPointer(void *p)
{
Py_hash_t x;
size_t y = (size_t)p;
/* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
excessive hash collisions for dicts and sets */
y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
x = (Py_hash_t)y;
if (x == -1)
x = -2;
return x;
}
(that code comes from here, and is then used to be the tp_hash
slot in type
here.) The comment there seems to give a reason for not using the pointer (which is the same thing as the id
) directly. Indeed, the commit that introduced that change to the function is here, and states that the reason for the change is:
Issue #5186: Reduce hash collisions for objects with no hash
method by rotating the object pointer by 4 bits to the right.
which refers to this issue, which explains more why the change was made.
user-defined class: hash() and id() and doc
According to the documentation for __hash__()
User-defined classes have
__cmp__()
and__hash__()
methods by default;
with them, all objects compare unequal (except with themselves) and
x.__hash__()
returns a result derived fromid(x)
.
It seems like the glossary is either outdated or inaccurate.
The documentation for python3's __hash__
is slightly different:
User-defined classes have
__eq__()
and__hash__()
methods by default;
with them, all objects compare unequal (except with themselves) and
x.__hash__()
returns an appropriate value such thatx == y
implies
both thatx is y
andhash(x) == hash(y)
.
So they even removed the fact that such value should depend on id
.
What makes a user-defined class unhashable?
Simply setting the __hash__
method to that of the tuple
class is not enough. You haven't actually told it how to hash any differently. tuples are hashable because they are immutable. If you really wanted to make you specific example work, it might be like this:
class X2(list):
def __hash__(self):
return hash(tuple(self))
In this case you are actually defining how to hash your custom list subclass. You just have to define exactly how it can generate a hash. You can hash on whatever you want, as opposed to using the tuple's hashing method:
def __hash__(self):
return hash("foobar"*len(self))
Related Topics
Python Pandas Group by Date Using Datetime Data
Multiprocessing - Pipe VS Queue
How to Straighten a Rotated Rectangle Area of an Image Using Opencv in Python
How to Use Hex() Without 0X in Python
Fitting a Normal Distribution to 1D Data
How to Get Exception Message in Python Properly
Python Create Unix Timestamp Five Minutes in the Future
Passing a Data Frame Column and External List to Udf Under Withcolumn
Default Filter in Django Admin
Handling Urllib2's Timeout? - Python
Reading the Target of a .Lnk File in Python
Ambiguity in Pandas Dataframe/Numpy Array "Axis" Definition
Pandas Style Function to Highlight Specific Columns
Argparse with Required Subparser
Python Multiple Inheritance Passing Arguments to Constructors Using Super
Printing Tuple with String Formatting in Python
How to Check If an Object Is a List or Tuple (But Not String)