What's a correct and good way to implement __hash__()?
An easy, correct way to implement __hash__()
is to use a key tuple. It won't be as fast as a specialized hash, but if you need that then you should probably implement the type in C.
Here's an example of using a key for hash and equality:
class A:
def __key(self):
return (self.attr_a, self.attr_b, self.attr_c)
def __hash__(self):
return hash(self.__key())
def __eq__(self, other):
if isinstance(other, A):
return self.__key() == other.__key()
return NotImplemented
Also, the documentation of __hash__
has more information, that may be valuable in some particular circumstances.
How to implement a good __hash__ function in python
__hash__
should return the same value for objects that are equal. It also shouldn't change over the lifetime of the object; generally you only implement it for immutable objects.
A trivial implementation would be to just return 0
. This is always correct, but performs badly.
Your solution, returning the hash of a tuple of properties, is good. But note that you don't need to list all properties that you compare in __eq__
in the tuple. If some property usually has the same value for inequal objects, just leave it out. Don't make the hash computation any more expensive than it needs to be.
Edit: I would recommend against using xor to mix hashes in general. When two different properties have the same value, they will have the same hash, and with xor these will cancel eachother out. Tuples use a more complex calculation to mix hashes, see tuplehash
in tupleobject.c
.
What is the default __hash__ in python?
What you can rely on: custom objects have a default hash()
that is based in some way on the identity of the object. i.e. any object using the default hash will have a constant value for that hash over its lifetime and different objects may or may not have a different hash value.
You cannot rely on any particular relationship between the value returned by id()
and the value returned by hash()
. In the standard C implementation of Python 2.6 and earlier they were the same, in Python 2.7-3.2 hash(x)==id(x)/16
.
Edit: originally I wrote that in releases 3.2.3 and later or 2.7.3 or later the hash value may be randomised and in Python 3.3 the relationship will always be randomised. In fact that randomisation at present only applies to hashing strings so in fact the divide by 16 relationship may continue to hold for now, but don't bank on it.
Hash collisions don't usually matter: in a dictionary lookup to find an object it must have the same hash and must also compare equal. Collisions only matter if you get a very high proportion of collisions such as in the denial of service attack that led to recent versions of Python being able to randomise the hash calculation.
Recommended way to implement __eq__ and __hash__
Answering my own question. It seems one way of performing this is to define an auxillary __members
function and to use that in defining __hash__
and __eq__
. This way, there is no duplication:
class MyClass(object):
def __init__(self, a, b):
self.a = a
self.b = b
def __members(self):
return (self.a, self.b)
def __eq__(self, other):
if type(other) is type(self):
return self.__members() == other.__members()
else:
return False
def __hash__(self):
return hash(self.__members())
Best implementation for hashCode method for a collection
The best implementation? That is a hard question because it depends on the usage pattern.
A for nearly all cases reasonable good implementation was proposed in Josh Bloch's Effective Java in Item 8 (second edition). The best thing is to look it up there because the author explains there why the approach is good.
A short version
Create a
int result
and assign a non-zero value.For every field
f
tested in theequals()
method, calculate a hash codec
by:- If the field f is a
boolean
:
calculate(f ? 0 : 1)
; - If the field f is a
byte
,char
,short
orint
: calculate(int)f
; - If the field f is a
long
: calculate(int)(f ^ (f >>> 32))
; - If the field f is a
float
: calculateFloat.floatToIntBits(f)
; - If the field f is a
double
: calculateDouble.doubleToLongBits(f)
and handle the return value like every long value; - If the field f is an object: Use the result of the
hashCode()
method or 0 iff == null
; - If the field f is an array: see every field as separate element and calculate the hash value in a recursive fashion and combine the values as described next.
- If the field f is a
Combine the hash value
c
withresult
:result = 37 * result + c
Return
result
This should result in a proper distribution of hash values for most use situations.
Is the hash of a GUID unique?
Not as reliably unique as the GUID itself, no.
Just to expand, you are reducing your uniqueness by a factor of 4, going from 16 bytes to 4 bytes of possible combinations.
As pointed out in the comments the hash size will make a difference. The 4 byte thing was an assumption, horrible at best I know, that it may be used in .NET, where the default hash size is 4 bytes (int). So you can replace what I said above with whatever byte size your hash may be.
Built in Python hash() function
Use hashlib as hash()
was designed to be used to:
quickly compare dictionary keys during a dictionary lookup
and therefore does not guarantee that it will be the same across Python implementations.
Related Topics
Error: Command 'Gcc' Failed with Exit Status 1 While Installing Eventlet
Combine Several Images Horizontally with Python
Python Date String to Date Object
Does Python Support Multithreading? Can It Speed Up Execution Time
Importerror: No Module Named Pil
Replace Values in List Using Python
Importing Variables from Another File
Python Setup.Py Develop VS Install
What Is the Purpose of Class Methods
Passing Functions with Arguments to Another Function in Python
How to Add Custom Methods/Attributes to Built-In Python Types
Is There a Built in Package to Parse HTML into Dom