Best implementation for hashCode method for a collection
The best implementation? That is a hard question because it depends on the usage pattern.
A for nearly all cases reasonable good implementation was proposed in Josh Bloch's Effective Java in Item 8 (second edition). The best thing is to look it up there because the author explains there why the approach is good.
A short version
Create a
int result
and assign a non-zero value.For every field
f
tested in theequals()
method, calculate a hash codec
by:- If the field f is a
boolean
:
calculate(f ? 0 : 1)
; - If the field f is a
byte
,char
,short
orint
: calculate(int)f
; - If the field f is a
long
: calculate(int)(f ^ (f >>> 32))
; - If the field f is a
float
: calculateFloat.floatToIntBits(f)
; - If the field f is a
double
: calculateDouble.doubleToLongBits(f)
and handle the return value like every long value; - If the field f is an object: Use the result of the
hashCode()
method or 0 iff == null
; - If the field f is an array: see every field as separate element and calculate the hash value in a recursive fashion and combine the values as described next.
- If the field f is a
Combine the hash value
c
withresult
:result = 37 * result + c
Return
result
This should result in a proper distribution of hash values for most use situations.
Good hashCode() Implementation
- The value is not important, it can be whatever you want. Prime numbers will result in a better distribution of the
hashCode
values therefore they are preferred. - You do not necessary have to add them, you are free to implement whatever algorithm you want, as long as it fulfills the
hashCode
contract:
- Whenever it is invoked on the same object more than once during an execution of a Java application, the
hashCode
method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.- If two objects are equal according to the
equals(Object)
method, then calling thehashCode
method on each of the two objects must produce the same integer result.- It is not required that if two objects are unequal according to the
equals(java.lang.Object)
method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
There are some algorithms which can be considered as not good hashCode
implementations, simple adding of the attributes values being one of them. The reason for that is, if you have a class which has two fields, Integer
a, Integer
b and your hashCode()
just sums up these values then the distribution of the hashCode
values is highly depended on the values your instances store. For example, if most of the values of a are between 0-10 and b are between 0-10 then the hashCode
values are be between 0-20. This implies that if you store the instance of this class in e.g. HashMap
numerous instances will be stored in the same bucket (because numerous instances with different a and b values but with the same sum will be put inside the same bucket). This will have bad impact on the performance of the operations on the map, because when doing a lookup all the elements from the bucket will be compared using equals()
.
Regarding the algorithm, it looks fine, it is very similar to the one that Eclipse generates, but it is using a different prime number, 31 not 37:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + (int) (a ^ (a >>> 32));
result = prime * result + (int) (b ^ (b >>> 32));
result = prime * result + (int) (c ^ (c >>> 32));
return result;
}
hashCode() method
Suppose you have a class with two String fields, and that its hashcode is calculated by summing the hashcodes of those two fields. Further suppose you have an equals that simply checks whether the class fields values are equal.
class Test {
String a;
String b;
public Test
@Override
public int hashCode() {
return a.hashCode() + b.hashCode();
}
@Override
public boolean equals(Object o) { // simplified
Test other = (Test)o;
return a.equals(other.a) && b.equals(other.b);
}
}
Let's see if non-equal instances can have the same hashcode
Test t1 = new Test("hello", "world");
Test t2 = new Test("world", "hello");
System.out.println(t1.equals(t2)); // false
System.out.println(t1.hashCode() == t2.hashCode()); // true
Are we still respecting hashCode
's contract?
Whenever it is invoked on the same object more than once during an execution of a Java application, the
hashCode
method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
Well, yes, since it only depends on a
and b
and we're using their hashCode
method which we can assume respects the contract.
If two objects are equal according to the
equals(Object)
method, then calling thehashCode
method on each of the two objects must produce the same integer result.
It does
Test t1 = new Test("hello", "world");
Test t2 = new Test("hello", "world");
System.out.println(t1.equals(t2)); // true
System.out.println(t1.hashCode() == t2.hashCode()); // true
It is not required that if two objects are unequal according to the
equals(java.lang.Object)
method, then calling thehashCode
method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
That's what we were trying to demonstrate in the first place. It's not a requirement.
Default implementation for hashCode() and equals() for record vs class in Java
In a nutshell the difference is simple:
- the default implementation of
equals()
andhashCode()
forjava.lang.Object
will never consider two objects asequal
unless they are the same object (i.e. it's "object identity", i.e.x == y
). - the default implementation of
equals()
andhashCode()
for records will consider all components (or fields) and compare them for equality (or consider them for the hash code). If they all match then.equals()
will returntrue
andhashCode
will return the same values.
The details documented details for java.lang.Object
are:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (The hashCode may or may not be implemented as some function of an object's memory address at some point in time.)
In practice this means that any object which doesn't override hashCode
anywhere in its type hierarchy will return what is called the "identity hash code" which is effectively an arbitrary but constant number.
And for java.lang.Record
it says:
The implicitly provided implementation returns true if and only if the argument is an instance of the same record class as this record, and each component of this record is equal to the corresponding component of the argument; otherwise, false is returned. Equality of a component c is determined as follows:
- If the component is of a reference type, the component is considered equal if and only if Objects.equals(this.c, r.c) would return true.
- If the component is of a primitive type, using the corresponding primitive wrapper class PW (the corresponding wrapper class for int is java.lang.Integer, and so on), the component is considered equal if and only if PW.compare(this.c, r.c) would return 0.
Apart from the semantics described above, the precise algorithm used in the implicitly provided implementation is unspecified and is subject to change. The implementation may or may not use calls to the particular methods listed, and may or may not perform comparisons in the order of component declaration.
(Those are for the respective hashCode
methods, the equals
methods have similar language).
For more discussion, see JEP 395: Records.
Efficient hashCode() implementation
Multiplying by 31 is fast because the JIT can convert it to a shift left by 5 bits and a subtract:
x * 31 == (x << 5) - x
Without any particular extra information, I'd stick to this approach. It's reasonably fast and likely to end up with reasonably well-distributed hash codes, and it's also easy to get right :)
The size of the dataset doesn't really matter, but if you have particular extra information about the values you'll be work with (e.g. "it's always even") then you may be able to design a better hash function. I'd wait until it's an actual problem first though :)
Why should I have to call/initialize the hashCode() method for each member of a class?
The hashCode
method is used by collections like HashMap
and HashSet
to distribute your object's instances in a way that they will be fast to retrieve (with a time complexity of O(1)
). The better is your hashCode
method implemented, the more efficient would be those collections when used with your object's instances. A well implemented hashCode
method would limit as much as possible the probability to get hash collisions, in other words, it would limit the possibility to get the same hash code for 2 different object's instances.
Knowing that, if we remove the line #4 from your hashCode
's implementation, we would then have all the object's instances with the same age
and marks
that would have the same resulting hash code whatever the value of name
so the probability to get hash collisions is high which is not wanted as explained above.
So since hashCode
returns a 32
bits integer and name
is obviously a String
(at least not a primitive type), we call hashCode()
on it to get an int
representation of the String
in order to add somehow its value to the resulting hash code. Thanks to that, 2 object's instances with the same age
and marks
won't have necessarily the same hash code, the risk of getting hash collisions is limited which is what we expect.
hashCode implementation
You haven't overriden equals(Object)
, so they don't compare as equal.
Just because two objects have the same hash code doesn't mean HashMap
assumes they're the same -- indeed, if that was the case, that would be really, really bad.
If you want two A
objects to be treated as equal by HashMap
, you must override equals(Object)
in A
to define one A
as equal to another.
Related Topics
How to Measure Time Elapsed in Java
Initialization of an Arraylist in One Line
How to Format the Day of the Month to Say "11Th", "21St" or "23Rd" (Ordinal Indicator)
What Is the Equivalent of the C++ Pair≪L,R≫ in Java
How to Run a Java Program from the Command Line on Windows
Add Leading Zeroes to Number in Java
How to Generate Exceptions from Repaintmanager
How to Programmatically Determine Operating System in Java
Making a Robust, Resizable Swing Chess Gui
Difference Between Final and Effectively Final
What Are the Effects of Exceptions on Performance in Java
How to Convert Java.Util.Date to Java.Sql.Date
Get Os-Level System Information