Best Implementation For Hashcode Method For a Collection

Best implementation for hashCode method for a collection

The best implementation? That is a hard question because it depends on the usage pattern.

A for nearly all cases reasonable good implementation was proposed in Josh Bloch's Effective Java in Item 8 (second edition). The best thing is to look it up there because the author explains there why the approach is good.

A short version

  1. Create a int result and assign a non-zero value.

  2. For every field f tested in the equals() method, calculate a hash code c by:

    • If the field f is a boolean:
      calculate (f ? 0 : 1);
    • If the field f is a byte, char, short or int: calculate (int)f;
    • If the field f is a long: calculate (int)(f ^ (f >>> 32));
    • If the field f is a float: calculate Float.floatToIntBits(f);
    • If the field f is a double: calculate Double.doubleToLongBits(f) and handle the return value like every long value;
    • If the field f is an object: Use the result of the hashCode() method or 0 if f == null;
    • If the field f is an array: see every field as separate element and calculate the hash value in a recursive fashion and combine the values as described next.
  3. Combine the hash value c with result:

    result = 37 * result + c
  4. Return result

This should result in a proper distribution of hash values for most use situations.

Good hashCode() Implementation

  1. The value is not important, it can be whatever you want. Prime numbers will result in a better distribution of the hashCode values therefore they are preferred.
  2. You do not necessary have to add them, you are free to implement whatever algorithm you want, as long as it fulfills the hashCode contract:
  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

There are some algorithms which can be considered as not good hashCode implementations, simple adding of the attributes values being one of them. The reason for that is, if you have a class which has two fields, Integer a, Integer b and your hashCode() just sums up these values then the distribution of the hashCode values is highly depended on the values your instances store. For example, if most of the values of a are between 0-10 and b are between 0-10 then the hashCode values are be between 0-20. This implies that if you store the instance of this class in e.g. HashMap numerous instances will be stored in the same bucket (because numerous instances with different a and b values but with the same sum will be put inside the same bucket). This will have bad impact on the performance of the operations on the map, because when doing a lookup all the elements from the bucket will be compared using equals().

Regarding the algorithm, it looks fine, it is very similar to the one that Eclipse generates, but it is using a different prime number, 31 not 37:

@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + (int) (a ^ (a >>> 32));
result = prime * result + (int) (b ^ (b >>> 32));
result = prime * result + (int) (c ^ (c >>> 32));
return result;
}

hashCode() method

Suppose you have a class with two String fields, and that its hashcode is calculated by summing the hashcodes of those two fields. Further suppose you have an equals that simply checks whether the class fields values are equal.

class Test {
String a;
String b;

public Test

@Override
public int hashCode() {
return a.hashCode() + b.hashCode();
}


@Override
public boolean equals(Object o) { // simplified
Test other = (Test)o;
return a.equals(other.a) && b.equals(other.b);
}
}

Let's see if non-equal instances can have the same hashcode

Test t1 = new Test("hello", "world");
Test t2 = new Test("world", "hello");

System.out.println(t1.equals(t2)); // false
System.out.println(t1.hashCode() == t2.hashCode()); // true

Are we still respecting hashCode's contract?

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

Well, yes, since it only depends on a and b and we're using their hashCode method which we can assume respects the contract.

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

It does

Test t1 = new Test("hello", "world");
Test t2 = new Test("hello", "world");

System.out.println(t1.equals(t2)); // true
System.out.println(t1.hashCode() == t2.hashCode()); // true

It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

That's what we were trying to demonstrate in the first place. It's not a requirement.

Default implementation for hashCode() and equals() for record vs class in Java

In a nutshell the difference is simple:

  • the default implementation of equals() and hashCode() for java.lang.Object will never consider two objects as equal unless they are the same object (i.e. it's "object identity", i.e. x == y).
  • the default implementation of equals() and hashCode() for records will consider all components (or fields) and compare them for equality (or consider them for the hash code). If they all match then .equals() will return true and hashCode will return the same values.

The details documented details for java.lang.Object are:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (The hashCode may or may not be implemented as some function of an object's memory address at some point in time.)

In practice this means that any object which doesn't override hashCode anywhere in its type hierarchy will return what is called the "identity hash code" which is effectively an arbitrary but constant number.

And for java.lang.Record it says:

The implicitly provided implementation returns true if and only if the argument is an instance of the same record class as this record, and each component of this record is equal to the corresponding component of the argument; otherwise, false is returned. Equality of a component c is determined as follows:

  • If the component is of a reference type, the component is considered equal if and only if Objects.equals(this.c, r.c) would return true.
  • If the component is of a primitive type, using the corresponding primitive wrapper class PW (the corresponding wrapper class for int is java.lang.Integer, and so on), the component is considered equal if and only if PW.compare(this.c, r.c) would return 0.

Apart from the semantics described above, the precise algorithm used in the implicitly provided implementation is unspecified and is subject to change. The implementation may or may not use calls to the particular methods listed, and may or may not perform comparisons in the order of component declaration.

(Those are for the respective hashCode methods, the equals methods have similar language).

For more discussion, see JEP 395: Records.

Efficient hashCode() implementation

Multiplying by 31 is fast because the JIT can convert it to a shift left by 5 bits and a subtract:

x * 31 == (x << 5) - x

Without any particular extra information, I'd stick to this approach. It's reasonably fast and likely to end up with reasonably well-distributed hash codes, and it's also easy to get right :)

The size of the dataset doesn't really matter, but if you have particular extra information about the values you'll be work with (e.g. "it's always even") then you may be able to design a better hash function. I'd wait until it's an actual problem first though :)

Why should I have to call/initialize the hashCode() method for each member of a class?

The hashCode method is used by collections like HashMap and HashSet to distribute your object's instances in a way that they will be fast to retrieve (with a time complexity of O(1)). The better is your hashCode method implemented, the more efficient would be those collections when used with your object's instances. A well implemented hashCode method would limit as much as possible the probability to get hash collisions, in other words, it would limit the possibility to get the same hash code for 2 different object's instances.

Knowing that, if we remove the line #4 from your hashCode's implementation, we would then have all the object's instances with the same age and marks that would have the same resulting hash code whatever the value of name so the probability to get hash collisions is high which is not wanted as explained above.

So since hashCode returns a 32 bits integer and name is obviously a String (at least not a primitive type), we call hashCode() on it to get an int representation of the String in order to add somehow its value to the resulting hash code. Thanks to that, 2 object's instances with the same age and marks won't have necessarily the same hash code, the risk of getting hash collisions is limited which is what we expect.

hashCode implementation

You haven't overriden equals(Object), so they don't compare as equal.

Just because two objects have the same hash code doesn't mean HashMap assumes they're the same -- indeed, if that was the case, that would be really, really bad.

If you want two A objects to be treated as equal by HashMap, you must override equals(Object) in A to define one A as equal to another.



Related Topics



Leave a reply



Submit