How to Ensure hashCode() Is Consistent with equals()

How to ensure hashCode() is consistent with equals()?

The contract doesn't say that the hash code of an object has to be unique, only that two equal objects must return the same hash code. It's entirely legal for two non-equal objects to return the same hash code. However, the more evenly hash codes are distributed over a set of objects, the better performance you'll get out of HashMaps and other structures that use hashCode().

IDEs such as IntelliJ IDEA have built-in generators for equals and hashCode that generally do a pretty good job of coming up with "good enough" code for most objects (and probably better than some hand-crafted, overly clever hash functions).

For example, here's a hashCode function that IDEA generates for your People class:

    @Override
    public int hashCode() {
        int result = name != null ? name.hashCode() : 0;
        result = 31 * result + age;
        return result;
    }
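For that hashCode() to be consistent, equals() has to compare the same fields. A minimal sketch, assuming a class with a String name and an int age (the class name Person and the PersonDemo wrapper are made up for illustration and are not from the original question):

```java
import java.util.Objects;

class PersonDemo {
    // Hypothetical class; the fields are inferred from the generated hashCode().
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            // Compare exactly the fields that hashCode() uses.
            return age == other.age && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            int result = name != null ? name.hashCode() : 0;
            result = 31 * result + age;
            return result;
        }
    }

    public static void main(String[] args) {
        Person a = new Person("Ann", 30);
        Person b = new Person("Ann", 30);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Adding a field to one method but not the other is the usual way the two drift apart, so it helps to keep them next to each other in the class.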

In Java, why must equals() and hashCode() be consistent?

Sure:

public class Test {
    private final int m, n;

    public Test(int m, int n) {
        this.m = m;
        this.n = n;
    }

    // Deliberately inconsistent with equals(): it depends on n, which equals() ignores.
    @Override
    public int hashCode() { return n * m; }

    @Override
    public boolean equals(Object ob) {
        if (ob == null || ob.getClass() != Test.class) return false;
        Test other = (Test) ob;
        return m == other.m;
    }
}

with:

Set<Test> set = new HashSet<>();
set.add(new Test(3, 4));
boolean b = set.contains(new Test(3, 10)); // false

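Going by equals(), that should be true because m == 3 in both cases, but the two objects produce different hash codes, so the set never gets as far as calling equals().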
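For comparison, here is a sketch of a consistent variant in which hashCode() uses only the field that equals() compares. The class is renamed Key (a made-up name) to keep it apart from the Test class above, and ConsistentKeyDemo is only a wrapper for the demo:

```java
import java.util.HashSet;
import java.util.Set;

class ConsistentKeyDemo {
    // Hypothetical variant of the Test class above.
    static final class Key {
        private final int m, n;

        Key(int m, int n) {
            this.m = m;
            this.n = n;
        }

        @Override
        public boolean equals(Object ob) {
            if (ob == null || ob.getClass() != Key.class) return false;
            return m == ((Key) ob).m;
        }

        // Uses only m, the same field equals() compares; n is ignored in both methods.
        @Override
        public int hashCode() {
            return Integer.hashCode(m);
        }
    }

    public static void main(String[] args) {
        Set<Key> set = new HashSet<>();
        set.add(new Key(3, 4));
        System.out.println(set.contains(new Key(3, 10))); // true
    }
}
```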

In general a HashMap works like this: it has a variable number of what are commonly called "buckets". The number of buckets can change over time (it grows as entries are added) but it is always a power of 2.

Let's say a given HashMap has 16 buckets. When you call put() to add an entry, the hashCode() of the key is calculated and then masked according to the number of buckets. If you (bitwise) AND the hashCode() with 15 (0x0F) you get the last 4 bits, which is a number between 0 and 15 inclusive:

int factor = 4;
int buckets = 1 << factor; // 16
int mask = buckets - 1; // 15
int code = key.hashCode();
int dest = code & mask; // a number from 0 to 15 inclusive
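The masking step can be tried out in isolation. This is only a sketch of the idea: bucketIndex and BucketIndexDemo are illustrative names, and OpenJDK's HashMap additionally mixes the upper bits of the hash code in before masking, which is left out here.

```java
class BucketIndexDemo {
    // Bucket index for a hash code, assuming the bucket count is a power of two.
    static int bucketIndex(int hashCode, int buckets) {
        return hashCode & (buckets - 1);
    }

    public static void main(String[] args) {
        int buckets = 1 << 4; // 16
        System.out.println(bucketIndex(42, buckets)); // 10
        System.out.println(bucketIndex(-1, buckets)); // 15
        System.out.println(bucketIndex(16, buckets)); // 0
    }
}
```

Because the mask keeps only the low bits, negative hash codes still map to a valid index, which a plain remainder (%) would not guarantee.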

Now if there is already an entry in that bucket you have what's called a collision. There are multiple ways of dealing with this, but the one used by HashMap (and probably the most common overall) is bucketing: all the entries with the same masked hashCode are put in a list of some kind.

So to find if a given key is in the map already:

  1. Calculate the masked hash code;
  2. Find the appropriate bucket;
  3. If it's empty, key not found;
  4. If it isn't empty, loop through all entries in the bucket checking equals().
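The four steps above can be sketched as a tiny bucketed set. TinyHashSet is a made-up teaching class, not how java.util.HashMap is actually written, and it assumes non-null elements and a power-of-two bucket count:

```java
import java.util.ArrayList;
import java.util.List;

class TinyHashSetDemo {
    // Minimal illustration of bucketed lookup.
    static final class TinyHashSet<E> {
        private final List<List<E>> buckets = new ArrayList<>();
        private final int mask;

        TinyHashSet(int powerOfTwoSize) {
            for (int i = 0; i < powerOfTwoSize; i++) {
                buckets.add(new ArrayList<>());
            }
            mask = powerOfTwoSize - 1;
        }

        // Steps 1 and 2: mask the hash code and pick the bucket.
        private List<E> bucketFor(Object key) {
            return buckets.get(key.hashCode() & mask);
        }

        boolean add(E element) {
            if (contains(element)) return false;
            bucketFor(element).add(element);
            return true;
        }

        boolean contains(Object key) {
            // Step 3: an empty bucket means the loop body never runs.
            for (E candidate : bucketFor(key)) {
                // Step 4: only equals() decides whether this is the key.
                if (candidate.equals(key)) return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<>(16);
        set.add("foo");
        System.out.println(set.contains("foo")); // true
        System.out.println(set.contains("bar")); // false
    }
}
```

Note that equals() is only ever called on entries in the one bucket selected by the hash code, which is why an equal object with a different hash code is never found.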

Looking through a bucket is a linear (O(n)) operation but it's on a small subset. The hashcode bucket determination is essentially constant (O(1)). If buckets are sufficiently small then access to a HashMap is usually described as "near O(1)".

You can make a couple of observations about this.

Firstly, if you have a bunch of objects that all return 42 as their hash code a HashMap will still work but it will operate as an expensive list. Access will be O(n) (as everything will be in the same bucket regardless of the number of buckets). I've actually been asked this in an interview.
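A small sketch of that first observation, using a hypothetical Id key whose hashCode() always returns 42. The map still gives correct answers; every key simply lands in the same bucket:

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    // Hypothetical key class with a legal but degenerate hashCode().
    static final class Id {
        private final int value;

        Id(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).value == value;
        }

        @Override
        public int hashCode() {
            return 42; // every instance collides with every other
        }
    }

    static Map<Id, String> build(int count) {
        Map<Id, String> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new Id(i), "v" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Id, String> map = build(1000);
        System.out.println(map.size());           // 1000
        System.out.println(map.get(new Id(500))); // v500
    }
}
```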

Secondly, returning to your original point, if two objects are equal (meaning a.equals(b) == b.equals(a) == true) but have different hash codes then the HashMap will (probably) go looking in the wrong bucket, so lookups can miss entries that are actually present and the behaviour becomes unpredictable.

What are ways to keep hashCode/equals consistent with the business definition of the class?

A potential answer seems to be offered in this question.

I haven't looked into Project Lombok much, but I immediately thought, hmm annotations would work with a code generator.

Inconsistent hashcode and equals java

Your equals itself breaks the contract even before you get to hashCode because it isn't transitive.

This also immediately leads to the only consistent hashCode implementation being to return a constant, because for any two points there is a (very long) chain of intermediate points so that

  1. every two neighbors are equal, therefore

  2. every two neighbors must have the same hashCode, therefore

  3. beginning and end must have the same hashCode.

Now, this is a consistent implementation, but quite obviously a useless one.
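The equals() from the question is not shown here, so the following is only a guess at the kind of relation involved: a tolerance-based comparison, reduced to integers, with an arbitrary tolerance of 1. The name near is made up. It shows how a chain of "equal" neighbours connects two values that are not equal to each other:

```java
class NearEqualsDemo {
    // Hypothetical tolerance-based comparison; the tolerance of 1 is an arbitrary choice.
    static boolean near(int a, int b) {
        return Math.abs(a - b) <= 1;
    }

    public static void main(String[] args) {
        System.out.println(near(0, 1)); // true
        System.out.println(near(1, 2)); // true
        System.out.println(near(0, 2)); // false, so the relation is not transitive
    }
}
```

If near were used as equals(), then 0 and 1 would need the same hash code, as would 1 and 2, and so on along the chain, which forces every value to share one hash code.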

How to ensure hashcode() does not resolve to same value in Java?

Typically, hash codes don't guarantee uniqueness. HashMap implementations typically deal with collisions by storing a list behind the scenes, but they include a check that ensures that you don't get everything in the list as a match, just the ones that really match.

In other words, if you do map.get("foo") and there are collisions, the hash map will check each result (unhashed) to see if it really matches "foo". Then it returns only exact matches.
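A quick way to see this is with the strings "Aa" and "BB", which happen to have the same hash code under String.hashCode(). The CollisionDemo wrapper is just for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    public static void main(String[] args) {
        // Both strings hash to 2112, so they collide in any HashMap.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");

        // equals() tells the colliding keys apart, so each lookup returns its own value.
        System.out.println(map.get("Aa")); // first
        System.out.println(map.get("BB")); // second
    }
}
```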

Note also that while the contract for hashcodes states that any two objects that respond true to equals() should have the same hashcode, the opposite is not necessarily true.

Hashcode and equals

Equality is determined only by the equals() method. The hashCode() method is used in other situations, such as by Map or Set implementations. It is somewhat like a pre-condition or hint checked before actually calling equals() (for efficiency). So it is assumed that if 2 objects are equal (that is, equals() returns true), then their hashCode() methods must return the same value.

So in your code, 2 objects are equal as long as your overridden equals() returns true, no matter what hashCode() does. hashCode() is not called at all when comparing for equality.

This question has more in-depth information regarding the relationship between equals() and hashCode().

The hashCode() method for the following case: two sets are equal when they have at least one element in common?

Before considering hashCode(), I think you should reconsider your design of equals() first. I don't think you can implement equals() and fulfill the required contract, especially the part on transitivity.

From the Javadoc of Object#equals():

The equals method implements an equivalence relation on non-null object references:

  • It is reflexive: for any non-null reference value x, x.equals(x) should return true.

  • It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.

  • It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.

  • It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.

  • For any non-null reference value x, x.equals(null) should return false.

Why? I can construct three objects of your MyTuple:

  • A = {x1, x2}
  • B = {x2, x3}
  • C = {x3, x4}

where x1, x2, x3, x4 are all distinct. Now I have

  • A.equals(B) returns true
  • B.equals(C) returns true
  • C.equals(A) returns false

And they violate the contract on transitivity.
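The three objects above can be checked directly. The original MyTuple is not shown, so this reconstruction (two String elements, equal when at least one element is shared) is an assumption, as is the MyTupleDemo wrapper:

```java
class MyTupleDemo {
    // Hypothetical reconstruction of MyTuple; fields and equality rule are assumed.
    static final class MyTuple {
        private final String first, second;

        MyTuple(String first, String second) {
            this.first = first;
            this.second = second;
        }

        private boolean contains(String s) {
            return first.equals(s) || second.equals(s);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof MyTuple)) return false;
            MyTuple other = (MyTuple) o;
            // "Equal" when the two tuples share at least one element.
            return other.contains(first) || other.contains(second);
        }

        // A constant is the only hash code that never contradicts this equals().
        @Override
        public int hashCode() {
            return 0;
        }
    }

    public static void main(String[] args) {
        MyTuple a = new MyTuple("x1", "x2");
        MyTuple b = new MyTuple("x2", "x3");
        MyTuple c = new MyTuple("x3", "x4");
        System.out.println(a.equals(b)); // true
        System.out.println(b.equals(c)); // true
        System.out.println(c.equals(a)); // false
    }
}
```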

I think you should consider using another relationship of your own (perhaps partialEquals()) so you don't have to obey the contract. But then you also cannot use that method like equals() and expect MyTuple to work in, for example, HashMap, HashSet, etc.


