What Is Hashcode Used For? Is It Unique

What is hashCode used for? Is it unique?

MSDN says:

A hash code is a numeric value that is used to identify an object
during equality testing. It can also serve as an index for an object
in a collection.

The GetHashCode method is suitable for use in hashing algorithms and
data structures such as a hash table.

The default implementation of the GetHashCode method does not
guarantee unique return values for different objects. Furthermore, the
.NET Framework does not guarantee the default implementation of the
GetHashCode method, and the value it returns will be the same between
different versions of the .NET Framework. Consequently, the default
implementation of this method must not be used as a unique object
identifier for hashing purposes.

The GetHashCode method can be overridden by a derived type. Value
types must override this method to provide a hash function that is
appropriate for that type and to provide a useful distribution in a
hash table. For uniqueness, the hash code must be based on the value
of an instance field or property instead of a static field or
property.

Objects used as a key in a Hashtable object must also override the
GetHashCode method because those objects must generate their own hash
code. If an object used as a key does not provide a useful
implementation of GetHashCode, you can specify a hash code provider
when the Hashtable object is constructed. Prior to the .NET Framework
version 2.0, the hash code provider was based on the
System.Collections.IHashCodeProvider interface. Starting with version
2.0, the hash code provider is based on the
System.Collections.IEqualityComparer interface.

Basically, hash codes exist to make hashtables possible.

Two equal objects are guaranteed to have equal hashcodes.

Two unequal objects are not guaranteed to have unequal hashcodes (that's called a collision).

Are hashCodes unique for Strings?

A HashMap does use equals() to compare keys. It only uses hashCode() to find the bucket where the key is located, and thus drastically reduce the number of keys to compare with equals().

Obviously, hashCode() can't produce unique values, since int is limited to 2^32 distinct values, and there are an infinity of possible String values.

In conclusion, the result of hashCode() is not a good fit for a key of a Map.

Uniqueness of hashcode for Java's HashSet Double and its subsets

No. Object.hashCode() method implementations (e.g. HashSet.hashCode()) are not guaranteed to return a unique value, only a pseudo random value which is good enough for purpose of hash data structures.

If you nee a unique value for HashSet<Double> based on the set content you should implement it yourself.

C# - String.GetHashCode() - don't use as unique identifier

You're right that equal hash codes don't guarantee equal values.

You're wrong in thinking that that quote means otherwise.

This quote is specifically in the context of implementing a hash code calculation for a Person class containing an SSN property. Equal SSN values mean equal Person values. Different SSN values mean different Person values. (Note: this is not necessarily true in reality.)

Now, you need a hash code calculation for Person which guarantees that two equal Person instances have the same hash code, and ideally which makes it likely that two unequal Person instances have a different hash code, although the latter can never be guaranteed. Since equality is defined in terms of the SSN, that means re-using the SSN's hash code achieves that already.

hashCode uniqueness

Given a reasonable collection of objects, having two with the same hash code is quite likely. In the best case it becomes the birthday problem, with a clash with tens of thousands of objects. In practice objects a created with a relatively small pool of likely hash codes, and clashes can easily happen with merely thousands of objects.

Using memory address is just a way of obtaining a slightly random number. The Sun JDK source has a switch to enable use of a Secure Random Number Generator or a constant. I believe IBM (used to?) use a fast random number generator, but it was not at all secure. The mention in the docs of memory address appears to be of a historical nature (around a decade ago it was not unusual to have object handles with fixed locations).

Here's some code I wrote a few years ago to demonstrate clashes:

class HashClash {
public static void main(String[] args) {
final Object obj = new Object();
final int target = obj.hashCode();
Object clash;
long ct = 0;
do {
clash = new Object();
++ct;
} while (clash.hashCode() != target && ct<10L*1000*1000*1000L);
if (clash.hashCode() == target) {
System.out.println(ct+": "+obj+" - "+clash);
} else {
System.out.println("No clashes found");
}
}
}

RFE to clarify docs, because this comes up way too frequently: CR 6321873

Why hashcode does not generate unique hashcode?

Because it can't.

Since there are only 2^32 different ints and there may be more than 2^32 live objects in any VM instance, it is technically impossible to guarantee a unique hash code for each object.

Even if the default hash code may be based on the internal address of the object, it is not identical to the internal address.

What is the purpose of hashcode method in java?

You must always override equals and hashCode in tandem, to satisfy their interdependent contracts. A class which implements them contradictorily is simply broken and unacceptable under even the minimum software engineering standards.

As to why one would ever use the hashtable data structure: because it is the fastest option around for random-access key-value storage.



Related Topics



Leave a reply



Submit