What Hashing Function Does Java Use to Implement Hashtable Class

What hashing function does Java use to implement Hashtable class?

When a key is added to or requested from a HashMap in OpenJDK, the flow of execution is the following:

  1. The key is transformed into a 32-bit value using the developer-defined hashCode() method.
  2. The 32-bit value is then transformed by a second hash function (of which Andrew's answer contains the source code) into an offset inside the hash table. This second hash function is provided by the implementation of HashMap and cannot be overridden by the developer.
  3. The corresponding entry of the hash table contains a reference to a linked list or null, if the key does not yet exist in the hash table. If there are collisions (several keys with the same offset), the keys together with their values are simply collected in a singly linked list.

If the hash table size was chosen appropriately high, the number of collisions will be limited. Thus, a single lookup takes only constant time on average. This is called expected constant time. However, if an attacker has control over the keys inserted into a hash table and knowledge of the hash algorithm in use, he can provoke a lot of hash collisions and therefore force linear lookup time. This is why some hash table implementations have been changed recently to include a random element that makes it harder for an attacker to predict which keys will cause collisions.
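As an illustration of how predictable collisions can be, here is a minimal sketch built on the fact that the strings "Aa" and "BB" have the same String.hashCode() value (2112). The class and method names are made up for this example. Concatenating n such blocks yields 2^n distinct strings that all share one hash code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class CollidingKeys {
    // Builds every string made of `blocks` pieces, each piece being "Aa" or "BB".
    // Because both pieces have equal length and equal hash code, all results
    // end up with the same String.hashCode().
    static List<String> collisions(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }
}
```

All of these keys land in the same bucket of a hash table that relies on hashCode() alone, which is the worst case described above.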

Some ASCII art

key.hashCode()
     |
     | 32-bit value
     |                              hash table
     V                            +------------+    +----------------------+
HashMap.hash() --+                | reference  | -> | key1 | value1 | null |
                 |                |------------|    +----------------------+
                 | modulo size    | null       |
                 | = offset       |------------|    +---------------------+
                 +--------------> | reference  | -> | key2 | value2 | ref |
                                  |------------|    +---------------------+
                                  |    ....    |                       |
                                                      +----------------+
                                                      V
                                                    +----------------------+
                                                    | key3 | value3 | null |
                                                    +----------------------+
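The flow described above can be sketched as a toy chained hash table. This is not the OpenJDK source: the class name ToyChainedMap, the fixed table size of 16, and the use of Math.floorMod as the second step are assumptions made to keep the example short, and null keys are not supported.

```java
// A toy separate-chaining map, for illustration only (not the JDK implementation).
class ToyChainedMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private int indexFor(K key) {
        int h = key.hashCode();                 // step 1: developer-defined hash
        return Math.floorMod(h, table.length);  // step 2: reduce to an offset (assumed)
    }

    V get(K key) {
        // step 3: walk the singly linked list stored at that offset
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // prepend to the chain
        return null;
    }
}
```

Two keys with the same offset simply end up in the same chain, and equals() decides which node is the right one.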

Hashing function in Hashtable vs. HashMap?

In particular, is there a difference between the hashing algorithm they use? What is the formula used to hash in these two classes?

The primary hash function used when you use an object as a hash table key is the object's hashCode() method. It is up to the key class to implement a decent hash function.

The Hashtable and HashMap classes take the key's hashcode value and convert it to an index in the primary hashtable array-of-chains. However, there are differences in how this happens between Hashtable and HashMap.

  • For Hashtable (Java 8) the code is this:

    hash = key.hashCode();
    index = (hash & 0x7FFFFFFF) % tab.length;

  • For HashMap (Java 8) the code is (effectively) this:

    // (I have restructured the code for ease of comparison.)
    int h;
    hash = (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    index = (tab.length - 1) & hash;

As you can see, HashMap scrambles the hash code value returned by the key's hashCode() method. This is explained in the source code as follows:

[This method] computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.
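To make the effect of the bit spreading concrete, here is a small sketch that wraps the two index computations shown earlier in static methods, plus a mask-only variant for comparison. The class and method names are invented for this example. The table length passed to the masking variants is assumed to be a power of two.

```java
// Illustration only: the formulas come from the snippets above,
// the surrounding class and method names are hypothetical.
class IndexComparison {
    // Hashtable style: clear the sign bit, then take the remainder.
    static int hashtableIndex(int hashCode, int tableLength) {
        return (hashCode & 0x7FFFFFFF) % tableLength;
    }

    // HashMap (Java 8) style: XOR the upper 16 bits into the lower 16, then mask.
    static int hashMapIndex(int hashCode, int tableLength) {
        int hash = hashCode ^ (hashCode >>> 16);
        return (tableLength - 1) & hash;
    }

    // Masking without any spreading, to show what the XOR step avoids.
    static int maskOnlyIndex(int hashCode, int tableLength) {
        return (tableLength - 1) & hashCode;
    }
}
```

With a table of 16 buckets, the hash codes 0x10000, 0x20000 and 0x30000 differ only in bits above the mask, so maskOnlyIndex maps all three to bucket 0, while hashMapIndex maps them to buckets 1, 2 and 3.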

Notes:

  1. The & versus % difference is because in Hashtable the hash array size is not a power of 2 (the default capacity is 11, and each resize grows it to 2n + 1), whereas in HashMap (Java 8) the size is always a power of 2.

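  2. In Java 8 HashMap, the implementation will turn a long hash chain into a balanced binary tree. The tree orders entries by hash code first, so for keys with identical hash codes it helps most if the key class implements Comparable.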

  3. HashMap handles null keys, but Hashtable doesn't.
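The null-key difference from note 3 is easy to observe. The following is a minimal sketch with a hypothetical helper method that reports whether a given map accepts a null key:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical helper, for illustration only.
class NullKeyDemo {
    // Returns true if the map stores a null key, false if it rejects it
    // with a NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap:   " + acceptsNullKey(new HashMap<>()));
        System.out.println("Hashtable: " + acceptsNullKey(new Hashtable<>()));
    }
}
```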


However, all of this extra complexity in HashMap only comes into play if your key class has a poorly designed / implemented hashCode() method ... or if someone is deliberately trying to engineer hash collisions.

In other words, if your key class is well designed, the differences should not matter.

What hashing function does Java use by default and can we override the default behavior?

Each object in Java has a public int hashCode() method that returns a hash. Each class is free to implement it in its own way by overriding that method. If the method is not overridden, the default Object#hashCode method is used.

You can have a look at the source code of various classes to see how it is implemented in the JDK. String's hashCode is one example (line 1494).
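For reference, the String Javadoc documents the hash as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int arithmetic. The sketch below re-implements that formula in a hypothetical helper class. It is not a copy of the JDK source, which may be organized differently depending on the version.

```java
// Illustration of the documented String hash formula (not the JDK source).
class StringHashSketch {
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Horner's rule; int overflow wraps around, as in the documented formula.
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

For any string s, StringHashSketch.hash(s) should equal s.hashCode().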

Some collections can add an additional layer of hashing on top of the objects' hashCode methods. For example, HashMap does that to improve performance when an object's hashCode is not well distributed.

Does Java have a built-in class that can be used to implement a hashtable?

I found some nice documentation about those classes:

https://www.tutorialspoint.com/java/java_hashtable_class.htm
https://www.tutorialspoint.com/java/util/java_util_hashtable.htm

It describes Hashtable as follows:

Hashtable was part of the original java.util and is a concrete implementation of a Dictionary.

However, Java 2 re-engineered Hashtable so that it also implements the Map interface. Thus, Hashtable is now integrated into the collections framework. It is similar to HashMap, but is synchronized.
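A short usage sketch, assuming a word-counting task chosen only for illustration: because Hashtable implements Map, it can be used through the Map interface like any other collections-framework map, while still being a Dictionary.

```java
import java.util.Dictionary;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical example class, for illustration only.
class HashtableDemo {
    // Counts occurrences of each word, using Hashtable through the Map interface.
    static Map<String, Integer> wordCounts(String... words) {
        Map<String, Integer> counts = new Hashtable<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCounts("a", "b", "a");
        System.out.println(counts.get("a"));
        // The same object is also a legacy Dictionary.
        System.out.println(counts instanceof Dictionary);
    }
}
```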

Hashing function used in Java Language

Java allows you to override the hashCode() method for your classes to use a hashing algorithm that is well suited not only to your application, but also to your individual types:

public class Employee {

    private int id;
    // A generic implementation might also use "name" as part of hashCode()
    private String name;

    @Override
    public int hashCode() {
        // We know that ID is always unique, so don't use name in calculating
        // the hash code.
        return id;
    }
}
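For such a class to work as a key in HashMap or Hashtable, equals() has to agree with hashCode(). The following is a sketch of what a complete key class might look like. The class name EmployeeKey, the constructor, and the choice to compare by id only are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class: equality and hash code are both based on id alone.
final class EmployeeKey {
    private final int id;
    private final String name;

    EmployeeKey(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof EmployeeKey)) {
            return false;
        }
        return id == ((EmployeeKey) o).id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }

    public static void main(String[] args) {
        Map<EmployeeKey, String> desks = new HashMap<>();
        desks.put(new EmployeeKey(7, "Ann"), "desk 3");
        // A different object with the same id finds the same entry.
        System.out.println(desks.get(new EmployeeKey(7, "Ann B.")));
    }
}
```

Without the equals() override, two Employee objects with the same id would hash to the same bucket but still be treated as different keys.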

How does Java implement hash tables?

HashMap and HashSet are very similar. In fact, the second contains an instance of the first.

A HashMap contains an array of buckets that holds its entries. The array size is always a power of 2. If you don't specify another value, there are initially 16 buckets.

When you put an entry (key and value) in it, the map chooses the bucket by computing an index from the key's hash code (the hash code is not the object's memory address, and the index is computed with a bit mask rather than a modulus). Different entries can collide in the same bucket, in which case they are put in a list.

Entries will be inserted until the map reaches its load factor. This factor is 0.75 by default, and changing it is not recommended unless you are very sure of what you're doing. A load factor of 0.75 means that a HashMap of 16 buckets can hold only 12 entries (16 * 0.75). When a 13th entry is added, a new array of buckets is created, double the size of the previous one, and all entries are placed again in the new array. This process is known as rehashing, and it can be expensive.
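The growth arithmetic can be sketched as a small simulation. This does not inspect a real HashMap (its capacity is not exposed through the public API). It only replays the doubling rule described above, with the default values of 16 buckets and a 0.75 load factor hard-coded as assumptions.

```java
// Simulation of the doubling rule only; hypothetical helper, not JDK code.
class ResizeSimulation {
    // Returns the number of buckets a default-constructed map would end up
    // with after `entries` insertions, under the rule described in the text.
    static int capacityAfter(int entries) {
        int capacity = 16;
        final double loadFactor = 0.75;
        while (entries > capacity * loadFactor) {
            capacity *= 2; // each doubling corresponds to one rehash
        }
        return capacity;
    }
}
```

For example, 12 entries still fit in 16 buckets, 13 entries need 32, and 100 entries need 256 (after four rehashes).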

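Therefore, a best practice, if you know how many entries will be inserted, is to construct the HashMap with an initial capacity large enough to hold them without rehashing. Note that the constructor argument is a number of buckets, not a number of entries, so it has to account for the load factor:

new HashMap<>((int) Math.ceil(finalSize / 0.75));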

Hash function for creating a generic hash table in Java (for learning purposes)

In my hashtable implementation I decided to use plain hashCode() % backingArraySize (i.e. your suggestion) when the algorithm isn't subject to primary clustering, and hashCode() * 2654435761 (the constant is taken from this answer) when it is, i.e. for the linear probing implementation. The reason is that many default hashCode() implementations don't distribute values well across the full int range (all numeric boxed types, String, List), and when the keys are somehow biased, linear probing may suffer from primary clustering.
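A rough sketch of the two index functions follows, with some assumptions spelled out. The class and method names are hypothetical. The text does not say how the multiplied value is reduced to an index, so Math.floorMod is used for both variants here. floorMod is also used instead of % because in Java % can return a negative result for a negative hashCode(). The constant 2654435761 does not fit in an int, so it is written as a long literal and narrowed.

```java
// Illustration only: one possible reading of the two approaches in the text.
class IndexFunctions {
    // 2654435761 as a 32-bit value; the cast wraps it to a negative int.
    private static final int MULTIPLIER = (int) 2654435761L;

    // Plain modulo of the hash code, kept non-negative with floorMod.
    static int moduloIndex(Object key, int size) {
        return Math.floorMod(key.hashCode(), size);
    }

    // Multiply first to scramble the bits, then reduce (reduction step assumed).
    static int scrambledIndex(Object key, int size) {
        return Math.floorMod(key.hashCode() * MULTIPLIER, size);
    }
}
```

Note that if the backing array size is a power of two, the low bits of the product depend only on the low bits of the hash code, so an implementation of that kind would more likely take the high bits of the product instead.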

Why Java doesn't use ArrayList class to implement Hashtable/HashMap class?

When a hashtable is resized, all of the entries need to be re-positioned.

Using an ArrayList would therefore be slower, since the ArrayList would copy over the now-useless old values before the hash table recalculates them all.


