Hashmap Java 8 Implementation

HashMap Java 8 implementation

HashMap contains a certain number of buckets. It uses the key's hashCode to determine which bucket an entry should go into. For simplicity's sake, imagine it as a modulus.

If our hashcode is 123456 and we have 4 buckets, 123456 % 4 = 0 so the item goes in the first bucket, Bucket 1.
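In the real Java 8 source the bucket is not chosen with a literal modulus: the table length is always a power of two, so HashMap spreads the key's hash bits and masks them against the table size, which has the same effect but is cheaper. The following is only a rough sketch of that idea; the class and method names are made up for illustration, while the actual logic lives in HashMap.hash() and the indexing expression (n - 1) & hash.

public class BucketIndexDemo {

    // Simplified sketch of Java 8's bucket selection. Because the table length n
    // is always a power of two, (n - 1) & hash behaves like hash % n, and XORing
    // the high bits downwards lets them still influence the chosen bucket.
    static int bucketIndex(Object key, int n) {
        int h = key.hashCode();
        h ^= (h >>> 16);      // spread high bits into the low bits
        return (n - 1) & h;   // mask instead of a modulus
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(123456, 4));   // some index in 0..3
        System.out.println(bucketIndex("hello", 16)); // some index in 0..15
    }
}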

[Diagram: HashMap buckets]

If our hashCode function is good, it should provide an even distribution, so all the buckets will be used somewhat equally. When several entries land in the same bucket, the bucket uses a linked list to store them.

[Diagram: buckets storing entries in linked lists]
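Conceptually, each bucket holds a singly linked chain of entries. The real class is the package-private HashMap.Node; the stripped-down sketch below keeps the same fields but is otherwise illustrative.

// Simplified sketch of one bucket entry: the cached hash, the key/value pair,
// and a link to the next entry that landed in the same bucket.
class Node<K, V> {
    final int hash;     // cached hash of the key
    final K key;
    V value;
    Node<K, V> next;    // next entry in this bucket's chain, or null

    Node(int hash, K key, V value, Node<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}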

But you can't rely on people to implement good hash functions. People will often write poor hash functions which will result in a non-even distribution. It's also possible that we could just get unlucky with our inputs.

[Diagram: an unevenly distributed HashMap]

The less even this distribution is, the further we're moving from O(1) operations and the closer we're moving towards O(n) operations.

The implementation of HashMap tries to mitigate this by organising some buckets into trees rather than linked lists if the buckets become too large. This is what TREEIFY_THRESHOLD = 8 is for. If a bucket contains more than eight items, it should become a tree.

[Diagram: a bucket converted to a tree]

This tree is a red-black tree, presumably chosen because it offers useful worst-case guarantees. It is ordered first by hash code. If the hash codes are the same, it uses the compareTo method of Comparable, if the keys implement that interface, and otherwise the identity hash code.

If entries are removed from the map, the number of entries in the bucket might shrink so that this tree structure is no longer necessary. That's what UNTREEIFY_THRESHOLD = 6 is for. If the number of elements in a bucket drops to six or below, we might as well go back to using a linked list.

Finally, there is the MIN_TREEIFY_CAPACITY = 64.

When a hash map grows in size, it automatically resizes itself to have more buckets. If we have a small HashMap, the likelihood of us getting very full buckets is quite high, because we don't have that many different buckets to put stuff into. It's much better to have a bigger HashMap, with more buckets that are less full. This constant basically says not to start making buckets into trees if our HashMap is very small - it should resize to be larger first instead.
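For reference, all three thresholds are declared as plain constants in the Java 8 java.util.HashMap source; the comments here paraphrase their documentation.

static final int TREEIFY_THRESHOLD = 8;     // chain length above which a bucket is converted to a tree
static final int UNTREEIFY_THRESHOLD = 6;   // size at or below which a split bucket reverts to a list
static final int MIN_TREEIFY_CAPACITY = 64; // smallest table size at which buckets may be treeified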


To answer your question about the performance gain, these optimisations were added to improve the worst case. You would probably only see a noticeable performance improvement because of these optimisations if your hashCode function was not very good.

It is designed to protect against bad hashCode implementations and also provides basic protection against collision attacks, where a bad actor may attempt to slow down a system by deliberately selecting inputs which occupy the same buckets.

Java 8 Hashmap Internals

But when I look at the source code, the new node is added at the tail. Is that correct?

In older versions it was added to the head. However, that is one of the many things that changed in Java 8: new nodes are now appended to the tail of the bucket's list, which keeps the insertion order within a bucket, as the following example shows.

import java.util.HashSet;

class A {
    static class SameHash {
        final int n;

        SameHash(int n) {
            this.n = n;
        }

        @Override
        public int hashCode() {
            return 1; // every instance hashes to the same bucket
        }

        @Override
        public String toString() {
            return "SameHash{" +
                    "n=" + n +
                    '}';
        }
    }

    public static void main(String[] args) {
        HashSet<SameHash> set = new HashSet<>();
        for (int i = 1; i <= 4; i++)
            set.add(new SameHash(i));
        System.out.println(set);
    }
}

prints

[SameHash{n=1}, SameHash{n=2}, SameHash{n=3}, SameHash{n=4}]

NOTE: Keys can have different hashCodes but can end up in the same bucket.
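For example, the hash codes 1 and 17 are different, but in a table with the default 16 buckets they are masked to the same index, so two keys with those hash codes would share a bucket. A small illustrative check (the class name is made up; the bit-spreading step is omitted because it leaves such small values unchanged):

public class SameBucketDemo {
    public static void main(String[] args) {
        int n = 16;                        // default HashMap capacity
        System.out.println((n - 1) & 1);   // prints 1
        System.out.println((n - 1) & 17);  // prints 1 -> same bucket, different hashCodes
    }
}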

I didn't completely understand the variable MIN_TREEIFY_CAPACITY. Is it that after this count the entire map will be converted to a tree (from an array to a tree)?

No, the entire map is never converted. MIN_TREEIFY_CAPACITY = 64 refers to the size of the table (the array of buckets), not to an entry count: a bucket that has grown past TREEIFY_THRESHOLD is only converted to a tree if the table already has at least 64 buckets; otherwise the map is resized instead. Comparable keys make the resulting tree ordering more effective, but treeification does not require them.
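The check lives in HashMap.treeifyBin. The snippet below is a paraphrase of its decision, not the literal source: when a chain exceeds TREEIFY_THRESHOLD but the table is still smaller than MIN_TREEIFY_CAPACITY, the map resizes (which splits the long chain across more buckets) instead of building a tree.

// Paraphrased from java.util.HashMap.treeifyBin (Java 8)
final void treeifyBin(Node<K,V>[] tab, int hash) {
    if (tab == null || tab.length < MIN_TREEIFY_CAPACITY)
        resize();   // table too small: grow it rather than treeify the bucket
    else {
        // otherwise replace this bucket's chain of Nodes with TreeNodes
        // and build a red-black tree out of them
        ...
    }
}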

Which tree type is used in the Java 8 HashMap bucket?

Looking at the implementation, it looks like a binary tree. More specifically, the comment below suggests it's a red-black tree:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    ...
}

As for handling equal hash codes, this is addressed in the Javadoc:

Tree bins (i.e., bins whose elements are all TreeNodes) are
ordered primarily by hashCode, but in the case of ties, if two
elements are of the same "class C implements Comparable<C>",
type then their compareTo method is used for ordering.

The TreeNodes used in HashMap are said to be structured similarly to those of TreeMap.

You can see the implementation of the search for a TreeNode containing the required key here:

/**
 * Finds the node starting at root p with the given hash and key.
 * The kc argument caches comparableClassFor(key) upon first use
 * comparing keys.
 */
final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
    TreeNode<K,V> p = this;
    do {
        int ph, dir; K pk;
        TreeNode<K,V> pl = p.left, pr = p.right, q;
        if ((ph = p.hash) > h)
            p = pl;
        else if (ph < h)
            p = pr;
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        else if (pl == null)
            p = pr;
        else if (pr == null)
            p = pl;
        else if ((kc != null ||
                  (kc = comparableClassFor(k)) != null) &&
                 (dir = compareComparables(kc, k, pk)) != 0)
            p = (dir < 0) ? pl : pr;
        else if ((q = pr.find(h, k, kc)) != null)
            return q;
        else
            p = pl;
    } while (p != null);
    return null;
}

As you can see, this is similar to a standard binary-search-tree lookup. It first searches for a TreeNode having the same hashCode as the searched key (since a single bucket of a HashMap may contain entries with different hashCodes), and then proceeds until it finds a TreeNode whose key is equal to the required key. For entries with equal hash codes, the search uses compareTo if the class of the key implements Comparable; otherwise, a more exhaustive search of both subtrees is performed.
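To see why Comparable matters here: if every key collides on the same hash but the key type implements Comparable, the tree bin can order entries by compareTo, so lookups in that bucket stay logarithmic instead of falling back to the exhaustive subtree search. A small illustrative sketch (the class names BadHashKey and TreeBinDemo are made up):

import java.util.HashMap;

// Deliberately terrible hashCode, but Comparable, so a treeified bucket
// can still be searched by compareTo.
class BadHashKey implements Comparable<BadHashKey> {
    final int id;

    BadHashKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }   // every key collides

    @Override public boolean equals(Object o) {
        return o instanceof BadHashKey && ((BadHashKey) o).id == id;
    }

    @Override public int compareTo(BadHashKey other) {
        return Integer.compare(id, other.id);         // orders the tree bin
    }
}

public class TreeBinDemo {
    public static void main(String[] args) {
        HashMap<BadHashKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++)
            map.put(new BadHashKey(i), i);            // all entries share one bucket
        // Despite 10,000 colliding keys, this lookup walks a red-black tree,
        // not a 10,000-element list.
        System.out.println(map.get(new BadHashKey(9_999)));   // prints 9999
    }
}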

Is there a scenario where Java 7's HashMap implementation is preferred to Java 8's implementation?

Java 8 will use a balanced tree when the number of entries in a bucket grows above N, where N is chosen empirically (TREEIFY_THRESHOLD = 8), and will use a list once again when that number drops to K or below (UNTREEIFY_THRESHOLD = 6). I'd expect worse performance if the number of entries in a bucket fluctuates in a way that makes this treeifying/untreeifying happen often. That might happen with a specific hash function.

Also, I'm not sure whether the overhead of creating and querying the tree is worth the benefit for small N.
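A hedged sketch of the kind of access pattern that answer is worried about: keys that all collide are repeatedly added past TREEIFY_THRESHOLD and then removed again, so one bucket keeps converting between a list and a tree. The class names and loop sizes are made up, and the program only prints a rough timing, so it is a probe rather than a proper benchmark.

import java.util.HashMap;

// One bucket repeatedly grows past TREEIFY_THRESHOLD (8) and shrinks again,
// exercising the list <-> tree conversions discussed above.
class Coll implements Comparable<Coll> {
    final int id;
    Coll(int id) { this.id = id; }
    @Override public int hashCode() { return 0; }   // force every key into one bucket
    @Override public boolean equals(Object o) { return o instanceof Coll && ((Coll) o).id == id; }
    @Override public int compareTo(Coll other) { return Integer.compare(id, other.id); }
}

public class ThrashDemo {
    public static void main(String[] args) {
        HashMap<Coll, Integer> map = new HashMap<>(128);  // capacity above MIN_TREEIFY_CAPACITY
        long start = System.nanoTime();
        for (int round = 0; round < 100_000; round++) {
            for (int i = 0; i < 10; i++) map.put(new Coll(i), i);  // grow past the threshold
            for (int i = 0; i < 10; i++) map.remove(new Coll(i));  // shrink back down
        }
        System.out.printf("%.1f ms%n", (System.nanoTime() - start) / 1e6);
    }
}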


