Is a Hashmap Thread-Safe for Different Keys

Is a HashMap thread-safe for different keys?

In @dotsid's answer he says this:

If you change a HashMap in any way then your code is simply broken.

He is correct. A HashMap that is updated without synchronization will break even if the threads are using disjoint sets of keys. Here are just some1 of the things that can go wrong.

  • If one thread does a put, then another thread may see a stale value for the hashmap's size.

  • If one thread does a put with a key that is (currently) in the same hash bucket as the second thread's key, second thread's map entry might get lost, temporarily or permanently. It depends on how the hash chains (or whatever) are implemented.

  • When a thread does a put that triggers a rebuild of the table, another thread may see transient or stale versions of the hashtable array reference, its size, its contents or the hash chains. Chaos may ensue.

  • When a thread does a put for a key that collides with some key used by some other thread, and the latter thread does a put for its key, then the latter might see a stale copy of hash chain reference. Chaos may ensue.

  • When one thread probes the table with a key that collides with one of some other thread's keys, it may encounter that key on the chain. It will call equals on that key, and if the threads are not synchronized, the equals method may encounter stale state in that key.

And if you have two threads simultaneously doing put or remove requests, there are numerous opportunities for race conditions.

I can think of three solutions:

  1. Use a ConcurrentHashMap.
  2. Use a regular HashMap but synchronize on the outside; e.g. using primitive mutexes, Lock objects, etcetera. But beware that this could lead to a concurrency bottleneck due to lock contention.
  3. Use a different HashMap for each thread. If the threads really have a disjoint set of keys, then there should be no need (from an algorithmic perspective) for them to share a single Map. Indeed, if your algorithms involve the threads iterating the keys, values or entries of the map at some point, splitting the single map into multiple maps could give a significant speedup for that part of the processing.

1 - We cannot enumerate all of the possible things that could go wrong. For a start, we can't predict how all JVMs will handle the unspecified aspects of the JMM ... on all platforms. But you should NOT be relying on that kind of information anyway. All you need to know is that it is fundamentally wrong to use a HashMap like this. An application that does this is broken ... even if you haven't observed the symptoms of the brokenness yet.

Thread-safety of unique keys in a HashMap

The answer is simple: HashMap makes absolutely no thread-safety guarantees at all.

In fact it's explicitly documented that it's not thread-safe:

If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

So accessing one from multiple threads without any kind of synchronization is a recipe for disaster.

I have seen cases where each thread uses a different key cause issue (like iterations happening at the same time resulting in infinite loops).

Just think of re-hashing: when the threshold is reached, the internal bucket-array needs to be resized. That's a somewhat lengthy operation (compared to a single put). During that time all manner of weird things can happen if another thread tries to put as well (and maybe even triggers a second re-hashing!).

Additionally, there's no reliable way for you to proof that your specific use case is safe, since all tests you could run could just "accidentally" work. In other words: you can never depend on this working, even if you thin k you covered it with unit tests.

And since not everyone is convinced, you can easily test it yourself with this code:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashMapDemonstration {

public static void main(String[] args) throws InterruptedException {
int threadCount = 10;
int valuesPerThread = 1000;
Map<Integer, Integer> map = new HashMap<>();
List<Thread> threads = new ArrayList<>(threadCount);
for (int i = 0; i < threadCount; i++) {
Thread thread = new Thread(new MyUpdater(map, i*valuesPerThread, (i+1)*valuesPerThread - 1));
thread.start();
threads.add(thread);
}
for (Thread thread : threads) {
thread.join();
}
System.out.printf("%d threads with %d values per thread with a %s produced %d entries, should be %d%n",
threadCount, valuesPerThread, map.getClass().getName(), map.size(), threadCount * valuesPerThread);
}
}

class MyUpdater implements Runnable {
private final Map<Integer, Integer> map;
private final int startValue;
private final int endValue;

MyUpdater(Map<Integer, Integer> map, int startValue, int endValue) {
this.map = map;
this.startValue = startValue;
this.endValue = endValue;
System.out.printf("Creating updater for values %d to %d%n", startValue, endValue);
}

@Override
public void run() {
for (int i = startValue; i<= endValue; i++) {
map.put(i, i);
}
}
}

This is exactly the type of program OP mentioned: Each thread will only ever write to keys that no other thread ever touches. And still, the resulting Map will not contain all entries:

Creating updater for values 0 to 999
Creating updater for values 1000 to 1999
Creating updater for values 2000 to 2999
Creating updater for values 3000 to 3999
Creating updater for values 4000 to 4999
Creating updater for values 5000 to 5999
Creating updater for values 6000 to 6999
Creating updater for values 7000 to 7999
Creating updater for values 8000 to 8999
Creating updater for values 9000 to 9999
10 threads with 1000 values per thread with a java.util.HashMap produced 9968 entries, should be 10000

Note that the actual number of entries in the final Map will vary for each run. It even sometimes prints 10000 (because it's not thread-safe!).

Note that this failure mode (losing entries) is definitely not the only possible one: basically anything could happen.

Is it possible to use HashMapT in a multi-threaded app?

No, for several reasons. Some (but not all) would be:

  1. HashMap rehashes your hash, so you won't even know what the actual key hash is.
  2. You'd have to look in HashMap's implementation to even know how many buckets there are.
  3. Resizing would have to copy state from the old array, and that wouldn't be safely published across threads.
  4. Other state, like the map's current size, wouldn't be safely updated.

If you constructed your map with a particular jvm implementation in mind, and made sure that your map never resized, and knew you'd never care about that extra state, it's maybe possible, in the strictest of senses. But for any practical purposes, no.

Is this HashMap usage thread safe?

As it is, it is definitely not thread-safe. Each instance of HashmapUsers will use a different lock (this), which does nothing useful. You have to synchronise on the same object, such as the HashMap itself.

Change getAtype to:

private AType getAtype(boolean needsRefresh, Integer id) {
synchronized(theMap) {
if (needsRefresh) {
theMap.clear();
}

if (theMap.size() == 0) {
// populate the set
}

return theMap.get(id);
}
}

Edit:
Note that you can synchronize on any object, provided that all instances use the same object for synchronization. You could synchronize on HashmapUsers.class, which also allows for other objects to lock access to the map (though it is typically best practice to use a private lock).

Because of this, simply making your getAtype method static would work, since the implied lock would now be HashMapUsers.class instead of this. However, this exposes your lock, which may or may not be what you want.

Is setting a HashMap thread safe?

This is not thread safe. Even though there are no writes to the map itself after the point of publication (from the point of view of the thread doing the publication), and reference assignment is atomic, the new Map<> has not been safely published. It particular, there are writes to the Map during its construction - either in the constructor, or after, depending on how you add those elements, and those writes may or may not be seen by other threads, since even though they intuitively occur before the map is published to the other threads, this isn't formally the case according to the memory model.

For an object to be safely published, it must be communicated to the outside world using some mechanism that either establishes a happens-before relationship between the object construction, the reference publication and the reference read, or it must use a handful of narrower methods which are guaranteed to be safe for publishing:

  • Initializing an object reference from a static initializer.
  • Storing a reference to it into a final field.

Your idiom would be safe if you declared myMap volatile. More details on safe publication can be found in JCIP (highly recommended), or here, or in this longer answer on a similar topic.

Is Hashmap's containsKey method threadsafe if the map is initialized once, and is never modified again

It really depends on how/when your map is accessed.

Assuming the map is initialized once, and never modified again, then methods that don't modify the internal state like containsKey() should be safe.
In this case though, you should make sure your map really is immutable, and is published safely.

Now if in your particular case the state does change during the course of your program, then no, it is not safe.
From the documentation:

Note that this implementation is not synchronized.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

In this case, you should use ConcurrentHashMap, or synchronize externally.



Related Topics



Leave a reply



Submit