Is It Safe to Get Values from a Java.Util.Hashmap from Multiple Threads (No Modification)

Is it safe to get values from a java.util.HashMap from multiple threads (no modification)?

Your idiom is safe if and only if the reference to the HashMap is safely published. Rather than anything relating the internals of HashMap itself, safe publication deals with how the constructing thread makes the reference to the map visible to other threads.

Basically, the only possible race here is between the construction of the HashMap and any reading threads that may access it before it is fully constructed. Most of the discussion is about what happens to the state of the map object, but this is irrelevant since you never modify it - so the only interesting part is how the HashMap reference is published.

For example, imagine you publish the map like this:

class SomeClass {
public static HashMap<Object, Object> MAP;

public synchronized static setMap(HashMap<Object, Object> m) {
MAP = m;
}
}

... and at some point setMap() is called with a map, and other threads are using SomeClass.MAP to access the map, and check for null like this:

HashMap<Object,Object> map = SomeClass.MAP;
if (map != null) {
.. use the map
} else {
.. some default behavior
}

This is not safe even though it probably appears as though it is. The problem is that there is no happens-before relationship between the set of SomeObject.MAP and the subsequent read on another thread, so the reading thread is free to see a partially constructed map. This can pretty much do anything and even in practice it does things like put the reading thread into an infinite loop.

To safely publish the map, you need to establish a happens-before relationship between the writing of the reference to the HashMap (i.e., the publication) and the subsequent readers of that reference (i.e., the consumption). Conveniently, there are only a few easy-to-remember ways to accomplish that[1]:

  1. Exchange the reference through a properly locked field (JLS 17.4.5)
  2. Use static initializer to do the initializing stores (JLS 12.4)
  3. Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
  4. Initialize the value into a final field (JLS 17.5).

The ones most interesting for your scenario are (2), (3) and (4). In particular, (3) applies directly to the code I have above: if you transform the declaration of MAP to:

public static volatile HashMap<Object, Object> MAP;

then everything is kosher: readers who see a non-null value necessarily have a happens-before relationship with the store to MAP and hence see all the stores associated with the map initialization.

The other methods change the semantics of your method, since both (2) (using the static initalizer) and (4) (using final) imply that you cannot set MAP dynamically at runtime. If you don't need to do that, then just declare MAP as a static final HashMap<> and you are guaranteed safe publication.

In practice, the rules are simple for safe access to "never-modified objects":

If you are publishing an object which is not inherently immutable (as in all fields declared final) and:

  • You already can create the object that will be assigned at the moment of declarationa: just use a final field (including static final for static members).
  • You want to assign the object later, after the reference is already visible: use a volatile fieldb.

That's it!

In practice, it is very efficient. The use of a static final field, for example, allows the JVM to assume the value is unchanged for the life of the program and optimize it heavily. The use of a final member field allows most architectures to read the field in a way equivalent to a normal field read and doesn't inhibit further optimizationsc.

Finally, the use of volatile does have some impact: no hardware barrier is needed on many architectures (such as x86, specifically those that don't allow reads to pass reads), but some optimization and reordering may not occur at compile time - but this effect is generally small. In exchange, you actually get more than what you asked for - not only can you safely publish one HashMap, you can store as many more not-modified HashMaps as you want to the same reference and be assured that all readers will see a safely published map.

For more gory details, refer to Shipilev or this FAQ by Manson and Goetz.


[1] Directly quoting from shipilev.


a That sounds complicated, but what I mean is that you can assign the reference at construction time - either at the declaration point or in the constructor (member fields) or static initializer (static fields).

b Optionally, you can use a synchronized method to get/set, or an AtomicReference or something, but we're talking about the minimum work you can do.

c Some architectures with very weak memory models (I'm looking at you, Alpha) may require some type of read barrier before a final read - but these are very rare today.

Is Hashmap's containsKey method threadsafe if the map is initialized once, and is never modified again

It really depends on how/when your map is accessed.

Assuming the map is initialized once, and never modified again, then methods that don't modify the internal state like containsKey() should be safe.
In this case though, you should make sure your map really is immutable, and is published safely.

Now if in your particular case the state does change during the course of your program, then no, it is not safe.
From the documentation:

Note that this implementation is not synchronized.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

In this case, you should use ConcurrentHashMap, or synchronize externally.

Java hashmap readonly thread safety

Assuming that new MyLoader().load() returns a map that is completely initialized with all data and which is never modified after that, then it is safe for all threads to retrieve data from this map concurrently. The Javadoc for HashMap says: "If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally." Therefore, if no thread is modifying the map, then it doesn't have to be synchronized.

As a safety measure, your load() method should enforce immutability:

public Map<String, MyObject> load() {
Map<String, MyObject> mymap = new HashMap<>();
mymap.put(...);
...
return Collections.unmodifiableMap(mymap);
}

This way, you don't have to worry that some thread in some code you're unfamiliar with might inadvertently modify the map. It won't be able to.

Thread-safety of unique keys in a HashMap

The answer is simple: HashMap makes absolutely no thread-safety guarantees at all.

In fact it's explicitly documented that it's not thread-safe:

If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

So accessing one from multiple threads without any kind of synchronization is a recipe for disaster.

I have seen cases where each thread uses a different key cause issue (like iterations happening at the same time resulting in infinite loops).

Just think of re-hashing: when the threshold is reached, the internal bucket-array needs to be resized. That's a somewhat lengthy operation (compared to a single put). During that time all manner of weird things can happen if another thread tries to put as well (and maybe even triggers a second re-hashing!).

Additionally, there's no reliable way for you to proof that your specific use case is safe, since all tests you could run could just "accidentally" work. In other words: you can never depend on this working, even if you thin k you covered it with unit tests.

And since not everyone is convinced, you can easily test it yourself with this code:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashMapDemonstration {

public static void main(String[] args) throws InterruptedException {
int threadCount = 10;
int valuesPerThread = 1000;
Map<Integer, Integer> map = new HashMap<>();
List<Thread> threads = new ArrayList<>(threadCount);
for (int i = 0; i < threadCount; i++) {
Thread thread = new Thread(new MyUpdater(map, i*valuesPerThread, (i+1)*valuesPerThread - 1));
thread.start();
threads.add(thread);
}
for (Thread thread : threads) {
thread.join();
}
System.out.printf("%d threads with %d values per thread with a %s produced %d entries, should be %d%n",
threadCount, valuesPerThread, map.getClass().getName(), map.size(), threadCount * valuesPerThread);
}
}

class MyUpdater implements Runnable {
private final Map<Integer, Integer> map;
private final int startValue;
private final int endValue;

MyUpdater(Map<Integer, Integer> map, int startValue, int endValue) {
this.map = map;
this.startValue = startValue;
this.endValue = endValue;
System.out.printf("Creating updater for values %d to %d%n", startValue, endValue);
}

@Override
public void run() {
for (int i = startValue; i<= endValue; i++) {
map.put(i, i);
}
}
}

This is exactly the type of program OP mentioned: Each thread will only ever write to keys that no other thread ever touches. And still, the resulting Map will not contain all entries:

Creating updater for values 0 to 999
Creating updater for values 1000 to 1999
Creating updater for values 2000 to 2999
Creating updater for values 3000 to 3999
Creating updater for values 4000 to 4999
Creating updater for values 5000 to 5999
Creating updater for values 6000 to 6999
Creating updater for values 7000 to 7999
Creating updater for values 8000 to 8999
Creating updater for values 9000 to 9999
10 threads with 1000 values per thread with a java.util.HashMap produced 9968 entries, should be 10000

Note that the actual number of entries in the final Map will vary for each run. It even sometimes prints 10000 (because it's not thread-safe!).

Note that this failure mode (losing entries) is definitely not the only possible one: basically anything could happen.

Is a HashMap thread-safe for different keys?

In @dotsid's answer he says this:

If you change a HashMap in any way then your code is simply broken.

He is correct. A HashMap that is updated without synchronization will break even if the threads are using disjoint sets of keys. Here are just some1 of the things that can go wrong.

  • If one thread does a put, then another thread may see a stale value for the hashmap's size.

  • If one thread does a put with a key that is (currently) in the same hash bucket as the second thread's key, second thread's map entry might get lost, temporarily or permanently. It depends on how the hash chains (or whatever) are implemented.

  • When a thread does a put that triggers a rebuild of the table, another thread may see transient or stale versions of the hashtable array reference, its size, its contents or the hash chains. Chaos may ensue.

  • When a thread does a put for a key that collides with some key used by some other thread, and the latter thread does a put for its key, then the latter might see a stale copy of hash chain reference. Chaos may ensue.

  • When one thread probes the table with a key that collides with one of some other thread's keys, it may encounter that key on the chain. It will call equals on that key, and if the threads are not synchronized, the equals method may encounter stale state in that key.

And if you have two threads simultaneously doing put or remove requests, there are numerous opportunities for race conditions.

I can think of three solutions:

  1. Use a ConcurrentHashMap.
  2. Use a regular HashMap but synchronize on the outside; e.g. using primitive mutexes, Lock objects, etcetera. But beware that this could lead to a concurrency bottleneck due to lock contention.
  3. Use a different HashMap for each thread. If the threads really have a disjoint set of keys, then there should be no need (from an algorithmic perspective) for them to share a single Map. Indeed, if your algorithms involve the threads iterating the keys, values or entries of the map at some point, splitting the single map into multiple maps could give a significant speedup for that part of the processing.

1 - We cannot enumerate all of the possible things that could go wrong. For a start, we can't predict how all JVMs will handle the unspecified aspects of the JMM ... on all platforms. But you should NOT be relying on that kind of information anyway. All you need to know is that it is fundamentally wrong to use a HashMap like this. An application that does this is broken ... even if you haven't observed the symptoms of the brokenness yet.

HashMap resize() while single thread writing and multiple threads reading

No, this is not thread safe. As the javadoc of HashMap states:

If multiple threads access a hash map concurrently, and at least one
of the threads modifies the map structurally, it must be synchronized
externally. (A structural modification is any operation that adds or
deletes one or more mappings; merely changing the value associated
with a key that an instance already contains is not a structural
modification.)

The fact that one thread is changing the map while others are reading concurrently is by definition unsafe. You will need to use ConcurrentHashMap or Collections.synchronizedMap or another synchronization solution.

Why does the code hang with HashMap.put() from multiple threads?

Why does the code hang with HashMap.put() from multiple threads?

You cannot use a HashMap with multiple threads without external synchronization. You should switch to use ConcurrentHashMap.

You could also use the Collections.synchronizedMap(new HashMap<>()); but the ConcurrentHashMap should give better performance.

I was expecting ConcurrentModificationException, but all I got was hanging threads in the executor and I don't see where exactly is the problem.

You are seeing a hang because one of the threads has a corrupted version of the HashMap -- probably a looped linked list where two hash entries are linked to each other or something. If you do a thread dump you will see that it is spinning while traversing the HashMap entries.

The ConcurrentModificationException is only thrown if the HashMap class detects a modification. Typically this is in single-threaded usage when you (for example) remove an entry by calling map.remove(...) instead of iterator.remove() while iterating across the map.

Synchronization does two important things: mutex locking and memory synchronization. Each processor has its own memory cache and threads can easily see partially synchronized views of an object (your HashMap in this case) without proper memory synchronization.

The only thing which is quite frustrating is that sometimes it throws an exception, but most of the time it just hangs.

In multithreaded situations, there are tons of race conditions by definition since the threads are usually running in parallel on multiple processors. It's very hard to predict, given the parallel nature of the environment, on the type of failure.

HashMap: safe for multithreaded access for multiple readers and one single writer?

The internal elements are in an indeterminate state when a thread "modifies the map structurally" so reads could be affected. Thus the requirement to use some method external to the map to synchronize both reads and writes.

Perhaps the writers of the .Net library were more careful to keep their internal structure in a determinate state during updates.



Related Topics



Leave a reply



Submit