ConcurrentHashMap vs Synchronized HashMap

Synchronized HashMap

  1. Each method is synchronized on a single object-level lock, so every call on synchMap, including get and put, acquires that lock.

  2. Locking the entire collection is a performance overhead. While one thread holds on to the lock, no other thread can use the collection.
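As a minimal sketch of the points above (the class and method names are made up for illustration), the following fills a Collections.synchronizedMap wrapper from two threads. Each put() takes the wrapper's single lock, so the writers take turns. Iteration is not covered by that per-method lock, which is why the loop at the end holds the map's monitor explicitly, as the Collections.synchronizedMap Javadoc requires.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class SynchronizedMapDemo {

    // Fills a synchronized wrapper from two threads and returns how many keys it holds.
    static int fillFromTwoThreads(int perThread) {
        Map<Integer, Integer> synchMap = Collections.synchronizedMap(new HashMap<>());

        // Every put() acquires the same lock, so only one writer runs at a time.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < perThread; i++) synchMap.put(i, i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = perThread; i < 2 * perThread; i++) synchMap.put(i, i);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Traversing a view needs manual synchronization on the wrapper itself.
        int seen = 0;
        synchronized (synchMap) {
            for (Integer key : synchMap.keySet()) {
                seen++;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // The two threads write disjoint key ranges, so this prints 2000.
        System.out.println(fillFromTwoThreads(1_000));
    }
}
```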

ConcurrentHashMap was introduced in JDK 5.

  1. There is no locking at the object level; the locking is at a much finer granularity. For a ConcurrentHashMap, the locks may be at the hash bucket level.

  2. The effect of finer-grained locking is that readers and writers can proceed concurrently, which is not possible with synchronized collections. This leads to much better scalability.

  3. ConcurrentHashMap does not throw a ConcurrentModificationException if one thread tries to modify it while another is iterating over it.
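The iteration behaviour in point 3 can be sketched as follows. To keep the result deterministic, this example modifies the map from the same thread that is iterating; the fail-fast check in HashMap's iterator is the same one a second thread would trip. The class and method names are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class IterationDemo {

    // Adds a new key while iterating over the key set. Returns true if the
    // iterator reacted with a ConcurrentModificationException.
    static boolean failsWhenModifiedDuringIteration(Map<Integer, String> map) {
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");
        try {
            for (Integer key : map.keySet()) {
                // On the first pass this is a structural change (a new mapping).
                map.put(100, "added during iteration");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // HashMap iterators are fail-fast.
        System.out.println(failsWhenModifiedDuringIteration(new HashMap<>()));           // true
        // ConcurrentHashMap iterators are weakly consistent and never throw it.
        System.out.println(failsWhenModifiedDuringIteration(new ConcurrentHashMap<>())); // false
    }
}
```

Whether the ConcurrentHashMap iterator also visits the key added during the loop is not guaranteed; it only promises not to fail.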

The article "Java 7: HashMap vs ConcurrentHashMap" is a very good read and highly recommended.

What's the difference between ConcurrentHashMap and Collections.synchronizedMap(Map)?

For your needs, use ConcurrentHashMap. It allows concurrent modification of the Map from several threads without the need to block them. Collections.synchronizedMap(map) creates a blocking Map, which degrades performance but ensures consistency (if used properly).

Use the second option if you need to ensure data consistency, and each thread needs to have an up-to-date view of the map. Use the first if performance is critical, and each thread only inserts data to the map, with reads happening less frequently.

Is there any reason to use a synchronized HashMap rather than ConcurrentHashMap?

The question is what is important for you from the application point of view. If you want to have a string conversion of the map synchronized with the state after the put method invocation, your code is the only solution.

Suppose the state of the map before your synchronized block is this:

  • key1:value1
  • key2:value2

and you invoke your piece of code with

key = "key3" and value="value3"

the synchronized block ensures the string conversion will be

(key1,value1)(key2,value2)(key3,value3)

If you remove the synchronized block and change the map to some synchronized implementation, the only synchronized part will be put itself. So the timeline of method invocations can, in some cases, look like this:

  • Thread1 put(x,x)
  • Thread2 put(x,x)
  • Thread1 for loop for string conversion
  • Thread2 for loop for string conversion

So Thread1 converts an incorrect state, because Thread2 was fast enough to put a new entry before Thread1 started its string conversion.

Generally, a synchronized implementation of any collection is useful only when you want the individual methods of the collection to be atomic operations. If you need several methods to cooperate on the same state of the collection, you have to synchronize them from outside every time.
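A minimal sketch of that external synchronization, assuming a hypothetical class that owns the map and a private lock object. The put and the string conversion run under the same lock, so no other thread can insert an entry between them. A LinkedHashMap is used here only so that the output order is predictable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PutAndDescribe {
    private final Object lock = new Object();
    // LinkedHashMap keeps insertion order, which makes the string below predictable.
    private final Map<String, String> map = new LinkedHashMap<>();

    // put() and the string conversion form one atomic unit under the lock.
    String putAndDescribe(String key, String value) {
        synchronized (lock) {
            map.put(key, value);
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : map.entrySet()) {
                sb.append('(').append(e.getKey()).append(',').append(e.getValue()).append(')');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        PutAndDescribe demo = new PutAndDescribe();
        demo.putAndDescribe("key1", "value1");
        demo.putAndDescribe("key2", "value2");
        // prints (key1,value1)(key2,value2)(key3,value3)
        System.out.println(demo.putAndDescribe("key3", "value3"));
    }
}
```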

Synchronized HashMap vs ConcurrentHashMap write test


One

How many threads to use depends not on the CPU but on what you are doing. If, for example, what you are doing with your threads is highly disk intensive, your CPU isn't likely to be maxed out, so running 8 threads may just cause heavy thrashing. If, however, you have huge amounts of disk activity, followed by heavy computation, followed by more disk activity, you would benefit from staggering the threads, splitting out your activities and grouping them, as well as using more threads. In such a case you would likely want to group together file activity that uses a single file, but maybe not activity where you are pulling from a bunch of files (unless they are written contiguously on the disk).

Of course, if you overthink disk IO, you could seriously hurt your performance, but you shouldn't just ignore it, either. In such a program, I would probably have threads dedicated to disk IO and threads dedicated to CPU work. Divide and conquer. You'd have fewer IO threads and more CPU threads.

It is common for a synchronous server to run many more threads than cores/CPUs because most of those threads either do work for only a short time or don't do much CPU intensive work. It's not useful to have 500 threads, though, if you will only ever have 2 clients and the context switching of those excess threads hampers performance. It's a balancing act that often requires a little bit of tuning.

In short

  1. Think about what you are doing
    • Network activity is light, so more threads are generally good
    • CPU intensive things don't do much good if you have 2x more of those threads than cores... usually a little more than 1x or a little less than 1x is optimum, but you have to test, test, test
    • Having 10 disk IO intensive threads may hurt all 10 threads, just like having 30 CPU intensive threads... the thrashing hurts them all
  2. Try to spread out the pain
    • See if it helps to spread out the CPU, IO, etc, work or if clustering is better... it will depend on what you are doing
  3. Try to group things up
    • If you can, separate out your disk, IO, and network tasks and give them their own threads that are tuned to those tasks
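One way to sketch the "give each kind of work its own threads" advice is with two executors, one small pool for IO and one sized to the core count for computation. Everything here is an assumption for illustration: the pool sizes, the class name, and the load and compute methods, which only stand in for real disk reads and real CPU work. The right sizes have to be found by measuring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SplitPools {

    // Stand-in for a disk read; a real task would block on file IO here.
    static int[] load(int n) {
        return new int[] {n, n + 1, n + 2};
    }

    // Stand-in for the CPU-heavy stage.
    static long compute(int[] data) {
        long sum = 0;
        for (int v : data) sum += v;
        return sum;
    }

    // Runs `tasks` two-stage jobs and returns the sum of their results.
    static long run(int tasks) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Assumed sizing: few IO threads, about one CPU thread per core.
        ExecutorService ioPool = Executors.newFixedThreadPool(2);
        ExecutorService cpuPool = Executors.newFixedThreadPool(cores);
        try {
            List<CompletableFuture<Long>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int n = i;
                results.add(CompletableFuture
                        .supplyAsync(() -> load(n), ioPool)             // IO stage
                        .thenApplyAsync(SplitPools::compute, cpuPool)); // CPU stage
            }
            long total = 0;
            for (CompletableFuture<Long> f : results) total += f.join();
            return total;
        } finally {
            ioPool.shutdown();
            cpuPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // 165
    }
}
```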

Two

In general, thread-unsafe methods run faster. Similarly, localized synchronization runs faster than synchronizing the entire method. As such, HashMap is normally significantly faster than ConcurrentHashMap. Another example is StringBuffer compared to StringBuilder. StringBuffer is synchronized and is not only slower, but the synchronization is heavier (more code, etc.); it should rarely be used. StringBuilder, however, is unsafe if you have multiple threads hitting it. That said, code using StringBuffer and ConcurrentHashMap can race, too. Being "thread-safe" doesn't mean that you can use a class without thought, particularly given the way these two classes operate. Each individual call is safe, but a sequence of calls is not atomic: you can still have a race condition if you check and then act (say, using contains(Object) while another thread is doing a put or remove). If you want to prevent such things, you have to use your own class or synchronize your calls to your ConcurrentHashMap.

I generally use the non-concurrent maps and collections and just use my own locks where I need them. You'll find that it's much faster that way and the control is great. Atomics (e.g. AtomicInteger) are nice sometimes, but really not generally useful for what I do. Play with the classes, play with synchronization, and you'll find that you can master them more efficiently than with the shotgun approach of ConcurrentHashMap, StringBuffer, etc. You can have race conditions whether or not you use those classes if you don't do it right... but if you do it yourself, you can also be much more efficient and more careful.


Example

Note that we have a new Object that we are locking on. Use this instead of synchronized on a method.

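    import java.util.HashMap;
    import java.util.Map;

    public final class Fun<K, V> {
        // Private lock object: outside code cannot synchronize on it.
        private final Object lock = new Object();
        // Backing map, assumed here so that the sketch compiles.
        private final Map<K, V> delegate = new HashMap<>();

        /* Mirrors java.util.Map#clear() */
        public void clear() {
            // Doing things that do not touch shared state...
            synchronized (this.lock) {
                // Where we do sensitive work
                delegate.clear();
            }
        }

        /* Mirrors java.util.Map#put(java.lang.Object, java.lang.Object) */
        public V put(final K key, final V value) {
            // Doing things...
            final V previous;
            synchronized (this.lock) {
                // Where we do sensitive work
                previous = delegate.put(key, value);
            }
            // Doing things...
            return previous;
        }
    }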

And From Your Code...

I might not put that sb.append(index) in the lock or might have a separate lock for index calls, but...

    private final Object lock = new Object();

    // chars, random, index, and map are fields of the enclosing class from the question.
    private String getNextString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            char c = chars[random.nextInt(chars.length)];
            sb.append(c);
        }
        synchronized (lock) {
            sb.append(index);
            if (map.containsKey(sb.toString())) {
                System.out.println("duplicate:" + sb.toString());
            }
        }
        return sb.toString();
    }

    private int getNextInt() {
        synchronized (lock) {
            return index++;
        }
    }

Java synchronized block vs concurrentHashMap vs Collections.synchronizedMap

If you want all read and write actions on your HashMap to be synchronized, you need to synchronize every method that accesses the HashMap, using the same lock; it is not enough to guard just one method.

ConcurrentHashMap allows thread-safe access to your data without locking the whole map. That means you can add or remove values in one thread and at the same time get values out in another thread without running into an exception. See also the documentation of ConcurrentHashMap.

Does a ConcurrentHashMap need to be wrapped in a synchronized block?

No, you are losing the benefits of ConcurrentHashMap by doing that. You may as well be using a HashMap with synchronized or synchronizedMap() to lock the whole table (which is what you do when wrapping operations in synchronized, since the monitor implied is the entire object instance.)

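The purpose of ConcurrentHashMap is to increase the throughput of your concurrent code by allowing concurrent reads and writes on the table without locking the entire table. In the original implementation (Java 5 through 7), the table supports this internally by using lock striping (multiple locks instead of one, with each lock assigned to a set of hash buckets; see Java Concurrency in Practice by Goetz et al). Since Java 8 the implementation instead relies on compare-and-swap operations and locks on individual bins, but the goal is the same.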

Once you are using ConcurrentHashMap, all standard map methods (put(), remove(), etc.) become atomic by virtue of the lock striping etc. in the implementation. The only tradeoffs are that methods like size() and isEmpty() may not necessarily return accurate results, since the only way they could would be for all operations to lock the whole table.

The ConcurrentMap interface also adds new atomic compound operations like putIfAbsent() (put something only if the key is not already in the map), remove() accepting both key and value (remove an entry only if its value equals a parameter you pass), etc. With a standard Map implementation, these operations required locking the whole table because they needed two method calls to accomplish (e.g. putIfAbsent() would need calls to both containsKey() and put(), wrapped inside one synchronized block). Once again, you gain greater throughput using these methods, by avoiding locking the entire table.
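A short sketch of those compound operations in use. The SeatRegistry class, its method names, and the seat and user strings are invented for illustration; putIfAbsent(key, value) and remove(key, value) are the ConcurrentMap methods described above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SeatRegistry {
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    // Claims the seat only if nobody holds it. One atomic call replaces the
    // containsKey() + put() pair that would otherwise need a synchronized block.
    boolean reserve(String seat, String user) {
        return owners.putIfAbsent(seat, user) == null;
    }

    // Frees the seat only if `user` is still its owner.
    boolean release(String seat, String user) {
        return owners.remove(seat, user);
    }

    public static void main(String[] args) {
        SeatRegistry registry = new SeatRegistry();
        System.out.println(registry.reserve("A1", "ann")); // true
        System.out.println(registry.reserve("A1", "bob")); // false, already taken
        System.out.println(registry.release("A1", "bob")); // false, bob is not the owner
        System.out.println(registry.release("A1", "ann")); // true
    }
}
```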

Java concurrency using ConcurrentHashMap with synchronized block

Your threads don't share any internal state. They're working fine, but the output is interleaved.

For instance if you used a StringBuilder to do the I/O in one operation, you should see correct output.

    StringBuilder buff = new StringBuilder();
    for (int i : storage.keySet()) {
        buff.append("(" + i + "," + storage.get(i) + ") ");
    }
    System.out.println(buff);

There is no good reason for Server to be Runnable, or even to create any instances of it.

You do not share any of the maps. If you did, then you would also want to share a common lock, but this is not the usual way to use ConcurrentMap.
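For contrast, here is a sketch of the usual way to share a ConcurrentMap: all threads get the same map instance and rely on its atomic methods instead of a common lock. The class name and the "hits" key are hypothetical; merge() is a standard Map method that ConcurrentHashMap performs atomically.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SharedCounter {

    // Starts `threads` workers that all increment the same entry of one shared map.
    static int countFromThreads(int threads, int incrementsPerThread) {
        ConcurrentMap<String, Integer> shared = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // Atomic read-modify-write; no external lock needed.
                    shared.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countFromThreads(4, 1_000)); // 4000
    }
}
```

A plain get() followed by put() in the loop would lose updates under contention; merge() avoids that without locking the whole map.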

Java Concurrency: HashMap Vs ConcurrentHashMap when concurrent threads only remove elements

The javadoc of HashMap says:

Note that this implementation is not
synchronized.

If multiple threads access a hash map
concurrently, and at least one of the threads modifies the map
structurally, it must be synchronized externally. (A
structural modification is any operation that adds or deletes one
or more mappings; merely changing the value associated with a key
that an instance already contains is not a structural
modification.) This is typically accomplished by synchronizing on
some object that naturally encapsulates the map.

As mentioned above, deletion is a structural change and you must use synchronization.

Furthermore, the removeNode() method of HashMap (which is called by the remove() method) increments the modCount variable. Iterators compare modCount against the value they recorded when they were created and throw ConcurrentModificationException when the two differ. So you might get this exception if you remove elements without synchronization.

Therefore you should use a ConcurrentHashMap, or synchronize every access to the HashMap externally.
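As a rough sketch of the remove-only scenario (the class and method names are made up), two threads below race to remove the same keys from one ConcurrentHashMap. Because remove(key) is atomic, each key is handed to exactly one of the threads, and no external synchronization is needed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ConcurrentRemoveDemo {

    // Two threads try to remove every key; returns how many removals succeeded in total.
    static int removeAllFromTwoThreads(int size) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < size; i++) map.put(i, "v" + i);

        AtomicInteger removed = new AtomicInteger();
        Runnable remover = () -> {
            for (int i = 0; i < size; i++) {
                // remove() returns the old value only to the thread that won the race.
                if (map.remove(i) != null) removed.incrementAndGet();
            }
        };
        Thread t1 = new Thread(remover);
        Thread t2 = new Thread(remover);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (!map.isEmpty()) throw new IllegalStateException("map should be empty");
        return removed.get();
    }

    public static void main(String[] args) {
        // Every key is removed exactly once, so this prints 1000.
        System.out.println(removeAllFromTwoThreads(1_000));
    }
}
```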


