Performance ConcurrentHashMap vs HashMap

I was really surprised to find this topic to be so old and yet no one has provided any benchmarks for it. Using ScalaMeter, I created tests of add, get and remove for both HashMap and ConcurrentHashMap in two scenarios:

  1. using single thread
  2. using as many threads as I have cores available. Note that because HashMap is not thread-safe, I created a separate HashMap for each thread, but used a single, shared ConcurrentHashMap.

The code is available on my repo.
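For readers who don't want to dig through the repo, here is a rough Java sketch of the multi-threaded add scenario (the real tests use ScalaMeter; the class name, thread count and sizes below are illustrative only):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AddBenchmarkSketch {
    static final int ELEMENTS_PER_THREAD = 1_000_000;
    static final int THREADS = Runtime.getRuntime().availableProcessors();

    public static void main(String[] args) throws InterruptedException {
        // Scenario A: all threads write into one shared ConcurrentHashMap.
        Map<Integer, Integer> shared = new ConcurrentHashMap<>();
        System.out.println("shared ConcurrentHashMap: " + runAdd(true, shared) + " ms");

        // Scenario B: each thread writes to its own private HashMap
        // (HashMap is not thread-safe, so it must not be shared).
        System.out.println("private HashMaps:         " + runAdd(false, null) + " ms");
    }

    static long runAdd(boolean useShared, Map<Integer, Integer> shared) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        long start = System.nanoTime();
        for (int t = 0; t < THREADS; t++) {
            final int base = t * ELEMENTS_PER_THREAD;
            pool.submit(() -> {
                Map<Integer, Integer> map = useShared ? shared : new HashMap<Integer, Integer>();
                for (int i = 0; i < ELEMENTS_PER_THREAD; i++) {
                    map.put(base + i, i); // disjoint key ranges per thread
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }
}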

The results are as follows:

  • The X axis (size) shows the number of elements written to the map(s)
  • The Y axis (value) shows time in milliseconds

(Charts: Add method, Get method, Remove method)

The summary

  • If you want to operate on your data as fast as possible, use all the threads available. That seems obvious: each thread has 1/nth of the full work to do.

  • If you use single-threaded access, use HashMap; it is simply faster. For the add method it is as much as 3x more efficient. Only get is faster on ConcurrentHashMap, and not by much.

  • Operating on a ConcurrentHashMap with many threads is about as effective as operating on separate HashMaps, one per thread. So there is no need to partition your data into separate structures.

To sum up, ConcurrentHashMap performs worse when you use it with a single thread, but adding more threads to do the work will definitely speed up the process.

Testing platform

AMD FX6100, 16GB Ram

Xubuntu 16.04, Oracle JDK 8 update 91, Scala 2.11.8

Is Java's ConcurrentHashMap better than HashMap performance-wise?

Doug Lea is extremely good at these things, so I wouldn't be surprised if at one time his ConcurrentHashMap performed better than Joshua Bloch's HashMap. However, as of Java 7, the first @author of HashMap is Doug Lea too. Obviously, there is now no reason HashMap would be any slower than its concurrent cousin.

Out of curiosity, I did some benchmarking anyway. I ran it under Java 7. The more entries there are, the closer the performance gets; eventually ConcurrentHashMap is within 3% of HashMap, which is quite remarkable.

The bottleneck is really memory access; as the saying goes, "memory is the new disk (and disk is the new tape)". If the entries are in the cache, both will be fast; if the entries don't fit in the cache, both will be slow. In real applications, a map doesn't have to be big to compete with others for residing in the cache. If a map is used often, it's cached; if not, it's not, and that is the real determining factor, not the implementations (given both are implemented by the same expert).

import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

public static void main(String[] args)
{
    for (int i = 0; i < 100; i++)
    {
        System.out.println();

        int entries = i * 100 * 1000;
        // FakeMap measures the loop and Random overhead, which is subtracted out below
        long t0 = test(entries, new FakeMap());
        long t1 = test(entries, new HashMap());
        long t2 = test(entries, new ConcurrentHashMap());

        // difference between ConcurrentHashMap and HashMap, as a percentage of
        // HashMap's time with the fixed overhead removed
        long diff = (t2 - t1) * 100 / (t1 - t0);
        System.out.printf("entries=%,d time diff= %d%% %n", entries, diff);
    }
}

static long test(int ENTRIES, Map map)
{
    long SEED = 0;
    Random random = new Random(SEED);

    int RW_RATIO = 10;

    long t0 = System.nanoTime();

    // write phase: insert ENTRIES random key/value pairs
    for (int i = 0; i < ENTRIES; i++)
        map.put(random.nextInt(), random.nextInt());

    // read phase: re-seed so the same keys are looked up, repeated RW_RATIO times
    for (int i = 0; i < RW_RATIO; i++)
    {
        random.setSeed(SEED);
        for (int j = 0; j < ENTRIES; j++)
        {
            map.get(random.nextInt());
            random.nextInt(); // skip the value generated during the write phase
        }
    }
    long t = System.nanoTime() - t0;
    System.out.printf("%,d ns %s %n", t, map.getClass());
    return t;
}

static class FakeMap implements Map
{
    public Object get(Object key)
    {
        return null;
    }
    public Object put(Object key, Object value)
    {
        return null;
    }
    // etc. etc.
}

What's the difference between ConcurrentHashMap and Collections.synchronizedMap(Map)?

For your needs, use ConcurrentHashMap. It allows concurrent modification of the Map from several threads without the need to block them. Collections.synchronizedMap(map) creates a blocking Map, which will degrade performance, albeit ensure consistency (if used properly).

Use the second option (Collections.synchronizedMap) if you need to ensure data consistency and each thread needs an up-to-date view of the map. Use the first (ConcurrentHashMap) if performance is critical and each thread only inserts data into the map, with reads happening less frequently.
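A minimal sketch of the two choices (the class and variable names here are mine, for illustration only):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MapChoices {
    public static void main(String[] args) {
        // Every method call locks the whole wrapper, so threads take turns.
        Map<String, Integer> syncMap =
                Collections.synchronizedMap(new HashMap<>());

        // Fine-grained internal locking; readers and writers can proceed concurrently.
        Map<String, Integer> concurrentMap = new ConcurrentHashMap<>();

        syncMap.put("a", 1);
        concurrentMap.put("a", 1);

        // Iterating a synchronized map still requires manual locking on the wrapper:
        synchronized (syncMap) {
            for (Map.Entry<String, Integer> e : syncMap.entrySet()) {
                System.out.println(e);
            }
        }

        // ConcurrentHashMap can be iterated without external locking
        // (its iterators are weakly consistent).
        for (Map.Entry<String, Integer> e : concurrentMap.entrySet()) {
            System.out.println(e);
        }
    }
}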

Java Concurrency: HashMap Vs ConcurrentHashMap when concurrent threads only remove elements

The javadoc of HashMap says:

Note that this implementation is not synchronized.

If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.

As mentioned above, deletion is a structural change and you must use synchronization.

Furthermore, in the removeNode() method of HashMap (which is called by the remove() method), the modCount field is incremented; modCount is what triggers ConcurrentModificationException. So you might get this exception if you remove elements without synchronization.

Therefore you must use a ConcurrentHashMap.
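A minimal sketch of the safe variant (the class name, thread count and sizes are mine, chosen only for illustration): several threads remove entries from a shared ConcurrentHashMap with no external locking.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentRemoval {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(i, "value-" + i);
        }

        // Several threads remove disjoint key ranges; remove() is a structural
        // modification, but ConcurrentHashMap handles it without external locking.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int offset = t;
            pool.submit(() -> {
                for (int key = offset; key < 10_000; key += 4) {
                    map.remove(key);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        System.out.println("remaining entries: " + map.size()); // 0
    }
}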

ConcurrentHashMap vs Synchronized HashMap

Synchronized HashMap

  1. Each method is synchronized using an object-level lock. So the get and put methods on synchMap both acquire that lock.

  2. Locking the entire collection is a performance overhead. While one thread holds on to the lock, no other thread can use the collection.

ConcurrentHashMap was introduced in JDK 5.

  1. There is no locking at the object level; the locking is at a much finer granularity. For a ConcurrentHashMap, the locks may be at the level of a hash bucket.

  2. The effect of lower-level locking is that you can have concurrent readers and writers, which is not possible for synchronized collections. This leads to much better scalability.

  3. ConcurrentHashMap does not throw a ConcurrentModificationException if one thread tries to modify it while another is iterating over it (see the sketch below).
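A quick, hedged illustration of that last point (the class is mine and not from the article or the JDK; even a single thread modifying the map during its own iteration shows the difference in iterator behaviour):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IterationDemo {
    public static void main(String[] args) {
        Map<Integer, Integer> concurrent = new ConcurrentHashMap<>();
        Map<Integer, Integer> synced = Collections.synchronizedMap(new HashMap<>());
        for (int i = 0; i < 100; i++) {
            concurrent.put(i, i);
            synced.put(i, i);
        }

        // Weakly consistent iterator: modifying during iteration is allowed.
        for (Integer key : concurrent.keySet()) {
            concurrent.remove(key); // no ConcurrentModificationException
        }

        // The synchronized wrapper reuses HashMap's fail-fast iterator.
        try {
            for (Integer key : synced.keySet()) {
                synced.remove(key); // typically fails fast
            }
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("synchronizedMap iterator failed fast: " + e);
        }
    }
}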

This article Java 7: HashMap vs ConcurrentHashMap is a very good read. Highly recommended.

Java Concurrency: Are get(Key) for HashMap and ConcurrentHashMap equal in performance?

You could look at the source code. (I'm looking at JDK 6.) HashMap.get() is pretty simple:

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

Where hash() does some extra shifting and XORing to "improve" your hash code.

ConcurrentHashMap.get() is a bit more complex, but not by a lot:

public V get(Object key) {
    int hash = hash(key.hashCode());
    return segmentFor(hash).get(key, hash);
}

Again, hash() does some shifting and XORing. segmentFor(int hash) does a simple array lookup. The only complex stuff is in Segment.get(). But even that doesn't look like rocket science:

V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}

The one place where it takes a lock is readValueUnderLock(). The comments say that this case is technically possible under the memory model but has never been known to actually occur.

Overall, the code looks pretty similar for both, just a bit better organized in ConcurrentHashMap. So I'd guess that the performance is similar enough.

That said, if puts really are extremely rare, you could consider implementing a "copy on write" type of mechanism.
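For reference, a minimal sketch of what such a copy-on-write mechanism could look like (this class is hypothetical, not part of the JDK or the answer above):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// A hypothetical copy-on-write map wrapper: reads are plain field reads of an
// immutable snapshot, writes copy the whole underlying map and swap the reference.
// Only worth considering when writes are extremely rare compared to reads.
public final class CopyOnWriteMap<K, V> {
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    public V get(K key) {
        return snapshot.get(key); // lock-free read of the current snapshot
    }

    public synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot); // copy, mutate, then publish
        copy.put(key, value);
        snapshot = Collections.unmodifiableMap(copy);
    }
}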

Which one to use: HashMap and ConcurrentHashMap with Future values JAVA?

You're saying that

Map values are not getting modified.

so there should be no reason to choose ConcurrentHashMap. ConcurrentHashMap is meant to provide thread-safety and atomicity of operations across threads, neither of which you need.

The tradeoffs are that you're paying the price for thread-safe/atomic operations when you don't need them. As to how much that actually is, it depends on your app.
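If the map really is populated once and only read afterwards, a plain HashMap is enough, provided it is published safely. A hypothetical sketch (all names here are mine):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Future;

// The map is filled once during construction and never modified afterwards,
// so a plain HashMap suffices; safe publication comes from the final field.
public final class ResultRegistry {
    private final Map<String, Future<String>> results;

    public ResultRegistry(Map<String, Future<String>> computed) {
        this.results = Collections.unmodifiableMap(new HashMap<>(computed));
    }

    public Future<String> lookup(String key) {
        return results.get(key); // read-only access from any thread is fine
    }
}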

Related threads

  • Performance ConcurrentHashmap vs HashMap
  • Are there any drawbacks with ConcurrentHashMap?

Synchronized HashMap vs ConcurrentHashMap write test


One

How many threads to use varies not by CPU, but by what you are doing. If, for example, what you are doing with your threads is highly disk intensive, your CPU isn't likely to be maxed out, so running 8 threads may just cause heavy thrashing. If, however, you have huge amounts of disk activity, followed by heavy computation, followed by more disk activity, you would benefit from staggering the threads, splitting out your activities and grouping them, as well as using more threads.

You would, for example, in such a case, likely want to group together file activity that uses a single file, but maybe not activity where you are pulling from a bunch of files (unless they are written contiguously on the disk). Of course, if you overthink disk IO, you could seriously hurt your performance, but I'm making a point of saying that you shouldn't just ignore it, either. In such a program, I would probably have threads dedicated to disk IO and threads dedicated to CPU work. Divide and conquer. You'd have fewer IO threads and more CPU threads.

It is common for a synchronous server to run many more threads than cores/CPUs because most of those threads either do work for only a short time or don't do much CPU intensive work. It's not useful to have 500 threads, though, if you will only ever have 2 clients and the context switching of those excess threads hampers performance. It's a balancing act that often requires a little bit of tuning.

In short

  1. Think about what you are doing
    • Network activity is light, so more threads are generally good
    • CPU-intensive work doesn't do much good if you have 2x more of those threads than cores... usually a little more than 1x or a little less than 1x is optimal, but you have to test, test, test
    • Having 10 disk-IO-intensive threads may hurt all 10 threads, just like having 30 CPU-intensive threads... the thrashing hurts them all
  2. Try to spread out the pain
    • See if it helps to spread out the CPU, IO, etc. work, or if clustering is better... it will depend on what you are doing
  3. Try to group things up
    • If you can, separate out your disk, IO, and network tasks and give them their own threads that are tuned to those tasks (see the sketch after this list)
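A hedged Java sketch of that last idea, with made-up names and pool sizes chosen only for illustration:

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration of "grouping things up": one pool sized for CPU-bound work
// and a separate, larger pool for threads that mostly block on IO.
public class PoolSplit {

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        int cores = Runtime.getRuntime().availableProcessors();

        ExecutorService cpuPool = Executors.newFixedThreadPool(cores);     // computation
        ExecutorService ioPool  = Executors.newFixedThreadPool(cores * 4); // blocking IO

        // Stage 1: the blocking read runs in the IO pool.
        byte[] data = ioPool.submit(PoolSplit::readFileSomehow).get();

        // Stage 2: the heavy computation runs in the CPU pool.
        cpuPool.execute(() -> crunch(data));

        ioPool.shutdown();
        cpuPool.shutdown();
    }

    private static byte[] readFileSomehow() { return new byte[1024]; } // placeholder for real IO
    private static void crunch(byte[] data) { /* placeholder for real CPU work */ }
}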

Two

In general, thread-unsafe methods run faster. Similarly, using localized synchronization runs faster than synchronizing an entire method. As such, HashMap is normally significantly faster than ConcurrentHashMap. Another example would be StringBuffer compared to StringBuilder. StringBuffer is synchronized and is not only slower, but the synchronization is heavier (more code, etc.); it should rarely be used. StringBuilder, however, is unsafe if you have multiple threads hitting it.

With that said, StringBuffer and ConcurrentHashMap can race, too. Being "thread-safe" doesn't mean that you can just use it without thought, particularly the way that these two classes operate. For example, you can still have a race condition if you are reading and writing at the same time (say, using contains(Object) as you are doing a put or remove). If you want to prevent such things, you have to use your own class or synchronize your calls to your ConcurrentHashMap.
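For example, a check-then-act sequence on a ConcurrentHashMap is still a race unless you either synchronize around it or use one of the map's atomic compound operations. A small illustrative sketch (not from the answer above):

import java.util.concurrent.ConcurrentHashMap;

public class CheckThenAct {
    private final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    // Racy: another thread can insert the key between containsKey() and put(),
    // even though each individual call is thread-safe.
    public void putIfMissingRacy(String key, int value) {
        if (!map.containsKey(key)) {
            map.put(key, value);
        }
    }

    // Atomic: the check and the insert happen as one operation inside the map.
    public void putIfMissingAtomic(String key, int value) {
        map.putIfAbsent(key, value);
    }
}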

I generally use the non-concurrent maps and collections and just use my own locks where I need them. You'll find that it's much faster that way and the control is great. Atomics (e.g. AtomicInteger) are nice sometimes, but really not generally useful for what I do. Play with the classes, play with synchronization, and you'll find that you can manage it more efficiently than with the shotgun approach of ConcurrentHashMap, StringBuffer, etc. You can have race conditions whether or not you use those classes if you don't do it right... but if you do it yourself, you can also be much more efficient and more careful.


Example

Note that we have a dedicated Object that we lock on. Use this instead of marking the method itself synchronized.

public final class Fun<K, V> implements Map<K, V> {

    private final Object lock = new Object();

    /*
     * (non-Javadoc)
     *
     * @see java.util.Map#clear()
     */
    @Override
    public void clear() {
        // Doing things...
        synchronized (this.lock) {
            // Where we do sensitive work
        }
    }

    /*
     * (non-Javadoc)
     *
     * @see java.util.Map#put(java.lang.Object, java.lang.Object)
     */
    @Override
    public V put(final K key, @Nullable final V value) {
        // Doing things...
        synchronized (this.lock) {
            // Where we do sensitive work
        }
        // Doing things...
        return null; // return the previous value here in a real implementation
    }

    // remaining Map methods omitted
}

And From Your Code...

I might not put that sb.append(index) in the lock or might have a separate lock for index calls, but...

    private final Object lock = new Object();

    // chars, random, map and index are fields from the original question
    private String getNextString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            char c = chars[random.nextInt(chars.length)];
            sb.append(c);
        }
        synchronized (lock) {
            sb.append(index);
            if (map.containsKey(sb.toString()))
                System.out.println("duplicate: " + sb.toString());
        }
        return sb.toString();
    }

    private int getNextInt() {
        synchronized (lock) {
            return index++;
        }
    }

