Why there is no ConcurrentHashSet against ConcurrentHashMap

There's no built-in type for ConcurrentHashSet because you can always derive a set from a map. Since there are many types of maps, you use a method to produce a set from a given map (or map class).

Prior to Java 8, you produce a concurrent hash set backed by a concurrent hash map by using Collections.newSetFromMap(map).

In Java 8 (as pointed out by @Matt), you can get a concurrent hash set view via ConcurrentHashMap.newKeySet(). This is a bit simpler than the old newSetFromMap, which required you to pass in an empty map object. But it is specific to ConcurrentHashMap.

Anyway, the Java designers could have created a new set interface every time a new map interface was created, but that pattern would be impossible to enforce when third parties create their own maps. It is better to have the static methods that derive new sets; that approach always works, even when you create your own map implementations.
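
For example, both approaches come down to a one-liner. A minimal sketch (the variable names are just placeholders):

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetExample {
    public static void main(String[] args) {
        // Pre-Java 8: derive a concurrent set from a concurrent map
        Set<String> legacySet =
                Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

        // Java 8+: a set view backed directly by a ConcurrentHashMap
        Set<String> modernSet = ConcurrentHashMap.newKeySet();

        legacySet.add("a");
        modernSet.add("b");
        System.out.println(legacySet + " " + modernSet);
    }
}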

ConcurrentHashMap does not work as expected

Well, there's a compound action here: you get the map value for a key, increment it by one, and put it back in the map against the same key. All of these steps have to execute atomically, but the given implementation does not enforce that, so you end up with a safety failure (lost updates).
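
The broken pattern looks roughly like this (a sketch of the kind of code in question, not the exact original): two threads can read the same old count and both write back old + 1, losing one of the increments.

// Not atomic: a check-then-act on a ConcurrentHashMap still races.
Integer old = voting.get("GERB");                 // threads A and B both read, say, 5
voting.put("GERB", old == null ? 1 : old + 1);    // both write 6 -> one vote is lost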

To fix this, you can use the atomic merge operation defined in ConcurrentHashMap. The entire method invocation is performed atomically. Here's how it looks.

Map<String, Integer> voting = new ConcurrentHashMap<>();

for (int i = 0; i < 16; i++)
    new Thread(() -> {
        voting.merge("GERB", 1, Integer::sum);
    }).start();

for (int i = 0; i < 100; i++)
    voting.merge("GERB", 1, Integer::sum);

Thread.sleep(5000); // Waits for the threads to finish

for (String s : voting.keySet())
    System.out.println(s + ": " + voting.get(s));

Running this program produces the following output:

GERB: 116
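
An equivalent fix is compute, which ConcurrentHashMap also executes atomically per key; shown here only as an alternative sketch:

// Same effect as merge: the remapping function runs atomically for the key.
voting.compute("GERB", (key, count) -> count == null ? 1 : count + 1);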

Concurrent writes of unique values to a Java HashSet

HashSet and HashMap use a hashing mechanism to determine which bucket a new entry goes into. Generally, this step is not affected by parallel access as long as the keys are distinct.

Bucket sizes are monitored, because performance degrades significantly once they grow too large. When a HashSet's buckets get too full, a resize is triggered, which redistributes all of the entries across a new set of buckets.

If another thread attempts to write or read during a resize, it is very likely that the operation will fail or corrupt the set, because the structure is in the midst of being rearranged.

If normal synchronisation mechanisms seem too onerous, you may be better off with a ReadWriteLock; a sketch of one way to use it follows.
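
As an illustration (a minimal sketch with made-up names, not taken from a linked answer), a HashSet can be guarded with a ReentrantReadWriteLock so that many readers proceed concurrently while writes, including any resize they trigger, run exclusively:

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedSet<T> {
    private final Set<T> set = new HashSet<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    boolean add(T value) {                 // exclusive: a resize cannot be observed mid-flight
        lock.writeLock().lock();
        try {
            return set.add(value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean contains(T value) {            // shared: many readers may run at once
        lock.readLock().lock();
        try {
            return set.contains(value);
        } finally {
            lock.readLock().unlock();
        }
    }
}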

Java ConcurrentHashMap is better than HashMap performance wise?

Doug Lea is extremely good at these things, so I wouldn't be surprised if at one time his ConcurrentHashMap performed better than Joshua Bloch's HashMap. However, as of Java 7, the first @author of HashMap is Doug Lea too. So obviously there is now no reason for HashMap to be any slower than its concurrent cousin.

Out of curiosity, I did some benchmarking anyway. I ran it under Java 7. The more entries there are, the closer the performance gets. Eventually ConcurrentHashMap comes within 3% of HashMap, which is quite remarkable. The real bottleneck is memory access; as the saying goes, "memory is the new disk (and disk is the new tape)". If the entries are in the cache, both will be fast; if the entries don't fit in the cache, both will be slow. In real applications, a map doesn't have to be big to compete with other data for space in the cache. If a map is used often, it is cached; if not, it isn't, and that is the real determining factor, not the implementation (given that both are implemented by the same expert).

public static void main(String[] args)
{
    for (int i = 0; i < 100; i++)
    {
        System.out.println();

        int entries = i * 100 * 1000;
        // FakeMap measures the loop/Random overhead, which is subtracted out below.
        long t0 = test(entries, new FakeMap());
        long t1 = test(entries, new HashMap());
        long t2 = test(entries, new ConcurrentHashMap());

        long diff = (t2 - t1) * 100 / (t1 - t0);
        System.out.printf("entries=%,d time diff= %d%% %n", entries, diff);
    }
}

static long test(int ENTRIES, Map map)
{
    long SEED = 0;
    Random random = new Random(SEED);

    int RW_RATIO = 10;    // 10 reads per write

    long t0 = System.nanoTime();

    for (int i = 0; i < ENTRIES; i++)
        map.put(random.nextInt(), random.nextInt());

    for (int i = 0; i < RW_RATIO; i++)
    {
        random.setSeed(SEED);    // replay the same keys for the reads
        for (int j = 0; j < ENTRIES; j++)
        {
            map.get(random.nextInt());
            random.nextInt();    // skip the value that was generated during put
        }
    }
    long t = System.nanoTime() - t0;
    System.out.printf("%,d ns %s %n", t, map.getClass());
    return t;
}

static class FakeMap implements Map
{
    public Object get(Object key)
    {
        return null;
    }
    public Object put(Object key, Object value)
    {
        return null;
    }
    // etc. etc.
}

What's the best way to obtain a concurrent hash set when write operations outnumber read operations?

ConcurrentHashMap.newKeySet() and ConcurrentHashMap.keySet(), which return a KeySetView, rely on the ConcurrentHashMap class, which locks only on writes, and then only the bin holding the key being written rather than the whole map.

Consequently, reading operations are fast, and so are writing operations (only very slightly slower in practice).

CopyOnWriteArraySet, which relies on CopyOnWriteArrayList, doesn't lock for reading but does lock for writing. To guarantee consistent concurrent reads, each writing operation also triggers a copy of the underlying array.

So for a collection that is not small (100 elements or more, for example), CopyOnWriteArraySet writes should be more expensive (a lock on the whole collection plus a copy of the underlying array) than KeySetView writes (a lock on only the affected entry).

The CopyOnWriteArraySet javadoc underlines that point:

  • It is best suited for applications in which set sizes generally stay small, read-only operations vastly outnumber mutative operations, and
    you need to prevent interference among threads during traversal.

  • Mutative operations (add, set, remove, etc.) are expensive since they usually entail copying the entire underlying array


Here is a benchmark comparing both behaviors.

The Set<String>s are initialized with 100 elements ("0" to "99" values) at each iteration.

The first 6 methods are writing operations (remove, add a new element, re-add an existing element).

The next 4 methods are reading operations (iterate, contains).

import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Threads(8)
public class SetBenchmark {

    private Set<String> keySetView;
    private Set<String> copyOnWriteArraySet;
    private Random random;

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(SetBenchmark.class.getSimpleName())
                                          .warmupIterations(5)
                                          .measurementIterations(5)
                                          .forks(1)
                                          .build();

        new Runner(opt).run();
    }

    @Setup(Level.Iteration)
    public void doSetup() {
        random = new Random(1);
        keySetView = ConcurrentHashMap.newKeySet();
        copyOnWriteArraySet = new CopyOnWriteArraySet<>();
        init(keySetView);
        init(copyOnWriteArraySet);
    }

    private void init(Set<String> set) {
        IntStream.range(0, 100)
                 .forEach(i -> {
                     final String string = String.valueOf((char) i);
                     set.add(string);
                 });
    }

    // Writing
    @Benchmark
    public void _1_keySetView_remove() {
        doRemove(keySetView);
    }

    @Benchmark
    public void _2_copyOnWriteArraySet_remove() {
        doRemove(copyOnWriteArraySet);
    }

    @Benchmark
    public void _3_keySetView_add_with_new_value() {
        doAddWithNewValue(keySetView);
    }

    @Benchmark
    public void _4_copyOnWriteArraySet_add_with_new_value() {
        doAddWithNewValue(copyOnWriteArraySet);
    }

    @Benchmark
    public void _5_keySetView_add_with_existing_value() {
        doAddWithExistingValue(keySetView);
    }

    @Benchmark
    public void _6_copyOnWriteArraySet_add_with_existing_value() {
        doAddWithExistingValue(copyOnWriteArraySet);
    }

    // Reading
    @Benchmark
    public void _7_keySetView_iterate() {
        String res = doIterate(keySetView);
    }

    @Benchmark
    public void _8_copyOnWriteArraySet_iterate() {
        String res = doIterate(copyOnWriteArraySet);
    }

    @Benchmark
    public void _9_keySetView_contains() {
        boolean res = doContains(keySetView);
    }

    @Benchmark
    public void _010_copyOnWriteArraySet_contains() {
        boolean res = doContains(copyOnWriteArraySet);
    }

    // Writing
    private void doRemove(Set<String> set) {
        set.remove(getRandomString());
    }

    private void doAddWithNewValue(Set<String> set) {
        set.add(getRandomString() + set.size());
    }

    private void doAddWithExistingValue(Set<String> set) {
        set.add(getRandomString());
    }

    // Reading
    private String doIterate(Set<String> set) {
        String result = "";
        for (String string : set) {
            result += string;
        }
        return result;
    }

    private boolean doContains(Set<String> set) {
        return set.contains(getRandomString());
    }

    // Random value with seed
    private String getRandomString() {
        return String.valueOf(random.nextInt(100));
    }

}

Here are the results (lower score is better).

Writing operations:

Benchmark                                                      Mode      Score  Units
SetBenchmark._1_keySetView_remove                              avgt     61,659  ns/op
SetBenchmark._2_copyOnWriteArraySet_remove                     avgt    249,976  ns/op

SetBenchmark._3_keySetView_add_with_new_value                  avgt    240,589  ns/op
SetBenchmark._4_copyOnWriteArraySet_add_with_new_value         avgt  30691,318  ns/op

SetBenchmark._5_keySetView_add_with_existing_value             avgt     84,472  ns/op
SetBenchmark._6_copyOnWriteArraySet_add_with_existing_value    avgt    473,592  ns/op

Reading operations:

Benchmark                                                      Mode      Score  Units
SetBenchmark._7_keySetView_iterate                             avgt  13603,012  ns/op
SetBenchmark._8_copyOnWriteArraySet_iterate                    avgt  13626,146  ns/op

SetBenchmark._9_keySetView_contains                            avgt     53,081  ns/op
SetBenchmark._10_copyOnWriteArraySet_contains                  avgt    250,401  ns/op

Reads are equally fast for the two Set implementations only when iterating.

Writing is much slower for CopyOnWriteArraySet in every case, and adding a value that is not already present is the worst case of all.

Is values() of ConcurrentHashMap thread safe?

Is iterating over result of values() method the recommended way of fetching results without getting a ConcurrentModificationException?

Yes. It is the recommended way, and you won't get a ConcurrentModificationException.

As the package-level javadoc states:

Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators and Spliterators provide weakly consistent rather than fast-fail traversal:

  • they may proceed concurrently with other operations
  • they will never throw ConcurrentModificationException
  • they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.


Is the below code thread safe?

  for (String name : myMap.values()) {
      System.out.println("name: " + name);
  }

Yes ... with some qualifications.

Thread safety really means that the code works according to its specified behavior in a multi-threaded application. The problem is that you haven't said clearly what you expect the code to actually do.

What we can say is the following:

  1. The iteration will see values as per the previously stated guarantees.
  2. The memory model guarantees mean that there shouldn't be any nasty behavior with stale values ... unless you mutate value objects after putting them into the map. (If you do that, then the object's methods need to be implemented to cope with that; e.g. they may need to be synchronized. This is moot for String values, since they are immutable.)
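
As a small illustration of the weakly consistent guarantee (a sketch, not code from the question), the following iterates over values() while another thread keeps modifying the map, and no ConcurrentModificationException is thrown:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<String, String> myMap = new ConcurrentHashMap<>();
        for (int i = 0; i < 1_000; i++) {
            myMap.put("key" + i, "name" + i);
        }

        // A writer thread keeps adding and removing entries during the iteration.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                myMap.put("extra" + i, "x");
                myMap.remove("key" + (i % 1_000));
            }
        });
        writer.start();

        // Weakly consistent: never throws ConcurrentModificationException,
        // and may or may not reflect the writer's concurrent changes.
        for (String name : myMap.values()) {
            System.out.println("name: " + name);
        }
        writer.join();
    }
}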

