Different types of thread-safe Sets in Java
The
CopyOnWriteArraySet
is a quite simple implementation - it basically has a list of elements in an array, and when changing the list, it copies the array. Iterations and other accesses which are running at this time continue with the old array, avoiding necessity of synchronization between readers and writers (though writing itself needs to be synchronized). The normally fast set operations (especiallycontains()
) are quite slow here, as the arrays will be searched in linear time.Use this only for really small sets which will be read (iterated) often and changed seldom. (Swing's listener-sets would be an example, but these are not really sets, and should be only used from the EDT anyway.)
Collections.synchronizedSet
will simply wrap a synchronized-block around each method of the original set. You should not access the original set directly. This means that no two methods of the set can be executed concurrently (one will block until the other finishes) - this is thread-safe, but you will not have concurrency if multiple threads are using the set. If you use the iterator, you usually still need to synchronize externally to avoid ConcurrentModificationExceptions when modifying the set between iterator calls. The performance will be like the performance of the original set (but with some synchronization overhead, and blocking if used concurrently).Use this if you only have low concurrency, and want to be sure all changes are immediately visible to the other threads.
ConcurrentSkipListSet
is the concurrentSortedSet
implementation, with most basic operations in O(log n). It allows concurrent adding/removing and reading/iteration, where iteration may or may not tell about changes since the iterator was created. The bulk operations are simply multiple single calls, and not done atomically – other threads may observe only some of them.Obviously you can use this only if you have some total order on your elements.
This looks like an ideal candidate for high-concurrency situations, for not-too-large sets (because of the O(log n)).For the
ConcurrentHashMap
(and the Set derived from it): Here most basic options are (on average, if you have a good and fasthashCode()
) in O(1) (but might degenerate to O(n) when many keys have the same hash code), like for HashMap/HashSet. There is a limited concurrency for writing (the table is partitioned, and write access will be synchronized on the needed partition), while read access is fully concurrent to itself and the writing threads (but might not yet see the results of the changes currently being written). The iterator may or may not see changes since it was created, and bulk operations are not atomic.
Resizing is slow (as for HashMap/HashSet), thus you should try to avoid this by estimating the needed size on creation (and using about 1/3 more of that, as it resizes when 3/4 full).Use this when you have large sets, a good (and fast) hash function and can estimate the set size and needed concurrency before creating the map.
Are there other concurrent map implementations one could use here?
Thread safe data structure to check for existence and write if not
The collection classes in the java.util package are not thread-safe in order to provide maximum performance in single-threaded applications. (Vector and Hashtable being exceptions)
There are a few ways to achieve the thread safety you are looking for.
Sychronized Wrapper
Set<String> safeSet = Collections.synchronizedSet(new HashSet<>());
This will wrap all the calls to the underlying set in in a synchronized block, locking on the object. However, That means when a thread is iterating over elements in a collection, all other collection’s methods block, causing other threads having to wait.
java.util.concurrent Package
Java 5 introduced concurrent collections that provide much better performance than synchronized wrappers.
There are different flavors: copy-on-write, Compare-And-Swap, and concurrent collections.
The concurrent collections use a special Lock that is more flexible than synchronization.
So for what you are doing, a HashSet is probably a good match, if it was single threaded. In the concurrent package you could use ConcurrentHashMap.
It would look like this:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
...
private static final Object PRESENT = new Object();
Map<String, Object> seenStrings = new ConcurrentHashMap<>();
for ( String aString : stringList ) {
if ( seenStrings.containsKey(aString) ) {
// Already there
} else {
// Not seen yet
seenStrings.put(aString, PRESENT);
}
}
Update
Andy's comment is a good one, I wasn't sure what you wanted to do if you had already seen an item or if you haven't.
You could do this to ensure the check and insert are executed atomically
if (seenStrings.put(aString, PRESENT) == null) {
// Not seen yet
}
Update In Java 8+, you can create a set backed by the specified map. Effectively a ConcurrentHashSet.
Set<String> seenStrings = Collections.newSetFromMap(new ConcurrentHashMap<>());
for (String aString : stringList) {
if (seenStrings.add(aString)) {
// Not seen yet
}
}
Java Collection Framework: Some thread safe and some not?
Thread safety carries an overhead (although, in modern VM's, the overhead is much lower than when the collection framework was designed). So collections aren't thread safe unless it is specifically required, with the exception of the JDK1.1 collections - when they were designed, the philosophy was more like "let's leave as little room for error, at the cost of some performance".
We have several phases in Java API evolvement.
JDK1.1
In version 1.1 of Java, we had the data structures Vector
and Hashtable
. They are completely synchronized, providing a level of thread safety.
JDK1.2
In version 1.2 of Java, the collections framework was introduced. None of the basic collections are thread-safe (they don't synchronize any operations) : ArrayList
, LinkedList
, HashMap
, TreeMap
and the Set
implementations.
But you can obtain a synchronized version by calling Collections.synchronizedMap
, Collections.synchronizedList
, etc.
JDK1.5
In version 1.5 of Java, the java.util.concurrent
framework was introduced. They contain specialized data structured for multi-threaded use. These provide a level of thread safety.
Note that even with synchronized collections, it is possible to introduce data races; it only means that you cannot destroy the internal structure of the collections (all the invariants of the collections will be maintained)
For example, if you have a two-step process where you first check that the collection doesn't contain some element, and in the second step, you insert that element. If you don't provide your own synchronization for these two steps, you can get the element added twice if two threads do this at the same time.
How Synchronized and Concurrent Collections are thread-safe but their content not
tl;dr
The objects stored in a thread-safe collection can be leaked outside and used in a non-thread-safe manner.
Detailed Answer:
I thought if Collection is thread-safe then its content will
implicitly be thread-safe, I mean if two threads cannot access my
Collection object then the object which my Collection object is
holding will implicitly become thread-safe.I know sure I missing the point, could someone please explain me with
an example.
Consider the following code that uses two threads to add to the same non-thread-safe list the elements from 0 to 10
. In the end, the main thread sums all the elements of that list. The final result should be same as 0 + 0 + 1 + 1 + ... 9 + 9 = 90.
However, if you execute the code a couple of times you get different values, and sometimes even the following NPE
:
Exception in thread "main" java.lang.NullPointerException
at java.base/java.util.stream.ReduceOps$1ReducingSink.accept(ReduceOps.java:80)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:553)
at Z.CollectionThreadSafe.main(CollectionThreadSafe.java:26)
All this is the result of the race-condition during the call of the method add
.
private static void addToList(List<Integer> list) {
for (int i = 0; i < 10; i++)
list.add(i);
}
public static void main(String[] arg) throws InterruptedException {
final int TOTAL_THREADS = 2;
List<Integer> list = new ArrayList<>();
ExecutorService pool = Executors.newFixedThreadPool(TOTAL_THREADS);
for (int i = 0; i < TOTAL_THREADS; i++) {
pool.submit(() -> addToList(list));
}
pool.shutdown();
pool.awaitTermination(10, TimeUnit.SECONDS);
System.out.println(list.stream().reduce(0, Integer::sum));
}
Let us fix the race-condition by using a thread-Safe List
by calling Collections.synchronizedList. So let us adapt the previous code to:
List<Integer> list = Collections.synchronizedList(new ArrayList<>());
You can run it as many times as you want; the final result is always the same i.e., 90
. That much we knew already. Let us showcase the:
It's worth mentioning that synchronized and concurrent collections
only make the collection itself thread-safe and not the contents.
You just need to adapt the previous code from:
List<Integer> list = Collections.synchronizedList(new ArrayList<>());
to:
final List<List<Integer>> LIST_THREAD_SAFE = Collections.synchronizedList(new ArrayList<>());
LIST_THREAD_SAFE.add(new ArrayList<>());
List<Integer> list = LIST_THREAD_SAFE.get(0);
...
and voilá! you have exactly the same situation as the first example that we have showcased (i.e., race-condition). Even though the list LIST_THREAD_SAFE
is thread-safe its content is not. Hence,
synchronized and concurrent collections only make the collection
itself thread-safe and not the contents.
When is CopyOnWriteArraySet useful to achieve thread-safe HashSet?
It is useful when you have a small set of element for a thread safe collection.
One example is a Set of listeners. You need to ensure uniqueness and iterate over them efficiently.
BTW CopyOnWriteArraySet has the lowest overhead on a per reference basis. It can be as little as 1/6 the size of the other collections. This is particularly useful if you have a lot of them.
while Set data structure is for high performance contains operation, could anybody explain this?
COWAS is more efficient in terms of memory and it's contains
is faster for small collections than the alternatives. What is "high performance" depends on the use case.
Choosing the best concurrency list in Java
had better be
List
The only List
implementation in java.util.concurrent
is CopyOnWriteArrayList. There's also the option of a synchronized list as Travis Webb mentions.
That said, are you sure you need it to be a List
? There are a lot more options for concurrent Queue
s and Map
s (and you can make Set
s from Map
s), and those structures tend to make the most sense for many of the types of things you want to do with a shared data structure.
For queues, you have a huge number of options and which is most appropriate depends on how you need to use it:
- ConcurrentLinkedQueue
- ArrayBlockingQueue
- LinkedBlockingDeque
- LinkedBlockingQueue
- PriorityBlockingQueue
- SynchronousQueue
- DelayQueue
Related Topics
Is Java 7 Using Tim Sort for the Method Arrays.Sort
Understanding Strange Java Hash Function
Different Types of Thread-Safe Sets in Java
Working with a List of Lists in Java
Environment Variable to Control Java.Io.Tmpdir
Replacing All Non-Alphanumeric Characters with Empty Strings
Read File from Resources Folder in Spring Boot
How to Intentionally Cause a Custom Java Compiler Warning Message
Dynamically Changing Log4J Log Level
Check If a String Contains a Special Character
Changing Names of Parameterized Tests
Java Generics: Generic Type Defined as Return Type Only
Rounding Bigdecimal to *Always* Have Two Decimal Places
Hibernate - @Elementcollection - Strange Delete/Insert Behavior