Concurrent HashSet<T> in .NET Framework?
Your implementation is correct. Unfortunately, the .NET Framework does not provide a built-in concurrent hash set type, but there are several workarounds.
ConcurrentDictionary (recommended)
The first option is to use the class ConcurrentDictionary<TKey, TValue> from the System.Collections.Concurrent namespace. In this case the value is pointless, so we can use a simple byte (1 byte in memory).
private ConcurrentDictionary<string, byte> _data;
This is the recommended option because the type is thread-safe and provides you the same advantages as a HashSet<T>, except that keys and values are different objects.
Source: Social MSDN
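As a sketch, the dictionary can be wrapped so it exposes set-like operations; TryAdd, ContainsKey and TryRemove are the real ConcurrentDictionary members that map onto them (the wrapper class name here is just for illustration):

```csharp
using System.Collections.Concurrent;

public class ConcurrentStringSet
{
    // The byte value is a throwaway placeholder; only the keys matter.
    private readonly ConcurrentDictionary<string, byte> _data =
        new ConcurrentDictionary<string, byte>();

    // TryAdd returns false if the key already exists, mirroring HashSet<T>.Add.
    public bool Add(string item) => _data.TryAdd(item, 0);

    public bool Contains(string item) => _data.ContainsKey(item);

    // TryRemove returns false if the key was not present.
    public bool Remove(string item) => _data.TryRemove(item, out _);
}
```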
ConcurrentBag
If you don't mind duplicate entries, you can use the class ConcurrentBag<T> from the same namespace as the previous class.
private ConcurrentBag<string> _data;
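A minimal sketch showing why the "duplicates" caveat matters: unlike a set, the bag happily keeps both copies.

```csharp
using System;
using System.Collections.Concurrent;

class Demo
{
    static void Main()
    {
        var bag = new ConcurrentBag<string>();
        bag.Add("x");
        bag.Add("x"); // unlike a set, the bag keeps both copies

        Console.WriteLine(bag.Count); // 2
    }
}
```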
Self-implementation
Finally, as you did, you can implement your own data type, using lock or the other synchronization primitives that .NET provides. Here is a great example: How to implement ConcurrentHashSet in .Net
The only drawback of this solution is that the type HashSet<T> doesn't officially support concurrent access, even for read operations.
I quote the code of the linked post (originally written by Ben Mosher).
using System;
using System.Collections.Generic;
using System.Threading;

namespace BlahBlah.Utilities
{
    public class ConcurrentHashSet<T> : IDisposable
    {
        private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
        private readonly HashSet<T> _hashSet = new HashSet<T>();

        #region Implementation of ICollection<T> ...ish

        public bool Add(T item)
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Add(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public void Clear()
        {
            _lock.EnterWriteLock();
            try
            {
                _hashSet.Clear();
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public bool Contains(T item)
        {
            _lock.EnterReadLock();
            try
            {
                return _hashSet.Contains(item);
            }
            finally
            {
                if (_lock.IsReadLockHeld) _lock.ExitReadLock();
            }
        }

        public bool Remove(T item)
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Remove(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public int Count
        {
            get
            {
                _lock.EnterReadLock();
                try
                {
                    return _hashSet.Count;
                }
                finally
                {
                    if (_lock.IsReadLockHeld) _lock.ExitReadLock();
                }
            }
        }

        #endregion

        #region Dispose

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected virtual void Dispose(bool disposing)
        {
            if (disposing)
                if (_lock != null)
                    _lock.Dispose();
        }

        ~ConcurrentHashSet()
        {
            Dispose(false);
        }

        #endregion
    }
}
EDIT: The lock-entry calls were moved outside the try blocks, because they could throw an exception, and then the instructions in the finally blocks would run even though the lock was never acquired.
How to implement ConcurrentHashSet in .Net
I just ran into a similar scenario ("I am interested in a fast Add and Contains and Remove") and implemented this sucker:
using System; // needed for IDisposable
using System.Collections.Generic;
using System.Threading;

namespace BlahBlah.Utilities
{
    public class ConcurrentHashSet<T> : IDisposable
    {
        private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
        private readonly HashSet<T> _hashSet = new HashSet<T>();

        #region Implementation of ICollection<T> ...ish

        public bool Add(T item)
        {
            try
            {
                _lock.EnterWriteLock();
                return _hashSet.Add(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public void Clear()
        {
            try
            {
                _lock.EnterWriteLock();
                _hashSet.Clear();
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public bool Contains(T item)
        {
            try
            {
                _lock.EnterReadLock();
                return _hashSet.Contains(item);
            }
            finally
            {
                if (_lock.IsReadLockHeld) _lock.ExitReadLock();
            }
        }

        public bool Remove(T item)
        {
            try
            {
                _lock.EnterWriteLock();
                return _hashSet.Remove(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public int Count
        {
            get
            {
                try
                {
                    _lock.EnterReadLock();
                    return _hashSet.Count;
                }
                finally
                {
                    if (_lock.IsReadLockHeld) _lock.ExitReadLock();
                }
            }
        }

        #endregion

        #region Dispose

        public void Dispose()
        {
            if (_lock != null) _lock.Dispose();
        }

        #endregion
    }
}
Haven't really tested it (performance- or reliability-wise). YMMV.
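A quick smoke test for a class like this might hammer Add from many threads and check the final count; this is a sketch assuming the class above is compiled in the quoted BlahBlah.Utilities namespace:

```csharp
using System;
using System.Threading.Tasks;
using BlahBlah.Utilities;

class SmokeTest
{
    static void Main()
    {
        using (var set = new ConcurrentHashSet<int>())
        {
            // Many parallel writers adding overlapping values 0..999.
            Parallel.For(0, 100000, i => set.Add(i % 1000));

            // With correct locking, exactly 1000 distinct values survive.
            Console.WriteLine(set.Count); // 1000
        }
    }
}
```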
Is Contains thread safe in HashSet<T>?
Normally, collections that are used only for reading are "unofficially" thread safe (there is no collection in .NET that I know of that modifies itself during reading). There are some caveats:
- The items themselves might not be thread safe (though with a HashSet<T> this problem should be minimized, because you can't extract items from it). Still, GetHashCode() and Equals() must be thread-safe. If, for example, they access lazy objects that are loaded on-demand, they might not be thread safe, or perhaps they cache/memoize some data to speed up subsequent operations.
- You must be sure that after the last write there is a Thread.MemoryBarrier() (done on the same thread as the write) or equivalent, otherwise a read on another thread could see incomplete data.
- You must be sure that in each thread (other than the one where you did a write), before the first read there is a Thread.MemoryBarrier(). Note that if the HashSet<T> was "prepared" (with the Thread.MemoryBarrier() at the end) before creating/starting the other threads, then the Thread.MemoryBarrier() isn't necessary, because the threads can't have a stale read of that memory (they didn't exist yet). Various operations cause an implicit Thread.MemoryBarrier(). For example, if the threads were created before the HashSet<T> was filled, entered a Wait() and were un-Wait()ed after the HashSet<T> was filled (plus its Thread.MemoryBarrier()), exiting the Wait() causes an implicit Thread.MemoryBarrier().
Here is a simple example of a class that uses memoization/lazy loading (whatever you want to call it) and in that way can break thread safety:
public class MyClass
{
    private long value2;

    public int Value1 { get; set; }

    // Value2 is lazily loaded in a very primitive
    // way (note that Lazy<T> *can* be used thread-safely!)
    public long Value2
    {
        get
        {
            if (value2 == 0)
            {
                // value2 is a long. If .NET is running at 32 bits,
                // the assignment of a long (64 bits) isn't atomic :)
                value2 = LoadFromServer();

                // If thread1 checks and sees value2 == 0 and loads it,
                // and then begins writing value2 = (value), but after
                // writing the first 32 bits of value2 we have that
                // thread2 reads value2, then thread2 will read
                // "incomplete" data. If this "incomplete" data is == 0
                // then a second LoadFromServer() will be done. If the
                // operation was repeatable then there won't be any
                // problem (other than time wasted). But if the
                // operation isn't repeatable, or if the incomplete
                // data that is read is != 0, then there will be a
                // problem (for example an exception if the operation
                // wasn't repeatable, or different data if the operation
                // wasn't deterministic, or incomplete data if the read
                // was != 0)
            }

            return value2;
        }
    }

    private long LoadFromServer()
    {
        // This is a slow operation that justifies a lazy property
        return 1;
    }

    public override int GetHashCode()
    {
        // GetHashCode doesn't use Value2, because it
        // wants to be fast
        return Value1;
    }

    public override bool Equals(object obj)
    {
        MyClass obj2 = obj as MyClass;

        if (obj2 == null)
        {
            return false;
        }

        // The equality operator uses Value2, because it
        // wants to be correct.
        // Note that the HashSet<T> probably doesn't need to
        // use the Equals method on Add, if there are no
        // other objects with the same GetHashCode
        // (and surely, if the HashSet is empty and you Add a
        // single object, that object won't be compared with
        // anything, because there isn't anything to compare
        // it with! :-) )
        // Clearly Equals is used by the Contains method
        // of the HashSet
        return Value1 == obj2.Value1 && Value2 == obj2.Value2;
    }
}
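The "prepare the set before starting the readers" pattern from the caveats above can be sketched as follows (a minimal illustration, not production code):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class PublishDemo
{
    static void Main()
    {
        var set = new HashSet<int> { 1, 2, 3 };

        // Flush all writes to the set before any reader thread exists.
        Thread.MemoryBarrier();

        // A thread created after the barrier can't have a stale view of
        // the set, as long as nobody mutates it from this point on.
        var reader = new Thread(() => Console.WriteLine(set.Contains(2)));
        reader.Start();
        reader.Join();
    }
}
```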
Locking HashSet for concurrency
No, it is not sufficient to lock only on Add.
The fact that it doesn't crash only tells you that it didn't crash during your test.
You cannot guarantee that:
- It won't crash in the future
- It will produce the correct results
A non-threadsafe data structure has no guarantees whatsoever if used in a multithreaded fashion.
You need to either:
- Lock on every call to it
- Use a threadsafe data structure, one that has been built to support this scenario
If you use a different data structure than a hash set, such as a dictionary, you may even need to lock across multiple statements, because this may still fail:
lock (dLock)
    if (d.ContainsKey("test"))
        return;

var value = ExpensiveCallToObtainValue();

lock (dLock)
    d.Add("test", value);
Between the call to ContainsKey and the call to Add, another thread may have already inserted that key. The correct way to handle this without a threadsafe data structure is to contain both operations inside the same lock:
lock (dLock)
{
    if (!d.ContainsKey("test"))
        d.Add("test", ExpensiveCallToObtainValue());
}
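For comparison, with a threadsafe structure such as ConcurrentDictionary the check and the insert collapse into one atomic call. One caveat of the real API: the value factory may run more than once under contention, so the expensive call should be side-effect free or cheap to repeat (ExpensiveCallToObtainValue below is a placeholder standing in for the one in the example above):

```csharp
using System.Collections.Concurrent;

class Demo
{
    private readonly ConcurrentDictionary<string, int> _d =
        new ConcurrentDictionary<string, int>();

    public int GetValue()
    {
        // The check and the insert are a single atomic operation,
        // though the factory itself may run more than once under a race.
        return _d.GetOrAdd("test", _ => ExpensiveCallToObtainValue());
    }

    // Placeholder for the expensive operation in the example above.
    private int ExpensiveCallToObtainValue() => 42;
}
```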
Using LINQ methods on a custom ConcurrentHashSet class?
Internally, the custom class you have posted uses a HashSet<T> to store the data. So you can still make use of the methods you mentioned, First and FirstOrDefault, provided that you do it in a thread-safe way. For instance, the implementation of FirstOrDefault would look something like this:
public T TryGetFirstOrDefault()
{
    _lock.EnterReadLock();
    try
    {
        return _hashSet.FirstOrDefault();
    }
    finally
    {
        if (_lock.IsReadLockHeld) _lock.ExitReadLock();
    }
}
Update
You could generalize the above by passing a predicate:
public T TryGetFirstOrDefault(Func<T, bool> predicate)
{
    _lock.EnterReadLock();
    try
    {
        return _hashSet.FirstOrDefault(predicate);
    }
    finally
    {
        if (_lock.IsReadLockHeld) _lock.ExitReadLock();
    }
}
So, in case you have a ConcurrentHashSet<Player>, you could use it for instance as:
var player = concurrentHashSetOfPlayers.TryGetFirstOrDefault(x => x.Id == playerId);
.Net Collection for atomizing T?
You can take the source code of HashSet<T> from the Reference Source and write your own GetOrAdd method.
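Short of editing the copied source, a lock-based sketch with a Dictionary<T, T> (each value mapped to its canonical instance) gives the same GetOrAdd semantics; this is an illustration of the idea, not the Reference Source approach (HashSet<T>.TryGetValue, available from .NET Framework 4.7.2 onward, would also work):

```csharp
using System.Collections.Generic;

public class Atomizer<T>
{
    // Maps each value to its canonical (first-seen) instance.
    private readonly Dictionary<T, T> _items = new Dictionary<T, T>();
    private readonly object _sync = new object();

    // Returns the stored instance equal to 'value', adding it if absent.
    public T GetOrAdd(T value)
    {
        lock (_sync)
        {
            if (_items.TryGetValue(value, out T existing))
                return existing;

            _items.Add(value, value);
            return value;
        }
    }
}
```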
C#: volatile reads and writes of HashSet
If you mean "will Check always use the most up-to-date version of the field?", then yes: as a side-effect of volatility this will be the case, and swapping the entire reference is much cheaper than constantly synchronizing (.NET ensures you can't have a torn reference, so the reference swap is guaranteed to be atomic).
Note: the thread-safety in this scenario is strictly dependent on the fact that the hash-set is not mutated after it has been created and the reference swapped, which is what happens in the code in the question.
You can get the same result more conveniently, though, by declaring the field as volatile:
public class Checker
{
    private volatile HashSet<int> _hs = new HashSet<int>();

    public bool Check(int a) => _hs.Contains(a);

    public void Update(IEnumerable<int> items) => _hs = new HashSet<int>(items);
}
Could I have problems if I add elements to a HashSet from a Parallel.ForEach?
It's my assumption that switching thread contexts and synchronizing is more expensive than the HashSet<T>.Add() call itself, even for large sets filled from a for loop. However, the fastest way to add many members at once is a bulk operation, which for HashSet<T> is UnionWith (it has no AddRange), so I would optimize the code further by building a new collection per worker, provided it is significantly smaller than the target collection, and then merging it into the target with a single UnionWith call.
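A hedged sketch of that pattern: Parallel.ForEach's thread-local overload lets each worker fill its own private set, so only the one merge per thread needs the lock (the squaring body is just a stand-in for real per-item work):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

class Demo
{
    static HashSet<int> BuildSet(IEnumerable<int> source)
    {
        var target = new HashSet<int>();
        var sync = new object();

        Parallel.ForEach(
            source,
            // Each worker thread starts with its own private HashSet...
            () => new HashSet<int>(),
            (item, state, local) =>
            {
                local.Add(item * item); // placeholder per-item work
                return local;
            },
            // ...and only the final per-thread merge takes the lock.
            local => { lock (sync) target.UnionWith(local); });

        return target;
    }
}
```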
C#: Is Hashtable thread safe?
You are correct: Hashtable and Dictionary are not thread safe. A Hashtable can be made (sort of) thread safe using the Hashtable.Synchronized method. This creates a wrapper around the Hashtable that allows only one writer and multiple concurrent readers. But:
The wrapper is not fully thread safe: one thread could iterate the collection while another thread concurrently modifies it, resulting in an exception.
Enumerating through a collection is intrinsically not a thread-safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads.
-- https://docs.microsoft.com/en-us/dotnet/api/system.collections.hashtable.synchronized?view=netcore-3.1
Depending on the situation, I would pick a collection from the System.Collections.Concurrent namespace, which is best described here: https://docs.microsoft.com/en-us/dotnet/standard/collections/thread-safe/
Bear in mind though that these collections are not the fastest - if speed is important I would recommend tailoring one to your specific needs.
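To make the enumeration caveat concrete, here is a sketch using the real Hashtable.Synchronized and SyncRoot members; the lock around the loop is what the synchronized wrapper does not do for you:

```csharp
using System.Collections;

class Demo
{
    static void Main()
    {
        // Hashtable.Synchronized wraps the table; individual writes are serialized...
        Hashtable table = Hashtable.Synchronized(new Hashtable());
        table["key"] = 42;

        // ...but enumeration is still not thread safe: hold SyncRoot for
        // the whole loop if other threads may be writing concurrently.
        lock (table.SyncRoot)
        {
            foreach (DictionaryEntry entry in table)
            {
                // safe to read entry here
            }
        }
    }
}
```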