Concurrent HashSet<T> in .NET Framework

Concurrent HashSet<T> in .NET Framework?

Your implementation is correct. Unfortunately, the .NET Framework does not provide a built-in concurrent hashset type. However, there are some workarounds.

ConcurrentDictionary (recommended)

The first one is to use the class ConcurrentDictionary<TKey, TValue> in the namespace System.Collections.Concurrent. In this case, the value is pointless, so we can use a simple byte (1 byte in memory).

private ConcurrentDictionary<string, byte> _data;

This is the recommended option because the type is thread-safe and provides you the same advantages as a HashSet<T>, except that key and value are different objects.

Source: Social MSDN
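As a sketch of how this plays out in practice (the TryAdd/ContainsKey/TryRemove calls are the real ConcurrentDictionary API; the wrapper class name is my own invention):

```csharp
using System.Collections.Concurrent;

// A minimal set-like wrapper over ConcurrentDictionary<T, byte>.
// The byte value is a throwaway; only the keys matter.
public class ConcurrentDictionarySet<T>
{
    private readonly ConcurrentDictionary<T, byte> _data =
        new ConcurrentDictionary<T, byte>();

    // TryAdd returns false if the key is already present,
    // mirroring HashSet<T>.Add semantics.
    public bool Add(T item) => _data.TryAdd(item, 0);

    public bool Contains(T item) => _data.ContainsKey(item);

    public bool Remove(T item) => _data.TryRemove(item, out _);

    public int Count => _data.Count;
}
```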

ConcurrentBag

If you don't mind duplicate entries, you can use the class ConcurrentBag<T>, in the same namespace as the previous class.

private ConcurrentBag<string> _data;
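A quick illustration of the duplicate-entries caveat (a small sketch; the helper class and method names are my own, but ConcurrentBag<T>.Add and Count are the real API):

```csharp
using System.Collections.Concurrent;

static class BagDemo
{
    // Unlike a set, ConcurrentBag<T>.Add never rejects duplicates,
    // so adding the same item twice really stores it twice.
    public static int CountAfterAddingTwice(string item)
    {
        var bag = new ConcurrentBag<string>();
        bag.Add(item);
        bag.Add(item);
        return bag.Count; // 2, not 1
    }
}
```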

Self-implementation

Finally, as you did, you can implement your own data type, using lock or any of the other mechanisms .NET provides for thread safety. Here is a great example: How to implement ConcurrentHashSet in .Net

The only drawback of this solution is that the type HashSet<T> doesn't officially support concurrent access, even for read operations.

I quote the code of the linked post (originally written by Ben Mosher).

using System;
using System.Collections.Generic;
using System.Threading;

namespace BlahBlah.Utilities
{
    public class ConcurrentHashSet<T> : IDisposable
    {
        private readonly ReaderWriterLockSlim _lock =
            new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
        private readonly HashSet<T> _hashSet = new HashSet<T>();

        #region Implementation of ICollection<T> ...ish
        public bool Add(T item)
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Add(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public void Clear()
        {
            _lock.EnterWriteLock();
            try
            {
                _hashSet.Clear();
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public bool Contains(T item)
        {
            _lock.EnterReadLock();
            try
            {
                return _hashSet.Contains(item);
            }
            finally
            {
                if (_lock.IsReadLockHeld) _lock.ExitReadLock();
            }
        }

        public bool Remove(T item)
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Remove(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public int Count
        {
            get
            {
                _lock.EnterReadLock();
                try
                {
                    return _hashSet.Count;
                }
                finally
                {
                    if (_lock.IsReadLockHeld) _lock.ExitReadLock();
                }
            }
        }
        #endregion

        #region Dispose
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected virtual void Dispose(bool disposing)
        {
            if (disposing)
                if (_lock != null)
                    _lock.Dispose();
        }

        ~ConcurrentHashSet()
        {
            Dispose(false);
        }
        #endregion
    }
}

EDIT: Moved the lock-entering calls outside the try blocks, because they could throw an exception, which would then execute the instructions contained in the finally blocks.

How to implement ConcurrentHashSet in .Net

I just ran into a similar scenario ("I am interested in a fast Add and Contains and Remove") and implemented this sucker:

using System;
using System.Collections.Generic;
using System.Threading;

namespace BlahBlah.Utilities
{
    public class ConcurrentHashSet<T> : IDisposable
    {
        private readonly ReaderWriterLockSlim _lock =
            new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
        private readonly HashSet<T> _hashSet = new HashSet<T>();

        #region Implementation of ICollection<T> ...ish
        public bool Add(T item)
        {
            try
            {
                _lock.EnterWriteLock();
                return _hashSet.Add(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public void Clear()
        {
            try
            {
                _lock.EnterWriteLock();
                _hashSet.Clear();
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public bool Contains(T item)
        {
            try
            {
                _lock.EnterReadLock();
                return _hashSet.Contains(item);
            }
            finally
            {
                if (_lock.IsReadLockHeld) _lock.ExitReadLock();
            }
        }

        public bool Remove(T item)
        {
            try
            {
                _lock.EnterWriteLock();
                return _hashSet.Remove(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public int Count
        {
            get
            {
                try
                {
                    _lock.EnterReadLock();
                    return _hashSet.Count;
                }
                finally
                {
                    if (_lock.IsReadLockHeld) _lock.ExitReadLock();
                }
            }
        }
        #endregion

        #region Dispose
        public void Dispose()
        {
            if (_lock != null) _lock.Dispose();
        }
        #endregion
    }
}

Haven't really tested it (performance- or reliability-wise). YMMV.

Is Contains thread safe in HashSet<T>?

Normally (normally) collections that are used only for reading are "unofficially" thread safe (there is no collection in .NET that I know of that modifies itself during a read). There are some caveats:

  • The items themselves might not be thread safe (though with a HashSet<T> this problem should be minimized, because you can't extract items from it; still, GetHashCode() and Equals() must be thread-safe. If, for example, they access lazy objects that are loaded on demand, they may not be thread-safe, or perhaps they cache/memoize some data to speed up subsequent operations)
  • You must be sure that after the last write there is a Thread.MemoryBarrier() (executed on the same thread as the write) or equivalent, otherwise a read on another thread could read incomplete data
  • You must be sure that on each thread (other than the one where you did the write), a Thread.MemoryBarrier() happens before the first read. Note that if the HashSet<T> was "prepared" (with the Thread.MemoryBarrier() at the end) before creating/starting the other threads, then the Thread.MemoryBarrier() isn't necessary, because the threads can't have a stale view of the memory (they didn't exist yet). Various operations cause an implicit Thread.MemoryBarrier(). For example, if the threads were created before the HashSet<T> was filled, entered a Wait() and were woken after the HashSet<T> was filled (plus its Thread.MemoryBarrier()), exiting the Wait() causes an implicit Thread.MemoryBarrier()
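The prepare-then-publish pattern described in these caveats can be sketched as follows (the class and method names are my own; Volatile.Write/Volatile.Read are a modern, real-API stand-in for the explicit Thread.MemoryBarrier() calls, providing release/acquire semantics on the reference):

```csharp
using System.Collections.Generic;
using System.Threading;

class Publisher
{
    private HashSet<int> _set;

    public void Prepare()
    {
        // 1. Build the set entirely on the publishing thread.
        var prepared = new HashSet<int> { 1, 2, 3 };

        // 2. Publish with release semantics; Volatile.Write plays the
        //    role of the Thread.MemoryBarrier() after the last write.
        Volatile.Write(ref _set, prepared);
    }

    public bool ReaderContains(int value)
    {
        // 3. Readers acquire the reference with Volatile.Read and then
        //    only ever call read-only members on the set.
        var s = Volatile.Read(ref _set);
        return s != null && s.Contains(value);
    }
}
```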

Here is a simple example of a class that uses memoization/lazy loading/whatever you want to call it, and in that way can break thread safety:

public class MyClass
{
    private long value2;

    public int Value1 { get; set; }

    // Value2 is lazily loaded in a very primitive
    // way (note that Lazy<T> *can* be used thread-safely!)
    public long Value2
    {
        get
        {
            if (value2 == 0)
            {
                // value2 is a long. If .NET is running as a 32-bit
                // process, the assignment of a long (64 bits) isn't atomic :)
                value2 = LoadFromServer();

                // If thread1 checks and sees value2 == 0 and loads it,
                // and then begins writing value2 = (value), but after
                // writing the first 32 bits of value2 we have that
                // thread2 reads value2, then thread2 will read
                // "incomplete" data. If this "incomplete" data is == 0
                // then a second LoadFromServer() will be done. If the
                // operation was repeatable then there won't be any
                // problem (other than time wasted). But if the
                // operation isn't repeatable, or if the incomplete
                // data that is read is != 0, then there will be a
                // problem (for example an exception if the operation
                // wasn't repeatable, or different data if the operation
                // wasn't deterministic, or incomplete data if the read
                // was != 0)
            }

            return value2;
        }
    }

    private long LoadFromServer()
    {
        // This is a slow operation that justifies a lazy property
        return 1;
    }

    public override int GetHashCode()
    {
        // GetHashCode doesn't use Value2, because it
        // wants to be fast
        return Value1;
    }

    public override bool Equals(object obj)
    {
        MyClass obj2 = obj as MyClass;

        if (obj2 == null)
        {
            return false;
        }

        // The equality operator uses Value2, because it
        // wants to be correct.
        // Note that the HashSet<T> probably doesn't need to
        // use the Equals method on Add, if there are no
        // other objects with the same GetHashCode
        // (and surely, if the HashSet is empty and you Add a
        // single object, that object won't be compared with
        // anything, because there isn't anything to compare
        // it with! :-) )

        // Clearly Equals is used by the Contains method
        // of the HashSet
        return Value1 == obj2.Value1 && Value2 == obj2.Value2;
    }
}

Locking HashSet for concurrency

No, it is not sufficient to lock only on Add.

The fact that it doesn't crash only tells you that it didn't crash during your test.

You cannot guarantee that:

  • It won't crash in the future
  • It will produce the correct results

A non-threadsafe data structure has no guarantees whatsoever if used in a multithreaded fashion.

You need to either:

  • Lock on every call to it
  • Use a threadsafe data structure, one that has been built to support this scenario

If you use a different data structure than a hashset, such as a dictionary, you may even need to lock across multiple statements, because this may still fail:

lock (dLock)
    if (d.ContainsKey("test"))
        return;

var value = ExpensiveCallToObtainValue();
lock (dLock)
    d.Add("test", value);

Between the call to ContainsKey and the call to Add another thread may have already inserted that key.

To handle this correctly without using a threadsafe data structure, contain both operations inside the same lock:

lock (dLock)
{
    if (!d.ContainsKey("test"))
        d.Add("test", ExpensiveCallToObtainValue());
}
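If a thread-safe data structure is an option, ConcurrentDictionary collapses the check-then-add pair into one atomic call. One documented caveat: the value factory may be invoked more than once under contention, so expensive or side-effecting factories should be side-effect free or wrapped in Lazy<T>. A sketch (the ExpensiveCallToObtainValue stand-in is hypothetical):

```csharp
using System.Collections.Concurrent;

static class GetOrAddDemo
{
    static readonly ConcurrentDictionary<string, int> d =
        new ConcurrentDictionary<string, int>();

    // Stand-in for the question's ExpensiveCallToObtainValue.
    static int ExpensiveCallToObtainValue() => 42;

    public static int GetValue(string key)
    {
        // One atomic call replaces the ContainsKey/Add pair, so there
        // is no window for another thread to insert the key in between.
        // Caveat: the factory delegate may run more than once if two
        // threads race, though only one result is ever stored.
        return d.GetOrAdd(key, _ => ExpensiveCallToObtainValue());
    }
}
```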

Using LINQ methods on a custom ConcurrentHashSet class?

Internally, the custom class you have posted uses a HashSet<T> to store the data. So you can still make use of the methods you mentioned, First and FirstOrDefault, provided that you do so in a thread-safe way. For instance, an implementation of FirstOrDefault could look like this:

public T TryGetFirstOrDefault()
{
    _lock.EnterReadLock();
    try
    {
        return _hashSet.FirstOrDefault();
    }
    finally
    {
        if (_lock.IsReadLockHeld) _lock.ExitReadLock();
    }
}

Update

You could generalize the above by passing a predicate:

public T TryGetFirstOrDefault(Func<T, bool> predicate)
{
    _lock.EnterReadLock();
    try
    {
        return _hashSet.FirstOrDefault(predicate);
    }
    finally
    {
        if (_lock.IsReadLockHeld) _lock.ExitReadLock();
    }
}

So in case you have a ConcurrentHashSet<Player>, you could use it for instance as:

var player = concurrentHashSetOfPlayers.TryGetFirstOrDefault(x => x.Id == playerId);

.Net Collection for atomizing T?

You can take the source code of HashSet<T> from Reference Source and write your own GetOrAdd method.
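If forking the reference source feels heavy, a simpler sketch of a GetOrAdd built on a plain HashSet<T> plus a lock may be enough for atomizing. The class and method names here are my own, and it assumes .NET Framework 4.7.2 or later (or .NET Core), which added HashSet<T>.TryGetValue for retrieving the stored instance:

```csharp
using System.Collections.Generic;

// Returns a canonical instance for each distinct value, so equal
// items always resolve to the same object reference (atomizing).
public class AtomizingSet<T>
{
    private readonly HashSet<T> _set = new HashSet<T>();
    private readonly object _gate = new object();

    // Returns the instance already stored if an equal item exists,
    // otherwise stores and returns the supplied one.
    public T GetOrAdd(T item)
    {
        lock (_gate)
        {
            if (_set.TryGetValue(item, out T existing))
                return existing;
            _set.Add(item);
            return item;
        }
    }
}
```

For example, atomizing strings this way guarantees that two equal but distinct string instances both resolve to the first one stored.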

C#: volatile reads and writes of HashSet

If you mean "will Check always use the most up to date version of the field", then yes, as a side-effect of volatility this will be the case - and swapping the entire reference is much cheaper than constantly synchronizing (.NET ensures you can't have a torn reference so the reference swap is guaranteed to be atomic).

Note: the thread-safety in this scenario is strictly dependent on the fact that the hash-set is not mutated after it has been created and the reference swapped, which is what happens in the code in the question.

You can get the same result more conveniently, though, by declaring the field as volatile:

public class Checker
{
    private volatile HashSet<int> _hs = new HashSet<int>();

    public bool Check(int a) => _hs.Contains(a);

    public void Update(IEnumerable<int> items) => _hs = new HashSet<int>(items);
}

Could I have problems if I add elements to a HashSet from a Parallel.ForEach?

It's my assumption that switching thread contexts and synchronizing is more expensive than a HashSet<T>.Add() call, even for large sets filled from a for loop. However, the fastest way to add many members to a HashSet<T> is a single bulk operation (UnionWith, the set equivalent of List<T>.AddRange), so I would optimize the code even further by building a new collection, if it would be significantly smaller than the target collection, and then merging it in with a single call to HashSet<T>.UnionWith.
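The idea of filling per-thread collections and merging them once can be sketched with the thread-local overload of Parallel.ForEach (the class and method names are my own; the overload and UnionWith are real APIs):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelFill
{
    public static HashSet<int> Fill(IEnumerable<int> source)
    {
        var target = new HashSet<int>();
        var gate = new object();

        Parallel.ForEach(
            source,
            // Each worker thread gets its own local set: no contention
            // on the hot path.
            () => new HashSet<int>(),
            (item, state, local) => { local.Add(item); return local; },
            // Merge each local set into the target exactly once,
            // under the lock, with a single bulk UnionWith.
            local => { lock (gate) target.UnionWith(local); });

        return target;
    }
}
```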

C#: Is Hashtable thread safe?

You are correct: Hashtable and Dictionary are not thread safe. Hashtable can be made (sort of) thread safe using the Hashtable.Synchronized method. This creates a wrapper around the Hashtable allowing only one writer and multiple readers to access it. But:

The wrapper around the Hashtable is not fully thread safe: one thread could iterate the collection while another thread concurrently changes it, resulting in an exception.

Enumerating through a collection is intrinsically not a thread-safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads.

-- https://docs.microsoft.com/en-us/dotnet/api/system.collections.hashtable.synchronized?view=netcore-3.1

Depending on the situation, I would pick a collection from the System.Collections.Concurrent namespace, which is best described here: https://docs.microsoft.com/en-us/dotnet/standard/collections/thread-safe/

Bear in mind though that these collections are not the fastest - if speed is important I would recommend tailoring one to your specific needs.
