Unordered_Map Thread Safety

unordered_map thread safety

STL containers are designed so that you are guaranteed to be able to have:

A. Multiple threads reading at the same time

or

B. One thread writing, with no other threads reading or writing at the same time

Having multiple threads writing is not one of the above conditions and is not allowed. Multiple threads writing will thus create a data race, which is undefined behavior.

You could use a mutex to fix this. A shared_mutex (combined with shared_locks) would be especially useful as that type of mutex allows multiple concurrent readers.

http://eel.is/c++draft/res.on.data.races#3 is the part of the standard which guarantees the ability to concurrently use const functions on different threads. http://eel.is/c++draft/container.requirements.dataraces specifies some additional non-const operations which are safe on different threads.

Multithreading unordered_map

Here is a utility class:

template<class T, class M=std::mutex, template<class...>class S=std::unique_lock, template<class...>class U=std::unique_lock>
struct mutex_guarded {
    template<class F>
    auto read( F&& f ) const
    -> typename std::result_of<F&&(T const&)>::type
    {
        auto l = lock();
        return std::forward<F>(f)(data);
    }
    template<class F>
    auto write( F&& f )
    -> typename std::result_of<F&&(T&)>::type
    {
        auto l = lock();
        return std::forward<F>(f)(data);
    }
    mutex_guarded(mutex_guarded&&)=delete;
    mutex_guarded& operator=(mutex_guarded&&)=delete;

    template<class...Args>
    mutex_guarded( Args&&...args ):
        data( std::forward<Args>(args)... )
    {}
private:
    mutable M m;
    T data;

    U<M> lock() { return U<M>(m); }
    S<M> lock() const { return S<M>(m); }
};

Especially in C++14, this lets you interact with a mutex-guarded instance of T in a way that is easy to write correctly.

In C++14 you can use std::shared_timed_mutex, and in C++17 std::shared_mutex, like this:

template<class T>
using rw_guarded = mutex_guarded< T, std::shared_mutex, std::shared_lock >;

This enables many readers at once. But you should first determine whether the simple mutex version is fast enough.

struct cache {
    using Key=std::string;
    using Value=int;
    using Map=std::unordered_map< Key, Value >;
    Value get( Key const& k ) {
        Value const* r = table.read([&](Map const& m)->Value const*{
            auto it = m.find(k);
            if (it == m.end()) return nullptr;
            return std::addressof( it->second );
        });
        if (r) return *r;
        return table.write([&](Map& m)->Value{
            auto it = m.find(k);
            if (it != m.end()) return it->second;
            auto r = m.insert( std::make_pair(k, 42) ); // construct data here
            return r.first->second;
        });
    }
private:
    mutex_guarded< Map > table;
};

Upgrade mutex_guarded to rw_guarded and it switches to reader-writer locks.


Here is a more complex version:

  1. Keep two maps: one from key to value, and one from key to a shared future of the value.

  2. Use a reader-writer lock (aka shared mutex) for the first map.

  3. To get: take a shared lock on map 1 and check whether the value is there. If it is, return it.

  4. Otherwise, unlock map 1 and lock map 2 for writing. If there isn't already a shared future under the key, add one. Unlock map 2, and wait on the shared future regardless of whether you added it.

  5. When done, lock map 1 for reading and check whether the result is already there. If yes, return it. If not, unlock, relock for writing, move the data into map 1 if it's not already there, and return the data in map 1.

This is designed to minimize the period during which map 1 is locked exclusively, allowing maximum concurrency there.

Other designs will optimize other considerations.
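The steps above can be sketched as follows (C++17; the class name, the placeholder compute(), and the value types are illustrative, and error handling is omitted):

```cpp
#include <future>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Illustrative sketch of the two-map design described above; compute()
// stands in for whatever expensive work produces a value.
struct future_cache {
    using Key = std::string;
    using Value = int;

    Value get(Key const& k) {
        {   // Step 3: shared lock on map 1; return if already cached.
            std::shared_lock lock(value_mtx);
            auto it = values.find(k);
            if (it != values.end()) return it->second;
        }
        std::shared_future<Value> fut;
        std::promise<Value> mine;
        bool owner = false;
        {   // Step 4: lock map 2 for writing; add a shared future if absent.
            std::unique_lock lock(future_mtx);
            auto it = futures.find(k);
            if (it == futures.end()) {
                owner = true;
                it = futures.emplace(k, mine.get_future().share()).first;
            }
            fut = it->second;
        }
        if (owner)                      // compute outside both locks
            mine.set_value(compute(k));
        Value v = fut.get();            // wait regardless of who added it
        {   // Step 5a: shared lock map 1; someone may have published already.
            std::shared_lock lock(value_mtx);
            auto it = values.find(k);
            if (it != values.end()) return it->second;
        }
        {   // Step 5b: exclusive lock; publish if still absent.
            std::unique_lock lock(value_mtx);
            return values.emplace(k, v).first->second;
        }
    }
private:
    Value compute(Key const& k) { return static_cast<Value>(k.size()); } // placeholder

    std::unordered_map<Key, Value> values;                      // map 1: key -> value
    std::unordered_map<Key, std::shared_future<Value>> futures; // map 2: key -> future
    std::shared_mutex value_mtx;
    std::mutex future_mtx;
};
```

Only the thread that inserted the future computes the value; everyone else blocks on fut.get() without holding either lock.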

Do not use operator[]. Do not interact with either map without holding a lock of some kind. Know which locks correspond to which map. Note that reading elements (as opposed to looking them up) can be done without a lock in some cases; sometimes you need to read a copy of the shared data rather than the shared data itself. Look up the documentation of every type involved to determine which operations need which locks.

is unordered_set thread safe if data race is irrelevant?

All standard containers are safe to read from concurrently and you can concurrently modify different elements of the same container, but not the same element:
https://en.cppreference.com/w/cpp/container#Thread_safety

You need to synchronize writes to the container if there is a possibility of overlap, to avoid undefined behavior (e.g. using std::mutex). (The answer to your second question is yes, but mySet might be corrupted by concurrent writes, so you cannot assume it will contain the inserted values.)

is local unordered_map variable thread safe in c++

Each invocation of a function has its own local variables. You can safely call this function from several threads.
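For instance (a sketch; the function name is illustrative), each call builds its own map on its own stack, so threads never share it and no locking is needed:

```cpp
#include <string>
#include <thread>
#include <unordered_map>

// Each invocation gets its own local map; concurrent calls from
// different threads touch different objects, so no synchronization
// is required.
std::size_t distinct_chars(std::string const& s) {
    std::unordered_map<char, int> counts;  // local: one per call, per thread
    for (char c : s) ++counts[c];
    return counts.size();
}

void demo() {
    std::size_t a = 0, b = 0;
    std::thread t1([&] { a = distinct_chars("hello"); });
    std::thread t2([&] { b = distinct_chars("world"); });
    t1.join();
    t2.join();
    // a and b were computed safely without any locks
}
```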

C++ stl unordered_map, thread safety where each thread accesses only its own assigned key and may edit that value

The answer to this question can be found in [res.on.data.races]/3:

A C++ standard library function shall not directly or indirectly modify objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's non-const arguments, including this.

Furthermore, [container.requirements.dataraces]/1 states:

For purposes of avoiding data races ([res.on.data.races]), implementations shall consider the following functions to be const: begin, end, rbegin, rend, front, back, data, find, lower_bound, upper_bound, equal_range, at and, except in associative or unordered associative containers, operator[].

Since unordered_map::operator[] is non-const, it is legal for an implementation to modify the unordered_map when a call to operator[] occurs. You should instead use unordered_map::find, which is explicitly required to be treated as const, and hence will not modify the unordered_map:

map.find(key)->second = new vector<MyClass*>;

(As a side note, your suggested design looks like a recipe for memory leaks. Why not make it an unordered_map<int, std::unique_ptr<std::vector<MyClass*>>> or an unordered_map<int, std::vector<MyClass*>>?)
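A sketch of the second suggestion (the type and function names are illustrative): storing the vector by value lets the map own the storage, so nothing needs to be deleted manually.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct MyClass { int x; };

// The map owns each vector by value: erasing a key or destroying the
// map frees the vector automatically, so there is no leak to manage.
std::unordered_map<int, std::vector<MyClass*>> table;

void add(int key, MyClass* p) {
    // operator[] default-constructs an empty vector on first use
    // (fine single-threaded; under concurrency, guard with a lock).
    table[key].push_back(p);
}
```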

How to use `std::unordered_map` in a thread-safe way?

Now, because operator[] returns a reference, and they both will act on the same reference, I think that this could still be a data race. Is this right?

This is right assuming T isn't a thread-safe type.

This isn't a data race in the operation of the container, but a data race in the use of the object of type T.

I have this question. Would it be correct to use at() on the threads

If you only call the const qualified at, then you never get a non-const reference and thus cannot modify the value of any element. If you don't modify the values, then you don't have a data race.

If you use the non-const qualified at, then there is no difference in regards to the described data race.

P.S. You'll find that you wouldn't be able to have any synchronised non-const member functions because locking a mutex requires modification. You can get around that using mutable.
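A minimal illustration of that last point (the class is hypothetical): marking the mutex mutable lets const members lock it, because locking modifies the mutex, not the logical state of the object.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

class counter_map {
public:
    // A const member can still lock because the mutex is mutable.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return counts_.size();
    }
    void bump(std::string const& key) {
        std::lock_guard<std::mutex> lock(mtx_);
        ++counts_[key];
    }
private:
    mutable std::mutex mtx_;  // without mutable, size() would not compile
    std::unordered_map<std::string, int> counts_;
};
```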

Thread safety of boost::unordered_map int, struct and shared_mutex

The demuxer looks correct to me. There are a few inefficiencies though:

  1. You don't need to count before you erase. Just erase. If an element is not present, this will do nothing. That saves you one lookup. Likewise, don't use count followed by at. Use find (see below for the use).

  2. You may want to move as much work as possible out of the critical section. For example, in onUserConnected you could create the TsStreams object before acquiring the lock.

  3. Note that changing an unordered map will never invalidate pointers or references to elements in the map unless they are erased. That means in onUserData you don't have to hold the lock on the map while parsing the packet.

That is, assuming you don't call onUserData for the same user from two different threads. You could prevent this by introducing a second lock in the TsStreams object. Likewise, you should guard against erasing the element while another thread may still be parsing the last packet. I would use a shared_ptr for this. Something like this:

class Demuxer {
    ...
    boost::unordered_map<int, boost::shared_ptr<TsStreams> > usersData;
    boost::shared_mutex mtx_;
};

void Demuxer::onUserData(Data data) {
    boost::shared_lock<boost::shared_mutex> maplock(mtx_);
    auto found = usersData.find(data.socket);
    if (found == usersData.end())
        return;
    boost::shared_ptr<TsStreams> stream = found->second;
    boost::unique_lock<boost::recursive_mutex> datalock(stream->mtx_);
    maplock.unlock();
    tsDemuxer.parsePacket(data.socket, *stream, (uint8_t *) data.buffer, data.length);
}

If you reduce the time the Demuxer lock is held with this approach, you should probably replace the shared mutex with a normal one. Shared mutexes have much higher overhead and are not worth it for such short critical sections.

The TsDemuxer looks a bit wonky:

In TsDemuxer::parsePacket you never unlock the mutex. Shouldn't that be a unique_lock? Likewise, in parseStream the unlock seems unpaired. In general, using a unique_lock object is always the way to go compared to manual locking and unlocking. If anything, lock and unlock the unique_lock, not the mutex.
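A sketch of that pattern with std::mutex (the same applies to the boost types; the names here are illustrative): lock and unlock the unique_lock, never the mutex directly.

```cpp
#include <mutex>

std::mutex m;
int shared_state = 0;

void update_and_work() {
    std::unique_lock<std::mutex> lock(m);  // acquires m
    ++shared_state;                        // critical section
    lock.unlock();                         // explicit early release via the guard
    // ... long-running work that needs no lock ...
    // If we returned or threw before unlock(), the destructor would
    // release m; after unlock(), the destructor does nothing. Either
    // way, no manual m.unlock() call can be missed or duplicated.
}
```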

Remarks unrelated to multithreading

  1. stream.buffer.clear() is more efficient than stream.buffer = vector<char>(), because clear() reuses the buffer memory instead of deallocating it completely.

  2. As others have noted, these parts of boost are now part of the standard library. Replace boost:: with std:: and enable a recent C++ standard like C++14 or 17 and you are fine. At worst you have to replace shared_mutex with shared_timed_mutex.

  3. In Demuxer, you pass the User and Data objects by value. Are you sure those shouldn't be const references?

Is write-only to a shared std::unordered_map thread safe?

No, it's not thread safe. The std::unordered_map class doesn't provide any special thread-safety guarantees, just the same ordinary level of thread-safety that all standard classes provide by default. That means that it is not safe for one thread to access the structure in any way while another thread is, or might be, modifying it.


