C++11 Thread Safety of Random Number Generators

C++11 Thread safety of Random number generators

The C++11 standard library is broadly thread safe. The thread safety guarantees on PRNG objects are the same as on containers. More specifically, since the PRNG classes are all pseudo-random, i.e. they generate a deterministic sequence based on a definite current state, there is really no room to be peeking or poking at anything outside the contained state (which is also visible to the user).

Just as containers need locks to make them safe to share, you would have to lock the PRNG object. This would make it slow and nondeterministic. One object per thread would be better.

§17.6.5.9 [res.on.data.races]:

1 This section specifies requirements that implementations shall meet
to prevent data races (1.10). Every standard library function shall
meet each requirement unless otherwise specified. Implementations may
prevent data races in cases other than those specified below.

2 A C++ standard library function shall not directly or indirectly
access objects (1.10) accessible by threads other than the current
thread unless the objects are accessed directly or indirectly via the
function’s argu- ments, including this.

3 A C++ standard library function shall not directly or indirectly
modify objects (1.10) accessible by threads other than the current
thread unless the objects are accessed directly or indirectly via the
function’s non- const arguments, including this.

4 [ Note: This means, for example, that implementations can’t use a
static object for internal purposes without synchronization because it
could cause a data race even in programs that do not explicitly share
objects betweenthreads. —endnote]

5 A C++ standard library function shall not access objects indirectly
accessible via its arguments or via elements of its container
arguments except by invoking functions required by its specification
on those container elements.

6 Operations on iterators obtained by calling a standard library
container or string member function may access the underlying
container, but shall not modify it. [Note: In particular, container
operations that invalidate iterators conflict with operations on
iterators associated with that container. — end note ]

7 Implementations may share their own internal objects between threads
if the objects are not visible to users and are protected against data
races.

8 Unless otherwise specified, C++ standard library functions shall
perform all operations solely within the current thread if those
operations have effects that are visible (1.10) to users.

9 [ Note: This allows implementations to parallelize operations if
there are no visible side effects. — end note ]

How do I generate thread-safe uniform random numbers?

Have you tried this?

int intRand(const int & min, const int & max) {
    static thread_local std::mt19937 generator;
    std::uniform_int_distribution<int> distribution(min,max);
    return distribution(generator);
}

Distributions are extremely cheap (they will be completely inlined by the optimiser so that the only remaining overhead is the actual random number rescaling). Don’t be afraid to regenerate them as often as you need – in fact, resetting them would conceptually be no cheaper (which is why that operation doesn’t exist).

The actual random number generator, on the other hand, is a heavy-weight object carrying a lot of state and requiring quite some time to be constructed, so that should only be initialised once per thread (or even across threads, but then you’d need to synchronise access which is more costly in the long run).

c++ generate thread safe random numbers

I would probably generate the seeds in main, and pass a seed to each thread function. I wouldn't use the output of std::random_device directly either--I'd put numbers into something like an std::set or std::unordered_set until I got as many seeds as I wanted, to assure that I didn't give two threads the same seed (which would obviously be a waste of time).

Something along this general line:

int do_work(unsigned long long seed) {

  //Get random number generators
  typedef std::mt19937 MyRNG;

  //seed generator
  MyRNG rng(seed);

  //make my uniform distributions for each parameter
  std::uniform_real_distribution<> param1(-1,1);
  std::uniform_real_distribution<> param2(-1,1);

  double x,y;
  //Do my scan
  for (int i = 0; i < N; i++) {
    x = param1(rng);
    y = param2(rng);

   //Do things with x and y*
  }
}

static const int num_threads = 4;

int main() {
    std::set<unsigned long long> seeds;

    while (seeds.size() < num_threads)
       seeds.insert(std::random_device()());

    std::vector<std::thread> threads;

   for (auto const seed: seeds)
       threads.emplace_back(std::thread(do_work, seed));

    for (auto &t : threads)
        t.join();
}

As an aside, using a single result from random_device to seed an std::mt19937 restricts the generator quite a bit--you're giving it only 32 (or possibly 64) bits of seed, but it actually has 19937 bits of seed material. std::seed_seq attempts to ameliorate this to at least some degree (among other things, you can use a number of outputs from std::random_device to create the seed.

Oh, and given that your two instances of uniform_real_distribution use the same parameters, there's probably not a whole lot of need for two separate distribution objects either.

Why isn't this random number generator thread-safe?

There are three observations about your test output I would make:

It has much stronger variance than a good random source's average should provide. You observed this yourself by comparing to the single thread results.
The calculated average decreases with thread count and never reaches the original 0.5 (i.e. it's not just higher variance but also changed mean).
There is a temporal relation in the data, particularly visible in the 4 thread case.

All this is explained by the race condition present in your code: You assign to SUM from multiple threads. Incrementing a double is not an atomic operation (even on x86, where you'll probably get atomic reads and writes on registers). Two threads may read the current value (e.g. 10), increment it (e.g. both add 0.5) and then write the value back to memory. Now you have two threads writing a 10.5 instead of the correct 11.

The more threads try to write to SUM concurrently (without synchronization), the more of their changes are lost. This explains all observations:

How hard the threads race each other in each individual run determines how many results are lost, and this can vary from run to run.
The average is lower with more races (for example more threads) because more results are lost. You can never exceeed the statistical 0.5 average because you only ever lose writes.
As the threads and scheduler "settle in", the variance is reduced. This is a similar reason to why you should "warm up" your tests when benchmarking.

Needless to say, this is undefined behavior. It just shows benign behavior on x86 CPUs, but this is not something the C++ standard guarantees you. For all you know, the individual bytes of a double might be written to by different threads at the same time resulting in complete garbage.

The proper solution would be adding the doubles thread-locally and then (with synchronization) adding them together in the end. OMP has reduction clauses for this specific purpose.

For integral types, you could use std::atomic<IntegralType>::fetch_add(). std::atomic<double> exists but (before C++20) the mentioned function (and others) are only available for integral types.

Is random number generation in multiple threads using C++11 random library slow just like using rand() in multiple threads?

Yes and no. Most of the C++11 random number generators are objects that encapsulate their own state, so as long as you create a separate generator object for each thread, each should be able to operate independently of the others (so you don't need any locking).

The specific case of std::random_device is a little different though: this is intended (but not guaranteed) to obtain truly non-deterministic data from some sort of random number generation hardware. The driver for that device may well impose locking requirements of its own -- and it's often fairly low bandwidth as well, so if you want to obtain a lot of numbers quickly, it's usually a poor choice.

In a typical case, you want to use one generator (other than std::random_device) per thread, and use std::random_device only to provide the initial seeds for those other generators. This may impose locks during that initialization, but then allows each thread to generate its random numbers without any interlock with other threads.

C++ thread-safe uniform distribution random number generation

Based on the already mentioned solution, here is a version adapted to your specific needs:

double doubleRand(double min, double max) {
    thread_local std::mt19937 generator(std::random_device{}());
    std::uniform_real_distribution<double> distribution(min, max);
    return distribution(generator);
}

Safe random number generation using rand() or std::random_device in C++

Random number generator engines are not thread safe.

Using thread_local to initialize one engine per thread makes it thread safe, so I would say your solution is good.

If you have doubts about the thread safety of random_device, maybe make it thread_local also:

int randomInRange(const int min, const int max)
{
    thread_local std::random_device rd;
    thread_local std::mt19937 rng(rd());  
    thread_local std::uniform_int_distribution<int> uid;
    return uid(rng, decltype(uid)::param_type{min,max});
}

There are more threads that talk about this that weren't mentioned:

C++ thread-safe uniform distribution random number generation

C++11 Thread safety of Random number generators

I'm sure that there are more.

Thread-safe using of random

You're never meant to use a PRNG for long enough that you can see a perfectly uniform distribution from it. You should only see a random approximation of a uniform distribution. The fact that giving every thread its own PRNG means it would take numThreads times longer for every thread to see that perfect distribution does not matter.

PRNGs are subject to mathematical analysis to confirm that they produce each possible output the same number of times, but this does not reflect how they're meant to be used.

If they were used this way it would represent a weakness. If you watch it for long enough you know that having seen output x n times, and every other output n+1 times, the next output must be x. This is uniform but it is obviously not random.

A true RNG will never produce a perfectly uniform output, but it should also never show the same bias twice. A PRNG known to have non-uniform output will have the same non-uniform output every time it's used. This means that if you run simulations for longer to average out the noise, the PRNG's own bias will eventually become the most statistically significant factor. So an ideal PRNG should eventually emit a perfectly uniform distribution (actually often not perfect, but within a known, very small margin) over a sufficiently long period.

Your random seed will pick a random point somewhere in that sequence and your random number requests will proceed some way along that sequence, but if you ever find yourself going all the way around and coming back to your own start point (or even within 1/10th that distance) then you need to get a bigger PRNG.

That said, there is another consideration. Often PRNGs are used specifically so that their results will be predictable.

If you're in a situation where you need a mutex to safely access a single PRNG, then you've probably already lost your predictable behaviour because you can't tell which thread got the next [predictable] random number. Each thread still sees an unpredictable sequence because you can't be sure which subset of PRNG results it got.

If every thread has its own PRNG, then (so long as other ordering constraints are met as necessary) you can still get predictable results.

Thread safe random numbers

boost::random has a number of generators which are objects. The
simplest solution would be to simply use a distinct generator for each
thread.

C++11 Thread Safety of Random Number Generators