Why Is the New Random Library Better Than Std::Rand()

Why is the new random library better than std::rand()?

Pretty much any implementation of "old" rand() use an LCG; while they are generally not the best generators around, usually you are not going to see them fail on such a basic test - mean and standard deviation is generally got right even by the worst PRNGs.

Common failings of "bad" - but common enough - rand() implementations are:

  • low randomness of low-order bits;
  • short period;
  • low RAND_MAX;
  • some correlation between successive extractions (in general, LCGs produce numbers that are on a limited number of hyperplanes, although this can be somehow mitigated).

Still, none of these are specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algoritmically speaking, obtain a state of the art PRNG with no changes of interface, so no test like the one you did would show any weakness in the output.

Edit: @R. correctly notes that the rand/srand interface is limited by the fact that srand takes an unsigned int, so any generator an implementation may put behind them is intrinsically limited to UINT_MAX possible starting seeds (and thus generated sequences). This is true indeed, although the API could be trivially extended to make srand take an unsigned long long, or adding a separate srand(unsigned char *, size_t) overload.


Indeed, the actual problem with rand() is not much of implementation in principle but:

  • backwards compatibility; many current implementations use suboptimal generators, typically with badly chosen parameters; a notorious example is Visual C++, which sports a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);
  • simplistic interface; rand() provides a single generator with the global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:

    • with multithreaded code: to fix it you either need a global mutex - which would slow down everything for no reason and kill any chance of repeatability, as the sequence of calls becomes random itself -, or thread-local state; this last one has been adopted by several implementations (notably Visual C++);
    • if you want a "private", reproducible sequence into a specific module of your program that doesn't impact the global state.

Finally, the rand state of affairs:

  • doesn't specify an actual implementation (the C standard provides just a sample implementation), so any program that is intended to produce reproducible output (or expect a PRNG of some known quality) across different compilers must roll its own generator;
  • doesn't provide any cross-platform method to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).

Hence the new <random> header, which tries to fix this mess providing algorithms that are:

  • fully specified (so you can have cross-compiler reproducible output and guaranteed characteristics - say, range of the generator);
  • generally of state-of-the-art quality (from when the library was designed; see below);
  • encapsulated in classes (so no global state is forced upon you, which avoids completely threading and nonlocality problems);

... and a default random_device as well to seed them.

Now, if you ask me I would have liked also a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python does provide the "complicated" API, but also the trivial random.randint & Co. using a global, pre-seeded PRNG for us uncomplicated people who'd like not to drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards), but it's true that you can easily build it by yourself over the current facilities (while building the "full" API over a simplistic one wouldn't be possible).


Finally, to get back to your performance comparison: as others have specified, you are comparing a fast LCG with a slower (but generally considered better quality) Mersenne Twister; if you are ok with the quality of an LCG, you can use std::minstd_rand instead of std::mt19937.

Indeed, after tweaking your function to use std::minstd_rand and avoid useless static variables for initialization

int getRandNum_New() {
static std::minstd_rand eng{std::random_device{}()};
static std::uniform_int_distribution<int> dist{0, 5};
return dist(eng);
}

I get 9 ms (old) vs 21 ms (new); finally, if I get rid of dist (which, compared to the classic modulo operator, handles the distribution skew for output range not multiple of the input range) and get back to what you are doing in getRandNum_Old()

int getRandNum_New() {
static std::minstd_rand eng{std::random_device{}()};
return eng() % 6;
}

I get it down to 6 ms (so, 30% faster), probably because, unlike the call to rand(), std::minstd_rand is easier to inline.


Incidentally, I did the same test using a hand-rolled (but pretty much conforming to the standard library interface) XorShift64*, and it's 2.3 times faster than rand() (3.68 ms vs 8.61 ms); given that, unlike the Mersenne Twister and the various provided LCGs, it passes the current randomness test suites with flying colors and it's blazingly fast, it makes you wonder why it isn't included in the standard library yet.

Why is the use of rand() considered bad?

There are two parts to this story.

First, rand is a pseudorandom number generator. This means it depends on a seed. For a given seed it will always give the same sequence (assuming the same implementation). This makes it not suitable for certain applications where security is of a great concern. But this is not specific to rand. It's an issue with any pseudo-random generator. And there are most certainly a lot of classes of problems where a pseudo-random generator is acceptable. A true random generator has its own issues (efficiency, implementation, entropy) so for problems that are not security related most often a pseudo-random generator is used.

So you analyzed your problem and you conclude a pseudo-random generator is the solution. And here we arrive to the real troubles with the C random library (which includes rand and srand) that are specific to it and make it obsolete (a.k.a.: the reasons you should never use rand and the C random library).

  • One issue is that it has a global state (set by srand). This makes it impossible to use multiple random engines at the same time. It also greatly complicates multithreaded tasks.

  • The most visible problem of it is that it lacks a distribution engine: rand gives you a number in interval [0 RAND_MAX]. It is uniform in this interval, which means that each number in this interval has the same probability to appear. But most often you need a random number in a specific interval. Let's say [0, 1017]. A commonly (and naive) used formula is rand() % 1018. But the issue with this is that unless RAND_MAX is an exact multiple of 1018 you won't get an uniform distribution.

  • Another issue is the Quality of Implementation of rand. There are other answers here detailing this better than I could, so please read them.

In modern C++ you should definitely use the C++ library from <random> which comes with multiple random well-defined engines and various distributions for integer and floating point types.

What difference between rand() and random() functions?

randomize() and random() are not part of the standard library. Perhaps your teacher wrote functions with these names for use in your class, or maybe you really mean random() and srandom() which are part of POSIX and not available on Windows. rand() and srand() are part of the standard library and will be provided by any standard conforming implementation of C++.


You should avoid rand() and srand() and use the new C++11 <random> library. <random> was added as part of the C++11 standard (and VS2012 does provide it).

Video explaining why: rand() Considered Harmful

  • rand() is sometimes a low quality pRNG and not suitable for applications that need a reasonable level of unpredictability. <random> provides a variety of engines with different characteristics suitable for many different use cases.

  • Converting the results of rand() into a number you can use directly usually relies on code that is difficult to read and easy to get wrong, whereas using <random> distributions is easy and produces readable code.

  • The common methods of generating values in a given distribution using rand() further decrease the quality of the generated data. % generally biases the data and floating point division still produces non-uniform distributions. <random> distributions are higher quality as well as more readable.

  • rand() relies on a hidden global resource. Among other issues this causes rand() to not be thread safe. Some implementations make thread safety guarantees, but this is not required by the standard. Engines provided by <random> encapsulate pRNG state as objects with value semantics, allowing flexible control over the state.

  • srand() only permits a limited range of seeds. Engines in <random> can be initialized using seed sequences which permit the maximum possible seed data. seed_seq also implements a common pRNG warm-up.


example of using <random>:

#include <iostream>
#include <random>

int main() {
// create source of randomness, and initialize it with non-deterministic seed
std::random_device r;
std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
std::mt19937 eng{seed};

// a distribution that takes randomness and produces values in specified range
std::uniform_int_distribution<> dist(1,6);

for (int i=0; i<100; ++i) {
std::cout << dist(eng) << '\n';
}
}

std::uniform_real_distribution and rand()

The real comparison is between rand and one of the random number engines provided by the C++11 standard library. std::uniform_real_distribution just distributes the output of an engine according to some parameters (for example, real values between 10 and 20). You could just as well make an engine that uses rand behind the scenes.

Now the difference between the standard library random number engines and using plain old rand is in guarantee and flexibility. rand provides no guarantee for the quality of the random numbers - in fact, many implementations have shortcomings in their distribution and period. If you want some high quality random numbers, rand just won't do. However, the quality of the random number engines is defined by their algorithms. When you use std::mt19937, you know exactly what you're getting from this thoroughly tested and analysed algorithm. Different engines have different qualities that you may prefer (space efficiency, time efficiency, etc.) and are all configurable.

This is not to say you should use rand when you don't care too much. You might as well just start using the random number generation facilities from C++11 right away. There's no downside.

Generate random numbers using C++11 random library

Stephan T. Lavavej(stl) from Microsoft did a talk at Going Native about how to use the new C++11 random functions and why not to use rand(). In it, he included a slide that basically solves your question. I've copied the code from that slide below.

You can see his full talk here:

#include <random>
#include <iostream>

int main() {
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<double> dist(1.0, 10.0);

for (int i=0; i<16; ++i)
std::cout << dist(mt) << "\n";
}

We use random_device once to seed the random number generator named mt. random_device() is slower than mt19937, but it does not need to be seeded because it requests random data from your operating system (which will source from various locations, like RdRand for example).


Looking at this question / answer, it appears that uniform_real_distribution returns a number in the range [a, b), where you want [a, b]. To do that, our uniform_real_distibution should actually look like:

std::uniform_real_distribution<double> dist(1, std::nextafter(10, DBL_MAX));

Safe random number generation using rand() or std::random_device in C++

Random number generator engines are not thread safe.

Using thread_local to initialize one engine per thread makes it thread safe, so I would say your solution is good.

If you have doubts about the thread safety of random_device, maybe make it thread_local also:

int randomInRange(const int min, const int max)
{
thread_local std::random_device rd;
thread_local std::mt19937 rng(rd());
thread_local std::uniform_int_distribution<int> uid;
return uid(rng, decltype(uid)::param_type{min,max});
}

There are more threads that talk about this that weren't mentioned:

C++ thread-safe uniform distribution random number generation

C++11 Thread safety of Random number generators

I'm sure that there are more.

Equivalent of srand() and rand() using post-C++11 std library

You can't even assure that you get the same sequence if you use rand() on another compiler. And no, you can't get random to produce the same sequence as whoever's rand() it was you were using. (Thank goodness. rand() is notorious for being one of the worst pseudo-random number generators of all time.)

It is possible for you to restore the state of rand(), simply by using srand() to set the initial state and counting how many times you called rand(). You can later repeat that to bring rand() back to that same state.

But don't use rand()!

Is random number generation in multiple threads using C++11 random library slow just like using rand() in multiple threads?

Yes and no. Most of the C++11 random number generators are objects that encapsulate their own state, so as long as you create a separate generator object for each thread, each should be able to operate independently of the others (so you don't need any locking).

The specific case of std::random_device is a little different though: this is intended (but not guaranteed) to obtain truly non-deterministic data from some sort of random number generation hardware. The driver for that device may well impose locking requirements of its own -- and it's often fairly low bandwidth as well, so if you want to obtain a lot of numbers quickly, it's usually a poor choice.

In a typical case, you want to use one generator (other than std::random_device) per thread, and use std::random_device only to provide the initial seeds for those other generators. This may impose locks during that initialization, but then allows each thread to generate its random numbers without any interlock with other threads.



Related Topics



Leave a reply



Submit