Why Not Just Use Random_Device

Why not just use random_device?

Also, why not just:

std::mt19937 e{std::random_device{}()};

It might be fine if you only will do this once, but if you will do it many times, it's better to keep track of your std::random_device and not create / destroy it unnecessarily.

It may be helpful to look at the libc++ source code for implementation of std::random_device, which is quite simple. It's just a thin wrapper over std::fopen("/dev/urandom"). So each time you create a std::random_device you are getting another filesystem handle, and pay all associated costs.

On windows, as I understand, std::random_device represents some call to a microsoft crypto API, so you are going to be initializing and destroying some crypto library interface everytime you do this.

It depends on your application, but for general purposes I wouldn't think of this overhead as always negligible. Sometimes it is, and then this is great.

I guess this ties back into your first question:

Instead, this is what I found on most examples/sites/articles:

 std::random_device rd;
std::mt19937 e{rd()}; // or std::default_random_engine e{rd()};
std::uniform_int_distribution<int> dist{1, 5};

At least the way I think about it is:

  • std::mt19937 is a very simple and reliable random generator. It is self-contained and will live entirely in your process, not calling out to the OS or anything else. The implementation is mandated by the standard, and at least in boost, it used the same code everywhere, derived from the original mt19937 paper. This code is very stable and it's cross-platform. You can be pretty confident that initializing it, querying from it, etc. is going to compile to similar code on any platform that you compile it on, and that you will get similar performance.

  • std::random_device by contrast is pretty opaque. You don't really know exactly what it is, what it's going to do, or how efficient it will be. You don't even know if it can actually be acquired -- it might throw an exception when you attempt to create it. You know that it doesn't require a seed. You're not usually supposed to pull tons and tons of data from it, just use it to generate seeds. Sometimes, it acts as a nice interface to cryptographic APIs, but it's not actually required to do that and sadly sometimes it doesn't. It might correspond to /dev/random on unix, it might correspond to /dev/urandom/. It might correspond to some MSVC crypto API (visual studio), or it might just be a fixed constant (mingw). If you cross-compile for some phone, who knows what it will do. (And even when you do get /dev/random, you still have the problem that performance may not be consistent -- it may appear to work great, until the entropy pool runs out, and then it runs slow as a dog.)

The way I think about it is, std::random_device is supposed to be like an improved version of seeding with time(NULL) -- that's a low bar, because time(NULL) is a pretty crappy seed all things considered. I usually use it where I would have used time(NULL) to generate a seed, back in the day. I don't really consider it all that useful outside of that.

Why is random device creation expensive?

The short answer is: It depends on the system and library implementation

Types of standard libraries

  • In a fairytale world where you have the most shitty standard library implementation imaginable, random_device is nothing but a superfluous wrapper around std::rand(). I'm not aware of such an implementation but someone may correct me on this

  • On a bare-metal system, e.g. an embedded microcontroller, random_device may interact directly with a hardware random number generator or a corresponding CPU feature. That may or may not require an expensive setup, e.g. to configure the hardware, open communication channels, or discard the first N samples

  • Most likely you are on a hosted platform, meaning a modern operating system with a hardware abstraction level. Let's consider this case for the rest of this post

Types of random_device

Your system may have a real hardware random number generator, for example the TPM module can act as one. See How does the Trusted Platform Module generate its true random numbers? Any access to this hardware has to go through the operating system, for example on Windows this would likely be a Cryptographic Service Provider (CSP).

Or your CPU may have some built in, such as Intel's rdrand and rdseed instruction. In this case a random_device that maps directly to these just has to discover that they are available and check that they are operational. rdrand for example can detect hardware failure at which point the implementation may provide a fallback. See What are the exhaustion characteristics of RDRAND on Ivy Bridge?

However, since these features may not be available, operating systems generally provide an entropy pool to generate random numbers. If these hardware features are available, your OS may use them to feed this pool or provide a fallback once the pool is exhausted. Your standard library will most likely just access this pool through an OS-specific API.

That is what random_device is on all mainstream library implementations at the moment: an access point to the OS facilities for random number generation. So what is the setup overhead of these?

System APIs

  • A traditional POSIX (UNIX) operating system provides random numbers through the pseudo-devices /dev/random and /dev/urandom. So the setup cost is the same as opening and closing this file. I assume this is what your book refers to

  • Since this API has some downsides, new APIs have popped up, such as Linux's getrandom. This one would not have any setup cost but it may fail if the kernel does not support it, at which point a good library may try /dev/urandom again

  • Windows libraries likely go through its crypto API. So either the old CSP API CryptGenRandom or the new BCryptGenRandom. Both require a handle to a service or algorithm provider. So this may be similar to the /dev/urandom approach

Consequences

In all these cases you will need at least one system call to access the RNG and these are significantly slower than normal function calls. See System calls overhead And even the rdrand instruction is around 150 cycles per instruction. See What is the latency and throughput of the RDRAND instruction on Ivy Bridge? Or worse, see RDRAND and RDSEED intrinsics on various compilers?

A library (or user) may be tempted to reduce the number of system calls by buffering a larger number random bytes, e.g. with buffered file I/O. This again would make opening and closing the random_device unwise, assuming this discards the buffer.

Additionally, the OS entropy pool has a limited size and can be exhausted, potentially causing the whole system to suffer (either by working with sub-par random numbers or by blocking until entropy is available again). This and the slow performance mean that you should not usually feed the random_device directly into a uniform_int_distribution or something like this. Instead use it to initialize a pseudo random number generator.

Of course this has exceptions. If your program needs just 64 random bytes over the course of its entire runtime, it would be foolish to draw the 2.5 kiB random state to initialize a mersenne twister, for example. Or if you need the best possible entropy, e.g. to generate cryptographic keys, then by all means, use it (or better yet, use a library for this; never reinvent crypto!)

Use of random in C++

The point of the random generator is to hold the state of the algorithm, in order to produce repeatable pseudorandom sequences of numbers based on a specific random seed.

The point of the random device is to provide a random seed for the random generator.

If you try seeding a new generator for every random value, you are no longer exercising the randomness provided by the random generator's algorithm. Instead, you are biasing the generator to rely on the randomness of the random device itself.

For this reason, examples #3 and #4 are not advisable.

The correct way to generate a random sequence is example #1:

std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, 19);

for (int i = 0; i < 10000; ++i) {
int foo = dis(gen);
}

Example #2 is also correct, but it's kinda pointless to construct the uniform_int_distribution inside the loop. Of course, with compiler optimizations, it doesn't really hurt and there may be times where it's preferable to keep the distribution near where it used, for the sake of clarity.

If I need to generate random values for multiple stuffs unrelated, must I need to create differents generators or can I keep the same ?

You are welcome to use multiple generators for unrelated random sequences if you want to -- that is actually one of their major draw-cards. You retain the randomness guarantees of the pseudorandom algorithm for a particular sequence if its generator is not used when generating other sequences (most notably when extraction of numbers from the sequences are interleaved).

This is also useful for reproducibility: For example, when you actually have a specific seed value (instead of pulling it from a random device), using that seed for one particular sequence gives repeatable results, regardless of any other sequences being used at the same time.

One other major benefit is that by using separate generators you get the same thread-safety guarantees that apply to other objects. In other words, that means if you want to generate multiple pseudorandom sequences concurrently, you can do so without locks provided each thread operates on a separate generator.

Should I use a random engine seeded from std::random_device or use std::random_device every time

The standard practice, as far as I am aware, is to seed the random number generator with a number that is not calculated by the computer (but comes from some external, unpredictable source). That should be the case with your rd() function. From then on, you call the pseudo-random number generator(PRNG) for each and every pseudo-random number that you need.

If you are worried about the numbers not being random enough, then you should pick a different PRNG. Entropy is a scarce and precious resource and should be treated as such. Although, you may not be needing that many random numbers right now, you may in the future; or other applications could need them. You want that entropy to be available whenever an application asks for it.

It sounds like, for your application, that the mersenne twister will suit your needs just fine. No one who plays your game will ever feel like the teams that are loaded aren't random.

Generating random numbers in C++ using std::random_device

std::random_device overloads the parenthesis operator to give it function-like syntax. It more-or-less looks like this:

class random_device {
/*...*/
public:
uint32_t operator()() {
return /*...*/;
}
};

It can then be invoked on an object

std::random_device device;
uint32_t value = device();

or on a temporary (which is still technically an object)

uint32_t value = std::random_device{}();
uint32_t value_2 = std::random_device()(); //Equivalent syntax

The operator() overload can also be overloaded in other ways.

struct multiplies_by_3 {
uint32_t operator()(uint32_t value) const {
return value * 3;
}
};

multiplies_by_3 multiplier;
uint32_t value = multiplier(15); //45
uint32_t value_2 = multiplies_by_3{}(20); //60
uint32_t value_3 = multiplies_by_3()(25); //75

struct subtracts_first_from_second {
uint32_t operator()(uint32_t first, uint32_t second) const {
return second - first;
}
};

subtracts_first_from_second subtractor;
uint32_t value = subtractor(15, 17); //2
uint32_t value_2 = subtracts_first_from_second{}(20, 29); //9
uint32_t value_3 = subtracts_first_from_second()(25, 17); //Underflows to some large number

What is the difference between `std::random_device` and `std::mt19937_64`?

std::random_device is a uniformly-distributed integer random number generator that produces non-deterministic random numbers. It could be used with distribution to generate random numbers, but the performance of many implementations of std::random_device degrades sharply once the entropy pool is exhausted, so it is recommended to use it only to seed a pseudo-random number generator such as std::mt19937_64. They are "not so random", but do not degrate as former one, so they are often used in chain like that:

std::random_device rd;
std::mt19937_64 eng(rd());
std::uniform_int_distribution<int> uniform_dist(1, 6)

std::cout << uniform_dist(eng);


Related Topics



Leave a reply



Submit