Why Is Rand()%6 Biased

Why is rand()%6 biased?

There are two issues with rand() % 6 (the 1+ doesn't affect either problem).

First, as several answers have pointed out, if the low bits of rand() aren't appropriately uniform, the result of the remainder operator is also not uniform.

Second, if the number of distinct values produced by rand() is not a multiple of 6, then the remainder will produce more low values than high values. That's true even if rand() returns perfectly distributed values.

As an extreme example, pretend that rand() produces uniformly distributed values in the range [0..6]. If you look at the remainders for those values, when rand() returns a value in the range [0..5], the remainder produces uniformly distributed results in the range [0..5]. When rand() returns 6, rand() % 6 returns 0, just as if rand() had returned 0. So you get a distribution with twice as many 0's as any other value.

The second is the real problem with rand() % 6.

The way to avoid that problem is to discard values that would produce non-uniform duplicates. You calculate the largest multiple of 6 that's less than or equal to RAND_MAX + 1 (the number of distinct values rand() can return), and whenever rand() returns a value that's greater than or equal to that multiple you reject it and call rand() again, as many times as needed.

So:

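int max = 6 * ((RAND_MAX + 1u) / 6);  // largest multiple of 6 among the RAND_MAX + 1 values
int value = rand();
while (value >= max)                  // reject values from the incomplete last group
    value = rand();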

That's a different implementation of the code in question, intended to more clearly show what's going on.
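
Putting the pieces together, a fair die roll might look like the sketch below. The function name roll_die is hypothetical (it is not from the question), and the code assumes rand() itself is uniform over [0, RAND_MAX]; it only removes the remainder bias, not any weakness in the generator.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns a value in [1, sides], assuming rand()
// is uniform over [0, RAND_MAX].
int roll_die(int sides = 6)
{
    assert(sides > 0);
    const unsigned n = static_cast<unsigned>(sides);
    assert(n <= RAND_MAX + 1u);

    // Largest multiple of n among the RAND_MAX + 1 possible values.
    const unsigned limit = n * ((RAND_MAX + 1u) / n);

    unsigned value = static_cast<unsigned>(std::rand());
    while (value >= limit)              // reject the leftover values at the top
        value = static_cast<unsigned>(std::rand());

    return 1 + static_cast<int>(value % n);
}
```

After the loop every remainder class has exactly the same number of accepted values, so the final % no longer favours the low results.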

Why do people say there is modulo bias when using a random number generator?

So rand() is a pseudo-random number generator which returns an integer between 0 and RAND_MAX inclusive, where RAND_MAX is a constant defined in <cstdlib> (see this article for a general overview on rand()).

Now what happens if you want to generate a random number between say 0 and 2? For the sake of explanation, let's say RAND_MAX is 10 and I decide to generate a random number between 0 and 2 by calling rand()%3. However, rand()%3 does not produce the numbers between 0 and 2 with equal probability!

When rand() returns 0, 3, 6, or 9, rand()%3 == 0. Therefore, P(0) = 4/11

When rand() returns 1, 4, 7, or 10, rand()%3 == 1. Therefore, P(1) = 4/11

When rand() returns 2, 5, or 8, rand()%3 == 2. Therefore, P(2) = 3/11

This does not generate the numbers between 0 and 2 with equal probability. Of course for small ranges this might not be the biggest issue, but for a larger range this could skew the distribution noticeably toward the smaller numbers.

So when does rand()%n return the numbers from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1, that is, when the RAND_MAX + 1 possible values split evenly into n remainder classes. In that case, together with our earlier assumption that rand() returns every number between 0 and RAND_MAX with equal probability, the remainder classes modulo n are also equally likely.
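
To check small cases like the one above by brute force, here is a minimal sketch. The names residue_counts and is_unbiased are hypothetical, and the rand_max parameter stands in for RAND_MAX so that toy values such as 10 can be tried.

```cpp
#include <cassert>
#include <vector>

// Counts how many values in [0, rand_max] fall into each remainder class
// modulo n. `rand_max` is a stand-in for RAND_MAX.
std::vector<int> residue_counts(int rand_max, int n)
{
    assert(rand_max >= 0 && n > 0);
    std::vector<int> counts(n, 0);
    for (int v = 0; v <= rand_max; ++v)
        ++counts[v % n];
    return counts;
}

// True when every remainder class is hit equally often,
// i.e. when rand_max % n == n - 1.
bool is_unbiased(int rand_max, int n)
{
    assert(rand_max >= 0 && n > 0);
    return rand_max % n == n - 1;
}
```

residue_counts(10, 3) gives the counts 4, 4, 3 from the example, while residue_counts(11, 3) gives 4, 4, 4 because 11 % 3 == 2.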

So how do we solve this problem? A crude way is to keep generating random numbers until you get a number in your desired range:

int x;
do {
    x = rand();
} while (x >= n);

but that's inefficient for low values of n, since you only have roughly an n/RAND_MAX chance of getting a value in your range, so you'll need about RAND_MAX/n calls to rand() on average.

A more efficient approach is to take some large range with a length divisible by n, like RAND_MAX - RAND_MAX % n, keep generating random numbers until you get one that lies in the range, and then take the modulus:

int x;

do {
    x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));

x %= n;

For small values of n, this will rarely require more than one call to rand().
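
Wrapped up as a function, the loop above might look like this sketch. The names uniform_below and acceptance_probability are hypothetical, and the code assumes rand() is uniform over [0, RAND_MAX] and that 0 < n <= RAND_MAX.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns a value in [0, n), assuming rand() is
// uniform over [0, RAND_MAX] and 0 < n <= RAND_MAX.
int uniform_below(int n)
{
    assert(n > 0 && n <= RAND_MAX);
    const int limit = RAND_MAX - RAND_MAX % n;  // a multiple of n
    int x;
    do {
        x = std::rand();
    } while (x >= limit);   // rejects the top RAND_MAX % n + 1 values
    return x % n;
}

// Fraction of rand() results that the loop accepts on each try.
double acceptance_probability(int n)
{
    assert(n > 0 && n <= RAND_MAX);
    return (RAND_MAX - RAND_MAX % n) / (RAND_MAX + 1.0);
}
```

For n = 6 the acceptance probability is above 0.999 on any conforming implementation (RAND_MAX is at least 32767), which is why a second call to rand() is rarely needed.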


Works cited and further reading:

  • CPlusPlus Reference

  • Eternally Confuzzled


C/C++ rand() function for biased expectation

Hmmm, how about choosing a random number between 0 and 17, and if the number is greater than 9, change it to 5?

For 0 - 17, you would get a distribution like

0,1,2,3,4,5,6,7,8,9,5,5,5,5,5,5,5,5

Code:

int random_numbers[100];
for (int i = 0; i < 100; i++) {
    random_numbers[i] = rand() % 18;   // raw value in [0, 17]
    if (random_numbers[i] > 9) {
        random_numbers[i] = 5;         // fold 10..17 onto 5
    }
}

You basically add a set of numbers beyond your desired range that, when translated to 5, give you equal numbers of 5 and non-5.
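
The "equal numbers of 5 and non-5" claim can be checked by enumerating all 18 raw values. This is only a sketch; fold_to_five and outcome_counts are hypothetical names used for illustration.

```cpp
#include <array>
#include <cassert>

// Maps a raw value in [0, 17] to the biased result described above:
// values above 9 are folded onto 5.
int fold_to_five(int raw)
{
    assert(raw >= 0 && raw <= 17);
    return raw > 9 ? 5 : raw;
}

// Counts how often each result in [0, 9] occurs across all 18 raw values.
std::array<int, 10> outcome_counts()
{
    std::array<int, 10> counts{};
    for (int raw = 0; raw <= 17; ++raw)
        ++counts[fold_to_five(raw)];
    return counts;
}
```

The result 5 is produced by 9 of the 18 raw values and every other result by exactly one, so 5 comes up half the time. Note that rand() % 18 still carries the small remainder bias discussed earlier.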

Random.nextInt(int) is [slightly] biased

http://docs.oracle.com/javase/6/docs/api/java/util/Random.html:

An instance of this class is used to generate a stream of
pseudorandom numbers. The class uses a 48-bit seed, which is modified
using a linear congruential formula. (See Donald Knuth, The Art of
Computer Programming, Volume 3, Section 3.2.1.)

If two instances of Random are created with the same seed, and the
same sequence of method calls is made for each, they will generate and
return identical sequences of numbers. [...]

It is a pseudo-random number generator. This means that you are not actually rolling a die but rather using a formula to calculate the next "random" value based on the current random value. To create the illusion of randomness a seed is used. The seed is the first value fed into the formula to generate the random value.

Apparently Java's Random implementation (the "formula") does not generate more than 16 even numbers in a row.

This behaviour is the reason why the seed is usually initialized with the time. Depending on when you start your program you will get different results.

The benefits of this approach are that you can generate repeatable results. If you have a game generating "random" maps, you can remember the seed to regenerate the same map if you want to play it again, for instance.

For true random numbers some operating systems provide special devices that generate "randomness" from external events like mouse movements or network traffic. However, I do not know how to tap into those with Java.

From the Java doc for SecureRandom:

Many SecureRandom implementations are in the form of a pseudo-random
number generator (PRNG), which means they use a deterministic
algorithm to produce a pseudo-random sequence from a true random seed.
Other implementations may produce true random numbers, and yet others
may use a combination of both techniques.

Note that SecureRandom does NOT guarantee true random numbers either.

Why changing the seed does not help

Let's assume random numbers only have the range 0-7.
Now we use the following formula to generate the next "random" number:

 next = (current + 3) % 8

Starting from seed 0, the sequence becomes 0 3 6 1 4 7 2 5.

If you now take the seed 3, all you do is change the starting point.
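
The toy formula is easy to try out. The sketch below is written in C++ to match the other examples on this page rather than in Java; toy_sequence is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Toy generator from the text: next = (current + 3) % 8.
// Returns the first `count` values, starting with the seed itself.
std::vector<int> toy_sequence(int seed, int count)
{
    assert(seed >= 0 && seed < 8 && count >= 0);
    std::vector<int> out;
    int current = seed;
    for (int i = 0; i < count; ++i) {
        out.push_back(current);
        current = (current + 3) % 8;
    }
    return out;
}
```

toy_sequence(0, 8) yields 0 3 6 1 4 7 2 5, and toy_sequence(3, 8) yields 3 6 1 4 7 2 5 0: the same cycle, entered at a different point.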

In this simple implementation that only uses the previous value, every value may occur only once before the sequence wraps around and starts again. Otherwise there would be an unreachable part.

E.g. imagine the sequence 0 3 6 1 3 4 7 2 5. The numbers 0, 4, 7, 2 and 5 would never be generated more than once (depending on the seed, they might never be generated at all), since once the generator reaches 3 the sequence loops: 3, 6, 1, 3, 6, 1, ...

Simplified pseudo-random number generators can be thought of as a permutation of all numbers in the range, with the seed as the starting point. For more advanced generators you would have to replace the permutation with a list in which the same number might occur multiple times.

More complex generators can have an internal state, allowing the same number to occur several times in the sequence, since the state lets the generator know where to continue.
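
As an illustration of hidden state, here is a toy sketch (again in C++, to match the other examples on this page). The 4-bit state, the constants 5 and 3, and the name stateful_sequence are all assumptions chosen for the example; they are not taken from Java's Random or any other real library.

```cpp
#include <cassert>
#include <vector>

// Toy generator with a hidden 4-bit state. The state follows
// state = (5 * state + 3) % 16, and only its top three bits are returned.
// The constants are illustrative assumptions, not from a real library.
std::vector<int> stateful_sequence(int seed, int count)
{
    assert(seed >= 0 && seed < 16 && count >= 0);
    std::vector<int> out;
    int state = seed;
    for (int i = 0; i < count; ++i) {
        state = (5 * state + 3) % 16;
        out.push_back(state >> 1);   // output in [0, 7]
    }
    return out;
}
```

With seed 0 the outputs start 1 1 6 2 3 3 0 4. The state visits all 16 values before repeating, so each output in the range 0-7 appears twice per cycle, and the same output can even appear twice in a row.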

rand' function in C++?

You can use rand() % 20, but it won't be truly uniform; it will contain bias. A better option in C++ is to use std::uniform_int_distribution<>, like this:

#include <random>
#include <iostream>

int main()
{
    std::random_device rd;                       // seed source
    std::mt19937 gen(rd());                      // Mersenne Twister engine
    std::uniform_int_distribution<> dis(1, 20);  // uniform over [1, 20]

    for (int n = 0; n < 10; ++n)
        std::cout << dis(gen) << ' ';
    std::cout << '\n';
}

You can read this to learn more about the bias introduced by rand() % x.

C rand() slightly biased

My personal favourite is xoroshiro. It passes most pseudorandomness tests and it's incredibly fast, much faster than rand().

Using both rand() and rand_r() : is this simple example correct?

  1. the simultaneous usage of rand and rand_r is in this case correct;

As long as:

  • rand is not used concurrently (which in your code is ok - you're only calling it once in the main thread)
  • rand_r with the same seed variable is not used concurrently (which in your code is ok - you're only calling it once for each seed variable)

there are no issues with thread safety.


  2. the seed's initialization for rand_r, i.e. the variable my_seeds, is fine;

You have a separate seed for every (potentially) concurrent use of rand_r. As long as the same seed variable isn't used for concurrent calls to rand_r (which in your code doesn't happen), all is good.


  3. the for parallelization and related variable usage is safe.

Each "thread" in your code has its own seed variable for rand_r and its own result variable. So there's no concurrency issue wrt. that.

Side note: rand_r has been obsoleted, and both rand and rand_r are relatively low-quality PRNGs. Depending on your needs, it might be worth investigating alternative PRNGs.


