Possible Issue About Random Number Generator


As I mentioned in my comment above, this has absolutely nothing to do with random number generators.

Consider:

set.seed(123)
result <- sample(x=c(2:50), size=10e4, replace=TRUE)
x <- hist(result)

[Image: histogram of result with hist()'s default breaks]

Something looks wrong, eh? But look closer:

> x$breaks
[1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
> x$counts
[1] 6132 3971 4179 4115 4108 4002 4145 4073 4192 4117 4123 4099 4054 4013 4067 4055 4073 4082 4095
[20] 4088 4044 4050 4027 4096

versus...

> table(result)
result
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1979 2100 2053 1978 1993 2152 2027 2058 2057 2074 2034 1991 2011 2075 2070 2067 2006 2047 2145 2019
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
2098 2060 2063 2099 2000 2016 2038 1990 2023 1976 2091 2060 1995 2061 2012 2003 2079 2008 2087 2036
42 43 44 45 46 47 48 49 50
2052 1989 2055 2044 2006 2001 2026 2062 2034

Note that the first bin from hist appears to include all the 2, 3 and 4 values. This is because the default binning strategy employed by hist adds some "fuzziness" to the bin boundaries, which results in the first two break points being slightly less than 2.0 and slightly more than 4.0. Combine that with the intervals being right-closed, and you get the resulting histogram.
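The binning rule can be sketched outside of R. The following Python snippet mimics hist()'s behaviour (right-closed intervals, lowest break included, a small fuzz on the break points); the exact fuzz constant is an assumption for illustration, not R's precise value:

```python
# Mimic R hist()'s fuzzed, right-closed breaks for data in 2..50.
fuzz = 1e-7 * (50 - 2)   # small fuzz proportional to the range (illustrative value)
# First break nudged slightly below 2.0, the rest slightly above their nominal value.
breaks = [b - fuzz if b == 2 else b + fuzz for b in range(2, 51, 2)]

def bin_index(value, breaks):
    """Return i such that value falls in the right-closed bin (breaks[i], breaks[i+1]]."""
    for i in range(len(breaks) - 1):
        if breaks[i] < value <= breaks[i + 1]:
            return i
    return None

# 2, 3 and 4 all land in bin 0, because (1.9999..., 4.0000...] contains all three,
# while 5 and 6 fall into bin 1.
print([bin_index(v, breaks) for v in (2, 3, 4, 5, 6)])   # [0, 0, 0, 1, 1]
```

This reproduces the observed count of 6132 for the first bar: it is the sum of the counts for 2, 3 and 4.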

Compare with:

hist(result, breaks = 1:50)

[Image: histogram of result with breaks = 1:50, one integer per bin]

Why is it hard for a program to generate random numbers?

How about, "Because computers just follow instructions, and random numbers are the opposite of following instructions. If you make a random number by following instructions, then it's not very random! Imagine trying to give someone instructions on how to choose a random number."

Issue with random number generator

TL;DR: default_random_engine doesn’t give you any guarantees. If you want random numbers for numerical work, use mt19937_64 instead. If you want to generate security-relevant randomness (keys, IVs, salts, passwords, …), use a CSPRNG seeded from random_device, or use random_device directly. See below for a fun analysis of the default_random_engine behaviour of g++.

default_random_engine is implementation-defined

As mentioned in the comments, default_random_engine has implementation-defined behaviour and you should probably not be using it at all. Implementation-defined means that the implementation is, for example, free to use the (in)famous XKCD rng:

class default_random_engine
{
public:
    using result_type = uint32_t;

    result_type min() const
    {
        return 1;
    }

    result_type max() const
    {
        return 6;
    }

    default_random_engine(int) {} // we don’t care about the argument

    result_type operator()() const
    {
        return 4; // chosen by fair dice roll
                  // guaranteed to be random
    }
};

default_random_engine doesn’t work well with seeds close to each other

However, it actually appears that the default_random_engine (as used in GNU libstdc++) is simply highly biased for the seeds currently produced by time(0), at least when sampled with a uniform_int_distribution<int>(1, 6). Try generating several random numbers at once in the same program. You will find outputs like these: 6 6 6 4 6 2 2 3, 6 3 2 1 4 6 3 3, …. In short, default_random_engine has no guarantees and may be a very bad RNG for any purpose (apparently even for teaching about the random API, because it does not appear random when generating a single dice roll seeded with Unix epoch timestamps!).

So I ran the numbers. With a small program (see below), I calculated the probability distribution for the first dice roll number generated with default_random_engine and mt19937_64 to be 6, plotted over the range of seeds. This is with g++ (Debian 6.1.1-11) 6.1.1 20160802 and libstdc++ 6.1.1-11.

[Image: plot of the probability to get a 6 in the first draw, over seed]

The plot shows the probability of getting a 6 with std::uniform_int_distribution<int>(1, 6) in the first draw after initialising the corresponding random engine with the seed on the X axis. In addition, the current Unix time is drawn as a vertical black line (visible only in the top range of the chart). As you can see, we are currently in a range of seeds where the integer distribution is very likely (100%) to output a 6 as the first number using default_random_engine. mt19937_64, in contrast, stays very close to the expected ⅙ probability. The bin size for the histogram is 10000 (about 2.8 hours' worth of seconds since the Unix epoch).

Also, over the whole tested range (0..4e9-1), default_random_engine actually produces a 6 as the first number with a probability of approximately ⅙. It is just really bad at dealing with seeds that are "close" (±1000) to each other.
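The mechanism is easy to see once you know that GNU libstdc++'s default_random_engine is minstd_rand0, a linear congruential generator. A minimal Python re-implementation shows why nearby seeds give identical first draws; the mapping to 1..6 below is a simplified scaling stand-in for uniform_int_distribution, not libstdc++'s exact algorithm:

```python
M = 2**31 - 1   # modulus of minstd_rand0 (libstdc++'s default_random_engine)
A = 16807       # multiplier

def first_output(seed):
    """First raw output of minstd_rand0 for a given seed (seed 0 maps to 1)."""
    return (A * max(seed, 1)) % M

def dice(seed):
    """Map the first output to 1..6 by simple scaling -- an illustrative
    stand-in for uniform_int_distribution, not the exact libstdc++ mapping."""
    return first_output(seed) * 6 // M + 1

# Consecutive seeds move the first output by only A = 16807 out of ~2**31,
# so the first dice roll stays constant across runs of roughly 21000 seeds.
base = 1_600_000_000                       # a plausible time(0) value
rolls = {dice(base + t) for t in range(1000)}
print(rolls)   # one element -- 1000 consecutive "seconds" all give the same roll
```

This is exactly the flat 0%/100% plateau structure visible in the plot above: time(0) crawls through one of these plateaus at one seed per second.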

Conclusion

So, in a way, time(0) is the culprit, because it changes slowly. And (the standard library used by) Code::Blocks is also a culprit, because their default_random_engine sucks. A proper RNG should not be biased towards a certain result over such a large range of seeds. Just use a proper RNG, and don’t try to get things to work nicely with default_random_engine, which isn’t guaranteed to be portable between compilers and standard libraries.

Use one of the numerically sound random number engines provided by the C++11 standard instead (such as the mentioned mt19937_64), if you need random numbers for numerical purposes (such as a game physics simulation).

If you need random numbers for anything security-relevant (passwords, salts, IVs for cryptography, keys, …), do not use mt19937_64; use a CSPRNG seeded from std::random_device, or use std::random_device directly.

Another note, as the tutorial is geared towards game development: as soon as networking comes into play, you may in fact need platform-independent behaviour. In such cases, choosing a specific RNG engine with specific parameters is vital to achieve reproducible results across clients and between client and server.
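The reproducibility idea can be sketched in Python (used here purely as an illustration of the principle; a networked C++ game would pin down, e.g., mt19937_64 with an agreed seed): two peers that construct the same engine with the same seed agree on every draw.

```python
import random

# Two "clients" seed the same engine (CPython's Mersenne Twister) with the
# same shared seed agreed at match start; they then agree on every draw.
shared_seed = 123456
client_a = random.Random(shared_seed)
client_b = random.Random(shared_seed)

rolls_a = [client_a.randint(1, 6) for _ in range(10)]
rolls_b = [client_b.randint(1, 6) for _ in range(10)]
assert rolls_a == rolls_b   # identical sequences -> reproducible game state
print(rolls_a)
```

This is why an implementation-defined engine like default_random_engine is unusable here: the same seed would produce different sequences on different standard libraries.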

Source

#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

// BLOCK is #defined at build time (0..7), one value per worker process.
// rng_engine_to_use is replaced with mt19937_64 or default_random_engine.

void search()
{
    static constexpr uint32_t SEARCH_MAX = 4000000000;
    static constexpr uint32_t SEARCH_BLOCKS = 8;
    static constexpr uint32_t SEARCH_BIN_SIZE = 10000;
    static constexpr uint32_t SEARCH_BINS = SEARCH_MAX / SEARCH_BIN_SIZE;
    static constexpr uint32_t SEARCH_STEP = 1;
    static constexpr uint32_t OUTPUT_STEP = 1000000;
    std::vector<uint32_t> histogram(SEARCH_BINS, 0);

    std::uniform_int_distribution<int> diceRoll(1, 6);
    static constexpr uint32_t START = BLOCK * (SEARCH_MAX / SEARCH_BLOCKS);
    static constexpr uint32_t END = START + SEARCH_MAX / SEARCH_BLOCKS;
    for (uint32_t i = START; i < END; i += SEARCH_STEP) {
        rng_engine_to_use foo(i);
        int roll = diceRoll(foo);
        if (roll == 6) {
            histogram[i / SEARCH_BIN_SIZE] += 1;
        }
        if (i % OUTPUT_STEP == 0) {
            std::cerr << ((float)i / SEARCH_MAX * 100) << "% \r" << std::flush;
        }
    }

    for (uint32_t i = START / SEARCH_BIN_SIZE; i < END / SEARCH_BIN_SIZE; ++i) {
        std::cout << i * SEARCH_BIN_SIZE << " " << histogram[i] << std::endl;
    }
}

int main()
{
    search();
}

I #defined BLOCK to be in 0..7, to build one version for each of my CPU cores, because mt19937_64 takes quite some time to generate all those numbers (I only realised afterwards that the plot would not be readable when plotted over the whole range, but meh). I replaced rng_engine_to_use with mt19937_64 and default_random_engine in turn and plotted the generated histograms (seen above).

Why do random number generators cause problems when executed in parallel?

Aside from producing wrong values due to the race condition (pointed out by @MitchWheat), the code will also be less efficient because of cache-line sharing between cores on mainstream x86 processors.

Here is an example of a (pretty bad but simple) 32-bit random generator (written in C):

#include <stdint.h>
#include <stddef.h>

uint32_t seed = 0xD1263AA2;

uint32_t customRandom() {
    uint32_t old = seed;
    seed = (uint32_t)(((uint64_t)seed * 0x2E094C40) >> 24);
    return old;
}

void generateValues(uint32_t* arr, size_t size) {
    for (size_t i = 0; i < size; ++i)
        arr[i] = customRandom();
}

If you run this example sequentially, the state seed will be kept close in the memory hierarchy by mainstream compilers (i.e. GCC and Clang). This 32-bit block of memory will be read/written in the L1 cache, which is very close to the core executing the code.

When you naively parallelize the loop, for example using #pragma omp parallel for in OpenMP, the state is read/written concurrently by multiple threads. There is a race condition: the state value seed can be read and written by multiple threads in parallel. Consequently, the same value can be generated by multiple threads even though the results are supposed to be random. Race conditions are bad and must be fixed. You can fix this one by using a thread-local state.

Assuming you do not fix the code because you want to understand the impact of the race condition on the resulting performance, you should see a performance drop in parallel. The issue comes from the cache coherence protocol used by mainstream x86 processors. Indeed, seed is shared between the threads running on all the cores, so the processor will synchronize the cores' caches to keep them coherent. This process is very expensive (much slower than reading/writing in the L1 cache). More specifically, when a thread on a given core writes to seed, the processor invalidates the copies of seed stored in the caches of the other cores. Each thread must then fetch the updated seed (typically from the much slower L3 cache) the next time seed is read.

Issue with self-made random number generator in Python (using time.time())

class Random:
    def __init__(self):
        self.num = 20
        self.j = 1

    def randomBits(self):
        self.j = self.num * self.j
        self.j = float("0.%s" % int(self.j))
        return self.j

    def randrange(self, x, y):
        self.randomBits()
        return int(self.j * (y - x) + x)

Though I did kind of "hard-code" it to always produce a decimal fraction, so you can use it like JavaScript's Math.random(). I think you will like this one much better. I'm sure you know how to use it, but here are some examples in case you do not.

>>> random = Random()
>>> random.randrange(1,10**50)
20000000000000001525953968218377400658992994189312

So as you can see, it generated a number between 1 and 10 followed by 50 zeros. The usage is random.randrange(numberOne, numberTwo). I hope this helped you a bit.
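Since the question title mentions time.time(), here is a sketch of what a self-made generator that actually uses the clock could look like: a toy linear congruential generator (parameters from the well-known Numerical Recipes LCG) seeded from the current time. The class and its names are hypothetical, for learning only; use the standard random module for real work:

```python
import time

class TimeSeededRandom:
    """Toy linear congruential generator seeded from the clock.
    For learning only -- use the `random` module for real work."""
    # Parameters of the Numerical Recipes LCG.
    A, C, M = 1664525, 1013904223, 2**32

    def __init__(self, seed=None):
        if seed is None:
            seed = time.time_ns()   # nanosecond clock: different runs differ
        self.state = seed % self.M

    def random(self):
        """Return a float in [0, 1), like JavaScript's Math.random()."""
        self.state = (self.A * self.state + self.C) % self.M
        return self.state / self.M

    def randrange(self, x, y):
        """Return an integer in [x, y), like the class above tries to do."""
        return int(self.random() * (y - x) + x)

rng = TimeSeededRandom()
print([rng.randrange(1, 7) for _ in range(5)])   # five toy dice rolls
```

Unlike the hard-coded `num = 20`/`j = 1` state above, the sequence here depends on when the program starts, and the same seed always reproduces the same sequence.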

Random number generator issues in MATLAB

This is because the seed of the generator in MATLAB is always the same when you start MATLAB. Take a look at this:

>> rng('default')
>> ind_1=randi([0 1000])
ind_2=randi([0 1000])
ind_3=randi([0 1000])
ind_1=randi([0 1000])
ind_2=randi([0 1000])
ind_3=randi([0 1000])
rng('default')
ind_1=randi([0 1000])
ind_2=randi([0 1000])
ind_3=randi([0 1000])

ind_1 =

815

ind_2 =

906

ind_3 =

127

ind_1 =

914

ind_2 =

632

ind_3 =

97

ind_1 =

815

ind_2 =

906

ind_3 =

127

So the only thing that you have to do is change the initial seed every time you generate new numbers.

Execute rng('shuffle') beforehand; it reseeds the generator using a different seed based on the current time.

>> rng('default')
>> [randi([0 1000]), randi([0 1000]), randi([0 1000])]

ans =

815 906 127

>> rng('shuffle')
>> [randi([0 1000]), randi([0 1000]), randi([0 1000])]

ans =

404 10 838

>> [randi([0 1000]), randi([0 1000]), randi([0 1000])]

ans =

31 459 534

>> rng('shuffle')
>> rng('shuffle')
>> [randi([0 1000]), randi([0 1000]), randi([0 1000])]

ans =

708 963 21

>> rng('default')
>> [randi([0 1000]), randi([0 1000]), randi([0 1000])]

ans =

815 906 127

>> [randi([0 1000]), randi([0 1000]), randi([0 1000])]

ans =

914 632 97

>> rng('default')
>> [randi([0 1000]), randi([0 1000]), randi([0 1000])]

ans =

815 906 127

Problems with my random number generator

The problem still lies in the +1 for the random numbers.

Also, you first check in the first assignment to x whether the card is already picked, and then you assign another card to it anyway.

Why not use something like:

int nr_cards_picked = 0;            /* Number of uniquely picked cards in hand */

/* Continue picking cards until 5 unique cards are picked. */
while (nr_cards_picked < 5) {
    x = rand() % 52;                              /* Take a random card */
    if (kortArray[x].draget == 0) {
        /* Pick this card. */
        kortArray[x].draget = 1;                  /* Card is picked */
        kortHand[nr_cards_picked] = kortArray[x]; /* Add picked card to hand */
        nr_cards_picked++;
    }
}

Forgive compiler errors; I don't have a compiler near here.

This way, you only have a single random number call per iteration.

Theoretically, the loop might never end, but that is not likely.
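The same draw-until-unique idea can be sketched in Python (the deck is simplified to the indices 0..51; the names kortArray/kortHand correspond to the question's variables):

```python
import random

# Same idea as the C loop above: keep drawing random indices until five
# distinct cards are picked.
def pick_hand(hand_size=5, deck_size=52):
    picked = [False] * deck_size     # plays the role of kortArray[x].draget
    hand = []                        # plays the role of kortHand
    while len(hand) < hand_size:
        x = random.randrange(deck_size)   # take a random card
        if not picked[x]:
            picked[x] = True              # card is picked
            hand.append(x)                # add picked card to hand
    return hand

hand = pick_hand()
print(hand)          # five distinct card indices in 0..51
```

Rejection sampling like this is fine for 5 out of 52; when the hand size approaches the deck size, shuffling the whole deck and taking a prefix becomes the better approach.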

Random number generator generating low numbers more frequently than high numbers C++

For a uniformly distributed random variable E on the closed interval [0, 32767],
the probability of mod(E, 10000) < 2800 is around 34%. Intuitively, you can think of mod(E, 10000) < 2800 as favouring the bucket of numbers in the range [30000, 32767]: that bucket modulo 10000 is always less than 2800. This has the effect of pushing the result above the expected 28%.

That's the behavior you are observing here.

It's not a function of the quality of the random generator, although you would get better results with a uniform generator of larger output range. Using rand() from your C++ standard library is ill-advised, as the standard is too relaxed about the function's requirements for it to be portable. <random> from C++11 will cause you far less trouble: you'd be able to avoid the explicit % too.
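The ~34% figure can be verified by exact counting over the 32768 equally likely values (assuming RAND_MAX = 32767, as in the question):

```python
# Count how many of the 32768 equally likely values of E satisfy
# E % 10000 < 2800, for E uniform over the closed interval [0, 32767].
RAND_MAX_PLUS_1 = 32768
hits = sum(1 for e in range(RAND_MAX_PLUS_1) if e % 10000 < 2800)

# Three full cycles of 10000 contribute 2800 each; the partial bucket
# 30000..32767 contributes all 2768 of its values (their residues are < 2800).
print(hits, hits / RAND_MAX_PLUS_1)   # 11168 0.3408203125
```

So P = 11168/32768 ≈ 34.1%, versus the 28% one would expect from an unbiased mapping; the excess is exactly the partial bucket [30000, 32767].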


