STL Random Distributions and Portability


This is not a defect; it is by design. The rationale can be found in A Proposal to Add an Extensible Random Number Facility to the Standard Library (N1398), which says (emphasis mine):

On the other hand, the specifications for the distributions only define the statistical result, not the precise algorithm to use. This is different from engines, because for distribution algorithms, rigorous proofs of their correctness are available, usually under the precondition that the input random numbers are (truly) uniformly distributed. For example, there are at least a handful of algorithms known to produce normally distributed random numbers from uniformly distributed ones. Which one of these is most efficient depends on at least the relative execution speeds for various transcendental functions, cache and branch prediction behaviour of the CPU, and desired memory use. This proposal therefore leaves the choice of the algorithm to the implementation. It follows that output sequences for the distributions will not be identical across implementations. It is expected that implementations will carefully choose the algorithms for distributions up front, since it is certainly surprising to customers if some distribution produces different numbers from one implementation version to the next.

This point is reiterated in the implementation-defined section, which says:

The algorithms how to produce the various distributions are specified
as implementation-defined, because there is a vast variety of
algorithms known for each distribution. Each has a different trade-off
in terms of speed, adaptation to recent computer architectures, and
memory use. The implementation is required to document its choice so
that the user can judge whether it is acceptable quality-wise.

C++11 random number distributions are not consistent across platforms -- what alternatives are there?

I have created my own C++11 distributions:

template <typename T>
class UniformRealDistribution
{
public:
    typedef T result_type;

    UniformRealDistribution(T _a = 0.0, T _b = 1.0)
        : m_a(_a),
          m_b(_b)
    {}

    void reset() {}

    template <class Generator>
    T operator()(Generator &_g)
    {
        // Map the engine's fully specified output range onto [m_a, m_b).
        T scale = (m_b - m_a) / ((T)(_g.max() - _g.min()) + (T)1);
        return (_g() - _g.min()) * scale + m_a;
    }

    T a() const {return m_a;}
    T b() const {return m_b;}

protected:
    T m_a;
    T m_b;
};

template <typename T>
class NormalDistribution
{
public:
    typedef T result_type;

    NormalDistribution(T _mean = 0.0, T _stddev = 1.0)
        : m_mean(_mean),
          m_stddev(_stddev)
    {}

    void reset()
    {
        m_distU1.reset();
    }

    template <class Generator>
    T operator()(Generator &_g)
    {
        // Box-Muller transform (requires <cmath>).
        const T pi = (T)3.14159265358979323846264338327950288419716939937511;
        // m_distU1 yields values in [0, 1); shift u1 into (0, 1]
        // so that std::log is never called with 0.
        T u1 = (T)1 - m_distU1(_g);
        T u2 = m_distU1(_g);
        T r = std::sqrt((T)-2 * std::log(u1));
        return m_mean + m_stddev * r * std::sin((T)2 * pi * u2);
    }

    T mean() const {return m_mean;}
    T stddev() const {return m_stddev;}

protected:
    T m_mean;
    T m_stddev;
    UniformRealDistribution<T> m_distU1;
};

The uniform distribution seems to deliver good results and the normal distribution delivers very good results:

100000 values -> 68.159% within 1 sigma; 95.437% within 2 sigma; 99.747% within 3 sigma

The normal distribution uses the Box-Muller method which, from what I have read so far, is not the fastest method, but it runs more than fast enough for my application.

Both the uniform and normal distributions work with any C++11 engine (tested with std::mt19937) and produce the same sequence on all platforms, which is exactly what I wanted.

Should I call reset() on my C++ std random distribution to clear hidden state?

To ensure there is no hidden state within the distribution, is it 1)
necessary

Yes.

and 2) sufficient to call reset() on the distribution each time?

Yes.

You probably don't want to do this though. At least not on every single call. The std::normal_distribution is the poster-child for allowing distributions to maintain state. For example a popular implementation will use the Box-Muller transformation to compute two random numbers at once, but hand you back only one of them, saving the other for the next time you call. Calling reset() prior to the next call would cause the distribution to throw away this already valid result, and cut the efficiency of the algorithm in half.

How to find out the implementation details for a certain standard C++ function?

Sometimes a utility's behavior is explicitly defined by the standard, sometimes it's not.

  1. Look in the standard; if it is specified there, the behavior is strictly conforming and portable. Yoohay!
  2. If it is not specified, or is explicitly implementation-defined, look into the standard library implementation of your choice; the source code will explain it. Unfortunately the behavior is then implementation-defined and unportable. If it is specified by POSIX or something similar, yoohay again, but only for POSIX-conforming or "something similar"-conforming platforms.

Here's an example:

The C++14 standard draft N4296 says in §26.5.8.5.1:

A normal_distribution random number distribution produces random
numbers x distributed according to the probability density function

p(x | µ, σ) = (1 / (σ · √(2π))) · e^(−(x − µ)² / (2σ²))

The distribution parameters µ and σ are also known as this distribution's mean and standard deviation.

I am no expert on PRNGs, so I cannot explain this formula to you, but I think this is what you were looking for.

There you have a function (more specifically: a "probability density function") for calculating random numbers using normal distribution. The whole algorithm builds around this and can be found in the corresponding standard library implementation.

Why is std::uniform_int_distribution<IntType>::operator() not const?

While min() and max() won't change, the distribution may contain state that helps it generate the next value. If operator() were const, this state could not be modified without guaranteeing that the object is thread-safe. Providing that guarantee could be expensive, and distributions are meant to be lightweight.

Mersenne Twister Reproducibility across Compilers

The numbers that engines generate are guaranteed to be reproducible across implementations, but the distributions are not. (source: rand() considered harmful).

The N3337 standard draft says this about normal_distribution (26.5.8.5.1):

A normal_distribution random number distribution produces random numbers x distributed according to the probability density function

p(x | µ, σ) = (1 / (σ · √(2π))) · e^(−(x − µ)² / (2σ²))

The distribution parameters µ and σ are also known as this distribution’s mean and standard deviation.

And... that's it. It doesn't specify the order of the generated numbers, nor the algorithm, nor example outputs.

The standard is, by contrast, very elaborate about mersenne_twister_engine (26.5.3.2): it specifies the state transition function, the initial seeding algorithm, and so on.

Is generate_canonical output consistent across platforms?

The difficulties encountered in the linked question point to the basic problem with consistency: rounding mode. The clear intent of the mathematical definition of generate_canonical in the standard is that the URNG be called several times, each call producing a non-overlapping block of entropy to fill the result with; that would be entirely consistent across platforms. The problem is, no indication is given as to what to do with the extra bits below the LSB. Depending on rounding mode and summation order, these can round upwards, spilling into the next block (which is what allows for a 1.0 result).

Now, the precise wording is "the instantiation’s results...are distributed as uniformly as possible as specified below". If the rounding mode is round-to-nearest, an implementation which produces 1.0 is not as uniform as possible (because 1-eps is less likely than 1-2*eps). But it's still "as specified below". So depending on how you parse that sentence, generate_canonical is either fully specified and consistent, or has delegated some extra un-discussed bits to the implementation.

In any case, the fact that certain implementations produce 1.0 makes it quite clear that the current behavior is not cross-platform consistent. If you want that, it seems like the most straightforward approach would be to wrap your URNG in an independent_bits_engine producing some multiple of bits bits, so there's never anything to round.

Why GCC and MSVC std::normal_distribution are different?

It's problematic, but the standard unfortunately does not specify in detail which algorithm to use when constructing many of the randomly distributed numbers, and there are several valid alternatives with different benefits.

26.6.8.5 Normal distributions [rand.dist.norm]
26.6.8.5.1 Class template normal_distribution [rand.dist.norm.normal]

A normal_distribution random number distribution produces random
numbers x distributed according to the probability density function

p(x | µ, σ) = (1 / (σ · √(2π))) · e^(−(x − µ)² / (2σ²))

The distribution parameters µ and σ are also known as this distribution’s mean and standard deviation.

The most common algorithm for generating normally distributed numbers is Box-Muller, but even with that algorithm there are options and variations.

The freedom is even explicitly mentioned in the standard:

26.6.8 Random number distribution class templates [rand.dist]
. . .

3 The algorithms for producing each of the specified distributions are implementation-defined.

A go-to option for this is Boost.Random.

By the way, as @Hurkyl points out, it seems that the two implementations are actually the same algorithm: for example, Box-Muller generates pairs of values, of which one is returned and one is cached; the two implementations differ only in which of the pair is returned.

Further, the random number engines are completely specified and will give the same sequence between implementations, but care does need to be taken: different distributions can consume different amounts of random data to produce their results, which will put the engines out of sync.


