Creating a Prng Engine for <Random> in C++11 That Matches Prng Results in R

Creating a PRNG engine for random in C++11 that matches PRNG results in R

The first question is thus, where can I find the documentation on the MRG32k3a implementation in R that specifies these parameters?

I would use the source:
https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/main/RNG.c#L143

The problem is I have no idea how this is done and I can't seem to find any information anywhere (aside from knowing that engines are classes).

The requirements for a RandomNumberEngine can be found here:
https://en.cppreference.com/w/cpp/named_req/RandomNumberEngine
Although it is sufficient to fulfill UniformRandomBitGenerator if you want to use uniform_real_distribution:

Expression      Return type     Requirements
G::result_type  T               T is an unsigned integer type
G::min()        T               Returns the smallest value that G's operator()
                                may return. The value is strictly less than
                                G::max().
G::max()        T               Returns the largest value that G's operator() may
                                return. The value is strictly greater than
                                G::min()
g()             T               Returns a value in the closed interval [G::min(),
                                G::max()]. Has amortized constant complexity.

Main problem is that MRG32k3a is meant to return a floating point number in (0,1), while a C++ UniformRandomBitGenerator returns an integer type. Why do you want to integrate with the <random> header?

Additional difficulties you would have to take into account:

Seeding strategy used by R, c.f. https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/main/RNG.c#L293
R scrambles the user supplied seed, c.f. https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/main/RNG.c#L272

Alternatives would include using R source code directly without integration with the <random> header or link to libR.

random get internal state of rng-engine

From cppreference:

Engines and distributions are designed to be used together to produce random values. All of the engines may be specifically seeded, serialized, and deserialized for use with repeatable simulators.

And you'll notice that all the engines have operator<< and operator>> defined. So you should be able to save and load them from a file with these.

Proof of concept:

#include <fstream>
#include <iostream>
#include <random>

int main(int argc, char* argv[])
{
    std::ofstream ofile("out.dat", std::ios::binary);

    std::random_device randDev;
    std::default_random_engine defEngine(randDev());
    std::uniform_int_distribution<int> dist(0, 9);
    auto printSomeNumbers = [&](std::default_random_engine& engine) { for(int i=0; i < 10; i++) { std::cout << dist(engine) << " "; } std::cout << std::endl; };

    std::cout << "Start sequence: " << std::endl;
    printSomeNumbers(defEngine);

    ofile << defEngine;
    ofile.close();

    std::default_random_engine fileEngine;

    std::ifstream ifile("out.dat", std::ios::binary);
    ifile >> fileEngine;

    std::cout << "Orig engine: "; printSomeNumbers(defEngine);
    std::cout << "File engine: "; printSomeNumbers(fileEngine);

    std::cin.get();
    return 0;
}

As can be seen on Coliru, the output is:

Start sequence:

6 5 8 2 9 6 5 2 6 3

Orig engine: 0 5 8 5 2 2 0 7 2 0

File engine: 0 5 8 5 2 2 0 7 2 0

C++11: How to set seed using random

The point of having a seed_seq is to increase the entropy of the generated sequence. If you have a random_device on your system, initializing with multiple numbers from that random device may arguably do that. On a system that has a pseudo-random number generator I don't think there is an increase in randomness, i.e. generated sequence entropy.

Building on that your approach:

If your system does provide a random device then you can use it like this:

  std::random_device r;
  // std::seed_seq ssq{r()};
  // and then passing it to the engine does the same
  default_random_engine eng{r()};
  uniform_real_distribution<double> urd(0, 1);
  cout << "Uniform [0, 1): " << urd(eng);

If your system does not have a random device then you can use time(0) as a seed to the random_engine

  default_random_engine eng{static_cast<long unsigned int>(time(0))};
  uniform_real_distribution<double> urd(0, 1);
  cout << "Uniform [0, 1): " << urd(eng);

If you have multiple sources of randomness you can actually do this (e.g. 2)

std::seed_seq seed{ r1(), r2() };
  default_random_engine eng{seed};
  uniform_real_distribution<double> urd(0, 1);
  cout << "Uniform [0, 1): " << urd(eng);

where r1 , r2 are different random devices , e.g. a thermal noise or quantum source .

Ofcourse you could mix and match

std::seed_seq seed{ r1(), static_cast<long unsigned int>(time(0)) };
  default_random_engine eng{seed};
  uniform_real_distribution<double> urd(0, 1);
  cout << "Uniform [0, 1): " << urd(eng);

Finally, I like to initialize with an one liner:

  auto rand = std::bind(std::uniform_real_distribution<double>{0,1},
              std::default_random_engine{std::random_device()()});
  std::cout << "Uniform [0,1): " << rand();

If you worry about the time(0) having second precision you can overcome this by playing with the high_resolution_clock either by requesting the time since epoch as designated firstly by bames23 below:

static_cast<long unsigned int>(std::chrono::high_resolution_clock::now().time_since_epoch().count())

or maybe just play with CPU randomness

long unsigned int getseed(int const K)
{

    typedef std::chrono::high_resolution_clock hiclock;

    auto gett= [](std::chrono::time_point<hiclock> t0)
    {
        auto tn = hiclock::now();
        return static_cast<long unsigned int>(std::chrono::duration_cast<std::chrono::microseconds>(tn-t0).count());
    };

    long unsigned int diffs[10];
    diffs[0] = gett(hiclock::now());
    for(int i=1; i!=10; i++)
    {
        auto last = hiclock::now();
        for(int k=K; k!=0; k--)
        {
            diffs[i]= gett(last);
        }
    }

    return *std::max_element(&diffs[1],&diffs[9]);
}

Using a random number to generate a series of repeatable numbers in C++

I don't know exactly why you cannot use rand itself but, if not, then you could just roll your own. Use your integer to seed a number then use the standard X_n+1 = a * X_n + c mod m.

The Wikipedia page for linear congruential method has some sample values for a, c and m.

Now LCM isn't the greatest random number generator in the world but, if you just want numbers that "look somewhat randomised", it should be sufficient.

As an example of an LCM algorithm, the following function, based on the Microsoft Visual/Quick C/C++ entry from that linked page above:

// LCM pseudo-random number generator.
// Outputs numbers from 0..32767 inclusive.
// With protection against identical sequences.
//   due to full 32-bit cycle but only returning 15 bits.

uint32_t myRand (void) {
    const static uint32_t a = 214013U;
    const static uint32_t c = 2531011U;
    // m is, of course, 2^32 since the values will wrap.

    static uint32_t seed = 1;
    seed = seed * a + c;
    return (seed >> 16) & 0x7FFF;
}

has a cycle time of the full range of 32-bit numbers but only returns certain bits of the seed each time, which reduces the appearance of identical sequences.

If you want a C++ class to do this so that all random number generators are independent of each other (as suggested by TomZ), you can use something like:

class myRand {
    public:
        myRand ();
        myRand (unsigned int newSeed);
        ~myRand ();
        void setSeed (unsigned int newSeed);
        unsigned int getSeed (void);
        unsigned int next (void);
    private:
        unsigned int seed;
        const static unsigned int a = 214013U;
        const static unsigned int c = 2531011U;
};

myRand::myRand () { seed = 1; }
myRand::myRand (unsigned int newSeed) { setSeed (newSeed); }
myRand::~myRand () { }
void myRand::setSeed (unsigned int newSeed) { seed = newSeed; }
unsigned int myRand::getSeed (void) { return seed; }
unsigned int myRand::next (void) {
    seed = (seed * a + c) & 0xffffffff;
    return (seed >> 16) & 0x7fff;
}

#include <iostream>

int main (void) {
    myRand r (5);

    std::cout << r.next() << "\n";          //    54
    std::cout << r.next() << "\n";          // 28693

    unsigned int saveSeed = r.getSeed();
    std::cout << r.next() << "\n";          // 12255
    std::cout << r.next() << "\n";          // 24449

    r.setSeed (saveSeed);
    std::cout << r.next() << "\n";          // 12255
    std::cout << r.next() << "\n";          // 24449

    return 0;
}

C++11 random number distributions are not consistent across platforms -- what alternatives are there?

I have created my own C++11 distributions:

template <typename T>
class UniformRealDistribution
{
 public:
    typedef T result_type;

 public:
    UniformRealDistribution(T _a = 0.0, T _b = 1.0)
        :m_a(_a),
         m_b(_b)
    {}

    void reset() {}

    template <class Generator>
    T operator()(Generator &_g)
    {
        double dScale = (m_b - m_a) / ((T)(_g.max() - _g.min()) + (T)1); 
        return (_g() - _g.min()) * dScale  + m_a;
    }

    T a() const {return m_a;}
    T b() const {return m_b;}

 protected:
    T       m_a;
    T       m_b;
};

template <typename T>
class NormalDistribution
{
 public:
    typedef T result_type;

 public:
    NormalDistribution(T _mean = 0.0, T _stddev = 1.0)
        :m_mean(_mean),
         m_stddev(_stddev)
    {}

    void reset()
    {
        m_distU1.reset();
    }

    template <class Generator>
    T operator()(Generator &_g)
    {
        // Use Box-Muller algorithm
        const double pi = 3.14159265358979323846264338327950288419716939937511;
        double u1 = m_distU1(_g);
        double u2 = m_distU1(_g);
        double r = sqrt(-2.0 * log(u1));
        return m_mean + m_stddev * r * sin(2.0 * pi * u2);
    }

    T mean() const {return m_mean;}
    T stddev() const {return m_stddev;}

protected:
    T                           m_mean;
    T                           m_stddev;
    UniformRealDistribution<T>  m_distU1;
};

The uniform distribution seems to deliver good results and the normal distribution delivers very good results:

100000 values -> 68.159% within 1 sigma; 95.437% within 2 sigma; 99.747% within 3 sigma

The normal distribution uses the Box-Muller method, which according to what I have read so far, is not the fastest method, but it runs more that fast enough for my application.

Both the uniform and normal distributions should work with any C++11 engine (tested with std::mt19937) and provides the same sequence on all platforms, which is exactly what I wanted.

Random Engine Differences

From the explanations below, linear engine seems to be faster but less random while Marsenne Twister has a higher complexity and randomness. Subtract-with-carry random number engine is an improvement to the linear engine and it is definitelly more random. In the last reference, it is stated that Mersenne Twister has higher complexity than the Subtract-with-carry random number engine

Linear congruential random number engine

A pseudo-random number generator engine that produces unsigned integer numbers.

This is the simplest generator engine in the standard library. Its state is a single integer value, with the following transition algorithm:

x = (ax+c) mod m

Where x is the current state value, a and c are their respective template parameters, and m is its respective template parameter if this is greater than 0, or numerics_limits::max() plus 1, otherwise.

Its generation algorithm is a direct copy of the state value.

This makes it an extremely efficient generator in terms of processing and memory consumption, but producing numbers with varying degrees of serial correlation, depending on the specific parameters used.

The random numbers generated by linear_congruential_engine have a period of m.
http://www.cplusplus.com/reference/random/linear_congruential_engine/

Mersenne twister random number engine

A pseudo-random number generator engine that produces unsigned integer numbers in the closed interval [0,2^w-1].

The algorithm used by this engine is optimized to compute large series of numbers (such as in Monte Carlo experiments) with an almost uniform distribution in the range.

The engine has an internal state sequence of n integer elements, which is filled with a pseudo-random series generated on construction or by calling member function seed.

The internal state sequence becomes the source for n elements: When the state is advanced (for example, in order to produce a new random number), the engine alters the state sequence by twisting the current value using xor mask a on a mix of bits determined by parameter r that come from that value and from a value m elements away (see operator() for details).

The random numbers produced are tempered versions of these twisted values. The tempering is a sequence of shift and xor operations defined by parameters u, d, s, b, t, c and l applied on the selected state value (see operator()).

The random numbers generated by mersenne_twister_engine have a period equivalent to the mersenne number 2^((n-1)*w)-1.http://www.cplusplus.com/reference/random/mersenne_twister_engine/

Subtract-with-carry random number engine

A pseudo-random number generator engine that produces unsigned integer numbers.

The algorithm used by this engine is a lagged fibonacci generator, with a state sequence of r integer elements, plus one carry value. http://www.cplusplus.com/reference/random/subtract_with_carry_engine/

Lagged Fibonacci generators have a maximum period of (2k - 1)*^(2M-1) if addition or subtraction is used. The initialization of LFGs is a very complex problem. The output of LFGs is very sensitive to initial conditions, and statistical defects may appear initially but also periodically in the output sequence unless extreme care is taken. Another potential problem with LFGs is that the mathematical theory behind them is incomplete, making it necessary to rely on statistical tests rather than theoretical performance.
http://en.wikipedia.org/wiki/Lagged_Fibonacci_generator

And finally:
The choice of which engine to use involves a number of tradeoffs: the linear congruential engine is moderately fast and has a very small storage requirement for state. The lagged Fibonacci generators are very fast even on processors without advanced arithmetic instruction sets, at the expense of greater state storage and sometimes less desirable spectral characteristics. The Mersenne twister is slower and has greater state storage requirements but with the right parameters has the longest non-repeating sequence with the most desirable spectral characteristics (for a given definition of desirable). in http://en.cppreference.com/w/cpp/numeric/random

Creating a Prng Engine for <Random> in C++11 That Matches Prng Results in R