C++ Uniform_Int_Distribution Always Returning Min() on First Invocation

Pseudo random number generator gives same first output but then behaves as expected

I'm not sure what's going wrong (yet!), but you can still initialize by time as follows without hitting the problem (borrowed from here).

#include <ctime>
#include <iostream>
#include <random>
#include <chrono>

using namespace std;

unsigned seed1 = std::chrono::system_clock::now().time_since_epoch().count();

default_random_engine gen(seed1); //gen(time(NULL));
uniform_int_distribution<int> dist(10,200);

int main()
{
    for(int i = 0; i < 5; i++)
        cout<<dist(gen)<<endl;

    return 0;
}

You can also use the random device, which is non-determinstic (it steals timing information from your key strokes, mouse movements, and other sources to generate unpredictable numbers). This is the strongest seed you can choose but the computer's clock is the better way to go if you don't need strong guarantees because the computer can run out of "randomness" if you use it to often (it takes many key strokes and mouse movements to generate a single truly random number).

std::random_device rd;
default_random_engine gen(rd());

Running

cout<<time(NULL)<<endl;
cout<<std::chrono::system_clock::now().time_since_epoch().count()<<endl;
cout<<rd()<<endl;

on my machine generates

1413844318
1413844318131372773
3523898368

so the chrono library is providing a significantly larger number and more rapidly changing number (that's in nanoseconds) than the ctime library. The random_device is producing non-deterministic numbers which are all over the map. So it seems as though the seeds ctime is producing may be too close together somehow and thus map partly to the same internal state?

I made another program which looks like this:

#include <iostream>
#include <random>
using namespace std;

int main(){
  int oldval           = -1;
  unsigned int oldseed = -1;

  cout<<"Seed\tValue\tSeed Difference"<<endl;
  for(unsigned int seed=0;seed<time(NULL);seed++){
    default_random_engine gen(seed);
    uniform_int_distribution<int> dist(10,200);
    int val = dist(gen);
    if(val!=oldval){
      cout<<seed<<"\t"<<val<<"\t"<<(seed-oldseed)<<endl;
      oldval  = val;
      oldseed = seed;
    }
  }
}

As you can see, this simply prints out the first output value for every possible random seed up to the current time along with the seed and number of previous seeds which had the same value. An excerpt of the output looks like this:

Seed  Value Seed Difference
0 10  1
669 11  669
1338  12  669
2007  13  669
2676  14  669
3345  15  669
4014  16  669
4683  17  669
5352  18  669
6021  19  669
6690  20  669
7359  21  669
8028  22  669
8697  23  669
9366  24  669
10035 25  669
10704 26  669
11373 27  669
12042 28  669
12711 29  669
13380 30  669
14049 31  669

So for every new first number there are 669 seeds which give that first number. Because the second and third numbers are different we are still generating unique internal states. I think we would have to understand much more about the default_random_engine in order to understand what is special about 669 (which can be factored into 3 and 223).

Given this, it's clear why the chrono and random_device libraries work better: the seeds they generate are always more than 669 apart. Keep in mind that even if the first number is the same what matters in many programs is that the sequence of numbers generated by distinct.

uniform_int_distribution used wrong here? (my results seem not equally distributed)

Uniform distribution does not specifically mean that you will get an equal amount in each quadrant of the screen; It means that there is an equal chance of each point appearing. So if you do a simple 50:50 trial, it does not mean you will get 50:50 results.

However if you did a test with say 1,000,000 stars it is very likely that they will be nearly uniformly distributed. To me this seems like an error in sample size

uniform_int_distribution a(), b(), min(), and max()

Are these functions really identical

They are identical, because it is reasonable and practical to always assume so. It would be true but pedantic and misleading to say it is not an absolute guarantee.

I only have access to MinGW 4.7.1, which I assume its standard library is the same as GCC one. Inside it, the class template uniform_int_distribution has members:

  /**
   * @brief Returns the inclusive lower bound of the distribution range.
   */
  result_type
  min() const
  { return this->a(); }

  /**
   * @brief Returns the inclusive upper bound of the distribution range.
   */
  result_type
  max() const
  { return this->b(); }

So, after function inlining, (a and min) and (b and max) should be tanslated into identical code.

Reading the standard section 26.5.8.2.1, you will find that it (only indirectly) says they should return the same value. As a result, a sane library implementer will make them effectively identical or at least not-so-different.

why are a() and b() defined?

I can only guess. It may be about consistency, but the consistency is in a less formal sense.

Mathematically, a uniform distribution is like this:

P(i|a,b) = 1/(b-a+1)

And in uniform_int_distribution, we have

uniform_int_distribution::a()
uniform_int_distribution::b()

For Bernoulli distribution and bernoulli_distribution:

P(i|p) = [complicated]

bernoulli_distribution::p()

Poisson:

P(i|mean) = [complicated]

poisson_distribution::mean()

Normal:

P(x|mean, standard-deviation) = [complicated]

normal_distribution::mean()
normal_distribution::stddev()

We can observe that they all tell their parameters. It is not useful to generic code, but that may be helpful in some situations.

Should I keep the random distribution object instance or can I always recreate it?

I doubt that the distribution object is expensive to create and destroy, although I suppose it might do slightly more than just store the parameters min,max. It might precalculate some useful values based on the parameters, for instance in the obvious implementation 2**32 % (max-min+1) is the number of different values from the generator that would be discarded and re-tried.

In principle, a distribution object is allowed to store inside it some bits of entropy that were drawn from the generator on a previous call to operator(), but not needed. Those bits could be used for a later operator() invocation. So if min==0 and max==1, then you can get 32 calls to operator() on the distribution per call on the generator. That's what the reset() function is about, to clear this state.

So if you use the same min/max values repeatedly, then technically you're wasting random bits by using a new distribution each time -- you could perhaps end up with fewer calls to the engine than if you kept the distribution object around. But I doubt it matters, especially since MT is fast.

Distributions and internal state

Interesting question.

So I was wondering if interfering with how the distribution works by
constantly resetting it (i.e. recreating the distribution at every
call of get_int_from_range) I get properly distributed results.

I've written code to test this with uniform_int_distribution and poisson_distribution. It's easy enough to extend this to test another distribution if you wish. The answer seems to be yes.

Boiler-plate code:

#include <random>
#include <memory>
#include <chrono>
#include <utility>

typedef std::mt19937_64 engine_type;

inline size_t get_seed()
    { return std::chrono::system_clock::now().time_since_epoch().count(); }

engine_type& engine_singleton()
{  
    static std::unique_ptr<engine_type> ptr;

    if ( !ptr ) 
        ptr.reset( new engine_type(get_seed()) );
    return *ptr;
}

// ------------------------------------------------------------------------

#include <cmath>
#include <cstdio>
#include <vector>
#include <string>
#include <algorithm>

void plot_distribution( const std::vector<double>& D, size_t mass = 200 )
{
    const size_t n = D.size();
    for ( size_t i = 0; i < n; ++i ) 
    {
        printf("%02ld: %s\n", i, 
            std::string(static_cast<size_t>(D[i]*mass),'*').c_str() );
    }
}

double maximum_difference( const std::vector<double>& x, const std::vector<double>& y )
{
    const size_t n = x.size(); 

    double m = 0.0;
    for ( size_t i = 0; i < n; ++i )
        m = std::max( m, std::abs(x[i]-y[i]) );

    return m;
}

Code for the actual tests:

#include <iostream>
#include <vector>
#include <cstdio>
#include <random>
#include <string>
#include <cmath>

void compare_uniform_distributions( int lo, int hi )
{
    const size_t sample_size = 1e5;

    // Initialize histograms
    std::vector<double> H1( hi-lo+1, 0.0 ), H2( hi-lo+1, 0.0 );

    // Initialize distribution
    auto U = std::uniform_int_distribution<int>(lo,hi);

    // Count!
    for ( size_t i = 0; i < sample_size; ++i )
    {
        engine_type E(get_seed());

        H1[ U(engine_singleton())-lo ] += 1.0;
        H2[ U(E)-lo ] += 1.0;
    }

    // Normalize histograms to obtain "densities"
    for ( size_t i = 0; i < H1.size(); ++i )
    {
        H1[i] /= sample_size; 
        H2[i] /= sample_size; 
    }

    printf("Engine singleton:\n"); plot_distribution(H1);
    printf("Engine creation :\n"); plot_distribution(H2);
    printf("Maximum difference: %.3f\n", maximum_difference(H1,H2) );
    std::cout<< std::string(50,'-') << std::endl << std::endl;
}

void compare_poisson_distributions( double mean )
{
    const size_t sample_size = 1e5;
    const size_t nbins = static_cast<size_t>(std::ceil(2*mean));

    // Initialize histograms
    std::vector<double> H1( nbins, 0.0 ), H2( nbins, 0.0 );

    // Initialize distribution
    auto U = std::poisson_distribution<int>(mean);

    // Count!
    for ( size_t i = 0; i < sample_size; ++i )
    {
        engine_type E(get_seed());
        int u1 = U(engine_singleton());
        int u2 = U(E);

        if (u1 < nbins) H1[u1] += 1.0;
        if (u2 < nbins) H2[u2] += 1.0;
    }

    // Normalize histograms to obtain "densities"
    for ( size_t i = 0; i < H1.size(); ++i )
    {
        H1[i] /= sample_size; 
        H2[i] /= sample_size; 
    }

    printf("Engine singleton:\n"); plot_distribution(H1);
    printf("Engine creation :\n"); plot_distribution(H2);
    printf("Maximum difference: %.3f\n", maximum_difference(H1,H2) );
    std::cout<< std::string(50,'-') << std::endl << std::endl;

}

// ------------------------------------------------------------------------

int main()
{
    compare_uniform_distributions( 0, 25 );
    compare_poisson_distributions( 12 );
}

Run it here.

Does the C++ standard make any guarantee regarding this topic?

Not that I know of. However, I would say that the standard makes an implicit recommendation not to re-create the engine every time; for any distribution Distrib, the prototype of Distrib::operator() takes a reference URNG& and not a const reference. This is understandably required because the engine might need to update its internal state, but it also implies that code looking like this

auto U = std::uniform_int_distribution(0,10);
for ( <something here> ) U(engine_type());

does not compile, which to me is a clear incentive not to write code like this.

I'm sure there are plenty of advice out there on how to properly use the random library. It does get complicated if you have to handle the possibility of using random_devices and allowing deterministic seeding for testing purposes, but I thought it might be useful to throw my own recommendation out there too:

#include <random>
#include <chrono>
#include <utility>
#include <functional>

inline size_t get_seed()
    { return std::chrono::system_clock::now().time_since_epoch().count(); }

template <class Distrib>
using generator_type = std::function< typename Distrib::result_type () >;

template <class Distrib, class Engine = std::mt19937_64, class... Args>
inline generator_type<Distrib> get_generator( Args&&... args )
{ 
    return std::bind( Distrib( std::forward<Args>(args)... ), Engine(get_seed()) ); 
}

// ------------------------------------------------------------------------

#include <iostream>

int main()
{
    auto U = get_generator<std::uniform_int_distribution<int>>(0,10);
    std::cout<< U() << std::endl;
}

Run it here. Hope this helps!

EDIT My first recommendation was a mistake, and I apologise for that; we can't use a singleton engine like in the tests above, because this would mean that two uniform int distributions would produce the same random sequence. Instead I rely on the fact that std::bind copies the newly-created engine locally in std::function with its own seed, and this yields the expected behaviour; different generators with the same distribution produce different random sequences.

uniform_int_distribution with zero range goes to infinite loop

The standard says ([rand.dist.uni.int]):

A uniform_int_distribution random number distribution produces random integers i,
a ≤ i ≤ b, distributed according to the constant discrete probability function
P(i|a,b)=1/(b−a+1)
. . .
explicit uniform_int_distribution(IntType a = 0, IntType b = numeric_limits<IntType>::max());
Requires: a ≤ b.

So uniform_int_distribution<>(5,5) should return 5 with probability 1/1.

Implementations that go into an infinite loop instead, have a bug.

However, your mock RNG that always generates the same value, doesn't satisfy Uniform random bit generator requirements:

A uniform random bit generator g of type G is a function object returning unsigned integer values such that each value in the range of possible results has (ideally) equal probability of being returned. [ Note: The degree to which g's results approximate the ideal is often determined statistically. — end note ]

See [req.genl]/p1.b:

Throughout this subclause [rand], the effect of instantiating a template:
b) that has a template type parameter named URBG is undefined unless the corresponding template argument is cv-unqualified and satisfies the requirements of uniform random bit generator.

Sure enough, with a standard RNG it just works:

#include <iostream>
#include <random>

int main() {
    std::mt19937_64 rng;
    std::uniform_int_distribution<> dist(5, 5);
    std::cout << dist(rng) << "\n";
}

Prints:

C++ Generating random numbers in functions

You have two problems here. First

std::vector<double> generate(std::default_random_engine generator, double mean, double sigma, int n)

Takes the PRNG by value, which means it makes a copy. That means every time you call the function your going to be starting from the same sequence since you never modify the generator from the call site.

The second issue is with

std::vector<double> generate(double mean, double sigma, int n)

You recreate the same generator every time you call the function. This is not going to work as it is going to create the same sequence each time.

Typically you have two options. You can pass the PRNG to the function by reference, or you declare a static PRNG in the function so it persists between function calls.