Generate Random Numbers with a Given Distribution

Generate random numbers according to distributions

The standard random number generator you've got (rand() in C after a simple transformation, equivalents in many languages) is a fairly good approximation to a uniform distribution over the range [0,1]. If that's what you need, you're done. It's also trivial to convert that to a random number generated over a somewhat larger integer range.

Conversion of a Uniform distribution to a Normal distribution has already been covered on SO, as has going to the Exponential distribution.

[EDIT]: For the triangular distribution, converting a uniform variable is relatively simple (in something C-like):

#include <math.h>
#include <stdlib.h>

/* Triangular variate with minimum a, maximum b, and mode c. */
double triangular(double a, double b, double c) {
    double U = rand() / (double) RAND_MAX;
    double F = (c - a) / (b - a);
    if (U <= F)
        return a + sqrt(U * (b - a) * (c - a));
    else
        return b - sqrt((1 - U) * (b - a) * (b - c));
}

That's just converting the formula given on the Wikipedia page. If you want others, that's the place to start looking; in general, you use the uniform variable to pick a point on the vertical axis of the cumulative distribution function (CDF) of the distribution you want (assuming it's continuous), and invert the CDF to get the random value with the desired distribution.

Generate random numbers with a given (numerical) distribution

scipy.stats.rv_discrete might be what you want. You can supply your probabilities via the values parameter. You can then use the rvs() method of the distribution object to generate random numbers.
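For instance, here's a minimal sketch of a loaded six-sided die (the name loaded_die and the probabilities are made up for illustration):

import numpy as np
from scipy import stats

xk = np.arange(1, 7)                   # outcomes 1..6
pk = (0.1, 0.05, 0.05, 0.2, 0.4, 0.2)  # their probabilities, summing to 1
loaded_die = stats.rv_discrete(name='loaded_die', values=(xk, pk))
print(loaded_die.rvs(size=10))         # ten draws from the distribution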

As pointed out by Eugene Pakhomov in the comments, you can also pass a p keyword parameter to numpy.random.choice(), e.g.

numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])

If you are using Python 3.6 or above, you can use random.choices() from the standard library – see the answer by Mark Dickinson.
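A minimal sketch with the same made-up probabilities as above:

import random

# k independent draws, weighted by the given probabilities
print(random.choices(range(1, 7), weights=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2], k=10))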

What does it mean to generate a random number/variable from a probability distribution?

In general, generating a random number from a probability distribution means transforming random numbers so that the numbers fit the distribution.

Perhaps the most generic way to do so is called inverse transform sampling:

  1. Generate a uniform random number in [0, 1].
  2. Run the quantile function (also known as the inverse CDF or the PPF) on the uniform random number.
  3. The result is a random number that fits the distribution.

However, this technique can't be used in practice for all distributions. The main reason is that the quantile function is either unavailable or hard to calculate. Thus, for many distributions, other techniques are used. They include rejection sampling, direct transformations, etc.
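When the quantile function is available, though, the recipe above takes only a few lines. Here's a minimal sketch for the exponential distribution, whose quantile function is -ln(1 - u)/lambda:

import math
import random

def exponential(lam):
    u = random.random()                # step 1: uniform random number in [0, 1)
    return -math.log(1.0 - u) / lam    # step 2: apply the quantile function

print([round(exponential(2.0), 3) for _ in range(5)])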

In the case of the geometric distribution, there are at least two ways to generate numbers that follow it. One way is a direct transformation (a code sketch follows the steps):

  1. Set x to 0.
  2. With probability p, return x.
  3. Add 1 to x and go to step 2.
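In code, that loop is short (a minimal sketch; this convention counts the number of failures before the first success):

import random

def geometric_direct(p):
    x = 0
    while random.random() >= p:  # with probability p, stop and return x
        x += 1
    return x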

A geometric random number can also be found by inverse transform sampling, described below.

  1. Generate a uniform random number in [0, 1], call it u.
  2. Run the quantile function, which for this convention (counting failures before the first success) is floor(log(1 - u)/log(1 - p)).
  3. The result is a geometric random number.
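A minimal sketch of that recipe:

import math
import random

def geometric_inverse(p):
    u = random.random()
    return math.floor(math.log(1.0 - u) / math.log(1.0 - p))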

Other ways to generate geometric random numbers are available. The choice of algorithm depends on many things, including efficiency, simplicity, and accuracy. (Note that the geometric distribution is defined differently in different works.) The same applies to other probability distributions.

The 1986 book Non-Uniform Random Variate Generation by Luc Devroye goes into random generation from various distributions in detail. See also my article on randomization and sampling methods.

How to generate random numbers with predefined probability distribution?

For simple distributions like the ones you need, or if you have a CDF that is easy to invert in closed form, you can find plenty of samplers in NumPy, as correctly pointed out in Olivier's answer.

For arbitrary distributions you could use Markov chain Monte Carlo (MCMC) sampling methods.

The simplest, and maybe easiest to understand, variant of these algorithms is Metropolis sampling.

The basic idea goes like this:

  • start from a random point x and take a random step xnew = x + delta
  • evaluate the desired probability distribution at the starting point, p(x), and at the new one, p(xnew)
  • if the new point is more probable, i.e. p(xnew)/p(x) >= 1, accept the move
  • if the new point is less probable, randomly decide whether to accept or reject depending on how probable[1] the new point is
  • take a new step from this point and repeat the cycle

It can be shown, see e.g. Sokal[2], that points sampled with this method follow the desired probability distribution.

An extensive implementation of Monte Carlo methods in Python can be found in the PyMC3 package.

Example implementation

Here's a toy example just to show you the basic idea, not meant in any way as a reference implementation. Please refer to mature packages for any serious work.

import numpy as np

def uniform_proposal(x, delta=2.0):
    return np.random.uniform(x - delta, x + delta)

def metropolis_sampler(p, nsamples, proposal=uniform_proposal):
    x = 1  # start somewhere

    for i in range(nsamples):
        trial = proposal(x)  # random neighbour from the proposal distribution
        acceptance = p(trial) / p(x)

        # accept the move conditionally
        if np.random.uniform() < acceptance:
            x = trial

        yield x

Let's see if it works with some simple distributions:

Gaussian mixture

def gaussian(x, mu, sigma):
    return 1. / sigma / np.sqrt(2 * np.pi) * np.exp(-((x - mu)**2) / 2. / sigma / sigma)

p = lambda x: gaussian(x, 1, 0.3) + gaussian(x, -1, 0.1) + gaussian(x, 3, 0.2)
samples = list(metropolis_sampler(p, 100000))

[Figure: Metropolis samples of the Gaussian mixture]

Cauchy

def cauchy(x, mu, gamma):
    return 1. / (np.pi * gamma * (1. + ((x - mu) / gamma)**2))

p = lambda x: cauchy(x, -2, 0.5)
samples = list(metropolis_sampler(p, 100000))

[Figure: Metropolis samples of the Cauchy distribution]

Arbitrary functions

You don't really have to sample from proper probability distributions. You might just have to enforce a limited domain in which to sample your random steps[3]

p = lambda x: np.sqrt(x)
samples = list(metropolis_sampler(p, 100000, domain=(0, 10)))

[Figure: Metropolis samples of sqrt(x) on (0, 10)]

p = lambda x: (np.sin(x)/x)**2
samples = list(metropolis_sampler(p, 100000, domain=(-4*np.pi, 4*np.pi)))

[Figure: Metropolis samples of (sin(x)/x)^2 on (-4*pi, 4*pi)]
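The domain handling referenced in footnote [3] is not shown in the sampler above; here's a minimal sketch of one way to add it, where any trial step outside the domain is always rejected (equivalent to making p zero outside it). The domain keyword mirrors the calls above, but this particular implementation is an assumption of the sketch:

import numpy as np

def metropolis_sampler_bounded(p, nsamples, domain, delta=2.0):
    lo, hi = domain
    x = np.random.uniform(lo, hi)  # start at a random point inside the domain
    for _ in range(nsamples):
        trial = np.random.uniform(x - delta, x + delta)
        # p is treated as zero outside the domain, so such trials are always rejected
        if lo <= trial <= hi and np.random.uniform() < p(trial) / p(x):
            x = trial
        yield x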

Conclusions

There is still far too much to say about proposal distributions, convergence, correlation, efficiency, applications, Bayesian formalism, other MCMC samplers, etc.
I don't think this is the proper place, and there is plenty of material available online that is much better than what I could write here.


  1. The idea here is to favor exploration where the probability is higher, but still look at low-probability regions, as they might lead to other peaks. Fundamental is the choice of the proposal distribution, i.e. how you pick new points to explore. Steps that are too small might constrain you to a limited area of your distribution; steps that are too big could lead to a very inefficient exploration.

  2. Physics oriented. The Bayesian formalism (Metropolis-Hastings) is preferred these days, but IMHO it's a little harder to grasp for beginners. There are plenty of tutorials available online; see e.g. this one from Duke University.

  3. The implementation is not shown so as not to add too much confusion, but it's straightforward: you just have to wrap trial steps at the domain edges or make the desired function go to zero outside the domain.

How can I generate random numbers with given limits and mean?

If you want a continuous distribution, where you can get any floating point value between the min and max, one possibility is the triangular distribution, which has its mean equal to (min + max + mode) / 3. Given the mean, min, and max, that implies you need mode = 3 * mean - min - max, which yields mode = 17 for your problem. NumPy provides a generator for triangular distributions.

If you want a discrete distribution, where the outcomes are all integers, you could consider a binomial distribution with shifting. The binomial distribution has parameters n and p, with the range being {0,...,n} and the mean equal to n*p. Since you want a min of 12 and a max of 40, you need to shift all of your target values by subtracting the min. You would generate a binomial with n = max - min = 28 and a mean of 23 - 12 = 11. That would require 28 * p = 11, or p = 11/28. Generate your final outcomes by adding the min value to shift the results back to your desired range, i.e., X = binomial(n = 28, p = 11/28) + 12.

You said in a comment that you want to do this in python, so here's some code illustrating both distributions:

import numpy as np

rng = np.random.default_rng()

sample_size = 30
min, max = 12, 40

# Triangular
mode = 17
print(rng.triangular(min, mode, max, sample_size))

# Binomial
n, p = 28, 11/28 # number of trials, probability of each trial
print(rng.binomial(n, p, sample_size) + 12)

Note that the range limits are guaranteed but values near the extremes of the range are extremely rare for both of these. If you want more occurrences of the min and max, you'll need to dig up a bounded distribution with "fatter" tails. One way to do this would be to generate a mixture of distributions, such as flipping a weighted coin to generate from either a triangular or a uniform distribution. Since the mean of the uniform is the mid-range (26), this would require you to adjust the mean of the binomial or triangle to 20 in order to yield the overall target mean of 23 if you use a 50/50 weighting. For example, to get a mix of binomials and discrete uniforms:

sample_size = 30
min, max = 12, 40

n, p = 28, 8/28 # adjust p to get mean of 20 after shifting by 12
binom = rng.binomial(n, p, sample_size // 2) + 12
unif = rng.integers(min, high = max + 1, size = sample_size // 2)
pooled = np.concatenate((binom, unif))
rng.shuffle(pooled)
print(pooled)

Generating random numbers with a specific distribution in C

Uniform:

Generate a random number in the range [0,1] with uniform distribution:

double X = ((double)rand() / (double)RAND_MAX);

Exponential:

Generating an exponential random variable with parameter lambda:

-ln(U)/lambda (where U ~ Uniform[0,1]).

Normal:

The simplest way (though time consuming) is to use the central limit theorem (sum enough uniformly distributed numbers), but there are other methods on the Wikipedia page, such as the Box-Muller transform, which generates two independent random variables X, Y ~ N(0,1):

X = sqrt(-2*ln(U)) * cos(2*pi*V)
Y = sqrt(-2*ln(U)) * sin(2*pi*V)
where U, V ~ Uniform[0,1]

Transforming from X ~ N(0,1) to Z ~ N(m, s^2) is simple: Z = s*X + m
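For reference, here's the same transform as a minimal Python sketch (using 1 - random.random() so the logarithm's argument stays in (0, 1]):

import math
import random

def normal(m=0.0, s=1.0):
    # Box-Muller: two uniforms -> one N(0,1) variate, then rescale to N(m, s^2)
    u = 1.0 - random.random()  # in (0, 1], keeps log(u) finite
    v = random.random()
    x = math.sqrt(-2.0 * math.log(u)) * math.cos(2.0 * math.pi * v)
    return s * x + m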

Though you CAN generate these random numbers, I stand by @Amigable Clark Kant's suggestion to use an existing library.

Generating a random number based on a distribution function

The question is: "Is it a correct way for generating a random number based on a distribution function?" So, basically, you have either a continuous Probability Density Function (PDF) or a discrete Probability Mass Function (PMF), with notation f(x), and you are looking for a way to find a random variate x.

There are at least two ways to do this.

  1. Use the inverse transform method.
  2. Use the rejection method.

Using the inverse transform:
If we know the probability distribution function, then for some Cumulative Distribution Functions (CDFs) we can find a closed form for the random variate. Suppose your probability function is f(x) and the CDF is F(x); then, assuming you can get the inverse function, the random variate is

x = F⁻¹(U)

where U is a uniform random variable on [0, 1].

Using the rejection method:
If the CDF does not have a closed-form inverse, you can always use the rejection method. The idea of the rejection method is to generate a random 2D point (a pair of random numbers) (r1, r2); the point then falls either under the curve of the PDF or above it. If the point is under the curve, we take this value as our random number; otherwise we sample another point.
Suppose the PDF f(x) is bounded by a maximum value M. Then, as sketched in code below:

r1 is generated within the interval [a, b] of the horizontal axis
r2 is generated within the interval [0, M] of the vertical axis
If r2 < f(r1), then accept r1 and output r1
Else reject r1 and generate a new random point
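A minimal sketch of those steps (f, a, b, and M are the density, domain endpoints, and bound described above):

import random

def rejection_sample(f, a, b, M):
    while True:
        r1 = random.uniform(a, b)  # point on the horizontal axis
        r2 = random.uniform(0, M)  # point on the vertical axis
        if r2 < f(r1):             # under the curve: accept
            return r1

# e.g. rejection_sample(lambda x: 2*x, 0, 1, 2) samples from density 2x on [0, 1]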

The inverse transform method is superior to the rejection method if you can find the inverse of the CDF, because you get a closed form. For instance, the CDF of the Exponential distribution with rate 1 is F(x) = 1 - exp(-x). The inverse transform would be

x = F⁻¹(U) = -ln(1-U) = -ln(U)

because 1-U is itself uniformly distributed on [0, 1].

Random numbers with user-defined continuous probability distribution

Assuming the density function you have is proportional to a probability density function (PDF), you can use the rejection sampling method: draw a random point in a bounding box and accept it only if it falls under the curve of the density function. It works for any bounded density function with a closed and bounded domain, as long as you know what the domain and bound are (the bound is the maximum value of f in the domain). In this case, the bound is 64/(99*math.pi) and the algorithm works as follows:

import math
import random

def sample():
    mn = 0                        # Lowest value of domain
    mx = 2 * math.pi              # Highest value of domain
    bound = 64 / (99 * math.pi)   # Upper bound of PDF value
    while True:  # Do the following until a value is returned
        # Choose an X inside the desired sampling domain.
        x = random.uniform(mn, mx)
        # Choose a Y between 0 and the maximum PDF value.
        y = random.uniform(0, bound)
        # Calculate PDF
        pdf = ((3 + (math.cos(x))**2)**2) * (1 / (99 * math.pi / 4))
        # Does (x, y) fall in the PDF?
        if y < pdf:
            # Yes, so return x
            return x
        # No, so loop
See also the section "Sampling from an Arbitrary Distribution" in my article on randomization.


The following checks the method's correctness by estimating the probability that the returned sample is less than π/8; this probability should be close to 0.0788:

print(sum(1 if sample()<math.pi/8 else 0 for _ in range(1000000))/1000000)

How do you draw random numbers from a given scipy random distribution?

The relevant method is rvs which stands for random variates.

You can generate a single random number drawn from a negative binomial distribution like this:

import scipy.stats
scipy.stats.nbinom.rvs(1, 0.5)  # returns e.g. 2 (a random integer between 0 and +inf)
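To draw several values at once, rvs also accepts a size argument:

scipy.stats.nbinom.rvs(1, 0.5, size=10)  # array of 10 random variates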

