How to Best Simulate an Arbitrary Univariate Random Variate Using Its Probability Function

How do I best simulate an arbitrary univariate random variate using its probability function?

Here is a (slow) implementation of the inverse cdf method when you are only given a density.

den <- dnorm  # replace with your own density

# calculates the cdf by numerical integration
cdf <- function(x) integrate(den, -Inf, x)[[1]]

# inverts the cdf: widen a bracket around the root, then call uniroot
# (note: x within 1e-6 of 0 or 1 can never pass the bracket tests below)
inverse.cdf <- function(x, cdf, starting.value = 0) {
  lower.found <- FALSE
  lower <- starting.value
  while (!lower.found) {
    if (cdf(lower) >= (x - .000001))
      lower <- lower - (lower - starting.value)^2 - 1
    else
      lower.found <- TRUE
  }
  upper.found <- FALSE
  upper <- starting.value
  while (!upper.found) {
    if (cdf(upper) <= (x + .000001))
      upper <- upper + (upper - starting.value)^2 + 1
    else
      upper.found <- TRUE
  }
  uniroot(function(y) cdf(y) - x, c(lower, upper))$root
}

# generates 1000 random variates with density 'den'
vars <- sapply(runif(1000), inverse.cdf, cdf = cdf)
hist(vars)
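
The bracket-then-solve logic of inverse.cdf can also be sketched in Python. This is a minimal illustration, not a port of the R code: it assumes a CDF is already available (the standard normal, written with math.erf), and it swaps uniroot for plain bisection. The names normal_cdf and inverse_cdf are made up for this example.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inverse_cdf(u, cdf, start=0.0, tol=1e-10):
    """Invert a continuous, increasing cdf at u, assuming 0 < u < 1."""
    # Grow the bracket outwards until cdf(lower) < u < cdf(upper).
    lower = upper = start
    while cdf(lower) >= u:
        lower -= (lower - start) ** 2 + 1.0
    while cdf(upper) <= u:
        upper += (upper - start) ** 2 + 1.0
    # Bisection: halve the bracket until it is narrower than tol.
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if cdf(mid) < u:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)

random.seed(1)
draws = [inverse_cdf(random.random(), normal_cdf) for _ in range(1000)]
print(sum(draws) / len(draws))  # sample mean, should be near 0
```

Bisection is slower than a dedicated root finder, but it cannot leave the bracket, which makes it a convenient first check.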

Simulate from an (arbitrary) continuous probability distribution

Here is a way using the distr package, which is designed for this.

library(distr)
p <- function(x) (2/pi) * (1/(exp(x)+exp(-x)))  # probability density function
dist <- AbscontDistribution(d=p)  # signature for a dist with pdf ~ p
rdist <- r(dist)                  # function to create random variates from p

set.seed(1)                       # for reproducible example
X <- rdist(1000)                  # sample from X ~ p
x <- seq(-10, 10, .01)
hist(X, freq=FALSE, breaks=50, xlim=c(-5,5))
lines(x, p(x), lty=2, col="red")

You can of course also do this in base R using the methodology described in any one of the links in the comments.
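
For this particular density the inverse-CDF route happens to have a closed form, which makes a compact check possible. The sketch below is in Python and rests on one assumption that is not in the answer: integrating p gives F(x) = (2/pi) * atan(exp(x)), so the quantile function is log(tan(pi * u / 2)). The function names are illustrative.

```python
import math
import random

def pdf(x):
    """Density from the text: (2/pi) / (exp(x) + exp(-x))."""
    return (2.0 / math.pi) / (math.exp(x) + math.exp(-x))

def cdf(x):
    """Antiderivative of pdf that tends to 0 at -inf and 1 at +inf."""
    return (2.0 / math.pi) * math.atan(math.exp(x))

def quantile(u):
    """Inverse of cdf for 0 < u <= 1."""
    return math.log(math.tan(0.5 * math.pi * u))

random.seed(1)
# 1 - random() lies in (0, 1], which keeps log() away from zero.
sample = [quantile(1.0 - random.random()) for _ in range(1000)]
print(sum(sample) / len(sample))  # sample mean, should be near 0 by symmetry
```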

Simulate data from (non-standard) density function

As pointed out by @pjs, we can use rejection sampling (check the wiki for details).

Here is one implementation of this approach.

The most important step is to find a distribution g that we can sample from and for which there exists a constant M such that M * g(x) >= f(x) at every point x.

f <- function(x) (25 * 200.7341^25 / x^26 * exp(-(200.7341/x)^25))
g <- function(x) dnorm(x, mean = 200.7341, sd = 40)
M <- 5
curve(f, 0, 500)
curve(M * g(x), 0, 500, add = TRUE, lty = "dashed")
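
One way to sanity-check the choice of M is to evaluate the ratio f/g on a grid. The Python sketch below does that under two assumptions of mine: the grid [100, 350] is taken to hold essentially all of the mass of f, and f is rewritten in the algebraically equivalent form (25/x) * t * exp(-t) with t = (200.7341/x)^25.

```python
import math

SCALE, SHAPE = 200.7341, 25.0

def f(x):
    """Target density from the text, as (SHAPE / x) * t * exp(-t)."""
    if x < 1.0:
        return 0.0  # f is numerically zero here; the guard also avoids overflow
    t = (SCALE / x) ** SHAPE
    return SHAPE / x * t * math.exp(-t)

def g(x, mean=200.7341, sd=40.0):
    """Normal proposal density, the same one dnorm computes in the R code."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

grid = [100.0 + 0.05 * i for i in range(5001)]   # 100, 100.05, ..., 350
m_hat = max(f(x) / g(x) for x in grid)           # smallest M that works on the grid
tail_ratio = f(500.0) / g(500.0)                 # the same ratio far in the right tail
print(m_hat, tail_ratio)
```

On the grid the ratio stays below 5, so M = 5 is adequate where the mass lies. The ratio at x = 500 is much larger than 5: f has a polynomial tail and g a Gaussian one, so no finite M bounds f/g everywhere. The region where the envelope fails carries a negligible share of the probability, which is why the sampler still behaves well in practice.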

Now, we can execute the algorithm

set.seed(42)
k <- 1
count <- 0
res <- vector(mode = "numeric", length = 1000)
while (k < 1001) {
  z <- rnorm(n = 1, mean = 200.7341, sd = 40)  # draw a proposal from g
  R <- f(z) / (M * g(z))                        # acceptance probability
  if (R > runif(1)) {
    res[k] <- z
    k <- k + 1
  }
  count <- count + 1
}

# k is 1001 when the loop exits, so (k - 1) / count is the exact acceptance rate
(accept_rate <- (k / count) * 100)
## [1] 19.7086

library(MASS) ## for truehist
truehist(res)
curve(f, 0, 250, add = TRUE)

The acceptance rate is not great: with M = 5, only about 1/M = 20% of proposals are accepted. You can try to find a tighter envelope function or use a Metropolis-Hastings algorithm.
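
For comparison, here is the same accept/reject loop as a small Python function. It is a sketch rather than a translation: f is written in an equivalent form that avoids large intermediate powers, the helper name rejection_sample is invented, and the random stream differs from R's, so individual draws will not match the output above.

```python
import math
import random

SCALE, SHAPE, SD, M = 200.7341, 25.0, 40.0, 5.0

def f(x):
    """Target density from the text, as (SHAPE / x) * t * exp(-t)."""
    if x < 1.0:
        return 0.0  # f is numerically zero here; the guard also avoids overflow
    t = (SCALE / x) ** SHAPE
    return SHAPE / x * t * math.exp(-t)

def g(x):
    """Normal proposal density with mean SCALE and standard deviation SD."""
    z = (x - SCALE) / SD
    return math.exp(-0.5 * z * z) / (SD * math.sqrt(2.0 * math.pi))

def rejection_sample(n, rng):
    """Return n accepted draws and the number of proposals that were needed."""
    accepted, proposals = [], 0
    while len(accepted) < n:
        z = rng.gauss(SCALE, SD)            # propose from g
        proposals += 1
        if rng.random() * M * g(z) < f(z):  # accept with probability f / (M * g)
            accepted.append(z)
    return accepted, proposals

rng = random.Random(42)
sample, proposals = rejection_sample(1000, rng)
accept_rate = len(sample) / proposals
print(accept_rate)  # should be close to 1 / M
```

Counting proposals separately from accepted draws avoids the off-by-one that appears when a loop index is reused as the numerator.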

Generating random sample from the quantiles of unknown density in R

If I understand you correctly, you want to generate random samples from the distribution whose density function is f(x). One way to do this is to generate a random sample from a uniform distribution, U[0,1], and then transform this sample to your density. The transformation is the inverse cdf of f, a method that has been described in earlier answers.

So, let

f(x)     = your density function,
F(x)     = cdf of f(x), and
F.inv(y) = inverse cdf of f(x).

In R code:

f <- function(x) {((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))}
F <- function(x) {integrate(f,0,x)$value}
F <- Vectorize(F)

F.inv <- function(y){uniroot(function(x){F(x)-y},interval=c(0,10))$root}
F.inv <- Vectorize(F.inv)

x <- seq(0,5,length.out=1000)
y <- seq(0,1,length.out=1000)

par(mfrow=c(1,3))
plot(x,f(x),type="l",main="f(x)")
plot(x,F(x),type="l",main="CDF of f(x)")
plot(y,F.inv(y),type="l",main="Inverse CDF of f(x)")

In the code above, since f(x) is only defined on [0, Inf), we calculate F(x) as the integral of f(x) from 0 to x. We then invert F by applying uniroot(...) to F(x) - y. Vectorize(...) is needed because, unlike most R functions, integrate(...) and uniroot(...) do not operate on vectors: each call handles a single upper limit or a single root. See the help files for these functions for more information.

Now we just generate a random sample X drawn from U[0,1] and transform it with Z = F.inv(X)

X <- runif(1000,0,1)   # random sample from U[0,1]
Z <- F.inv(X)

Finally, we demonstrate that Z is indeed distributed as f(x).

par(mfrow=c(1,2))
plot(x,f(x),type="l",main="Density function")
hist(Z, breaks=20, xlim=c(0,5))
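
A quick way to gain confidence in the numerical CDF is to compare it with a closed form. The Python sketch below assumes a fact that is not stated in the answer: substituting u = x^3/3 - x^2 + x (so that du = (x-1)^2 dx) gives F(x) = 1 - exp(-u). Simpson's rule and bisection stand in for integrate and uniroot, and all function names are illustrative.

```python
import math

def f(x):
    """Density from the text, defined for x >= 0."""
    return (x - 1.0) ** 2 * math.exp(-(x ** 3 / 3.0 - x ** 2 + x))

def cdf_numeric(x, steps=2000):
    """Composite Simpson's rule for the integral of f from 0 to x (steps is even)."""
    if x <= 0.0:
        return 0.0
    h = x / steps
    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def cdf_closed(x):
    """Closed form from the substitution above (an added check, not in the answer)."""
    return 1.0 - math.exp(-(x ** 3 / 3.0 - x ** 2 + x))

def quantile(u, lo=0.0, hi=10.0, tol=1e-9):
    """Invert cdf_numeric by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_numeric(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

max_gap = max(abs(cdf_numeric(x) - cdf_closed(x)) for x in (0.5, 1.0, 2.0, 3.0))
median = quantile(0.5)
print(max_gap, median)
```

If the two CDFs agree, the closed form could also replace the numerical integral inside F, which would make sampling much faster.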

Use inverse CDF to generate random variable in R

You can use uniroot(...) for this.

[Note: If the point of this exercise is to implement your own version of a Newton Raphson technique, let me know and I'll delete the answer.]

If I'm understanding this correctly, you want to generate random samples from a distribution with probability density function f and cumulative distribution function F, where, for x >= 0,

f(x) = x*exp(-x)
F(x) = 1 - (1+x)*exp(-x)

As you imply, this can be done by generating a random sample from U[0,1] and transforming that according to the inverse CDF of F. The procedure is very similar to the ones posted here and here, except that you already have an expression for the CDF.

f <- function(x) x*exp(-x)
F <- function(x) 1-(1+x)*exp(-x)

F.inv <- function(y){uniroot(function(x){F(x)-y},interval=c(0,100))$root}
F.inv <- Vectorize(F.inv)

x <- seq(0,10,length.out=1000)
y <- seq(0,1,length.out=1000)

par(mfrow=c(1,3))
plot(x,f(x),type="l",main="f(x)")
plot(x,F(x),type="l",main="CDF of f(x)")
plot(y,F.inv(y),type="l",main="Inverse CDF of f(x)")

Then, generate X ~ U[0,1] and Z = F.inv(X).

set.seed(1)
X <- runif(1000,0,1) # random sample from U[0,1]
Z <- F.inv(X)

par(mfrow=c(1,1))
hist(Z, freq=FALSE, breaks=c(seq(0,10,length=30),Inf), xlim=c(0,10))
lines(x, f(x), col="red", lty=2)  # overlay the density; lines() takes no main= argument
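
Since the note above mentions Newton-Raphson, here is a hedged Python sketch of that alternative to uniroot for the same F. It assumes the starting point x0 = 1 (the mode of f, where F changes from convex to concave) and makes no attempt to handle y extremely close to 0 or 1. The name F_inv_newton is invented for this example.

```python
import math
import random

def f(x):
    """Density x * exp(-x); also the derivative of F."""
    return x * math.exp(-x)

def F(x):
    """CDF 1 - (1 + x) * exp(-x) for x >= 0."""
    return 1.0 - (1.0 + x) * math.exp(-x)

def F_inv_newton(y, x0=1.0, tol=1e-12, max_iter=100):
    """Solve F(x) = y by Newton-Raphson, for y comfortably inside (0, 1)."""
    x = x0
    for _ in range(max_iter):
        step = (F(x) - y) / f(x)  # Newton step: residual over derivative
        x -= step
        if abs(step) < tol:
            break
    return x

rng = random.Random(1)
Z = [F_inv_newton(rng.random()) for _ in range(1000)]
print(sum(Z) / len(Z))  # sample mean; this distribution has mean 2
```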

Randomly fill a 3D grid according to a probability density function p(x,y,z)

Here's an example using a Gaussian pdf (see plot). The code is easily adapted to any other pdf:

%matplotlib qt
# (the line above is an IPython/Jupyter magic; delete it when running a plain script)
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib

# number of points to lay down:
n = 4000

# create meshgrid (lo/hi instead of min/max, which would shadow the builtins):
lo, hi, L = -5, 5, 91
axis = np.linspace(lo, hi, L)
x_grid, y_grid, z_grid = np.meshgrid(axis, axis, axis)
xi, yi, zi = x_grid.ravel(), y_grid.ravel(), z_grid.ravel()

# create normalized pdf (Gaussian here); the grid weights must sum to 1:
pdf = np.exp(-(x_grid**2 + y_grid**2 + z_grid**2))
pdf = pdf / np.sum(pdf)

# obtain indices of randomly selected points, as specified by pdf
# (replace=False means no grid point is picked twice):
randices = np.random.choice(np.arange(xi.shape[0]), n, replace=False, p=pdf.ravel())

# random positions:
x_rand = xi[randices]
y_rand = yi[randices]
z_rand = zi[randices]

# aspect='equal' is omitted: some matplotlib versions do not support it for 3D axes
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
svals = 16
ax.scatter(x_rand, y_rand, z_rand, s=svals, alpha=.1)
plt.show()

scatter plot generated by code
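
The core step, picking distinct grid points with probability tied to the pdf, can also be sketched without NumPy. The version below uses Efraimidis-Spirakis random keys, which is a swapped-in technique for weighted sampling without replacement and not necessarily what np.random.choice does internally. The grid is deliberately small (9 points per axis) and the name sample_cells is invented.

```python
import math
import random

def sample_cells(weights, n, rng):
    """Pick n distinct indices, favouring large weights (Efraimidis-Spirakis keys)."""
    keyed = []
    for i, w in enumerate(weights):
        if w > 0.0:
            u = 1.0 - rng.random()              # uniform on (0, 1], so log(u) is finite
            keyed.append((math.log(u) / w, i))  # same ordering as u ** (1 / w)
    keyed.sort(reverse=True)                    # largest keys first
    return [i for _, i in keyed[:n]]

# Small illustrative grid: 9 points per axis on [-2, 2].
axis = [-2.0 + 0.5 * k for k in range(9)]
cells = [(x, y, z) for x in axis for y in axis for z in axis]
weights = [math.exp(-(x * x + y * y + z * z)) for x, y, z in cells]  # unnormalized Gaussian

rng = random.Random(0)
picked = [cells[i] for i in sample_cells(weights, 50, rng)]
mean_r2 = sum(x * x + y * y + z * z for x, y, z in picked) / len(picked)
print(len(set(picked)), mean_r2)
```

The weights do not need to be normalized here, because only the ordering of the keys matters.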

How to generate correlated Uniform[0,1] variables

This won't be exact, but the NORTA/copula method should be pretty close and easy to implement.

The relevant citation is:

Cario, Marne C., and Barry L. Nelson. Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois, 1997.

The paper can be found here.

The general recipe to generate correlated random variables from any distribution is:

  1. Draw two (or more) correlated variables from a joint standard normal distribution using corr2data
  2. Calculate the univariate normal CDF of each of these variables using normal()
  3. Apply the inverse CDF of any distribution to simulate draws from that distribution.

The third step is trivial for the [0,1] uniform: you don't even need it, because the normal CDF of a standard normal variable is already Uniform[0,1]. Typically, the magnitude of the correlations you get will be less than the magnitudes of the original (normal) correlations, so it might be useful to bump those up a bit.
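
The recipe can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the Stata workflow: the correlated normals are built by hand from two independent draws instead of corr2data, math.erf supplies the normal CDF, and an exponential marginal is added purely to show step 3.

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def correlated_uniforms(n, rho, rng):
    """Steps 1 and 2: correlated standard normals pushed through the normal CDF."""
    us, vs = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2  # corr(x, y) = rho
        us.append(phi(x))
        vs.append(phi(y))
    return us, vs

rng = random.Random(1)
u, v = correlated_uniforms(20000, 0.75, rng)
r_uniform = pearson(u, v)               # a little below the normal correlation 0.75
expo = [-math.log(1.0 - a) for a in u]  # step 3: inverse CDF of an Exponential(1)
print(r_uniform)
```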

Stata Code for 2 uniformish variables that have a correlation of 0.75:

clear

// Step 1
matrix C = (1, .75 \ .75, 1)
corr2data x y, n(10000) corr(C) double
corr x y, means

// Steps 2-3
replace x = normal(x)
replace y = normal(y)

// Make sure things worked
corr x y, means
stack x y, into(z) clear
lab define vars 1 "x" 2 "y"
lab val _stack vars
capture ssc install bihist
bihist z, by(_stack) density tw1(yline(-1 0 1))

If you want to improve the approximation for the uniform case, you can transform the correlations like this (see section 5 of the linked paper):

matrix C = (1, 2*sin(.75*_pi/6) \ 2*sin(.75*_pi/6), 1)

This uses a normal correlation of 0.76536686 instead of 0.75.
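
The adjustment and its inverse are easy to check numerically. In the Python sketch below, normal_corr_for_uniform is the transformation quoted above, while uniform_corr_from_normal is simply its algebraic inverse. Both function names are made up for illustration.

```python
import math

def normal_corr_for_uniform(target):
    """Normal correlation to request so the uniforms end up with correlation target."""
    return 2.0 * math.sin(target * math.pi / 6.0)

def uniform_corr_from_normal(rho):
    """Correlation of the uniforms produced from normals with correlation rho."""
    return (6.0 / math.pi) * math.asin(rho / 2.0)

adjusted = normal_corr_for_uniform(0.75)      # value to put in the matrix C
unadjusted = uniform_corr_from_normal(0.75)   # what you get if you skip the adjustment
print(adjusted, unadjusted)
```

The gap between the two printed values shows how much correlation is lost when the adjustment is skipped.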


Code for the question in the comments

Here is the correlation matrix C written more compactly (lower triangle only), with the transformation applied:

clear
matrix C = ( 1, ///
    2*sin(-.46*_pi/6), 1, ///
    2*sin(.53*_pi/6), 2*sin(-.80*_pi/6), 1, ///
    2*sin(0*_pi/6), 2*sin(-.41*_pi/6), 2*sin(.48*_pi/6), 1 )
corr2data v1 v2 v3 v4, n(10000) corr(C) cstorage(lower)
forvalues i=1/4 {
    replace v`i' = normal(v`i')
}

