R: How to Generate a Random Sample of a Discrete Random Variable

R: How to generate a random sample of a discrete random variable?

I think you are looking to generate samples of a Bernoulli random variable. A Bernoulli random variable is a special case of a binomial random variable, so you can try rbinom(N,1,p). This generates N samples, each equal to 1 with probability p and to 0 with probability 1-p. To get the values a and -a instead, use a*(2*rbinom(N,1,p)-1).
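As a quick illustration of the mapping from {0, 1} to {-a, a}, here is a minimal sketch. The values of N, p, and a are arbitrary choices made for this example, and the sample() variant is an alternative added here, not part of the original answer.

```r
set.seed(42)          # arbitrary seed, only for reproducibility
N <- 10000            # number of draws (illustrative)
p <- 0.3              # P(X = a) (illustrative)
a <- 2                # magnitude of the two outcomes (illustrative)

# Bernoulli draws in {0, 1}, mapped to {-a, a}
x <- a * (2 * rbinom(N, 1, p) - 1)

# One-step alternative using sample()
y <- sample(c(a, -a), size = N, replace = TRUE, prob = c(p, 1 - p))

# Empirical frequency of +a, which should be close to p in both cases
freq_x <- mean(x == a)
freq_y <- mean(y == a)
```

Both vectors contain only the values a and -a, and the two frequencies should agree with p up to sampling noise.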

Generating a discrete random probability distribution by perturbing an existing one

As proposed by Prof. Bolker, you ought to look at the Dirichlet distribution. Let's denote the a priori mean values by capital letters Ci and the sampled values by small letters ci. The distribution's properties automatically give you two features:

  1. Sum over i of ci = 1

  2. Each ci lies in the range [0, 1]

so right away you could use them as probabilities.

Given the Ci, and looking at the definition of the distribution, the only free parameter left is

a0 = Sum over i of ai

and each ai = Ci * a0

This choice of ai gives (again, automatically) the correct mean, E[ci] = Ci.

The bigger a0 is, the more tightly ci concentrates around Ci. The exact variance is Var[ci] = Ci*(1 - Ci)/(a0 + 1), which is roughly Ci/a0, so for a spread of about 5% you might try an a0 of 50.

Some R code

library(MCMCpack)  # provides rdirichlet()

apriori <- c(0.2, 0.3, 0.1, 0.4) # your C_i
a0 <- 50
a <- a0*apriori

set.seed(12345)
# sample your c_i and use it, for example, to throw uneven dice
ci <- rdirichlet(1, a)
dice <- rmultinom(1, 1, ci)

# another dice throw
ci <- rdirichlet(1, a)
dice <- rmultinom(1, 1, ci)

...
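To get a feel for how a0 controls the spread, the exact Dirichlet standard deviation, sqrt(Ci*(1 - Ci)/(a0 + 1)), can be tabulated for a few candidate values. This is only a sketch: the grid of a0 values and the helper name dirichlet_sd are choices made for this illustration.

```r
apriori <- c(0.2, 0.3, 0.1, 0.4)   # the C_i from the example above
a0_grid <- c(10, 50, 200)          # candidate concentrations (illustrative)

# Exact Dirichlet standard deviation: sqrt(C_i * (1 - C_i) / (a0 + 1))
# (dirichlet_sd is a hypothetical helper name, not a package function)
dirichlet_sd <- function(C, a0) sqrt(C * (1 - C) / (a0 + 1))

# One row per a0, one column per category
sd_table <- t(sapply(a0_grid, function(a0) dirichlet_sd(apriori, a0)))
dimnames(sd_table) <- list(paste0("a0=", a0_grid), paste0("C=", apriori))
round(sd_table, 3)
```

Reading down a column shows the standard deviation of one ci shrinking as a0 grows, which is one way to pick an a0 that matches the perturbation size you want.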

Efficiently generating discrete random numbers

How about using cut:

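N <- 1e6
u <- runif(N)
# cdf is not defined here; it is taken to be the vector of
# cumulative-probability break points from the question
system.time(as.numeric(cut(u, cdf)))
#   user  system elapsed
#   1.03    0.03    1.07

head(table(as.numeric(cut(u, cdf))))
#
#   1   2   3   4   5   6
#  51  95 165 172 148  75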
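For a version that runs on its own, here is a minimal sketch of the same inverse-CDF lookup with an explicitly constructed cdf. The probability vector p is hypothetical, and findInterval() is shown as an alternative lookup added for this example, not something the original answer used.

```r
set.seed(1)
p   <- c(0.1, 0.2, 0.3, 0.4)   # hypothetical category probabilities
cdf <- c(0, cumsum(p))         # break points 0 = F_0 < F_1 < ... < F_k
cdf[length(cdf)] <- 1          # guard against floating-point drift in the last break

N <- 1e5
u <- runif(N)

# Inverse-CDF lookup: category i is chosen when u falls in (F_{i-1}, F_i]
x_cut <- as.integer(cut(u, cdf))

# findInterval() does the same lookup without building a factor
x_fi <- findInterval(u, cdf, left.open = TRUE)

round(table(x_cut) / N, 3)     # should be close to p
```

Both lookups use intervals that are open on the left and closed on the right, so they should assign every draw to the same category.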

How can we find E(X^n) for a discrete random variable X in R?

Take a Poisson random variable X ~ Poisson(2) as an example.

Probabilistic method

f1 <- function (N) {
  x <- 0:N
  p <- dpois(x, 2)
  ## approximate E[X]
  m1 <- weighted.mean(x, p)
  ## approximate E[X ^ 2]
  m2 <- weighted.mean(x ^ 2, p)
  ## approximate E[X ^ 3]
  m3 <- weighted.mean(x ^ 3, p)
  ## return
  c(m1, m2, m3)
}

As N gets bigger, the approximation becomes more and more accurate: the truncated sums converge deterministically to the exact moments.

N <- seq(10, 200, 10)
m123_prob <- t(sapply(N, f1))
matplot(m123_prob, type = "l", lty = 1)

Statistical method (sampling-based method)

f2 <- function (sample_size) {
  x <- rpois(sample_size, 2)
  ## unbiased estimate of E[X]
  m1 <- mean(x)
  ## unbiased estimate of E[X ^ 2]
  m2 <- mean(x ^ 2)
  ## unbiased estimate of E[X ^ 3]
  m3 <- mean(x ^ 3)
  ## return
  c(m1, m2, m3)
}

As sample_size grows, the estimate becomes more and more accurate, in the sense that the sequence of estimates converges in probability to the true moments.

sample_size <- seq(10, 200, 10)
m123_stat <- t(sapply(sample_size, f2))
matplot(m123_stat, type = "l", lty = 1)
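As a sanity check, both approaches can be compared with the closed-form raw moments of a Poisson(lambda) variable: E[X] = lambda, E[X^2] = lambda + lambda^2, and E[X^3] = lambda^3 + 3*lambda^2 + lambda. The sketch below repeats the ideas of f1 and f2 inline; the truncation point of 100 and the sample size of 1e5 are arbitrary choices for this illustration.

```r
lambda <- 2

# Closed-form raw moments of Poisson(lambda): 2, 6, 22 for lambda = 2
exact <- c(lambda,
           lambda + lambda^2,
           lambda^3 + 3 * lambda^2 + lambda)

# Truncated-sum approximation, same idea as f1 with N = 100
x <- 0:100
p <- dpois(x, lambda)
approx_prob <- c(weighted.mean(x, p),
                 weighted.mean(x^2, p),
                 weighted.mean(x^3, p))

# Sampling-based estimate, same idea as f2 with a larger sample
set.seed(1)
s <- rpois(1e5, lambda)
approx_stat <- c(mean(s), mean(s^2), mean(s^3))

rbind(exact, approx_prob, approx_stat)
```

The truncated sums should match the exact values to many digits, while the sampled estimates should only agree up to Monte Carlo noise.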

Random sample from given bivariate discrete distribution

You are almost there. Assuming you have the data frame dt with the x, y, and pij values, just sample the rows!

dt <- expand.grid(X=1:3, Y=1:2)
dt$p <- runif(6)
dt$p <- dt$p / sum(dt$p) # get fake probabilities
idx <- sample(1:nrow(dt), size=8, replace=TRUE, prob=dt$p)
sampled.x <- dt$X[idx]
sampled.y <- dt$Y[idx]
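To check that sampling row indices really reproduces the joint distribution, the empirical frequencies of a large sample can be compared with the target table. This is a sketch with hypothetical fixed probabilities standing in for the random ones above, so that the comparison is easy to read.

```r
set.seed(123)
dt <- expand.grid(X = 1:3, Y = 1:2)
dt$p <- c(0.10, 0.20, 0.05, 0.25, 0.15, 0.25)   # hypothetical joint probabilities, summing to 1

n <- 1e5
idx <- sample(seq_len(nrow(dt)), size = n, replace = TRUE, prob = dt$p)

# Empirical joint frequency table of the sampled (X, Y) pairs
emp <- table(X = dt$X[idx], Y = dt$Y[idx]) / n

# Target probabilities arranged in the same 3 x 2 layout
target <- xtabs(p ~ X + Y, data = dt)

round(emp, 3)
round(target, 3)
```

The two tables should agree to within a couple of decimal places for a sample of this size.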

R: Using distributions to generate random variables, but with many different dimensions and parameter values

I think you may be misunderstanding what rdirichlet(...) does (BTW: you do have to spell it correctly...).

rdirichlet(n,alpha) returns a matrix with n rows and length(alpha) columns. Each row is one random deviate from the Dirichlet distribution: a vector of independent gamma deviates, with shape parameters given by the corresponding elements of alpha, normalized so that the row sums to 1. So, for example,
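The construction can be sketched in base R: draw independent gamma deviates with shape alpha_j for column j, then divide each row by its sum. The helper name rdirichlet_sketch is hypothetical and is not the packaged rdirichlet(); it is only meant to show the shape of the output and will not reproduce the same random numbers.

```r
# Hypothetical helper, not the packaged rdirichlet(): base-R sketch only
rdirichlet_sketch <- function(n, alpha) {
  k <- length(alpha)
  # n x k matrix of independent Gamma(shape = alpha_j, rate = 1) draws;
  # rep(alpha, each = n) lines the shapes up with the column-major fill
  g <- matrix(rgamma(n * k, shape = rep(alpha, each = n), rate = 1),
              nrow = n, ncol = k)
  g / rowSums(g)   # normalize each row so that it sums to 1
}

set.seed(1)
d <- rdirichlet_sketch(1000, c(1, 1, 1))
dim(d)        # n rows, length(alpha) columns
colMeans(d)   # each should be close to 1/3
```

Individual rows vary widely, but every row sums to 1 and the column means settle near alpha / sum(alpha) as n grows.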

set.seed(1)
rdirichlet(2,c(1,1,1))
#            [,1]      [,2]      [,3]
# [1,] 0.04037978 0.4899465 0.4696737
# [2,] 0.25991848 0.3800170 0.3600646

Two rows because n=2, 3 columns because length(alpha)=3. There is no reason to expect that the values in the three columns will be equal (to 1/3) just because alpha = c(1,1,1), although the column-wise means will approach (1/3,1/3,1/3) for large n:

set.seed(1)
colMeans(rdirichlet(1000,c(1,1,1)))
# [1] 0.3371990 0.3314027 0.3313983

Given this, it is not clear (to me at least) what exactly you want. With param1 and param2 as defined in your question, this will create a list of matrices:

set.seed(1)
lapply(list(param1,param2),function(x)rdirichlet(2,x))
# [[1]]
#            [,1]      [,2]      [,3]
# [1,] 0.04037978 0.4899465 0.4696737
# [2,] 0.25991848 0.3800170 0.3600646
#
# [[2]]
#              [,1]         [,2]      [,3]
# [1,] 0.0010146803 0.0003150297 0.9986703
# [2,] 0.0001574301 0.0003112573 0.9995313

Something that looks more or less like your expected output can be generated this way (S is the number of draws per parameter set):

set.seed(1)
t(apply(rbind(param1,param2),1,function(x)colMeans(rdirichlet(S,x))))
#                [,1]        [,2]      [,3]
# param1 0.3765401986 0.369370923 0.2540889
# param2 0.0005991643 0.001380334 0.9980205

Finally, the univariate distributions work differently. rnorm(...), runif(...), etc. return a vector (not a matrix), so the apply-family functions can be used more or less directly:

param1 <- c(0,1)
param2 <- c(5,2)
param3 <- c(1,.2)
set.seed(1)
sapply(list(param1,param2,param3),function(x)rnorm(5,mean=x[1],sd=x[2]))
#            [,1]     [,2]      [,3]
# [1,] -0.6264538 3.359063 1.3023562
# [2,]  0.1836433 5.974858 1.0779686
# [3,] -0.8356286 6.476649 0.8757519
# [4,]  1.5952808 6.151563 0.5570600
# [5,]  0.3295078 4.389223 1.2249862

Here, each column is a vector of random variates from the corresponding parameter-set.


