R: How to Generate a Random Sample of a Discrete Random Variable

R: How to generate a random sample of a discrete random variable?

I think you are looking to generate samples of a Bernoulli random variable, which is a special case of a binomial random variable (a single trial, size = 1). Therefore, you can try rbinom(N, 1, p). This generates N samples that take the value 1 with probability p and the value 0 with probability 1 - p. To get the values a and -a instead, you can use a*(2*rbinom(N, 1, p) - 1).
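A minimal sketch of that mapping, wrapped in a helper; the name sample_pm is hypothetical and used only for illustration. It draws Bernoulli(p) values with rbinom() and maps 1 to a and 0 to -a.

```r
# sample_pm: hypothetical helper returning N values equal to a (with probability p) or -a
sample_pm <- function(N, a, p) {
  b <- rbinom(N, size = 1, prob = p)  # Bernoulli(p): 1 with probability p, else 0
  a * (2 * b - 1)                     # map 1 -> a and 0 -> -a
}

set.seed(1)
x <- sample_pm(1e5, a = 3, p = 0.25)
table(x)        # counts of -3 and 3
mean(x == 3)    # proportion of +a draws, close to p = 0.25
```

An equivalent one-liner is sample(c(-a, a), N, replace = TRUE, prob = c(1 - p, p)), which some may find more readable.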

Generating a discrete random probability distribution by perturbing an existing one

As Prof. Bolker suggested, you should look at the Dirichlet distribution. Let's denote the a priori mean values by capital letters C_i and the sampled values by lowercase letters c_i. From the properties of the distribution, the samples automatically have two features:

Sum_i c_i = 1
each c_i lies in the range [0, 1]

so you can use them as probabilities right away.

Given the C_i, and looking at the definition of the distribution, the only free parameter left is

a₀ = Sum_i a_i

and each a_i = C_i * a₀.

This choice of a_i (again, automatically) gives the correct mean, E[c_i] = C_i.

The bigger a₀ is, the more narrowly the c_i concentrate around C_i. The exact variance is Var[c_i] = C_i(1 - C_i)/(a₀ + 1), which is roughly C_i/a₀; so for a spread of about 5% (in standard-deviation terms) you might try a₀ = 50.
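To see what a₀ = 50 means in practice, here is a small sketch that evaluates the standard Dirichlet mean and variance formulas for the C_i used in the code below. It needs no sampling and no packages; approx_var is just the rough C_i/a₀ rule of thumb for comparison.

```r
# For c ~ Dirichlet(a) with a = a0 * C:
#   E[c_i] = C_i,   Var[c_i] = C_i * (1 - C_i) / (a0 + 1)
apriori <- c(0.2, 0.3, 0.1, 0.4)   # the C_i
a0 <- 50

exact_var  <- apriori * (1 - apriori) / (a0 + 1)
approx_var <- apriori / a0         # rough rule of thumb

round(sqrt(exact_var), 3)          # standard deviation of each c_i
# [1] 0.056 0.064 0.042 0.069
```

The rule of thumb slightly overstates the variance (it drops the 1 - C_i factor), so it errs on the side of choosing a larger a₀.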

Some R code

library(MCMCpack)

apriori <- c(0.2, 0.3, 0.1, 0.4) # your C_i
a0 <- 50
a <- a0*apriori

set.seed(12345)
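# sample your c_i and use them, for example, to throw an uneven die
ci <- rdirichlet(1, a)            # 1 x 4 matrix; its single row sums to 1
dice <- rmultinom(1, 1, ci[1, ])  # 4 x 1 indicator column; which(dice == 1) is the face

# another die throw, with freshly perturbed probabilities
ci <- rdirichlet(1, a)
dice <- rmultinom(1, 1, ci[1, ])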

...

Efficiently generating discrete random numbers

How about using cut:

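# cdf comes from the question; for cut() it must hold the breakpoints
# 0, F_1, ..., F_K (e.g. cdf <- c(0, cumsum(p)) for a probability vector p)
N <- 1e6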
u <- runif(N)
system.time(as.numeric(cut(u,cdf)))
   user  system elapsed 
   1.03    0.03    1.07 

head(table(as.numeric(cut(u,cdf))))

  1   2   3   4   5   6 
 51  95 165 172 148  75
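A related option, swapped in here rather than taken from the answer above, is base R's findInterval(), which does a binary search and returns integer codes directly instead of building a factor, so it is usually quicker than cut(). This sketch assumes a hypothetical probability vector p; it keeps only the left endpoints of the intervals so that floating-point error in cumsum(p) can never produce an out-of-range category.

```r
p <- c(0.1, 0.2, 0.3, 0.4)               # hypothetical probabilities for categories 1..4
breaks <- c(0, cumsum(p))[seq_along(p)]  # left endpoints 0, F_1, ..., F_{K-1}

set.seed(1)
N <- 1e6
u <- runif(N)
# category i satisfies breaks[i] <= u < breaks[i + 1]; the last one covers [F_{K-1}, 1)
x <- findInterval(u, breaks)

freq <- tabulate(x, nbins = length(p)) / N
freq                                     # empirical frequencies, close to p
```

For comparison, sample(seq_along(p), N, replace = TRUE, prob = p) is the most direct built-in way to get the same kind of draws.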

How can we find E(X^n) for a discrete random variable X in R?

Take the Poisson random variable X ~ Poisson(2) as an example.

probabilistic method (truncating the infinite sum)

f1 <- function (N) {
  x <- 0:N
  p <- dpois(x, 2)
  ## approximate E[X]
  m1 <- weighted.mean(x, p)
  ## approximate E[X ^ 2]
  m2 <- weighted.mean(x ^ 2, p)
  ## approximate E[X ^ 3]
  m3 <- weighted.mean(x ^ 3, p)
  ## return
  c(m1, m2, m3)
  }

As N gets bigger, the approximation becomes more and more accurate, in the sense that the truncated sums converge (deterministically) to the true moments.

N <- seq(10, 200, 10)
m123_prob <- t(sapply(N, f1))
matplot(m123_prob, type = "l", lty = 1)
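For a Poisson(λ) variable the first three raw moments are known in closed form, so the truncated sums can be checked directly. This sketch uses plain sum() over the probabilities, which is the same idea as f1() above without the renormalization done by weighted.mean().

```r
# Closed forms for X ~ Poisson(lambda):
#   E[X] = lambda, E[X^2] = lambda + lambda^2, E[X^3] = lambda^3 + 3 lambda^2 + lambda
lambda <- 2
exact <- c(lambda, lambda + lambda^2, lambda^3 + 3 * lambda^2 + lambda)
exact
# [1]  2  6 22

# Truncated sums over 0..200; the neglected tail mass is negligible here
x <- 0:200
p <- dpois(x, lambda)
approx <- c(sum(x * p), sum(x^2 * p), sum(x^3 * p))
approx - exact   # essentially zero
```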

statistical method (sampling-based)

f2 <- function (sample_size) {
  x <- rpois(sample_size, 2)
  ## unbiased estimate of E[X]
  m1 <- mean(x)
  ## unbiased estimate of E[X ^ 2]
  m2 <- mean(x ^ 2)
  ## unbiased estimate of E[X ^ 3]
  m3 <- mean(x ^ 3)
  ## return
  c(m1, m2, m3)
  }

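As sample_size grows, the estimates become more and more accurate, in the sense that they converge in probability to the true moments (the law of large numbers). Because each call draws a fresh sample, the resulting curves are jagged rather than smooth.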

sample_size <- seq(10, 200, 10)
m123_stat <- t(sapply(sample_size, f2))
matplot(m123_stat, type = "l", lty = 1)
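To judge how close a single sampling-based estimate is, one can attach a standard error sd(x^k)/sqrt(n) to each moment. A minimal sketch for one large sample, compared with the exact Poisson(2) values 2, 6 and 22:

```r
set.seed(42)
x <- rpois(1e5, 2)

est <- c(mean(x), mean(x^2), mean(x^3))                # sample raw moments
se  <- c(sd(x), sd(x^2), sd(x^3)) / sqrt(length(x))    # their standard errors

cbind(estimate = est, exact = c(2, 6, 22), std_error = se)
```

The standard error grows quickly with k because X^3 has a heavy right tail, which is why the third-moment curve is the noisiest one in the plot.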

Random sample from given bivariate discrete distribution

You are almost there. Assuming you have the data frame dt with the x, y and p_ij values, just sample the rows!

dt <- expand.grid(X=1:3, Y=1:2)
dt$p <- runif(6)
dt$p <- dt$p / sum(dt$p)  # get fake probabilities
idx <- sample(seq_len(nrow(dt)), size=8, replace=TRUE, prob=dt$p)  # 8 row indices drawn with probabilities dt$p
sampled.x <- dt$X[idx]
sampled.y <- dt$Y[idx]
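To convince yourself that the sampled pairs really follow dt$p, you can draw a large sample and compare the empirical joint frequencies with the target probabilities. This sketch rebuilds the same fake dt; the sample size n is an arbitrary choice.

```r
set.seed(1)
dt <- expand.grid(X = 1:3, Y = 1:2)
dt$p <- runif(6)
dt$p <- dt$p / sum(dt$p)                 # fake joint probabilities p_ij

n <- 1e5
idx <- sample(seq_len(nrow(dt)), size = n, replace = TRUE, prob = dt$p)
sampled <- dt[idx, c("X", "Y")]          # one (X, Y) pair per draw

# empirical frequency of each (X, Y) cell, in the same row order as dt
emp <- tabulate(idx, nbins = nrow(dt)) / n
cbind(dt, empirical = emp)
```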

R: Using distributions to generate random variables, but with many different dimensions and parameter values

I think you may be misunderstanding what rdirichlet(...) does (BTW: you do have to spell it correctly...).

rdirichlet(n,alpha) returns a matrix with n rows and length(alpha) columns. Each row is built from independent gamma deviates whose shape parameters are the corresponding elements of alpha, normalized so that each row sums to 1. So, for example,

set.seed(1)
rdirichlet(2,c(1,1,1))
#            [,1]      [,2]      [,3]
# [1,] 0.04037978 0.4899465 0.4696737
# [2,] 0.25991848 0.3800170 0.3600646

There are two rows because n = 2, and three columns because length(alpha) = 3. There is no reason to expect the values in the three columns to be equal (to 1/3) just because alpha = c(1,1,1), although the column-wise means will approach (1/3, 1/3, 1/3) for large n:

set.seed(1)
colMeans(rdirichlet(1000,c(1,1,1)))
# [1] 0.3371990 0.3314027 0.3313983
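The gamma construction described above is easy to write in base R, which may help make clear what each row is. This is a sketch of the technique, not MCMCpack's own code; rdirichlet_base is a hypothetical name, and because the draws are consumed in a different order it will not reproduce the exact numbers printed above.

```r
# rdirichlet_base: hypothetical base-R Dirichlet sampler via normalized gamma deviates
rdirichlet_base <- function(n, alpha) {
  k <- length(alpha)
  # n x k matrix filled column by column; column j holds Gamma(shape = alpha[j]) draws
  g <- matrix(rgamma(n * k, shape = rep(alpha, each = n)), nrow = n, ncol = k)
  g / rowSums(g)   # divide each row by its sum so the row sums to 1
}

set.seed(1)
d <- rdirichlet_base(1000, c(1, 1, 1))
dim(d)              # 1000 rows, 3 columns
range(rowSums(d))   # every row sums to 1 (up to rounding)
colMeans(d)         # each close to 1/3
```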

Given this, it is not clear (to me at least) what you want exactly. This will create a list of matrices:

# param1 and param2 are the alpha vectors from the question
set.seed(1)
lapply(list(param1,param2),function(x)rdirichlet(2,x))
# [[1]]
#            [,1]      [,2]      [,3]
# [1,] 0.04037978 0.4899465 0.4696737
# [2,] 0.25991848 0.3800170 0.3600646

# [[2]]
#              [,1]         [,2]      [,3]
# [1,] 0.0010146803 0.0003150297 0.9986703
# [2,] 0.0001574301 0.0003112573 0.9995313

Something that looks more or less like your expected output can be generated this way:

# S, the number of draws per parameter set, comes from the question
set.seed(1)
t(apply(rbind(param1,param2),1,function(x)colMeans(rdirichlet(S,x))))
#                [,1]        [,2]      [,3]
# param1 0.3765401986 0.369370923 0.2540889
# param2 0.0005991643 0.001380334 0.9980205
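Since E[c] = alpha/sum(alpha) for c ~ Dirichlet(alpha), the values those column means approach as S grows can also be computed exactly, without any sampling. A small sketch; alpha_list holds hypothetical parameter vectors, not the ones from the question.

```r
# alpha_list: hypothetical Dirichlet parameter vectors, one per row of the result
alpha_list <- list(a1 = c(1, 1, 1), a2 = c(2, 5, 3))

# one row of exact means per parameter set
exact_means <- t(sapply(alpha_list, function(a) a / sum(a)))
exact_means
```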

Finally, the univariate distributions work differently. rnorm(...), runif(...), etc. return a vector (not a matrix), so the apply(...) functions can be used more or less directly:

param1 <- c(0,1)
param2 <- c(5,2)
param3 <- c(1,.2)
set.seed(1)
sapply(list(param1,param2,param3),function(x)rnorm(5,mean=x[1],sd=x[2]))
#           [,1]     [,2]      [,3]
# [1,] -0.6264538 3.359063 1.3023562
# [2,]  0.1836433 5.974858 1.0779686
# [3,] -0.8356286 6.476649 0.8757519
# [4,]  1.5952808 6.151563 0.5570600
# [5,]  0.3295078 4.389223 1.2249862

Here, each column is a vector of random variates from the corresponding parameter-set.
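If the means and standard deviations are kept in two separate vectors instead of a list of pairs, mapply() does the same job. With the same seed it should reproduce the matrix printed above, since the rnorm() calls happen in the same order.

```r
means <- c(0, 5, 1)
sds   <- c(1, 2, 0.2)

set.seed(1)
# one call of rnorm(5, ...) per (mean, sd) pair; results are bound into columns
m <- mapply(function(mu, s) rnorm(5, mean = mu, sd = s), means, sds)
dim(m)   # 5 rows (draws) by 3 columns (parameter sets)
```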
