R - How to generate a random sample of a discrete random variable?
I think you are looking to generate samples of a Bernoulli random variable. A Bernoulli random variable is a special case of a binomial random variable (one with a single trial). Therefore, you can try rbinom(N,1,p). This will generate N samples, taking the value 1 with probability p and the value 0 with probability (1-p). To get values of a and -a, you can use a*(2*rbinom(N,1,p)-1).
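A minimal sketch of both recipes, with N, p and a set to arbitrary illustrative values. It also shows base R's sample(), which draws from any finite set with given probabilities and so produces the same ±a variable in one call.

```r
set.seed(1)
N <- 10000
p <- 0.3   # illustrative probability of the value 1 (and of +a)
a <- 2.5   # illustrative magnitude

# Bernoulli(p) draws: 1 with probability p, 0 with probability 1 - p
b <- rbinom(N, 1, p)

# map {0, 1} to {-a, a}: 2*b - 1 is either -1 or +1
x <- a * (2 * b - 1)

# the same distribution drawn directly with sample()
y <- sample(c(-a, a), N, replace = TRUE, prob = c(1 - p, p))

mean(b)        # close to p
mean(x == a)   # close to p
mean(y == a)   # close to p
```

The rbinom() route is handy when you already think in terms of 0/1 indicators; sample() generalizes directly to more than two values.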
Generating a discrete random probability distribution by perturbing an existing one
As proposed by Prof. Bolker, you ought to look at the Dirichlet distribution. Let's denote the a priori mean values by capital letters $C_i$ and the sampled values by lowercase $c_i$. By the properties of the distribution, the samples automatically have two features:
$\sum_i c_i = 1$
each $c_i$ lies in the range $[0, 1]$
so right away you can use them as probabilities.
Given $C_i$, and looking at the definition of the distribution, the only free parameter left is
$a_0 = \sum_i a_i$
with each $a_i = C_i a_0$.
This choice of $a_i$ will (again, automatically) give the proper mean value $E[c_i] = C_i$.
The bigger $a_0$ is, the more narrowly the $c_i$ are concentrated around $C_i$. Roughly speaking, the variance is $\mathrm{Var}[c_i] \sim C_i / a_0$, so for 5% you might try $a_0 = 50$.
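For reference, these are the standard mean and variance of a Dirichlet component with parameters $a_i = C_i a_0$; the rough $C_i / a_0$ rule above is the small-$C_i$, large-$a_0$ limit of the exact variance:

```latex
E[c_i] = \frac{a_i}{a_0} = C_i, \qquad
\mathrm{Var}[c_i] = \frac{a_i (a_0 - a_i)}{a_0^2 (a_0 + 1)}
                  = \frac{C_i (1 - C_i)}{a_0 + 1}
                  \approx \frac{C_i}{a_0}.
```

For example, with $C_i = 0.2$ and $a_0 = 50$ the exact variance is $0.16 / 51 \approx 0.0031$, i.e. a standard deviation of about $0.056$.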
Some R code
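library(MCMCpack)                 # provides rdirichlet()
apriori <- c(0.2, 0.3, 0.1, 0.4)  # your C_i
a0 <- 50
a <- a0 * apriori                 # a_i = C_i * a0
set.seed(12345)
# sample your c_i and use them, for example, to throw an uneven die
ci <- rdirichlet(1, a)[1, ]       # rdirichlet() returns a 1 x 4 matrix; keep the row as a vector
dice <- rmultinom(1, 1, ci)       # 4 x 1 matrix with a single 1 marking the face
face <- which(dice == 1)          # index of the face that came up
# another die throw, with freshly perturbed probabilities
ci <- rdirichlet(1, a)[1, ]
dice <- rmultinom(1, 1, ci)
face <- which(dice == 1)
# ... and so on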
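A sketch of many such throws, assuming you want a fresh perturbed $c_i$ for every throw. To keep it dependency-free it swaps MCMCpack's rdirichlet() for a hypothetical helper, rdirichlet1(), that normalizes independent Gamma(shape = a_i) deviates (the standard Dirichlet construction). Because $E[c_i] = C_i$, the long-run face frequencies should still match apriori; the perturbation only adds throw-to-throw variability in the probabilities.

```r
set.seed(12345)
apriori <- c(0.2, 0.3, 0.1, 0.4)  # the C_i from the example above
a0 <- 50
a <- a0 * apriori

# hypothetical helper: one Dirichlet draw via normalized Gamma(shape = a_i, rate = 1) deviates
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)
}

n_throws <- 20000
faces <- integer(n_throws)
for (k in seq_len(n_throws)) {
  ci <- rdirichlet1(a)                             # perturbed probabilities for this throw
  faces[k] <- sample(seq_along(ci), 1, prob = ci)  # one throw of the uneven die
}

# long-run face frequencies vs. the a priori means
freq <- tabulate(faces, nbins = length(apriori)) / n_throws
rbind(freq, apriori)
```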
Efficiently generating discrete random numbers
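How about using cut()?
# assumes cdf holds the breakpoints of the cumulative distribution from the question,
# starting at 0 and ending at 1
N <- 1e6
u <- runif(N)
system.time(as.numeric(cut(u, cdf)))
#    user  system elapsed
#    1.03    0.03    1.07
head(table(as.numeric(cut(u, cdf))))
#   1   2   3   4   5   6
#  51  95 165 172 148  75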
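A related sketch using two other base R functions: findInterval(), which maps each uniform to the index of the CDF interval it falls in without building a factor, and sample(), which does the whole job given the probabilities. The probability vector p is hypothetical, since the question's cdf is not shown here, and no timings are claimed. cut() uses right-closed intervals (a, b] while findInterval() uses [a, b); for continuous uniforms the two agree except on exact ties with a breakpoint.

```r
set.seed(42)
p <- c(0.1, 0.2, 0.3, 0.25, 0.15)  # hypothetical probabilities of the values 1..5
cdf <- c(0, cumsum(p))
cdf[length(cdf)] <- 1               # guard against floating-point round-off in cumsum()

N <- 1e5
u <- runif(N)

# cut() as in the answer: factor codes are the interval indices
x_cut <- as.integer(cut(u, cdf))

# findInterval() returns i with cdf[i] <= u < cdf[i + 1]
x_fi <- findInterval(u, cdf)

# the same distribution drawn directly from the probabilities
x_s <- sample(seq_along(p), N, replace = TRUE, prob = p)

all(x_cut == x_fi)
round(tabulate(x_fi, length(p)) / N, 3)
round(tabulate(x_s, length(p)) / N, 3)
```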
How can we find E(X^n) for a discrete random variable X in R?
Take a Poisson random variable X ~ Poisson(2), for example.
probabilistic method
f1 <- function (N) {
x <- 0:N
p <- dpois(x, 2)
## approximate E[X]
m1 <- weighted.mean(x, p)
## approximate E[X ^ 2]
m2 <- weighted.mean(x ^ 2, p)
## approximate E[X ^ 3]
m3 <- weighted.mean(x ^ 3, p)
## return
c(m1, m2, m3)
}
As N gets bigger, the approximation becomes more and more accurate: the sequence converges analytically, i.e. deterministically, to the true moments.
N <- seq(10, 200, 10)
m123_prob <- t(sapply(N, f1))
matplot(m123_prob, type = "l", lty = 1)
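For a Poisson variable the exact raw moments are known, so the truncated-sum approach can be checked directly: $E[X] = \lambda$, $E[X^2] = \lambda + \lambda^2$ and $E[X^3] = \lambda^3 + 3\lambda^2 + \lambda$, i.e. 2, 6 and 22 for $\lambda = 2$. A minimal sketch with an arbitrary truncation point of 100:

```r
lambda <- 2
x <- 0:100                 # truncation point; P(X > 100) is negligible for lambda = 2
p <- dpois(x, lambda)

# truncated sums  sum_x x^n * P(X = x)  for n = 1, 2, 3
approx <- sapply(1:3, function(n) sum(x^n * p))

# closed-form raw moments of Poisson(lambda)
exact <- c(lambda,
           lambda + lambda^2,
           lambda^3 + 3 * lambda^2 + lambda)

rbind(approx, exact)
```

Unlike weighted.mean() in f1(), the plain sum does not renormalize by sum(p); at this truncation point the difference is negligible.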
statistical method (sampling-based method)
f2 <- function (sample_size) {
x <- rpois(sample_size, 2)
## unbiased estimate of E[x]
m1 <- mean(x)
## unbiased estimate of E[x ^ 2]
m2 <- mean(x ^ 2)
## unbiased estimate of E[x ^ 3]
m3 <- mean(x ^ 3)
## return
c(m1, m2, m3)
}
As sample_size grows, the estimate becomes more and more accurate, in the sense that the sequence converges in probability.
sample_size <- seq(10, 200, 10)
m123_stat <- t(sapply(sample_size, f2))
matplot(m123_stat, type = "l", lty = 1)
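Because each sampling-based estimate is a sample mean, its Monte Carlo standard error can be reported alongside it, estimated by sd(x^k)/sqrt(n). A sketch with an arbitrary n = 1e5, compared against the exact values 2, 6 and 22 for Poisson(2):

```r
set.seed(2024)
n <- 1e5
x <- rpois(n, 2)

# sample means of X, X^2, X^3 and their estimated standard errors
est <- sapply(1:3, function(k) mean(x^k))
se  <- sapply(1:3, function(k) sd(x^k) / sqrt(n))

exact <- c(2, 6, 22)
cbind(est, se, exact)
```

The standard error shrinks like 1/sqrt(n), which makes "more and more accurate" concrete; higher moments have much larger standard errors because X^k is heavily skewed.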
Random sample from given bivariate discrete distribution
You are almost there. Assuming you have the data frame dt with the x, y, and pij values, just sample the rows!
dt <- expand.grid(X=1:3, Y=1:2)
dt$p <- runif(6)
dt$p <- dt$p / sum(dt$p) # get fake probabilities
idx <- sample(1:nrow(dt), size=8, replace=TRUE, prob=dt$p)
sampled.x <- dt$X[idx]
sampled.y <- dt$Y[idx]
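A quick sanity check of the row-sampling approach, using the same fake dt construction: with many draws, the empirical joint frequencies of (X, Y) should match dt$p. The sample size n = 1e5 is arbitrary. Since expand.grid() varies X fastest, matrix(dt$p, nrow = 3) lays the probabilities out as an X-by-Y table.

```r
set.seed(7)
dt <- expand.grid(X = 1:3, Y = 1:2)
dt$p <- runif(6)
dt$p <- dt$p / sum(dt$p)   # fake joint probabilities

n <- 1e5
idx <- sample(seq_len(nrow(dt)), size = n, replace = TRUE, prob = dt$p)

# empirical joint frequencies, rows = X, columns = Y
emp <- unclass(table(factor(dt$X[idx], levels = 1:3),
                     factor(dt$Y[idx], levels = 1:2))) / n

# target probabilities in the same X-by-Y layout
target <- matrix(dt$p, nrow = 3, ncol = 2)

round(emp - target, 3)
```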
R: Using distributions to generate random variables, but with many different dimensions and parameter values
I think you may be misunderstanding what rdirichlet(...) does (BTW: you do have to spell it correctly...).
rdirichlet(n,alpha) returns a matrix with n rows and length(alpha) columns. Each row is a vector of independent gamma deviates whose shape parameters are given by the corresponding elements of alpha, normalized so that each row sums to 1. So, for example,
library(MCMCpack)  # provides rdirichlet()
set.seed(1)
rdirichlet(2,c(1,1,1))
# [,1] [,2] [,3]
# [1,] 0.04037978 0.4899465 0.4696737
# [2,] 0.25991848 0.3800170 0.3600646
Two rows because n=2, 3 columns because length(alpha)=3. There is no reason to expect that the values in the three columns will be equal (to 1/3) just because alpha = c(1,1,1), although the column-wise means will approach (1/3,1/3,1/3) for large n:
set.seed(1)
colMeans(rdirichlet(1000,c(1,1,1)))
# [1] 0.3371990 0.3314027 0.3313983
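To make the gamma-normalization construction concrete, here is a base-R sketch with a hypothetical function name, rdirichlet_gamma(); it mirrors the description above but is not claimed to be MCMCpack's exact code, so its draws need not match the printed output digit for digit. It checks the three properties discussed: the shape of the result, rows summing to 1, and column means near alpha / sum(alpha).

```r
# hypothetical base-R stand-in for rdirichlet(n, alpha):
# row k holds length(alpha) independent Gamma(shape = alpha[j], rate = 1)
# deviates, divided by their row sum
rdirichlet_gamma <- function(n, alpha) {
  k <- length(alpha)
  g <- matrix(rgamma(n * k, shape = alpha), ncol = k, byrow = TRUE)
  g / rowSums(g)   # the length-n vector recycles down columns, i.e. row-wise division
}

set.seed(1)
x <- rdirichlet_gamma(1000, c(1, 1, 1))
dim(x)            # n rows, length(alpha) columns
range(rowSums(x)) # every row sums to 1
colMeans(x)       # each close to 1/3
```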
Given this, it is not clear (to me at least) what you want exactly. This will create a list of matrices:
set.seed(1)
# param1, param2: the Dirichlet parameter vectors from the question
lapply(list(param1,param2),function(x)rdirichlet(2,x))
# [[1]]
# [,1] [,2] [,3]
# [1,] 0.04037978 0.4899465 0.4696737
# [2,] 0.25991848 0.3800170 0.3600646
# [[2]]
# [,1] [,2] [,3]
# [1,] 0.0010146803 0.0003150297 0.9986703
# [2,] 0.0001574301 0.0003112573 0.9995313
Something that looks more or less like your expected output can be generated this way:
set.seed(1)
# S: the number of draws per parameter set, from the question
t(apply(rbind(param1,param2),1,function(x)colMeans(rdirichlet(S,x))))
# [,1] [,2] [,3]
# param1 0.3765401986 0.369370923 0.2540889
# param2 0.0005991643 0.001380334 0.9980205
Finally, the univariate distributions work differently. rnorm(...), runif(...), etc. return a vector (not a matrix), so the apply(...) functions can be used more or less directly:
param1 <- c(0,1)
param2 <- c(5,2)
param3 <- c(1,.2)
set.seed(1)
sapply(list(param1,param2,param3),function(x)rnorm(5,mean=x[1],sd=x[2]))
# [,1] [,2] [,3]
# [1,] -0.6264538 3.359063 1.3023562
# [2,] 0.1836433 5.974858 1.0779686
# [3,] -0.8356286 6.476649 0.8757519
# [4,] 1.5952808 6.151563 0.5570600
# [5,] 0.3295078 4.389223 1.2249862
Here, each column is a vector of random variates from the corresponding parameter-set.
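Two alternatives, sketched under the assumption that the parameters are stored as separate vectors of means and standard deviations rather than a list of pairs: mapply() walks both vectors in parallel, and because rnorm() recycles vector mean and sd arguments, a single call with rep(..., each = n) produces the same matrix. With the same seed, both consume the random stream in the same order as the sapply() version above.

```r
# same three parameter sets as above, split into two vectors
means <- c(0, 5, 1)
sds   <- c(1, 2, 0.2)
n <- 5

# (a) mapply() pairs means[i] with sds[i]; the result simplifies to an n x 3 matrix
set.seed(1)
m_mapply <- mapply(function(m, s) rnorm(n, mean = m, sd = s), means, sds)

# (b) one vectorized call: each block of n draws shares one parameter set,
#     and matrix() fills column by column
set.seed(1)
m_vec <- matrix(rnorm(n * length(means),
                      mean = rep(means, each = n),
                      sd   = rep(sds, each = n)),
                nrow = n)

all.equal(m_mapply, m_vec)
```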