Vectorised Rcpp Random Binomial Draws

Vectorised Rcpp random binomial draws

Following Dirk's response here:

Is there a way of fixing the code without using an explicit loop
in the C++ code?


I don't think so. The code currently has this hard-wired: <...> so
until one of us has sufficient [time] to extend this (and test it) will have
to do the loop at your end.

Here's my implementation of a "vectorised" code:

library(Rcpp)
cppFunction("NumericVector cpprbinom(int n, double size, NumericVector prob) {
NumericVector v(n);
for (int i=0; i<n; i++) {v[i] = as<double>(rbinom(1, size, prob[i]));}
return(v); }")
r <- runif(1e6)
all.equal({set.seed(42); rbinom(length(r), 1, r)},
{set.seed(42); cpprbinom(length(r), 1, r)})
#TRUE

But the problem is (again citing Dirk),

And I suggest that before expending a lot of effort on this you check
whether you are likely to do better than the R function rbinom. That
R function is vectorized in C code and you are unlikely to make things
much faster by using Rcpp, unless you want to use the random variates
in another C++ function.

And it is actually slower (x3 on my machine), so at least such naive implementation as mine won't help:

library(microbenchmark)
microbenchmark(rbinom(length(r), 1, r), cpprbinom(length(r), 1, r))

Unit: milliseconds
expr min lq mean median uq max neval
rbinom(length(r), 1, r) 55.50856 56.09292 56.49456 56.45297 56.65897 59.42524 100
cpprbinom(length(r), 1, r) 117.63761 153.37599 154.94164 154.29623 155.37247 225.56535 100

EDIT: according to Romain's comment below, here's an advanced version, which is faster!

cppFunction(plugins=c("cpp11"), "NumericVector cpprbinom2(int n, double size, NumericVector prob) { 
NumericVector v = no_init(n);
std::transform( prob.begin(), prob.end(), v.begin(), [=](double p){ return R::rbinom(size, p); });
return(v);}")
r <- runif(1e6)
all.equal({set.seed(42); rbinom(length(r), 1, r)},
{set.seed(42); cpprbinom(length(r), 1, r)},
{set.seed(42); cpprbinom2(length(r), 1, r)})
#TRUE
microbenchmark(rbinom(length(r), 1, r), cpprbinom(length(r), 1, r), cpprbinom2(length(r), 1, r))

Unit: milliseconds
expr min lq mean median uq max neval
rbinom(length(r), 1, r) 55.26412 56.00314 56.57814 56.28616 56.59561 60.01861 100
cpprbinom(length(r), 1, r) 113.72513 115.94758 122.81545 117.24708 119.95134 168.47246 100
cpprbinom2(length(r), 1, r) 36.67589 37.12182 38.95318 37.37436 37.97719 84.73516 100

dqrng with Rcpp for drawing from a normal and a binomial distribution

A simple solution would be to create a new distribution object within the loop:

// [[Rcpp::depends(dqrng, BH, RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <boost/random/binomial_distribution.hpp>
#include <xoshiro.h>
#include <dqrng_distribution.h>
// [[Rcpp::plugins(openmp)]]
#include <omp.h>

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
arma::mat parallel_random_matrix(int n, int m, int ncores, double p=0.5) {
dqrng::xoshiro256plus rng(42);
arma::mat out(n,m);
// ok to use rng here

#pragma omp parallel num_threads(ncores)
{
dqrng::xoshiro256plus lrng(rng); // make thread local copy of rng
lrng.jump(omp_get_thread_num() + 1); // advance rng by 1 ... ncores jumps

#pragma omp for
for (int i = 0; i < m; ++i) {
for (int j = 0; j < n; ++j) {
// p can be a function of i and j
boost::random::binomial_distribution<int> dist(1,p);
auto gen = std::bind(dist, std::ref(lrng));
out(j,i) = gen();
}
}
}
// ok to use rng here
return out;
}

/*** R
parallel_random_matrix(5, 5, 4, 0.75)
*/

This way you can make p depend on i and j. It might be more efficient to keep one global dist object and reconfigure it within the loop, see here for a similar question.

Drawing from a binomial distribution conditional on a vector drawn from standard normal distribution

X will vary from -Inf to +Inf so that will not work as the probability you need for the binomial distribution, but pnorm(X) would give you values from 0 to 1, then use ifelse() statements to convert values below .05 to .05 and over .95 to .95:

X <- rnorm(100)   # 0, 1 are the default values
Xp <- pnorm(X) # transform to probabilities
Xp <- ifelse(Xp < .05, .05, ifelse(Xp > .95, .95, Xp)) # Bound the results
vals <- rbinom(length(Xp), 1, Xp) # Your binary values

Quick evaluation of binomial likelihood in Rcpp

Your implementation looks fine. As R's dbinom() is already implemented in efficient C code, you probably won't significantly improve on it. I do see a couple of things that might make small differences (which, when you're doing this a lot of times, might help):

  • You can use [ii] rather than (ii) to avoid bounds checking, as it sounds like you're in a situation where you don't have to worry about that (i.e., this will not be a user-called function, it would only be called within your C++ code where presumably your objects are set up in such a way that this won't be a problem)
  • You can pass by reference rather than by value (see, e.g. here)

So, I add the following version of your function:

// [[Rcpp::export]]
NumericVector eval_likelihood2(const arma::vec& Yi,
const arma::vec& Ni,
const arma::vec& prob){

// length of vector
int N = prob.n_rows;

// storage for evaluated log likelihoods
NumericVector eval(N);

for(int ii = 0; ii < N; ii++){

int y = Yi[ii]; // no. of successes
int n = Ni[ii]; // no. of trials
double p = prob[ii]; // success probability

eval[ii] = R::dbinom(y,n,p,1); // argument 4 is set to true to return log-likelihood

}

return eval;

}

You can see I've just changed those two things.

I also use slightly bigger data for benchmark, though I also add in benchmark for your original smaller example too:

Rcpp::sourceCpp("so.cpp") #source Rcpp script

# fake data
Yi = 1:99999
Ni = 2:100000
probs = runif(99999)

evalR = dbinom(Yi, Ni, probs, log = T) # vectorized solution in R
evalRcpp = eval_likelihood(Yi, Ni, probs) # my Rcpp solution
evalRcpp2 = eval_likelihood(Yi, Ni, probs) # my Rcpp solution

identical(evalR,evalRcpp)
# [1] TRUE
identical(evalR,evalRcpp2)
# [1] TRUE

microbenchmark::microbenchmark(R = dbinom(Yi, Ni, probs, log = T),
Rcpp = eval_likelihood(Yi, Ni, probs),
Rcpp2 = eval_likelihood2(Yi, Ni, probs))

Unit: milliseconds
expr min lq mean median uq max neval
R 7.427669 7.577011 8.565015 7.650762 7.916891 62.63154 100
Rcpp 7.368547 7.858408 8.884823 8.014881 8.353808 63.48417 100
Rcpp2 6.952519 7.256376 7.859609 7.376959 7.829000 12.51065 100

Yi = 1:999
Ni = 2:1000
probs = runif(999)
microbenchmark::microbenchmark(R = dbinom(Yi, Ni, probs, log = T),
Rcpp = eval_likelihood(Yi, Ni, probs),
Rcpp2 = eval_likelihood2(Yi, Ni, probs))

Unit: microseconds
expr min lq mean median uq max neval
R 90.073 100.5035 113.5084 109.5230 122.5260 188.304 100
Rcpp 90.188 97.8565 112.9082 105.2505 122.4255 172.975 100
Rcpp2 86.093 92.0745 103.9474 97.9380 113.2660 148.591 100

How can I replicate random draws in RcppArmadillo?

The function arma::randn() is not connected to the R RNGs, so calling set.seed() has
no influence on it.

What we do in Rcpp is to take advantage of the fine R API which permits us to access the same RNGs from R and C++. And by being careful with RNGScope instances (which get inserted automagically) the RNG state is always correct between R and C++.

But you simply cannot assume that any other third-party RNG (here: Arma's) was automatically aligned as well. Moreover, in this particular case, Conrad's documentation for Armadillo is clear:

To change the seed, use the std::srand() function

To clarify (Hi, @DWin) here is full R and C++ example:

R> set.seed(42); rnorm(5)           ## Five N(0,1) draws in R
[1] 1.3710 -0.5647 0.3631 0.6329 0.4043
R> cppFunction('NumericVector foo(int n) { return rnorm(n); }')
R> set.seed(42); foo(5) ## Five N(0,1) draws from C++ fun.
[1] 1.3710 -0.5647 0.3631 0.6329 0.4043
R>

We get the same numbers via R and C++ as we a) seed the RNGs identically and b) do in fact call the same RNG provided by R.

Not proper use of Random Generator for Gamma distribution draws in Rcpp

The Rcpp package interfaces the random number generators (and special functions) from R, while properly interfacing the (high-quality) RNG inside R itself.

So I recommend you rely on that as it is in fact easy to use:

> Rcpp::cppFunction("NumericVector mygamma(int n, double a, double b) { return Rcpp::rgamma(n, a, b); }")
> set.seed(123)
> mygamma(3, 0.5, 0.5)
[1] 0.05796143 1.14034608 0.00742452
> mygamma(3, 0.5, 0.5)
[1] 0.0753645 0.2474057 0.0151682
> set.seed(123)
> mygamma(3, 0.5, 0.5)
[1] 0.05796143 1.14034608 0.00742452
>

Resetting the seed gets us the same draw, not resetting gets us different draws. As it should. Also:

> set.seed(123)
> rgamma(3, 0.5, 2.0)
[1] 0.05796143 1.14034608 0.00742452
>

Note that the second parameter for the Gamma is used as a reciprocal value between the R version and the compiled version.

Edit: For completeness, a working version with C++11 RNG follows. Obviously we cannot double the values from R.

Code

#include <random>
#include <Rcpp.h>

std::mt19937 engine(123);

// [[Rcpp::export]]
Rcpp::NumericVector mygamma(int n, double a, double b) {
Rcpp::NumericVector v(n);
std::gamma_distribution<> gamma(a, b);
for (auto i=0; i<n; i++) {
v[i] = gamma(engine);
}
return v;
}

/*** R
mygamma(3, 0.5, 0.5)
mygamma(3, 0.5, 0.5)
*/

Output

> Rcpp::sourceCpp("~/git/stackoverflow/68235476/answer.cpp")

> mygamma(3, 0.5, 0.5)
[1] 0.168918 1.254678 0.311767

> mygamma(3, 0.5, 0.5)
[1] 0.0451722 0.5485451 0.0443048
>

Rcpp - speeding up random normal draws within for and while loop

You can use RcppZiggurat for faster RNG draws -- I have timing comparisons in the package:

R> library(RcppZiggurat)
R> library(microbenchmark)
R> microbenchmark(rnorm(1e5), zrnorm(1e5))
Unit: microseconds
expr min lq mean median uq max neval cld
rnorm(1e+05) 6148.781 6169.917 6537.31 6190.073 6923.357 10166.96 100 b
zrnorm(1e+05) 719.458 887.554 1016.03 901.182 939.652 2880.47 100 a
R>

This RNG can be used in other packages at the C++ level too. It is just a header you pull in the usual way.

RCPP Function for Generation Random Walk in 4 Directions

I would use

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerMatrix StepstoShop(double Steps){

IntegerVector possible_x = IntegerVector::create(0, 1, -1, 0);
IntegerVector possible_y = IntegerVector::create(1, 0, 0, -1);
IntegerMatrix SampleGen(Steps, 2);
int ind;

for (int i = 0; i < Steps; i++) {
ind = R::runif(0, 4);
SampleGen(i, 0) = possible_x[ind];
SampleGen(i, 1) = possible_y[ind];
}

return SampleGen;
}

Vectorized Rcpp rbinom with probabilities in Armadillo matrix

Thank you so much. I also used this

   umat binom4(mat& prob){
int n=prob.n_rows;
mat temp(n,n,fill::randu);
return (temp<prob);
}

I think it is a bit more faster

microbenchmark(rbinom(nrow(z)^2,1,z),binom1(z),binom2(z),binom3(z),binom4(z))

expr min lq mean median uq max neval
rbinom(nrow(z)^2, 1, z) 94.24809 95.29728 97.24977 95.86829 98.19758 108.30877 100
binom1(z) 130.20266 132.48951 138.07100 134.03693 137.34613 297.86393 100
binom2(z) 164.96716 168.17024 175.89784 170.29310 173.93890 338.99306 100
binom3(z) 64.57977 64.78340 67.03158 65.81533 67.42386 92.31300 100
binom4(z) 29.66925 31.44107 32.81296 31.77392 33.31575 55.65539 100


Related Topics



Leave a reply



Submit