Vectorised Rcpp Random Binomial Draws

Vectorised Rcpp random binomial draws

Following Dirk's response here:

Is there a way of fixing the code without using an explicit loop
in the C++ code?

I don't think so. The code currently has this hard-wired: <...> so
until one of us has sufficient [time] to extend this (and test it) will have
to do the loop at your end.

Here's my implementation of a "vectorised" code:

library(Rcpp)
cppFunction("NumericVector cpprbinom(int n, double size, NumericVector prob) { 
    NumericVector v(n);            
    for (int i=0; i<n; i++) {v[i] = as<double>(rbinom(1, size, prob[i]));} 
    return(v); }")
r <- runif(1e6)
all.equal({set.seed(42); rbinom(length(r), 1, r)}, 
          {set.seed(42); cpprbinom(length(r), 1, r)})
#TRUE

But the problem is (again citing Dirk),

And I suggest that before expending a lot of effort on this you check
whether you are likely to do better than the R function rbinom. That
R function is vectorized in C code and you are unlikely to make things
much faster by using Rcpp, unless you want to use the random variates
in another C++ function.

And it is actually slower (x3 on my machine), so at least such naive implementation as mine won't help:

library(microbenchmark)
microbenchmark(rbinom(length(r), 1, r), cpprbinom(length(r), 1, r))

Unit: milliseconds
                       expr       min        lq      mean    median        uq       max neval
    rbinom(length(r), 1, r)  55.50856  56.09292  56.49456  56.45297  56.65897  59.42524   100
 cpprbinom(length(r), 1, r) 117.63761 153.37599 154.94164 154.29623 155.37247 225.56535   100

EDIT: according to Romain's comment below, here's an advanced version, which is faster!

cppFunction(plugins=c("cpp11"), "NumericVector cpprbinom2(int n, double size, NumericVector prob) { 
    NumericVector v = no_init(n);
    std::transform( prob.begin(), prob.end(), v.begin(), [=](double p){ return R::rbinom(size, p); }); 
    return(v);}")
r <- runif(1e6)
all.equal({set.seed(42); rbinom(length(r), 1, r)}, 
          {set.seed(42); cpprbinom(length(r), 1, r)}, 
          {set.seed(42); cpprbinom2(length(r), 1, r)})
#TRUE
microbenchmark(rbinom(length(r), 1, r), cpprbinom(length(r), 1, r), cpprbinom2(length(r), 1, r))

Unit: milliseconds
                        expr       min        lq      mean    median        uq       max neval
     rbinom(length(r), 1, r)  55.26412  56.00314  56.57814  56.28616  56.59561  60.01861   100
  cpprbinom(length(r), 1, r) 113.72513 115.94758 122.81545 117.24708 119.95134 168.47246   100
 cpprbinom2(length(r), 1, r)  36.67589  37.12182  38.95318  37.37436  37.97719  84.73516   100

dqrng with Rcpp for drawing from a normal and a binomial distribution

A simple solution would be to create a new distribution object within the loop:

// [[Rcpp::depends(dqrng, BH, RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <boost/random/binomial_distribution.hpp>
#include <xoshiro.h>
#include <dqrng_distribution.h>
// [[Rcpp::plugins(openmp)]]
#include <omp.h>

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
arma::mat parallel_random_matrix(int n, int m, int ncores, double p=0.5) {
  dqrng::xoshiro256plus rng(42);
  arma::mat out(n,m);
  // ok to use rng here

#pragma omp parallel num_threads(ncores)
{
  dqrng::xoshiro256plus lrng(rng);      // make thread local copy of rng 
  lrng.jump(omp_get_thread_num() + 1);  // advance rng by 1 ... ncores jumps 

#pragma omp for
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      // p can be a function of i and j
      boost::random::binomial_distribution<int> dist(1,p);
      auto gen = std::bind(dist, std::ref(lrng));
      out(j,i) = gen();
    }
  }
}
  // ok to use rng here
  return out;
}

/*** R
parallel_random_matrix(5, 5, 4, 0.75)
*/

This way you can make p depend on i and j. It might be more efficient to keep one global dist object and reconfigure it within the loop, see here for a similar question.

Drawing from a binomial distribution conditional on a vector drawn from standard normal distribution

X will vary from -Inf to +Inf so that will not work as the probability you need for the binomial distribution, but pnorm(X) would give you values from 0 to 1, then use ifelse() statements to convert values below .05 to .05 and over .95 to .95:

X <- rnorm(100)   # 0, 1 are the default values
Xp <- pnorm(X)    # transform to probabilities
Xp <- ifelse(Xp < .05, .05, ifelse(Xp > .95, .95, Xp)) # Bound the results
vals <- rbinom(length(Xp), 1, Xp)  # Your binary values

Quick evaluation of binomial likelihood in Rcpp

Your implementation looks fine. As R's dbinom() is already implemented in efficient C code, you probably won't significantly improve on it. I do see a couple of things that might make small differences (which, when you're doing this a lot of times, might help):

You can use [ii] rather than (ii) to avoid bounds checking, as it sounds like you're in a situation where you don't have to worry about that (i.e., this will not be a user-called function, it would only be called within your C++ code where presumably your objects are set up in such a way that this won't be a problem)
You can pass by reference rather than by value (see, e.g. here)

So, I add the following version of your function:

// [[Rcpp::export]]
NumericVector eval_likelihood2(const arma::vec& Yi,
                               const arma::vec& Ni,
                               const arma::vec& prob){

    // length of vector
    int N = prob.n_rows;

    // storage for evaluated log likelihoods
    NumericVector eval(N);

    for(int ii = 0; ii < N; ii++){

        int y = Yi[ii]; // no. of successes
        int n = Ni[ii]; // no. of trials
        double p = prob[ii]; // success probability

        eval[ii] = R::dbinom(y,n,p,1); // argument 4 is set to true to return log-likelihood

    }

    return eval;

}

You can see I've just changed those two things.

I also use slightly bigger data for benchmark, though I also add in benchmark for your original smaller example too:

Rcpp::sourceCpp("so.cpp") #source Rcpp script

# fake data
Yi    = 1:99999
Ni    = 2:100000
probs = runif(99999)

evalR     = dbinom(Yi, Ni, probs, log = T) # vectorized solution in R
evalRcpp  = eval_likelihood(Yi, Ni, probs) # my Rcpp solution
evalRcpp2 = eval_likelihood(Yi, Ni, probs) # my Rcpp solution

identical(evalR,evalRcpp)
# [1] TRUE
identical(evalR,evalRcpp2)
# [1] TRUE

microbenchmark::microbenchmark(R     = dbinom(Yi, Ni, probs, log = T),
                               Rcpp  = eval_likelihood(Yi, Ni, probs),
                               Rcpp2 = eval_likelihood2(Yi, Ni, probs))

Unit: milliseconds
  expr      min       lq     mean   median       uq      max neval
     R 7.427669 7.577011 8.565015 7.650762 7.916891 62.63154   100
  Rcpp 7.368547 7.858408 8.884823 8.014881 8.353808 63.48417   100
 Rcpp2 6.952519 7.256376 7.859609 7.376959 7.829000 12.51065   100

Yi    = 1:999
Ni    = 2:1000
probs = runif(999)
microbenchmark::microbenchmark(R     = dbinom(Yi, Ni, probs, log = T),
                               Rcpp  = eval_likelihood(Yi, Ni, probs),
                               Rcpp2 = eval_likelihood2(Yi, Ni, probs))

Unit: microseconds
  expr    min       lq     mean   median       uq     max neval
     R 90.073 100.5035 113.5084 109.5230 122.5260 188.304   100
  Rcpp 90.188  97.8565 112.9082 105.2505 122.4255 172.975   100
 Rcpp2 86.093  92.0745 103.9474  97.9380 113.2660 148.591   100

How can I replicate random draws in RcppArmadillo?

The function arma::randn() is not connected to the R RNGs, so calling set.seed() has
no influence on it.

What we do in Rcpp is to take advantage of the fine R API which permits us to access the same RNGs from R and C++. And by being careful with RNGScope instances (which get inserted automagically) the RNG state is always correct between R and C++.

But you simply cannot assume that any other third-party RNG (here: Arma's) was automatically aligned as well. Moreover, in this particular case, Conrad's documentation for Armadillo is clear:

To change the seed, use the std::srand() function

To clarify (Hi, @DWin) here is full R and C++ example:

R> set.seed(42); rnorm(5)           ## Five N(0,1) draws in R
[1]  1.3710 -0.5647  0.3631  0.6329  0.4043
R> cppFunction('NumericVector foo(int n) { return rnorm(n); }')
R> set.seed(42); foo(5)             ## Five N(0,1) draws from C++ fun.
[1]  1.3710 -0.5647  0.3631  0.6329  0.4043
R>

We get the same numbers via R and C++ as we a) seed the RNGs identically and b) do in fact call the same RNG provided by R.

Not proper use of Random Generator for Gamma distribution draws in Rcpp

The Rcpp package interfaces the random number generators (and special functions) from R, while properly interfacing the (high-quality) RNG inside R itself.

So I recommend you rely on that as it is in fact easy to use:

> Rcpp::cppFunction("NumericVector mygamma(int n, double a, double b) { return Rcpp::rgamma(n, a, b); }")
> set.seed(123)
> mygamma(3, 0.5, 0.5)
[1] 0.05796143 1.14034608 0.00742452
> mygamma(3, 0.5, 0.5)
[1] 0.0753645 0.2474057 0.0151682
> set.seed(123)
> mygamma(3, 0.5, 0.5)
[1] 0.05796143 1.14034608 0.00742452
>

Resetting the seed gets us the same draw, not resetting gets us different draws. As it should. Also:

> set.seed(123)
> rgamma(3, 0.5, 2.0)
[1] 0.05796143 1.14034608 0.00742452
>

Note that the second parameter for the Gamma is used as a reciprocal value between the R version and the compiled version.

Edit: For completeness, a working version with C++11 RNG follows. Obviously we cannot double the values from R.

Code

#include <random>
#include <Rcpp.h>

std::mt19937 engine(123);

// [[Rcpp::export]]
Rcpp::NumericVector mygamma(int n, double a, double b) {
  Rcpp::NumericVector v(n);
  std::gamma_distribution<> gamma(a, b);
  for (auto i=0; i<n; i++) {
    v[i] = gamma(engine);
  }
  return v;
}

/*** R
mygamma(3, 0.5, 0.5)
mygamma(3, 0.5, 0.5)
*/

Output

> Rcpp::sourceCpp("~/git/stackoverflow/68235476/answer.cpp")

> mygamma(3, 0.5, 0.5)
[1] 0.168918 1.254678 0.311767

> mygamma(3, 0.5, 0.5)
[1] 0.0451722 0.5485451 0.0443048
>

Rcpp - speeding up random normal draws within for and while loop

You can use RcppZiggurat for faster RNG draws -- I have timing comparisons in the package:

R> library(RcppZiggurat)
R> library(microbenchmark)
R> microbenchmark(rnorm(1e5), zrnorm(1e5))
Unit: microseconds
          expr      min       lq    mean   median       uq      max neval cld
  rnorm(1e+05) 6148.781 6169.917 6537.31 6190.073 6923.357 10166.96   100   b
 zrnorm(1e+05)  719.458  887.554 1016.03  901.182  939.652  2880.47   100  a 
R>

This RNG can be used in other packages at the C++ level too. It is just a header you pull in the usual way.

RCPP Function for Generation Random Walk in 4 Directions

I would use

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerMatrix StepstoShop(double Steps){

  IntegerVector possible_x = IntegerVector::create(0, 1, -1, 0);
  IntegerVector possible_y = IntegerVector::create(1, 0, 0, -1);
  IntegerMatrix SampleGen(Steps, 2);
  int ind;

  for (int i = 0; i < Steps; i++) {
    ind = R::runif(0, 4);
    SampleGen(i, 0) = possible_x[ind];
    SampleGen(i, 1) = possible_y[ind];
  }

  return SampleGen;
}

Vectorized Rcpp rbinom with probabilities in Armadillo matrix

Thank you so much. I also used this

   umat binom4(mat& prob){
     int n=prob.n_rows;
     mat temp(n,n,fill::randu);
     return (temp<prob);
   }

I think it is a bit more faster

microbenchmark(rbinom(nrow(z)^2,1,z),binom1(z),binom2(z),binom3(z),binom4(z))

               expr       min        lq        mean    median       uq     max       neval
rbinom(nrow(z)^2, 1, z)  94.24809  95.29728  97.24977  95.86829  98.19758 108.30877   100
              binom1(z) 130.20266 132.48951 138.07100 134.03693 137.34613 297.86393   100
              binom2(z) 164.96716 168.17024 175.89784 170.29310 173.93890 338.99306   100
              binom3(z)  64.57977  64.78340  67.03158  65.81533  67.42386  92.31300   100
              binom4(z)  29.66925  31.44107  32.81296  31.77392  33.31575  55.65539   100

Vectorised Rcpp Random Binomial Draws