Using Rcpp Functions Inside of R's Par*Apply Functions from the Parallel Package


The issue here is that the compiled code is not "exportable" to the spawned processes unless it is embedded in a package, because of how shared libraries are linked into an R process.

Traditionally, clusterExport() distributes R-level objects to the workers.

Using clusterExport() on an Rcpp function ships only the R declaration, not the underlying shared library. That is to say, the shared library that R CMD SHLIB builds (via Attributes.R) is not shared with or exported to the workers. As a result, when a call is then made to an Rcpp function on a worker, R cannot find the correct shared library.

Take the previous question's function:

Rcpp::cppFunction("NumericVector payoff( double strike, NumericVector data) {
return pmax(data - strike, 0);
}")

Note: I'm using cppFunction() instead of sourceCpp(), but the results are equivalent since cppFunction() calls sourceCpp() to create the function.
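For reference, the equivalent sourceCpp() form is a code string carrying an export attribute (a sketch mirroring payoff() above):

Rcpp::sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector payoff(double strike, NumericVector data) {
  return pmax(data - strike, 0);
}')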

Typing the function name:

payoff

Yields the R declaration with a shared library pointer.

function (strike, data) 
.Primitive(".Call")(<pointer: 0x1015ec130>, strike, data)

This shared library is only available in the process that compiled the function.

Hence, it is always ideal to embed compiled code within a package and then distribute the package.
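To see the failure mode concretely, consider this sketch (the exact error text can vary by platform, but it is typically "NULL value passed as symbol address"):

library(parallel)
cl <- makeCluster(2)

# Ship the R wrapper; the shared library it points to stays behind.
clusterExport(cl, "payoff")

# The exported wrapper holds a stale pointer, so the call fails on the worker.
res <- try(parLapply(cl, list(c(1, 2, 3)), function(d) payoff(1, d)))

stopCluster(cl)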

Using Rcpp function in parLapply on Windows

You need to run the sourceCpp() call in each spawned process, or otherwise get your compiled code to them. Right now the main process has the function; the spawned workers do not.

The easiest way is to build a package and have it loaded by each worker process.
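If a package is not an option, a workaround is to compile the function on every worker, e.g. via clusterEvalQ() (a sketch; each worker pays the compilation cost once):

library(parallel)
cl <- makeCluster(2)

# Compile on each worker so every process gets its own shared library
# and therefore a valid function pointer.
clusterEvalQ(cl, {
  Rcpp::cppFunction("NumericVector payoff(double strike, NumericVector data) {
    return pmax(data - strike, 0);
  }")
})

res <- parLapply(cl, list(c(1, 2, 3)), function(d) payoff(1, d))
stopCluster(cl)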

Use doParallel with Rcpp function on Windows inside an R package

Yes, you can put as many functions as you want within the package. The reason for suggesting that everything go in an R package is that you would otherwise have to compile the code on every thread or node you spin up, because Rcpp functions are compiled locally and hold only a process-specific pointer reference. In particular, see the discussion in Using Rcpp functions inside of R's par*apply functions from the parallel package above.

A sample package is:

https://github.com/r-pkg-examples/rcpp-and-doparallel

In particular, the R function should correctly set up and tear down the parallel backend.

mean_parallel_compute = function(n, mean = 0, sd = 1,
                                 n_sim = 1000,
                                 n_cores = parallel::detectCores()) {

  # Construct cluster
  cl = parallel::makeCluster(n_cores)

  # After the function is run, close the cluster.
  on.exit(parallel::stopCluster(cl))

  # Register parallel backend
  doParallel::registerDoParallel(cl)

  # Compute estimates
  estimates = foreach::foreach(i = iterators::icount(n_sim), # Perform n_sim simulations
                               .combine = "rbind",           # Combine results
                               # Self-load
                               .packages = "Rcpp2doParallel") %dopar% {
    random_data = rnorm(n, mean, sd)

    result = mean_rcpp(random_data) # or use Rcpp2doParallel::mean_rcpp()
    result
  }

  estimates
}
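Once the package is installed, a call from a fresh session is then a one-liner (assuming the Rcpp2doParallel package from the repository above, which provides mean_rcpp()):

library(Rcpp2doParallel)
estimates <- mean_parallel_compute(n = 100, n_sim = 1000, n_cores = 2)
head(estimates)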

To pass R CMD check, make sure to have the following roxygen2 import tags:

#' @importFrom foreach %dopar% foreach
#' @importFrom iterators icount
#' @importFrom doParallel registerDoParallel

In addition, make sure DESCRIPTION has the following:

LinkingTo: 
Rcpp
Imports:
doParallel,
Rcpp,
foreach,
iterators,
parallel

Some other examples:

  • Compiling Rcpp functions using ClusterEvalQ
  • Using Rcpp within parallel code via snow to make a cluster

foreach with Rcpp in R package error: simpleError in .Call( function_name ... function name not available for .Call() for package package

The GitHub repo rcpp-and-doparallel provided the solution.

I will demonstrate here how I modified my package - the corresponding commit in the rnormpar repo has commit message "Solved parallelization".

First, I modified the R script rnorm_package.R, which I had created for registering my C++ functions, to mirror that of the rcpp-and-doparallel package:

#' @keywords internal
"_PACKAGE"

# The following block is used by usethis to automatically manage
# roxygen namespace tags. Modify with care!
## usethis namespace: start
#' @useDynLib rnormpar, .registration = TRUE
#' @importFrom Rcpp sourceCpp
## usethis namespace: end
NULL

I then deleted and re-generated my NAMESPACE using devtools::document(). This caused the following lines to be added to NAMESPACE:

importFrom(Rcpp,sourceCpp)
useDynLib(rnormpar, .registration = TRUE)

If these lines are already in the NAMESPACE, then the first two steps are perhaps not necessary.

Finally, I modified the arguments to the foreach function so that my package was passed to the workers:

norm_mat_par <- function() {

  nworkers <- parallel::detectCores() - 1

  cl <- parallel::makeCluster(nworkers)

  doParallel::registerDoParallel(cl)

  x <- foreach::`%dopar%`(
    foreach::foreach(j = 1:5, .packages = "rnormpar"),
    {
      norm_mat()
    })

  parallel::stopCluster(cl)

  return(x)
}

After building the package, the function produces the expected output:

Restarting R session...

> library(rnormpar)
> rnormpar::norm_mat_par()
[[1]]
          [,1]
[1,] -1.948502

[[2]]
           [,1]
[1,] -0.2774582

[[3]]
          [,1]
[1,] 0.1710537

[[4]]
         [,1]
[1,] 1.784761

[[5]]
           [,1]
[1,] -0.5694733

Calling a user-defined R function from C++ using Rcpp

You declare that the function should return an int, but use wrap(), which indicates that the returned object should be a SEXP. Moreover, calling an R function from Rcpp (through Function) also returns a SEXP.

You want something like:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
SEXP mySuminC() {
  Environment myEnv = Environment::global_env();
  Function mySum = myEnv["mySum"];
  int x = myEnv["x"];
  int y = myEnv["y"];
  return mySum(Rcpp::Named("x", x), Rcpp::Named("y", y));
}

(Or leave the function's return type as int and use as<int>() in place of wrap().)
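That variant would look like this sketch:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int mySuminC() {
  Environment myEnv = Environment::global_env();
  Function mySum = myEnv["mySum"];
  int x = myEnv["x"];
  int y = myEnv["y"];
  // Convert the SEXP returned by the R call into an int.
  return as<int>(mySum(Named("x", x), Named("y", y)));
}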

That said, this is kind of non-idiomatic Rcpp code. Remember that calling R functions from C++ is still going to be slow.
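For completeness, a minimal round trip from R might look like this (a sketch; the file name mySuminC.cpp is hypothetical):

mySum <- function(x, y) x + y    # the user-defined R function
x <- 3
y <- 4
Rcpp::sourceCpp("mySuminC.cpp")  # hypothetical file holding the C++ above
mySuminC()                       # returns 7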

Allow Rcpp functions in DEoptim for R

I found that adding the Rcpp function inside the main DEoptim objective function (Calibrate) worked. The Calibrate function looked like:

Calibrate <- function(x) {
  eastfunc <<- 'NumericMatrix eastC(NumericMatrix e, NumericMatrix zerocolmatrix, NumericMatrix zerorowmatrix) {
    int ecoln = e.ncol();
    int ecolnlessone = ecoln - 1;
    int erown = e.nrow();
    int erownlessone = erown - 1;

    NumericMatrix eout(e.nrow(), e.ncol());
    for (int j = 0; j < ecoln; j++) {
      if (j > 0) {
        eout(_, j) = e(_, j - 1);
      } else {
        eout(_, j) = e(_, 0);
      }
    }
    eout(_, 0) = zerocolmatrix(_, 0);
    return eout;
  }'
  eastC <<- cppFunction(eastfunc)

  cmax <<- x[1]
  Cr <<- x[2]
  Cl <<- x[3]
  Crb <<- x[4]
  Clb <<- x[5]
  returnflowriver <<- x[6]
  returnflowland <<- x[7]
  kd <<- x[8]
  startyear()
  -NashSutcliffe
}

and then running DEoptim as:

ans <- DEoptimone(Calibrate, lower, upper,
                  DEoptim.control(trace = TRUE, parallelType = 1,
                                  parVar = c(parVarnames),
                                  packages = c("raster", "rgdal", "maptools",
                                               "matrixcalc", "Rcpp",
                                               "RcppArmadillo", "moveCpp")))

Using Rcpp within parallel code via snow to make a cluster

Think it through -- what does inline do? It creates a C/C++ function for you, then compiles and links it into a dynamically-loadable shared library. Where does that one sit? In R's temp directory.

So you tried the right thing by shipping the R front-end that calls the shared library to the other process (which has its own temp directory!), but that does not get the dll/so file there.

Hence the advice is to create a local package, install it, and have both snow processes load and call it.
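In practice that looks something like the following (a sketch using parallel, which subsumes snow's API; myPkg and myFun are hypothetical names for the installed package and its compiled function):

library(parallel)
cl <- makeCluster(2)

# Load the installed package on every worker; the dll/so ships with it.
clusterEvalQ(cl, library(myPkg))

res <- parLapply(cl, 1:4, function(i) myFun(i))
stopCluster(cl)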

(And as always: better quality answers may be had on the rcpp-devel list, which is read by more Rcpp contributors than SO is.)


