Using Rcpp functions inside of R's par*apply functions from the parallel package
The issue here is that compiled code cannot be "exported" to the spawned processes unless it is embedded in a package, because of how shared libraries are loaded into R processes.
Traditionally, the clusterExport() statement allows R-specific objects to be distributed to workers.
When you call clusterExport() on an Rcpp function, you only ship the R declaration, not the underlying shared library. That is to say, the shared library built by the R CMD SHLIB step in Attributes.R is not shared with or exported to the workers. As a result, when a call is made to the Rcpp function on a worker, R cannot find the correct shared library.
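A minimal sketch that reproduces the failure, assuming Rcpp and a working compiler are available; the two-worker PSOCK cluster is an arbitrary choice. `try()` captures the worker-side error so the script keeps running; the exact error text may differ between R versions.

```r
library(parallel)

# Compile payoff() in the master process only; its shared library is
# loaded into this process and nowhere else.
Rcpp::cppFunction("NumericVector payoff(double strike, NumericVector data) {
  return pmax(data - strike, 0);
}")

cl <- makeCluster(2)

# clusterExport() ships only the R wrapper. The external pointer it holds
# cannot be serialized, so it arrives on the workers as a NULL pointer.
clusterExport(cl, "payoff")

res <- try(
  parLapply(cl, c(1, 2), function(s) payoff(s, c(0, 2, 4))),
  silent = TRUE
)
stopCluster(cl)

# Works in the master, fails on the workers.
print(payoff(1, c(0, 2, 4)))
print(inherits(res, "try-error"))
```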
Take the previous question's function:
Rcpp::cppFunction("NumericVector payoff( double strike, NumericVector data) {
return pmax(data - strike, 0);
}")
Note: I'm using cppFunction() instead of sourceCpp(), but the results are equivalent since cppFunction() calls sourceCpp() to create the function.
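For comparison, a sketch of the same function written for sourceCpp() directly, using its `code =` argument so that no separate .cpp file is needed. The `#include`, `using namespace Rcpp;` and `// [[Rcpp::export]]` lines are the boilerplate that cppFunction() adds for you before handing the code to sourceCpp().

```r
# Equivalent to the cppFunction() call above: sourceCpp() compiles the code,
# loads the resulting shared library and defines payoff() in R.
Rcpp::sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector payoff(double strike, NumericVector data) {
  return pmax(data - strike, 0);
}
')

print(payoff(1, c(0, 2, 4)))
```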
Typing the function name:
payoff
Yields the R declaration with a shared library pointer.
function (strike, data)
.Primitive(".Call")(<pointer: 0x1015ec130>, strike, data)
This shared library is only available in the process that compiled the function.
Hence, it is always preferable to embed compiled code within a package and then distribute the package.
Using Rcpp function in parLapply on Windows
You need to run the sourceCpp() call in each spawned process, or else get your compiled code to them some other way. Right now the main process has the function; the spawned workers do not.
The easiest way is to build a package and have each worker process load it.
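A minimal sketch of the first option, running the compilation inside every worker with parallel::clusterEvalQ(). It assumes each worker can reach a compiler (Rtools on Windows); note that every worker pays the compilation cost, which is one reason the package route scales better.

```r
library(parallel)

cl <- makeCluster(2)

# Compile payoff() separately in each worker. clusterEvalQ() evaluates the
# expression in each worker's global environment, so payoff() is defined there.
invisible(clusterEvalQ(cl, {
  Rcpp::cppFunction("NumericVector payoff(double strike, NumericVector data) {
    return pmax(data - strike, 0);
  }")
  NULL  # avoid sending the function object back to the master
}))

# The anonymous function finds payoff() in the worker's own global environment.
res <- parLapply(cl, c(1, 2, 3), function(s) payoff(s, c(0, 2, 4)))
stopCluster(cl)

print(res)
```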
Use doParallel with Rcpp function on Windows inside an R package
Yes, you can put as many functions as you want within the package. The reason for suggesting that everything go in an R package is that you would otherwise have to compile the code on every worker process or node that you spin your code up on. This is because Rcpp functions are compiled locally, and the R wrapper only holds a process-specific pointer to the compiled code. In particular, see the discussion in:
Using Rcpp functions inside of R's par*apply functions from the parallel package.
A sample package is available at:
https://github.com/r-pkg-examples/rcpp-and-doparallel
In particular, the R function should correctly set up and tear down the parallel backend.
mean_parallel_compute = function(n, mean = 0, sd = 1,
                                 n_sim = 1000,
                                 n_cores = parallel::detectCores()) {
  # Construct cluster
  cl = parallel::makeCluster(n_cores)
  # After the function is run, close the cluster.
  on.exit(parallel::stopCluster(cl))
  # Register parallel backend
  doParallel::registerDoParallel(cl)
  # Compute estimates
  estimates = foreach::foreach(i = iterators::icount(n_sim), # Perform n_sim simulations
                               .combine = "rbind",           # Combine results
                               # Self-load: attach this package on every worker
                               .packages = "Rcpp2doParallel") %dopar% {
    random_data = rnorm(n, mean, sd)
    result = mean_rcpp(random_data) # or use Rcpp2doParallel::mean_rcpp()
    result
  }
  estimates
}
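To see what `.packages` does without building Rcpp2doParallel, here is a minimal sketch of the same set-up/tear-down pattern, assuming foreach and doParallel are installed. `check_attached()` is a hypothetical helper, and `tools` is only a stand-in package (it ships with R but is not attached in a fresh worker session). In the real function the package would be the one containing mean_rcpp(), and attaching it on a worker also loads its compiled code there.

```r
library(foreach)

# Hypothetical helper: reports, per task, whether `pkg` is attached on the worker.
check_attached = function(pkg, n_tasks = 4, n_cores = 2) {
  cl = parallel::makeCluster(n_cores)
  # Close the cluster and reset the backend when the function exits, even on error.
  on.exit({
    parallel::stopCluster(cl)
    foreach::registerDoSEQ()
  })
  doParallel::registerDoParallel(cl)

  # Run the check without .packages first: workers persist between foreach
  # calls, so a package attached by a later call would remain attached.
  without_pkg = foreach(i = seq_len(n_tasks), .combine = c) %dopar% {
    paste0("package:", pkg) %in% search()
  }
  with_pkg = foreach(i = seq_len(n_tasks), .combine = c,
                     .packages = pkg) %dopar% {
    paste0("package:", pkg) %in% search()
  }
  list(without = without_pkg, with = with_pkg)
}

res = check_attached("tools")
print(res)
```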
To pass R CMD check, make sure to have the following roxygen2 import tags:
#' @importFrom foreach %dopar% foreach
#' @importFrom iterators icount
#' @importFrom doParallel registerDoParallel
In addition, make sure DESCRIPTION has the following:
LinkingTo:
    Rcpp
Imports:
    doParallel,
    Rcpp,
    foreach,
    iterators,
    parallel
Some other examples:
- Compiling Rcpp functions using ClusterEvalQ
- Using Rcpp within parallel code via snow to make a cluster
foreach with Rcpp in R package error: simpleError in .Call( function_name ... function name not available for .Call() for package package
The GitHub repo rcpp-and-doparallel provided the solution.
I will demonstrate here how I modified my package; the corresponding commit in the rnormpar repo has the commit message "Solved parallelization".
First, I modified the R script named rnorm_package.R that I created for registering my cpp functions to mirror that of the rcpp-and-doparallel package:
#' @keywords internal
"_PACKAGE"
# The following block is used by usethis to automatically manage
# roxygen namespace tags. Modify with care!
## usethis namespace: start
#' @useDynLib rnormpar, .registration = TRUE
#' @importFrom Rcpp sourceCpp
## usethis namespace: end
NULL
I then deleted and re-generated my NAMESPACE using devtools::document(). This caused the following lines to be added to NAMESPACE:
importFrom(Rcpp,sourceCpp)
useDynLib(rnormpar, .registration = TRUE)
If these lines are already in the NAMESPACE, then the first two steps are perhaps not necessary.
Finally, I modified the arguments to the foreach function so that my package was passed to the workers:
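norm_mat_par <- function(){
  # Use all but one core
  nworkers <- parallel::detectCores() - 1
  cl <- parallel::makeCluster(nworkers)
  doParallel::registerDoParallel(cl)
  # .packages = "rnormpar" makes each worker load the package,
  # including the compiled code registered via useDynLib
  x <- foreach::`%dopar%`(
    foreach::foreach(j = 1:5, .packages = "rnormpar"),
    {
      norm_mat()
    })
  parallel::stopCluster(cl)
  return(x)
}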
After building the package, the function produces the expected output:
Restarting R session...
> library(rnormpar)
> rnormpar::norm_mat_par()
[[1]]
[,1]
[1,] -1.948502
[[2]]
[,1]
[1,] -0.2774582
[[3]]
[,1]
[1,] 0.1710537
[[4]]
[,1]
[1,] 1.784761
[[5]]
[,1]
[1,] -0.5694733
calling a user-defined R function from C++ using Rcpp
You declare that the function should return an int, but use wrap, which indicates the object returned should be a SEXP. Moreover, calling an R function from Rcpp (through Function) also returns a SEXP.
You want something like:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP mySuminC(){
  // Look up the R function and its inputs in the global environment
  Environment myEnv = Environment::global_env();
  Function mySum = myEnv["mySum"];
  int x = myEnv["x"];
  int y = myEnv["y"];
  // Calling an R function through Function returns a SEXP
  return mySum(Rcpp::Named("x", x), Rcpp::Named("y", y));
}
(Or leave the function's return type as int and use as<int> in place of wrap.)
That said, this is rather non-idiomatic Rcpp code. Remember that calling R functions from C++ is still going to be slow, because every call goes back through the R interpreter.
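A short sketch of calling both variants from R. The definitions of `mySum`, `x` and `y` below are placeholders for whatever the question defines in the global environment, and `mySuminC_int` is a hypothetical name for the as<int> variant. cppFunction() adds `using namespace Rcpp;` itself, so the names are unqualified.

```r
# Placeholder definitions that the C++ code looks up in the global environment.
mySum <- function(x, y) x + y
x <- 2L
y <- 3L

# SEXP-returning version from the answer.
Rcpp::cppFunction('SEXP mySuminC() {
  Environment myEnv = Environment::global_env();
  Function mySum = myEnv["mySum"];
  int x = myEnv["x"];
  int y = myEnv["y"];
  return mySum(Named("x", x), Named("y", y));
}')

# int-returning version: convert the SEXP result with as<int>().
Rcpp::cppFunction('int mySuminC_int() {
  Environment myEnv = Environment::global_env();
  Function mySum = myEnv["mySum"];
  int x = myEnv["x"];
  int y = myEnv["y"];
  return as<int>(mySum(Named("x", x), Named("y", y)));
}')

print(mySuminC())
print(mySuminC_int())
```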
Allow Rcpp functions in DEoptim for R
I found that defining the Rcpp function inside the main DEoptim objective function (Calibrate) worked. The Calibrate function looked like:
Calibrate <- function(x) {
  # Compile eastC() inside the objective so that it is built in whichever
  # process (master or DEoptim worker) evaluates Calibrate()
  eastfunc <<- 'NumericMatrix eastC(NumericMatrix e, NumericMatrix zerocolmatrix, NumericMatrix zerorowmatrix) {
    int ecoln = e.ncol();
    int ecolnlessone = ecoln - 1;
    int erown = e.nrow();
    int erownlessone = erown - 1;
    NumericMatrix eout(e.nrow(),e.ncol()) ;
    for (int j = 0;j < ecoln;j++) {
      if (j > 0) {
        eout(_,j) = e(_,j-1);
      } else {
        eout(_,j) = e(_,0);
      }
    }
    eout(_,0) = zerocolmatrix(_,0);
    return eout;
  }'
  eastC <<- cppFunction(eastfunc)
  cmax <<- x[1]
  Cr <<- x[2]
  Cl <<- x[3]
  Crb <<- x[4]
  Clb <<- x[5]
  returnflowriver <<- x[6]
  returnflowland <<- x[7]
  kd <<- x[8]
  startyear()
  -NashSutcliffe
}
and then running DEoptim as:
ans <- DEoptimone(Calibrate, lower, upper,
                  DEoptim.control(trace = TRUE, parallelType = 1,
                                  parVar = c(parVarnames),
                                  packages = c("raster", "rgdal", "maptools", "matrixcalc",
                                               "Rcpp", "RcppArmadillo", "moveCpp")))
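Why this works can be seen without DEoptim: the objective function is shipped to the worker processes, and the cppFunction() call inside it then runs there, so each worker builds and loads its own shared library. A self-contained sketch of the same idea, with parallel::parSapply() standing in for DEoptim's parallel evaluation. `objective` and `sq_dist` are hypothetical names, and the exists() guard (not in the original Calibrate) is an added assumption that avoids repeating the cppFunction() call on every evaluation.

```r
library(parallel)

# Objective in the style of Calibrate(): the Rcpp function is created inside
# the objective, so it is compiled in whichever process evaluates it.
objective <- function(p) {
  # Compile once per process; later calls reuse the existing function.
  if (!exists("sq_dist", envir = globalenv(), inherits = FALSE)) {
    Rcpp::cppFunction("double sq_dist(NumericVector p) {
      double s = 0.0;
      for (R_xlen_t i = 0; i < p.size(); ++i) {
        s += (p[i] - 1.0) * (p[i] - 1.0);
      }
      return s;
    }", env = globalenv())
  }
  sq_dist(p)
}

cl <- makeCluster(2)
candidates <- list(c(1, 1), c(0, 1), c(3, 1))
# Each worker compiles sq_dist() on its first evaluation of objective().
values <- parSapply(cl, candidates, objective)
stopCluster(cl)

print(values)
```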
Using Rcpp within parallel code via snow to make a cluster
Think it through -- what does inline do? It creates a C/C++ function for you, then compiles and links it into a dynamically-loadable shared library. Where does that one sit? In R's temp directory.
So you tried the right thing by shipping the R frontend calling that shared library to the other process (which has another temp directory !!), but that does not get the dll / so file there.
Hence the advice is to create a local package, install it and have both snow processes load and call it.
(And as always: better quality answers may be had on the rcpp-devel list, which is read by more Rcpp contributors than SO is.)
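The temp-directory point can be checked with a small sketch using the parallel package, which incorporates much of snow's cluster functionality. Each worker reports its own session tempdir(), and none of them matches the master's, so a library that inline or cppFunction() built under the master's temp directory is neither built nor loaded in the workers' sessions.

```r
library(parallel)

cl <- makeCluster(2)
# Ask every worker for the path of its session temporary directory.
worker_dirs <- unlist(clusterEvalQ(cl, tempdir()))
stopCluster(cl)

master_dir <- tempdir()
print(master_dir)
print(worker_dirs)

# Each process has its own temp directory, distinct from the master's.
print(any(worker_dirs == master_dir))
```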