Using Rcpp within parallel code via snow to make a cluster
Think it through -- what does inline do? It creates a C/C++ function for you, then compiles and links it into a dynamically-loadable shared library. Where does that one sit? In R's temp directory.
So you tried the right thing by shipping the R frontend that calls the shared library to the other process (which has a different temp directory!), but that does not get the DLL/.so file there.
Hence the advice is to create a local package, install it, and have both snow processes load and call it.
(And as always: better quality answers may be had on the rcpp-devel list, which is read by more Rcpp contributors than SO is.)
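The mechanism is easy to see with base R's parallel package alone. The sketch below is runnable as-is; the commented-out line shows where a hypothetical installed package (here called myPkg) would be loaded on each worker so that every worker process gets its own copy of the compiled shared library:

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")  # snow-style socket cluster
# Hypothetical: load your installed package on every worker, so each
# worker process loads the compiled shared library itself:
#   clusterEvalQ(cl, library(myPkg))
# Plain R functions, by contrast, serialize to workers just fine:
res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
unlist(res)  # 1 4 9 16
```

The key difference: an inline-compiled function is only a pointer into a shared library sitting in the master's temp directory, while an installed package can be loaded independently by every process.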
Use doParallel with an Rcpp function on Windows inside an R package
Yes, you can put as many functions as you want within the package. The reason for suggesting everything be in an R package is that you would otherwise have to compile the code on every thread or node you spin up. This is because Rcpp functions are compiled locally and exist only as session-specific pointer references. In particular, see the discussion in:
Using Rcpp functions inside of R's par*apply functions from the parallel package.
Sample package would be:
https://github.com/r-pkg-examples/rcpp-and-doparallel
In particular, the R function should correctly set up and tear down the parallel backend.
mean_parallel_compute = function(n, mean = 0, sd = 1,
                                 n_sim = 1000,
                                 n_cores = parallel::detectCores()) {
  # Construct cluster
  cl = parallel::makeCluster(n_cores)
  # After the function is run, close the cluster.
  on.exit(parallel::stopCluster(cl))

  # Register parallel backend
  doParallel::registerDoParallel(cl)

  # Compute estimates
  estimates = foreach::foreach(i = iterators::icount(n_sim), # Perform n simulations
                               .combine = "rbind",           # Combine results
                               # Self-load
                               .packages = "Rcpp2doParallel") %dopar% {
    random_data = rnorm(n, mean, sd)

    result = mean_rcpp(random_data) # or use Rcpp2doParallel::mean_rcpp()
    result
  }

  estimates
}
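mean_rcpp itself is not shown above. A hedged sketch of what such a function presumably looks like, compiled inline here for illustration (the real Rcpp2doParallel package keeps it under src/ and may differ):

```r
# Sketch only: the actual Rcpp2doParallel::mean_rcpp may differ.
Rcpp::cppFunction("
double mean_rcpp(Rcpp::NumericVector x) {
  double total = 0.0;
  for (int i = 0; i < x.size(); ++i) total += x[i];
  return total / x.size();
}
")
```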
To pass R CMD check, make sure to have the following roxygen2 import tags:
#' @importFrom foreach %dopar% foreach
#' @importFrom iterators icount
#' @importFrom doParallel registerDoParallel
In addition, make sure DESCRIPTION has the following:
LinkingTo:
Rcpp
Imports:
doParallel,
Rcpp,
foreach,
iterators,
parallel
Some other examples:
- Compiling Rcpp functions using ClusterEvalQ
- Using Rcpp within parallel code via snow to make a cluster
Cannot access parameters in C++ code in parallel code called from Snow
Finally the issue was resolved; the problem seems to lie with getMPICluster(), which works perfectly fine for pure R code but not as well with Rcpp, as explained above.
Instead, use the makeMPIcluster command:
mc.cores <- max(1, NumberOfNodes * CoresPerNode - 1) # minus one for master
cl <- makeMPIcluster(mc.cores)
cat(sprintf("Running with %d workers\n", length(cl)))
clusterCall(cl, function() { library(MyPackage); NULL })
out <- clusterApply(cl, 1:mc.cores, MyRFunction)
stopCluster(cl)
Works great! The remaining drawback is that you have to define the number of nodes and cores per node manually within the R code, instead of taking it from the mpirun command.
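One way to avoid hardcoding is to read the process count from the scheduler's environment. SLURM_NTASKS below is an assumption (a SLURM-specific variable); substitute whatever variable your MPI launcher or batch system actually exports:

```r
# Hypothetical sketch: derive the worker count from the scheduler's
# environment instead of hardcoding NumberOfNodes * CoresPerNode.
n_tasks <- as.integer(Sys.getenv("SLURM_NTASKS", unset = "2"))
mc.cores <- max(1, n_tasks - 1)  # reserve one process for the master
```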