Using Rcpp Within Parallel Code via Snow to Make a Cluster

Think it through -- what does inline do? It creates a C/C++ function for you, then compiles and links it into a dynamically loadable shared library. Where does that library sit? In R's temp directory.

So you tried the right thing by shipping the R front end that calls the shared library to the other process (which has its own temp directory!), but that does not get the .dll/.so file there.

Hence the advice is to create a local package, install it and have both snow processes load and call it.

(And as always: better quality answers may be had on the rcpp-devel list which is read by more Rcpp constributors than SO is.)

Use doParallel with Rcpp function on Windows inside an R package

Yes, you can put as many functions as you want within the package. The reason for suggesting everything be in an R package is that you would otherwise have to compile the code on every thread or node that you spin up. This is because Rcpp functions are compiled locally and hold only a thread-specific pointer reference. In particular, see the discussion in:
Using Rcpp functions inside of R's par*apply functions from the parallel package.

A sample package is available at:

https://github.com/r-pkg-examples/rcpp-and-doparallel

In particular, the R function should correctly set up and tear down the parallel backend.

mean_parallel_compute = function(n, mean = 0, sd = 1,
                                 n_sim = 1000,
                                 n_cores = parallel::detectCores()) {

  # Construct cluster
  cl = parallel::makeCluster(n_cores)

  # After the function is run, close the cluster.
  on.exit(parallel::stopCluster(cl))

  # Register parallel backend
  doParallel::registerDoParallel(cl)

  # Compute estimates
  estimates = foreach::foreach(i = iterators::icount(n_sim), # Perform n_sim simulations
                               .combine = "rbind",           # Combine results
                               # Self-load the package on each worker
                               .packages = "Rcpp2doParallel") %dopar% {
    random_data = rnorm(n, mean, sd)

    result = mean_rcpp(random_data) # or use Rcpp2doParallel::mean_rcpp()
    result
  }

  estimates
}

To pass R CMD check, make sure to include the following roxygen2 import tags:

#' @importFrom foreach %dopar% foreach
#' @importFrom iterators icount
#' @importFrom doParallel registerDoParallel

In addition, make sure DESCRIPTION has the following:

LinkingTo: 
    Rcpp
Imports:
    doParallel,
    Rcpp,
    foreach,
    iterators,
    parallel

Some other examples:

  • Compiling Rcpp functions using ClusterEvalQ
  • Using Rcpp within parallel code via snow to make a cluster

Cannot access parameters in C++ code in parallel code called from Snow

Finally the issue was resolved: the problem seems to lie with getMPICluster(), which works perfectly fine for pure R code but not as well with Rcpp, as explained above.
Instead, use the makeMPIcluster command:

mc.cores <- max(1, NumberOfNodes * CoresPerNode - 1) # minus one for the master
cl <- makeMPIcluster(mc.cores)
cat(sprintf("Running with %d workers\n", length(cl)))
clusterCall(cl, function() { library(MyPackage); NULL })
out <- clusterApply(cl, 1:mc.cores, MyRFunction)
stopCluster(cl)

Works great! The problem is that you have to manually define the number of nodes and cores per node within the R code, instead of defining it using the mpirun command.


