Can't Run Rcpp Function in Foreach - "Null Value Passed as Symbol Address"

NULL value passed as symbol address error in foreach loop R

You cannot directly use a SpatRaster in parallelization. See this GitHub issue. Some methods have built-in support (predict, app, lapp and tapp), and there are different ways to approach this in other cases (perhaps see this issue).

For example, it may work if you use this line

bi_2021 <- rast('G:\\GridMet_Yearly\\bi_2021.nc')

inside the foreach loop.
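As a minimal sketch of that idea, the loop below opens the raster on each worker instead of passing a SpatRaster created in the main session. It assumes the terra, foreach and doParallel packages are installed, and it uses the small example file that ships with terra (ex/elev.tif) as a stand-in for the .nc file above; the two-worker cluster and the mean computation are only for illustration.

```r
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Pass the file path (a plain string) to the workers, not a SpatRaster object.
# Assumption: terra's bundled example raster stands in for the real file.
f <- system.file("ex/elev.tif", package = "terra")

res <- foreach(i = 1:2, .packages = "terra") %dopar% {
  r <- rast(f)                            # created on the worker itself
  global(r, "mean", na.rm = TRUE)[1, 1]   # return an ordinary numeric value
}

parallel::stopCluster(cl)
```

Returning plain R values (numbers, data frames, file paths) from the loop avoids sending a SpatRaster back to the main session as well.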

Creating a simple Rcpp package with dependency with other Rcpp package

The first step would be making sure that you are optimizing the right thing. For me, this would not be the case as this simple benchmark shows:

set.seed(42)
n <- 1000
A <- matrix(rnorm(n*n), n, n)
B <- matrix(rnorm(n*n), n, n)

MP <- Rcpp::cppFunction("SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
  Eigen::MatrixXd C = A * B;
  return Rcpp::wrap(C);
}", depends = "RcppEigen")

bench::mark(MP(A, B), A %*% B)[1:5]
#> # A tibble: 2 x 5
#>   expression      min   median `itr/sec` mem_alloc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 MP(A, B)    277.8ms    278ms      3.60    7.63MB
#> 2 A %*% B      37.4ms     39ms     22.8     7.63MB

So for me the matrix product via %*% is several times faster than the one via RcppEigen. However, I am using Linux with OpenBLAS for matrix operations while you are on Windows, which often means reference BLAS for matrix operations. It might be that RcppEigen is faster on your system. I am not sure how difficult it is for a Windows user to get a faster BLAS implementation (https://csgillespie.github.io/efficientR/set-up.html#blas-and-alternative-r-interpreters might contain some pointers), but I would suggest spending some time investigating this.

Now if you come to the conclusion that you do need RcppEigen or RcppArmadillo in your code and want to put that code into a package, you can do the following. Instead of Rcpp::Rcpp.package.skeleton() use RcppEigen::RcppEigen.package.skeleton() or RcppArmadillo::RcppArmadillo.package.skeleton() to create a starting point for a package based on RcppEigen or RcppArmadillo, respectively.
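A minimal sketch of the RcppEigen variant, assuming RcppEigen is installed. The package name myEigenPkg and the temporary directory are hypothetical, chosen only so the example does not write into your working directory.

```r
# Hypothetical package name and a throwaway location, both for illustration only.
pkg_parent <- tempfile("skeleton-")
dir.create(pkg_parent)

RcppEigen::RcppEigen.package.skeleton("myEigenPkg", path = pkg_parent)

# Inspect the generated DESCRIPTION; the LinkingTo field is what lets the
# package's C++ code include the RcppEigen headers.
desc <- read.dcf(file.path(pkg_parent, "myEigenPkg", "DESCRIPTION"))
desc[, "LinkingTo"]
```

The RcppArmadillo version is called the same way, with RcppArmadillo::RcppArmadillo.package.skeleton() in place of the RcppEigen function.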

foreach with Rcpp in R package error: simpleError in .Call( function_name ... function name not available for .Call() for package package

The GitHub repo rcpp-and-doparallel provided the solution.

I will demonstrate here how I modified my package; the corresponding commit in the rnormpar repo has the commit message "Solved parallelization".

First, I modified the R script titled rnorm_package.R that I created for registering my cpp functions to mirror that of the rcpp-and-doparallel package:

#' @keywords internal
"_PACKAGE"

# The following block is used by usethis to automatically manage
# roxygen namespace tags. Modify with care!
## usethis namespace: start
#' @useDynLib rnormpar, .registration = TRUE
#' @importFrom Rcpp sourceCpp
## usethis namespace: end
NULL

I then deleted and re-generated my NAMESPACE using devtools::document(). This caused the following lines to be added to NAMESPACE:

importFrom(Rcpp,sourceCpp)
useDynLib(rnormpar, .registration = TRUE)

If these lines are already in the NAMESPACE, then the first two steps are perhaps not necessary.

Finally, I modified the arguments to the foreach function so that my package was passed to the workers:

norm_mat_par <- function(){

  nworkers <- parallel::detectCores() - 1

  cl <- parallel::makeCluster(nworkers)

  doParallel::registerDoParallel(cl)

  x <- foreach::`%dopar%`(
    foreach::foreach(j = 1:5, .packages = "rnormpar"),
    {
      norm_mat()
    })

  parallel::stopCluster(cl)

  return(x)
}

After building the package, the function produces the expected output:

Restarting R session...

> library(rnormpar)
> rnormpar::norm_mat_par()
[[1]]
          [,1]
[1,] -1.948502

[[2]]
           [,1]
[1,] -0.2774582

[[3]]
          [,1]
[1,] 0.1710537

[[4]]
         [,1]
[1,] 1.784761

[[5]]
           [,1]
[1,] -0.5694733

Rcpp: why I can not run the function in my defined package?

Off the top of my head, it looks pretty complete, but do try

R> Rcpp.package.skeleton("newpackage",
+ example_code=FALSE, ## useful but not required
+ cpp_files=c("New.cpp"), ## may not be required
+ attributes=TRUE) ## this is important
R>

as both Rcpp modules and Rcpp attributes need to be turned on.

After that, things should work as you do the required compileAttributes.

Edit: It is even simpler. Just do the Rcpp.package.skeleton() call I outlined above, that is, with the added attributes=TRUE, after which you are done: install the package and test it.

doParallel issue with inline function on Windows 7 (works on Linux)

The error message "NULL value passed as symbol address" is unusual, and isn't due to the function not being exported to the workers. The cFunc function just doesn't work after being serialized, sent to a worker, and unserialized. It also doesn't work when it's loaded from a saved workspace, which results in the same error message. That doesn't surprise me much, and it may be a documented behavior of the inline package.

As you've demonstrated, you can work around the problem by creating cFunc on the workers. To be efficient, that should happen only once on each worker. With the doParallel backend, I would define a worker initialization function and execute it on each of the workers using the clusterCall function:

worker.init <- function() {
  library(inline)
  sigFunc <- signature(x="numeric", size_x="numeric")
  code <- ' double tot =0;
  for(int k = 0; k < INTEGER(size_x)[0]; k++){
    tot += REAL(x)[k];
  };
  return ScalarReal(tot);
  '
  assign('cFunc', cxxfunction(sigFunc, code), .GlobalEnv)
  NULL
}

f1 <- function(){
  x <- rnorm(100)
  a <- cFunc(x=x, size_x=as.integer(length(x)))
  return(a)
}

library(foreach)
library(doParallel)
cl <- makePSOCKcluster(3)
clusterCall(cl, worker.init)
registerDoParallel(cl)
res <- foreach(counter=1:100) %dopar% f1()

Note that you must create the PSOCK cluster object explicitly in order to call clusterCall.

The reason that your example worked on Linux is that the mclapply function is used when you call registerDoParallel without an argument, while on Windows a cluster object is created and the clusterApplyLB function is used. Functions and variables aren't serialized and sent to the workers when using mclapply, because the workers are forked from the master process and inherit its memory, so there is no error.

It would be nice if doParallel included support for initializing the workers without the need for using clusterCall, but it doesn't yet.
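The initialize-once pattern does not depend on inline. Here is a minimal sketch that can be run without a compiler, assuming foreach and doParallel are installed; worker_fn is a hypothetical plain R function standing in for the compiled cFunc, and the explicit get() from the global environment just makes it obvious where the worker finds it.

```r
library(foreach)
library(doParallel)

# Runs once per worker. `worker_fn` is a hypothetical stand-in for an object
# (such as a compiled function) that cannot be shipped from the master.
worker_init <- function() {
  assign("worker_fn", function(x) sum(x), envir = globalenv())
  NULL
}

# Looks the object up in the worker's own global environment at call time.
use_worker_fn <- function(x) {
  fn <- get("worker_fn", envir = globalenv())
  fn(x)
}

cl <- makePSOCKcluster(2)
invisible(clusterCall(cl, worker_init))   # one-time setup on every worker
registerDoParallel(cl)

res <- foreach(i = 1:4, .combine = c) %dopar% use_worker_fn(c(i, i))

stopCluster(cl)
```

Only use_worker_fn is sent to the workers; the object it relies on is built locally by worker_init, so nothing that fails to survive serialization ever crosses the connection.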

function leading to check error in automatically generated RcppExports.R

Congratulations, you've run into the requirement from Section 5.4, "Registering native routines", of Writing R Extensions, which was added in R 3.4.0. The requirement mandated the inclusion of a src/init.c file that registers each C++ function and its parameters. Thus, Rcpp 0.12.11 generates this registration code inside RcppExports.cpp. Meanwhile, the content of the RcppExports.R file, which is what this question is about, depends on whether the user sets useDynLib(pkgname, .registration=TRUE) or useDynLib(pkgname). The latter is not ideal, as it does not take advantage of a new option introduced in Rcpp 0.12.11, discussed next.

As a result of this shift in CRAN policy, JJ Allaire, the creator of Attributes for Rcpp (see note 1), was inspired to advance a suggestion made by Douglas Bates back in 2012, when attributes were first added. Specifically, the goal was to change the call from being string-based to being symbol-based. Simply put, the rationale is that a symbol is on hand once the package loads, whereas a string has to be looked up and converted into a symbol each time the function is run. Therefore, symbol lookup is less expensive on repeated calls than the string-based method Rcpp used in the past.

Basically, this line:

.Call('RGraphM_run_graph_match', PACKAGE = 'RGraphM', A, B, algorithm_params)

involved R looking up the symbol on each call of the enclosing R function in order to access the C++ function.

Meanwhile, this line:

.Call(RGraphM_run_graph_match, A, B, algorithm_params)

is a direct call to the C++ function as the symbol is already in memory.

Those are the main reasons why Rcpp changed how RcppExports.R is automatically generated. One downside of this approach is the inability to globally export all functions as before. In particular, some users who had a global symbol export statement in their NAMESPACE file, e.g.

exportPattern("^[[:alpha:]]+")

had to remove it and opt to manually specify what functions or variables should be exported.

For more details, you may wish to see the GitHub PR that introduced this feature:

https://github.com/RcppCore/Rcpp/pull/694


1: For more on Attributes, see my history post: http://thecoatlessprofessor.com/programming/rcpp/to-rcpp-attributes-and-beyond-from-inline/


