Using parLapply and clusterExport Inside a Function

Using parLapply and clusterExport inside a function

By default, clusterExport looks in .GlobalEnv for the objects named in varlist. If your objects are not in .GlobalEnv, you must tell clusterExport which environment it can find them in.

You can change your clusterExport call to the following (which I didn't test, but which you said in the comments works):

clusterExport(cl=cl, varlist=c("text.var", "ntv", "gc.rate", "pos"), envir=environment())

This way, it will look in the function's environment for the objects to export.
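
For reference, here is a minimal self-contained sketch of the pattern. The wrapper pos_par and the stand-in pos are hypothetical; only the clusterExport call with envir = environment() and the variable names are taken from the question.

library(parallel)

# Hypothetical wrapper: text.var, ntv, gc.rate and pos exist only inside this
# function, so clusterExport() has to be pointed at environment().
pos_par <- function(text.var, gc.rate = 10) {
  ntv <- length(text.var)
  pos <- function(x) toupper(x)  # stand-in for the real pos() worker

  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)

  clusterExport(cl, varlist = c("text.var", "ntv", "gc.rate", "pos"),
                envir = environment())  # look here, not in .GlobalEnv

  # Make the worker resolve text.var and pos in the nodes' global environment,
  # where clusterExport() has just placed them.
  worker <- function(i) pos(text.var[[i]])
  environment(worker) <- .GlobalEnv
  parLapply(cl, seq_len(ntv), worker)
}

pos_par(c("apple", "pear"))  # list("APPLE", "PEAR")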

Running parLapply and future_map inside another function unnecessarily copies large objects to each worker

You can change the environment of your local function so that it does not capture big_obj, for example by assigning it only the base environment:

fun <- function(x) {
  big_obj <- 1
  cl <- parallel::makeCluster(2)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  parallel::clusterExport(cl, c("x"), envir = environment())

  local_fun <- function(x) {
    x + 1
    env <- environment()
    parent_env <- parent.env(env)
    return(list(this_env = env, parent_env = parent_env))
  }

  # Replace the closure environment (which contains big_obj) with baseenv(),
  # so big_obj is not serialized to the workers along with local_fun.
  environment(local_fun) <- baseenv()
  parallel::parLapply(cl, c(1), local_fun)
}

res <- fun(1)
"big_obj" %in% names(res[[1]]$parent_env) # FALSE

Using parApply() inside a function

clusterExport() pulls from the global environment by default. Your input variable is not there; it is an argument local to the function, so you need to specify clusterExport(clust, "input", envir = environment()).
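
The same pattern as a self-contained sketch; row_sums_par and its worker are made up for illustration and are not from the original question.

library(parallel)

# Hypothetical wrapper around parApply(); "input" is local to the function.
row_sums_par <- function(m, input = 10) {
  clust <- makeCluster(2)
  on.exit(stopCluster(clust), add = TRUE)

  clusterExport(clust, "input", envir = environment())  # not .GlobalEnv

  worker <- function(row) sum(row) + input  # input is found on the worker
  environment(worker) <- .GlobalEnv  # rely on the exported copy, not the closure
  parApply(clust, m, 1, worker)
}

row_sums_par(matrix(1:6, nrow = 2))  # c(19, 22)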

parallel::clusterExport how to pass nested functions from global environment?

A possible solution is to use:

myFUN <- function(data, yourFUN, n_cores = 1) {
  cl <- parallel::makeCluster(n_cores)
  on.exit(parallel::stopCluster(cl), add = TRUE)

  envir <- environment(yourFUN)
  parallel::clusterExport(cl, varlist = ls(envir), envir = envir)

  parallel::parApply(cl, data, 1, yourFUN)
}
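
A hypothetical usage sketch (helper and yourFUN are made up; it assumes both live in the global environment, so ls(environment(yourFUN)) picks up the helper as well):

helper <- function(v) sum(v)             # nested dependency of yourFUN
yourFUN <- function(row) helper(row) + 1

myFUN(matrix(1:6, nrow = 2), yourFUN, n_cores = 2)  # returns c(10, 13)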

How to clusterExport a function without its evaluation environment

It's hard to guess the problem without a complete example, but I wonder if the error message is coming from clusterExport rather than from parLapply. That would happen if functionList was defined in a function rather than in the global environment, since the clusterExport envir argument specifies the environment from which to export the variables.

To export variables defined in a function, from that same function, you would use:

clusterExport(cl, varlist=c("functionList", "y"), envir=environment())

I'm just guessing this might be a problem for you since I don't know how or where you defined functionList. Note that clusterExport always assigns the variables to the global environment of the cluster workers.

I'm also suspicious of the way you are apparently setting the environment of a list: that seems to be legal, but I don't think it will change the environment of the functions in that list (see the quick check after the example below). In fact, I suspect that exporting functions to the workers inside a list may have other problems that you haven't encountered yet. I would use something like this:

mainFunction <- function(cl) {
  fa <- function(x) fb(x)
  fb <- function(x) fc(x)
  fc <- function(x) x
  y <- 7
  workerFunction <- function(i) {
    do.call(functionNames[[i]], list(y))
  }
  environment(workerFunction) <- .GlobalEnv
  environment(fa) <- .GlobalEnv
  environment(fb) <- .GlobalEnv
  environment(fc) <- .GlobalEnv
  functionNames <- c("fa", "fb", "fc")
  clusterExport(cl, varlist = c("functionNames", functionNames, "y"),
                envir = environment())
  parLapply(cl, seq_along(functionNames), workerFunction)
}

library(parallel)
cl <- makeCluster(detectCores())
mainFunction(cl)
stopCluster(cl)

Note that I've taken liberties with your example, so I'm not sure how well this corresponds with your problem.
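
As a quick check of the point about lists above (hypothetical names): setting the environment of a list is accepted, but it does not touch the environments of the functions stored in it.

f <- function(x) x + 1                  # defined in the global environment
functionList <- list(f = f)
environment(functionList) <- new.env()  # legal, but only tags the list itself
environmentName(environment(functionList$f))  # still "R_GlobalEnv"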

How to export objects to parallel clusters within a function in R

By default, clusterExport looks for the variables specified by "varlist" in the global environment. In your case, it should look in the local environment of the dm100zip function. To make it do that, you use the clusterExport "envir" argument:

clusterExport(cl = CL, list("jdata100", "params100", "inits100", "ymax1",
                            "ymax2", "n.burn", "jag", "n.thin"),
              envir = environment())

Note that variables in "varlist" that are defined in the global environment will also be found, but values defined in dm100zip will take precedence.
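
A tiny self-contained check of that precedence claim, using one made-up function (dm_demo) and one of the variable names from the question:

library(parallel)

n.burn <- 100                            # global value
dm_demo <- function() {
  n.burn <- 500                          # local value shadows the global one
  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)
  clusterExport(cl, "n.burn", envir = environment())
  clusterEvalQ(cl, n.burn)               # each worker reports 500, not 100
}
dm_demo()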

How can I pass arguments from an outer function to an inner function using parLapply

This is one of the trickier aspects of parallel::clusterExport. As it says in the docs,

clusterExport assigns the values on the master R process of the variables named in varlist to variables of the same names in the global environment (aka ‘workspace’) of each node

That is, it looks in the global environment for the variable names. The default value of the envir argument shows this as well:

clusterExport(cl = NULL, varlist, envir = .GlobalEnv)

You need to point the envir argument at the function's (non-global) environment, like so:

clusterExport(cl, args, envir = environment())

In your case, update to

parallel::clusterExport(cl, varlist = ARGS, envir = environment())

With this change, the output for res1 becomes:

           50%
1   0.11379733
2  -0.01619026
3   0.05117174
4  -0.11234621
5   0.37001881
6   0.07445315
7   0.01455376
8  -0.03924000
9   0.01481569
10  0.18364332
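
For completeness, a self-contained sketch of the whole pattern; outer_fun, inner_fun, and the simulated data are hypothetical, not the asker's code.

library(parallel)

# The inner worker needs n and p, which are arguments of the outer function.
outer_fun <- function(n, p, reps = 4) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)

  ARGS <- c("n", "p")
  clusterExport(cl, varlist = ARGS, envir = environment())

  inner_fun <- function(i) quantile(rnorm(n, sd = p), probs = 0.5)
  environment(inner_fun) <- .GlobalEnv  # resolve n and p on the workers
  parLapply(cl, seq_len(reps), inner_fun)
}

outer_fun(n = 100, p = 0.5)  # a list of "50%" quantiles, one per replicate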

