Reading Global Variables Using Foreach in R

How to export many variables and functions from global environment to foreach loop?

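If the foreach loop is called from the global environment, the variables it uses should be exported automatically. If it is not (for example, when the loop sits inside a function), you can pass .export = ls(globalenv()) (or, equivalently, ls(.GlobalEnv)) to export everything in the global environment.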

For functions from other packages, use the package::function syntax, or list the packages in the .packages argument of foreach.
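
To make that concrete, here is a minimal sketch of a foreach loop inside a function that needs two objects from the global environment. The names scale_factor, rescale and run_loop are made up for illustration, and the two-worker cluster is an arbitrary choice. I list the exports by name here; .export = ls(globalenv()) would work the same way but would ship every global object to the workers.

```r
library(foreach)
library(doParallel)

scale_factor <- 10                      # hypothetical global variable
rescale <- function(x, k) x * k         # hypothetical global function

# The loop lives inside a function, so the globals are named in .export
# instead of relying on automatic export.
run_loop <- function() {
  foreach(i = 1:3, .combine = c,
          .export = c("rescale", "scale_factor")) %dopar% {
    # stats::median shows the package::function syntax for package functions
    rescale(i, scale_factor) + stats::median(c(i, i + 2))
  }
}

cl <- makeCluster(2)                    # assumed worker count
registerDoParallel(cl)
res <- run_loop()
stopCluster(cl)
```

If the workers need many functions from one package, passing .packages = "pkgname" to foreach is usually tidier than prefixing every call.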

R foreach %dopar% Results

If I understand your question correctly, the problem is that you cannot update the global variable test_data from within the parallelised for-loop.

To understand why you are being prevented from doing so, consider what is actually happening within the parallelised for-loop: multiple workers are performing operations in parallel, each with its own separate, locally scoped copy of the variables (with the usual foreach backends the workers are separate R processes rather than threads). If they all had access to a global variable (or shared memory) without any kind of protection controlling that access, concurrent writes could corrupt whatever is stored in the variable, and there are several different ways this corruption might happen.

Preventing this is the raison d'être of concurrency control structures like semaphores. These allow users to do what you are trying to, but require some care to use correctly.

However, they are not available in R. Hence, it makes sense that R should protect the global variable test_data from being modified in a non-thread-safe manner. It is actually trying to protect your data.

The solution is to rewrite your code to remove any attempt to update global variables (if you still want to do any kind of parallel processing) or switch to using a traditional, sequential for loop (as some commenters have already suggested).
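
As a rough sketch of the first option, the loop below returns a value from each iteration and lets foreach combine them, instead of writing into test_data from the workers. The squared values and the two-worker cluster are placeholders I made up; only the name test_data comes from the question.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)                 # assumed worker count
registerDoParallel(cl)

test_data <- numeric(4)

# Attempted update of the global: each worker only changes its own copy.
ignored <- foreach(i = 1:4) %dopar% {
  test_data[i] <- i^2
  NULL
}
unchanged <- all(test_data == 0)     # TRUE: the calling process's copy was not updated

# Rewrite: return the value from each iteration and combine the results.
test_data <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

stopCluster(cl)
```

The only assignment to test_data now happens once, in the calling process, after all workers have finished.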

Global Assignment, Parallelism, and foreach

Your attempts to assign to global variables in the foreach loop are failing because they are happening on the worker processes that were forked by mclapply. Those variables aren't sent back to the master process, so they are lost.

You could try something like this:

r <- foreach(i = 1:3) %dopar% {
  if (i == 1) {
    bigAnalysis(data1)
  } else if (i == 2) {
    bigAnalysis(data2)
  } else {
    bigAnalysis(data3)
  }
}

a <- r[[1]]
b <- r[[2]]
c <- r[[3]]
ls(a)

This uses the default combine function which returns the three environment objects in a list.

Executing the foreach loop in a function isn't going to make it work. However, the assignments would work if you didn't call registerDoMC so that you were actually running sequentially. In that case you really are making assignments to the master process's global environment.
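
To illustrate the sequential case, here is a small sketch that registers the sequential backend explicitly with registerDoSEQ (which also avoids the warning %dopar% gives when nothing is registered). As far as I know, that backend evaluates the loop body in the calling environment, which is why the assignment sticks. The variable total is just an illustration.

```r
library(foreach)

registerDoSEQ()   # sequential backend: %dopar% runs in the calling process

total <- 0
ignored <- foreach(i = 1:3) %dopar% {
  # Evaluated in the calling environment, so this updates `total` itself.
  total <- total + i
}
```

With a parallel backend registered instead, the same assignment would happen in the workers and total would stay at 0 in the calling process.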

results from foreach loop in R

I don't believe you should be trying to modify a global variable from within each worker. See my comment above and link. You shouldn't check inside an iteration whether 500 iterations have convergence = 0, because that information is not available to each iteration. Below is one option that returns what you want:

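library(foreach)
library(doParallel)

cl <- makeCluster(6)
registerDoParallel(cl)

# Each iteration returns its own estimation error, convergence code and index;
# nothing is written to a global variable.
# dpd_tdependent and datalist come from the question's own code.
mse <- foreach(i = 1:2000, .packages = c('data.table', 'matrixStats')) %dopar% {
  beta <- rbind(1, 0.2, 1.2, 0.05)
  val <- dpd_tdependent(datalist[[i]], c(0.7, FALSE, FALSE, FALSE, FALSE))
  optim_sol <- optim(c(beta_0 = 0.7, beta_1 = 0.05, beta_2 = 0.9, rho = 0.001), val)
  b_s <- optim_sol$par
  conv <- optim_sol$convergence
  c(b_s - beta, conv, i)
}
# One row per iteration: four differences, the convergence code, the index.
mse <- matrix(unlist(mse), nrow = 2000, byrow = TRUE)

stopCluster(cl)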
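
The convergence check can then be done once, in the calling process, after the loop has finished. The sketch below uses a randomly generated stand-in for the 2000 x 6 matrix, since I don't have your data. Keeping the first 500 converged rows and averaging the squared differences is my guess at what you want, so adjust as needed.

```r
# Hypothetical stand-in for the matrix built by the loop:
# columns 1-4 hold b_s - beta, column 5 the optim convergence code,
# column 6 the iteration index.
set.seed(1)
n <- 2000
mse <- cbind(matrix(rnorm(n * 4), nrow = n),
             sample(c(0, 1), n, replace = TRUE, prob = c(0.8, 0.2)),
             seq_len(n))

# Keep only the iterations where optim reported convergence (code 0).
converged <- mse[mse[, 5] == 0, , drop = FALSE]

# Take at most the first 500 converged iterations.
first_500 <- converged[seq_len(min(500, nrow(converged))), , drop = FALSE]

# Assumed summary: mean squared difference for each of the four parameters.
mean_sq_err <- colMeans(first_500[, 1:4, drop = FALSE]^2)
```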

