How to export many variables and functions from global environment to foreach loop?
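If the foreach loop is called from the global environment, the variables it references should be exported automatically. If it is called from somewhere else, such as inside a function, you can use .export = ls(globalenv()) (or, equivalently, .export = ls(.GlobalEnv)).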
For functions from other packages, you just need to use the syntax package::function.
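As a rough illustration of both points, here is a minimal sketch, assuming the doParallel backend with a two-worker cluster. The names scale_factor, add_offset and run_loop are made up for this example, and tools::toTitleCase merely stands in for "a function from another package".

```r
library(foreach)
library(doParallel)

# Hypothetical globals, defined only for this illustration.
scale_factor <- 10
add_offset <- function(x) x + 1

run_loop <- function() {
  # foreach is called inside a function here, so the globals are
  # listed explicitly in .export instead of relying on auto-export.
  foreach(i = 1:3, .combine = c,
          .export = c("scale_factor", "add_offset")) %dopar% {
    # package::function works on the workers without attaching the package.
    paste0(tools::toTitleCase("item"), add_offset(i) * scale_factor)
  }
}

cl <- makeCluster(2)
registerDoParallel(cl)
res <- run_loop()
stopCluster(cl)
print(res)
```

Passing .export = ls(globalenv()) instead of the two names would ship every global to the workers, which is convenient but can be slow if the workspace is large.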
R foreach %dopar% Results
If I understand your question correctly, your issues arise because you are unable to update the global variable test_data from within the parallelised for-loop.
To understand why you are being prevented from doing so, consider what is actually happening within the parallelised for-loop: multiple workers (with the usual foreach backends these are separate R processes rather than threads) are performing operations in parallel, each with its own separate, locally scoped variables. If they had access to a global variable (or shared memory) without any kind of protection controlling access to it, it would be possible to corrupt whatever is stored in that variable, and there are several different ways this corruption might happen.
Preventing this is the raison d'être of concurrency control structures like semaphores. These allow users to do what you are trying to, but require some care to use correctly.
However, they are not available in base R. Hence, it makes sense that R should protect the global variable test_data from being modified in a non-thread-safe manner. It's actually trying to protect your data.
The solution is to rewrite your code to remove any attempt to update global variables (if you still want to do any kind of parallel processing) or switch to using a traditional, sequential for loop (as some commenters have already suggested).
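A minimal sketch of the first option, assuming the doParallel backend: each iteration returns its piece and .combine = rbind assembles the pieces on the master. The columns id and value are placeholders, since I don't know what your real test_data contains.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration returns one row instead of writing into a global variable;
# the rows are bound together on the master once the workers are done.
test_data <- foreach(i = 1:4, .combine = rbind) %dopar% {
  data.frame(id = i, value = i^2)  # placeholder computation
}

stopCluster(cl)
print(test_data)
```

The assignment to test_data now happens once, on the master, so no worker ever has to touch shared state.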
Global Assignment, Parallelism, and foreach
Your attempts to assign to global variables in the foreach loop are failing because they are happening in the worker processes that were forked by mclapply. Those variables aren't sent back to the master process, so they are lost.
You could try something like this:
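# Each iteration returns the result of one analysis instead of assigning globally.
# bigAnalysis, data1, data2 and data3 are the objects from the question.
r <- foreach(i = 1:3) %dopar% {
  if (i == 1) {
    bigAnalysis(data1)
  } else if (i == 2) {
    bigAnalysis(data2)
  } else {
    bigAnalysis(data3)
  }
}
# Pick the individual results out of the returned list.
a <- r[[1]]
b <- r[[2]]
c <- r[[3]]
ls(a)
This uses the default combine function, which returns the three environment objects in a list.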
Executing the foreach loop in a function isn't going to make it work. However, the assignments would work if you didn't call registerDoMC, so that you were actually running sequentially. In that case you really are making assignments to the master process's global environment.
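To see the difference for yourself, here is a small sketch. It assumes a doParallel cluster rather than doMC (so it also runs on Windows), and hits is a made-up global used only for this demonstration.

```r
library(foreach)
library(doParallel)

hits <- 0  # hypothetical global on the master

# Parallel run: the assignment happens in each worker's own global environment.
cl <- makeCluster(2)
registerDoParallel(cl)
invisible(foreach(i = 1:4) %dopar% {
  assign("hits", i, envir = globalenv())
})
stopCluster(cl)
hits_after_parallel <- hits  # the master's copy was never touched

# Sequential run: the same loop body is evaluated in the master process.
registerDoSEQ()
invisible(foreach(i = 1:4) %dopar% {
  assign("hits", i, envir = globalenv())
})
hits_after_sequential <- hits

print(c(parallel = hits_after_parallel, sequential = hits_after_sequential))
```

The loop body is identical in both runs; only the registered backend changes where the assignment lands.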
Results from foreach loop in R
I don't believe you should be trying to modify a global variable from within each worker. You also shouldn't check inside an iteration whether 500 iterations have convergence = 0, because that information is not available to any single iteration. Below is one option that returns what you want:
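library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(6)
registerDoParallel(cl)
# Each iteration returns its own result vector; nothing global is modified.
# datalist and dpd_tdependent are the objects from the question.
mse <- foreach(i = 1:2000, .packages = c('data.table', 'matrixStats')) %dopar% {
  beta <- rbind(1, 0.2, 1.2, 0.05)
  val <- dpd_tdependent(datalist[[i]], c(0.7, FALSE, FALSE, FALSE, FALSE))
  optim_sol <- optim(c(beta_0 = 0.7, beta_1 = 0.05, beta_2 = 0.9, rho = 0.001), val)
  b_s <- optim_sol$par
  conv <- optim_sol$convergence
  c(b_s - beta, conv, i)  # parameter errors, convergence code, iteration index
}
# Flatten the list of result vectors into a matrix with one row per iteration.
mse <- matrix(unlist(mse), nrow = 2000, byrow = TRUE)
stopCluster(cl)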
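The convergence check can then be done once, after the loop has finished. A minimal sketch in base R, assuming the column layout above (four error columns, then the convergence code, then the iteration index). The matrix mse_demo is random stand-in data, not real output, and it keeps 3 converged runs where you would keep 500.

```r
# Stand-in for the result matrix: 4 error columns, conv, iteration index.
set.seed(1)
mse_demo <- cbind(matrix(rnorm(40), nrow = 10),
                  conv = rep(c(0, 1), 5),
                  i = 1:10)

# Keep only the runs where optim reported convergence (code 0).
converged <- mse_demo[mse_demo[, "conv"] == 0, , drop = FALSE]

# Take the first n converged runs (3 here, 500 in the question).
n_keep <- 3
keep <- head(converged, n_keep)

# Mean squared error per parameter over the runs that were kept.
mse_per_param <- colMeans(keep[, 1:4]^2)
print(mse_per_param)
```

If fewer than n_keep runs converge, head simply returns all of them, so it is worth checking nrow(keep) before trusting the averages.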