"Un-Register" a Doparallel Cluster

un-register a doParallel cluster

The only official way to "unregister" a foreach backend is to register the sequential backend:

registerDoSEQ()

This makes sense to me because you're supposed to declare which backend to use, so I didn't see any point in providing a way to "undeclare" which backend to use. Instead, you declare that you want to use the sequential backend, which is the default.
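For instance, a quick sketch (assuming the doParallel backend is installed) of switching a session back to sequential execution:

library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)
getDoParName()  # reports the parallel backend, e.g. "doParallel"

registerDoSEQ() # declare the default sequential backend instead
getDoParName()  # now reports the sequential backend
stopCluster(cl)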

I originally considered including an "unregister" function, but since I couldn't convince myself that it was useful, I decided to leave it out since it's much easier to add a function than to remove one.

That being said, I think all you need to do is to remove all of the variables from foreach:::.foreachGlobals, which is where foreach keeps all of its state:

unregister <- function() {
  # foreach stores its registered backend in this internal environment
  env <- foreach:::.foreachGlobals
  rm(list = ls(envir = env), envir = env)
}

After calling this function, any parallel backend will be deregistered and the warning will be issued again if %dopar% is called.
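For example (illustrative only), with a doParallel backend registered:

cl <- parallel::makeCluster(2)
registerDoParallel(cl)
unregister()
foreach(i = 1:2) %dopar% i
# Warning: executing %dopar% sequentially: no parallel backend registered
stopCluster(cl)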

doParallel instead of apply

library(doParallel)

# master.iter and master.function come from the question's setup;
# each row of df holds the six arguments for one call
df <- master.iter

ncores <- detectCores() - 1
cl <- parallel::makeCluster(ncores)
registerDoParallel(cl)

v <- foreach(i = 1:nrow(df)) %dopar% {
  master.function(df[i, 1], df[i, 2], df[i, 3],
                  df[i, 4], df[i, 5], df[i, 6])
}
stopCluster(cl)
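Note that foreach() returns a list by default. If master.function() returns a scalar, a small variant of the loop above (same hypothetical df and master.function) can use .combine to collapse the results into a vector:

v <- foreach(i = 1:nrow(df), .combine = c) %dopar% {
  master.function(df[i, 1], df[i, 2], df[i, 3],
                  df[i, 4], df[i, 5], df[i, 6])
}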

R: Parallelization with doParallel and foreach

If you want to output something when using parallelism, use makeCluster(no_cores, outfile = "").
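A minimal sketch (the worker messages and the core count here are illustrative):

library(doParallel)

cl <- makeCluster(2, outfile = "")  # "" sends worker output to the master console
registerDoParallel(cl)

res <- foreach(i = 1:4) %dopar% {
  cat(sprintf("worker handling task %d\n", i))  # visible thanks to outfile = ""
  i^2
}

stopCluster(cl)

Where the output actually lands depends on the front end; in RStudio, for example, it may appear in a separate console window rather than inline.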

Parallel processing in R with doParallel and foreach

Alright, I think I got it by invoking foreach and %dopar%:

# Libraries ---------------------------------------------------------------
if (!require("pacman")) install.packages("pacman")
pacman::p_load(lakemorpho, rgdal, maptools, sp, foreach, doParallel)

# Data --------------------------------------------------------------------
ogrDrivers()
dsn <- system.file("vectors", package = "rgdal")[1]
ogrListLayers(dsn)
ogrInfo(dsn=dsn, layer="trin_inca_pl03")
owd <- getwd()
setwd(dsn)
ogrInfo(dsn="trin_inca_pl03.shp", layer="trin_inca_pl03")
setwd(owd)
x <- readOGR(dsn=dsn, layer="trin_inca_pl03")
summary(x)

# HPC ---------------------------------------------------------------------
cores_2_use <- detectCores() - 4
cl <- makeCluster(cores_2_use, useXDR = FALSE)
clusterSetRNGStream(cl, 9956)  # reproducible RNG streams on the workers
registerDoParallel(cl, cores_2_use)
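# Optional sanity check: getDoParWorkers() should report cores_2_use
getDoParWorkers()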

# Analysis ----------------------------------------------------------------
# return the maximum lake length for row i
myfun <- function(x, i) {
  tmp <- lakeMorphoClass(x[i, ], NULL, NULL, NULL)
  Sys.sleep(0.1)           # keep the workers visibly busy for the demo
  lakeMaxLength(tmp, 200)
}

# collect the per-row results into a single numeric vector
x_lake_length <- foreach(i = 1:nrow(x), .combine = c,
                         .packages = c("lakemorpho", "rgdal")) %dopar% {
  myfun(x, i)
}

df_Kodiak <- data.frame(x_lake_length)

As you can see in the screenshot below, this creates an army of Rscript.exe processes using 20 of my 24 CPU cores. Of course, the example data I used is small and didn't really need all those cores, but it should serve as a proof of concept.

I never go above that ratio, because saturating 100% of the CPU cores can make the machine unresponsive, and other users of a shared server may not be happy with you.

[Screenshot: many CPU cores in use]
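On machines with fewer cores, detectCores() - 4 can drop to zero or below, so a defensive variant (assuming you always want at least one worker) is:

cores_2_use <- max(1, detectCores() - 4)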

foreach, doParallel and random generation

Your worries are correct; random number generation does not magically work in parallel, and further steps need to be taken. When using the foreach framework, you can use the doRNG extension to ensure statistically sound (and reproducible) random numbers in parallel as well.

Example:

library("doParallel")
cl <- makeCluster(2)
registerDoParallel(cl)

## Declare that parallel RNG should be used in a parallel foreach() call.
## %dorng% will still result in parallel processing; it uses %dopar% internally.
library("doRNG")

y <- foreach(i = 1:100) %dorng% rnorm(1)
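One property of %dorng% worth knowing: seeding before the loop makes the parallel draws reproducible. A quick check (same cluster setup as above):

set.seed(123)
y1 <- foreach(i = 1:3) %dorng% rnorm(1)
set.seed(123)
y2 <- foreach(i = 1:3) %dorng% rnorm(1)
identical(y1, y2)  # should be TRUE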

EDIT 2020-08-04: Previously this answer proposed the alternative:

library("doRNG")
registerDoRNG()
y <- foreach(i = 1:100) %dopar% rnorm(1)

However, the downside of that approach is that it is harder for a developer to use registerDoRNG() cleanly inside functions. Because of this, I recommend using %dorng% to specify that parallel RNG should be used.
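For example, a hypothetical helper that is safe to call from package code, because %dorng% handles the RNG locally instead of mutating the registered backend:

par_draws <- function(n) {
  foreach(i = seq_len(n)) %dorng% rnorm(1)
}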

How to kill a doMC worker when it's done?

I never did find a suitable solution for doMC, so for a while I've been doing the following:

library(doParallel)
cl <- makePSOCKcluster(4) # number of cores to use
registerDoParallel(cl)

## computation

stopCluster(cl)

Works every time.
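If you want the cleanup to happen even when the computation fails, one pattern (with_cluster() here is a hypothetical helper, not part of doParallel) is:

with_cluster <- function(ncores, expr) {
  cl <- parallel::makePSOCKcluster(ncores)
  doParallel::registerDoParallel(cl)
  on.exit({
    parallel::stopCluster(cl)
    foreach::registerDoSEQ()  # leave the sequential backend registered
  })
  expr  # lazily evaluated here, while the cluster is registered
}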


