un-register a doParallel cluster
The only official way to "unregister" a foreach backend is to register the sequential backend:
registerDoSEQ()
This makes sense to me because you're supposed to declare which backend to use, so I didn't see any point in providing a way to "undeclare" which backend to use. Instead, you declare that you want to use the sequential backend, which is the default.
I originally considered including an "unregister" function, but since I couldn't convince myself that it was useful, I decided to leave it out since it's much easier to add a function than to remove one.
That being said, I think all you need to do is remove all of the variables from foreach:::.foreachGlobals, which is where foreach keeps all of its state:
unregister <- function() {
  env <- foreach:::.foreachGlobals
  rm(list = ls(name = env), pos = env)
}
After calling this function, any parallel backend will be deregistered, and the "no parallel backend registered" warning will be issued again if %dopar% is called.
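As a rough check of the two approaches above, the following sketch registers a small cluster, switches back to the sequential backend, and then clears the foreach state. It assumes a local 2-worker cluster is acceptable and uses getDoParRegistered() and getDoParName() from foreach to inspect the state; the variable names are only illustrative.

```r
library(doParallel)

# Same helper as in the answer: wipe all state held by foreach
unregister <- function() {
  env <- foreach:::.foreachGlobals
  rm(list = ls(name = env), pos = env)
}

cl <- makeCluster(2)          # assumption: 2 local workers are available
registerDoParallel(cl)
registered_before <- getDoParRegistered()   # a backend is now registered

stopCluster(cl)

# Official route: declare the sequential backend
registerDoSEQ()
backend_seq <- getDoParName()

# Unofficial route: remove every trace of a registered backend
unregister()
registered_after <- getDoParRegistered()
```

Stopping the cluster does not by itself change what foreach has registered, which is why one of the two steps is still needed afterwards.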
doParallel instead of apply
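library(doParallel)

df <- master.iter  # one row per call; columns 1-6 hold the arguments

# Leave one core free for the rest of the system
ncores <- detectCores() - 1
cl <- parallel::makeCluster(ncores)
registerDoParallel(cl)

# Call master.function() once per row; the results come back as a list
v <- foreach(i = seq_len(nrow(df))) %dopar% {
  master.function(df[i, 1], df[i, 2], df[i, 3], df[i, 4], df[i, 5], df[i, 6])
}

stopCluster(cl)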
R: Parallelization with doParallel and foreach
If you want to see output printed by the workers when running in parallel, create the cluster with makeCluster(no_cores, outfile = "").
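A minimal sketch of this, assuming a local 2-worker cluster: with outfile = "" the workers' output is not redirected, so messages written with cat() inside the loop can show up in the console that launched R (this may not be visible in every front end).

```r
library(doParallel)

# outfile = "" means: do not redirect the workers' stdout/stderr
cl <- makeCluster(2, outfile = "")
registerDoParallel(cl)

res <- foreach(i = 1:4, .combine = c) %dopar% {
  # Progress message written by the worker process
  cat("worker", Sys.getpid(), "processing task", i, "\n")
  i^2
}

stopCluster(cl)
```

The order of the printed messages is not guaranteed, because the tasks run concurrently; the order of the values in res still follows the task order.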
Parallel processing in R doParallel foreach
Alright, I think I got it by invoking foreach and %dopar%:
# Libraries ---------------------------------------------------------------
if (!require("pacman")) install.packages("pacman")
pacman::p_load(lakemorpho, rgdal, maptools, sp, doParallel, foreach)
# Data --------------------------------------------------------------------
ogrDrivers()
dsn <- system.file("vectors", package = "rgdal")[1]
ogrListLayers(dsn)
ogrInfo(dsn=dsn, layer="trin_inca_pl03")
owd <- getwd()
setwd(dsn)
ogrInfo(dsn="trin_inca_pl03.shp", layer="trin_inca_pl03")
setwd(owd)
x <- readOGR(dsn=dsn, layer="trin_inca_pl03")
summary(x)
# HPC ---------------------------------------------------------------------
cores_2_use <- detectCores() - 4
cl <- makeCluster(cores_2_use, useXDR = FALSE)
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl, cores_2_use)
# Analysis ----------------------------------------------------------------
# Compute the maximum length of lake i and return it to the master process
myfun <- function(x, i) {
  tmp <- lakeMorphoClass(x[i, ], NULL, NULL, NULL)
  len <- lakeMaxLength(tmp, 200)
  print(i)
  Sys.sleep(0.1)
  len
}
# Collect the per-lake results into one numeric vector
x_lake_length <- foreach(i = 1:nrow(x), .combine = c,
                         .packages = c("lakemorpho", "rgdal")) %dopar% {
  myfun(x, i)
}
stopCluster(cl)
df_Kodiak <- data.frame(x_lake_length)
This creates an army of Rscript.exe processes using 20 of 24 CPU cores. Of course, the example data I used is small, so it didn't really need all those cores, but it should serve as a proof of concept.
I never go above that ratio, because using 100% of the CPU cores can sometimes cause problems, and other users of the server may not be happy with you.
foreach, doParallel and random generation
Your worries are correct; random number generation does not magically work in parallel, and further steps need to be taken. When using the foreach framework, you can use the doRNG extension to get statistically sound random numbers in parallel as well.
Example:
library("doParallel")
cl <- makeCluster(2)
registerDoParallel(cl)
## Declare that parallel RNG should be used in a parallel foreach() call.
## %dorng% will still result in parallel processing; it uses %dopar% internally.
library("doRNG")
y <- foreach(i = 1:100) %dorng% rnorm(1)
EDIT 2020-08-04: Previously this answer proposed the alternative:
library("doRNG")
registerDoRNG()
y <- foreach(i = 1:100) %dopar% rnorm(1)
However, the downside of that approach is that it is harder for the developer to use registerDoRNG() cleanly inside functions. Because of this, I recommend using %dorng% to specify that parallel RNG should be used.
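As a quick check of the %dorng% approach, the sketch below runs the same loop twice after calling set.seed() with the same value and compares the draws. It assumes a local 2-worker cluster; the seed value and the variable names are arbitrary.

```r
library(doParallel)
library(doRNG)

cl <- makeCluster(2)   # assumption: 2 local workers are available
registerDoParallel(cl)

# %dorng% derives independent per-iteration RNG streams from the current seed
set.seed(42)
run1 <- foreach(i = 1:5, .combine = c) %dorng% rnorm(1)

set.seed(42)
run2 <- foreach(i = 1:5, .combine = c) %dorng% rnorm(1)

stopCluster(cl)

# Compare only the numeric values, ignoring any attributes on the results
same_draws <- isTRUE(all.equal(as.numeric(run1), as.numeric(run2)))
```

If same_draws is TRUE, the parallel loop is reproducible from the seed regardless of which worker handled which iteration.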
How to kill a doMC worker when it's done?
I never did find a suitable solution for doMC, so for a while I've been doing the following:
library(doParallel)
cl <- makePSOCKcluster(4) # number of cores to use
registerDoParallel(cl)
## computation
stopCluster(cl)
Works every time.
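One possible way to make sure the workers are always shut down, even if the computation fails, is to wrap the pattern above in a function and stop the cluster in on.exit(). This is only a sketch: run_with_cluster() is a hypothetical helper, not part of doParallel, and the sqrt() loop stands in for the real computation.

```r
library(doParallel)

# Hypothetical helper: start workers, run the loop, always clean up
run_with_cluster <- function(n_workers, tasks) {
  cl <- makePSOCKcluster(n_workers)
  on.exit({
    stopCluster(cl)    # terminate the worker processes
    registerDoSEQ()    # fall back to the sequential backend
  }, add = TRUE)
  registerDoParallel(cl)

  ## computation (placeholder)
  foreach(i = tasks, .combine = c) %dopar% sqrt(i)
}

roots <- run_with_cluster(2, c(1, 4, 9))
```

Because the cleanup lives in on.exit(), it runs whether the function returns normally or stops with an error.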