How to Show the Progress of Code in Parallel Computation in R

How to show the progress of code in parallel computation in R?

The doSNOW package has support for progress bars, while doParallel does not. Here's a way to put a progress bar in your example:

require(doSNOW)
require(Kendall)
cores <- parallel::detectCores()
cl <- makeSOCKcluster(cores)
registerDoSNOW(cl)
mydata <- matrix(rnorm(8000*500), ncol=500)
pb <- txtProgressBar(min=1, max=8000, style=3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress=progress)
result <-
foreach(i=1:8000, .packages="Kendall", .options.snow=opts,
.combine='rbind') %dopar% {
abc <- MannKendall(mydata[i,])
data.frame(tau=abc$tau, sl=abc$sl)
}
close(pb)
stopCluster(cl)

Using standard R shiny progress bar in parallel foreach calculations

The doParallel package is an extension of the parallel package as shown in the documentation here.

https://cran.r-project.org/web/packages/doParallel/doParallel.pdf

Reading the parallel package's documentation we see that it implements 3 different methods to achieve parallelism. Keep in mind R is a single threaded language.

  1. A new R session where the parent process communicates with a worker or child process.
  2. Via Forking
  3. Using OS level facilities

You can find this information here,

https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf

A consequence of this is that the child process cannot communicate with the parent process until it completes its computation and returns a value. This is to the best of my knowledge.

Hence, ticking the progress bar within the worker process will not be possible.

Full disclosure, I have not worked with the doParallel package and the documentation with respect to shiny was limited.


Alternative solution

There is a similar package however with extensive documentation with respect to shiny. These are the futures and promises and ipc packages. futures and promises enable asynchronous programming while ipc enables interprocess communication. To help us even more it also has an AsyncProgress() function.

Here is an example where we tick two counters synchronously.

Example

library(shiny)
library(future)
library(promises)
library(ipc)

plan(multisession)

ui <- fluidPage(
actionButton(inputId = "go", label = "Launch calculation")
)

server <- function(input, output, session) {

observeEvent(input$go, {

progress = AsyncProgress$new(message="Complex analysis")

future({
for (i in 1:15) {
progress$inc(1/15)
Sys.sleep(0.5)
}

progress$close()
return(i)
})%...>%
cat(.,"\n")

Sys.sleep(1)

progress2 = AsyncProgress$new(message="Complex analysis")

future({
for (i in 1:5) {
progress2$inc(1/5)
Sys.sleep(0.5)
}

progress2$close()

return(i)
})%...>%
cat(.,"\n")

NULL
})
}

shinyApp(ui = ui, server = server)

Your code adapted

Here is the code you have written, slightly modified to spin off many asynchronous processes. Any work can be performed in the worker, such as the vector you create and add an rnorm too. (Not shown here)

library(shiny)
library(future)
library(promises)
library(ipc)

plan(multisession)

ui <- fluidPage(
actionButton(inputId = "go", label = "Launch calculation")
)

server <- function(input, output, session) {

observeEvent(input$go, {
Runs=c(1:4) #define the number of runs
progress = list() #A list to maintain progress for each run

for(j in Runs){
progress[[j]] = AsyncProgress$new(message="Complex analysis")
future({
for (i in 1:10) {
progress[[j]]$inc(1/10)
Sys.sleep(0.2)
}
progress[[j]]$close()
return(i)
})%...>%
cat(.,'\n')
}

NULL
})
}

shinyApp(ui = ui, server = server)

The code above is a modified version of the code found in the ipc documentation here:

http://htmlpreview.github.io/?https://github.com/fellstat/ipc/blob/master/inst/doc/shinymp.html

Additional Resources:

https://rstudio.github.io/promises/articles/overview.html

R: asynchronous parallel lapply

(Disclaimer: I'm the author of the future framework and the progressr package)

A close solution that resembles base::lapply(), and your pbapply::pblapply() example, is to use the future.apply as:

library(future.apply)

## The below is same as plan(multisession, workers=4)
cl <- parallel::makeCluster(4)
plan(cluster, workers=cl)

xs <- 1:100
results <- future_lapply(xs, FUN=function(x) {
Sys.sleep(0.1)
sqrt(x)
})

Chunking:
You can control the amount of chunking with argument future.chunk.size or supplementary future.schedule. To disable chunking such that each element is processed in a unique parallel task, use future.chunk.size=1. This way, if there is one element that takes much longer than other elements, it will not hold up any other elements.

xs <- 1:100
results <- future_lapply(xs, FUN=function(x) {
Sys.sleep(0.1)
sqrt(x)
}, future.chunk.size=1)

Progress updates in parallel:
If you want to receive progress updates when doing parallel processing, you can use progressr package and configure it to use the progress package to report updates as a progress bar (here also with an ETA).

library(future.apply)
plan(multisession, workers=4)

library(progressr)
handlers(handler_progress(format="[:bar] :percent :eta :message"))

with_progress({
p <- progressor(along=xs)
results <- future_lapply(xs, FUN=function(x) {
p() ## signal progress
Sys.sleep(0.1)
sqrt(x)
}, future.chunk.size=1)
})

You can wrap this into a function, e.g.

my_fcn <- function(xs) {
p <- progressor(along=xs)
future_lapply(xs, FUN=function(x) {
p()
Sys.sleep(0.1)
sqrt(x)
}, future.chunk.size=1)
}

This way you can call it as a regular function:

> result <- my_fcn(xs)

and use plan() to control exactly how you want it to parallelize. This will not report on progress. To do that, you'll have to do:

> with_progress(result <- my_fcn(xs))
[====>-----------------------------------------------------] 9% 1m

Run everything in the background: If your question was how to run the whole shebang in the background, see the 'Future Topologies' vignette. That's another level of parallelization but it's possible.

how to track progress in mclapply in R in parallel package

This is an update of my related answer.

library(parallel)

finalResult <- local({
f <- fifo(tempfile(), open="w+b", blocking=T)
if (inherits(parallel:::mcfork(), "masterProcess")) {
# Child
progress <- 0.0
while (progress < 1 && !isIncomplete(f)) {
msg <- readBin(f, "double")
progress <- progress + as.numeric(msg)
cat(sprintf("Progress: %.2f%%\n", progress * 100))
}
parallel:::mcexit()
}
numJobs <- 100
result <- mclapply(1:numJobs, function(...) {
# Do something fancy here... For this example, just sleep
Sys.sleep(0.05)
# Send progress update
writeBin(1/numJobs, f)
# Some arbitrary result
sample(1000, 1)
})
close(f)
result
})

cat("Done\n")

parSapply and progress bar

The parSapply function doesn't support a progress bar, and I don't think there is any really good way to implement one by adding extra code to the task function, although people have made valiant efforts to do that.

The doSNOW package supports progress bars, so you could either use that directly or write a wrapper function that works like the parSapply function. Here's one way to write such a wrapper function:

# This function is similar to "parSapply", but doesn't preschedule
# tasks and doesn't support "simplify" and "USE.NAMES" options
pbSapply <- function(cl, X, FUN, ...) {
registerDoSNOW(cl)
pb <- txtProgressBar(max=length(X))
on.exit(close(pb))
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress=progress)
foreach(i=X, .combine='c', .options.snow=opts) %dopar% {
FUN(i, ...)
}
}

You can easily modify this function to use either the tkProgressBar or winProgressBar function.

Here's an example use of pbSapply:

library(doSNOW)
cl <- makeSOCKcluster(3)
x <- pbSapply(cl, 1:100, function(i, j) {Sys.sleep(1); i + j}, 100)

Note that this doesn't use prescheduling, so the performance won't be as good as parSapply if you have small tasks.



Related Topics



Leave a reply



Submit