How to Print When Using %dopar%

How can I print when using %dopar%?

There are a number of good solutions posted here, but I find it easiest to log to a socket and use a separate process to display the log output in a console.

I use the following function:

log.socket <- make.socket(port = 4000)

Log <- function(text, ...) {
  msg <- sprintf(paste0(as.character(Sys.time()), ": ", text, "\n"), ...)
  cat(msg)
  write.socket(log.socket, msg)
}

You can then place log statements in the code such as:

Log("Processing block %d of %d", i, n.blocks)

Log output can be viewed in real time using any simple socket listening tool. For example, using netcat on Linux:

nc -l 4000

The above log statement would display in the netcat terminal as:

2014-06-25 12:30:45: Processing block 2 of 13

This method has the advantage of working remotely and provides as detailed an output as you care to log.

p.s. For those on Windows, see Jon Craton's netcat port.

p.p.s. I'm guessing the write.socket R function probably isn't thread-safe, but unless you're logging at high frequency, you're unlikely to run into any issues. Something to be aware of, though.
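
For completeness, here is a minimal sketch (my own illustration, not from the original answer) of wiring Log() into a %dopar% loop with a fork-based backend on Linux/macOS, where the workers inherit the socket opened on the master; on a PSOCK or Windows cluster each worker would need to open its own connection. A listener such as nc -l 4000 must already be running before make.socket() is called.

library(doParallel)
registerDoParallel(cores = 4)           # fork-based workers (Linux/macOS only)

log.socket <- make.socket(port = 4000)  # requires the listener to be up already

Log <- function(text, ...) {
  msg <- sprintf(paste0(as.character(Sys.time()), ": ", text, "\n"), ...)
  write.socket(log.socket, msg)
}

n.blocks <- 13
res <- foreach(i = seq_len(n.blocks)) %dopar% {
  Log("Processing block %d of %d", i, n.blocks)
  i  # placeholder result
}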

Print outputs in foreach loop in R

I'm not sure if there's a way to output to the screen, but you can easily output to a log file using the sink function, like so:

library(foreach)  # assumes a parallel backend (e.g. doParallel) is already registered
ptm1 <- proc.time()
foreach(i = 1:50, .packages = c("MASS"), .combine = '+') %dopar% {
  res <- ginv(matrix(rexp(1000000, rate = .001), ncol = 1000))
  if (i > 49) {
    sink("Report.txt", append = TRUE)  # open sink file and add output
    cat("Time taken", proc.time() - ptm1)
    sink()                             # close the sink again
  }
  res                                  # return the matrix so .combine = '+' works
}

EDIT: As @Roland points out, this can be dangerous if you want to capture output from every iteration and not just the final one, because you don't want the workers to clobber each other. He links to a better alternative for this scenario in his comment.
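
For illustration, here is a hedged sketch of that kind of fix (my own, not necessarily the alternative Roland linked to): give each iteration its own sink file, so workers never write to the same file, and combine the pieces afterwards if needed.

library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

res <- foreach(i = 1:4) %dopar% {
  sink(sprintf("report_%02d.txt", i))  # one file per iteration, so nothing gets clobbered
  cat("Iteration", i, "finished at", format(Sys.time()), "\n")
  sink()                               # always close the sink again
  i
}

stopCluster(cl)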

How can I `print` or `cat` when using parallel

Using the outfile parameter of makeCluster, you can redirect the output to a file and then check that file to see how your program is progressing.

Interestingly, on a Linux machine setting it to "" outputs to the console, but that doesn't work for me on a Windows machine. File output works on both.
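
A minimal sketch of that approach (the file name and worker count are just illustrative):

library(doParallel)

cl <- makeCluster(2, outfile = "worker_log.txt")  # outfile = "" prints to the console on some platforms
registerDoParallel(cl)

res <- foreach(i = 1:4) %dopar% {
  cat(sprintf("worker handling i = %d\n", i))     # captured in worker_log.txt
  i * 2
}

stopCluster(cl)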

How to log when using foreach (print or futile.logger)

Following the solution from How can I print when using %dopar%: the idea is to use snow to set up your cluster and set outfile="" to redirect worker output to the master.

library(foreach)
library(futile.logger)
library(doParallel)

library(doSNOW)
cluster <- makeCluster(3, outfile="") # I only have 4 cores, but you could do 8
registerDoSNOW(cluster)
flog.threshold(DEBUG)

doStuff <- function(input){
  flog.info('Doing some stuff with %s', input)  # logged via futile.logger
  return(input)
}
res <- lapply(FUN=doStuff, X=seq(1,8,1))
# >> this prints
res2 <- foreach(input = seq(1,8,1)) %do% doStuff(input)
# >> this prints
res3 <- foreach(input = seq(1,8,1), .packages='futile.logger') %dopar% doStuff(input)
# >> this prints too

Output:

> res3 <- foreach(input = seq(1,8,1), .packages='futile.logger') %dopar% doStuff(input)  
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
INFO [2016-08-08 08:22:39] Doing some stuff with 3
Type: EXEC
INFO [2016-08-08 08:22:39] Doing some stuff with 1
INFO [2016-08-08 08:22:39] Doing some stuff with 2
Type: EXEC
Type: EXEC
INFO [2016-08-08 08:22:39] Doing some stuff with 5
INFO [2016-08-08 08:22:39] Doing some stuff with 4
Type: EXEC
Type: EXEC
INFO [2016-08-08 08:22:39] Doing some stuff with 6
INFO [2016-08-08 08:22:39] Doing some stuff with 7
INFO [2016-08-08 08:22:39] Doing some stuff with 8

Output to a log file. Here's an alternative that writes to a log file instead, following How to log using futile logger from within a parallel method in R?. It has the advantage of producing cleaner output, but still requires flog.info:

library(doSNOW)
library(foreach)
library(futile.logger)
nworkers <- 3
cluster <- makeCluster(nworkers)
registerDoSNOW(cluster)
loginit <- function(logfile) flog.appender(appender.file(logfile))
foreach(input=rep('~/Desktop/out.log', nworkers),
        .packages='futile.logger') %dopar% loginit(input)
doStuff <- function(input){
  flog.info('Doing some stuff with %s', input)
  return(input)
}
foreach(input = seq(1,8,1), .packages='futile.logger') %dopar% doStuff(input)
stopCluster(cluster)
readLines("~/Desktop/out.log")

Output:

> readLines("~/Desktop/out.log")
[1] "INFO [2016-08-08 10:07:30] Doing some stuff with 2"
[2] "INFO [2016-08-08 10:07:30] Doing some stuff with 1"
[3] "INFO [2016-08-08 10:07:30] Doing some stuff with 3"
[4] "INFO [2016-08-08 10:07:30] Doing some stuff with 4"
[5] "INFO [2016-08-08 10:07:30] Doing some stuff with 5"
[6] "INFO [2016-08-08 10:07:30] Doing some stuff with 6"
[7] "INFO [2016-08-08 10:07:30] Doing some stuff with 7"
[8] "INFO [2016-08-08 10:07:30] Doing some stuff with 8"

No standard output received inside foreach loop

If you want output from a parallel foreach loop, just use the outfile option: makeCluster(no_cores, outfile = "").

foreach %dopar% write to the same file

Another alternative is to redirect all output to a file (this may not be what you want):

library(doParallel)
library(flock)
path_file <- "path1.txt"
cl <- makeCluster(4, outfile = path_file)
registerDoParallel(cl)
foreach(i = 1:10) %dopar% {
  message <- paste("hello", "world", i, "\n")
  print(message)
}
parallel::stopCluster(cl)

Or you may want to have a file for each element and then concatenate them:

library(doParallel)
library(flock)
path_file <- "path"

cl <- makeCluster(4)
registerDoParallel(cl)
foreach(i = 1:103) %dopar% {
  filename <- paste0(path_file, i, ".txt")
  message <- paste("hello", "world", i, "\n")
  print(filename)
  cat(message, file = filename, append = TRUE)
  print(message)
}

parallel::stopCluster(cl)

startfile <- "full.txt"
file.create(startfile)                 # the target must exist before file.append()
foreach(i = 1:103) %do% {
  filename <- paste0(path_file, i, ".txt")
  file.append(startfile, filename)
  file.remove(filename)
}

You need to be careful when multiple workers are trying to access the same resource. In order to synchronise access to a shared resource, you can use the flock package to set a mutex.
(I'm not sure why the following does not work with %dopar%; the file connection may not be shareable across workers, which is presumably why it uses %do% here. A variant that writes by file path instead of through a shared connection is sketched after the sample.)

Take a look at the following code sample:

library(doParallel)
library(flock)
path_file <- "path12.txt"
fileConn <- file(path_file, open = "a")
lock <- tempfile()

cl <- makeCluster(4)
registerDoParallel(cl)
foreach(i = 1:103) %do% {
  locked <- flock::lock(lock)          # lock in order to use the shared resource
  message <- paste("hello", "world", i, "\n")
  cat(message, file = fileConn, append = TRUE)
  print(message)
  flock::unlock(locked)                # release the lock
}

close(fileConn)
parallel::stopCluster(cl)
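
For contrast, here is a hedged %dopar% variant of the same idea (my own sketch, not part of the original answer): each worker appends by file path rather than through a shared connection, and writes are still serialised with a flock lock.

library(doParallel)
library(flock)

path_file <- "path13.txt"
lock_file <- tempfile()                   # lock file shared by all workers

cl <- makeCluster(4)
registerDoParallel(cl)

res <- foreach(i = 1:103, .packages = "flock") %dopar% {
  locked <- flock::lock(lock_file)        # acquire the lock before touching the file
  cat(paste("hello", "world", i, "\n"),
      file = path_file, append = TRUE)    # append by path; no shared connection
  flock::unlock(locked)                   # release the lock
  i
}

parallel::stopCluster(cl)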

Why is using %dopar% with foreach causing R to not recognize package?

You need to include the packages you will use inside the loop via the .packages argument of the foreach() function:

foreach(i = 1:10, .packages = "sf") %dopar% {
  if (st_geometry_type(sfObject[i, ]) == "LINESTRING") {
    print("check")
  }
}

How to speed up a while-loop in R (perhaps using dopar)?

Thank you @Bas! I tested your suggestion on a Linux machine: for a file with ~239 million lines it took less than 1 min. By adding >lines.txt I could save the results. Interestingly, my first readLines R script needed "only" 29 min, which was surprisingly fast compared with my first experience (so I might have had some problem with my Windows computer at work which was not related to R).


