How can I print when using %dopar%
There are a number of good solutions posted here, but I find it easiest to log to a socket and use a separate process to output the log calls in a console.
I use the following function:
log.socket <- make.socket(port = 4000)
Log <- function(text, ...) {
  msg <- sprintf(paste0(as.character(Sys.time()), ": ", text, "\n"), ...)
  cat(msg)
  write.socket(log.socket, msg)
}
You can then place log statements in the code such as:
Log("Processing block %d of %d", i, n.blocks)
Log output can be viewed in real time using any simple socket-listening tool. For example, using netcat on Linux:
nc -l 4000
The above log statement would display in the netcat terminal as:
2014-06-25 12:30:45: Processing block 2 of 13
This method has the advantage of working remotely and provides output as detailed as you care to log.
p.s. For those on Windows, see Jon Craton's netcat port.
p.p.s. I'm guessing the write.socket R function probably isn't thread-safe, but unless you're logging at high frequency you're unlikely to run into any issues. Something to be aware of, though.
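Note that a socket connection can't be serialized and shipped to workers, so each worker has to open its own connection. A minimal sketch of using this pattern inside %dopar% (assuming a listener such as nc -l 4000 is already running; cluster size and loop bounds are illustrative):

```r
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

foreach(i = 1:4) %dopar% {
  # Each worker opens (and closes) its own socket to the listener
  log.socket <- make.socket(port = 4000)
  on.exit(close.socket(log.socket))
  msg <- sprintf("%s: Processing block %d of 4\n", as.character(Sys.time()), i)
  write.socket(log.socket, msg)
  i
}

stopCluster(cl)
```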
Print outputs in foreach loop in R
I'm not sure if there's a way to output to the screen, but you can easily output to a log file using the sink function, like so:
ptm1 <- proc.time()
foreach (i = 1:50, .packages = c("MASS"), .combine = '+') %dopar% {
  ginv(matrix(rexp(1000000, rate = .001), ncol = 1000))
  if (i > 49) {
    sink("Report.txt", append = TRUE) # open sink file and add output
    cat("Time taken", proc.time() - ptm1, "\n")
    sink() # close the sink again
  }
}
EDIT : As @Roland points out, this can be dangerous if you want to capture output from every iteration and not just the final one, because you don't want the workers to clobber each other. He links to a better alternative for this scenario in his comment.
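As a hedged sketch of that safer pattern (file names here are illustrative), you can give every iteration its own log file, so workers never write to a shared sink:

```r
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

foreach(i = 1:4) %dopar% {
  # One log file per iteration: no two workers ever touch the same file
  logfile <- sprintf("report_%02d.txt", i)
  cat("Iteration", i, "finished at", format(Sys.time()), "\n", file = logfile)
  i
}

stopCluster(cl)
```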
How can I `print` or `cat` when using parallel
Using the outfile parameter in makeCluster, you can redirect the output to a file and then check that file to see how your program progresses. Interestingly, on a Linux machine setting it to "" outputs to the console, but that doesn't work for me on a Windows machine. File output works on both.
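A minimal sketch of the outfile approach with doParallel (the log file name is illustrative):

```r
library(doParallel)

# All worker output (cat/print/message) is redirected to worker_log.txt;
# try outfile = "" to print to the console instead (works on Linux)
cl <- makeCluster(2, outfile = "worker_log.txt")
registerDoParallel(cl)

foreach(i = 1:4) %dopar% {
  cat("working on", i, "\n")
  i
}

stopCluster(cl)
```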
How to log when using foreach (print or futile.logger)
Following the solution from How can I print when using %dopar%: the idea is to use snow to set up your cluster and set outfile="" to redirect worker output to the master.
library(foreach)
library(futile.logger)
library(doParallel)
library(doSNOW)
cluster <- makeCluster(3, outfile="") # I only have 4 cores, but you could do 8
registerDoSNOW(cluster)
flog.threshold(DEBUG)
doStuff <- function(input){
flog.info('Doing some stuff with %s', input)
return(input)
}
res <- lapply(FUN=doStuff, X=seq(1,8,1))
# >> this prints
res2 <- foreach(input = seq(1,8,1)) %do% doStuff(input)
# >> this prints
res3 <- foreach(input = seq(1,8,1), .packages='futile.logger') %dopar% doStuff(input)
# >> this prints too
Output:
> res3 <- foreach(input = seq(1,8,1), .packages='futile.logger') %dopar% doStuff(input)
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
INFO [2016-08-08 08:22:39] Doing some stuff with 3
Type: EXEC
INFO [2016-08-08 08:22:39] Doing some stuff with 1
INFO [2016-08-08 08:22:39] Doing some stuff with 2
Type: EXEC
Type: EXEC
INFO [2016-08-08 08:22:39] Doing some stuff with 5
INFO [2016-08-08 08:22:39] Doing some stuff with 4
Type: EXEC
Type: EXEC
INFO [2016-08-08 08:22:39] Doing some stuff with 6
INFO [2016-08-08 08:22:39] Doing some stuff with 7
INFO [2016-08-08 08:22:39] Doing some stuff with 8
Output to log file. Here's an alternative that outputs to a log file, following How to log using futile logger from within a parallel method in R?. It has the advantage of cleaner output, but still requires flog.info:
library(doSNOW)
library(foreach)
library(futile.logger)
nworkers <- 3
cluster <- makeCluster(nworkers)
registerDoSNOW(cluster)
loginit <- function(logfile) flog.appender(appender.file(logfile))
foreach(input=rep('~/Desktop/out.log', nworkers),
.packages='futile.logger') %dopar% loginit(input)
doStuff <- function(input){
flog.info('Doing some stuff with %s', input)
return(input)
}
foreach(input = seq(1,8,1), .packages='futile.logger') %dopar% doStuff(input)
stopCluster(cluster)
readLines("~/Desktop/out.log")
Output:
> readLines("~/Desktop/out.log")
[1] "INFO [2016-08-08 10:07:30] Doing some stuff with 2"
[2] "INFO [2016-08-08 10:07:30] Doing some stuff with 1"
[3] "INFO [2016-08-08 10:07:30] Doing some stuff with 3"
[4] "INFO [2016-08-08 10:07:30] Doing some stuff with 4"
[5] "INFO [2016-08-08 10:07:30] Doing some stuff with 5"
[6] "INFO [2016-08-08 10:07:30] Doing some stuff with 6"
[7] "INFO [2016-08-08 10:07:30] Doing some stuff with 7"
[8] "INFO [2016-08-08 10:07:30] Doing some stuff with 8"
No standard output received inside foreach loop
If you want output from a parallel foreach loop, just use the outfile option: makeCluster(no_cores, outfile = "").
foreach %dopar% write in a same file
Another alternative is to redirect all output to a file (this may not be what you want):
library(doParallel)
path_file <- "path1.txt"
cl <- makeCluster(4, outfile = path_file)
registerDoParallel(cl)
foreach(i = 1:10) %dopar% {
  message <- paste("hello", "world", i, "\n")
  print(message)
}
parallel::stopCluster(cl)
or you may want to have a file for each element and then concatenate them:
library(doParallel)
path_file <- "path"
cl <- makeCluster(4)
registerDoParallel(cl)
foreach(i = 1:103) %dopar% {
  filename <- paste0(path_file, i, ".txt")
  message <- paste("hello", "world", i, "\n")
  print(filename)
  cat(message, file = filename, append = TRUE)
  print(message)
}
parallel::stopCluster(cl)
startfile <- "full.txt"
foreach(i = 1:103) %do% {
  filename <- paste0(path_file, i, ".txt")
  file.append(startfile, filename)
  file.remove(filename)
}
You need to be careful when multiple threads are trying to access the same resource. To synchronise access to a shared resource, you can use the flock package to take a mutex. (I'm not sure why the following is not working; the file connection may not be shareable across workers.) Take a look at the following code sample:
library(doParallel)
library(flock)
path_file <- "path12.txt"
fileConn <- file(path_file, open = "a")
lock <- tempfile()
cl <- makeCluster(4)
registerDoParallel(cl)
foreach(i = 1:103) %do% {
  locked <- flock::lock(lock) # lock in order to use the shared resource
  message <- paste("hello", "world", i, "\n")
  cat(message, file = fileConn, append = TRUE)
  print(message)
  flock::unlock(locked) # release the lock
}
close(fileConn)
parallel::stopCluster(cl)
Why is using %dopar% with foreach causing R to not recognize package?
You need to include the packages you use inside the loop via the .packages argument of foreach:
foreach(i = 1:10, .packages = "sf") %dopar% {
  if (st_geometry_type(sfObject[i, ]) == "LINESTRING") {
    print("check")
  }
}
How to speed up a while-loop in R (perhaps using dopar)?
Thank you @Bas! I tested your suggestion on a Linux machine: for a file with ~239 million lines it took less than 1 min. By adding >lines.txt
I could save the results. Interestingly, my first readLines
R script needed "only" 29 min, which was surprisingly fast compared with my first experience (so I might have had some problem with my Windows computer at work which was not related to R).