How to 'Print' or 'Cat' When Using Parallel

How can I `print` or `cat` when using parallel?

Using the `outfile` parameter in `makeCluster()`, you can redirect the output to a file and then check that file to see how your program progresses.

Interestingly, on a Linux machine, setting it to "" outputs to the console, but that doesn't work for me on a Windows machine. File output works on both.
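A minimal sketch of the file-based approach (the log file name is an arbitrary choice for this example):

library(parallel)

## Redirect all worker stdout/stderr to a log file; unlike outfile = "",
## this works on both Linux and Windows. Workers append to the same file.
cl <- makeCluster(2L, outfile = "workers.log")

res <- parLapply(cl, 1:4, function(i) {
  cat(sprintf("processing item %d\n", i))
  i^2
})

stopCluster(cl)

## The messages never appear in your session; inspect the file instead:
readLines("workers.log")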

Why don't parallel jobs print in RStudio?

None of the functions in the 'parallel' package guarantee proper displaying of output sent to standard output (stdout) or standard error (stderr) on the workers. This is true for all types of parallelization, e.g. forked processing (mclapply()) and PSOCK clusters (parLapply()). The reason is that the package was never designed to relay output in a consistent manner.

A good test is to see if you can capture the output via capture.output(). For example, I get:

bfr <- utils::capture.output({
  y <- lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

as expected, but when I try:

bfr <- utils::capture.output({
  y <- parallel::mclapply(1:3, FUN = print)
})
print(bfr)
## character(0)

there's no output captured. Interestingly, though, if I call it without capturing output in R 4.0.1 on Linux in the terminal, I get:

y <- parallel::mclapply(1:3, FUN = print)
[1] 1
[1] 3
[1] 2

Interesting, eh?

Another suggestion you might get when using local PSOCK clusters is to set argument outfile = "" when creating the cluster. Indeed, when you try this on Linux in the terminal, it certainly looks like it works:

cl <- parallel::makeCluster(2L, outfile = "")
## starting worker pid=25259 on localhost:11167 at 17:50:03.974
## starting worker pid=25258 on localhost:11167 at 17:50:03.974

y <- parallel::parLapply(cl, 1:3, fun = print)
## [1] 1
## [1] 2
## [1] 3

But this, too, gives false hope. It turns out that the output you're seeing appears only because the terminal happens to display it. This might or might not work in the RStudio Console, and you might see different behavior on Linux, macOS, and MS Windows. The key point is that your R session does not see this output at all. If we try to capture it, we get:

bfr <- utils::capture.output({
  y <- parallel::parLapply(cl, 1:3, fun = print)
})
## [1] 1
## [1] 2
## [1] 3
print(bfr)
## character(0)

Interesting, eh? But actually not surprising if you understand the inner details of the 'parallel' package.


(Disclaimer: I'm the author.) The only parallel framework that I'm aware of that properly relays standard output (e.g. cat(), print(), ...) and message conditions (e.g. message()) to the main R session is the future framework. You can read about the details in its 'Text and Message Output' vignette, but here's an example showing that it works:

future::plan("multicore", workers = 2) ## forked processing

bfr <- utils::capture.output({
  y <- future.apply::future_lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

It works the same regardless of underlying parallelization framework, e.g. with local PSOCK workers:

future::plan("multisession", workers = 2) ## PSOCK cluster

bfr <- utils::capture.output({
  y <- future.apply::future_lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

This works the same on all operating systems and environments where you run R, including the RStudio Console. It also behaves the same regardless of which future map-reduce framework you use, e.g. (here) future.apply, furrr, and foreach with doFuture.

How can I print when using %dopar%?

There are a number of good solutions posted here, but I find it easiest to log to a socket and use a separate process to output the log calls in a console.

I use the following function:

log.socket <- make.socket(port = 4000)

Log <- function(text, ...) {
  msg <- sprintf(paste0(as.character(Sys.time()), ": ", text, "\n"), ...)
  cat(msg)
  write.socket(log.socket, msg)
}

You can then place log statements in the code such as:

Log("Processing block %d of %d", i, n.blocks)

Log output can be viewed in real time using any simple socket-listening tool. For example, using netcat on Linux:

nc -l 4000

The above log statement would display in the netcat terminal as:

2014-06-25 12:30:45: Processing block 2 of 13

This method has the advantage of working remotely and provides output as detailed as you care to log.
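A sketch of using this inside %dopar% (this assumes a doParallel backend and a listener already running on port 4000, e.g. nc -l 4000; on PSOCK workers the socket must be opened inside the loop body, since connections can't be exported to workers):

library(foreach)
library(doParallel)

registerDoParallel(2)

results <- foreach(i = 1:4) %dopar% {
  ## Each worker opens its own connection to the listener.
  log.socket <- make.socket(port = 4000)
  on.exit(close.socket(log.socket))
  write.socket(log.socket,
               sprintf("%s: processing block %d of 4\n", Sys.time(), i))
  i * 2
}

If no listener is running, make.socket() fails, so start the netcat process first.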

P.S. For those on Windows, see Jon Craton's netcat port.

P.P.S. I'm guessing the write.socket R function probably isn't thread-safe, but unless you're logging at high frequency, you're unlikely to run into any issues. Something to be aware of, though.

How to remove print suppression with the parallel package?

I'm not sure this is possible, because of issues with parallel access to the R terminal from the separate processes forked by 'parallel'.

If you need to get messages from your processes, then this SO answer should help (briefly: use the outfile param in parallel::makeCluster).

How to print from clusterApply?

You can follow your progress by using makeCluster(4, outfile = ""). This also turns on the output of write(txt, stderr()).

The outfile = "" solution seems to work only on Linux systems. For Windows, check the linked question and its comments. There seem to be workarounds, such as using Rterm instead of Rgui, but I can't vouch for them since I am not able to test them.

I used the following code on Xubuntu 18.04 and received all the output.

library(parallel)
cl <- makeCluster(4, outfile = "")
txts <- c("I", "AM", "NOT", "PRINTED", seq(1, 1000000, 1))
clusterApply(cl, txts, function(txt) write(txt, stdout()))
stopCluster(cl)

From the documentation of makeCluster:

outfile:

Where to direct the stdout and stderr connection output
from the workers. "" indicates no redirection (which may only be
useful for workers on the local machine). Defaults to ‘/dev/null’
(‘nul:’ on Windows). The other possibility is a file path on the
worker's host. Files will be opened in append mode, as all workers log
to the same file.

So if you want to see output sent to stderr, you have to specify outfile accordingly.

Python: execute cat subprocess in parallel

Another approach (rather than the other suggestion of putting shell processes in the background) is to use multithreading.

The run method that you have would then do something like this:

thread.start_new_thread(myFuncThatDoesZGrep, ())

To collect results, you can do something like this:

import threading

class MyThread(threading.Thread):
    def run(self):
        self.finished = False
        # Your code to run the command here.
        blahBlah()
        # When finished, store the results first, then set the flag,
        # so a reader polling `finished` never sees stale results.
        self.results = []
        self.finished = True

Run the thread as described above in the link on multithreading. When your thread object has myThread.finished == True, you can collect the results via myThread.results.
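Putting the pieces together, here is a self-contained sketch of the same pattern (the class name and the `echo` commands are illustrative stand-ins for your `cat`/`zgrep` invocations):

```python
import subprocess
import threading

class CatThread(threading.Thread):
    """Run one external command and store its output lines."""

    def __init__(self, cmd):
        super().__init__()
        self.cmd = cmd
        self.results = None
        self.finished = False

    def run(self):
        # Run the subprocess and capture stdout as a list of lines.
        out = subprocess.run(self.cmd, capture_output=True, text=True)
        self.results = out.stdout.splitlines()
        self.finished = True  # set last, after results are ready

# Launch several commands in parallel, then collect their results.
threads = [CatThread(["echo", str(i)]) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([t.results for t in threads])
```

After `join()` returns, every thread has `finished == True`, so polling the flag is only needed if you want to collect results before all threads are done.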


