Remove zombie processes using parallel package
This only seems to be an issue with "FORK" clusters. If you make a "PSOCK" cluster instead, the processes will die when you call stopCluster(cl).
Is there anything preventing you from using a "PSOCK" cluster on your Debian-based machine?
R parallel computing and zombie processes
You could get rid of the zombie processes using the "inline" package. Just implement a function that calls "waitpid":
library(inline)
includes <- '#include <sys/wait.h>'
code <- 'int wstat; while (waitpid(-1, &wstat, WNOHANG) > 0) {};'
wait <- cfunction(body=code, includes=includes, convention='.C')
I tested this by first creating some zombies with the mclapply function:
> library(parallel)
> pids <- unlist(mclapply(1:4, function(i) Sys.getpid(), mc.cores=4))
> system(paste0('ps --pid=', paste(pids, collapse=',')))
PID TTY TIME CMD
17447 pts/4 00:00:00 R <defunct>
17448 pts/4 00:00:00 R <defunct>
17449 pts/4 00:00:00 R <defunct>
17450 pts/4 00:00:00 R <defunct>
(Note that I'm using the GNU version of "ps" which supports the "--pid" option.)
Then I called my "wait" function and called "ps" again to verify that the zombies are gone:
> wait()
list()
> system(paste0('ps --pid=', paste(pids, collapse=',')))
PID TTY TIME CMD
It appears that the worker processes created by mclapply are now gone. This should work as long as the processes were created by the current R process.
How to stop R from leaving zombie processes behind
This really has nothing to do with foreach or doMC; as Steve Weston has pointed out in answer to other StackOverflow queries, doMC is essentially just a wrapper for mclapply, and you can see zombie processes created with a simple call to mclapply:
library(parallel)
mclapply(rep(5,4), rnorm)
On my system, this leaves two zombie processes:
[richcalaway@richcalaway-pc ~]$ ps -efl | grep defunct
1 Z 1660945517 28701 28624 0 77 0 - 0 exit 12:00 pts/1 00:00:00 [R] <defunct>
1 Z 1660945517 28702 28624 0 78 0 - 0 exit 12:00 pts/1 00:00:00 [R] <defunct>
0 S 1660945517 28704 28308 0 78 0 - 15306 pipe_w 12:00 pts/2 00:00:00 grep defunct
Under normal circumstances, these zombie processes won't cause any trouble, and they do disappear when the R session ends. You can avoid them by using doParallel and a fork cluster instead of using doMC.
Cheers,
Rich Calaway
Principal Program Manager
Revolution Analytics
Python Multiprocessing leading to many zombie processes
The most common problem is that the pool is created but never closed.
The best way I know to guarantee that the pool is closed is to use a try/finally clause:
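from multiprocessing import Pool

# Create the pool before the try block so that `pool` is always defined in finally.
pool = Pool(ncores)
try:
    pool.map(yourfunction, arguments)
finally:
    pool.close()  # stop accepting new tasks
    pool.join()   # wait for the workers to exit so they are reaped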
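As a rough sketch of the pattern above, the helper below wraps the try/finally and then checks multiprocessing.active_children() to confirm that no workers are left behind. The name run_in_pool is hypothetical, not part of multiprocessing. The sketch assumes a POSIX system where the "fork" start method is available, and it uses the built-in abs as the worker function only to keep the example self-contained.

```python
import multiprocessing as mp


def run_in_pool(func, items, ncores=2):
    """Map func over items in a pool that is always closed and joined."""
    # Assumption: POSIX with the "fork" start method available.
    ctx = mp.get_context("fork")
    pool = ctx.Pool(ncores)
    try:
        return pool.map(func, items)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the workers to exit; this reaps them


results = run_in_pool(abs, [-3, -2, 1])
remaining = mp.active_children()  # worker processes still alive, if any
print(results, remaining)
```

If the join were skipped, the workers could linger as defunct entries in ps until the parent exits or reaps them some other way.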
If you don't want to struggle with multiprocessing, I wrote a simple package named parmap that wraps multiprocessing to make my life (and potentially yours) easier.
pip install parmap
import parmap
parmap.map(yourfunction, arguments)
From the parmap usage section:
Simple parallel example:
import parmap
y1 = [myfunction(x, argument1, argument2) for x in mylist]
y2 = parmap.map(myfunction, mylist, argument1, argument2)
y1 == y2
Iterating over a list of tuples:
# You want to do:
z = [myfunction(x, y, argument1, argument2) for (x,y) in mylist]
# In parallel:
z = parmap.starmap(myfunction, mylist, argument1, argument2)
# You want to do:
listx = [1, 2, 3, 4, 5, 6]
listy = [2, 3, 4, 5, 6, 7]
param1 = 3.14
param2 = 42
listz = []
for (x, y) in zip(listx, listy):
    listz.append(myfunction(x, y, param1, param2))
# In parallel:
listz = parmap.starmap(myfunction, zip(listx, listy), param1, param2)
Killing a process leaves zombie process to haunt me... :(
Even if you kill the child, you still need to wait for it.
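A minimal Python sketch of that kill-then-wait sequence, assuming a POSIX system: the child here is just a throwaway Python process that sleeps, started only for illustration. Until wait() is called, the killed child stays in the process table as a zombie.

```python
import signal
import subprocess
import sys

# Start a child that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

child.kill()               # sends SIGKILL on POSIX; the child becomes a zombie once it dies
returncode = child.wait()  # collects the exit status, which removes the zombie

# On POSIX, a negative return code means the child was terminated by that signal.
print(returncode == -signal.SIGKILL)
```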
On open, Rstudio starts many processes (started with parallel package in previous session) -- how to kill them?
Here's what ended up fixing it:
Delete the package I built (the binary, I believe; I clicked the "x" to the right of its name in the "Packages" part of RStudio).
Rebuild it with library(parallel) commented out.