Remove Zombie Processes Using Parallel Package

This only seems to be an issue with "FORK" clusters. If you make a "PSOCK" cluster instead, the processes will die when you call stopCluster(cl).

Is there anything preventing you from using a "PSOCK" cluster on your Debian-based machine?

R parallel computing and zombie processes

You could get rid of the zombie processes using the "inline" package. Just implement a function that calls "waitpid":

library(inline)
includes <- '#include <sys/wait.h>'
code <- 'int wstat; while (waitpid(-1, &wstat, WNOHANG) > 0) {};'
wait <- cfunction(body=code, includes=includes, convention='.C')

I tested this by first creating some zombies with the mclapply function:

> library(parallel)
> pids <- unlist(mclapply(1:4, function(i) Sys.getpid(), mc.cores=4))
> system(paste0('ps --pid=', paste(pids, collapse=',')))
PID TTY TIME CMD
17447 pts/4 00:00:00 R <defunct>
17448 pts/4 00:00:00 R <defunct>
17449 pts/4 00:00:00 R <defunct>
17450 pts/4 00:00:00 R <defunct>

(Note that I'm using the GNU version of "ps" which supports the "--pid" option.)

Then I called my "wait" function and called "ps" again to verify that the zombies are gone:

> wait()
list()
> system(paste0('ps --pid=', paste(pids, collapse=',')))
PID TTY TIME CMD

It appears that the worker processes created by mclapply are now gone. This should work as long as the processes were created by the current R process.
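For readers who want to see the same mechanism outside R, here is a minimal sketch of the non-blocking reaping loop in Python. It assumes a POSIX system; the helper name reap_children and the three throwaway children are made up for illustration and are not part of the answer above.

```python
import os
import time

def reap_children():
    """Reap every child that has already exited, without blocking; return their PIDs."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped

# Create a few short-lived children; each stays a zombie until the parent waits for it.
pids = []
for _ in range(3):
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    pids.append(pid)

# Poll briefly, because the children may not have exited at the moment of the first call.
reaped = []
deadline = time.monotonic() + 5.0
while set(pids) - set(reaped) and time.monotonic() < deadline:
    reaped.extend(reap_children())
    time.sleep(0.01)
```

As with the R version, this only works for children of the calling process, because a process can only wait for its own children.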

How to stop R from leaving zombie processes behind

This really has nothing to do with foreach or doMC; as Steve Weston has pointed out in answer to other StackOverflow queries, doMC is essentially just a wrapper for mclapply, and you can see zombie processes created with a simple call to mclapply:

library(parallel)
mclapply(rep(5,4), rnorm)

On my system, this leaves two zombie processes:

[richcalaway@richcalaway-pc ~]$ ps -efl | grep defunct
1 Z 1660945517 28701 28624 0 77 0 - 0 exit 12:00 pts/1 00:00:00 [R] <defunct>
1 Z 1660945517 28702 28624 0 78 0 - 0 exit 12:00 pts/1 00:00:00 [R] <defunct>
0 S 1660945517 28704 28308 0 78 0 - 15306 pipe_w 12:00 pts/2 00:00:00 grep defunct

Under normal circumstances, these zombie processes won't cause any trouble, and they do disappear when the R session ends. You can avoid them by using doParallel and a fork cluster instead of using doMC.

Cheers,

Rich Calaway

Principal Program Manager

Revolution Analytics

Python Multiprocessing leading to many zombie processes

The most common problem is that the pool is created but never closed.

The best way I know to guarantee that the pool is closed is to use a try/finally clause:

from multiprocessing import Pool

pool = Pool(ncores)
try:
    pool.map(yourfunction, arguments)
finally:
    pool.close()  # stop accepting new tasks
    pool.join()   # wait for the workers to exit
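A self-contained variant of this pattern might look like the sketch below. The wrapper name parallel_map, the use of math.sqrt as the worker function, and the explicit "fork" start method are assumptions chosen so the example runs on its own on a POSIX system; they are not part of the original answer.

```python
import math
import multiprocessing as mp

def parallel_map(func, items, ncores=2):
    """Map func over items in a worker pool and always clean up the workers."""
    # Assumption: a POSIX system, where the "fork" start method is available.
    ctx = mp.get_context("fork")
    pool = ctx.Pool(ncores)
    try:
        return pool.map(func, items)
    finally:
        pool.close()  # no further tasks will be submitted
        pool.join()   # wait for the workers to exit so none is left as a zombie

results = parallel_map(math.sqrt, [1, 4, 9, 16])
```

Creating the pool before the try block means the finally clause never runs against a pool that failed to start.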

If you don't want to struggle with multiprocessing, I wrote a simple package named parmap that wraps multiprocessing to make my life (and potentially yours) easier.

pip install parmap

import parmap
parmap.map(yourfunction, arguments)

From the parmap usage section:

  • Simple parallel example:

    import parmap
    y1 = [myfunction(x, argument1, argument2) for x in mylist]
    y2 = parmap.map(myfunction, mylist, argument1, argument2)
    y1 == y2
  • Iterating over a list of tuples:

    # You want to do:
    z = [myfunction(x, y, argument1, argument2) for (x,y) in mylist]
    z = parmap.starmap(myfunction, mylist, argument1, argument2)

    # You want to do:
    listx = [1, 2, 3, 4, 5, 6]
    listy = [2, 3, 4, 5, 6, 7]
    param1 = 3.14
    param2 = 42
    listz = []
    for (x, y) in zip(listx, listy):
        listz.append(myfunction(x, y, param1, param2))
    # In parallel:
    listz = parmap.starmap(myfunction, zip(listx, listy), param1, param2)

Killing a process leaves zombie process to haunt me... :(

Even if you kill the child, you still need to wait for it.
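As a rough illustration in Python, assuming a POSIX system and using a sleeping interpreter as a stand-in for the real child:

```python
import signal
import subprocess
import sys

# Start a child that would otherwise sleep for a minute (a stand-in for any worker).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

child.kill()               # sends SIGKILL on POSIX; the child is dead but not yet reaped
returncode = child.wait()  # collects the exit status, which removes the zombie entry

# On POSIX, a negative return code means the child was terminated by that signal.
killed_by_sigkill = (returncode == -signal.SIGKILL)
```

Between kill() and wait(), the child shows up as defunct in ps; it is the wait() call that clears it.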

On open, Rstudio starts many processes (started with parallel package in previous session) -- how to kill them?

Here's what ended up fixing it:

Delete the package I built (the binary, I believe; I clicked the "x" to the right of its name in the "Packages" part of RStudio).

Rebuild it, with

library(parallel)

commented out.


