Popen waiting for child process even when the immediate child has terminated
You could provide a start_new_session analog for the C subprocess:
#!/usr/bin/env python
import os
import sys
import platform
from subprocess import Popen, PIPE
# set system/version dependent "start_new_session" analogs
kwargs = {}
if platform.system() == 'Windows':
    # from msdn [1]
    CREATE_NEW_PROCESS_GROUP = 0x00000200  # note: could get it from subprocess
    DETACHED_PROCESS = 0x00000008          # 0x8 | 0x200 == 0x208
    kwargs.update(creationflags=DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP)
elif sys.version_info < (3, 2):  # assume posix
    kwargs.update(preexec_fn=os.setsid)
else:  # Python 3.2+ and Unix
    kwargs.update(start_new_session=True)

p = Popen(["C"], stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
assert not p.poll()
[1]: Process Creation Flags for CreateProcess()
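As a quick sanity check of the POSIX branch (a sketch; `sleep` stands in for the `C` executable):

```python
import os
from subprocess import Popen, DEVNULL

# Start the child in its own session (the start_new_session branch above).
p = Popen(["sleep", "1"], stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL,
          start_new_session=True)

# The child is now a session leader, so its session id differs from ours.
assert os.getsid(p.pid) != os.getsid(0)
p.wait()
```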
Have subprocess.Popen only wait on its child process to return, but not any grandchildren
The solution above (using the join method with the shell=True addition) stopped working when we upgraded our Python recently.
There are many references on the internet about the pieces and parts of this, but it took me some doing to come up with a useful solution to the entire problem.
The following solution has been tested in Python 3.9.5 and 3.9.7.
Problem Synopsis
The names of the scripts match those in the code example below.
A top-level program (grandparent.py):
- Uses subprocess.run or subprocess.Popen to call a program (parent.py)
- Checks return value from parent.py for sanity.
- Collects stdout and stderr from the main process 'parent.py'.
- Does not want to wait around for the grandchild to complete.
The called program (parent.py):
- Might do some stuff first.
- Spawns a very long process (the grandchild - "longProcess" in the code below).
- Might do a little more work.
- Returns its results and exits while the grandchild (longProcess) continues doing what it does.
Solution Synopsis
The important part isn't so much what happens with subprocess. Instead, the method for creating the grandchild/longProcess is the critical part. It is necessary to ensure that the grandchild is truly emancipated from parent.py.
- Subprocess only needs to be used in a way that captures output.
- The longProcess (grandchild) needs the following to happen:
- It should be started using multiprocessing.
- It needs multiprocessing's 'daemon' set to False.
- It should also be invoked using the double-fork procedure.
- In the double-fork, extra work needs to be done to ensure that the process is truly separate from parent.py. Specifically:
- Move the execution away from the environment of parent.py.
- Use file handling to ensure that the grandchild no longer uses the file handles (stdin, stdout, stderr) inherited from parent.py.
Example Code
grandparent.py - calls parent.py using subprocess.run()
#!/usr/bin/env python3
import subprocess
p = subprocess.run(["/usr/bin/python3", "/path/to/parent.py"], capture_output=True)
## Comment the following if you don't need reassurance
print("The return code is: " + str(p.returncode))
print("The standard out is: ")
print(p.stdout)
print("The standard error is: ")
print(p.stderr)
parent.py - starts the longProcess/grandchild and exits, leaving the grandchild running. After 10 seconds, the grandchild will write timing info to /tmp/timelog.
#!/usr/bin/env python3
import time
def longProcess():
    time.sleep(10)
    fo = open("/tmp/timelog", "w")
    fo.write("I slept! The time now is: " + time.asctime(time.localtime()) + "\n")
    fo.close()
import os,sys
def spawnDaemon(func):
    # do the UNIX double-fork magic; see Stevens' "Advanced
    # Programming in the UNIX Environment" for details (ISBN 0201563177)
    try:
        pid = os.fork()
        if pid > 0:  # parent process
            return
    except OSError as e:
        print("fork #1 failed. See next.")
        print(e)
        sys.exit(1)
    # Decouple from the parent environment.
    os.chdir("/")
    os.setsid()
    os.umask(0)
    # do second fork
    try:
        pid = os.fork()
        if pid > 0:
            # exit from second parent
            sys.exit(0)
    except OSError as e:
        print("fork #2 failed. See next.")
        print(e)
        sys.exit(1)
    # Redirect standard file descriptors.
    # Here, they are reassigned to /dev/null, but they could go elsewhere.
    sys.stdout.flush()
    sys.stderr.flush()
    si = open('/dev/null', 'r')
    so = open('/dev/null', 'a+')
    se = open('/dev/null', 'a+')
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())
    # Run your daemon
    func()
    # Ensure that the daemon exits when complete
    os._exit(os.EX_OK)
import multiprocessing
daemonicGrandchild = multiprocessing.Process(target=spawnDaemon, args=(longProcess,))
daemonicGrandchild.daemon = False
daemonicGrandchild.start()
print("have started the daemon") # This will get captured as stdout by grandparent.py
References
The code above was mainly inspired by the following two resources.
- This reference is succinct about the use of the double-fork but does not include the file handling we need in this situation.
- This reference contains the needed file handling, but does many other things that we do not need.
Why is subprocess.Popen not waiting until the child process terminates?
subprocess.Popen, when instantiated, runs the program. It does not, however, wait for it -- it fires it off in the background as if you'd typed `cmd &` in a shell. So, in the code above, you've essentially defined a race condition -- if the inserts can finish in time, it will appear normal, but if not, you get the unexpected output. You are not waiting for your first run()'d PID to finish; you are simply returning its Popen instance and continuing.
I'm not sure how this behavior contradicts the documentation, because there are some very clear methods on Popen that seem to indicate it is not waited for, like:

Popen.wait()
    Wait for child process to terminate. Set and return returncode attribute.
I do agree, however, that the documentation for this module could be improved.
To wait for the program to finish, I'd recommend using subprocess's convenience method, subprocess.call, or using communicate on a Popen object (for the case when you need stdout). You are already doing this for your second call.
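A small sketch of the difference (POSIX, using `sleep` as a stand-in command):

```python
import time
from subprocess import Popen, call

# Popen returns immediately: the child keeps running in the background.
t0 = time.time()
p = Popen(["sleep", "1"])
popen_elapsed = time.time() - t0

# call() blocks until the child exits (as would p.wait() or p.communicate()).
t0 = time.time()
call(["sleep", "1"])
call_elapsed = time.time() - t0

p.wait()  # reap the first child
```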
### START MAIN
import subprocess

# copy some rows from a source table to a destination table
# note that the destination table is empty when this script is run
cmd = 'mysql -u ve --skip-column-names --batch --execute="insert into destination (select * from source limit 100000)" test'
subprocess.call(cmd, shell=True)

# check to see how many rows exist in the destination table
cmd = 'mysql -u ve --skip-column-names --batch --execute="select count(*) from destination" test'
process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
try:
    count = int(process.communicate()[0][:-1])
except ValueError:
    count = 0
Additionally, in most cases, you do not need to run the command in a shell. This is one of those cases, but you'll have to rewrite your command like a sequence. Doing it that way also allows you to avoid traditional shell injection and worry less about quoting, like so:
prog = ["mysql", "-u", "ve", "--execute", 'insert into foo values ("snargle", 2)']
subprocess.call(prog)
This will even run, and the < will not perform a redirect as it would in a shell -- it is passed as a literal argument:
prog = ["printf", "%s", "<", "/etc/passwd"]
subprocess.call(prog)
Try it interactively. You avoid the possibilities of shell injection, particularly if you're accepting user input. I suspect you're using the less-awesome string method of communicating with subprocess because you ran into trouble getting the sequences to work :^)
Terminate child process on subprocess.TimeoutExpired
Four months later: I got it.
The core issue appears to be that using os.kill with signal.SIGKILL doesn't properly kill the process. Modifying my code to the following works.
def custom_terminal_command(self, command, timeout=5*60, cwd=None):
    with subprocess.Popen(command.split(" "), preexec_fn=os.setsid) as process:
        wd = os.getcwd()
        try:
            if cwd is not None:
                # change into cwd one path component at a time
                for d in cwd.split("/"):
                    os.chdir(d)
            stdout, stderr = process.communicate(None, timeout=timeout)
        except subprocess.TimeoutExpired as exc:
            import signal
            os.killpg(os.getpgid(process.pid), signal.SIGTERM)
            try:
                import msvcrt
            except ModuleNotFoundError:
                _mswindows = False
            else:
                _mswindows = True
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads. communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            reason = 'timeout'
            stdout, stderr = process.communicate()
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            reason = 'other'
            stdout, stderr = process.communicate()
            raise
        else:
            reason = 'finished'
        finally:
            os.chdir(wd)
    try:
        return stdout.decode('utf-8').strip(), stderr.decode('utf-8').strip(), reason
    except AttributeError:
        try:
            return stdout.strip(), stderr.strip(), reason
        except AttributeError:
            return stdout, stderr, reason
See the following SO post for a short discussion: How to terminate a python subprocess launched with shell=True
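Stripped of the directory handling and the Windows branch, the core of the fix is a short pattern: start the child in its own session (and thus its own process group), then SIGTERM the whole group on timeout. A minimal sketch (the function name is illustrative):

```python
import os
import signal
import subprocess

def run_with_timeout(cmd, timeout):
    """Run cmd in its own process group; on timeout, kill the entire group."""
    proc = subprocess.Popen(cmd, start_new_session=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
        return out, err, 'finished'
    except subprocess.TimeoutExpired:
        # Signal the whole group so grandchildren are terminated too.
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
        out, err = proc.communicate()
        return out, err, 'timeout'
```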
subprocess: deleting child processes in Windows
By using psutil:
import psutil, os

def kill_proc_tree(pid, including_parent=True):
    parent = psutil.Process(pid)
    children = parent.children(recursive=True)
    for child in children:
        child.kill()
    gone, still_alive = psutil.wait_procs(children, timeout=5)
    if including_parent:
        parent.kill()
        parent.wait(5)

me = os.getpid()
kill_proc_tree(me)
subprocess.communicate() hangs on Windows 8 if parent process creates some child
To allow .communicate() to return without waiting for the grandchild (notepad) to exit, you could try in test.py:
import sys
from subprocess import Popen, PIPE
CREATE_NEW_PROCESS_GROUP = 0x00000200
DETACHED_PROCESS = 0x00000008
p = Popen('grandchild', stdin=PIPE, stdout=PIPE, stderr=PIPE,
creationflags=DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP)
See Popen waiting for child process even when the immediate child has terminated.
Why does this pclose() implementation return early with ECHILD unless invocation is delayed after popen()?
The problem with your my_pclose() is that you are trying to perform a process-group wait instead of waiting for the specific child process. This:

pid = waitpid( -1 * (p->pid), &wstatus, 0 );

attempts to wait for a child belonging to process group p->pid, but that is extremely unlikely to work without the setpgid() call you later added. The forked child will initially be in the same process group as its parent, and that group's process group number almost certainly will differ from the child's process number.

Moreover, it's unclear why you are trying to wait on the process group in the first place. You know the specific process you want to wait for, and it would be incorrect for my_pclose() to collect a different one instead, regardless of whether it belongs to the same process group. You should wait for that specific process:

pid = waitpid(p->pid, &wstatus, 0 );

That will work either with or without the setpgid() call, but almost certainly you should omit that call in a general-purpose function such as this.
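For illustration only, the same distinction can be reproduced from Python's os module on POSIX (this is a sketch, not the C code in question): waiting on process group -pid fails with ECHILD because no group numbered pid exists, while waiting on the specific pid succeeds.

```python
import os

pid = os.fork()
if pid == 0:
    os._exit(7)  # child: exit immediately with a known status

# waitpid(-pid, ...) waits on process *group* pid. The child inherited the
# parent's group, and no group numbered pid exists, so this raises ECHILD
# (ChildProcessError in Python).
try:
    os.waitpid(-pid, 0)
    group_wait = 'ok'
except ChildProcessError:
    group_wait = 'ECHILD'

# Waiting on the specific pid works regardless of process groups.
waited, status = os.waitpid(pid, 0)
```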
Wait for the first subprocess to finish
Here's a solution using psutil - which is aimed exactly at this use-case:
import subprocess
import psutil
a = subprocess.Popen(['/bin/sleep', "2"])
b = subprocess.Popen(['/bin/sleep', "4"])
procs_list = [psutil.Process(a.pid), psutil.Process(b.pid)]
def on_terminate(proc):
    print("process {} terminated".format(proc))
# waits for multiple processes to terminate
gone, alive = psutil.wait_procs(procs_list, timeout=3, callback=on_terminate)
Or, if you'd like to have a loop waiting for one of the process to be done:
while True:
    gone, alive = psutil.wait_procs(procs_list, timeout=3, callback=on_terminate)
    if len(gone) > 0:
        break
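If adding psutil is undesirable, a stdlib-only sketch can poll the Popen objects directly (poll() returns None while the process is still running):

```python
import subprocess
import time

a = subprocess.Popen(["sleep", "0.2"])
b = subprocess.Popen(["sleep", "10"])
procs = [a, b]

# Busy-wait (with a small sleep) until the first process finishes.
first_done = None
while first_done is None:
    first_done = next((p for p in procs if p.poll() is not None), None)
    time.sleep(0.05)

# Clean up any remaining processes.
for p in procs:
    if p.poll() is None:
        p.terminate()
    p.wait()
```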