Popen waiting for child process even when the immediate child has terminated
You could provide a start_new_session analog for the C subprocess:
#!/usr/bin/env python
import os
import sys
import platform
from subprocess import Popen, PIPE
# set system/version dependent "start_new_session" analogs
kwargs = {}
if platform.system() == 'Windows':
    # from msdn [1]
    CREATE_NEW_PROCESS_GROUP = 0x00000200  # note: could get it from subprocess
    DETACHED_PROCESS = 0x00000008          # 0x8 | 0x200 == 0x208
    kwargs.update(creationflags=DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP)
elif sys.version_info < (3, 2):  # assume posix
    kwargs.update(preexec_fn=os.setsid)
else:  # Python 3.2+ and Unix
    kwargs.update(start_new_session=True)

p = Popen(["C"], stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
assert not p.poll()
[1]: Process Creation Flags for CreateProcess()
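As a quick sanity check of the POSIX branch (a sketch; `sleep` stands in for the `C` executable):

```python
import os
from subprocess import Popen, DEVNULL

# Start the child in its own session (the start_new_session branch above).
p = Popen(["sleep", "1"], stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL,
          start_new_session=True)

# The child is now a session leader, so its session id differs from ours.
assert os.getsid(p.pid) != os.getsid(0)
p.wait()
```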
Have subprocess.Popen only wait on its child process to return, but not any grandchildren
The solution above (using the join method with the shell=True addition) stopped working when we upgraded our Python recently.
There are many references on the internet about the pieces and parts of this, but it took me some doing to come up with a useful solution to the entire problem.
The following solution has been tested in Python 3.9.5 and 3.9.7.
Problem Synopsis
The names of the scripts match those in the code example below.
A top-level program (grandparent.py):
- Uses subprocess.run or subprocess.Popen to call a program (parent.py)
- Checks return value from parent.py for sanity.
- Collects stdout and stderr from the main process 'parent.py'.
- Does not want to wait around for the grandchild to complete.
The called program (parent.py):
- Might do some stuff first.
- Spawns a very long process (the grandchild - "longProcess" in the code below).
- Might do a little more work.
- Returns its results and exits while the grandchild (longProcess) continues doing what it does.
Solution Synopsis
The important part isn't so much what happens with subprocess. Instead, the method for creating the grandchild/longProcess is the critical part. It is necessary to ensure that the grandchild is truly emancipated from parent.py.
- Subprocess only needs to be used in a way that captures output.
- The longProcess (grandchild) needs the following to happen:
- It should be started using multiprocessing.
- It needs multiprocessing's 'daemon' set to False.
- It should also be invoked using the double-fork procedure.
- In the double-fork, extra work needs to be done to ensure that the process is truly separate from parent.py. Specifically:
- Move the execution away from the environment of parent.py.
- Use file handling to ensure that the grandchild no longer uses the file handles (stdin, stdout, stderr) inherited from parent.py.
Example Code
grandparent.py - calls parent.py using subprocess.run()
#!/usr/bin/env python3
import subprocess
p = subprocess.run(["/usr/bin/python3", "/path/to/parent.py"], capture_output=True)
## Comment the following if you don't need reassurance
print("The return code is: " + str(p.returncode))
print("The standard out is: ")
print(p.stdout)
print("The standard error is: ")
print(p.stderr)
parent.py - starts the longProcess/grandchild and exits, leaving the grandchild running. After 10 seconds, the grandchild will write timing info to /tmp/timelog.
#!/usr/bin/env python3
import time
def longProcess():
    time.sleep(10)
    fo = open("/tmp/timelog", "w")
    fo.write("I slept! The time now is: " + time.asctime(time.localtime()) + "\n")
    fo.close()
import os,sys
def spawnDaemon(func):
    # do the UNIX double-fork magic; see Stevens' "Advanced
    # Programming in the UNIX Environment" for details (ISBN 0201563177)
    try:
        pid = os.fork()
        if pid > 0:  # parent process
            return
    except OSError as e:
        print("fork #1 failed. See next.")
        print(e)
        sys.exit(1)
    # Decouple from the parent environment.
    os.chdir("/")
    os.setsid()
    os.umask(0)
    # do second fork
    try:
        pid = os.fork()
        if pid > 0:
            # exit from second parent
            sys.exit(0)
    except OSError as e:
        print("fork #2 failed. See next.")
        print(e)
        sys.exit(1)
    # Redirect standard file descriptors.
    # Here, they are reassigned to /dev/null, but they could go elsewhere.
    sys.stdout.flush()
    sys.stderr.flush()
    si = open('/dev/null', 'r')
    so = open('/dev/null', 'a+')
    se = open('/dev/null', 'a+')
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())
    # Run your daemon
    func()
    # Ensure that the daemon exits when complete
    os._exit(os.EX_OK)
import multiprocessing
daemonicGrandchild = multiprocessing.Process(target=spawnDaemon, args=(longProcess,))
daemonicGrandchild.daemon = False
daemonicGrandchild.start()
print("have started the daemon") # This will get captured as stdout by grandparent.py
References
The code above was mainly inspired by the following two resources.
- This reference is succinct about the use of the double-fork but does not include the file handling we need in this situation.
- This reference contains the needed file handling, but does many other things that we do not need.
Why is subprocess.Popen not waiting until the child process terminates?
subprocess.Popen, when instantiated, runs the program. It does not, however, wait for it -- it fires it off in the background as if you'd typed `cmd &` in a shell. So, in the code above, you've essentially defined a race condition -- if the inserts can finish in time, it will appear normal, but if not, you get the unexpected output. You are not waiting for your first run()'d PID to finish; you are simply returning its Popen instance and continuing.
I'm not sure how this behavior contradicts the documentation, because there are some very clear methods on Popen that seem to indicate it is not waited for, like:

Popen.wait()
    Wait for child process to terminate. Set and return returncode attribute.
I do agree, however, that the documentation for this module could be improved.
To wait for the program to finish, I'd recommend using subprocess's convenience method, subprocess.call, or using communicate on a Popen object (for the case when you need stdout). You are already doing this for your second call.
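A small sketch of the difference (POSIX, using `sleep` as a stand-in command):

```python
import time
from subprocess import Popen, call

# Popen returns immediately: the child keeps running in the background.
t0 = time.time()
p = Popen(["sleep", "1"])
popen_elapsed = time.time() - t0

# call() blocks until the child exits (as would p.wait() or p.communicate()).
t0 = time.time()
call(["sleep", "1"])
call_elapsed = time.time() - t0

p.wait()  # reap the first child
```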
### START MAIN
import subprocess

# copy some rows from a source table to a destination table
# note that the destination table is empty when this script is run
cmd = 'mysql -u ve --skip-column-names --batch --execute="insert into destination (select * from source limit 100000)" test'
subprocess.call(cmd, shell=True)

# check to see how many rows exist in the destination table
cmd = 'mysql -u ve --skip-column-names --batch --execute="select count(*) from destination" test'
process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
try:
    count = int(process.communicate()[0][:-1])
except ValueError:
    count = 0
Additionally, in most cases, you do not need to run the command in a shell. This is one of those cases, but you'll have to rewrite your command like a sequence. Doing it that way also allows you to avoid traditional shell injection and worry less about quoting, like so:
prog = ["mysql", "-u", "ve", "--execute", 'insert into foo values ("snargle", 2)']
subprocess.call(prog)
This will even run, and the < will not perform a redirect as it would in a shell -- it is passed as a literal argument:
prog = ["printf", "%s", "<", "/etc/passwd"]
subprocess.call(prog)
Try it interactively. You avoid the possibilities of shell injection, particularly if you're accepting user input. I suspect you're using the less-awesome string method of communicating with subprocess because you ran into trouble getting the sequences to work :^)
Terminate child process on subprocess.TimeoutExpired
Four months later: I got it.
The core issue appears to be that using os.kill with signal.SIGKILL doesn't properly kill the process. Modifying my code to the following works.
def custom_terminal_command(self, command, timeout=5*60, cwd=None):
    with subprocess.Popen(command.split(" "), preexec_fn=os.setsid) as process:
        wd = os.getcwd()
        try:
            if cwd is not None:
                # change into cwd one path component at a time
                for d in cwd.split("/"):
                    os.chdir(d)
            stdout, stderr = process.communicate(None, timeout=timeout)
        except subprocess.TimeoutExpired as exc:
            import signal
            os.killpg(os.getpgid(process.pid), signal.SIGTERM)
            try:
                import msvcrt
            except ModuleNotFoundError:
                _mswindows = False
            else:
                _mswindows = True
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads. communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            reason = 'timeout'
            stdout, stderr = process.communicate()
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            reason = 'other'
            stdout, stderr = process.communicate()
            raise
        else:
            reason = 'finished'
        finally:
            os.chdir(wd)
    try:
        return stdout.decode('utf-8').strip(), stderr.decode('utf-8').strip(), reason
    except AttributeError:
        try:
            return stdout.strip(), stderr.strip(), reason
        except AttributeError:
            return stdout, stderr, reason
See the following SO post for a short discussion: How to terminate a python subprocess launched with shell=True
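Stripped of the directory handling and the Windows branch, the core of the fix is a short pattern: start the child in its own session (and thus its own process group), then SIGTERM the whole group on timeout. A minimal sketch (the function name is illustrative):

```python
import os
import signal
import subprocess

def run_with_timeout(cmd, timeout):
    """Run cmd in its own process group; on timeout, kill the entire group."""
    proc = subprocess.Popen(cmd, start_new_session=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
        return out, err, 'finished'
    except subprocess.TimeoutExpired:
        # Signal the whole group so grandchildren are terminated too.
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
        out, err = proc.communicate()
        return out, err, 'timeout'
```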
subprocess: deleting child processes in Windows
By using psutil:
import psutil, os

def kill_proc_tree(pid, including_parent=True):
    parent = psutil.Process(pid)
    children = parent.children(recursive=True)
    for child in children:
        child.kill()
    gone, still_alive = psutil.wait_procs(children, timeout=5)
    if including_parent:
        parent.kill()
        parent.wait(5)

me = os.getpid()
kill_proc_tree(me)
subprocess.communicate() hangs on Windows 8 if parent process creates some child
To allow .communicate() to return without waiting for the grandchild (notepad) to exit, you could try in test.py:
import sys
from subprocess import Popen, PIPE
CREATE_NEW_PROCESS_GROUP = 0x00000200
DETACHED_PROCESS = 0x00000008
p = Popen('grandchild', stdin=PIPE, stdout=PIPE, stderr=PIPE,
creationflags=DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP)
See Popen waiting for child process even when the immediate child has terminated.
Why does this pclose() implementation return early with ECHILD unless invocation is delayed after popen()?
The problem with your my_pclose() is that you are trying to perform a process-group wait instead of waiting for the specific child process. This:

pid = waitpid( -1 * (p->pid), &wstatus, 0 );

attempts to wait for a child belonging to process group p->pid, but that is extremely unlikely to work without the setpgid() call you later added. The forked child will initially be in the same process group as its parent, and that group's process group number almost certainly will differ from the child's process number.

Moreover, it's unclear why you are trying to wait on the process group in the first place. You know the specific process you want to wait for, and it would be incorrect for my_pclose() to collect a different one instead, regardless of whether it belongs to the same process group. You should wait for that specific process:

pid = waitpid(p->pid, &wstatus, 0 );

That will work either with or without the setpgid() call, but almost certainly you should omit that call in a general-purpose function such as this.
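For illustration only, the same distinction can be reproduced from Python's os module on POSIX (this is a sketch, not the C code in question): waiting on process group -pid fails with ECHILD because no group numbered pid exists, while waiting on the specific pid succeeds.

```python
import os

pid = os.fork()
if pid == 0:
    os._exit(7)  # child: exit immediately with a known status

# waitpid(-pid, ...) waits on process *group* pid. The child inherited the
# parent's group, and no group numbered pid exists, so this raises ECHILD
# (ChildProcessError in Python).
try:
    os.waitpid(-pid, 0)
    group_wait = 'ok'
except ChildProcessError:
    group_wait = 'ECHILD'

# Waiting on the specific pid works regardless of process groups.
waited, status = os.waitpid(pid, 0)
```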
Wait for the first subprocess to finish
Here's a solution using psutil - which is aimed exactly at this use-case:
import subprocess
import psutil
a = subprocess.Popen(['/bin/sleep', "2"])
b = subprocess.Popen(['/bin/sleep', "4"])
procs_list = [psutil.Process(a.pid), psutil.Process(b.pid)]
def on_terminate(proc):
    print("process {} terminated".format(proc))
# waits for multiple processes to terminate
gone, alive = psutil.wait_procs(procs_list, timeout=3, callback=on_terminate)
Or, if you'd like to have a loop waiting for one of the process to be done:
while True:
    gone, alive = psutil.wait_procs(procs_list, timeout=3, callback=on_terminate)
    if len(gone) > 0:
        break
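If adding psutil is undesirable, a stdlib-only sketch can poll the Popen objects directly (poll() returns None while the process is still running):

```python
import subprocess
import time

a = subprocess.Popen(["sleep", "0.2"])
b = subprocess.Popen(["sleep", "10"])
procs = [a, b]

# Busy-wait (with a small sleep) until the first process finishes.
first_done = None
while first_done is None:
    first_done = next((p for p in procs if p.poll() is not None), None)
    time.sleep(0.05)

# Clean up any remaining processes.
for p in procs:
    if p.poll() is None:
        p.terminate()
    p.wait()
```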