Blocking and Non Blocking subprocess calls
Popen is non-blocking. call and check_call are blocking.
You can make the Popen instance block by calling its wait or communicate method.
If you look in the source code, you'll see call calls Popen(...).wait(), which is why it is blocking. check_call calls call, which is why it blocks as well.
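As a rough illustration of the difference, the sketch below starts a child with Popen, checks that it is still running right after Popen returns, and then blocks on wait(). The child command (a Python one-liner that sleeps for one second) and the helper name demo are assumptions made for this example only.

```python
import subprocess
import sys
import time

# Hypothetical child command: a Python one-liner that sleeps for one second.
CHILD = [sys.executable, "-c", "import time; time.sleep(1)"]


def demo():
    start = time.monotonic()
    proc = subprocess.Popen(CHILD)          # returns as soon as the child has started
    after_popen = time.monotonic() - start
    still_running = proc.poll() is None     # poll() never blocks; None means "not finished"
    returncode = proc.wait()                # wait() blocks until the child exits
    after_wait = time.monotonic() - start
    return after_popen, still_running, returncode, after_wait


if __name__ == "__main__":
    print(demo())
```

subprocess.call(CHILD) would behave like the Popen plus wait() pair here: it only returns once the child has exited, and it returns the exit code.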
Strictly speaking, shell=True is orthogonal to the issue of blocking. However, shell=True causes Python to exec a shell and then run the command in the shell. If you use a blocking call, the call will return when the shell finishes. Since the shell may spawn a subprocess to run the command, the shell may finish before the spawned subprocess. For example,
import subprocess
import time
proc = subprocess.Popen('ls -lRa /', shell=True)
time.sleep(3)
proc.terminate()
proc.wait()
Here two processes are spawned: Popen spawns one subprocess running the shell. The shell in turn spawns a subprocess running ls. proc.terminate() kills the shell, but the subprocess running ls remains. (That is manifested by copious output, even after the Python script has ended. Be prepared to kill the ls with pkill ls.)
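One common way to avoid the orphaned child on POSIX systems is to put the shell in its own process group and signal the whole group. The sketch below is one possible approach, not something the answer above prescribes; the helper name run_and_kill_group and the sleep command are made up for illustration, and it will not work on Windows, where os.killpg does not exist.

```python
import os
import signal
import subprocess
import time


def run_and_kill_group(command, delay):
    """Run `command` through the shell, then terminate the shell and its children."""
    # start_new_session=True makes the shell the leader of a new session and
    # process group, so anything it spawns stays in that group by default.
    proc = subprocess.Popen(command, shell=True, start_new_session=True)
    time.sleep(delay)
    # Send SIGTERM to every process in the group, not only to the shell.
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    return proc.wait()


if __name__ == "__main__":
    # Hypothetical long-running command standing in for `ls -lRa /`.
    print(run_and_kill_group("sleep 30; true", 0.5))
```

A child that deliberately moves itself into a different process group would still escape this. Passing the command as a list without shell=True avoids the intermediate shell altogether.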
A non-blocking read on a subprocess.PIPE in Python
fcntl, select, asyncproc won't help in this case.
A reliable way to read a stream without blocking regardless of operating system is to use Queue.get_nowait():
import sys
from subprocess import PIPE, Popen
from threading import Thread
try:
    from queue import Queue, Empty
except ImportError:
    from Queue import Queue, Empty  # python 2.x
ON_POSIX = 'posix' in sys.builtin_module_names
def enqueue_output(out, queue):
    # runs in a background thread: blocks on readline so the main thread does not have to
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()
p = Popen(['myprogram.exe'], stdout=PIPE, bufsize=1, close_fds=ON_POSIX)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.daemon = True  # thread dies with the program
t.start()
# ... do other things here
# read line without blocking
try:
    line = q.get_nowait()  # or q.get(timeout=.1)
except Empty:
    print('no output yet')
else:  # got line
    pass  # ... do something with line
Python subprocess multiple non blocking communicates
I have now switched from subprocess to pexpect.
My syntax is now as follows:
import pexpect
child = pexpect.spawn('rosrun ros_pkg ros_node')
child.sendline('new command')
output = child.read_nonblocking(10000, timeout=1)
# ....
# logic
# ....
child.sendline('new command')
output = child.read_nonblocking(10000, timeout=1)
Many thanks to novel_yet_trivial on reddit: https://www.reddit.com/r/learnpython/comments/2o2viz/subprocess_popen_multiple_times/
Python blocking versus non-blocking OS calls
Sure, if you're running heavy queries once for the life of the app then run them all on startup and cache the results. Do this before you begin serving requests and it will never impact your users. Checking when a user requests a resource to see if a value exists in the cache is a 'lazy' or 'deferred' cache and this will still impact the user. Even using Popen you will still need a way to defer responding to the client and yield to other threads.
It sounds like you're writing a raw HTTP server based on BaseHTTPServer or similar? If so you want to take a look at WSGI and choose one of the WSGI compliant servers such as Gunicorn. Combine this with a WSGI framework such as Flask and you will solve your scaling issues without having to resort to Popen and reinventing the wheel.
Gunicorn handles multi-threading your client connections and Flask handles your request state. You may still need to do some work to handle long-running requests but the process will be much easier.
Typically you want to keep your response times short so you don't have to worry about timeouts etc. To this end if you have a long-running process initiated by the user you may want to split the process into three steps.
- start_task: The user initiates the request, it is submitted to a task queue (check out Celery or Python RQ), and the API returns a tracking ID.
- check_task: The user provides a tracking ID and the API returns a status.
- get_result: Once the task is complete, the user retrieves the result.
In your web app UI you can then give the user feedback at each stage and potentially show a progress indicator via the 'check_task' call.
nonblocking subprocesses in python
subprocess.Popen isn't inherently blocking. You can still use proc.stdin.write() and proc.stdout.read(); the only problem is that you risk blocking on one side, or even deadlocking[1], if a pipe fills up. If you know your subprocess will only ever read or write a small amount of data, you don't have to worry about that.
So you can do:
import subprocess
from io import BytesIO
proc = subprocess.Popen(['perl', 'somescript.pl'], stdout=subprocess.PIPE)
buf = BytesIO()  # the pipe yields bytes on Python 3
CHUNKSIZE = 1024  # how much to read at a time
while True:
    # do whatever other time-consuming work you want here, including monitoring
    # other processes...
    # this keeps the pipe from filling up; note that read() can itself block
    # until CHUNKSIZE bytes arrive or the child closes its stdout
    buf.write(proc.stdout.read(CHUNKSIZE))
    proc.poll()
    if proc.returncode is not None:
        # process has finished running
        buf.write(proc.stdout.read())
        print("return code is", proc.returncode)
        print("output is", buf.getvalue().decode())
        break
In a larger app, you could schedule this to happen in your event loop, reactor, etc.
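If the event loop in question is asyncio, a sketch along the following lines may be a starting point. It swaps the manual polling loop for asyncio's own subprocess support, so it is a different technique from the code above. The child command is a hypothetical Python one-liner, and collect_output, other_work, and main are names invented for this example.

```python
import asyncio
import sys

# Hypothetical child: a Python one-liner standing in for the real subprocess.
CHILD_ARGS = [sys.executable, "-c", "print('hello'); print('world')"]


async def collect_output(args):
    """Read a child's stdout line by line without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(
        *args, stdout=asyncio.subprocess.PIPE
    )
    lines = []
    async for raw in proc.stdout:  # yields control while waiting for data
        lines.append(raw.decode().rstrip())
    returncode = await proc.wait()
    return returncode, lines


async def other_work(counter):
    """Placeholder for whatever else the application does meanwhile."""
    for _ in range(3):
        counter.append("tick")
        await asyncio.sleep(0.01)


async def main():
    counter = []
    (returncode, lines), _ = await asyncio.gather(
        collect_output(CHILD_ARGS), other_work(counter)
    )
    return returncode, lines, counter


if __name__ == "__main__":
    print(asyncio.run(main()))
```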
[1] The OS will only allow so much data to fit in a pipe at a time. Say you run cat as your subprocess, and write a ton of data to its stdin. cat will write that data to its own stdout until it fills up, and then it'll block until your program reads some data from stdout and empties the pipe. But your program is still writing to stdin, and cat is no longer reading from it, so that pipe will fill up too. Both processes will be stuck with blocking writes, waiting for the other to read, which will never happen.
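When all of the input is available up front, Popen.communicate() sidesteps this deadlock because it feeds stdin and drains stdout at the same time. The sketch below assumes a cat-like child written as a Python one-liner so that it also runs where cat is not installed; ECHO_CHILD and round_trip are illustrative names.

```python
import subprocess
import sys

# Hypothetical stand-in for `cat`: copies stdin to stdout chunk by chunk.
ECHO_CHILD = [
    sys.executable,
    "-c",
    "import shutil, sys; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)",
]


def round_trip(data):
    """Send `data` through the child and return (returncode, output)."""
    proc = subprocess.Popen(
        ECHO_CHILD, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    # communicate() writes the input and reads the output concurrently,
    # so neither pipe can fill up while the other side is stuck writing.
    out, _ = proc.communicate(input=data)
    return proc.returncode, out


if __name__ == "__main__":
    payload = b"x" * (1 << 20)  # 1 MiB, typically far more than a pipe buffer holds
    code, echoed = round_trip(payload)
    print(code, len(echoed))
```

The trade-off is that communicate() waits for the child to exit and buffers all output in memory, so it suits one-shot exchanges rather than long-lived interactive children.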