Blocking and Non-Blocking Subprocess Calls

Blocking and non-blocking subprocess calls

Popen is non-blocking; call and check_call are blocking.
You can make a Popen instance block by calling its wait or communicate method.

If you look at the source code, you'll see that call invokes Popen(...).wait(), which is why it blocks.
check_call calls call, which is why it blocks as well.
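
For example, a minimal sketch contrasting the two (sleep is just a stand-in for a slow command):

import subprocess

# check_call blocks until the command exits (and raises CalledProcessError
# on a non-zero exit status)
subprocess.check_call(['sleep', '2'])

# Popen returns immediately; the child keeps running in the background...
proc = subprocess.Popen(['sleep', '2'])
print('this prints right away')

# ...until you explicitly block on it
proc.wait()
print('this prints about two seconds later')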

Strictly speaking, shell=True is orthogonal to the question of blocking. However, shell=True causes Python to exec a shell and then run the command inside that shell. If you use a blocking call, the call returns when the shell finishes. Since the shell may spawn a subprocess to run the command, the shell may exit before the subprocess it spawned does. For example,

import subprocess
import time

proc = subprocess.Popen('ls -lRa /', shell=True)
time.sleep(3)
proc.terminate()
proc.wait()

Here two processes are spawned: Popen spawns one subprocess running the shell, and the shell in turn spawns a subprocess running ls. proc.terminate() kills the shell, but the subprocess running ls remains. (This manifests as copious output, even after the Python script has ended. Be prepared to kill the ls with pkill ls.)
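
If you need the whole tree to die, one common remedy on POSIX systems (an addition here, not part of the original example) is to start the shell in its own process group and signal the group:

import os
import signal
import subprocess
import time

# start_new_session=True runs the shell in a new session (and process
# group) whose group ID equals the shell's PID
proc = subprocess.Popen('ls -lRa /', shell=True, start_new_session=True)
time.sleep(3)

# signal the whole process group, so the spawned ls dies with the shell
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
proc.wait()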

A non-blocking read on a subprocess.PIPE in Python

fcntl, select, and asyncproc won't help in this case.

A reliable way to read a stream without blocking regardless of operating system is to use Queue.get_nowait():

import sys
from subprocess import PIPE, Popen
from threading import Thread

try:
    from queue import Queue, Empty
except ImportError:
    from Queue import Queue, Empty  # python 2.x

ON_POSIX = 'posix' in sys.builtin_module_names

def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

p = Popen(['myprogram.exe'], stdout=PIPE, bufsize=1, close_fds=ON_POSIX)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.daemon = True  # thread dies with the program
t.start()

# ... do other things here

# read line without blocking
try:
    line = q.get_nowait()  # or q.get(timeout=.1)
except Empty:
    print('no output yet')
else:
    pass  # got line: ... do something with line

Python subprocess multiple non-blocking communicates

I have now switched from using subprocess to using pexpect.
My syntax is now as follows:

import pexpect

child = pexpect.spawn('rosrun ros_pkg ros_node')
command = child.sendline('new command')
output = child.read_nonblocking(10000, timeout=1)

# ... logic ...

command = child.sendline('new command')
output = child.read_nonblocking(10000, timeout=1)

Many thanks to novel_yet_trivial on reddit: https://www.reddit.com/r/learnpython/comments/2o2viz/subprocess_popen_multiple_times/

Python blocking versus non-blocking OS calls

Sure, if you're running heavy queries once for the life of the app, then run them all on startup and cache the results. Do this before you begin serving requests and it will never impact your users. Checking the cache only when a user requests a resource is a 'lazy' or 'deferred' cache, and it will still impact the user. Even using Popen, you will still need a way to defer responding to the client and yield to other threads.
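
A minimal sketch of the eager approach (run_heavy_query and the cache keys are hypothetical stand-ins):

# eager cache: run the heavy queries once, before serving any requests
def run_heavy_query(name):
    ...  # hypothetical stand-in for your expensive query or OS call

CACHE = {name: run_heavy_query(name) for name in ('stats', 'config')}

def handle_request(name):
    # request handlers only read the precomputed cache, so users never wait
    return CACHE[name]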

It sounds like you're writing a raw HTTP server based on BaseHTTPServer or similar? If so, you'll want to take a look at WSGI and choose one of the WSGI-compliant servers such as Gunicorn. Combine this with a WSGI framework such as Flask and you will solve your scaling issues without having to resort to Popen and reinventing the wheel.

Gunicorn handles multi-threading your client connections and Flask handles your request state. You may still need to do some work to handle long-running requests, but the process will be much easier.

Typically you want to keep your response times short so you don't have to worry about timeouts and the like. To this end, if you have a long-running process initiated by the user, you may want to split it into three steps (sketched in code after this list).

  1. start_task: The user initiates the request and it is submitted to a task queue (check out Celery or Python RQ), returning a tracking ID.
  2. check_task: The user provides a tracking ID and the API returns a status.
  3. get_result: Once the task is complete, the user retrieves the result.

In your web app UI you can then provide the user with feedback at each stage, and potentially a progress indicator via the check_task call.
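
As a minimal sketch of those three steps, here is what the endpoints might look like in Flask, with an in-process thread pool standing in for a real task queue such as Celery or RQ (all names here are illustrative):

import uuid
from concurrent.futures import ThreadPoolExecutor

from flask import Flask, jsonify

app = Flask(__name__)
executor = ThreadPoolExecutor(max_workers=4)
tasks = {}  # tracking ID -> Future; a real task queue would persist this

def long_running_job():
    return 'some result'  # hypothetical stand-in for the slow work

@app.route('/start_task', methods=['POST'])
def start_task():
    task_id = str(uuid.uuid4())
    tasks[task_id] = executor.submit(long_running_job)
    return jsonify(task_id=task_id)

@app.route('/check_task/<task_id>')
def check_task(task_id):
    return jsonify(done=tasks[task_id].done())

@app.route('/get_result/<task_id>')
def get_result(task_id):
    future = tasks[task_id]
    if not future.done():
        return jsonify(error='task not finished'), 409
    return jsonify(result=future.result())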

nonblocking subprocesses in python

subprocess.Popen isn't inherently blocking. You can still use proc.stdin.write() and proc.stdout.read(); the only problem is that you risk blocking on one side, or even deadlocking[1], if a pipe fills up. If you know your subprocess will only ever read or write a small amount of data, you don't have to worry about that.

So you can do:

import subprocess
from io import BytesIO

proc = subprocess.Popen(['perl', 'somescript.pl'], stdout=subprocess.PIPE)
buf = BytesIO()
CHUNKSIZE = 1024  # how much to read at a time

while True:
    # do whatever other time-consuming work you want here, including
    # monitoring other processes...

    # this keeps the pipe from filling up
    buf.write(proc.stdout.read(CHUNKSIZE))

    proc.poll()
    if proc.returncode is not None:
        # process has finished running
        buf.write(proc.stdout.read())
        print("return code is", proc.returncode)
        print("output is", buf.getvalue())
        break

In a larger app, you could schedule this to happen in your event loop, reactor, etc.
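
For instance, a rough asyncio equivalent of the loop above (a sketch, not part of the original answer) could look like:

import asyncio

async def run_and_collect():
    proc = await asyncio.create_subprocess_exec(
        'perl', 'somescript.pl',
        stdout=asyncio.subprocess.PIPE,
    )
    chunks = []
    while True:
        # awaiting the read yields control to the event loop, so other
        # tasks keep running while we wait for output
        chunk = await proc.stdout.read(1024)
        if not chunk:  # EOF: the process closed its stdout
            break
        chunks.append(chunk)
    await proc.wait()
    print('return code is', proc.returncode)
    print('output is', b''.join(chunks))

asyncio.run(run_and_collect())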


[1] The OS will only allow so much data to fit in a pipe at a time. Say you run cat as your subprocess, and write a ton of data to its stdin. cat will write that data to its own stdout until it fills up, and then it'll block until your program reads some data from stdout and empties the pipe. But your program is still writing to stdin, and cat is no longer reading from it, so that pipe will fill up too. Both processes will be stuck with blocking writes, waiting for the other to read, which will never happen.
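
When you just want to feed the child all its input and collect all its output, the standard way to avoid that deadlock is communicate(), which writes and reads concurrently for you. A minimal sketch:

import subprocess

proc = subprocess.Popen(
    ['cat'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# communicate() feeds stdin and drains stdout at the same time,
# so neither pipe can fill up and wedge both processes
out, _ = proc.communicate(b'x' * 10_000_000)  # "a ton of data"
print(len(out), proc.returncode)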


