Python Threading Multiple Bash Subprocesses

How do I multithread multiple bash subprocesses in Python, but with each subprocess setting its own environment variable?

Here's the working code, thanks to comments from Chepner and JohanL:

import os
import subprocess
from multiprocessing.dummy import Pool as ThreadPool  # thread-based Pool

def make_tmp(tmp_path):
    # Give each subprocess its own copy of the environment, with TMP overridden.
    my_env = os.environ.copy()
    my_env['TMP'] = tmp_path
    # Write the last character of the path into output.log inside that folder.
    dos = 'echo ' + tmp_path[-1] + ' > ' + my_env['TMP'] + '\\output.log'
    # or run a bat script
    # dos = 'C:\\Temp\\launch.bat'
    subprocess.Popen(dos, env=my_env, stdout=subprocess.PIPE, shell=True)

pool = ThreadPool(2)

path_array = ['C:\\Temp\\1', 'C:\\Temp\\2', 'C:\\Temp\\3']

results = pool.map(make_tmp, path_array)

Note that the paths in path_array have to be existing folders.
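One thing to be aware of (my addition, not part of the accepted code): make_tmp never waits on the Popen it starts, so pool.map can return before the commands have finished writing their output.log files. Here is a minimal sketch of a variant that waits inside each worker thread and hands back the exit code, assuming the same Windows-style paths (the name make_tmp_and_wait is only illustrative):

import os
import subprocess
from multiprocessing.dummy import Pool as ThreadPool

def make_tmp_and_wait(tmp_path):
    # Same idea as make_tmp, but block this worker thread until the command
    # exits, so pool.map only returns once every subprocess is done.
    my_env = os.environ.copy()
    my_env['TMP'] = tmp_path
    dos = 'echo ' + tmp_path[-1] + ' > ' + my_env['TMP'] + '\\output.log'
    proc = subprocess.Popen(dos, env=my_env, stdout=subprocess.PIPE, shell=True)
    proc.communicate()               # wait for the command and drain its pipe
    return proc.returncode

pool = ThreadPool(2)
return_codes = pool.map(make_tmp_and_wait, ['C:\\Temp\\1', 'C:\\Temp\\2', 'C:\\Temp\\3'])

With this version, return_codes holds one exit status per path, and all the log files exist by the time pool.map returns.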

How do I run multiple subprocesses in parallel and wait for them to finish in Python?

You can still use Popen, which takes the same input parameters as subprocess.call but is more flexible.

subprocess.call: The full function signature is the same as that of the Popen constructor - this function passes all supplied arguments directly through to that interface.

One difference is that subprocess.call blocks and waits for the subprocess to complete (it is built on top of Popen), whereas Popen doesn't block and consequently allows you to launch other processes in parallel.
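To make that difference concrete, here is a tiny sketch (mine, not from the original answer; the one-second sleep is only a placeholder job):

import subprocess
import sys

# A placeholder one-second job so the example runs on any platform.
cmd = [sys.executable, '-c', 'import time; time.sleep(1)']

subprocess.call(cmd)          # blocks: returns only after the child has exited
p = subprocess.Popen(cmd)     # returns immediately; the child runs alongside your code
p.wait()                      # block explicitly once you do want to wait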

Try the following:

from subprocess import Popen

commands = ['command1', 'command2']   # placeholders for your real commands

# Start every process first (non-blocking), then wait for each one to finish.
procs = [Popen(i) for i in commands]
for p in procs:
    p.wait()
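If you also need each command's output or exit status, a variant of the same start-then-collect pattern looks like this (my sketch; the commands here are hypothetical, cross-platform stand-ins for 'command1' and 'command2'):

import sys
from subprocess import Popen, PIPE

commands = [
    [sys.executable, '-c', 'print("hello from job 1")'],
    [sys.executable, '-c', 'print("hello from job 2")'],
]

# Launch everything up front, then collect output and exit codes.
procs = [Popen(cmd, stdout=PIPE, stderr=PIPE) for cmd in commands]
for proc in procs:
    out, _ = proc.communicate()      # waits for this process and drains its pipes
    print(proc.returncode, out.decode().strip())

Using communicate() instead of wait() also avoids a deadlock when a child writes more to its pipe than the OS buffer can hold.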

Deciding among subprocess, multiprocessing, and thread in Python?

multiprocessing is a great Swiss-army knife type of module. It is more general than threads, as you can even perform remote computations. This is therefore the module I would suggest you use.

The subprocess module would also allow you to launch multiple processes, but I found it to be less convenient to use than the new multiprocessing module.

Threads are notoriously subtle, and with CPython you are often limited to one core when using them (even though, as noted in one of the comments, the Global Interpreter Lock (GIL) can be released in C code called from Python code).

I believe that most of the functions of the three modules you cite can be used in a platform-independent way. On the portability side, note that multiprocessing has only been part of the standard library since Python 2.6 (a backport for some older Python versions does exist, though). But it's a great module!
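As a minimal illustration of multiprocessing (my example, not from the original answer): a Pool of worker processes mapping a function over some inputs.

from multiprocessing import Pool

def square(x):
    # Stand-in for real work; each call runs in a separate worker process.
    return x * x

if __name__ == '__main__':            # guard required for multiprocessing on Windows
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))    # [0, 1, 4, 9, ..., 81]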

Python: execute cat subprocess in parallel

Another approach (rather than the other suggestion of putting shell processes in the background) is to use multithreading.

The run method that you have would then do something like this:

thread.start_new_thread(myFuncThatDoesZGrep, ())   # the args tuple is required, even if empty

To collect results, you can do something like this:

import threading

class MyThread(threading.Thread):
    def run(self):
        self.finished = False
        # Your code to run the command here.
        blahBlah()
        # When done, store the results first, then flip the flag.
        self.results = []
        self.finished = True

Run the thread as described above. When your thread object has myThread.finished == True, you can collect the results via myThread.results.
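Putting those pieces together, here is a self-contained sketch (the CommandThread class and the placeholder commands are mine, not from the original answer; in the original question the commands would be the cat/zgrep calls):

import subprocess
import sys
import threading

class CommandThread(threading.Thread):
    # Hypothetical helper: runs one command in a subprocess and keeps its output.
    def __init__(self, cmd):
        super().__init__()
        self.cmd = cmd
        self.finished = False
        self.results = []

    def run(self):
        proc = subprocess.run(self.cmd, capture_output=True, text=True)
        self.results = proc.stdout.splitlines()
        self.finished = True          # set only after the results are in place

# Placeholder commands so the sketch runs anywhere.
cmds = [[sys.executable, '-c', 'print("output from job %d")' % i] for i in range(3)]

threads = [CommandThread(cmd) for cmd in cmds]
for t in threads:
    t.start()
for t in threads:
    t.join()                          # join() is usually simpler than polling t.finished
    print(t.finished, t.results)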


