How to Run Multiple Subprocesses via Fork()

What is the best way to run multiple subprocesses via fork()?

Simple example:

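import os

children = []
for job in jobs:    # jobs: an iterable of work items, defined elsewhere
    child = os.fork()
    if child:
        # Parent: remember the child's pid.
        children.append(child)
    else:
        # Child: really should exec the job. Exit here so the child
        # does not fall through and keep running the parent's loop.
        os._exit(0)

# Parent: wait for each child in turn.
for child in children:
    os.waitpid(child, 0)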

Timing out a slow child is a little more work; you can use wait instead of waitpid, and cull the returned values from the list of children, instead of waiting on each one in turn (as here). If you set up an alarm with a SIGALRM handler, you can terminate the waiting after a specified delay. This is all standard UNIX stuff, not Python-specific...
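
A minimal sketch of that timeout approach, assuming each job is a plain Python callable rather than something to exec. The names run_jobs, Timeout and _on_alarm are made up for this illustration, and signal.alarm() only works in the main thread on Unix.

```python
import os
import signal
import time


class Timeout(Exception):
    """Raised by the SIGALRM handler when the global deadline passes."""


def _on_alarm(signum, frame):
    raise Timeout()


def _exit_code(status):
    # Exit code for a normal exit, negative signal number otherwise.
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -os.WTERMSIG(status)


def run_jobs(jobs, timeout):
    """Fork one child per job; return (finished, killed).

    finished maps pid -> exit code; killed lists the pids that overran.
    """
    children = set()
    for job in jobs:
        pid = os.fork()
        if pid == 0:
            code = 1                      # reported if the job raises
            try:
                code = job() or 0
            finally:
                os._exit(code)            # never return into the parent's code
        children.add(pid)

    finished = {}
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)                 # whole seconds
    try:
        while children:
            pid, status = os.wait()       # whichever child exits first
            children.discard(pid)
            finished[pid] = _exit_code(status)
    except Timeout:
        pass
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)

    killed = sorted(children)
    for pid in killed:
        try:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)            # reap so no zombie is left behind
        except (ProcessLookupError, ChildProcessError):
            pass                          # reaped in the instant before the alarm fired
    return finished, killed


if __name__ == "__main__":
    done, overdue = run_jobs([lambda: 0, lambda: 3, lambda: time.sleep(30)], timeout=1)
    print(done, overdue)
```

Using os.wait() instead of waiting on each pid in turn means a slow first child cannot hide the fact that later children have already finished.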

How to fork and join multiple subprocesses with a global timeout in Python?

In the "Programming guidelines" section of the multiprocessing ("Process-based parallelism") documentation, there is this paragraph:

Better to inherit than pickle/unpickle

When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.

So multiprocessing.Event() caused a RuntimeError because it is not picklable, as demonstrated by the following Python code snippet:

import multiprocessing
import pickle

pickle.dumps(multiprocessing.Event())

which raises the same exception:

RuntimeError: Condition objects should only be shared between processes through inheritance

A solution is to use a proxy object:

A proxy is an object which refers to a shared object which lives (presumably) in a different process.

because:

An important feature of proxy objects is that they are picklable so they can be passed between processes.

multiprocessing.Manager().Event() creates a shared threading.Event() object and returns a proxy for it, so replacing this line:

self.event = multiprocessing.Event()

by the following line in the Python code snippet of the question solves the problem:

self.event = multiprocessing.Manager().Event()
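
A small sketch of the difference, under two assumptions: it runs on Unix, and it uses the "fork" context only to keep the example short. The function names can_pickle, set_event and demo are hypothetical. Arguments passed to a pool worker are pickled, so the proxy is what makes the call possible.

```python
import multiprocessing
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False on the RuntimeError above."""
    try:
        pickle.dumps(obj)
    except RuntimeError:
        return False
    return True


def set_event(event):
    # Runs in a pool worker; `event` got there by being pickled.
    event.set()


def demo():
    # Assumption: the "fork" start method is available (Unix only).
    ctx = multiprocessing.get_context("fork")
    plain = ctx.Event()
    with ctx.Manager() as manager:
        proxy = manager.Event()
        picklable = (can_pickle(plain), can_pickle(proxy))
        with ctx.Pool(1) as pool:
            pool.apply(set_event, (proxy,))   # the proxy crosses the process boundary
        return picklable + (proxy.is_set(),)


if __name__ == "__main__":
    print(demo())
```

The price of the proxy is an extra manager process and a round trip for every method call, so inheriting a plain Event is still preferable when the program structure allows it.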

How do I use subprocess.Popen to connect multiple processes by pipes?

You'd be a little happier with the following.

import subprocess

awk_sort = subprocess.Popen("awk -f script.awk | sort > outfile.txt",
                            stdin=subprocess.PIPE, shell=True)
awk_sort.communicate(b"input data\n")

Delegate part of the work to the shell. Let it connect two processes with a pipeline.

You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.

Edit. Some of the reasons for suggesting that awk isn't helping.

[There are too many reasons to respond via comments.]

  1. Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.

  2. The pipelining from awk to sort, for large sets of data, may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file and awk | sort will reveal whether concurrency helps. With sort, it rarely helps because sort is not a once-through filter.

  3. The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of questions being asked here.

  4. Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.

  5. Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.

Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.

Sidebar: Why building a pipeline (a | b) is so hard.

When the shell is confronted with a | b it has to do the following.

  1. Fork a child process of the original shell. This will eventually become b.

  2. Build an OS pipe. This is not a Python subprocess.PIPE but a call to os.pipe(), which returns two new file descriptors connected via a common buffer. At this point the process has stdin, stdout, and stderr from its parent, plus the two pipe ends that will become "a's stdout" and "b's stdin".

  3. Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.

  4. The b child replaces its stdin with the new b's stdin. Exec the b process.

  5. The b child waits for a to complete.

  6. The parent is waiting for b to complete.
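
The steps above can be sketched with os.fork(), os.pipe(), os.dup2() and os.execvp(). This is an illustration rather than what any particular shell does: the name run_pipeline is made up, and step 5 is left out because once b has exec'ed it is no longer running code that could wait for a.

```python
import os
import sys


def run_pipeline(a_argv, b_argv):
    """Run the equivalent of `a | b`; return b's exit code."""
    sys.stdout.flush()                    # keep buffered output out of the children
    b_pid = os.fork()                     # step 1: this child eventually becomes b
    if b_pid == 0:
        try:
            read_fd, write_fd = os.pipe() # step 2: a's stdout / b's stdin
            if os.fork() == 0:            # step 3: the grandchild becomes a
                os.dup2(write_fd, 1)      # stdout now feeds the pipe
                os.close(read_fd)
                os.close(write_fd)
                os.execvp(a_argv[0], a_argv)
            os.dup2(read_fd, 0)           # step 4: b reads from the pipe
            os.close(read_fd)
            os.close(write_fd)            # otherwise b would never see end-of-file
            os.execvp(b_argv[0], b_argv)
        finally:
            os._exit(127)                 # reached only if a call above raised
    _, status = os.waitpid(b_pid, 0)      # step 6: the parent waits for b
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -os.WTERMSIG(status)


if __name__ == "__main__":
    print(run_pipeline(["echo", "hello"], ["cat"]))
```

Closing the unused pipe ends matters as much as the dup2() calls: a stray open write end keeps the reader waiting forever.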

I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).

Since Python has os.pipe(), os.fork(), and the os.exec*() family (os.execvp() and friends), and os.dup2() can redirect standard input and standard output, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.

However, it's easier to delegate that operation to the shell.
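
For comparison, a common shell-free shortcut chains two Popen objects by handing the first one's stdout to the second one's stdin. The sketch below assumes the input is small enough to fit in the pipe buffers (large inputs would need a writer thread to avoid a deadlock); the function name pipeline is hypothetical.

```python
import subprocess


def pipeline(data, first_cmd, second_cmd):
    """Feed data to `first_cmd | second_cmd` and return the final stdout bytes."""
    first = subprocess.Popen(first_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout, stdout=subprocess.PIPE)
    # Drop the parent's copy of the connecting pipe so that `first`
    # gets SIGPIPE if `second` exits early.
    first.stdout.close()
    first.stdin.write(data)               # assumption: data is small
    first.stdin.close()                   # signals end-of-file to `first`
    output, _ = second.communicate()
    first.wait()
    return output


if __name__ == "__main__":
    print(pipeline(b"b\na\nc\n", ["tr", "a-z", "A-Z"], ["sort"]))
```

No shell is involved, so there is no quoting to get wrong, but redirecting the result to a file becomes the caller's job.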

how to create two processes from a single Parent

To create a second process, call fork() again - either within the parent or the child (but not both!). Which you choose depends on whether you want this process to be a child of the original parent or a child of the first child process (it is usual for it to be a child of the original parent).

Communicating through a pipe is much simpler and more reliable than using signals. pipe(), close(), read(), write() and select() are the key functions here.


For example, to have the parent create two child processes, you would do something like:

pid_t child_a, child_b;

child_a = fork();

if (child_a == 0) {
    /* Child A code */
} else {
    child_b = fork();

    if (child_b == 0) {
        /* Child B code */
    } else {
        /* Parent Code */
    }
}
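
The same layout, with each child reporting back over its own pipe, might look like this in Python. It is a sketch under the assumption that each child's message is small; spawn and collect are names invented for the example. The parent uses select() so it can read from whichever child is ready first.

```python
import os
import select


def spawn(worker):
    """Fork a child that writes worker()'s bytes to a pipe; return (pid, read_fd)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)                 # the child only writes
        try:
            os.write(write_fd, worker())
        finally:
            os._exit(0)
    os.close(write_fd)                    # the parent only reads
    return pid, read_fd


def collect(workers):
    """Run every worker in its own child and gather what each one wrote."""
    pending = {}                          # read_fd -> (name, pid)
    for name, worker in workers.items():
        pid, fd = spawn(worker)
        pending[fd] = (name, pid)
    output = {name: b"" for name in workers}
    while pending:
        readable, _, _ = select.select(list(pending), [], [])
        for fd in readable:
            name, pid = pending[fd]
            chunk = os.read(fd, 4096)
            if chunk:
                output[name] += chunk
            else:                         # empty read: the child closed its end
                os.close(fd)
                os.waitpid(pid, 0)
                del pending[fd]
    return output


if __name__ == "__main__":
    print(collect({"a": lambda: b"from child a", "b": lambda: b"from child b"}))
```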

How do I manage multiple subprocesses in Perl?

Don't use threads. Threads suck. The proper way is to fork multiple processes and wait for them to finish. If you use wait or waitpid, the exit status of the process in question will be available in $?.

See the perldocs for fork, wait, and waitpid, and also the examples in this SO thread.

If all you need is to just manage a pool of subprocesses that doesn't exceed a certain size, check out the excellent Parallel::ForkManager.
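
The bounded-pool idea is not specific to Perl. As a rough illustration of the fork/wait bookkeeping involved (this is not how Parallel::ForkManager is implemented, and run_bounded is a made-up name), here is a Python sketch that never has more than max_children running at once:

```python
import os


def run_bounded(jobs, max_children):
    """Fork one child per job, never more than max_children at a time.

    Returns a dict mapping each job's index to its exit code.
    """
    running = {}                          # pid -> job index
    results = {}

    def reap_one():
        pid, status = os.wait()           # blocks until some child exits
        index = running.pop(pid)
        if os.WIFEXITED(status):
            results[index] = os.WEXITSTATUS(status)
        else:
            results[index] = -os.WTERMSIG(status)

    for index, job in enumerate(jobs):
        if len(running) >= max_children:
            reap_one()                    # free a slot before forking again
        pid = os.fork()
        if pid == 0:
            code = 1                      # reported if the job raises
            try:
                code = job() or 0
            finally:
                os._exit(code)
        running[pid] = index

    while running:
        reap_one()
    return results


if __name__ == "__main__":
    print(run_bounded([lambda: 0, lambda: 2, lambda: 5], max_children=2))
```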

How to run child processes simultaneously? in c

You're calling wait() inside of the loop where you're spawning the children, so it won't continue the loop to start the next child until the current one is done.

You need to call wait() outside of the loop in a separate loop:

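#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait() */
#include <time.h>
#include <unistd.h>     /* fork(), getpid(), sleep() */

int main(int argc, char *argv[])
{
    int i, n, ran;
    pid_t pid;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <number-of-children>\n", argv[0]);
        return 1;
    }
    n = atoi(argv[1]);

    /* First loop: start every child without waiting. */
    for (i = 0; i < n; i++) {
        pid = fork();

        if (pid < 0) {
            perror("fork");
            exit(1);
        } else if (pid == 0) {
            /* Seed per child; time() alone gives every child the same value. */
            srand((unsigned)time(NULL) ^ (unsigned)getpid());
            ran = (rand() % 10) + 1;
            printf("Child (%d): %d\n", i + 1, (int)getpid());
            printf("Sleep for = %d\n", ran);
            sleep(ran);
            exit(ran);
        }
    }

    /* Second loop: collect the children in whatever order they finish. */
    for (i = 0; i < n; i++) {
        int status = 0;
        pid_t childpid = wait(&status);
        printf("Parent knows child %d is finished. \n", (int)childpid);
    }
    return 0;
}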

Run child processes as different user from a long running Python process

Since you mentioned a daemon, I can conclude that you are running on a Unix-like operating system. This matters, because how to do this depends on the kind of operating system. This answer applies only to Unix, including Linux and Mac OS X.

  1. Define a function that will set the gid and uid of the running process.
  2. Pass this function as the preexec_fn parameter to subprocess.Popen

subprocess.Popen will use the fork/exec model to run your preexec_fn. That is equivalent to calling os.fork(), preexec_fn() (in the child process), and one of the os.exec*() functions (in the child process), in that order. Since os.setuid, os.setgid, and preexec_fn are all only supported on Unix, this solution is not portable to other kinds of operating systems.

The following code is a script (Python 2.4+) that demonstrates how to do this:

import os
import pwd
import subprocess
import sys

def main(my_args=None):
    if my_args is None:
        my_args = sys.argv[1:]
    user_name, cwd = my_args[:2]
    args = my_args[2:]
    pw_record = pwd.getpwnam(user_name)
    user_name = pw_record.pw_name
    user_home_dir = pw_record.pw_dir
    user_uid = pw_record.pw_uid
    user_gid = pw_record.pw_gid
    env = os.environ.copy()
    env['HOME'] = user_home_dir
    env['LOGNAME'] = user_name
    env['PWD'] = cwd
    env['USER'] = user_name
    report_ids('starting ' + str(args))
    process = subprocess.Popen(
        args, preexec_fn=demote(user_uid, user_gid), cwd=cwd, env=env
    )
    result = process.wait()
    report_ids('finished ' + str(args))
    # %-formatting inside print(...) runs unchanged on Python 2 and Python 3.
    print('result %s' % result)

def demote(user_uid, user_gid):
    # Returns the function that Popen calls in the child, after fork and before exec.
    def result():
        report_ids('starting demotion')
        os.setgid(user_gid)  # group first: after setuid we could no longer change it
        os.setuid(user_uid)
        report_ids('finished demotion')
    return result

def report_ids(msg):
    print('uid, gid = %d, %d; %s' % (os.getuid(), os.getgid(), msg))

if __name__ == '__main__':
    main()

You can invoke this script like this:

Start as root...

(hale)/tmp/demo$ sudo bash --norc
(root)/tmp/demo$ ls -l
total 8
drwxr-xr-x 2 hale wheel 68 May 17 16:26 inner
-rw-r--r-- 1 hale staff 1836 May 17 15:25 test-child.py

Become non-root in a child process...

(root)/tmp/demo$ python test-child.py hale inner /bin/bash --norc
uid, gid = 0, 0; starting ['/bin/bash', '--norc']
uid, gid = 0, 0; starting demotion
uid, gid = 501, 20; finished demotion
(hale)/tmp/demo/inner$ pwd
/tmp/demo/inner
(hale)/tmp/demo/inner$ whoami
hale

When the child process exits, we go back to root in parent ...

(hale)/tmp/demo/inner$ exit
exit
uid, gid = 0, 0; finished ['/bin/bash', '--norc']
result 0
(root)/tmp/demo$ pwd
/tmp/demo
(root)/tmp/demo$ whoami
root

Note that having the parent process wait around for the child process to exit is for demonstration purposes only. I did this so that the parent and child could share a terminal. A daemon would have no terminal and would seldom wait around for a child process to exit.
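
On Python 3.9 and later, subprocess.Popen also accepts user, group and extra_groups arguments, which cover the same ground without a preexec_fn (the subprocess documentation warns that preexec_fn is not safe in the presence of threads). The sketch below assumes that Python version on Unix; demoted_popen_kwargs and run_as are hypothetical helper names, and the caller still has to be root for the switch to succeed.

```python
import os
import pwd
import subprocess


def demoted_popen_kwargs(user_name, cwd):
    """Build Popen keyword arguments that run a child as user_name.

    Hypothetical helper for illustration; needs Python 3.9+ on Unix.
    """
    pw_record = pwd.getpwnam(user_name)
    env = os.environ.copy()
    env.update(
        HOME=pw_record.pw_dir,
        LOGNAME=pw_record.pw_name,
        USER=pw_record.pw_name,
        PWD=cwd,
    )
    return {
        "user": pw_record.pw_uid,
        "group": pw_record.pw_gid,
        # Also replace the supplementary groups inherited from the parent.
        "extra_groups": os.getgrouplist(pw_record.pw_name, pw_record.pw_gid),
        "cwd": cwd,
        "env": env,
    }


def run_as(user_name, cwd, args):
    """Run args as user_name and return the exit status (caller must be root)."""
    return subprocess.Popen(args, **demoted_popen_kwargs(user_name, cwd)).wait()
```

Called as run_as('hale', 'inner', ['/bin/bash', '--norc']) from a root process, this would correspond to the session shown above, minus the report_ids output.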

How to use Fork() to create only 2 child processes?

pid = fork();    /* #1 */
pidb = fork();   /* #2 */

Assume the parent's process ID is 100 and the first fork creates process 101. Both 100 and 101 continue execution after #1, so both execute the second fork: pid 100 reaches #2 and creates process 102, and pid 101 reaches #2 and creates process 103. So we end up with 4 processes.

What you should do is something like this.

if (fork()) {           /* parent */
    if (fork()) {       /* still the parent */
    } else {            /* child 2 */
    }
} else {                /* child 1: never reaches the second fork */
}

