What is the best way to run multiple subprocesses via fork()?
Simple example:
import os
children = []
for job in jobs:
    child = os.fork()
    if child:
        children.append(child)
    else:
        pass  # really should exec the job; otherwise the child keeps looping
for child in children:
    os.waitpid(child, 0)
Timing out a slow child is a little more work; you can use wait instead of waitpid, and cull the returned values from the list of children, instead of waiting on each one in turn (as here). If you set up an alarm with a SIGALRM handler, you can terminate the waiting after a specified delay. This is all standard UNIX stuff, not Python-specific...
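The wait-plus-alarm idea can be sketched roughly as follows. This is only an illustration under assumptions: the names run_with_timeout, WaitTimeout and _on_alarm are made up here, the children just sleep instead of exec'ing a real job, and the code must run in the main thread of a Unix process because it relies on SIGALRM.

```python
import os
import signal
import time


class WaitTimeout(Exception):
    """Raised by the SIGALRM handler when the deadline passes."""


def _on_alarm(signum, frame):
    raise WaitTimeout()


def run_with_timeout(durations, timeout):
    """Fork one child per duration; return (finished_pids, killed_pids)."""
    children = set()
    for seconds in durations:
        pid = os.fork()
        if pid == 0:
            # Child: placeholder for os.exec*() of the real job.
            time.sleep(seconds)
            os._exit(0)
        children.add(pid)

    finished, killed = [], []
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)  # whole seconds
    try:
        while children:
            pid, _status = os.wait()  # returns whichever child exits first
            children.discard(pid)
            finished.append(pid)
    except WaitTimeout:
        pass  # deadline reached; the stragglers are handled below
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)

    for pid in children:
        try:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)  # reap so no zombie is left behind
        except (ProcessLookupError, ChildProcessError):
            continue  # already exited and reaped
        killed.append(pid)
    return finished, killed
```

Calling run_with_timeout([0, 30], 1) should return one finished pid and one killed pid after about a second.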
How to fork and join multiple subprocesses with a global timeout in Python?
In the "Programming guidelines" section of the multiprocessing ("Process-based parallelism") documentation, there is this paragraph:
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
So multiprocessing.Event() caused a RuntimeError because it is not picklable, as demonstrated by the following Python code snippet:
import multiprocessing
import pickle
pickle.dumps(multiprocessing.Event())
which raises the same exception:
RuntimeError: Condition objects should only be shared between processes through inheritance
A solution is to use a proxy object:
A proxy is an object which refers to a shared object which lives (presumably) in a different process.
because:
An important feature of proxy objects is that they are picklable so they can be passed between processes.
multiprocessing.Manager().Event() creates a shared threading.Event() object and returns a proxy for it, so replacing this line:
self.event = multiprocessing.Event()
by the following line in the Python code snippet of the question solves the problem:
self.event = multiprocessing.Manager().Event()
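To see the difference between the two kinds of event side by side, here is a small sketch. The function name compare_picklability is made up for illustration, and the "fork" start method is requested explicitly on the assumption of a Unix host, so the snippet also works when pasted into a plain script without a __main__ guard.

```python
import multiprocessing
import pickle


def compare_picklability():
    """Return (error message from the plain Event, whether the proxy round-trips)."""
    # Assumption: Unix host, so the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")

    # A plain Event refuses to be pickled outside of process spawning.
    try:
        pickle.dumps(ctx.Event())
        plain_error = None
    except RuntimeError as exc:
        plain_error = str(exc)

    # A manager proxy can be pickled; the copy refers to the same shared event.
    with ctx.Manager() as manager:
        proxy = manager.Event()
        clone = pickle.loads(pickle.dumps(proxy))
        clone.set()
        round_trip_ok = proxy.is_set()

    return plain_error, round_trip_ok
```

The manager runs a separate server process, so every call on the proxy is a round trip to that process; that is the price of being able to pass the object around.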
How do I use subprocess.Popen to connect multiple processes by pipes?
You'd be a little happier with the following.
import subprocess
awk_sort = subprocess.Popen( "awk -f script.awk | sort > outfile.txt",
stdin=subprocess.PIPE, shell=True )
awk_sort.communicate( b"input data\n" )
Delegate part of the work to the shell. Let it connect two processes with a pipeline.
You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
Edit. Some of the reasons for suggesting that awk isn't helping.
[There are too many reasons to respond via comments.]
Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.
The pipelining from awk to sort, for large sets of data, may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file and awk | sort will reveal if concurrency helps. With sort, it rarely helps because sort is not a once-through filter.
The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of questions being asked here.
Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.
Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.
Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.
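A rough sketch of the "Python to sort" arrangement follows. The contents of script.awk are not shown in the question, so transform() is a purely hypothetical stand-in (it keeps the last whitespace-separated field of each line), and python_to_sort is likewise a made-up name.

```python
import subprocess


def transform(line):
    # Hypothetical replacement for script.awk: keep the last field of the line.
    fields = line.split()
    return fields[-1] if fields else ""


def python_to_sort(lines):
    """Do the awk-style work in Python, then hand the result to sort(1)."""
    data = "".join(transform(line) + "\n" for line in lines)
    result = subprocess.run(
        ["sort"], input=data, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

There is only one child process and no shell here, so there is no pipeline to wire up at all.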
Sidebar: Why building a pipeline (a | b) is so hard.
When the shell is confronted with a | b it has to do the following.
- Fork a child process of the original shell. This will eventually become b.
- Build an OS pipe (not a Python subprocess.PIPE, but a call to os.pipe()), which returns two new file descriptors that are connected via a common buffer. At this point the process has stdin, stdout, stderr from its parent, plus a file that will be "a's stdout" and "b's stdin".
- Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.
- The b child replaces its stdin with the new b's stdin. Exec the b process.
- The b child waits for a to complete.
- The parent is waiting for b to complete.
I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).
Since Python has os.pipe(), the os.exec*() family and os.fork(), and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
However, it's easier to delegate that operation to the shell.
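For the curious, a pure-Python version might look something like the sketch below. It is a simplification of the steps in the sidebar, not a transcription of them: the parent forks both children itself instead of letting the b child fork a, and the function name pipeline is invented here. Unix only.

```python
import os


def pipeline(cmd_a, cmd_b):
    """Run the equivalent of `cmd_a | cmd_b`; return cmd_b's exit status."""
    read_fd, write_fd = os.pipe()

    pid_a = os.fork()
    if pid_a == 0:
        try:
            os.dup2(write_fd, 1)  # a's stdout becomes the pipe's write end
            os.close(read_fd)
            os.close(write_fd)
            os.execvp(cmd_a[0], cmd_a)
        finally:
            os._exit(127)  # only reached if the exec failed

    pid_b = os.fork()
    if pid_b == 0:
        try:
            os.dup2(read_fd, 0)  # b's stdin becomes the pipe's read end
            os.close(read_fd)
            os.close(write_fd)
            os.execvp(cmd_b[0], cmd_b)
        finally:
            os._exit(127)

    # The parent must close both ends, or b would never see end-of-file.
    os.close(read_fd)
    os.close(write_fd)
    os.waitpid(pid_a, 0)
    _, status = os.waitpid(pid_b, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For example, pipeline(["printf", "abc\\n"], ["grep", "-q", "abc"]) returns grep's exit status. Forgetting to close the write end in the parent is the classic way to make such a pipeline hang.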
how to create two processes from a single Parent
To create a second process, call fork() again - either within the parent or the child (but not both!). Which you choose depends on whether you want this process to be a child of the original parent or a child of the first child process (it is usual for it to be a child of the original parent).
Communicating through a pipe is much simpler and more reliable than using signals. pipe(), close(), read(), write() and select() are the key functions here.
For example, to have the parent create two child processes, you would do something like:
pid_t child_a, child_b;

child_a = fork();
if (child_a == 0) {
    /* Child A code */
} else {
    child_b = fork();
    if (child_b == 0) {
        /* Child B code */
    } else {
        /* Parent Code */
    }
}
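The same parent-with-two-children layout, with each child reporting back over its own pipe, could be sketched in Python along these lines. The function name spawn_reporting_children and the message format are invented for the example; the pipe/close/read/write/select calls are the ones named above, reached through Python's os and select modules.

```python
import os
import select


def spawn_reporting_children(messages):
    """Fork one child per (name, text) pair; collect what each writes to its pipe."""
    readers = {}  # read fd -> (child name, child pid)
    for name, text in messages.items():
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: keep only the write end of its own pipe.
            os.close(read_fd)
            os.write(write_fd, text.encode())
            os._exit(0)
        os.close(write_fd)  # parent keeps only the read end
        readers[read_fd] = (name, pid)

    received = {name: b"" for name in messages}
    while readers:
        ready, _, _ = select.select(list(readers), [], [])
        for fd in ready:
            chunk = os.read(fd, 4096)
            if chunk:
                received[readers[fd][0]] += chunk
            else:
                # End-of-file: the child has closed its end (it exited).
                _name, pid = readers.pop(fd)
                os.close(fd)
                os.waitpid(pid, 0)
    return {name: data.decode() for name, data in received.items()}
```

select() lets the parent service whichever child speaks first instead of blocking on one pipe while the other fills up.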
How do I manage multiple subprocesses in Perl?
Don't use threads. Threads suck. The proper way is to fork multiple processes and wait for them to finish. If you use wait or waitpid, the exit status of the process in question will be available in $?.
See the perldocs for fork, wait, and waitpid, and also the examples in this SO thread.
If all you need is to just manage a pool of subprocesses that doesn't exceed a certain size, check out the excellent Parallel::ForkManager.
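For readers who want to see the bounded-pool idea spelled out, here is a rough Python analogue. It is not Parallel::ForkManager's API, only the same fork/wait pattern with a cap on the number of live children; run_bounded is a made-up name and the jobs are assumed to be callables that return an integer exit code.

```python
import os


def run_bounded(jobs, max_procs):
    """Run each callable in its own forked child, at most max_procs at once.

    Returns a dict mapping child pid to exit code (None if killed by a signal).
    """
    running = set()
    exit_codes = {}

    def reap_one():
        pid, status = os.wait()  # blocks until any child exits
        running.discard(pid)
        exit_codes[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

    for job in jobs:
        if len(running) >= max_procs:
            reap_one()  # free a slot before forking again
        pid = os.fork()
        if pid == 0:
            code = 1  # reported if the job raises
            try:
                code = int(job() or 0)
            finally:
                os._exit(code)
        running.add(pid)

    while running:
        reap_one()
    return exit_codes
```

The exit code collected in reap_one plays the role that $? plays after wait in Perl.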
How to run child processes simultaneously? in c
You're calling wait() inside of the loop where you're spawning the children, so it won't continue the loop to start the next child until the current one is done.
You need to call wait() outside of the loop in a separate loop:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait() */
#include <time.h>
#include <unistd.h>     /* fork(), getpid(), sleep() */

int main ( int argc, char *argv[] )
{
    int i, pid, ran;

    /* First loop: start every child without waiting for any of them. */
    for(i = 0; i < atoi(argv[1]); i++) {
        pid = fork();
        srand(time(NULL));
        ran = (rand() % 10) + 1 ;
        if (pid < 0) {
            printf("Error");
            exit(1);
        } else if (pid == 0) {
            printf("Child (%d): %d\n", i + 1, getpid());
            printf("Sleep for = %d\n", ran);
            sleep(ran);
            exit(ran);
        }
    }

    /* Second loop: collect the children in whatever order they finish. */
    for(i = 0; i < atoi(argv[1]); i++) {
        int status = 0;
        pid_t childpid = wait(&status);
        printf("Parent knows child %d is finished. \n", (int)childpid);
    }
    return 0;
}
Run child processes as different user from a long running Python process
Since you mentioned a daemon, I can conclude that you are running on a Unix-like operating system. This matters, because how to do this depends on the kind of operating system. This answer applies only to Unix, including Linux and Mac OS X.
- Define a function that will set the gid and uid of the running process.
- Pass this function as the preexec_fn parameter to subprocess.Popen
subprocess.Popen will use the fork/exec model to use your preexec_fn. That is equivalent to calling os.fork(), preexec_fn() (in the child process), and one of the os.exec*() functions (in the child process), in that order. Since os.setuid, os.setgid, and preexec_fn are all only supported on Unix, this solution is not portable to other kinds of operating systems.
The following code is a script (Python 2.4+) that demonstrates how to do this:
import os
import pwd
import subprocess
import sys


def main(my_args=None):
    if my_args is None:
        my_args = sys.argv[1:]
    user_name, cwd = my_args[:2]
    args = my_args[2:]
    pw_record = pwd.getpwnam(user_name)
    user_name = pw_record.pw_name
    user_home_dir = pw_record.pw_dir
    user_uid = pw_record.pw_uid
    user_gid = pw_record.pw_gid
    env = os.environ.copy()
    env['HOME'] = user_home_dir
    env['LOGNAME'] = user_name
    env['PWD'] = cwd
    env['USER'] = user_name
    report_ids('starting ' + str(args))
    process = subprocess.Popen(
        args, preexec_fn=demote(user_uid, user_gid), cwd=cwd, env=env
    )
    result = process.wait()
    report_ids('finished ' + str(args))
    print('result %s' % result)


def demote(user_uid, user_gid):
    def result():
        # Runs in the child between fork and exec.
        # Note: supplementary groups are not changed here.
        report_ids('starting demotion')
        os.setgid(user_gid)  # drop the group first, while still privileged
        os.setuid(user_uid)
        report_ids('finished demotion')
    return result


def report_ids(msg):
    print('uid, gid = %d, %d; %s' % (os.getuid(), os.getgid(), msg))


if __name__ == '__main__':
    main()
You can invoke this script like this:
Start as root...
(hale)/tmp/demo$ sudo bash --norc
(root)/tmp/demo$ ls -l
total 8
drwxr-xr-x 2 hale wheel 68 May 17 16:26 inner
-rw-r--r-- 1 hale staff 1836 May 17 15:25 test-child.py
Become non-root in a child process...
(root)/tmp/demo$ python test-child.py hale inner /bin/bash --norc
uid, gid = 0, 0; starting ['/bin/bash', '--norc']
uid, gid = 0, 0; starting demotion
uid, gid = 501, 20; finished demotion
(hale)/tmp/demo/inner$ pwd
/tmp/demo/inner
(hale)/tmp/demo/inner$ whoami
hale
When the child process exits, we go back to root in parent ...
(hale)/tmp/demo/inner$ exit
exit
uid, gid = 0, 0; finished ['/bin/bash', '--norc']
result 0
(root)/tmp/demo$ pwd
/tmp/demo
(root)/tmp/demo$ whoami
root
Note that having the parent process wait around for the child process to exit is for demonstration purposes only. I did this so that the parent and child could share a terminal. A daemon would have no terminal and would seldom wait around for a child process to exit.
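On Python 3.9 and later, subprocess.Popen and subprocess.run also accept user and group arguments, which make the child switch ids before exec without a hand-written preexec_fn. A minimal sketch, assuming Python 3.9+ on Unix; run_as is a name invented for this example:

```python
import os
import subprocess


def run_as(user, group, args):
    """Run args under the given user and group; return its stdout as text."""
    completed = subprocess.run(
        args,
        user=user,    # uid or user name (Python 3.9+)
        group=group,  # gid or group name (Python 3.9+)
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # Switching to our own ids needs no privileges; a daemon running as
    # root would pass the target account here instead.
    print(run_as(os.getuid(), os.getgid(), ["id", "-u"]).strip())
```

As with the script above, switching to a different account only works when the parent has the privileges to do so.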
How to use Fork() to create only 2 child processes?
pid = fork ();   /* #1 */
pidb = fork ();  /* #2 */
Let us assume the parent process id is 100, the first fork creates another process 101. Now both 100 & 101 continue execution after #1, so they execute second fork. pid 100 reaches #2 creating another process 102. pid 101 reaches #2 creating another process 103. So we end up with 4 processes.
What you should do is something like this.
if (fork()) {                      /* parent */
    if (fork()) { /* parent */ }
    else        { /* child 2 */ }
} else { /* child 1 */ }
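The process counts can be checked with a small experiment. In this Python sketch every process that exists after the forks writes one byte into a shared pipe, and the original process counts the bytes; count_processes and its guarded flag are made up for the demonstration.

```python
import os


def count_processes(guarded):
    """Return how many processes exist after two fork() calls."""
    read_fd, write_fd = os.pipe()
    root = os.getpid()

    first = os.fork()
    children = [first] if first else []
    if first or not guarded:
        # Unguarded: both the parent and the first child reach this fork.
        second = os.fork()
        children = children + [second] if second else []

    os.write(write_fd, b"x")  # every process checks in exactly once
    os.close(write_fd)
    for pid in children:
        os.waitpid(pid, 0)
    if os.getpid() != root:
        os._exit(0)

    # Only the original process gets here; count the check-ins.
    total = 0
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:
            break
        total += len(chunk)
    os.close(read_fd)
    return total
```

count_processes(False) corresponds to the two bare fork() calls, and count_processes(True) to the nested version where only the parent forks a second time.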