Log output of multiprocessing.Process
The easiest way might be to just override sys.stdout. Slightly modifying an example from the multiprocessing manual:
from multiprocessing import Process
import os
import sys

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    # Redirect this process's stdout to a file named after its pid.
    sys.stdout = open(str(os.getpid()) + ".out", "w")
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    q = Process(target=f, args=('fred',))
    q.start()
    p.join()
    q.join()
And running it:
$ ls
m.py
$ python m.py
$ ls
27493.out 27494.out m.py
$ cat 27493.out
function f
module name: __main__
parent process: 27492
process id: 27493
hello bob
$ cat 27494.out
function f
module name: __main__
parent process: 27492
process id: 27494
hello fred
How should I log while using multiprocessing in Python?
The only way to deal with this non-intrusively is to:
- Spawn each worker process such that its log goes to a different file descriptor (to disk or to a pipe). Ideally, all log entries should be timestamped.
- Your controller process can then do one of the following:
  - If using disk files: coalesce the log files at the end of the run, sorted by timestamp.
  - If using pipes (recommended): coalesce log entries on-the-fly from all pipes into a central log file. (E.g., periodically select from the pipes' file descriptors, perform a merge-sort on the available log entries, and flush to the centralized log. Repeat.)
Does python logging support multiprocessing?
As Matino correctly explained: logging in a multiprocessing setup is not safe, as multiple processes (which know nothing about each other) are writing into the same file, potentially interfering with each other.
Now what happens is that every process holds an open file handle and does an "append write" into that file. The question is under what circumstances the append write is "atomic" (that is, cannot be interrupted by e.g. another process writing to the same file and intermingling its output). This problem applies to every programming language, as in the end they'll do a syscall to the kernel. This answer explains under which circumstances a shared log file is ok.
It comes down to checking your pipe buffer size; on Linux that is defined in /usr/include/linux/limits.h and is 4096 bytes. For other OSes you will find a good list here.
That means: if your log line is less than 4096 bytes (on Linux), then the append is safe, provided the disk is directly attached (i.e. no network in between). But for more details please check the first link in my answer. To test this you can do logger.info('proc name %s id %s %s' % (proc.name, proc.pid, str(proc.name)*5000)) with different lengths. With 5000, for instance, I already got mixed-up log lines in /tmp/test.log.
In this question there are already quite a few solutions to this, so I won't add my own solution here.
Update: Flask and multiprocessing
Web frameworks like Flask are run in multiple workers when hosted by uWSGI or nginx. In that case, multiple processes may write into one log file. Will it have problems?
The error handling in Flask is done via stdout/stderr, which is then caught by the webserver (uWSGI, nginx, etc.), which needs to take care that logs are written in the correct fashion (see e.g. [this flask+nginx example](http://flaviusim.com/blog/Deploying-Flask-with-nginx-uWSGI-and-Supervisor/)), probably also adding process information so you can associate error lines with processes. From Flask's docs:
By default as of Flask 0.11, errors are logged to your webserver’s log automatically. Warnings however are not.
So you'd still have this issue of intermingled log files if you use warn and the message exceeds the pipe buffer size.
How to get Python multiprocessing pool working to write into the same log file
In general it is not a good idea to try and write to the same file at the same time from multiple processes. It might or might not work depending on the OS and implementation details.
In the best case it works. In the worst case you get writes from different processes interleaved in interesting ways.
The way to fix this with the least amount of change to your code is to protect the log file with a Lock object. That is, you create a Lock object before using apply_async, and you pass that lock to the worker processes. In the worker process, you acquire the Lock before writing to the log file. After writing, flush the log file (using its flush method) and release the lock. This should ensure that only one process at a time is writing to the log file. (Note that a plain multiprocessing.Lock cannot be pickled into apply_async's args; either use a Manager().Lock(), or hand the lock to the workers via the pool's initializer.)
Python start multiprocessing without print/logging statements from processes
If I understand you correctly, you want to suppress the printing from one of the processes.
You can achieve this by redirecting the output of the Python interpreter. Add sys.stdout = open("/dev/null", 'w') to the process which you want to "mute".
Full working example below.
from multiprocessing import Process
from time import sleep
import sys

def start_viewer():
    sys.stdout = open("/dev/null", 'w')
    while True:
        print("start_viewer")
        sleep(1)

def start_server():
    while True:
        print("start_server")
        sleep(1)

if __name__ == '__main__':
    processes = [
        Process(target=start_viewer, args=()),
        Process(target=start_server, args=()),
    ]
    for p in processes:
        p.start()
Be aware that /dev/null discards the prints entirely; if you want to keep the output, use a text file instead. Also, to be portable across operating systems you should use os.devnull rather than the hard-coded path.
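For the portable variant, the same idea with os.devnull looks like this (a minimal sketch; the worker names are invented, and the infinite loops from the example above are dropped so it terminates):

```python
import os
import sys
from multiprocessing import Process

def quiet_worker():
    # os.devnull is "/dev/null" on POSIX and "nul" on Windows.
    sys.stdout = open(os.devnull, "w")
    print("this line is discarded")

def loud_worker():
    print("this line is shown")

if __name__ == "__main__":
    for target in (quiet_worker, loud_worker):
        p = Process(target=target)
        p.start()
        p.join()
```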