Log Output of Multiprocessing.Process

Log output of multiprocessing.Process

The easiest way might be to just override sys.stdout. Slightly modifying an example from the multiprocessing manual:

from multiprocessing import Process
import os
import sys

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    # Give this process its own output file, named after its pid.
    sys.stdout = open(str(os.getpid()) + ".out", "w")
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    q = Process(target=f, args=('fred',))
    q.start()
    p.join()
    q.join()

And running it:


$ ls
m.py
$ python m.py
$ ls
27493.out 27494.out m.py
$ cat 27493.out
function f
module name: __main__
parent process: 27492
process id: 27493
hello bob
$ cat 27494.out
function f
module name: __main__
parent process: 27492
process id: 27494
hello fred

How should I log while using multiprocessing in Python?

The only way to deal with this non-intrusively is to:

  1. Spawn each worker process such that its log goes to a different file descriptor (to disk or to a pipe). Ideally, all log entries should be timestamped.
  2. Your controller process can then do one of the following:

    • If using disk files: Coalesce the log files at the end of the run, sorted by timestamp.
    • If using pipes (recommended): Coalesce log entries on-the-fly from all pipes into a central log file (e.g., periodically select on the pipes' file descriptors, merge-sort the available log entries by timestamp, flush them to the centralized log, and repeat). A simplified sketch of this approach follows below.
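
For illustration, here is a simplified sketch of the pipe variant: each worker sends timestamped entries over its own Pipe, and the controller drains whichever pipes are ready and appends the entries to one central file. The worker function, file name and messages are made up, and entries are forwarded as they arrive rather than being strictly merge-sorted.

import time
from multiprocessing import Process, Pipe
from multiprocessing.connection import wait

def worker(conn, name):
    for i in range(3):
        # Timestamp every entry so the central log can be ordered.
        conn.send((time.time(), '%s: message %d' % (name, i)))
        time.sleep(0.1)
    conn.close()

if __name__ == '__main__':
    readers = []
    procs = []
    for name in ('alpha', 'beta'):
        r, w = Pipe(duplex=False)
        p = Process(target=worker, args=(w, name))
        p.start()
        w.close()                       # the controller only reads
        readers.append(r)
        procs.append(p)

    with open('central.log', 'a') as log:
        while readers:
            for conn in wait(readers):  # block until at least one pipe has data
                try:
                    ts, line = conn.recv()
                except EOFError:        # the worker closed its end
                    readers.remove(conn)
                else:
                    log.write('%.6f %s\n' % (ts, line))
                    log.flush()

    for p in procs:
        p.join()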

Does python logging support multiprocessing?

As Matino correctly explained: logging in a multiprocessing setup is not safe, as multiple processes (which know nothing about each other) are writing into the same file, potentially interfering with each other.

Now what happens is that every process holds an open file handle and does an "append write" into that file. The question is under what circumstances the append write is "atomic" (that is, cannot be interrupted by e.g. another process writing to the same file and intermingling its output). This problem applies to every programming language, as in the end they'll do a syscall to the kernel. That answer explains under which circumstances a shared log file is OK.

It comes down to checking your pipe buffer size. On Linux it is defined in /usr/include/linux/limits.h and is 4096 bytes; for other OSes you can find a good list here.

That means: if your log line is shorter than 4096 bytes (on Linux), the append is safe, provided the disk is directly attached (i.e. no network in between). For more details please check the first link in my answer. To test this you can do logger.info('proc name %s id %s %s' % (proc.name, proc.pid, str(proc.name)*5000)) with different lengths. With 5000, for instance, I already got mixed-up log lines in /tmp/test.log.
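
If you want to reproduce that test, a sketch along these lines should work; the file name, line length and process count are arbitrary choices for the experiment:

import logging
import multiprocessing

def writer(line_length, count=50):
    # Each process appends to the same file through its own handler.
    logger = logging.getLogger('interleave-test')
    logger.addHandler(logging.FileHandler('/tmp/test.log'))
    logger.setLevel(logging.INFO)
    proc = multiprocessing.current_process()
    for _ in range(count):
        # Each record is roughly line_length characters long.
        logger.info('proc name %s id %s %s', proc.name, proc.pid, 'x' * line_length)

if __name__ == '__main__':
    # Try e.g. 100 (well below the 4096-byte limit) versus 5000 (well above it).
    procs = [multiprocessing.Process(target=writer, args=(5000,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()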

In this question there are already quite a few solutions to this, so I won't add my own solution here.

Update: Flask and multiprocessing

Web frameworks like Flask are run in multiple workers when hosted by uWSGI or nginx. In that case, multiple processes may write into one log file. Will that cause problems?

Error handling in Flask is done via stdout/stderr, which is then caught by the webserver (uWSGI, nginx, etc.), which needs to take care that the logs are written correctly (see e.g. [this flask+nginx example](http://flaviusim.com/blog/Deploying-Flask-with-nginx-uWSGI-and-Supervisor/)), ideally also adding process information so you can associate error lines with processes. From Flask's docs:

By default as of Flask 0.11, errors are logged to your webserver’s log automatically. Warnings however are not.

So you'd still have the issue of intermingled log lines if you use warnings and the message exceeds the pipe buffer size.
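
If you want to at least be able to attribute intermingled lines to a worker, one option is to include the process id in the log format. This is a generic logging sketch, not Flask- or uWSGI-specific configuration:

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    '%(asctime)s [pid %(process)d] %(levelname)s %(name)s: %(message)s'))

logger = logging.getLogger('app')
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

logger.warning('this line carries the worker pid')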

How to get Python multiprocessing pool working to write into the same log file

In general it is not a good idea to try and write to the same file at the same time from multiple processes. It might or might not work depending on the OS and implementation details.

In the best case it works. In the worst case you get writes from different processes interleaved in interesting ways.

The way to fix this with the least amount of change to your code is to protect the log file with a lock. That is, create a lock before calling apply_async and pass it to the workers; note that a plain multiprocessing.Lock cannot be pickled into a Pool worker's args, so use a multiprocessing.Manager().Lock() or hand the lock to the pool via its initializer. In the worker process, acquire the lock before writing to the log file; after writing, flush the file and release the lock. This ensures that only one process at a time writes to the log file.
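
A minimal sketch of that approach might look like the following; the log file name and messages are made up, and it uses a Manager lock for the reason noted above:

import multiprocessing

def worker(lock, logfile, message):
    # Only one process at a time may hold the lock and write.
    with lock:
        with open(logfile, 'a') as f:
            f.write(message + '\n')
            f.flush()          # flush before the lock is released

if __name__ == '__main__':
    logfile = 'pool.log'
    with multiprocessing.Manager() as manager:
        lock = manager.Lock()  # picklable proxy, safe to pass to Pool workers
        with multiprocessing.Pool(processes=4) as pool:
            results = [
                pool.apply_async(worker, args=(lock, logfile, 'message %d' % i))
                for i in range(10)
            ]
            for r in results:
                r.get()        # re-raise any exception from the workers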

Python start multiprocessing without print/logging statements from processes

If I understand you correctly, you want to suppress the printing from one of the processes.
You can achieve this by redirecting the output of the Python interpreter.
Add sys.stdout = open("/dev/null", 'w') in the process that you want to "mute".

Full working example below.

from multiprocessing import Process
from time import sleep
import sys

def start_viewer():
    sys.stdout = open("/dev/null", 'w')
    while True:
        print("start_viewer")
        sleep(1)

def start_server():
    while True:
        print("start_server")
        sleep(1)

if __name__ == '__main__':
    processes = [
        Process(target=start_viewer, args=()),
        Process(target=start_server, args=()),
    ]

    for p in processes:
        p.start()
Be aware that redirecting to /dev/null sends the prints nowhere; if you want to keep the output, redirect to a text file instead. Also, for cross-platform support you should use os.devnull instead of the hard-coded "/dev/null" path.
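
For example, a portable variant of the redirect could look like this (os.devnull resolves to '/dev/null' on POSIX and 'nul' on Windows):

import os
import sys

# Send this process's prints nowhere, in a cross-platform way.
sys.stdout = open(os.devnull, 'w')
print("this is discarded")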


