Python Process Pool Non-Daemonic

Python Process Pool non-daemonic?

The multiprocessing.pool.Pool class creates the worker processes in its __init__ method, makes them daemonic and starts them, and it is not possible to re-set their daemon attribute to False before they are started (and afterwards it's not allowed anymore). But you can create your own sub-class of multiprocessing.pool.Pool (multiprocessing.Pool is just a wrapper function) and substitute your own multiprocessing.Process sub-class, which is always non-daemonic, to be used for the worker processes.

Here's a full example of how to do this. The important parts are the two classes NoDaemonProcess and MyPool at the top, and the calls to pool.close() and pool.join() on your MyPool instance at the end.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import multiprocessing
# We must import this explicitly, it is not imported by the top-level
# multiprocessing module.
import multiprocessing.pool
import time

from random import randint

class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    def _get_daemon(self):
        return False
    def _set_daemon(self, value):
        pass
    daemon = property(_get_daemon, _set_daemon)

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class MyPool(multiprocessing.pool.Pool):
    Process = NoDaemonProcess

def sleepawhile(t):
    print("Sleeping %i seconds..." % t)
    time.sleep(t)
    return t

def work(num_procs):
    print("Creating %i (daemon) workers and jobs in child." % num_procs)
    pool = multiprocessing.Pool(num_procs)

    result = pool.map(sleepawhile,
                      [randint(1, 5) for x in range(num_procs)])

    # The following is not really needed, since the (daemon) workers of the
    # child's pool are killed when the child is terminated, but it's good
    # practice to cleanup after ourselves anyway.
    pool.close()
    pool.join()
    return result

def test():
    print("Creating 5 (non-daemon) workers and jobs in main process.")
    pool = MyPool(5)

    result = pool.map(work, [randint(1, 5) for x in range(5)])

    pool.close()
    pool.join()
    print(result)

if __name__ == '__main__':
    test()
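
Note: on newer Python 3 versions (3.8 and later) the pool creates its workers through a context object, so simply assigning Process on a Pool sub-class as above no longer works. A commonly used variant, sketched here for Python 3.4+ (the class names are my own), swaps in a custom context whose Process class is non-daemonic:

import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    # In Python 3 'daemon' is a property, so override the property
    # instead of the old _get_daemon/_set_daemon pair.
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonContext(type(multiprocessing.get_context())):
    # The pool asks its context for the Process class to use for workers.
    Process = NoDaemonProcess

class NestablePool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        # Hand the pool a context whose worker processes are never daemonic.
        kwargs['context'] = NoDaemonContext()
        super().__init__(*args, **kwargs)

With this variant, test() above would create a NestablePool instead of a MyPool; the rest of the example stays the same.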

multiprocessing gives AssertionError: daemonic processes are not allowed to have children

The problem appears to be that primefac uses its own multiprocessing.Pool. Unfortunately, while PyPI is down, I can't find the source to the module, but I did find various forks on GitHub, and they all have multiprocessing code.

So, your apparently simple example isn't all that simple—because it's importing and running non-simple code.

By default, all Pool processes are daemonic, so you can't create more child processes from inside another Pool. Usually, attempting to do so is a mistake.
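
For illustration, a minimal self-contained sketch (not from the question) that triggers the error looks like this; the outer pool's workers are daemonic, so the Pool created inside them cannot start its own children:

from multiprocessing import Pool

def nested(_):
    # Each worker of the outer pool is daemonic, so creating another Pool
    # here raises "daemonic processes are not allowed to have children".
    with Pool(2) as inner:
        return inner.map(abs, [-1, -2])

if __name__ == '__main__':
    with Pool(2) as outer:
        print(outer.map(nested, range(2)))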

If you really do want to multiprocess the factors even though some of them are going to multiprocess their own work (quite possibly adding more contention overhead without adding any parallelism), then you just have to subclass Pool and override that—as explained in the related question that you linked.

But the simplest thing is to just not use multiprocessing here, if primefac is already using your cores efficiently. (If you need quasi-concurrency, getting answers as they come in instead of getting them in sequence, I suppose you could do that with a thread pool, but I don't think there's any advantage to that here—you're not using imap_unordered or explicit AsyncResult anywhere.)
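
For completeness, a thread-pool sketch of that quasi-concurrency might look like the following, reusing the factorint import from your own code (this buys you results-as-they-finish, not extra parallelism):

from multiprocessing.dummy import Pool as ThreadPool  # thread-based Pool with the same API
from primefac import factorint

if __name__ == '__main__':
    N = 10**30
    with ThreadPool(4) as pool:
        # imap_unordered yields each factorization as soon as it finishes,
        # rather than in input order.
        for factors in pool.imap_unordered(factorint, range(N, N + 100)):
            print(factors)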

Alternatively, if it's not using all of your cores most of the time, only doing so for the "tricky remainders" at the end of factoring some numbers, while you've got 7 cores sitting idle for 60% of the time… then you probably want to prevent primefac from using multiprocessing at all. I don't know if the module has a public API for doing that. If so, of course, just use it. If not… well, you may have to subclass or monkeypatch some of its code, or, at worst, monkeypatch its import of multiprocessing, and that may not be worth doing.

The ideal solution would probably be to refactor primefac to push the "tricky remainder" jobs onto the same pool you're already using. But that's probably by far the most work, and not that much more benefit.


As a side note, this isn't your problem, but you should have a __main__ guard around your top-level code, like this:

from multiprocessing import Pool
from primefac import factorint

if __name__ == '__main__':
    N = 10**30
    L = range(N, N + 100)
    pool = Pool()
    pool.map(factorint, L)

Otherwise, when run with the spawn or forkserver start methods—and notice that spawn is the only one available on Windows—each pool process is going to try to create another pool of children. So, if you run your code on Windows, you would get this same assertion—as a way for multiprocessing to protect you from accidentally forkbombing your system.

This is explained under safe importing of main module in the "programming guidelines" section of the multiprocessing docs.

Is multiprocessing.Pool not allowed in Airflow task? - AssertionError: daemonic processes are not allowed to have children

Airflow 2 uses a different processing model under the hood to speed up processing while maintaining process-based isolation between running tasks.

That's why it uses forking and multiprocessing under the hood to run tasks, but this also means that if you use multiprocessing yourself, you will hit the limits of Python's multiprocessing, which does not allow a daemonic process to spawn children of its own.

I am not 100% sure if it will work, but you might try setting the execute_tasks_new_python_interpreter configuration option to True: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#execute-tasks-new-python-interpreter . This setting will cause Airflow to start a new Python interpreter when running a task instead of forking/using multiprocessing (though I am not 100% sure of the latter). It will be quite a bit slower (up to a few seconds of overhead per task), as the new Python interpreter will have to reinitialize and import all the Airflow code before running your task.

If that does not work, then you can launch your multiprocessing job using the PythonVirtualenvOperator - that one will launch a new Python interpreter to run your Python code and you should be able to use multiprocessing.
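
As a rough, hypothetical sketch (the DAG and task names are made up, and the exact DAG arguments depend on your Airflow version), that could look something like this:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator

def crunch():
    # Imports must live inside the callable: it is executed by a fresh
    # Python interpreter in the virtualenv, not by the Airflow worker
    # process, so the daemonic-process restriction does not apply.
    from multiprocessing import Pool
    with Pool(4) as pool:
        return pool.map(sum, [range(1000)] * 4)

with DAG(dag_id="multiprocessing_in_venv",
         start_date=datetime(2023, 1, 1),
         schedule_interval=None,
         catchup=False) as dag:
    PythonVirtualenvOperator(
        task_id="crunch",
        python_callable=crunch,
        system_site_packages=True,  # reuse the packages already installed on the worker
    )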

Disadvantage to disabling daemon attribute on a Process?

A daemonic process will be killed if its parent process terminates. A non-daemonic process will block its parent process from terminating until it terminates too.

So if you don't mind sub-processes blocking their parent process, feel free to use non-daemonic processes.
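
A small sketch of the trade-off (the sleep time is arbitrary):

import multiprocessing
import time

def linger():
    time.sleep(5)
    print("child finished")

if __name__ == '__main__':
    child = multiprocessing.Process(target=linger, daemon=True)
    child.start()
    print("parent exiting")
    # With daemon=True the child is terminated as soon as the parent exits,
    # so "child finished" never appears. With daemon=False the interpreter
    # waits about 5 seconds for the child to finish before it actually exits.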


