Can't Detach Child Process When Main Process Is Started from Systemd

The solution is to add

KillMode=process

to the [Service] section of the unit file. The default value is control-group, which means systemd cleans up any child processes when the unit stops.
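For context, here is a minimal sketch of a unit file using this setting (the service description and the path to the daemon are hypothetical):

```ini
[Unit]
Description=Example service whose children survive a stop

[Service]
ExecStart=/usr/local/bin/parent-daemon
# Only the main process is killed on "systemctl stop"; forked children
# are left running instead of being cleaned up with the control group.
KillMode=process

[Install]
WantedBy=multi-user.target
```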

From man systemd.kill:

KillMode= Specifies how processes of this unit shall be killed. One of
control-group, process, mixed, none.

If set to control-group, all remaining processes in the control group
of this unit will be killed on unit stop (for services: after the stop
command is executed, as configured with ExecStop=). If set to process,
only the main process itself is killed. If set to mixed, the SIGTERM
signal (see below) is sent to the main process while the subsequent
SIGKILL signal (see below) is sent to all remaining processes of the
unit's control group. If set to none, no process is killed. In this
case, only the stop command will be executed on unit stop, but no
process will be killed otherwise. Processes remaining alive after stop
are left in their control group and the control group continues to
exist after stop unless it is empty.

What is the way to prevent systemd from killing one of my child processes on a stop?

You can ask systemd to start a process inside a new control group by using the systemd-run command.
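For example (the unit name and helper path here are hypothetical), a service could launch a child that must survive systemctl stop like this:

```shell
# Start the helper as a transient unit in its own control group, outside
# this service's cgroup, so stopping the service does not kill it.
systemd-run --unit=my-helper /usr/local/bin/my-helper
```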

Why child process (daemon=True) not exiting when main process exit in python?

Notes:

  • The use of xrange implies Python 2
  • xrange(1, 4) yields 3 values, not 4 (so there will only be 3 children)

This is not quite how things work. The doc ([Python 2.Docs]: multiprocessing - daemon) should probably be more specific.

The thing is that multiprocessing registers a cleanup function to kill all its daemonic children when exiting. That is done via [Python 2.Docs]: atexit - Exit handlers:

Note: The functions registered via this module are not called when the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when os._exit() is called.

You don't handle the TERM signal (sent by default by the kill command), therefore the cleanup function is not called by the main process (leaving its children running).

I modified your code to better illustrate the behavior.

code00.py:

#!/usr/bin/env python2

import sys
import multiprocessing
import os
import time


print_text_pattern = "Output from process {0:s} - pid: {1:d}, ppid: {2:d}"


def child(name):
    while True:
        print(print_text_pattern.format(name, os.getpid(), os.getppid()))
        time.sleep(1)


def main():
    procs = list()
    for x in xrange(1, 3):
        proc_name = "Child{0:d}".format(x)
        proc = multiprocessing.Process(target=child, args=(proc_name,))
        proc.daemon = True  # x % 2 == 0
        print("Process {0:s} daemon: {1:}".format(proc_name, proc.daemon))
        procs.append(proc)

    for proc in procs:
        proc.start()

    counter = 0
    while counter < 3:
        print(print_text_pattern.format("Main", os.getpid(), os.getppid()))
        time.sleep(1)
        counter += 1


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main()
print("\nDone.")

Notes:

  • Changed slightly the way the child processes are spawned: all of them are created first, and only then started
  • Added some print calls from each process, to track their activity in the stdout - also added some time.sleep calls (1 second), to avoid producing too much output
  • Most important - the main process no longer runs forever. At some point it exits gracefully (after 3 cycles - due to the counter variable), and that's when the behavior mentioned earlier kicks in.
    This could also have been achieved by intercepting the TERM signal (and others that can be explicitly sent by the kill command) and performing the cleanup then - that way the children would also be killed when killing the main process - but that's more complicated
  • I simplified things a bit so that only 2 children are spawned
  • Moved everything into a main function (for structure), enclosed in an if __name__ == "__main__": conditional, so the processes are not spawned if you import the module
  • Give proc.daemon a different value for each child, then monitor the output and the ps -ef | grep "code00.py" output
  • Added an argument (name) to the child func, but that's only for display purposes

Output:

[cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow]> python2 code00.py
Python 2.7.12 (default, Oct 8 2019, 14:14:10) [GCC 5.4.0 20160609] 64bit on linux2

Process Child1 daemon: True
Process Child2 daemon: True
Output from process Main - pid: 1433, ppid: 1209
Output from process Child1 - pid: 1434, ppid: 1433
Output from process Child2 - pid: 1435, ppid: 1433
Output from process Main - pid: 1433, ppid: 1209
Output from process Child2 - pid: 1435, ppid: 1433
Output from process Child1 - pid: 1434, ppid: 1433
Output from process Main - pid: 1433, ppid: 1209
Output from process Child1 - pid: 1434, ppid: 1433
Output from process Child2 - pid: 1435, ppid: 1433
Output from process Child1 - pid: 1434, ppid: 1433
Output from process Child2 - pid: 1435, ppid: 1433

Done.
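The TERM-interception alternative mentioned in the notes can be sketched as follows; the cleanup function here is a stand-in for the one multiprocessing registers, and the approach works the same way in Python 2 and 3:

```python
import atexit
import os
import signal
import sys
import time


def cleanup():
    # Stand-in for multiprocessing's registered cleanup, which terminates
    # all daemonic children of the exiting process.
    print("cleanup ran")


atexit.register(cleanup)


def on_term(signum, frame):
    # Convert SIGTERM into a normal interpreter exit: sys.exit raises
    # SystemExit, which unwinds the stack and runs atexit handlers --
    # unlike the default SIGTERM disposition, which kills the process
    # without running them.
    sys.exit(0)


signal.signal(signal.SIGTERM, on_term)
```

With this handler installed, kill <main_pid> makes the main process run its atexit cleanup (and therefore terminate its daemonic children) instead of dying abruptly.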

My program does not stop running after finishing child process

From my top comment ...

You are creating a zombie process. This is because the parent process is not waiting for the child to complete.

If the child finishes while the parent is still running, it remains a zombie until the parent reaps it with wait. If instead the parent terminates first [it does so relatively quickly here], the child is orphaned and reparented by the kernel as a child of process 1 (e.g. systemd or initd), which reaps it when it exits.

To fix, add: wait(NULL); after the final printf


UPDATE:

Therefore do I need to always put wait(NULL) in these types of situations?

The TL;DR is ... Yes!

This is what you normally want to do for most programs.

One of the few times you would deliberately let a child be orphaned is (e.g.) if you're writing a server program (e.g. inetd).

Servers want to run "detached". That is, as a child of the init process (e.g. systemd, initd, etc.). There is one and only one init process on the system.

All other processes are children of init, even if indirectly. For example, your program's process hierarchy was something like:

init -> window_manager -> xterm -> bash -> your_program

Anyway, most server programs these days are fired up by systemd directly. It examines some config files and starts things based on these config options. So, now, most server programs don't have to do anything special.

But, if you were testing a server of your own, invoked it from the command line, and wanted it to run [detached] in the background, you might do:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/wait.h>

int opt_d;

int
main(int argc, char **argv)
{
    char *cp;
    pid_t childpid;
    int status;

    // skip over program name
    --argc;
    ++argv;

    for (; argc > 0; --argc, ++argv) {
        cp = *argv;
        if (*cp != '-')
            break;

        cp += 2;
        switch (cp[-1]) {
        case 'd':
            opt_d = 1;
            break;
        }
    }

    // detach into background
    if (opt_d) {
        childpid = fork();

        if (childpid == -1) {
            perror("Failed to detach");
            exit(1);
        }

        // exit the parent -- child is now detached [an orphan] and a child
        // of the init process
        if (childpid != 0)
            exit(0);
    }

    childpid = fork();

    if (childpid == -1) {
        perror("Failed to fork");
        exit(1);
    }

    if (childpid == 0) {
        printf("I am in child process with id = %ld\n", (long) getpid());
        execvp(*argv, argv);
        perror("exec failure ");
        exit(1);
    }

    printf("I am in parent process with id = %ld\n", (long) getpid());
    wait(&status);

    return 0;
}

PID returned by spawn differs from Process.pid of the child process

It looks like the background-job operator (&) is causing the intermediate process 1886789: when the command line is run through a shell and put in the background with &, the PID that spawn returns is that of the shell it started, not of the command itself. When I remove the background-job operator, I get the following output:

Hi parent
93185
Start pid 93185
Bye parent
End pid 93185

How to keep child process active when parent is killed/finished (in windows)


  1. In Windows, when you kill the parent process, its children are also killed. (Not like Linux, where after the parent is killed, children get a new parent - init, with PID 1.)
  2. In Windows, the parent will also not exit automatically (its PID stays present) if the child is still running. (This is your question.)

So the solution is to add a step before the end of the main script that kills the child by its PID; once the child is killed, the parent can also exit successfully.

So now, if your main script finishes before the threshold time, no action is required by the child (and the child is killed before the main script completes). If the main script crosses the threshold, the child sends the mail. Either way, the child is killed just before the main script ends, and the main script can exit successfully.
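A minimal sketch of that kill-the-child-before-exit pattern; the threshold value is a placeholder, and the watchdog body (which just prints instead of sending a mail) is hypothetical:

```python
import subprocess
import sys
import time

THRESHOLD = 5  # seconds -- placeholder for the answer's threshold time

# Hypothetical watchdog child: if it survives past the threshold, it would
# send the alert mail (here just a print).
watchdog_code = (
    "import time\n"
    "time.sleep({0})\n"
    "print('threshold exceeded - send mail here')\n"
).format(THRESHOLD)

watchdog = subprocess.Popen([sys.executable, "-c", watchdog_code])

# ... the main script's real work runs here ...
time.sleep(1)

# Before the main script ends, kill the watchdog child by PID so that
# both processes can exit cleanly.
watchdog.terminate()
watchdog.wait()
print("watchdog killed, main script exiting")
```

subprocess.Popen.terminate calls TerminateProcess on Windows and sends SIGTERM on POSIX, so the same shutdown step works on both platforms.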


