The Most Reliable Way to Terminate a Family of Processes

You may want to perform the killing (possibly via a script) from a different login shell, so that you don't accidentally stop or kill the very shell or script doing the killing before it completes its job :)

The first key strategy is not to terminate a process directly, but to:

  • first just "freeze" it (with kill -STOP <pid>) to prevent it from
    spawning other children (this is needed to reliably determine its children,
    otherwise you'll miss some, as explained in this
    Q&A: https://superuser.com/questions/927836/how-to-deal-with-a-memory-leaking-fork-bomb-on-linux/927967#927967)
  • add it to the list of processes to terminate (later)
  • find the list of its children
  • repeat the whole procedure on each of the children (rinse, repeat)
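The freeze-then-descend loop above could look roughly like this in Python. It is only a sketch. It assumes Linux, because it reads each process's parent PID from /proc/<pid>/stat, whose layout is Linux-specific. It also assumes you have permission to signal the target processes. parent_of, children_of and freeze_tree are hypothetical helper names.

```python
import os
import signal


def parent_of(pid):
    """Read the parent PID of pid from /proc/<pid>/stat (assumes Linux)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so only split what follows the last ')': state, ppid, pgrp, ...
    return int(data.rpartition(")")[2].split()[1])


def children_of(pid):
    """Direct children of pid, found by scanning every process's ppid."""
    kids = []
    for name in os.listdir("/proc"):
        if name.isdigit():
            try:
                if parent_of(int(name)) == pid:
                    kids.append(int(name))
            except OSError:
                pass  # the process exited while we were scanning
    return kids


def freeze_tree(root):
    """Freeze root, then its children, grandchildren, ...; return their pids."""
    frozen = []
    pending = [root]
    while pending:
        pid = pending.pop()
        if pid in frozen or pid == os.getpid():
            continue  # already handled, or ourselves: never freeze this script
        try:
            os.kill(pid, signal.SIGSTOP)  # freeze first so it cannot fork more
        except ProcessLookupError:
            continue  # already gone
        frozen.append(pid)
        pending.extend(children_of(pid))  # only now list its children
    return frozen
```

freeze_tree returns the frozen PIDs (root first), i.e. the "list of processes to terminate (later)". SIGSTOP is delivered asynchronously, so a process that is in the middle of fork() when the signal is sent can still produce one more child after the scan. Rescanning the children of the frozen PIDs once more before killing them is a cheap safeguard.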

Once the entire ppid-based process tree is frozen, you can start locating and freezing related processes based on process groups. You can still determine these process groups reliably as long as the parents of the processes that changed their process group are still alive (since their ppid does not change). Add these groups to a list of pgids to be nuked, and freeze any new ppid-based subtrees you find in these groups as described above:

  • if their parents are still alive, they should already be frozen, since
    they're in the frozen ppid-based process tree
  • if they're orphans, they will be killed when the entire pgid is nuked
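The process-group step might be sketched like this. It again assumes Linux's /proc for enumerating processes. os.getpgid and os.killpg themselves are standard Unix calls in Python's os module. The helper names are hypothetical. To stay short, the sketch freezes only the group members themselves rather than re-running the ppid walk on each of them.

```python
import os
import signal


def pids_in_group(pgid):
    """Every process whose process group is pgid (scans /proc, assumes Linux)."""
    found = []
    for name in os.listdir("/proc"):
        if name.isdigit():
            try:
                if os.getpgid(int(name)) == pgid:
                    found.append(int(name))
            except OSError:
                pass  # the process exited during the scan
    return found


def freeze_groups(frozen_pids):
    """Collect the pgids of already frozen processes and freeze all members."""
    pgids = set()
    for pid in frozen_pids:
        try:
            pgids.add(os.getpgid(pid))
        except OSError:
            pass  # already gone
    pgids.discard(os.getpgrp())  # never touch the group this script runs in
    for pgid in pgids:
        for pid in pids_in_group(pgid):
            try:
                os.kill(pid, signal.SIGSTOP)
            except OSError:
                pass
    return pgids


def nuke_groups(pgids, sig=signal.SIGKILL):
    """Signal each whole group, orphans included (like kill -KILL -- -PGID)."""
    for pgid in pgids:
        try:
            os.killpg(pgid, sig)
        except ProcessLookupError:
            pass  # the group is already empty
```

nuke_groups is the "nuke the whole pgid" step. os.killpg also reaches orphaned members, because it addresses the group rather than individual PIDs.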

Related processes can be discovered by session ID in much the same way as by process group ID. The difference is that killing must be done by pid, since the kill command accepts a process group ID but not a session ID.
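A session-based variant can be sketched under the same Linux /proc assumption, with hypothetical helper names. Because there is no kill-by-session call, it first freezes each member by pid and then signals each one by pid.

```python
import os
import signal


def pids_in_session(sid):
    """Every process whose session ID is sid (scans /proc, assumes Linux)."""
    found = []
    for name in os.listdir("/proc"):
        if name.isdigit():
            try:
                if os.getsid(int(name)) == sid:
                    found.append(int(name))
            except OSError:
                pass  # the process exited during the scan
    return found


def kill_session(sid, sig=signal.SIGKILL):
    """Freeze, then signal, every member of session sid one pid at a time."""
    if sid == os.getsid(0):
        raise ValueError("refusing to kill the session this script runs in")
    pids = pids_in_session(sid)
    for step in (signal.SIGSTOP, sig):  # freeze everyone first, then kill
        for pid in pids:
            try:
                os.kill(pid, step)
            except ProcessLookupError:
                pass  # already gone
    return pids
```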

Another way to find potentially related processes is by their tty, if they have one. But be careful: they might not be descendants of the process you want to kill, but ancestors or siblings. You can still freeze the ppid-based subtrees and groups you find this way while you investigate. You can always "thaw" them later (with kill -CONT) if they don't need to be killed.
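Finding candidates by terminal could be sketched as follows. It assumes Linux, where field 7 of /proc/<pid>/stat (tty_nr) holds the device number of the controlling terminal and 0 means there is none. The helper names are hypothetical. The matches may be ancestors or siblings, possibly including your own shell, so the sketch only lists them and leaves the decision to freeze to you.

```python
import os


def _stat_fields(pid):
    """Fields of /proc/<pid>/stat after the ')' that closes the command name."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rpartition(")")[2].split()


def tty_of(pid):
    """Controlling-terminal device number (tty_nr) of pid; 0 means none."""
    return int(_stat_fields(pid)[4])  # state, ppid, pgrp, session, tty_nr


def pids_sharing_tty(pid):
    """All processes (pid included) on pid's terminal, except this script."""
    tty = tty_of(pid)
    if tty == 0:
        return []  # no controlling terminal, nothing to match on
    me = os.getpid()
    found = []
    for name in os.listdir("/proc"):
        if name.isdigit() and int(name) != me:
            try:
                if tty_of(int(name)) == tty:
                    found.append(int(name))
            except OSError:
                pass  # the process exited during the scan
    return found
```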

I don't know how to locate descendant process subtrees that were decoupled by processes declaring themselves session leaders (with setsid(), which changes both their sid and pgid) if their parents have died and they have no controlling terminal.

Once the entire list of subtrees is frozen, the processes can be killed (by pid or pgid as needed) or thawed to continue their work if desired.
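The final step can be sketched with a hypothetical finish helper. It sends SIGKILL, which takes effect even on a stopped process, to the doomed PIDs and groups. Alternatively, it sends SIGCONT to thaw anything that turned out to be innocent. state_of is a hypothetical, Linux-only helper that reads the one-letter state from /proc/<pid>/stat, where "T" means stopped.

```python
import os
import signal


def finish(pids, pgids=(), kill=True):
    """Kill (SIGKILL) or thaw (SIGCONT) the frozen processes and groups."""
    sig = signal.SIGKILL if kill else signal.SIGCONT
    for pgid in pgids:
        try:
            os.killpg(pgid, sig)  # whole group at once, orphans included
        except ProcessLookupError:
            pass
    for pid in pids:
        try:
            os.kill(pid, sig)  # individual processes, e.g. found by session
        except ProcessLookupError:
            pass


def state_of(pid):
    """One-letter state from /proc/<pid>/stat (assumes Linux); 'T' = stopped."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rpartition(")")[2].split()[0]
```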

What's the best way to send a signal to all members of a process group?

You don't say if the tree you want to kill is a single process group. (This is often the case if the tree is the result of forking from a server start or a shell command line.) You can discover process groups using GNU ps as follows:

 ps x -o  "%p %r %y %x %c "

If it is a process group you want to kill, just use the kill(1) command but instead of giving it a process number, give it the negation of the group number. For example to kill every process in group 5112, use kill -TERM -- -5112.
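From Python, you can do the same thing with os.killpg, which sends a signal to a whole process group just as kill does with a negated group number. This is a minimal sketch. group_of and signal_group are hypothetical helper names. os.getpgid returns the same value that ps shows in its PGID (%r) column.

```python
import os
import signal


def group_of(pid):
    """Process group ID of pid (the PGID / %r column of ps)."""
    return os.getpgid(pid)


def signal_group(pgid, sig=signal.SIGTERM):
    """Python equivalent of `kill -TERM -- -PGID`."""
    # os.killpg(pgid, sig) does the same as os.kill(-pgid, sig): the
    # negative pid tells kill(2) to target the whole process group.
    os.killpg(pgid, sig)
```

A command started with subprocess.Popen(..., start_new_session=True) becomes the leader of its own group, so signal_group(group_of(p.pid)) reaches it and everything it forked into that group.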

How to kill all child processes after parent process termination?

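Here's a solution that is probably more portable.

fork(2) returns the PID of each new child to the parent, so you can store these PIDs and later use kill(2) to send a signal to the children and terminate them.

Note that sending SIGKILL or SIGTERM requires permission. The sender must be privileged, or its real or effective user ID must match the target's real or saved set-user-ID. That normally holds for your own children, but not if a child has changed its credentials (for example by executing a set-user-ID program). If the parent lacks that permission, it can send SIGCONT instead, which only requires that both processes belong to the same session. The child then needs a SIGCONT handler that makes it exit.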

Warning: calling exit() from a signal handler is not safe. I checked the manual (man 7 signal; newer man-pages moved this list to signal-safety(7)), and exit() is not async-signal-safe. Use _exit(), _Exit() or abort() instead.

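Some example code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NCHILDREN 6

/* Runs in a child when it receives SIGCONT: terminate immediately.
   _exit() is async-signal-safe, unlike exit(). */
void handler(int sig)
{
    (void)sig;
    _exit(0);
}

int main(void)
{
    pid_t children[NCHILDREN];

    for (int i = 0; i < NCHILDREN; i++) {  /* create 6 child processes */
        children[i] = fork();
        if (children[i] == -1) {
            /* never pass -1 to kill(): that would signal every process */
            perror("fork");
            for (int j = 0; j < i; j++)
                kill(children[j], SIGCONT);  /* stop the children already started */
            exit(1);
        }
        if (children[i] == 0) {
            /* child: exit as soon as the parent sends SIGCONT */
            signal(SIGCONT, handler);
            printf("Started [son] pid %d from [parent] pid %d\n", getpid(), getppid());
            fflush(stdout);  /* _exit() in the handler does not flush stdio */
            sleep(10);  /* the child would normally run for 10 seconds */
            printf("Exited [son] pid %d from [parent] pid %d\n", getpid(), getppid());
            exit(0);
        }
    }

    /* parent */
    sleep(5);  /* the parent works for 5 seconds, then shuts everything down */
    for (int i = 0; i < NCHILDREN; i++)
        kill(children[i], SIGCONT);  /* ask every child to terminate */
    for (int i = 0; i < NCHILDREN; i++)
        waitpid(children[i], NULL, 0);  /* reap the children */
    printf("Parent terminated\n");
    return 0;
}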

How does Ctrl-C terminate a child process?

By default, signals are handled by the kernel. Old Unix systems had 15 signals; now there are more. You can check /usr/include/signal.h (or run kill -l). Pressing CTRL+C makes the terminal driver send the signal named SIGINT to the foreground process group.

The default action for each signal is also defined in the kernel, and it usually terminates the process that received the signal.

All signals except SIGKILL and SIGSTOP can be caught and handled by a program.
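Here is a small Python illustration of the difference. It is a sketch, and on_sigint and install_handlers are hypothetical names. It must run in the main thread, since Python only allows installing signal handlers there.

```python
import os
import signal

received = []


def on_sigint(signum, frame):
    """Python-level handler: just remember which signal arrived."""
    received.append(signum)


def install_handlers():
    """Catch SIGINT, then try (and fail) to catch SIGKILL and SIGSTOP."""
    signal.signal(signal.SIGINT, on_sigint)
    uncatchable = []
    for sig in (signal.SIGKILL, signal.SIGSTOP):
        try:
            signal.signal(sig, on_sigint)
        except OSError:  # the kernel rejects the request (EINVAL)
            uncatchable.append(sig)
    return uncatchable
```

Asking for a SIGKILL or SIGSTOP handler fails with OSError. By contrast, once the handler is installed, a SIGINT sent with signal.raise_signal(signal.SIGINT) runs on_sigint instead of terminating the process.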

And this is what the shell does:

  • When the shell runs in interactive mode, it uses special signal handling for this mode.
  • When you run a program, for example find, the shell:

    • forks itself
    • in the child, restores the default signal handling
    • replaces the child with the given command (e.g. with find)
    • when you press CTRL+C, the child receives SIGINT and, with the default action, terminates (the child can implement its own signal handling too). The shell itself survives, because it handles SIGINT itself or, with job control, is not in the foreground process group at all.
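The fork / reset / exec sequence can be mimicked in Python. This is a simplified model rather than the implementation of any particular shell. spawn_foreground and wait_foreground are hypothetical names. The parent simply ignores SIGINT while it waits, and there is no job control; a real interactive shell would also put the child in its own foreground process group.

```python
import os
import signal


def spawn_foreground(argv):
    """Fork; in the child restore default SIGINT handling and exec argv."""
    pid = os.fork()
    if pid == 0:
        try:
            # child: reset SIGINT to the default action (terminate) ...
            signal.signal(signal.SIGINT, signal.SIG_DFL)
            # ... then replace this process image with the command, e.g. find
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec failed
    return pid


def wait_foreground(pid):
    """Wait for pid while ignoring SIGINT, as the simplified "shell" would.

    Returns the number of the signal that killed the child, or None.
    """
    previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
    try:
        _, status = os.waitpid(pid, 0)
    finally:
        signal.signal(signal.SIGINT, previous)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
```

Suppose you send SIGINT to the child's pid, as the terminal would on CTRL+C. wait_foreground then reports signal.SIGINT, because the child was reset to the default action before exec.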

You can trap signals in your shell script too...

You can also set signal handling for your interactive shell: try entering this at the top of your ~/.profile. (Make sure you're already logged in and test it from another terminal, since you could lock yourself out.)

trap 'echo "Dont do this"' 2

Now, every time you press CTRL+C in your shell, it will print a message. Don't forget to remove the line!

If you're interested, you can check the signal handling of the plain old /bin/sh in the source code here.

There was some misinformation in the comments above (now deleted), so for anyone interested, here is a very nice link on how signal handling works.

What exactly is Python multiprocessing Module's .join() Method Doing?

The join() method, when used with threading or multiprocessing, is not related to str.join(); it doesn't concatenate anything. Rather, it just means "wait for this [thread/process] to complete". The name join is used because the multiprocessing module's API is meant to look as similar as possible to the threading module's API, and the threading module uses join for its Thread object. Using the term join to mean "wait for a thread to complete" is common across many programming languages, so Python adopted it as well.
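Here is a short sketch of join() with and without a timeout. say_hello mirrors the name used in the question but is a hypothetical stand-in that only sleeps. The sketch assumes the "fork" start method (Unix only), chosen so the snippet can run at top level without an if __name__ == "__main__": guard.

```python
import multiprocessing
import time

# Assumption: the "fork" start method (Unix only), so no __main__ guard is needed.
ctx = multiprocessing.get_context("fork")


def say_hello(delay):
    """Hypothetical child task: pretend to work for `delay` seconds."""
    time.sleep(delay)


def start_child(delay):
    """Start say_hello(delay) in a new process and return the Process object."""
    p = ctx.Process(target=say_hello, args=(delay,))
    p.start()
    return p
```

After p = start_child(0.1); p.join(), the child has finished and p.exitcode is 0. Calling p.join(timeout=0.1) on a child that sleeps longer returns after about 0.1 seconds, but p.is_alive() is still True. A timed join does not end the child; p.terminate() does.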

Now, the reason you see the 20-second delay both with and without the call to join() is that, by default, when the main process is ready to exit, it implicitly calls join() on all running non-daemonic multiprocessing.Process instances. This isn't as clearly stated in the multiprocessing docs as it should be, but it is mentioned in the Programming Guidelines section:

Remember also that non-daemonic processes will be joined automatically.

You can override this behavior by setting the daemon flag on the Process to True prior to starting the process:

from multiprocessing import Process

p = Process(target=say_hello)
p.daemon = True  # must be set before start()
p.start()
# Both parent and child will exit here, since the main process has completed.

If you do that, the child process will be terminated as soon as the main process completes:

daemon

The process’s daemon flag, a Boolean value. This must be set before start() is called.

The initial value is inherited from the creating process.

When a process exits, it attempts to terminate all of its daemonic child processes.
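To see both behaviours side by side, this sketch runs a tiny throwaway program in a fresh interpreter and times how long it takes to exit. It runs once with a normal child and once with a daemonic one. PROGRAM and seconds_until_exit are hypothetical names. The sketch assumes the "fork" start method (Unix only), so the child can find the nap function defined in a -c program.

```python
import subprocess
import sys
import time

# Hypothetical throwaway program: start a child that naps, optionally as a
# daemon, and let the main process reach its end without calling join().
PROGRAM = """
import multiprocessing, sys, time

def nap(seconds):
    time.sleep(seconds)

if __name__ == "__main__":
    ctx = multiprocessing.get_context("fork")  # assumption: Unix only
    p = ctx.Process(target=nap, args=(float(sys.argv[1]),))
    p.daemon = sys.argv[2] == "daemon"  # must be set before start()
    p.start()
"""


def seconds_until_exit(delay, daemon):
    """Run PROGRAM in a fresh interpreter and time how long it takes to exit."""
    mode = "daemon" if daemon else "normal"
    start = time.monotonic()
    subprocess.run([sys.executable, "-c", PROGRAM, str(delay), mode], check=True)
    return time.monotonic() - start
```

With a 2-second nap, the normal run should take at least 2 seconds, because the main process implicitly joins its non-daemonic child at exit. The daemonic run should return almost immediately, because that child is terminated instead.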


