How to Signal an Application Without Killing It in Linux

How to signal an application without killing it in Linux?

From the GNU docs about signal handling:

The SIGUSR1 and SIGUSR2 signals are set aside for you to use any way you want. They're useful for simple interprocess communication, if you write a signal handler for them in the program that receives the signal.
There is an example showing the use of SIGUSR1 and SIGUSR2 in section Signaling Another Process.
The default action is to terminate the process.
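As a rough shell illustration of the quoted passage (a sketch, not taken from the GNU manual): a script that installs a handler for SIGUSR1 survives the signal and can react to it. The names on_usr1 and usr1_count are made up for this example.

```shell
#!/bin/bash
# usr1_count and on_usr1 are hypothetical names used only in this sketch.
usr1_count=0
on_usr1() { usr1_count=$((usr1_count + 1)); }

# Without this trap, the default action of SIGUSR1 would terminate the shell.
trap on_usr1 USR1

kill -USR1 $$    # signal ourselves; the handler runs once the command finishes
kill -USR1 $$
echo "still running, SIGUSR1 handled $usr1_count times"
```

From another terminal, the same signal would be sent with kill -USR1 followed by the PID of the script.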

The default action for SIGINFO is to do nothing, so it may be more suitable on systems that provide it (see the update below regarding Linux):

SIGINFO: Information request. In 4.4 BSD and the GNU system, this signal is sent to all the processes in the foreground process group of the controlling terminal when the user types the STATUS character in canonical mode; see section Characters that Cause Signals.
If the process is the leader of the process group, the default action is to print some status information about the system and what the process is doing. Otherwise the default is to do nothing.

SIGHUP is emitted when the controlling terminal is closed, but since most daemons are not attached to a terminal it is not uncommon to use it as "reload":

Daemon programs sometimes use SIGHUP as a signal to restart themselves, the most common reason for this being to re-read a configuration file that has been changed.
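The "reload on SIGHUP" convention can be sketched in shell as follows. This is only an illustration: the config file, the interval setting and the load_config function are invented for the example and do not come from any particular daemon.

```shell
#!/bin/bash
# Hypothetical config file; a real daemon would use a fixed, documented path.
config_file=$(mktemp)
echo "interval=5" > "$config_file"

# load_config is a made-up helper that re-reads the one setting we care about.
load_config() { interval=$(sed -n 's/^interval=//p' "$config_file"); }
trap load_config HUP

load_config
echo "before reload: interval=$interval"

echo "interval=30" > "$config_file"   # the administrator edits the file...
kill -HUP $$                          # ...and asks the process to re-read it
echo "after reload: interval=$interval"
rm -f "$config_file"
```

The process keeps running throughout; only its configuration changes.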

BTW, your watchdog could read a config file from time to time in order to know if it should relaunch the process.

My personal favorite for a watchdog is supervisor.

$ supervisorctl start someapp
someapp: started

$ supervisorctl status someapp
someapp                RUNNING    pid 16583, uptime 19:16:26

$ supervisorctl stop someapp
someapp: stopped

Run kill -l to list the signals available on your platform and try some of them. SIGUSR1 seems like a bad choice if the application has no handler for it, since its default action is to terminate the process.

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

[UPDATE]

Carpetsmoker comments about differences in behavior between Linux and BSDs:

SIGINFO seems to work different on GNU libc & BSD; on BSD, it works as you describe, but on Linux, it either doesn't exist, or is the same as SIGPWR... The GNU libc manual seems incorrect in this regard (your kill -l output also doesn't show SIGINFO)... I don't know why GNU doesn't support it, because I find it to be very useful... – Carpetsmoker
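One way to check which signal names a given platform knows, sketched under the assumption that the shell's trap builtin rejects unknown names (has_signal is a hypothetical helper, not a standard command):

```shell
#!/bin/bash
# has_signal is a hypothetical helper: it tries to set a trap for the given
# name in a subshell, which fails if the shell does not know the signal.
has_signal() { ( trap : "$1" ) 2>/dev/null; }

for name in HUP USR1 USR2 INFO; do
    if has_signal "$name"; then
        echo "$name: available"
    else
        echo "$name: not available on this platform"
    fi
done
```

On Linux one would expect INFO to be reported as unavailable, in line with the comment above, while a BSD system should accept it.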

Signal that can be used just by my application

Some signals are meant for use by user programs. From man signal:

The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

SIGSTOP will always stop the program and SIGKILL will always terminate the program.

There are two user-defined signals commonly used for signal communication between processes:

SIGUSR1 ... User-defined signal 1
SIGUSR2 ... User-defined signal 2

There is also a whole range of real-time signals, from SIGRTMIN to SIGRTMAX, that can be used as user-defined signals. POSIX requires at least 8 of them, and Linux supports 33 (numbered 32 to 64), although glibc reserves the first two or three for its threading implementation, which is why SIGRTMIN typically shows up as 34 in kill -l. An application is free to use these for anything it wants.
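A small sketch of using a real-time signal as a private, application-defined signal. It assumes bash, whose trap and kill builtins accept names of the form RTMIN+n; the variable rt_received is invented for the example.

```shell
#!/bin/bash
# Assumption: bash, whose trap and kill builtins accept names like RTMIN+1.
rt_received=no
trap 'rt_received=yes' RTMIN+1

kill -s RTMIN+1 $$    # private "user-defined" signal sent to ourselves
echo "received SIGRTMIN+1: $rt_received"
```

Referring to the signal as an offset from RTMIN, rather than by number, keeps the script independent of how many real-time signals the C library reserves.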

Send signal to process from command line

As far as I understand your question, you want to signal a process by its name, not by its PID. This can be achieved by combining two commands, with ps told to print only the PIDs (no header, no other columns):

kill -s signal $(ps -C executable -o pid=)

Does it kill the process that signals?

kill can kill. It doesn't necessarily.

From man kill:

The command kill sends the specified signal to the specified processes or process groups.

That means the kill command is used to send any signal in general.
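To make that concrete, here is a sketch in which kill is used four times and only the last call ends the process. The sleep child is just a disposable target for the experiment.

```shell
#!/bin/bash
sleep 300 &                 # a throwaway child process to experiment on
pid=$!

kill -0 "$pid" && echo "signal 0: process $pid exists (nothing was delivered)"
kill -STOP "$pid"           # pause the process
kill -CONT "$pid"           # resume it; it keeps running afterwards
kill -TERM "$pid"           # only now ask it to terminate

status=0
wait "$pid" || status=$?
echo "exit status: $status" # 128 + signal number, so 143 for SIGTERM
```

The exit status reported by wait shows which signal ended the child: it was SIGTERM, not the earlier STOP or CONT.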

If it kills a process it means that it's similar to do exit(0), or
does the process resume after the signal is sent back?

From here:

The SIGKILL signal is used to cause immediate program termination. It
cannot be handled or ignored, and is therefore always fatal. It is
also not possible to block this signal.

If a process receives the SIGKILL signal, it terminates immediately (no destructors called, no cleanup done). The only processes that do not terminate immediately are those in uninterruptible sleep (state D in ps); they die as soon as they leave that state.
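The difference between a signal that can be ignored and SIGKILL can be sketched like this. The FIFO is only a synchronization aid invented for the example, so that the parent does not send anything before the child has set up its trap.

```shell
#!/bin/bash
# The FIFO only lets the parent wait until the child has installed its trap.
dir=$(mktemp -d)
mkfifo "$dir/ready"

sh -c 'trap "" TERM; echo ready; while :; do sleep 1; done' > "$dir/ready" &
pid=$!
read -r line < "$dir/ready"  # blocks until the child prints "ready"

kill -TERM "$pid"            # ignored by the child
kill -KILL "$pid"            # cannot be caught, blocked or ignored
status=0
wait "$pid" || status=$?
echo "exit status: $status"  # 128 + 9 = 137: ended by SIGKILL, not SIGTERM
rm -rf "$dir"
```

A status of 143 would have meant that SIGTERM ended the child; 137 shows that only SIGKILL did.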

A full list of signals available on Linux is found here.

What's the best way to send a signal to all members of a process group?

You don't say if the tree you want to kill is a single process group. (This is often the case if the tree is the result of forking from a server start or a shell command line.) You can discover process groups using GNU ps as follows:

 ps x -o  "%p %r %y %x %c "

If it is a process group you want to kill, just use the kill(1) command but instead of giving it a process number, give it the negation of the group number. For example to kill every process in group 5112, use kill -TERM -- -5112.
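A sketch of signalling a whole group at once. It assumes the setsid utility from util-linux is available and uses it only to obtain a fresh process group to experiment on; the FIFO and the variable names are invented for the example.

```shell
#!/bin/bash
# Assumes the setsid utility is installed; the FIFO is only used to learn the
# new group's ID once both of its background children exist.
dir=$(mktemp -d)
mkfifo "$dir/pgid"

setsid sh -c 'sleep 300 & sleep 300 & echo "$$" > "$1"; wait' sh "$dir/pgid" &
leader=$!
read -r pgid < "$dir/pgid"   # PID of the group leader, which is also the PGID

kill -TERM -- "-$pgid"       # negative number: signal every process in the group
status=0
wait "$leader" || status=$?
echo "group $pgid signalled, leader exit status: $status"
rm -rf "$dir"
```

The -- matters: without it, kill could mistake the negative group number for an option.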

C++ application on Linux, hook before kill

~~If I'm right the OOM will send your process a SIGTERM signal, so you can handle it as you want.~~

I was wrong: most probably the OOM killer will send SIGKILL, and you cannot do anything about that. Under certain circumstances, though, you will get a SIGTERM first.

(untested draft)

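#include <csignal>

// Set by the handler; only async-signal-safe work belongs inside a handler.
volatile std::sig_atomic_t g_terminate = 0;

void signal_handler(int signal) {
    // Your handling code here
    g_terminate = 1;
}

int main() {
    // Install handler (assign handler to signal)
    std::signal(SIGTERM, signal_handler);
    // Rest of your application: check g_terminate and clean up when it is set
}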

C counterpart:

#include <signal.h>
#include <stdio.h>   /* printf */
#include <unistd.h>

void signal_handler(int signo)
{
  if (signo == SIGTERM) {
    // your handling code (async-signal-safe calls only)
  }
}

int main(void)
{
  if (signal(SIGTERM, signal_handler) == SIG_ERR) {
    printf("\nError installing handler\n");
  }
  // Rest of your application
}

Be careful when handling signals, as you are overriding the default behavior. Your program should not ignore important signals like SIGTERM or SIGINT: the handling function has to do the work of finishing the program or maybe calling the original handler.

On the other hand, you can play with it: if you are sure the problem is allocated memory, you could free unused space and try to continue working (but you need to be certain that this was the reason for the signal).
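The advice about doing the cleanup and then falling back to the original behavior can be sketched in shell as well. This is an illustration, not a translation of the C code above: the marker file and the on_term function are made up, and the handler restores the default action and re-sends the signal to itself after cleaning up.

```shell
#!/bin/bash
marker=$(mktemp)   # hypothetical file standing in for "state saved on exit"

# The child is a separate bash so that it can terminate without taking this
# script down with it.
status=0
bash -c '
    marker=$1
    on_term() {
        echo "cleaned up" > "$marker"   # do the cleanup work first
        trap - TERM                     # restore the default action
        kill -TERM "$$"                 # re-raise: die by SIGTERM as the sender expects
    }
    trap on_term TERM
    kill -TERM "$$"                     # stands in for an external kill
    echo "not reached"
' bash "$marker" || status=$?

result=$(cat "$marker")
echo "child exit status: $status, marker file says: $result"
rm -f "$marker"
```

Because the child dies from the signal itself rather than calling exit, its parent can still tell from the exit status that it was terminated by SIGTERM.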

Signal without any default action except SIGINFO

See man 7 signal for a list of all signals and their default dispositions. Currently I see these as being ignored by default:

   Signal     Value     Action   Comment
   ──────────────────────────────────────────────────────────────
   SIGCHLD    20,17,18    Ign    Child stopped or terminated
   SIGURG     16,23,21    Ign    Urgent condition on socket (4.2BSD)
   SIGWINCH   28,28,20    Ign    Window resize signal (4.3BSD, Sun)

As you can see, there are really not many choices. I would say that of the above, SIGCHLD might be OK if you are sure you have no child processes, or SIGURG if you are sure you have no sockets which might be signaled that way. Finally, SIGWINCH is only appropriate if you are sure your program will not have a controlling terminal which could be resized.
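A quick way to see the "ignored by default" behavior, sketched for two of the signals in the table (SIGCHLD is left out here because shells install their own handling for it):

```shell
#!/bin/bash
# No traps are installed here, so each signal gets its default handling.
survived=""
for sig in WINCH URG; do
    kill -s "$sig" $$        # default action is "ignore", so nothing happens
    survived="$survived $sig"
done
echo "still running after:$survived"
```

Replacing one of the names with USR1 would end the script at that point, since the default action for SIGUSR1 is to terminate.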

In what order should I send signals to gracefully shutdown processes?

SIGTERM tells an application to terminate. The other signals tell the application other things which are unrelated to shutdown but may sometimes have the same result. Don't use those. If you want an application to shut down, tell it to. Don't give it misleading signals.

Some people believe the smart standard way of terminating a process is by sending it a slew of signals, such as HUP, INT, TERM and finally KILL. This is ridiculous. The right signal for termination is SIGTERM and if SIGTERM doesn't terminate the process instantly, as you might prefer, it's because the application has chosen to handle the signal. Which means it has a very good reason to not terminate immediately: It's got cleanup work to do. If you interrupt that cleanup work with other signals, there's no telling what data from memory it hasn't yet saved to disk, what client applications are left hanging or whether you're interrupting it "mid-sentence" which is effectively data corruption.

For more information on what the real meaning of the signals is, see sigaction(2). Don't confuse "Default Action" with "Description", they are not the same thing.

SIGINT is used to signal an interactive "keyboard interrupt" of the process. Some programs may handle the situation in a special way for the purpose of terminal users.

SIGHUP is used to signal that the terminal has disappeared and is no longer looking at the process. That is all. Some processes choose to shut down in response, generally because their operation makes no sense without a terminal; others choose to do other things, such as rechecking configuration files.

SIGKILL is used to forcefully remove the process from the kernel. It is special in the sense that it's not actually a signal to the process but rather gets interpreted by the kernel directly.

Don't send SIGKILL. SIGKILL should certainly never be sent by scripts. If the application handles SIGTERM, it can take a second to clean up, it can take a minute, it can take an hour, depending on what the application has to get done before it's ready to end. Any logic that "assumes" an application's cleanup sequence has taken long enough and needs to be shortcut or SIGKILLed after X seconds is just plain wrong.

The only reason why an application would need a SIGKILL to terminate, is if something bugged out during its cleanup sequence. In which case you can open a terminal and SIGKILL it manually. Aside from that, the only one other reason why you'd SIGKILL something is because you WANT to prevent it from cleaning itself up.

Even though half the world blindly sends SIGKILL after 5 seconds, it's still a horribly wrong thing to do.

Sending SIGINT to forked exec process which runs script does not kill it

A SOLUTION:

Add the line trap exit SIGINT at the top of your work.sh script to give it an explicit SIGINT handler:

#! /bin/bash

trap exit SIGINT

COUNTER=0
while true
do
    ((COUNTER+=1))
    echo "#${COUNTER} Working..."
    sleep 1
done

Running the work executable now prints:

#1 Working...
#2 Working...
#3 Working...
#4 Working...
#5 Working...

after which it returns to the shell.

THE PROBLEM:

I found this webpage linked in a comment to this question on Unix Stack Exchange. (For the sake of completeness, here is also the webpage linked in the accepted answer.) Here's a quote that might explain what's going on:

bash is among a few shells that implement a wait and cooperative exit approach at handling SIGINT/SIGQUIT delivery. When interpreting a script, upon receiving a SIGINT, it doesn't exit straight away but instead waits for the currently running command to return and only exits (by killing itself with SIGINT) if that command was also killed by that SIGINT. The idea is that if your script calls vi for instance, and you press Ctrl+C within vi to cancel an action, that should not be considered as a request to abort the script.
So imagine you're writing a script and that script exits normally upon receiving SIGINT. That means that if that script is invoked from another bash script, Ctrl-C will no longer interrupt that other script.
This kind of problem can be seen with actual commands that do exit normally upon SIGINT by design.

EDIT:

I found another Unix stackexchange answer that explains it even better. If you look at bash(1) man pages the following is also quite explanatory:

Non-builtin commands run by bash have signal handlers set to the values inherited by the shell from its parent. When job control is not in effect, asynchronous commands ignore SIGINT and SIGQUIT in addition to these inherited handlers.

especially when considering that:

Signals ignored upon entry to the shell cannot be trapped, reset or listed.

Basically, running work.sh runs it in a separate execution environment:

When a simple command other than a builtin or shell function is to be executed, it is invoked in a separate execution environment.

This includes the signal handlers which (if not explicitly present) will ignore SIGINT and SIGQUIT by default.
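The two quoted rules can be observed together in a short sketch, assuming a non-interactive shell without job control (the usual situation inside a script). The temporary file is only there to capture the background command's output.

```shell
#!/bin/bash
# Without job control, a command started with & begins life with SIGINT
# ignored, and a trap it sets for SIGINT never takes effect.
out=$(mktemp)
sh -c 'trap "echo trapped" INT; kill -INT $$; echo "survived SIGINT"' > "$out" &
wait $!
result=$(cat "$out")
echo "background command printed: $result"
rm -f "$out"
```

The background shell neither dies nor runs its trap, which matches the behavior seen with work.sh before the explicit handler was added.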
