Return Code When OOM Killer Kills a Process

Return code when OOM killer kills a process

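The Linux OOM killer works by sending SIGKILL. If your process was killed by the OOM killer, it would be suspicious for WIFEXITED to return true: a process terminated by a signal should be reported by WIFSIGNALED instead.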

From TLPI (The Linux Programming Interface):

To kill the selected process, the OOM killer delivers a SIGKILL signal.

So you should be able to test this using:

if (WIFSIGNALED(status)) {
    if (WTERMSIG(status) == SIGKILL)
        printf("Killed by SIGKILL\n");
}

Return code when OS kills your process

A process' return status (as returned by wait, waitpid and system) contains more or less the following:

  • Exit code, only applies if process terminated normally
  • Whether normal or abnormal termination occurred
  • Termination signal, only applies if process was terminated by a signal

The exit code is meaningless if your process was killed by the OOM killer, because the OOM killer terminates it with a SIGKILL signal and the process never gets to exit normally.

For more information, see the man page for the wait system call, wait(2).

Finding which process was killed by Linux OOM killer

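Search the system log for the kernel's message (depending on the distribution, the log may live elsewhere, for example /var/log/syslog):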

grep -i 'killed process' /var/log/messages

Out of memory: kill process

This happens because your server is running out of memory. You have two options to solve it.

  1. Upgrade your server's RAM or add swap space (upgrading physical RAM is recommended over relying on swap).

  2. Limit nginx RAM use.

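To limit nginx RAM use, open the /etc/nginx/nginx.conf file and add client_max_body_size <your_value_here> inside the http configuration block. This directive caps the size of a client request body that nginx will accept, so it reduces memory pressure from large uploads rather than setting a hard limit on nginx's own memory. For example: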

worker_processes 1;
http {
    client_max_body_size 10M;
    ...
}

Note: use K for kilobytes, M for megabytes and G for gigabytes. nginx does not accept a T suffix.

After OOM Killer, is there a Resurrector?

No. Once a process is killed by the OOM Killer, it's dead. You can restart it (resources permitting), and if it's something that's managed by the system (via inittab, perhaps), it might get restarted that way.

Edit: As a thought experiment, think about what a resurrection of a process would mean. Even if you could store the entire process state, you wouldn't want to because the process killed might be the REASON for the out-of-memory condition.

So the best you could possibly do would be to store its startup state (command line, etc). But that's no good either, because again, that may be WHY the system ran out of memory in the first place!

Furthermore, if you resurrected a process in this way, there's no telling what could go wrong. What if the process controls hardware? What if the process shouldn't be run more than once? What if it was connected to a tty that isn't there anymore (because the sshd was one of the processes killed)?

There's an ENORMOUS amount of context around a process that the system can't possibly be aware of. The ONLY sensible thing is the thing that the kernel does: kill the sucker and go on.

I suppose you can imagine a hibernate-the-process-to-disk strategy, but given that we're out of memory (including swap), that means either pre-reserving some disk space or deciding to allocate disk space to this on the fly. Either strategy may be unable to cope with the size of the process in question.

In short: No, you don't get to come back from the OOM killer. It's a killer; you just have to deal with it.


