Gracefully Shutting Down Sidekiq Processes

Rails: How to restart sidekiq?

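Once you have found the PID, you can use the commands below. The first one stops the workers from picking up new jobs and lets jobs that are already running complete. This is the "quiet" signal: USR1 on Sidekiq 4 and earlier, TSTP on Sidekiq 5 and later.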

kill -USR1 [PID]

After that, you can shut the process down with:

kill -TERM [PID]
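The same two-step sequence can be scripted. Below is a minimal Ruby sketch, not part of Sidekiq: the method name quiet_then_stop, the drain_seconds default, and the injectable killer and sleeper arguments are all assumptions made for illustration. It only relies on Ruby's standard Process.kill.

```ruby
# Sends the "quiet" signal, waits for running jobs to drain, then sends TERM.
# `killer` and `sleeper` are injectable (hypothetical parameters) so the
# sequence can be exercised without a real Sidekiq process.
def quiet_then_stop(pid, quiet_signal: 'USR1', drain_seconds: 30,
                    killer: Process.method(:kill),
                    sleeper: Kernel.method(:sleep))
  killer.call(quiet_signal, pid) # stop fetching new jobs
  sleeper.call(drain_seconds)    # give running jobs time to finish
  killer.call('TERM', pid)       # ask the process to shut down
end
```

On Sidekiq 5 and later you would pass quiet_signal: 'TSTP', since USR1 is the quiet signal only on older versions.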

Also, there is a page on sidekiq/wiki about this called Signals.

To find the PIDs, you can use:

ps aux | grep sidekiq
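If you want the PIDs inside a script rather than reading them off the terminal, the same filtering can be done in Ruby. This is only a sketch: the helper name sidekiq_pids is made up, and it assumes the usual `ps aux` layout where the PID is the second column.

```ruby
# Extracts the PIDs of Sidekiq processes from `ps aux`-style text.
def sidekiq_pids(ps_output)
  ps_output.each_line
           .select { |line| line.include?('sidekiq') }
           .reject { |line| line.include?('grep') } # skip the grep process itself
           .map { |line| line.split[1].to_i }       # PID is the second column
end

# Usage: sidekiq_pids(`ps aux`)
```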

Is there a way to run code before Sidekiq is restarted in the middle of a job?

Update:

  • Thanks to @Aaron and our discussion in the comments: the ensure block (which is executed by the worker threads) only gets a few unguaranteed milliseconds before the main thread forcefully terminates those worker threads. The main thread does this so that it can finish its own cleanup up the exception stack before Heroku sends SIGKILL. Therefore, make sure that your ensure code is really fast!
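One way to keep a slow cleanup from eating the whole shutdown window is to put a hard time limit on it. The sketch below is my own illustration, not something Sidekiq provides: bounded_cleanup is a hypothetical helper built on Ruby's standard Timeout module. Keep in mind that Timeout interrupts the block by raising inside it, so whatever the block was doing may be left half-done.

```ruby
require 'timeout'

# Runs the cleanup block but gives up after `limit` seconds.
# Returns true if the cleanup finished, false if it was cut off.
def bounded_cleanup(limit)
  Timeout.timeout(limit) { yield }
  true
rescue Timeout::Error
  false
end
```

Inside a job's ensure block this could look like bounded_cleanup(0.5) { ... }, with the actual limit chosen to suit your own setup.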

TL;DR:

def perform(*args)
  # your code here
ensure
  # On Rails 6.1+ update_attributes no longer exists; use update there.
  process.update_attributes(is_running: false, last_execution_time: Time.now)
end
  • The ensure above is always called, regardless of whether the method "succeeded" or an Exception was raised. I tested this: see this repl code, and click "Run".

  • In other words, it is called even on a SignalException, including SIGTERM (the graceful shutdown signal). The only exception is SIGKILL (a forced shutdown that cannot be rescued). You can verify this behaviour in my repl code by changing Process.kill('TERM', Process.pid) to Process.kill('KILL', Process.pid) and clicking "Run" again (you'll notice that the puts is not called).

  • Looking at Heroku docs, I quote:

    When Heroku is going to shut down a dyno (for a restart or a new deploy, etc.), it first sends a SIGTERM signal to the processes in the dyno.

    After Heroku sends SIGTERM to your application, it will wait a few seconds and then send SIGKILL to force it to shut down, even if it has not finished cleaning up. In this example, the ensure block does not get called at all, the program simply exits

    ... which means that the ensure block will be called, because the signal is SIGTERM and not SIGKILL. The exception is when shutting down takes too long, which may be due to (some reasons I can think of at the moment):

    • Something inside your perform code (or any Ruby code in the stack, even gems) also rescues SignalException (or rescues the root Exception class, since SignalException is a subclass of Exception) but takes a long time cleaning up (e.g. closing DB connections, or I/O that hangs your application).

    • Or your own ensure block above takes a long time. For example, if the DB temporarily hangs or there is a network delay or timeout while running process.update_attributes(...), the update might not succeed at all and will run out of time: as quoted above, a few seconds after the SIGTERM, Heroku forces the application to stop by sending SIGKILL.

... which all means that my solution is still not fully reliable, but should work under normal situations
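If you want to check the TERM versus KILL behaviour locally, here is a small standalone sketch along the lines of the test described above. It is not Sidekiq code: the method name ensure_output_after is made up, and it assumes a platform where fork is available (Linux or macOS). A forked child signals itself inside a begin/ensure, and the parent reads back whatever the ensure block managed to write.

```ruby
# Forks a child that signals itself inside begin/ensure and returns
# whatever the child's ensure block wrote before the process ended.
def ensure_output_after(signal)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    begin
      Process.kill(signal, Process.pid)
      sleep 1 # give the signal time to be delivered
    ensure
      writer.write('ensure ran')
      writer.close
    end
  end
  writer.close
  Process.wait(pid)
  output = reader.read
  reader.close
  output
end

puts ensure_output_after('TERM').inspect
puts ensure_output_after('KILL').inspect
```

With TERM, Ruby raises a SignalException in the child's main thread, so the ensure runs. With KILL, the process is gone before any Ruby code gets a chance to run.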

Quiet vs Stop in Sidekiq

  1. Quiet means don't fetch new jobs from Redis anymore. Jobs that are already running continue to process for as long as they need.
  2. Stop means quiet immediately, then wait up to the -t timeout (default: 8 seconds on older versions, 25 seconds since Sidekiq 6.0) for running jobs to finish. Any jobs still processing after the timeout are forced to stop and pushed back to Redis.
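To make the difference concrete, here is a toy model of the two behaviours. It is only an illustration and not how Sidekiq is implemented: the method names are made up, and each running job is represented as a [name, seconds_remaining] pair.

```ruby
# Toy model only, not Sidekiq's implementation.
# Quiet: no new jobs are fetched, running jobs are never interrupted.
def simulate_quiet(running_jobs)
  { finished: running_jobs.map(&:first), requeued: [] }
end

# Stop: jobs that need longer than the timeout are cut off and requeued.
def simulate_stop(running_jobs, timeout)
  finished, requeued = running_jobs.partition { |_name, remaining| remaining <= timeout }
  { finished: finished.map(&:first), requeued: requeued.map(&:first) }
end

jobs = [['short_job', 3], ['long_job', 20]]
p simulate_quiet(jobs)
p simulate_stop(jobs, 8)
```

In this model the 20-second job survives a quiet but is pushed back to the queue by a stop with an 8-second timeout.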

