Rails: How to Restart Sidekiq

Rails: How to restart sidekiq?

Once you have found your PID, you can use the commands below. The first stops the workers from fetching new jobs and lets jobs already in progress complete (on Sidekiq 5.0 and later the quiet signal is TSTP rather than USR1):

kill -USR1 [PID]

After that, you can terminate the process using:

kill -TERM [PID]

Also, there is a page on sidekiq/wiki about this called Signals.
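
If you would rather send the two signals from a script, the same sequence can be sketched in Ruby. This is only a rough sketch: the method name quiet_then_stop, the drain_seconds parameter and the PID file path in the comment are my own assumptions, not anything Sidekiq provides. USR1 is the quiet signal used above; on Sidekiq 5.0 and later you would pass 'TSTP' instead.

```ruby
# Hypothetical helper: quiet a Sidekiq process, give running jobs some
# time to finish, then ask the process to shut down.
def quiet_then_stop(pid, quiet_signal: 'USR1', drain_seconds: 30)
  Process.kill(quiet_signal, pid) # stop fetching new jobs
  sleep(drain_seconds)            # let jobs already in progress complete
  Process.kill('TERM', pid)       # begin shutdown
end

# Example call (the PID file path is an assumption):
# quiet_then_stop(Integer(File.read('tmp/pids/sidekiq.pid').strip))
```

The fixed sleep is the crudest possible way to wait for jobs to drain; pick a value longer than your slowest job.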


For finding PIDs one can use:

ps aux | grep sidekiq
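
Note that the grep process itself also shows up in that output. If you need the PIDs inside a Ruby script, something like the following might do. It is a minimal sketch that assumes the usual ps aux layout, where the PID is the second column; sidekiq_pids is a name I made up.

```ruby
# Hypothetical helper: pull Sidekiq PIDs out of `ps aux`-style text.
def sidekiq_pids(ps_output)
  ps_output.each_line.filter_map do |line|
    next if line.include?('grep')        # skip the grep process itself
    next unless line.include?('sidekiq')
    # The second whitespace-separated column of `ps aux` is the PID.
    Integer(line.split[1], exception: false)
  end
end

# Example call against the live process table:
# puts sidekiq_pids(`ps aux`).inspect
```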

Restart Sidekiq if it fails

In the Sidekiq wiki, there is a section about deployment, including upstart and systemd scripts. You can use those to ensure the process restarts appropriately.

Restarting Sidekiq

Sidekiq versions before 6.0 come with the sidekiqctl command (it was removed in 6.0), which can stop the process identified by your Sidekiq PID file. You pass in the PID file and the number of seconds to wait for all threads to finish.

Sample Usage:

sidekiqctl stop #{rails_root}/tmp/pids/sidekiq_website_crawler.pid 60

Here, 60 is the number of seconds to wait for all Sidekiq threads to finish processing. If they are not done after 60 seconds, the process is killed.
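
The wait-then-kill behaviour described above can be approximated in plain Ruby, which may help if sidekiqctl is not available in your Sidekiq version. This is a sketch of the idea, not sidekiqctl's actual code; stop_with_timeout, process_alive? and poll_interval are names I chose for illustration.

```ruby
# Returns true while a process with this PID exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 sends nothing; it only checks the PID
  true
rescue Errno::ESRCH
  false
end

# Hypothetical helper: send TERM, wait up to `timeout` seconds for the
# process to exit, then fall back to KILL. Returns :graceful or :killed.
def stop_with_timeout(pid, timeout, poll_interval: 0.5)
  Process.kill('TERM', pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    return :graceful unless process_alive?(pid)
    sleep(poll_interval)
  end
  return :graceful unless process_alive?(pid)
  Process.kill('KILL', pid)
  :killed
rescue Errno::ESRCH
  # The process was already gone, or exited just before the KILL.
  :graceful
end

# Example call (the PID file path is an assumption):
# stop_with_timeout(Integer(File.read('tmp/pids/sidekiq.pid').strip), 60)
```

One caveat: if the target is an unreaped child of the calling script, the existence check keeps succeeding until the child is waited on, so this works best on a process the script did not start itself.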

I also recommend using the God gem to monitor, stop, start and restart Sidekiq.

Once you do that, you can use bundle exec god stop to stop all sidekiq threads.

Here is my God file, as an example:

rails_env = ENV['RAILS_ENV'] || "development"
rails_root = ENV['RAILS_ROOT'] || "/home/hwc218/BuzzSumo"

God.watch do |w|
  w.dir = rails_root
  w.name = "website_crawler"
  w.interval = 30.seconds
  w.env = { "RAILS_ENV" => rails_env }
  w.start = "bundle exec sidekiq -C #{rails_root}/config/sidekiq_website_crawler.yml"
  w.stop = "sidekiqctl stop #{rails_root}/tmp/pids/sidekiq_website_crawler.pid 60"
  w.keepalive

  # determine the state on startup
  w.transition(:init, { true => :up, false => :start }) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end
  end

  # determine when process has finished starting
  w.transition([:start, :restart], :up) do |on|
    on.condition(:process_running) do |c|
      c.running = true
      c.interval = 5.seconds
    end

    # failsafe
    on.condition(:tries) do |c|
      c.times = 5
      c.transition = :start
      c.interval = 5.seconds
    end
  end

  # start if process is not running
  w.transition(:up, :start) do |on|
    on.condition(:process_running) do |c|
      c.running = false
    end
  end

  # restart when tmp/restart.txt is touched
  w.restart_if do |restart|
    restart.condition(:restart_file_touched) do |c|
      c.interval = 5.seconds
      c.restart_file = File.join(rails_root, 'tmp', 'restart.txt')
    end
  end
end

Sidekiq: how to restart Sidekiq when deploying a project to a server?

Each mail should be a separate job.

Is there a way to run code before Sidekiq is restarted in the middle of a job?

Update:

  • Thanks to @Aaron, and following our discussion (comments below): the ensure block (which is executed by the worker threads) can only run for a few unguaranteed milliseconds before the main thread forcefully terminates those worker threads, so that the main thread can do some cleanup up the exception stack and avoid getting SIGKILL-ed by Heroku. Therefore, make sure that your ensure code is really fast!

TL;DR:

def perform(*args)
  # your code here
ensure
  process.update_attributes(is_running: false, last_execution_time: Time.now)
end

  • The ensure above is always called, regardless of whether the method "succeeded" or an Exception was raised. I tested this: see this repl code, and click "Run".

  • In other words, it is called even on a SignalException, including one raised by SIGTERM (the graceful shutdown signal). The only exception is SIGKILL (a forced shutdown that cannot be rescued). You can verify this behaviour with my repl code: change Process.kill('TERM', Process.pid) to Process.kill('KILL', Process.pid) and click "Run" again (you'll notice that the puts is not called).

  • Looking at Heroku docs, I quote:

    When Heroku is going to shut down a dyno (for a restart or a new deploy, etc.), it first sends a SIGTERM signal to the processes in the dyno.

    After Heroku sends SIGTERM to your application, it will wait a few seconds and then send SIGKILL to force it to shut down, even if it has not finished cleaning up. In this example, the ensure block does not get called at all, the program simply exits

    ... which means that the ensure block will be called, because the signal is a SIGTERM and not a SIGKILL. The only exception is when shutting down takes a long time, which may be due to (some reasons I can think of at the moment):

    • Something inside your perform code (or any Ruby code in the stack, even gems) also rescues the SignalException (or even the root Exception class, since SignalException is a subclass of Exception) but takes a long time cleaning up (e.g. closing DB connections, or I/O that hangs your application).

    • Or your own ensure block above takes a long time. For example, if the DB temporarily hangs or there is a network delay or timeout while running process.update_attributes(...), the update might not succeed at all and will run out of time: as quoted above, a few seconds after the SIGTERM, Heroku forces the application to stop by sending a SIGKILL.

... which all means that my solution is still not fully reliable, but it should work under normal conditions.
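
The TERM versus KILL difference described above can be checked locally with a small script. This is a rough sketch under my own assumptions: it forks a child whose sleep stands in for a long-running perform, and ensure_ran_after? is a name I made up. It relies on fork, so it needs a Unix-like system.

```ruby
# Hypothetical check: does an ensure block run when the process
# receives the given signal?
def ensure_ran_after?(signal)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    begin
      writer.write("ready\n")       # tell the parent we are inside begin
      sleep                         # stands in for a long-running job
    ensure
      writer.write("ensure ran\n")  # the cleanup we want to observe
    end
  end
  writer.close
  reader.gets                       # wait until the child is inside begin
  Process.kill(signal, pid)
  Process.wait(pid)
  reader.read.include?('ensure ran')
end

puts "TERM: ensure ran? #{ensure_ran_after?('TERM')}"
puts "KILL: ensure ran? #{ensure_ran_after?('KILL')}"
```

Ruby turns SIGTERM into a SignalException raised in the main thread, which is why the ensure gets a chance to run; SIGKILL never reaches Ruby code at all.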

Sidekiq retries: How to refresh workers / code?

You can, and must, restart Sidekiq to make code changes visible to it. Redis holds the job information and the queue, not Sidekiq.

P.S. In most cases all will be OK, but sometimes


