How to Monitor Delayed_Job With Monit

Here is how I got this working.

  1. Use the collectiveidea fork of delayed_job. Besides being actively maintained, this version has a nice script/delayed_job daemon you can use with monit. Railscasts has a good episode about this version of delayed_job (ASCIICasts version). The script also has some other nice features, like the ability to run multiple workers, which I don't cover here.
  2. Install monit. I installed from source because Ubuntu's version is so ridiculously out of date. I followed these instructions to get the standard init.d scripts that come with the Ubuntu packages. I also needed to configure with ./configure --sysconfdir=/etc/monit so the standard Ubuntu configuration dir was picked up.
  3. Write a monit script. Here's what I came up with:

    check process delayed_job with pidfile /var/www/app/shared/pids/delayed_job.pid

    start program = "/var/www/app/current/script/delayed_job -e production start"

    stop program = "/var/www/app/current/script/delayed_job -e production stop"

    I store this in my source control system and point monit at it with include /var/www/app/current/config/monit in the /etc/monit/monitrc file.

  4. Configure monit. These instructions are laden with ads but otherwise OK.
  5. Write a Capistrano task to stop and start the worker. monit start delayed_job and monit stop delayed_job are the commands you want to run. I also reload monit when deploying to pick up any config file changes.

Problems I ran into:

  1. daemons gem must be installed for script/delayed_job to run.
  2. You must pass the Rails environment to script/delayed_job with -e production (for example). This is documented in the README file but not in the script's help output.
  3. I use Ruby Enterprise Edition, so I needed to get monit to start with that copy of Ruby. Because of the way sudo handles the PATH in Ubuntu, I ended up symlinking /usr/bin/ruby and /usr/bin/gem to the REE versions.

When debugging monit, I found it helps to stop the init.d version and run it from the command line, so you can see the error messages. Otherwise it is very difficult to figure out why things are going wrong.

sudo /etc/init.d/monit stop
sudo monit start delayed_job

Hopefully this helps the next person who wants to monitor delayed_job with monit.

monitoring multiple delayed job workers with monit

You can just replicate the config you have for the first worker N times.
Suppose you have 5 workers; you would monitor all of them with the following:

check process delayed_job.0
with pidfile /path/to/shared/pids/delayed_job.0.pid
start program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /path/to/current/script/delayed_job -n 5 start' - user"
stop program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /path/to/current/script/delayed_job stop' - user"

check process delayed_job.1
with pidfile /path/to/shared/pids/delayed_job.1.pid
start program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /path/to/current/script/delayed_job -n 5 start' - user"
stop program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /path/to/current/script/delayed_job stop' - user"

check process delayed_job.2
with pidfile /path/to/shared/pids/delayed_job.2.pid
start program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /path/to/current/script/delayed_job -n 5 start' - user"
stop program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /path/to/current/script/delayed_job stop' - user"

check process delayed_job.3
with pidfile /path/to/shared/pids/delayed_job.3.pid
start program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /path/to/current/script/delayed_job -n 5 start' - user"
stop program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /path/to/current/script/delayed_job stop' - user"

check process delayed_job.4
with pidfile /path/to/shared/pids/delayed_job.4.pid
start program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /path/to/current/script/delayed_job -n 5 start' - user"
stop program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /path/to/current/script/delayed_job stop' - user"
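
Since the five stanzas differ only in the worker index, they can be generated instead of typed by hand. The following is a minimal Ruby sketch, not part of the original answer: the function name monit_stanzas and its parameters are hypothetical, and the paths, user name and the -n flag are copied from the config above.

```ruby
# Builds one monit "check process" stanza per delayed_job worker.
# monit_stanzas and its keyword arguments are illustrative names only.
def monit_stanzas(workers:, app_root:, user:, env: "production")
  script = "#{app_root}/current/script/delayed_job"
  (0...workers).map do |i|
    <<~STANZA
      check process delayed_job.#{i}
        with pidfile #{app_root}/shared/pids/delayed_job.#{i}.pid
        start program = "/bin/su -c '/usr/bin/env RAILS_ENV=#{env} #{script} -n #{workers} start' - #{user}"
        stop program = "/bin/su -c '/usr/bin/env RAILS_ENV=#{env} #{script} stop' - #{user}"
    STANZA
  end.join("\n")
end

# Same placeholder values as the hand-written config above.
config = monit_stanzas(workers: 5, app_root: "/path/to", user: "user")
puts config
```

Redirecting the output to a file and including that file from monitrc keeps the stanzas in sync when the worker count changes.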

delayed job and monit

My guess is that the log directory should be in /var/www/myapp/current/ and DJ is trying to mkdir it before writing to it. Most probably, the user 'deploy' does not have permission to write to /var/www/myapp/current.
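
One way to test this guess is to run a small check as the 'deploy' user. This is only a sketch: the helper name can_write_log? is made up for illustration, and it assumes the directory in question is a "log" subdirectory of the release path.

```ruby
# Returns true if the current user can write to <app_dir>/log, or could
# create it because <app_dir> itself is writable.
# can_write_log? is a hypothetical helper; "log" is an assumed directory name.
def can_write_log?(app_dir)
  log_dir = File.join(app_dir, "log")
  return File.writable?(log_dir) if File.directory?(log_dir)
  File.directory?(app_dir) && File.writable?(app_dir)
end

# Run this as the same user that starts delayed_job.
puts can_write_log?("/var/www/myapp/current")
```

If this prints false for 'deploy' but true for root, the permissions on the release directory are the likely culprit.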

rvm monit delayed_job

If you already have monit working properly with other services and just need to add the delayed_job daemon in an rvm environment, you can try this conf file (it works for me).

/etc/monit/conf.d/delayed_job.conf (I am on Ubuntu Server)

check process delayed_job with pidfile /{project_folder}/tmp/pids/delayed_job.pid
start program = "RAILS_ENV=production rvm -S /{project_folder}/script/delayed_job start"
stop program = "RAILS_ENV=production rvm -S /{project_folder}/script/delayed_job stop"

Here the rvm -S command lets the script run under the current rvm Ruby environment.

You can try to start the daemon manually with this command:

$ RAILS_ENV=production rvm -S /{project_folder}/script/delayed_job start

If the daemon starts (check by changing the last word to 'status'), then the delayed_job.conf file has a good chance of working.

Also, do not forget to check that the pid file was created in the tmp/pids/ folder.
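
The pid file check can also be scripted. The sketch below is an illustration rather than anything delayed_job or monit provides: worker_running? is a hypothetical helper, and the relative path assumes you run it from the project folder.

```ruby
# Returns true when the pid file exists and names a live process.
# worker_running? is an illustrative helper name.
def worker_running?(pidfile)
  return false unless File.exist?(pidfile)
  pid = File.read(pidfile).to_i
  return false unless pid > 0
  Process.kill(0, pid) # signal 0 only tests whether the process exists
  true
rescue Errno::ESRCH
  false # stale pid file: no process with that pid
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Assumes the current directory is the project folder.
puts worker_running?("tmp/pids/delayed_job.pid")
```

A false result right after a successful start usually means the daemon wrote its pid file somewhere other than the path monit is watching.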

How to properly run monit with different workers

The short answer is no.

Monit does not come with a scaling function. Every service you want to watch has to be configured independently. This also makes a lot of sense when you look at what else monit can do, such as applying specific load, I/O or network checks to any of the tasks.

delayed_job monitored by God - duplicate processes after restart

I think this must be an issue in the daemons gem that delayed_job uses for working in the background, because adding this at the top of my God file seems to have fixed things:

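ids = ('a'..'z').to_a        # letter suffixes instead of numeric ones
workers.times do |num|
  num = ids[num]             # 0 becomes 'a', 1 becomes 'b', and so on
  # ... rest of the existing per-worker configuration, unchanged ...
end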

It seems like there was an issue where the processes named delayed_job.1 and delayed_job.11 (etc) would clash which would cause lots of problems. I haven't really isolated it down too far, but changing it to a different naming convention (delayed_job.a in this case) has fixed things for me now.
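
The exact cause was not isolated, so the following only illustrates one plausible mechanism, namely that something matches process names by prefix. Under that assumption, numeric suffixes collide once there are ten or more workers, while single-letter suffixes do not. The helper name prefix_clashes is made up for this sketch.

```ruby
# Returns every pair of names where one name is a prefix of the other.
# prefix_clashes is a hypothetical helper used only for illustration.
def prefix_clashes(names)
  names.combination(2).select { |a, b| a.start_with?(b) || b.start_with?(a) }
end

numeric = (0...12).map { |i| "delayed_job.#{i}" }
letters = ('a'..'z').first(12).map { |id| "delayed_job.#{id}" }

numeric_clashes = prefix_clashes(numeric)
letter_clashes = prefix_clashes(letters)

puts "numeric clashes: #{numeric_clashes.inspect}"
puts "letter clashes:  #{letter_clashes.inspect}"
```

Note that single letters only cover 26 workers; beyond that, a fixed-width numeric suffix such as 01, 02 would avoid prefix overlap in the same way.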

Will leave this open in case someone has a better solution/a reason for why this worked.


