Sidekiq: Ensure All Jobs on the Queue Are Unique

My suggestion is to search for prior scheduled jobs based on some select criteria and delete them before scheduling a new one. This has been useful for me when I want a single scheduled job for a particular object and/or one of its methods.

Some example methods in this context:

require 'sidekiq/api'
require 'yaml'

##
# find job(s) scheduled for a particular class and method
#
def self.find_jobs_for_object_by_method(klass, method)
  jobs = Sidekiq::ScheduledSet.new

  jobs.select do |job|
    job.klass == 'Sidekiq::Extensions::DelayedClass' &&
      ((job_klass, job_method, args) = YAML.load(job.args[0])) &&
      job_klass == klass &&
      job_method == method
  end
end

##
# delete job(s) specific to a particular class, method, and particular record
# will only remove djs on an object for that method
#
def self.delete_jobs_for_object_by_method(klass, method, id)
  jobs = Sidekiq::ScheduledSet.new

  jobs.select do |job|
    job.klass == 'Sidekiq::Extensions::DelayedClass' &&
      ((job_klass, job_method, args) = YAML.load(job.args[0])) &&
      job_klass == klass &&
      job_method == method &&
      args[0] == id
  end.map(&:delete)
end

##
# delete job(s) specific to a particular class and particular record
# will remove any djs on that Object
#
def self.delete_jobs_for_object(klass, id)
  jobs = Sidekiq::ScheduledSet.new

  jobs.select do |job|
    job.klass == 'Sidekiq::Extensions::DelayedClass' &&
      ((job_klass, job_method, args) = YAML.load(job.args[0])) &&
      job_klass == klass &&
      args[0] == id
  end.map(&:delete)
end
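
For example, a rough usage sketch (the `MyModel` class, `some_expensive_method`, and the `JobCleaner` module holding the helpers above are all placeholder names, and it assumes Sidekiq's delay extensions are enabled): clear any previously scheduled delayed call for a record before scheduling a fresh one.

record = MyModel.find(42)

# Drop any delayed job already scheduled for this record and method...
JobCleaner.delete_jobs_for_object_by_method(MyModel, :some_expensive_method, record.id)

# ...then schedule exactly one new delayed call (3600 seconds = 1 hour).
# delay_for creates the Sidekiq::Extensions::DelayedClass jobs matched above.
MyModel.delay_for(3600).some_expensive_method(record.id)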

Sidekiq strange behaviour of unique jobs

Try running it without the sidekiq-unique-jobs gem. It has only been protecting you against duplicates within a 30-minute window anyway: that gem sets its hash keys in Redis to auto-expire after 30 minutes (configurable), and Sidekiq itself sets its jobs to auto-expire in Redis after 24 hours.

I obviously can't see your app, but I'll bet you don't want to process the same file very often at all. I would control this at the application layer instead and track my own hash key, doing something similar to what the unique-jobs gem does:

hash = Digest::MD5.hexdigest(Sidekiq.dump_json(md5_arguments))
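
A minimal sketch of that application-layer idea (the worker name, the Redis key prefix, and the 24-hour TTL are all assumptions; it also assumes the redis-rb connection that Sidekiq.redis yields):

require 'digest'
require 'sidekiq'

class ProcessFileWorker
  include Sidekiq::Worker

  def perform(*md5_arguments)
    digest = Digest::MD5.hexdigest(Sidekiq.dump_json(md5_arguments))

    # SET with NX succeeds only for the first caller; the key expires on its
    # own after 24 hours, so the dedup window is controlled here, not by middleware.
    acquired = Sidekiq.redis do |conn|
      conn.set("my_app:processed:#{digest}", 1, nx: true, ex: 24 * 60 * 60)
    end
    return unless acquired

    # ... do the actual long-running file processing here ...
  end
end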

It's also possible that the sidekiq-unique-jobs middleware is getting in the way of Sidekiq knowing whether a job completed properly. I'll bet there aren't a lot of folks testing it with long-running jobs in your configuration.

If you continue to see this behavior without the additional middleware, give Resque a try. I've never seen this kind of behavior with that gem, and failed jobs have a helpful retry option in the admin GUI.

The main benefit of Sidekiq is that it is multi-threaded. Even so, a concurrency of 25 with large video-processing jobs might be pushing it a bit. In my experience, forking is more stable and portable, with fewer worries about your application's thread safety (YMMV).

Whatever you do, make sure you are aware of the auto-expiry TTL settings these systems put on their data in Redis. The size and nature of your jobs mean that jobs could easily back up for 24 hours. These automatic deletions happen at the database layer, and there are no callbacks to the application layer to warn you that a job has been deleted automatically. In the Sidekiq code, for example, auto-expire behavior was introduced "to avoid any possible leaking" (reference). That isn't very encouraging if you really need these jobs to execute.
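
If you do stay on Sidekiq, one low-tech safeguard is to watch the scheduled and retry sets so a backlog large enough to brush up against those TTLs doesn't go unnoticed. A sketch using Sidekiq's public API (the threshold is arbitrary):

require 'sidekiq/api'

scheduled = Sidekiq::ScheduledSet.new.size
retries   = Sidekiq::RetrySet.new.size

# Arbitrary threshold: pick a number that means "jobs may sit in Redis
# long enough to risk being expired before they ever run".
if scheduled + retries > 10_000
  warn "Sidekiq backlog is large: #{scheduled} scheduled, #{retries} retrying"
end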

Avoiding duplicate jobs when using Sidekiq's `unique_for` and `Sidekiq::Limiter.concurrent` in the same worker

One idea:

When the user wants to change Author A, I would enqueue a scheduled, unique UpdateAuthorJob for Author A that updates their info 10 minutes from now. That way, the user can make lots of changes to the author and the system waits out that 10-minute cooldown period before performing the actual update work, ensuring you get all the updates as one group.
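
A sketch of that approach, assuming Sidekiq Enterprise's `unique_for` option (the `Author` model and `sync_author_details` method are placeholders):

class UpdateAuthorJob
  include Sidekiq::Worker

  # While a job with the same arguments is pending, further pushes are
  # dropped (Sidekiq Enterprise unique jobs). 600 seconds = 10 minutes.
  sidekiq_options unique_for: 600

  def perform(author_id)
    author = Author.find(author_id)   # placeholder model
    author.sync_author_details        # placeholder for the real update work
  end
end

# On every edit to Author A, just push; only the first push in the
# 10-minute window actually enqueues a job, and that job reads the
# author's latest state when it finally runs.
UpdateAuthorJob.perform_in(600, author.id)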


