Is It a Good Idea to Pass a Huge String as an Argument to Sidekiq Worker

Passing Complex Hashes to Sidekiq Jobs

This sounds like a "potayto, potahto" solution. You are not using Sidekiq's serialisation, but you are still serializing the state yourself.

Let's have a look at why sidekiq has this rule:

Even if they did serialize correctly, what happens if your queue backs up and that quote object changes in the meantime? [...]
Don't pass symbols, named parameters, keyword arguments or complex Ruby objects (like Date or Time!) as those will not survive the dump/load round trip correctly.

I like to add a third:

Serializing state makes it impossible to distinguish between persisted and ephemeral (in-memory, memoized, lazy-loaded etc) data. E.g. a memoized method such as def sent_mails; @sent_mails ||= Mail.for(user_id: id); end now gets serialized too: do you want that?

The solution is also provided by sidekiq:

Don't save state to Sidekiq, save simple identifiers. Look up the objects once you actually need them in your perform method.
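
A minimal sketch of that advice, written in plain Ruby so it runs without Redis. QUOTES and QuoteMailerJob are hypothetical names used only for illustration; in a real application the class would include Sidekiq::Job and be enqueued with perform_async(quote_id).

```ruby
# Hypothetical in-memory stand-in for a database table.
QUOTES = { 42 => { "id" => 42, "total_cents" => 1_000 } }

class QuoteMailerJob
  # Only the identifier is passed in; the record is looked up when the job runs.
  def perform(quote_id)
    quote = QUOTES.fetch(quote_id)
    "Quote #{quote["id"]}: #{quote["total_cents"]} cents"
  end
end

# The quote changes while the job is still waiting in the queue...
QUOTES[42]["total_cents"] = 1_200

# ...and the job still sees the current state, because it never stored a copy.
result = QuoteMailerJob.new.perform(42)
```

Had the whole quote been serialized at enqueue time, the job would have worked with the stale total instead.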

The XY problem here

Your real problem is not where or how to serialize state, because Sidekiq warns against serializing state regardless of where and how you do it.

The problem you need to solve is either how to store the state somewhere it can be stored properly, or how to avoid storing it at all: neither in Redis/Sidekiq nor in the storage that is giving you problems.

Latency

Is your storage really what is slow? Or is it a validation, a serialisation step, or some side effect of storing that is slow?

Can you improve this by making it a two-step process: insert the state first and update/enrich/validate it asynchronously later? Rails won't help you here, and might even work against you, but a common model is to store objects in a special "queue" table or an event queue; Kafka, for example, is well known for this.

When storage happens over a slow network to a slow API, this is probably unsolvable. But when storage happens in a local database, there are decades' worth of solutions for improving write performance that you can use, either inside the database itself or with a specialised queue for state storage (Sidekiq is not such a queue), depending on the tech used to store. E.g. Linux will let you buffer writes in memory, which makes writes to disk really quick but removes the guarantee that the data has actually reached the disk.

E.g. in a bookkeeping API, we would store the validated object in PostgreSQL and then have async jobs add expensive attributes to it later (e.g. state that had to be retrieved from legacy APIs or through complex calculations).

E.g. in a write-heavy GIS system, we would store objects into a "to_process_places" table, that was monitored by tooling which processes the Places. It all really depends on your domain, and requirements.

Not using state

A common solution is not to build objects at all, but to use the actual payload sent by the customer. Just send the HTTP payload (in Rails, the params) along and leave it at that. Maybe merge in a header (like the request date) or filter out some data (header tokens or cookies).

If your controller can operate with this data, so can a delayed job. Instead of building objects in the controller, leave that to the delayed job. This can even result in really neat and lean controllers: all they do is (some authentication and authorization and then) call the proper job and pass it the sanitized params.
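
The sanitizing step could look roughly like the sketch below. ALLOWED_KEYS, job_payload and the field names are assumptions made for illustration, not part of Rails or Sidekiq; the point is only that the job receives a small hash of strings.

```ruby
require "json"

# Hypothetical whitelist of fields the job is allowed to see.
ALLOWED_KEYS = %w[name email].freeze

# Builds a JSON-friendly payload: string keys only, secrets filtered out,
# and one piece of request metadata merged in.
def job_payload(raw_params, request_date:)
  raw_params
    .transform_keys(&:to_s)
    .slice(*ALLOWED_KEYS)
    .merge("request_date" => request_date)
end

payload = job_payload(
  { name: "Ada", email: "ada@example.com", token: "secret" },
  request_date: "2024-01-02"
)
```

Because the payload contains only strings, it comes back unchanged from a JSON round trip, so the job sees exactly what the controller sent.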

Obviously this involves trade-offs, such as not being able to validate synchronously and having to report validation results over email, push notification, or a delayed response instead, depending on your requirements (e.g. a large CSV import could just email any validation issues, but a login request might need an immediate response if the login is invalid).

It also requires some thought: you probably don't want to send the Base64 encoded CSV along to sidekiq, but instead write the file to a (temp) storage and pass the filename/url along instead. This might sound obvious, because it is: file uploads are essentially an implementation of the earlier mentioned "temporary state storage": you don't pass the entire PDF/high-res-header-image/CSV along to sidekiq, but store it somewhere so sidekiq can pick it up later to process it. Why should the other attributes not employ the same pattern if passing them along to sidekiq is problematic?
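
As a rough sketch of the "store it somewhere and pass the location" pattern: stash_upload and CsvImportJob are hypothetical names, the local temp directory stands in for whatever storage you actually use, and the naive line counting stands in for real CSV processing.

```ruby
require "tmpdir"
require "securerandom"

# Hypothetical helper: persist the upload and return its path.
# The path is the only thing that would be handed to the job.
def stash_upload(contents, dir: Dir.tmpdir)
  path = File.join(dir, "import-#{SecureRandom.hex(8)}.csv")
  File.write(path, contents)
  path
end

class CsvImportJob
  # Receives a short string instead of the file contents.
  def perform(path)
    rows = File.readlines(path, chomp: true).drop(1) # skip the header line
    rows.size
  ensure
    # Clean up the temporary file once the job is done with it.
    File.delete(path) if path && File.exist?(path)
  end
end

path = stash_upload("name,email\nAda,ada@example.com\nBob,bob@example.com\n")
imported = CsvImportJob.new.perform(path)
```

A local temp directory only works when the web process and the job run on the same machine; with separate hosts the same idea applies to shared storage and a URL.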

Is it possible to send Blocks or Procs as arguments to Sidekiq?

This is not possible. There is no serialization format for code in Ruby, only for data.

Rails test triggers Sidekiq warning

Note the part about Symbols.

https://github.com/mperham/sidekiq/wiki/Best-Practices#1-make-your-job-parameters-small-and-simple

The arguments you pass to perform_async must be composed of simple JSON datatypes: string, integer, float, boolean, null(nil), array and hash. This means you must not use ruby symbols as arguments.
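
To see why symbols are a problem, here is a small sketch that imitates the JSON dump/load round trip with Ruby's json library only. The round_trip helper is a stand-in written for this example, not a Sidekiq method, and the exact string a Time turns into depends on whether Rails is loaded.

```ruby
require "json"

# Stand-in for what happens to job arguments between enqueue and perform:
# they are dumped to JSON and parsed back.
def round_trip(args)
  JSON.parse(JSON.generate(args))
end

original = [{ status: :pending, due: Time.utc(2024, 1, 2, 3, 4, 5) }]
restored = round_trip(original).first

# Symbol keys and symbol values come back as strings,
# and the Time comes back as a plain String.
status_value = restored["status"]
symbol_key_survived = restored.key?(:status)
due_class = restored["due"].class
```

Code in perform that expects restored[:status] or a Time object will therefore get nil or a String instead.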

Cannot pass params to sidekiq

In your worker:

def perform(params)
  # Arguments arrive after a JSON round trip, so the hash keys are strings
  start_date = params["call_log"]["warrant_start_date"]
  end_date = params["call_log"]["warrant_end_date"]
  # ...etc
end

And then in your controller:

CallLogWorker.perform_async(params)

So you're passing the params hash from the controller into the worker and then reading from it there.

It's generally considered good practice to keep the data you pass into Sidekiq jobs as small as possible (see the Sidekiq Best Practices wiki page). So you could go further and have:

In your worker:

def perform(start_date, end_date)
  # ...job content
end

And in your controller:

CallLogWorker.perform_async(
  params[:call_log][:warrant_start_date],
  params[:call_log][:warrant_end_date]
)
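
Since the two dates reach the worker as plain strings, the job has to parse them itself. A possible sketch, assuming the form submits ISO 8601 dates (the source does not say which format is used) and using a trivial day count as placeholder job content:

```ruby
require "date"

class CallLogWorker
  # In a real app this class would include Sidekiq::Job.
  def perform(start_date, end_date)
    # Assumption: both arguments are ISO 8601 strings such as "2024-01-31".
    from = Date.iso8601(start_date)
    to = Date.iso8601(end_date)
    (to - from).to_i # placeholder work: length of the warrant period in days
  end
end

days = CallLogWorker.new.perform("2024-01-01", "2024-01-31")
```

Date.iso8601 raises on malformed input, so a bad date fails the job loudly instead of silently producing a wrong range.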

