When to Use Kernel Threads vs. Workqueues in the Linux Kernel

When to use kernel threads vs workqueues in the linux kernel

As you said, it depends on the task at hand:

Work queues defer work into a kernel thread - your work will always run in process
context. They are schedulable and can therefore sleep.
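
As a rough sketch (the my_work_fn/my_work names are made up for illustration), deferring work to the shared system workqueue from a module looks roughly like this:

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

/* Runs in process context (a kworker thread), so sleeping is allowed. */
static void my_work_fn(struct work_struct *work)
{
	msleep(10);
	pr_info("deferred work ran\n");
}

static DECLARE_WORK(my_work, my_work_fn);

static int __init my_init(void)
{
	/* Queue on the shared system workqueue; this returns immediately. */
	schedule_work(&my_work);
	return 0;
}

static void __exit my_exit(void)
{
	/* Make sure the work has finished before the module goes away. */
	cancel_work_sync(&my_work);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");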

Normally, there is no debate between work queues and softirqs/tasklets: if the deferred work needs to sleep, work queues are used; otherwise, softirqs or tasklets are used. Tasklets are also more suitable for interrupt handling (they come with certain assurances, such as: a tasklet is never run later than the next tick, it is always serialized with respect to itself, etc.).
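
For comparison, a minimal tasklet sketch (hypothetical names; on kernels before v5.8, tasklet_init() with an unsigned long argument is used instead of tasklet_setup()):

#include <linux/module.h>
#include <linux/interrupt.h>

static struct tasklet_struct my_tasklet;

/* Runs in softirq context: atomic, so it must not sleep. */
static void my_tasklet_fn(struct tasklet_struct *t)
{
	pr_info("bottom half ran\n");
}

static int __init my_init(void)
{
	tasklet_setup(&my_tasklet, my_tasklet_fn);

	/* In a real driver this call would sit in the interrupt handler. */
	tasklet_schedule(&my_tasklet);
	return 0;
}

static void __exit my_exit(void)
{
	tasklet_kill(&my_tasklet);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");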

Kernel timers are good when you know exactly when you want something to happen and do not want to interrupt/block a process in the meantime. They run outside process context and are asynchronous with respect to other code, so they can be a source of race conditions if you're not careful.
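
A minimal timer sketch (hypothetical names, assuming a kernel new enough for timer_setup(), i.e. v4.15 or later):

#include <linux/module.h>
#include <linux/timer.h>
#include <linux/jiffies.h>

static struct timer_list my_timer;

/* Runs outside process context (softirq), so no sleeping here. */
static void my_timer_fn(struct timer_list *t)
{
	pr_info("timer fired\n");
}

static int __init my_init(void)
{
	timer_setup(&my_timer, my_timer_fn, 0);
	/* Fire roughly one second from now. */
	mod_timer(&my_timer, jiffies + msecs_to_jiffies(1000));
	return 0;
}

static void __exit my_exit(void)
{
	/* Waits for a running callback to finish (timer_delete_sync() on newer kernels). */
	del_timer_sync(&my_timer);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");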

Hope this helps.

When to use Linux kernel add_timer vs queue_delayed_work

As I stated in my question, queue_delayed_work just uses add_timer internally, so in that respect their use is equivalent.
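
To illustrate, a minimal queue_delayed_work() sketch (hypothetical names; the struct delayed_work embeds the timer that you would otherwise manage by hand with add_timer):

#include <linux/module.h>
#include <linux/workqueue.h>

/* Process context: sleeping is fine here, unlike in a raw timer callback. */
static void my_dwork_fn(struct work_struct *work)
{
	pr_info("delayed work ran\n");
}

static DECLARE_DELAYED_WORK(my_dwork, my_dwork_fn);

static int __init my_init(void)
{
	/* Internally this arms a timer; when it expires, the work is queued. */
	queue_delayed_work(system_wq, &my_dwork, msecs_to_jiffies(500));
	return 0;
}

static void __exit my_exit(void)
{
	cancel_delayed_work_sync(&my_dwork);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");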

When should I use REQ_OP_FLUSH in a kernel blockdev driver? (Do REQ_OP_FLUSH bio's flush dirty RAID controller caches?)

Christoph Hellwig on the linux-block mailing list said:

Devices with power fail protection will advertise that (using VWC flag in NVMe for example) and [the Linux kernel] will never send flushes.

Keith Busch at kernel.org:

You can check the queue attribute, /sys/block/<disk>/queue/write_cache. If the value is "write through", then the device is reporting it doesn't have a volatile cache. If it is "write back", then it has a volatile cache.

If this sounds backwards, consider the following, using a RAID controller cache as an example:

  1. A RAID controller with a non-volatile "writeback" cache (from the
     controller's perspective, i.e., with battery) is a "write through"
     device as far as the kernel is concerned, because the controller will
     return the write as complete as soon as it is in the persistent cache.

  2. A RAID controller with a volatile "writeback" cache (from the
     controller's perspective, i.e., without battery) is a "write back"
     device as far as the kernel is concerned, because the controller will
     return the write as complete as soon as it is in the cache, but the
     cache is not persistent! So in that case flush/FUA is necessary.

[ Reference: https://lore.kernel.org/all/273d3e7e-4145-cdaf-2f80-dc61823dd6ea@ewheeler.net/ ]
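
On the driver side, whether the kernel sends those flushes at all depends on what the block driver advertises about its cache. A minimal sketch, assuming you already have the driver's request_queue (blk_queue_write_cache() is the interface on roughly v4.7 through v6.8 kernels; newer kernels set BLK_FEAT_WRITE_CACHE in the queue_limits instead):

#include <linux/blkdev.h>

/* Called during driver setup. Advertising a volatile write-back cache makes
 * the block layer emit REQ_OP_FLUSH/REQ_FUA; advertising write-through
 * suppresses them. (my_advertise_cache is a hypothetical helper name.) */
static void my_advertise_cache(struct request_queue *q, bool has_volatile_cache)
{
	/* second argument: device has a volatile write cache;
	 * third argument: device supports FUA writes (false here for simplicity). */
	blk_queue_write_cache(q, has_volatile_cache, false);
}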

From personal experience, not all RAID controllers will properly set queue/write_cache as indicated by Keith above. If you know your array has a non-volatile cache running in write-back mode, then check to make sure it is reported as "write through" so flushes will be dropped:

]# cat /sys/block/<disk>/queue/write_cache
<cache status>

and fix it if it isn't in the proper mode. The settings below might seem backwards, but if they do, re-read #1 and #2 above, because they are correct:

If you have a non-volatile cache (ie, with BBU):

]# echo "write through" > /sys/block/<disk>/queue/write_cache

If you have a volatile cache (ie, without BBU):

]# echo "write back" > /sys/block/<disk>/queue/write_cache

So the answer to the question of when to flag REQ_OP_FLUSH in your kernel code is this: whenever you think your code should commit to disk. Since the block layer can re-order any bio request, you must:
  1. Send a WRITE IO, wait for its completion
  2. Send a flush, wait for flush completion

Only then are you guaranteed to have the IO from #1 on disk.
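
As a sketch of that two-step pattern from the submitter's side (write_then_flush is a hypothetical helper; it assumes you already hold a struct block_device and a prepared write bio, and that blkdev_issue_flush() takes only the bdev, as on v5.12+ kernels):

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Write, wait, then flush, so the data is on stable storage before returning. */
static int write_then_flush(struct block_device *bdev, struct bio *write_bio)
{
	int ret;

	/* 1. Submit the write and wait for its completion. */
	ret = submit_bio_wait(write_bio);
	if (ret)
		return ret;

	/* 2. Issue an empty flush (an empty REQ_PREFLUSH bio under the hood);
	 *    on a write-through device this completes almost immediately. */
	return blkdev_issue_flush(bdev);
}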

However, if the device being written has write_cache in "write through" mode, then the flush will complete immediately, and it's up to your controller to do its job and keep the non-volatile cache intact, even after a power loss (BBU, supercap, flash-backed cache, etc.).


