How to Sleep in the Linux Kernel Space

How to sleep in the Linux kernel?

I needed to include <linux/delay.h> to use msleep in kernel space.

How can I pause for 100+ milliseconds in a Linux driver module?

#include <linux/delay.h>

...
msleep(100);
...

Wake up a kernel thread that is sleeping in msleep

One way is to use msleep_interruptible instead of msleep in thread A. I'm not certain, but thread B may then have to do the wakeup by sending a signal rather than by calling wake_up_process. Try both and see.

You may need some interlock (e.g. spinlock) to prevent a race condition where thread B sends the signal after thread A has already woken up [and has already seen the blink type change].

Or, thread A could remember the old type and just continue with the remaining sleep if there was no change (this compensates for the race without the need for a lock).

Here's the kernel code for each:

/**
 * msleep - sleep safely even with waitqueue interruptions
 * @msecs: Time in milliseconds to sleep for
 */
void msleep(unsigned int msecs)
{
    unsigned long timeout = msecs_to_jiffies(msecs) + 1;

    while (timeout)
        timeout = schedule_timeout_uninterruptible(timeout);
}

EXPORT_SYMBOL(msleep);

/**
 * msleep_interruptible - sleep waiting for signals
 * @msecs: Time in milliseconds to sleep for
 */
unsigned long msleep_interruptible(unsigned int msecs)
{
    unsigned long timeout = msecs_to_jiffies(msecs) + 1;

    while (timeout && !signal_pending(current))
        timeout = schedule_timeout_interruptible(timeout);
    return jiffies_to_msecs(timeout);
}

EXPORT_SYMBOL(msleep_interruptible);
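
Both functions convert the request to timer ticks and add one. The extra tick appears to be there so the sleep never ends early, because the current tick is already partly used up when the call is made. Below is a minimal userspace sketch of that arithmetic. The helper names and the explicit hz parameter are assumptions made for illustration (the kernel uses its compile-time HZ), and the round-up formula is a simplified stand-in for msecs_to_jiffies, not its actual implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for msecs_to_jiffies(): round up to whole ticks. */
static unsigned long model_msecs_to_jiffies(unsigned int msecs, unsigned int hz)
{
    assert(hz > 0);
    return ((unsigned long)msecs * hz + 999) / 1000;
}

/* Tick count msleep() waits for, following the "+ 1" in the quoted code. */
static unsigned long model_msleep_ticks(unsigned int msecs, unsigned int hz)
{
    return model_msecs_to_jiffies(msecs, hz) + 1;
}

/* Upper bound, in milliseconds, that this tick count corresponds to. */
static unsigned long model_msleep_max_ms(unsigned int msecs, unsigned int hz)
{
    return model_msleep_ticks(msecs, hz) * 1000 / hz;
}
```

Under these assumptions, a 1 ms request with 100 ticks per second becomes 2 ticks, so it can last up to 20 ms. This is one reason very short msleep calls tend to sleep noticeably longer than requested.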

UPDATE:

Thanks for pointing out the possible race condition as well. I will try with semaphores/mutexes and see.

It's probably not necessary if thread B does the monitoring (e.g. reading /dev/whatever or /proc/whatever) and writes a global [possibly inside the "private" data struct]. You might have to use an atomic fetch/store or CAS, but marking it volatile might be sufficient. This is because B is the [only] writer and A is the [only] reader.
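
As a rough illustration of the single-writer/single-reader idea, here is a userspace sketch using C11 atomics. The variable and function names are hypothetical. In kernel code the closest equivalents would be READ_ONCE/WRITE_ONCE or an atomic_t rather than <stdatomic.h>, and those are generally preferred over relying on volatile alone.

```c
#include <stdatomic.h>

/* Hypothetical shared blink mode: written only by thread B, read only by thread A. */
static atomic_int blink_mode;

/* Thread B (the monitor) publishes a new mode. */
static void monitor_set_mode(int mode)
{
    atomic_store_explicit(&blink_mode, mode, memory_order_release);
}

/* Thread A (the blinker) reads the most recently published mode. */
static int blinker_get_mode(void)
{
    return atomic_load_explicit(&blink_mode, memory_order_acquire);
}
```

Because there is exactly one writer, no compare-and-swap is needed here. A plain atomic store and load are enough for A to see a consistent value.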

As for sending signals in kernel space, can we do that? I thought signals were only for userspace. Could you give me an example if that is not the case?

Since msleep_interruptible exists at all, and it looks for pending signals, that's QED right there.

But, here's some code that proves it. The comment in allow_signal has been around since a patch from 2003:

/*
 * Let kernel threads use this to say that they allow a certain signal.
 * Must not be used if kthread was cloned with CLONE_SIGHAND.
 */
int allow_signal(int sig)
{
    if (!valid_signal(sig) || sig < 1)
        return -EINVAL;

    spin_lock_irq(&current->sighand->siglock);
    /* This is only needed for daemonize()'ed kthreads */
    sigdelset(&current->blocked, sig);
    /*
     * Kernel threads handle their own signals. Let the signal code
     * know it'll be handled, so that they don't get converted to
     * SIGKILL or just silently dropped.
     */
    current->sighand->action[(sig)-1].sa.sa_handler = (void __user *)2;
    recalc_sigpending();
    spin_unlock_irq(&current->sighand->siglock);
    return 0;
}

EXPORT_SYMBOL(allow_signal);

int disallow_signal(int sig)
{
    if (!valid_signal(sig) || sig < 1)
        return -EINVAL;

    spin_lock_irq(&current->sighand->siglock);
    current->sighand->action[(sig)-1].sa.sa_handler = SIG_IGN;
    recalc_sigpending();
    spin_unlock_irq(&current->sighand->siglock);
    return 0;
}

EXPORT_SYMBOL(disallow_signal);

Just use the internal call. That's probably do_send_sig_info. Use something innocuous like SIGUSR1 (or you may need to use an "RT" signal because they are queued).

Sending a signal, in the kernel, means that the signal is OR'ed into the task's pending signal mask, and the task is marked runnable (e.g. gets woken up).

The userspace "jump to handler" occurs only when a given task is about to reenter [the last thing before reentering] userspace. If a signal is set, the kernel sets up the userspace stack frame and switches execution to the handler.

In a kernel thread, it does not cause a jump to a "signal handler" as there isn't an equivalent. The kernel thread must look at its pending signal mask to notice it at all. There is no jump. It is not like an interrupt. And, the kernel thread must manually clear the pending mask [otherwise, msleep_interruptible will thereafter always return immediately].

To detect and clear a signal, here's some code:

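while (signal_pending(current)) {
    siginfo_t info;
    int signo;

    /* Dequeue one pending signal; this also clears it from the pending set */
    signo = dequeue_signal_lock(current, &current->blocked, &info);

    switch (signo) {
    case SIGUSR1:
        /* the wakeup request: react to it here */
        break;
    }
}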
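
To make the "pending mask" description concrete, here is a simplified userspace model. It is only a sketch: the struct and function names are invented for illustration, and the real kernel tracks pending signals with more machinery than a single 64-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NSIG 64

/* Hypothetical task: bit (sig - 1) set means signal sig is pending. */
struct model_sigtask {
    uint64_t pending;
};

/* "Sending" a signal just ORs its bit into the pending mask. */
static void model_send_signal(struct model_sigtask *t, int sig)
{
    assert(sig >= 1 && sig <= MODEL_NSIG);
    t->pending |= (uint64_t)1 << (sig - 1);
}

static bool model_signal_pending(const struct model_sigtask *t)
{
    return t->pending != 0;
}

/* Return the lowest-numbered pending signal and clear it, or 0 if none. */
static int model_dequeue_signal(struct model_sigtask *t)
{
    for (int sig = 1; sig <= MODEL_NSIG; sig++) {
        uint64_t bit = (uint64_t)1 << (sig - 1);

        if (t->pending & bit) {
            t->pending &= ~bit;
            return sig;
        }
    }
    return 0;
}
```

In this model, sending the same signal twice before it is dequeued leaves a single pending bit. That matches the earlier remark that ordinary signals are not queued while "RT" signals are. It also shows why the pending state stays set until the thread explicitly dequeues it.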

UPDATE #2:

A thread in msleep_interruptible could not be woken with wake_up_process, only with send_sig_info. I thought msleep_interruptible was nothing but set_current_state(TASK_INTERRUPTIBLE) followed by schedule() with a delay, so I had expected wake_up_process to wake up a thread sleeping in msleep_interruptible.

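[AFAIK] You have to send a signal to terminate the sleep early, hence my original comment about using a signal (and using do_send_sig_info). The msleep_interruptible loop quoted above suggests why: after a plain wakeup, timeout is still nonzero and no signal is pending, so the loop just goes back to sleep for the remainder. Sometimes it's just trial and error. I might have tried wake_up_process, too. But, when that didn't work, I'd start looking around [by looking at the msleep* code].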
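
The difference between the two kinds of wakeup can be modeled in userspace. This sketch copies the shape of the msleep_interruptible loop quoted earlier and replaces the scheduler with a fake clock. All names here are hypothetical, and a tick value of 0 stands for "this event never happens".

```c
#include <stdbool.h>

/* Hypothetical sleeping task driven by a simulated tick counter. */
struct model_sleeper {
    unsigned long now;          /* current tick */
    unsigned long wakeup_tick;  /* plain wake_up_process()-style wakeup, 0 = none */
    unsigned long signal_tick;  /* tick at which a signal becomes pending, 0 = none */
    bool signal_pending;
};

/* Sleep until the timeout or the next event; return the ticks left over. */
static unsigned long model_schedule_timeout(struct model_sleeper *t,
                                            unsigned long timeout)
{
    unsigned long end = t->now + timeout;
    unsigned long stop = end;

    if (t->wakeup_tick > t->now && t->wakeup_tick < stop)
        stop = t->wakeup_tick;
    if (t->signal_tick > t->now && t->signal_tick < stop)
        stop = t->signal_tick;

    t->now = stop;
    if (t->signal_tick && t->now >= t->signal_tick)
        t->signal_pending = true;
    return end - stop;
}

/* Same loop structure as the quoted msleep_interruptible(). */
static unsigned long model_msleep_interruptible(struct model_sleeper *t,
                                                unsigned long timeout)
{
    while (timeout && !t->signal_pending)
        timeout = model_schedule_timeout(t, timeout);
    return timeout;
}
```

With a plain wakeup at tick 30 of a 100-tick sleep, the model returns 0 at tick 100, because the loop re-sleeps for the remaining 70 ticks. With a signal at tick 30, it returns 70 at tick 30.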

Nevertheless, it would be interesting to understand whether it is possible to implement that in a single thread.

Yes. It requires a small bit of restructuring and an extra variable or two.

My blink thread should set LED on for a second

Let's call this blink_interval_on

and, depending on the mode it is in, then has to turn the LED off for either 1 second or about 10 seconds.

Let's call this blink_interval_off

If we implement it in a single thread, polling will be delayed by those 10 seconds, right?

The key change [conceptually] is that the value given to msleep needn't be the entire interval (e.g. blink_interval_*) but can be a smaller fixed interval. Let's call this sleep_fixed.

We want to choose this to provide the responsiveness that you'd like. If it's very small (e.g. one microsecond), we'd be waking up too often. If we choose a large value like 1 second or 10 seconds, the response becomes "sluggish".

So, a good value would be 10-100 milliseconds. Slow enough that we're not hogging resources by frequent wakeups, but fast enough that a user won't notice the difference if the blink mode changes.

Now, we have to keep track of how much time is remaining for a given blink interval [bumping down by sleep_fixed], and flip the LED state when the interval becomes exhausted. That is, we manually keep track of what the full msleep used to do for us.

Can you give me an example of that?

Okay, here's a crude/pseudo-code version to [hopefully] explain what I mean:

// NOTE: all times are in milliseconds

int blink_interval_on;                  // LED "on" interval
int blink_interval_off;                 // LED "off" interval

int blink_interval_remaining;           // time remaining in current interval
int curmode;                            // current blink mode

int blink_on_list[2] = { 1000, 1000 };
int blink_off_list[2] = { 1000, 10000 };

int sleep_fixed;                        // small sleep value

// set_led_on -- set LED on or off
void
set_led_on(int onflg)
{
}

// msleep -- sleep for specified number of milliseconds
// (stub for this sketch only; in a module, use msleep from <linux/delay.h>)
void
msleep(int ms)
{
}

// getnewmode -- get desired blink mode
// RETURNS: 0=1 second, 1=10 seconds, etc (-1=stop)
int
getnewmode(void)
{
    int newmode;

    // do whatever is necessary, as you're doing now ...
    newmode = 0;

    return newmode;
}

// ledloop -- main thread for single thread case
void
ledloop(void)
{
    int newmode;
    int curstate = 0;
    int sleep_current;

    // force an initial state change
    curmode = -2;

    // this is our "good enough" interval
    sleep_fixed = 10;

    while (1) {
        // look for blink mode changes
        newmode = getnewmode();
        if (newmode < 0)
            break;

        // got a mode change
        // force a change at the bottom
        if (curmode != newmode) {
            curstate = 0;
            curmode = newmode;
            blink_interval_remaining = 0;
        }

        // set up current sleep interval
        sleep_current = sleep_fixed;
        if (sleep_current > blink_interval_remaining)
            sleep_current = blink_interval_remaining;

        // do a sleep
        // NOTE: because the sleep is so short, we can use the simple msleep
        // this is the "good enough" way ...
        if (sleep_current > 0) {
            msleep(sleep_current);
            blink_interval_remaining -= sleep_current;
        }

        // flip the LED state at interval end
        if (blink_interval_remaining <= 0) {
            curstate = ! curstate;
            set_led_on(curstate);

            // set new interval
            if (curstate)
                blink_interval_remaining = blink_on_list[curmode];
            else
                blink_interval_remaining = blink_off_list[curmode];
        }
    }
}
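
To check the timing of this loop without any hardware, the same logic can be run against a simulated clock. The following is a userspace sketch, not driver code. The sim_* names, the event log, and the mode_change_ms and stop_ms inputs are all assumptions added for the simulation.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_EVENTS 16

struct sim_event {
    long at_ms;     /* simulated time of the LED change */
    bool led_on;
};

struct sim {
    long now_ms;            /* simulated clock */
    long mode_change_ms;    /* time at which the mode switches from 0 to 1 */
    long stop_ms;           /* time at which the loop is asked to stop */
    struct sim_event events[SIM_MAX_EVENTS];
    int nevents;
};

/* Stand-in for msleep(): just advances the simulated clock. */
static void sim_msleep(struct sim *s, int ms)
{
    s->now_ms += ms;
}

/* Stand-in for set_led_on(): records the change instead of touching hardware. */
static void sim_set_led_on(struct sim *s, bool on)
{
    if (s->nevents < SIM_MAX_EVENTS) {
        s->events[s->nevents].at_ms = s->now_ms;
        s->events[s->nevents].led_on = on;
        s->nevents++;
    }
}

/* Stand-in for getnewmode(): 0 or 1, or -1 to stop. */
static int sim_getnewmode(const struct sim *s)
{
    if (s->now_ms >= s->stop_ms)
        return -1;
    return s->now_ms >= s->mode_change_ms ? 1 : 0;
}

static void sim_ledloop(struct sim *s)
{
    static const int on_list[2] = { 1000, 1000 };
    static const int off_list[2] = { 1000, 10000 };
    const int sleep_fixed = 10;     /* the "good enough" slice, in ms */
    int curmode = -2;               /* forces an initial mode change */
    int remaining = 0;
    bool curstate = false;

    for (;;) {
        int newmode = sim_getnewmode(s);

        if (newmode < 0)
            break;
        if (newmode != curmode) {   /* mode change: restart the cycle */
            curmode = newmode;
            curstate = false;
            remaining = 0;
        }
        assert(curmode == 0 || curmode == 1);

        int slice = sleep_fixed < remaining ? sleep_fixed : remaining;

        if (slice > 0) {
            sim_msleep(s, slice);
            remaining -= slice;
        }
        if (remaining <= 0) {       /* interval exhausted: flip the LED */
            curstate = !curstate;
            sim_set_led_on(s, curstate);
            remaining = curstate ? on_list[curmode] : off_list[curmode];
        }
    }
}
```

With a mode change at 2500 ms and a stop at 5000 ms, the recorded events are on at 0, off at 1000, on at 2000, on again at 2500 (the mode change restarts the cycle), and off at 3500. The mode change is picked up within one sleep_fixed slice instead of waiting for the whole interval to finish.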

Effects of calling sleep functions like set_current_state()/wait_event() in a device driver?

Kernel code executes in the context of the "kernel side" of the userspace thread that issued the system call. Each userspace thread has such a kernel-space counterpart. From the scheduler's point of view, the user and kernel parts of a thread are the same entity, so while the kernel part is scheduled out, the user part is as well.

"Interruptible" and "non-interruptible" waits differ in how signals are handled. When a process is in the TASK_INTERRUPTIBLE state and gets a signal, the system ensures that it wakes from schedule() as soon as possible. After that, the process itself must leave the wait loop if signal_pending() returns true. In the example code in the question text, this is not implemented properly unless 'condition' is an expression that contains a signal_pending() check.

set_current_state() is a setter for the current process state; it just sets a flag and does nothing else. It should only be used together with a proper call to schedule().

wait_event() is a utility that implements all the technical details of waiting. Normally, drivers use the wait_event*() flavors; direct use of set_current_state() and schedule() is needed only in special cases.
