SCHED_FIFO Process with Priority of 99 Gets Preempted

Do two SCHED_FIFO tasks with equal priority get processing time within each period in Linux?


Linux documentation says SCHED_FIFO processes can get preempted only by processes with higher priority

This is correct. In addition, a SCHED_FIFO process can be interrupted if you set RLIMIT_RTTIME (see getrlimit(2)) and that limit is reached: the kernel sends it SIGXCPU at the soft limit and SIGKILL at the hard limit.
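As a rough illustration of that limit, the sketch below lowers the soft RLIMIT_RTTIME of the calling process with getrlimit(2) and setrlimit(2). The helper name set_rt_cpu_budget is made up for this example. The limit is given in microseconds of CPU time that a real-time task may consume without making a blocking system call.

```c
#include <sys/resource.h>

/* Hypothetical helper: cap the CPU time (in microseconds) that this process
 * may burn under a real-time policy without blocking.
 * Only the soft limit is changed, which needs no privileges as long as it
 * stays at or below the hard limit. Returns 0 on success, -1 on error. */
int set_rt_cpu_budget(rlim_t usec)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_RTTIME, &lim) != 0)
        return -1;

    /* Never ask for more than the hard limit allows. */
    if (usec > lim.rlim_max)
        usec = lim.rlim_max;

    lim.rlim_cur = usec;
    return setrlimit(RLIMIT_RTTIME, &lim);
}
```

Calling set_rt_cpu_budget(500000) before switching to SCHED_FIFO would, under these assumptions, give a runaway loop about half a second of CPU time before the first SIGXCPU arrives.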

The only other reasons why another SCHED_FIFO process with the same priority can be scheduled are that the first one blocks (for example, sleeps) or that it voluntarily yields.

CFS has nothing to do with SCHED_FIFO; it only takes care of SCHED_NORMAL, SCHED_BATCH and SCHED_IDLE.

Linux SCHED_FIFO not respecting thread priorities

When you start the two threads

// Declare and init thread objects
std::thread thread_1(std::move(task_1), 1, 50000000);
std::thread thread_2(std::move(task_2), 2, 50000000);

they may (!) immediately run and fetch the schedule parameters

// Get information about thread
pthread_getschedparam(pthread_self(), &policy, &sch);

even before you set them with pthread_setschedparam() to another value.
The output might even show 0 and 0 if both threads happen to run before the main thread has set their priorities.


The child threads may (!) both be scheduled after the main thread has set the priority. Then you would get the expected output. But any result is possible.


When you move pthread_getschedparam() to the end of the thread just before the output, you are more likely to get the expected output of 97 and 98. But even then both threads may run until the end, even before the main thread is scheduled to set the priority.
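One way to sidestep this race, which is a different technique from the code in the question, is to hand the policy and priority to the thread at creation time through a pthread_attr_t. The sketch below is in C rather than std::thread, and the names report_priority and start_fifo_thread are invented for the example. Without sufficient privileges pthread_create() is expected to fail, typically with EPERM.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Thread body: read back the priority this thread is running with. */
static void *report_priority(void *arg)
{
    int policy;
    struct sched_param sp;

    if (pthread_getschedparam(pthread_self(), &policy, &sp) == 0)
        *(int *)arg = sp.sched_priority;
    return NULL;
}

/* Hypothetical helper: create a thread that already has SCHED_FIFO and
 * `prio` when its start routine begins, then wait for it to finish.
 * Returns 0 on success or an errno-style code (e.g. EPERM). */
int start_fifo_thread(int prio, int *seen)
{
    pthread_attr_t attr;
    pthread_t tid;
    struct sched_param sp = { .sched_priority = prio };
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;

    /* Without PTHREAD_EXPLICIT_SCHED the attributes below are ignored and
     * the new thread inherits the creator's policy and priority. */
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &sp);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, report_priority, seen);
    pthread_attr_destroy(&attr);

    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```

With this arrangement the thread has no window in which it can observe the old parameters, so the order in which the threads get scheduled no longer changes the reported values.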

Linux not respecting SCHED_FIFO priority? (normal or GDB execution)

There are a few things obviously wrong with your MCVE:

  1. You have a data race on b, i.e. undefined behavior, so anything can happen.

  2. You expect the divisor thread to have finished its pthread_setschedparam call before the ratio thread starts computing the ratio.

    But there is absolutely no guarantee that the first thread will not run to completion long before the second thread is even created.

    Indeed that is what's likely happening under GDB: it must trap thread creation and destruction events in order to keep track of all the threads, and so thread creation under GDB is significantly slower than outside of it.

To fix the second problem, add a counting semaphore, and have both threads rendezvous after each has executed its pthread_setschedparam call.
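A minimal sketch of such a rendezvous follows. It uses two semaphores (one per thread) rather than a single one, which is a common variant of the same idea. The names worker and run_rendezvous, and the priorities 10 and 11, are placeholders. The pthread_setschedparam() call may fail without privileges, and the sketch deliberately ignores that, since only the synchronisation is being shown.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdatomic.h>

static sem_t ready[2];          /* ready[i] is posted once thread i is set up */
static atomic_int setup_done;   /* number of threads that finished setup */
static int seen_after[2];       /* setup_done as seen after the rendezvous */

static void wait_sem(sem_t *s)
{
    while (sem_wait(s) == -1 && errno == EINTR)
        ;                       /* retry if interrupted by a signal */
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    struct sched_param sp = { .sched_priority = 10 + id };

    /* Setup step; the result is ignored in this sketch. */
    (void)pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    atomic_fetch_add(&setup_done, 1);

    sem_post(&ready[id]);       /* announce: my setup is complete */
    wait_sem(&ready[1 - id]);   /* wait until the other thread is done too */

    seen_after[id] = atomic_load(&setup_done);
    return NULL;
}

/* Returns 1 if both threads passed the rendezvous only after both setups
 * had completed, 0 otherwise, -1 if a thread could not be created. */
int run_rendezvous(void)
{
    pthread_t t[2];
    int ids[2] = { 0, 1 };

    atomic_store(&setup_done, 0);
    sem_init(&ready[0], 0, 0);
    sem_init(&ready[1], 0, 0);

    if (pthread_create(&t[0], NULL, worker, &ids[0]) != 0)
        return -1;
    if (pthread_create(&t[1], NULL, worker, &ids[1]) != 0) {
        sem_post(&ready[1]);    /* release thread 0 so it can be joined */
        pthread_join(t[0], NULL);
        return -1;
    }
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    sem_destroy(&ready[0]);
    sem_destroy(&ready[1]);
    return seen_after[0] == 2 && seen_after[1] == 2;
}
```

Neither thread can get past wait_sem() until the other has posted, so any work placed after that point runs with both priorities already in effect.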

How do SCHED_FIFO and SCHED_RR interfere with each other?

Conceptually, there is a list of runnable processes associated with each static priority level. These lists can contain both SCHED_FIFO and SCHED_RR processes - the two scheduling policies share the same set of static priorities.

When selecting a process to run, the scheduler takes the process at the head of the non-empty list with the highest static priority, regardless of the scheduling policy of that process.

The scheduling policies affect how the processes move within those lists. For SCHED_FIFO, once a process reaches the head of the list for a given priority it will stay there until it blocks or yields. For SCHED_RR, a runnable process that has exceeded its maximum time quantum will be moved to the end of the list for its static priority.
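The list movement described above can be mimicked with a toy model. This is not kernel code: it simulates a single priority level, treats every step as one expired time quantum, and assumes that no task ever blocks. All names (struct task, simulate_level, the POLICY_ constants) are invented for the illustration.

```c
#include <stddef.h>
#include <string.h>

enum policy { POLICY_FIFO, POLICY_RR };

struct task {
    char name;
    enum policy policy;
};

/* Simulate `ticks` steps of one priority level's run list (n >= 1 tasks).
 * In each step the head of the list runs. A SCHED_RR task that used up its
 * quantum moves to the tail; a SCHED_FIFO task stays at the head.
 * The name of the task run in each step is written to `out`, which must
 * have room for ticks + 1 characters. */
void simulate_level(struct task *list, size_t n, size_t ticks, char *out)
{
    for (size_t t = 0; t < ticks; t++) {
        struct task head = list[0];

        out[t] = head.name;
        if (head.policy == POLICY_RR && n > 1) {
            /* Rotate: shift everyone up, put the old head at the tail. */
            memmove(&list[0], &list[1], (n - 1) * sizeof list[0]);
            list[n - 1] = head;
        }
    }
    out[ticks] = '\0';
}
```

Starting from the list a (RR), b (RR), f (FIFO) and running five steps yields the trace "abfff": the two round-robin tasks each get one quantum, and once the FIFO task reaches the head it keeps the CPU.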

Is it possible to hang a Linux box with a SCHED_FIFO process?

There's another protection I didn't know about.

If you have just one processor and want a SCHED_FIFO process like this (one that never blocks nor yields the processor voluntarily) to monopolize it, besides giving it a high priority (not really necessary in most cases, but doesn't hurt) you have to:

  1. Set sched_rt_runtime_us to -1 or to the value in sched_rt_period_us
  2. If you have group scheduling configured, set /cgroup/cpu.rt_runtime_us to -1 (assuming you mounted the cgroup filesystem on /cgroup)

Apparently, I had group scheduling configured and wasn't bypassing that last protection.
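To check the first of those settings from a program, something like the following sketch could be used. It reads the two global values from /proc/sys/kernel and reports whether real-time throttling looks active. The function names are made up, and the per-cgroup cpu.rt_runtime_us value is not inspected here.

```c
#include <stdio.h>

/* Read one integer from a sysctl-style file. Returns 0 on success. */
int read_sysctl_long(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 1 if the global RT throttle is active, 0 if it is disabled
 * (runtime is -1 or not smaller than the period), -1 if the values
 * could not be read. */
int rt_throttling_enabled(void)
{
    long runtime, period;

    if (read_sysctl_long("/proc/sys/kernel/sched_rt_runtime_us", &runtime) != 0 ||
        read_sysctl_long("/proc/sys/kernel/sched_rt_period_us", &period) != 0)
        return -1;

    if (runtime == -1 || runtime >= period)
        return 0;
    return 1;
}
```

On a system with the usual defaults (950000 and 1000000) this should return 1, meaning real-time tasks are limited to 95% of each one-second period.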

If you have N processors and want your N processes to monopolize them, you just do the same but launch all of them from your shell (the shell shouldn't get stuck until you launch the last one, since it will still have processors to run on). If you want to be really sure each process will go to a different processor, set its CPU affinity accordingly.

Thanks to everyone for the replies.

Which real-time priority is the highest priority in Linux?

I did an experiment to nail this down, as follows:

  • process1: RT priority = 40, CPU affinity = CPU 0. This process "spins" for 10 seconds so it won't let any lower-priority process run on CPU 0.

  • process2: RT priority = 39, CPU affinity = CPU 0. This process prints a message to stdout every 0.5 second, sleeping in between. It prints out the elapsed time with each message.

I'm running a 2.6.33 kernel with the PREEMPT_RT patch.

To run the experiment, I run process2 in one window (as root) and then start process1 (as root) in another window. The result is process1 appears to preempt process2, not allowing it to run for a full 10 seconds.

In a second experiment, I change process2's RT priority to 41. In this case, process2 is not preempted by process1.

This experiment shows that a larger RT priority value in sched_setscheduler() has a higher priority. This appears to contradict what Michael Foukarakis pointed out from sched.h, but actually it does not. In sched.c in the kernel source, we have:

static void
__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
{
	BUG_ON(p->se.on_rq);

	p->policy = policy;
	p->rt_priority = prio;
	p->normal_prio = normal_prio(p);
	/* we are holding p->pi_lock already */
	p->prio = rt_mutex_getprio(p);
	if (rt_prio(p->prio))
		p->sched_class = &rt_sched_class;
	else
		p->sched_class = &fair_sched_class;
	set_load_weight(p);
}

rt_mutex_getprio(p) does the following:

return task->normal_prio;

While normal_prio() happens to do the following:

prio = MAX_RT_PRIO-1 - p->rt_priority;  /* <===== notice! */
...
return prio;

In other words, we have (my own interpretation):

p->prio = p->normal_prio = MAX_RT_PRIO - 1 - p->rt_priority

Wow! That is confusing! To summarize:

  • With p->prio, a smaller value preempts a larger value.

  • With p->rt_priority, a larger value preempts a smaller value. This is the real-time priority set using sched_setscheduler().
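The inversion can be written out as a few lines of user-space C. This is only a restatement of the formula above, not kernel code. MAX_RT_PRIO is copied in as 100, its value in the kernel headers, and the function names are invented for the example.

```c
#include <sched.h>

/* Assumption: mirrors the kernel's MAX_RT_PRIO constant. */
#define MAX_RT_PRIO 100

/* Map the user-visible rt_priority to the kernel-internal prio. */
int kernel_prio_from_rt(int rt_priority)
{
    return MAX_RT_PRIO - 1 - rt_priority;
}

/* Returns 1 if a task with rt_priority `a` preempts one with `b`,
 * i.e. if its internal prio value is smaller. */
int rt_preempts(int a, int b)
{
    return kernel_prio_from_rt(a) < kernel_prio_from_rt(b);
}

/* Returns 1 if `rt_priority` is within the range the system accepts
 * for SCHED_FIFO, as reported by sched_get_priority_min/max(). */
int rt_priority_valid(int rt_priority)
{
    return rt_priority >= sched_get_priority_min(SCHED_FIFO) &&
           rt_priority <= sched_get_priority_max(SCHED_FIFO);
}
```

For the experiment above, rt_preempts(40, 39) is 1 and rt_preempts(40, 41) is 0, matching what was observed. An rt_priority of 99 maps to an internal prio of 0.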


