Difference Among Three Priorities Used in the Linux Kernel

Which real-time priority is the highest priority in Linux

I did an experiment to nail this down, as follows:

  • process1: RT priority = 40, CPU affinity = CPU 0. This process "spins" for 10 seconds so it won't let any lower-priority process run on CPU 0.

  • process2: RT priority = 39, CPU affinity = CPU 0. This process prints a message to stdout every 0.5 seconds, sleeping in between. It prints the elapsed time with each message.

I'm running a 2.6.33 kernel with the PREEMPT_RT patch.

To run the experiment, I run process2 in one window (as root) and then start process1 (as root) in another window. The result is that process1 preempts process2, not allowing it to run for the full 10 seconds.

In a second experiment, I change process2's RT priority to 41. In this case, process2 is not preempted by process1.
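
For reference, here is a minimal sketch of process2 (error handling omitted for brevity; run as root). process1 is the same setup with priority 40 and a 10-second busy loop in place of the print/sleep loop:

/* process2: SCHED_FIFO priority 39, pinned to CPU 0, prints the
 * elapsed time every 0.5 seconds. Run as root. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
        cpu_set_t set;
        struct sched_param sp = { .sched_priority = 39 };  /* RT priority */
        time_t start = time(NULL);

        CPU_ZERO(&set);
        CPU_SET(0, &set);                         /* CPU affinity = CPU 0 */
        sched_setaffinity(0, sizeof(set), &set);
        sched_setscheduler(0, SCHED_FIFO, &sp);

        for (;;) {
                printf("elapsed: %ld s\n", (long)(time(NULL) - start));
                fflush(stdout);
                usleep(500000);                   /* sleep 0.5 s */
        }
}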

This experiment shows that a larger RT priority value passed to sched_setscheduler() means a higher priority. That appears to contradict what Michael Foukarakis pointed out from sched.h, but actually it does not. In sched.c in the kernel source, we have:

static void
__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
{
        BUG_ON(p->se.on_rq);

        p->policy = policy;
        p->rt_priority = prio;
        p->normal_prio = normal_prio(p);
        /* we are holding p->pi_lock already */
        p->prio = rt_mutex_getprio(p);
        if (rt_prio(p->prio))
                p->sched_class = &rt_sched_class;
        else
                p->sched_class = &fair_sched_class;
        set_load_weight(p);
}

rt_mutex_getprio(p) does the following:

return task->normal_prio;

And normal_prio() happens to do the following:

prio = MAX_RT_PRIO-1 - p->rt_priority;  /* <===== notice! */
...
return prio;

In other words, we have (my own interpretation):

p->prio = p->normal_prio = MAX_RT_PRIO - 1 - p->rt_priority

Wow! That is confusing! To summarize:

  • With p->prio, a smaller value preempts a larger value.

  • With p->rt_priority, a larger value preempts a smaller value. This is the real-time priority set using sched_setscheduler().
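
To make the inversion concrete: MAX_RT_PRIO is 100 in this kernel, so p->prio = 99 - p->rt_priority, and the experiments above map as follows:

  • process1: rt_priority = 40 -> p->prio = 99 - 40 = 59

  • process2 (first run): rt_priority = 39 -> p->prio = 99 - 39 = 60; 60 > 59, so process2 is preempted.

  • process2 (second run): rt_priority = 41 -> p->prio = 99 - 41 = 58; 58 < 59, so process2 is not preempted.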

Inside the Linux kernel, does the `nice` priority level of a task get modified anywhere else besides these cases?

I discovered that the default settings of a process are initialized in /source/init/init_task.c

No. That's the definition of the init task itself; it's not used to initialize new tasks. The init task is the first userspace task run by the kernel at startup, and its task_struct is hardcoded for performance and ease of use. Every other task is created through fork(2)/clone(2).

My reasoning is that if I were to modify these default settings in init_task.c, it should cover all cases.

It doesn't really cover anything, just the initial priority of init, which init itself can very well overwrite since it runs as root.


The priority of a task can be changed in different ways, but the kernel should not "automatically" change the priority of tasks, at least to my knowledge.

Therefore, if you want to "control" the priority in such a way, your best bet would be to change the code of fork (specifically copy_process()), since forking is the only way to create new processes in Linux.

Other than that, there are other syscalls that can end up modifying the priority of a process: taking a quick glance at the kernel code, at least sched_setscheduler(), sched_setparam(), and setpriority() do.
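
For illustration, here is a sketch of how each of those syscalls sets a priority from userspace (error handling omitted; the values and the helper name set_priorities are arbitrary examples):

#include <sched.h>
#include <sys/resource.h>

/* Three syscalls that modify the calling process's priority (pid 0).
 * The values here are arbitrary examples. */
void set_priorities(void)
{
        struct sched_param sp = { .sched_priority = 10 };

        sched_setscheduler(0, SCHED_FIFO, &sp);  /* sets policy and rt_priority */
        sp.sched_priority = 20;
        sched_setparam(0, &sp);                  /* sets rt_priority only */
        setpriority(PRIO_PROCESS, 0, 5);         /* sets the nice value */
}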

Niceness and priority of processes on a Linux system

The priority of a process in Linux is dynamic: the longer it runs, the lower its priority becomes. A process "runs" when it is actually using the CPU; most processes on a typical Linux box just wait for I/O and thus do not count as running.

The priority is taken into account when there are more runnable processes than CPU cores available: the highest priority wins. But as the winning process loses its priority over time, other processes will take over the CPU at some point.

nice and renice add or remove some "points" from the priority. A process with a higher nice value gets less CPU time. Root can also set a negative nice value, which gives the process more CPU time.

Example: there are two processes (1 and 2) calculating the halting problem, and one CPU core in the system. The default is nice 0, so each process gets about half of the CPU time. Now let's renice process 1 to the value 10. Result: process 2 gets a significantly higher share of CPU time than process 1.
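
A minimal way to reproduce this, assuming a single free core: fork two busy loops, renice one of them, and compare their CPU usage in top(1). The program below is only a sketch:

#include <sys/resource.h>
#include <unistd.h>

/* Fork two CPU-bound loops; the child renices itself to 10. On a single
 * core, the parent should accumulate noticeably more CPU time (compare
 * the two in top; stop them with Ctrl-C or kill). */
int main(void)
{
        if (fork() == 0)
                setpriority(PRIO_PROCESS, 0, 10);  /* the "nice 10" process */
        for (;;)
                ;                                  /* burn CPU */
}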

Note: on modern desktops there is plenty of CPU time available - they are fast these days. Unfortunately, HDDs are still relatively slow at random I/O, so even a niced process can generate enough I/O traffic to slow down a system significantly.


