Understanding Renice

You probably have autogroups enabled.

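In that case, in experiment 2 there are two autogroups (one per session) competing for CPU at the top level, and the processes inside each autogroup compete only with each other. A process's nice value therefore only matters relative to the other processes in its own autogroup.

You can see the current shell's autogroup and its nice value with: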

cat /proc/$$/autogroup

And you can set the niceness with:

echo 19 > /proc/$$/autogroup
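The same file can also be read from a program. The following is only a sketch: it assumes the file holds a single line of the form "/autogroup-<id> nice <value>", and the helper names parse_autogroup and read_own_autogroup are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one line such as "/autogroup-25 nice 0" (assumed format).
 * Returns 0 on success, -1 if the line does not match. */
int parse_autogroup(const char *line, long *group_id, int *nice_out)
{
    assert(line && group_id && nice_out);
    if (sscanf(line, "/autogroup-%ld nice %d", group_id, nice_out) != 2)
        return -1;
    return 0;
}

/* Read the calling process's autogroup. Returns -1 if the file is
 * missing (for example, a kernel built without autogroup support). */
int read_own_autogroup(long *group_id, int *nice_out)
{
    char line[128];
    FILE *f = fopen("/proc/self/autogroup", "r");
    if (!f) return -1;
    char *got = fgets(line, sizeof line, f);
    fclose(f);
    return got ? parse_autogroup(line, group_id, nice_out) : -1;
}
```

Calling read_own_autogroup(&id, &nice) from two different terminal sessions should report two different group ids if autogroups are in effect.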

How are nice priorities and scheduler policies related to process (thread?) IDs in Linux?

Thread IDs come from the same namespace as PIDs. This means that each thread is individually addressable by its TID. Some system calls apply to the entire process (for example, kill), but others apply only to a single thread.

The scheduler system calls are generally in the latter class, because this allows you to give different threads within a process different scheduler attributes, which is often useful.
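To make this concrete, here is a minimal sketch that changes the nice value of a single thread. It relies on the Linux behaviour that setpriority() with PRIO_PROCESS and a TID affects only that thread (POSIX describes nice as per-process, so this is Linux-specific). The wrapper names set_thread_nice and get_thread_nice are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Set the nice value of one thread. A tid of 0 means the calling thread. */
int set_thread_nice(pid_t tid, int value)
{
    assert(tid >= 0);
    return setpriority(PRIO_PROCESS, (id_t)tid, value);
}

/* Read a thread's nice value into *out. getpriority() can legitimately
 * return -1, so errno is the only reliable error signal. */
int get_thread_nice(pid_t tid, int *out)
{
    assert(tid >= 0 && out);
    errno = 0;
    int value = getpriority(PRIO_PROCESS, (id_t)tid);
    if (value == -1 && errno != 0) return -1;
    *out = value;
    return 0;
}
```

Raising the nice value (lowering priority) needs no special privileges; lowering it again generally does, so set_thread_nice(0, 19) is a one-way trip for an ordinary user.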

Change niceness of all processes by niceness

Something to start with in C:

#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

static char *prstatname(char *buf, char **endptr)
{
    /* parse process name */
    char *ptr = buf;
    while (*ptr && *ptr != '(') ++ptr;
    if (!*ptr) return 0;    /* no '(' found: not a valid stat line */
    ++ptr;

    char *name = ptr;
    while (*ptr)
    {
        if (*ptr == ')' && *(ptr+1) && *(ptr+2) && *(ptr+3)
                && *(ptr+1) == ' ' && *(ptr+3) == ' ')
        {
            *ptr = 0;
            *endptr = ptr + 1;
            return name;
        }
        ++ptr;
    }
    return 0;
}

int main(void)
{
    DIR *proc = opendir("/proc");
    if (!proc) return 1;

    struct dirent *ent;

    while ((ent = readdir(proc)))
    {
        /* check whether filename is all numeric, then it's a process id */
        char *endptr;
        int pid = strtol(ent->d_name, &endptr, 10);
        if (*endptr) continue;

        /* combine to '/proc/{pid}/stat' to get information about process */
        char statname[64];
        snprintf(statname, sizeof statname, "/proc/%s/stat", ent->d_name);

        FILE *pstat = fopen(statname, "r");
        if (!pstat) continue;

        /* try to read process info */
        char buf[1024];
        if (!fgets(buf, 1024, pstat))
        {
            fclose(pstat);
            continue;
        }
        fclose(pstat);

        char *name = prstatname(buf, &endptr);
        if (!name) continue;

        /* nice value is in the 17th field after process name */
        int i;
        char *tok = strtok(endptr, " ");
        for (i = 0; tok && i < 16; ++i) tok = strtok(0, " ");
        if (!tok || i < 16) continue;

        int nice = strtol(tok, &endptr, 10);
        if (*endptr) continue;

        printf("[%d] %s -- nice: %d\n", pid, name, nice);
    }

    closedir(proc);
    return 0;
}

If you understand this program, you can easily modify it to do what you wanted.
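One possible direction for that modification is sketched below. It is not the original program: parse_stat_nice and renice_if_matches are hypothetical helpers, and the parser scans from the last ')' in the line instead of using the heuristic above, on the assumption that everything after the process name is space-separated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Extract the nice value (17th field after the process name) from one
 * /proc/<pid>/stat line. Returns 0 on success, -1 on a malformed line. */
int parse_stat_nice(const char *line, int *nice_out)
{
    assert(line && nice_out);
    const char *p = strrchr(line, ')');   /* end of the process name */
    if (!p) return -1;
    ++p;

    for (int field = 1; field <= 17; ++field) {
        while (*p == ' ') ++p;            /* skip separators */
        if (!*p) return -1;               /* line ended too early */
        if (field == 17) {
            char *end;
            errno = 0;
            long v = strtol(p, &end, 10);
            if (end == p || errno) return -1;
            if (*end && *end != ' ' && *end != '\n') return -1;
            *nice_out = (int)v;
            return 0;
        }
        while (*p && *p != ' ') ++p;      /* skip this field */
    }
    return -1;
}

/* Renice pid to new_nice only if its current nice equals old_nice.
 * Returns 1 if changed, 0 if skipped, -1 if setpriority() failed. */
int renice_if_matches(pid_t pid, int current, int old_nice, int new_nice)
{
    if (current != old_nice) return 0;
    return setpriority(PRIO_PROCESS, (id_t)pid, new_nice) == 0 ? 1 : -1;
}
```

Inside the readdir() loop, one would call parse_stat_nice() on each stat line and pass the result to renice_if_matches() together with the pid. Expect failures for processes owned by other users unless running as root.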

Which real-time priority is the highest priority in Linux

I did an experiment to nail this down, as follows:

process1: RT priority = 40, CPU affinity = CPU 0. This process "spins" for 10 seconds so it won't let any lower-priority process run on CPU 0.
process2: RT priority = 39, CPU affinity = CPU 0. This process prints a message to stdout every 0.5 seconds, sleeping in between. It prints out the elapsed time with each message.

I'm running a 2.6.33 kernel with the PREEMPT_RT patch.

To run the experiment, I run process2 in one window (as root) and then start process1 (as root) in another window. The result is that process1 preempts process2 and does not let it run for the full 10 seconds.

In a second experiment, I change process2's RT priority to 41. In this case, process2 is not preempted by process1.
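For readers who want to repeat the experiment, the priority-setting step might look roughly like this. It is a sketch under assumptions: the text does not say which real-time policy was used, so SCHED_FIFO is a guess, the CPU-affinity step is left out, and set_fifo_priority is a made-up helper name.

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Give a process a SCHED_FIFO real-time priority (policy is an assumption).
 * A pid of 0 means the caller. Returns 0 on success, -1 on error: either
 * the priority is out of range or the kernel refused (typically EPERM
 * when not running as root). */
int set_fifo_priority(pid_t pid, int rt_priority)
{
    assert(pid >= 0);
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    if (lo == -1 || hi == -1 || rt_priority < lo || rt_priority > hi)
        return -1;

    struct sched_param param = { .sched_priority = rt_priority };
    return sched_setscheduler(pid, SCHED_FIFO, &param);
}
```

process1 would call set_fifo_priority(0, 40) before spinning, and process2 would call it with 39 or 41 before entering its print loop.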

This experiment shows that a larger RT priority value passed to sched_setscheduler() means a higher priority. This appears to contradict what Michael Foukarakis pointed out from sched.h, but it actually does not. In sched.c in the kernel source, we have:

static void
__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
{
        BUG_ON(p->se.on_rq);

        p->policy = policy;
        p->rt_priority = prio;
        p->normal_prio = normal_prio(p);
        /* we are holding p->pi_lock already */
        p->prio = rt_mutex_getprio(p);
        if (rt_prio(p->prio))
                p->sched_class = &rt_sched_class;
        else
                p->sched_class = &fair_sched_class;
        set_load_weight(p);
}

rt_mutex_getprio(p) does the following (when no priority-inheritance boost is in effect):

return task->normal_prio;

While normal_prio() happens to do the following:

prio = MAX_RT_PRIO-1 - p->rt_priority;  /* <===== notice! */
...
return prio;

In other words, we have (my own interpretation):

p->prio = p->normal_prio = MAX_RT_PRIO - 1 - p->rt_priority

Wow! That is confusing! To summarize:

With p->prio, a smaller value preempts a larger value.
With p->rt_priority, a larger value preempts a smaller value. This is the real-time priority set using sched_setscheduler().
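The inversion can be checked with a few lines of arithmetic. This sketch assumes MAX_RT_PRIO is 100, its usual value in the kernel; kernel_prio_from_rt and rt_preempts are names invented here, not kernel functions.

```c
#include <assert.h>

/* Usual kernel value; an assumption for any patched or unusual tree. */
enum { MAX_RT_PRIO = 100 };

/* Map a user-visible rt_priority (1..99) to the kernel's internal prio. */
int kernel_prio_from_rt(int rt_priority)
{
    assert(rt_priority >= 1 && rt_priority <= MAX_RT_PRIO - 1);
    return MAX_RT_PRIO - 1 - rt_priority;
}

/* Nonzero if a task with rt_priority a preempts one with rt_priority b:
 * the task with the smaller internal prio wins. */
int rt_preempts(int a, int b)
{
    return kernel_prio_from_rt(a) < kernel_prio_from_rt(b);
}
```

For the experiments above, rt_priority 40 maps to prio 59 and rt_priority 39 maps to prio 60, so process1 preempts process2. With process2 at rt_priority 41 (prio 58), it is the other way around.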
