Real Time Scheduling in Linux

Practical use of Linux real time scheduling priorities (SCHED_FIFO and SCHED_RR)?

According to POSIX, sched_setscheduler sets the scheduling policy of the process, not of an individual thread. See:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/sched_setscheduler.html

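If you want to set the scheduler for a thread, you need to use the pthread_attr_setschedpolicy and pthread_attr_setschedparam functions on the attribute object for the new thread before you create it. You also need to call pthread_attr_setinheritsched with PTHREAD_EXPLICIT_SCHED on that attribute object; otherwise the new thread inherits its creator's scheduling settings and the policy and priority in the attribute object are ignored.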

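Linux does not follow POSIX strictly here: according to the Linux man page for sched_setscheduler, scheduling policy and parameters are per-thread attributes on Linux, so a pid of 0 affects only the calling thread. Even so, you should at least start out by making sure your code is correct to the specification, then adjust it as needed.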
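As a minimal sketch of the attribute-based approach described above: the helper names init_fifo_attr, spawn_fifo_thread and worker are hypothetical, and SCHED_FIFO is only an example policy. It assumes the caller is allowed to use real-time policies (root, CAP_SYS_NICE, or a suitable RLIMIT_RTPRIO); without that, pthread_create is expected to fail with EPERM.

```c
#include <assert.h>
#include <errno.h>   /* pthread functions return errno values such as EPERM */
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: prepare *attr so that a thread created with it
 * runs under SCHED_FIFO at the given priority.
 * Returns 0 on success or an errno value on failure. */
int init_fifo_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int rc;

    assert(attr != NULL);
    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread inherits its
     * creator's scheduling and the two settings below are ignored. */
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    /* Set the policy before the priority: some C libraries validate
     * the priority against the policy already stored in the attribute. */
    if (rc == 0)
        rc = pthread_attr_setschedparam(attr, &param);
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}

/* Placeholder thread body for the sketch. */
void *worker(void *arg)
{
    return arg;
}

/* Hypothetical helper: create a SCHED_FIFO thread running worker().
 * Returns 0 on success or the errno value from the failing call. */
int spawn_fifo_thread(pthread_t *thread, int priority)
{
    pthread_attr_t attr;
    int rc = init_fifo_attr(&attr, priority);

    if (rc != 0)
        return rc;
    rc = pthread_create(thread, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}
```

A caller would use something like spawn_fifo_thread(&t, 10) followed by pthread_join(t, NULL), and treat a return value of EPERM as "not permitted to use real-time scheduling" rather than as a bug.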

Is it possible to monitor how a process is scheduled real-time with Linux?

I just want to see how processes are scheduled on the cores, e.g., Process1 released at 0.30, then Process2 started at 0.70 (system timer values), and so on.

This is called tracing, and it is usually done inside the kernel at the user's request. Linux has several kernel event tracers. Try:

  • perf sched (see its man page and the LWN commit write-up; also http://www.brendangregg.com/perf.html#SchedulerAnalysis): run perf sched record sleep 2, then perf sched script to get the log.
  • Other perf tracing commands, such as perf record -e 'sched:sched_process_*' -a sleep 2 followed by perf script (from http://www.brendangregg.com/perf.html).
  • trace-cmd (see its man page; it is based on ftrace - https://lwn.net/Articles/608497/ https://events.linuxfoundation.org/slides/2010/linuxcon_japan/linuxcon_jp2010_rostedt.pdf): run trace-cmd record -e sched ./program or trace-cmd record -e sched_switch ..., then trace-cmd report (or install and use the kernelshark GUI - https://lwn.net/Articles/425583/ http://static.lwn.net/images/2011/ks-success.png).
  • Dedicated tracers such as LTT/LTTng (wiki, website). LTTng has a GUI plugin for Eclipse (Trace Compass) that shows a process graph and a CPU graph over time: https://wiki.eclipse.org/images/4/49/X-axis-alignment-full-histogram-axis-bottom.png
  • I think sysdig may be able to trace the scheduler too.

Brendan Gregg has an overview of Linux tracing (with "pony-corn mascot" magic): http://www.brendangregg.com/blog/2015-07-08/choosing-a-linux-tracer.html (there should also be some presentations about tracing at https://www.slideshare.net/brendangregg).

Is something like this possible by monitoring kernel folders such as '/proc'?

There is no inotify for /proc (it contains no real directories or real files: https://stackoverflow.com/a/24898733), so you can't watch it for changes; you can only reread some /proc (or /sys) files periodically.
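To illustrate what "rereading /proc periodically" could look like, here is a rough sketch that reads field 39 of /proc/<pid>/stat, which proc(5) documents as the CPU the task last ran on. The function name proc_last_cpu is hypothetical. Polling like this only samples the current state and will miss context switches that happen between reads, which is why the tracers above are the better tool.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the CPU number a task last ran on,
 * taken from field 39 ("processor") of /proc/<pid>/stat, or -1 on error.
 * pid is a string so that "self" can be used as well as a number. */
int proc_last_cpu(const char *pid)
{
    char path[64];
    char buf[4096];
    FILE *f;
    size_t n;
    char *p;

    assert(pid != NULL);
    snprintf(path, sizeof path, "/proc/%s/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (the command name) is wrapped in parentheses and may
     * contain spaces, so start counting after the last ')'. */
    p = strrchr(buf, ')');
    if (p == NULL)
        return -1;
    p++;

    /* The first token after ')' is field 3 (the state). */
    for (int field = 3; ; field++) {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            return -1;          /* ran out of fields */
        if (field == 39)
            return atoi(p);
        while (*p != '\0' && *p != ' ')
            p++;
    }
}
```

Calling proc_last_cpu("self") in a loop with a short sleep gives a coarse picture of where a task has been running, but not the release and start times asked about in the question.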

Soft Real Time Linux Scheduling

(1) pthread_mutex_setprioceiling

(2) A newly created thread inherits the scheduling policy and priority of its creating thread unless its thread attributes (set with pthread_attr_setschedparam / pthread_attr_setschedpolicy, together with pthread_attr_setinheritsched and PTHREAD_EXPLICIT_SCHED) direct otherwise when you call pthread_create.

(3) Since you don't yet know what causes it, it is, in fairness, hard for anyone to say with any assurance.

Which real-time priority is the highest priority in Linux?

I did an experiment to nail this down, as follows:

  • process1: RT priority = 40, CPU affinity = CPU 0. This process "spins" for 10 seconds so it won't let any lower-priority process run on CPU 0.

  • process2: RT priority = 39, CPU affinity = CPU 0. This process prints a message to stdout every 0.5 second, sleeping in between. It prints out the elapsed time with each message.

I'm running a 2.6.33 kernel with the PREEMPT_RT patch.

To run the experiment, I start process2 in one window (as root) and then start process1 (as root) in another window. The result is that process1 appears to preempt process2, not allowing it to run for the full 10 seconds.

In a second experiment, I change process2's RT priority to 41. In this case, process2 is not preempted by process1.
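For anyone who wants to repeat this, the core of process1 might look roughly like the sketch below. This is not the code used in the experiment; the function names set_fifo_priority and spin_for are hypothetical, SCHED_FIFO is an assumed policy (SCHED_RR would work similarly), and CPU pinning is assumed to be done externally, for example by starting the program under taskset -c 0.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <time.h>

/* Hypothetical helper: switch the calling process to SCHED_FIFO at the
 * given real-time priority. Returns 0 on success or an errno value
 * (typically EPERM when not running with sufficient privileges). */
int set_fifo_priority(int rt_priority)
{
    struct sched_param param = { .sched_priority = rt_priority };

    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
        return errno;
    return 0;
}

/* Hypothetical helper: busy-wait without sleeping or yielding until the
 * wall clock has advanced by `seconds` (one-second granularity). */
void spin_for(int seconds)
{
    time_t end;

    assert(seconds >= 0);
    end = time(NULL) + seconds;
    while (time(NULL) < end)
        ;   /* burn CPU so lower-priority tasks on this CPU cannot run */
}
```

process1 would then call set_fifo_priority(40) followed by spin_for(10), while process2 would call set_fifo_priority(39) (or 41 in the second experiment) and print the elapsed time in a loop, sleeping 0.5 seconds between messages.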

This experiment shows that a larger RT priority value passed to sched_setscheduler() means a higher priority. This appears to contradict what Michael Foukarakis pointed out from sched.h, but actually it does not. In sched.c in the kernel source, we have:

static void
__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
{
    BUG_ON(p->se.on_rq);

    p->policy = policy;
    p->rt_priority = prio;
    p->normal_prio = normal_prio(p);
    /* we are holding p->pi_lock already */
    p->prio = rt_mutex_getprio(p);
    if (rt_prio(p->prio))
        p->sched_class = &rt_sched_class;
    else
        p->sched_class = &fair_sched_class;
    set_load_weight(p);
}

rt_mutex_getprio(p) does the following:

return task->normal_prio;

While normal_prio() happens to do the following:

prio = MAX_RT_PRIO-1 - p->rt_priority;  /* <===== notice! */
...
return prio;

In other words, we have (my own interpretation):

p->prio = p->normal_prio = MAX_RT_PRIO - 1 - p->rt_priority

Wow! That is confusing! To summarize:

  • With p->prio, a smaller value preempts a larger value.

  • With p->rt_priority, a larger value preempts a smaller value. This is the real-time priority set using sched_setscheduler().
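To make the inversion concrete, here is a small worked example of the formula above applied to the priorities from my experiment. The function name kernel_prio is hypothetical, and MAX_RT_PRIO is taken to be 100, its value in the kernel source.

```c
#include <assert.h>

#define MAX_RT_PRIO 100   /* value defined in the kernel headers */

/* Hypothetical helper mirroring normal_prio() for real-time tasks:
 * map the user-visible rt_priority (1..99) to the kernel-internal
 * p->prio, where a smaller number means a more urgent task. */
int kernel_prio(int rt_priority)
{
    assert(rt_priority >= 1 && rt_priority <= MAX_RT_PRIO - 1);
    return MAX_RT_PRIO - 1 - rt_priority;
}
```

With this mapping, rt_priority 39 becomes p->prio 60, rt_priority 40 becomes p->prio 59, and rt_priority 41 becomes p->prio 58. So process1 (59) preempts process2 at rt_priority 39 (60), but not process2 at rt_priority 41 (58), which matches both experiments.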


