how to shield a cpu from the linux scheduler (i.e. prevent it from scheduling threads onto that cpu)?
The answer is to use cpusets. The Python cpuset utility makes it easy to configure them.
Basic concepts
There are 3 cpusets:
- root: present in all configurations and contains all cpus (unshielded)
- system: contains cpus used for system tasks - the ones which need to run but aren't "important" (unshielded)
- user: contains cpus used for "important" tasks - the ones we want to run in "realtime" mode (shielded)

The shield command manages these 3 cpusets. During setup it moves all movable tasks into the unshielded cpuset (system), and during teardown it moves all movable tasks into the root cpuset. After setup, the subcommand lets you move tasks into the shield (user) cpuset, and additionally move special tasks (kernel threads) from root to system (and therefore out of the user cpuset).
Commands:
First we create a shield. Naturally the layout of the shield will be machine/task dependent. For example, say we have a 4-core non-NUMA machine: we want to dedicate 3 cores to the shield and leave 1 core for unimportant tasks. Since it is non-NUMA we don't need to specify any memory node parameters, and we leave the kernel threads running in the root cpuset (i.e. across all cpus).
$ cset shield --cpu 1-3
Some kernel threads (those which aren't bound to specific cpus) can be moved into the system cpuset. (In general it is not a good idea to move kernel threads which have been bound to a specific cpu.)
$ cset shield --kthread on
Now let's list what's running in the shielded (user) or unshielded (system) cpusets (-v for verbose, which will list the process names; add a 2nd -v to display more than 80 characters):
$ cset shield --shield -v
$ cset shield --unshield -v -v
If we want to stop the shield (teardown)
$ cset shield --reset
Now let's execute a process in the shield (commands following '--' are passed to the command to be executed, not to cset):
$ cset shield --exec mycommand -- -arg1 -arg2
If we already have a running process which we want to move into the shield (note: we can move multiple processes by passing a comma-separated list, or ranges; any process in the range will be moved, even if there are gaps):
$ cset shield --shield --pid 1234
$ cset shield --shield --pid 1234,1236
$ cset shield --shield --pid 1234,1237,1238-1240
Advanced concepts
The cset set and cset proc subcommands give you finer control of cpusets.
Set
Create, adjust, rename, move and destroy cpusets
Commands
Create a cpuset, using cpus 1-3, use NUMA node 1 and call it "my_cpuset1"
$ cset set --cpu=1-3 --mem=1 --set=my_cpuset1
Change "my_cpuset1" to only use cpus 1 and 3
$ cset set --cpu=1,3 --mem=1 --set=my_cpuset1
Destroy a cpuset
$ cset set --destroy --set=my_cpuset1
Rename an existing cpuset
$ cset set --set=my_cpuset1 --newname=your_cpuset1
Create a hierarchical cpuset
$ cset set --cpu=3 --mem=1 --set=my_cpuset1/my_subset1
List existing cpusets (to a depth of 1)
$ cset set --list
List existing cpuset and its children
$ cset set --list --set=my_cpuset1
List all existing cpusets
$ cset set --list --recurse
Proc
Manage threads and processes
Commands
List tasks running in a cpuset
$ cset proc --list --set=my_cpuset1 --verbose
Execute a task in a cpuset
$ cset proc --set=my_cpuset1 --exec myApp -- --arg1 --arg2
Moving a task
$ cset proc --toset=my_cpuset1 --move --pid 1234
$ cset proc --toset=my_cpuset1 --move --pid 1234,1236
$ cset proc --toset=my_cpuset1 --move --pid 1238-1340
Moving a task and all its siblings
$ cset proc --move --toset=my_cpuset1 --pid 1234 --threads
Move all tasks from one cpuset to another
$ cset proc --move --fromset=my_cpuset1 --toset=system
Move unpinned kernel threads into a cpuset
$ cset proc --kthread --fromset=root --toset=system
Forcibly move kernel threads (including those that are pinned to a specific cpu) into a cpuset (note: this may have dire consequences for the system - make sure you know what you're doing)
$ cset proc --kthread --fromset=root --toset=system --force
Hierarchy example
We can use hierarchical cpusets to create prioritised groupings
- Create a system cpuset with 1 cpu (0)
- Create a prio_low cpuset with 1 cpu (1)
- Create a prio_med cpuset with 2 cpus (1-2)
- Create a prio_high cpuset with 3 cpus (1-3)
- Create a prio_all cpuset with all 4 cpus (0-3) (note this is the same as root; it is considered good practice to keep a separation from root)
To achieve the above, you create prio_all, then create the subset prio_high under prio_all, and so on:
$ cset set --cpu=0 --set=system
$ cset set --cpu=0-3 --set=prio_all
$ cset set --cpu=1-3 --set=/prio_all/prio_high
$ cset set --cpu=1-2 --set=/prio_all/prio_high/prio_med
$ cset set --cpu=1 --set=/prio_all/prio_high/prio_med/prio_low
prevent linux thread from being interrupted by scheduler
How do you tell the thread scheduler in linux to not interrupt your thread for any reason?
It can't really be done; you need a real-time system for that. The closest thing you'll get with Linux is to set the scheduling policy to a realtime scheduler, e.g. SCHED_FIFO, and also set the PTHREAD_EXPLICIT_SCHED attribute. Even then, irq handlers and other kernel work will still interrupt your thread and run.
However, if you only care about the threads in your own process not being able to do anything, then yes, having them block on a mutex your running thread holds is sufficient.
The hard part is to coordinate all the other threads to grab that mutex whenever your thread needs to do its thing.
ensuring CPU time after waking up - linux scheduler
You said you are observing delays while writing. In this situation you can use the schedule_timeout function. Device drivers use this technique while writing to registers so that they don't lock up the system. Recently I came across a problem where writing to a register was causing delays; I am planning to use schedule_timeout in my case too.
Setting the priority or scheduling mode will not help here.
linux c: what's the common use case of sched_setaffinity function? I don't find it useful
It's very helpful for computationally intensive real-time processes related to DSP (Digital Signal Processing).
Let's say one real-time DSP-related process, process0, is running on core CPU0. Because of the scheduling algorithm, CPU0 may be preempted so that process0 has to run on another CPU. This switching of a realtime process is an overhead. Hence affinity: we direct the kernel to run process0 on CPU0.
pthread.h - Is voluntary CPU yield the only trigger to scheduling another user-level thread?
No, any call that blocks the current thread will schedule another thread. This includes library calls such as sleep(), read(), select(), pthread_mutex_lock() and many others.
Note that pthread is not a pure user-level thread implementation on Linux; it maps 1 user-mode thread to 1 kernel task.
is cpu scheduling in linux based on processes or threads?
It's partially based on the quantum, a basic unit of time the thread will execute for. There is also a priority level, so multiple threads compete for time on the cpu: they wait in line with other threads of the same priority level, run until they are out of quantum, and are then sent to the back of the queue. This isn't an exact answer, but a high-level summary.
Also, I'm more familiar with Windows, but I think the principles are the same. A process is not executable code but a unit of storage, so scheduling is per thread. I've read that Linux has a more complicated scheduling algorithm than Windows (possibly more overhead as a trade-off), but I'd speculate that threads of the same process likewise compete for cpu time. The difference is that no full context switch is needed, because threads sharing a process use the same address space.
This would explain the diminished returns when using more threads than physical cores (hardware threads on Intel). The threads of a process have a small chance of ever running at the same time; instead they compete, so if you have 4000 threads, the time any single one of them runs is reduced to roughly 1/4000. However, if you used the 4000 threads on a single synchronous problem, with shared storage to load the current state, you could gain performance, since the probability that any one of the 4000 threads is running is higher.