What Scheduling Algorithms Does Linux Kernel Use

Scheduling algorithms in linux kernel

In terms of real-time schedulers, linux provides FIFO and Round-Robin (POSIX SCHED_FIFO and SCHED_RR). There is also a time-sharing scheduler called the Completely Fair Scheduler (CFS), which is complex enough that you might call it multiple schedulers (i.e., multi-core, and several styles/flavors of preemption for different loads). CFS is described in some detail in the kernel documentation (sched-design-CFS) but a careful read of the relevant sources is recommended. CFS is linux's implementation of POSIX's SCHED_OTHER policy.

Additionally, there are patches that add other scheduling policies like a new real-time Earliest Deadline First (EDF) scheduler.

Apart from a number of books on the topic, many of which are immediately out of date at publication time, you can refer to the kernel documentation here or in your kernel tree under docs/scheduler. It's also useful to peruse the LKML archives for discussion on the topic.

If you are just getting started or want to play with a wider range of real-time schedulers, you might consider UNC's LITMUS-RT framework.

Scheduling policies in Linux Kernel

Yes, Linux supports no less then 4 different scheduling methods for tasks: SCHED_BATCH, SCHED_FAIR, SCHED_FIFO and SCHED_RR.

Regardless of scheduling method, all tasks also have a fixed hard priority (which is 0 for batch and fair and from 1- 99 for the RT schedulign methods of FIFO and RR). Tasks are first and foremost picked by priority - the highest priority wins.

However, with several tasks available for running with the same priority, that is where the scheduling method kicks in: A fair task will only run for its allotted weighted (with the weight coming from a soft priority called the task nice level) share of the CPU time with regard to other fair tasks, a FIFO task will run for a fixed time slice before yielding to another task (of the same priority - higher priority tasks always wins) and RR tasks will run till it blocks disregarding other tasks with the same priority.

Please note what I wrote above is accurate but not complete, because it does not take into account advance CPU reservation features, but it give the details about different scheduling method interact with each other.

Operating System Scheduling Algorithms

Why wouldn't you use SCHED_RR? You said it yourself: low CPU usage. You could even nice the process when you expect to do some heavy I/O so you're scheduled less often than other processes.

In general, though, why not let the OS do what it's best at, and just worry about writing efficient code? The OS will know you're doing a blocking I/O call and will put your thread/task in a waitqueue and select another task to run. You don't need to worry about those details.

Understading the Linux Kernel Scheduler

Check out c operator precedence http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_precedence

The -> operator has a higher precedence than the prefix ++, so this particular condition could be written:

if (--(p->rt.time_slice))

In other words, it is the timeslice which is being decremented, not the pointer.

The queued parameter may appear useless here, but it has a reason to be there. Specifically notice where task_tick_rt() is called from. Its only reference is when it is assigned to the .task_tick function pointer in the rt_sched_class instance of struct sched_class:
http://lxr.free-electrons.com/source/kernel/sched/rt.c#L1991

So we see that each scheduling algorithm has its own struct sched_class function vector which the kernel will call into for scheduling services. If we look at other algorithms, we see the CFS (Completely Fair Scheduling) algorithm also has its own instance of struct sched_class, named fair_sched_class:
http://lxr.free-electrons.com/source/kernel/sched/fair.c#L6179

The .task_tick member in the CFS case points to task_tick_fair():
http://lxr.free-electrons.com/source/kernel/sched/fair.c#L5785

Note task_tick_fair() does make use of the queued parameter. So when the .task_tick member is called into (here and here), a 0 or a 1 is passed in for the queued parameter. So while task_tick_rt() doesn't use it, the queued parameter must still be their so the function pointer types in the struct sched_class function vector all match up.

In short, the struct sched_class function vector specifies the interface between scheduling algorithms and the rest of the kernel. The queued parameter is there should a given algorithm choose to use it, but in the case of Round Robin, it is simply ignored.

Is the scheduler built into the kernel a program or a process?

You have 2 similar questions (The opinion that the scheduler built into the kernel is the program and the opinion that it is the process and I want to know how to implement the cpu scheduling process in Linux operating system) so I'll answer for both of these here.

The answer is that it doesn't work that way at all. The scheduler is not called by user mode processes by using system calls. The scheduler isn't a system call. There are timers that are programmed to throw interrupts after some time has elapsed. Timers are accessed using registers that are memory in RAM often called memory mapped IO (MMIO). You write to some position in RAM specified by the ACPI tables (https://wiki.osdev.org/ACPI) and it will allow to control the chips in the CPU or external PCI devices (PCI is everything nowadays).

When the timer reaches 0, it will trigger an interrupt. Interrupts are thrown by hardware (the CPU). The CPU thus includes special mechanism to let the OS determine the position at which it will jump on interrupt (https://wiki.osdev.org/Interrupt_Descriptor_Table). Interrupts are used by the CPU to notify the OS that an event happened. Without interrupts, the OS would have to reserve at least one core of the processor for a special kernel process that would constantly poll the registers of peripherals and other things. It would be impossible to implement. Also, if user mode processes did the scheduler system call by themselves, the kernel would be slave to user mode because it wouldn't be able to tell if a process is finished and processes could be selfish over CPU time.

I didn't look at the source code but I think the scheduler is also often called on some IO completion (also on interrupt but not always on timer interrupt). I am quite sure that the scheduler must not be preempted. That is interrupts (and other things) will be disabled while the schedule() function runs.

I don't think you can call the scheduler a process (not even a kernel thread). The scheduler can be called by kernel threads that are created by interrupts due to bottom half processing. In bottom half processing, the top "half" of the interrupt handler runs fast and efficiently while the bottom "half" is added to the queue of processes and runs when the scheduler decides it should be scheduled. This has the effect of creating some kernel threads. The scheduler can thus be called from kernel threads but not always from bottom half of interrupts. There has to be a mechanism to call the scheduler without the scheduler having to schedule the task itself. Otherwise, the kernel will stop functioning.

Linux kernel scheduling

This problem actually is one of the major reasons why it is rarely used in common environments, since SJF algorithm requires accurate estimate of the runtime of all processes, which is only given in specialized environments.

In common situations you can only get estimated and inaccurate length of process running time, for example, by recording the length of previous CPU bursts of the same process, and use mathematical approximation methods to calculate how long it will run next time.