Where Is Hardware Timer Interrupt

Where is the hardware timer interrupt?

Well, yes, if we are talking about the traditional 8254 PIT, it is at IRQ0, which is vector 32. But that is not generally used as the timer in the Linux operating system on modern machines. [Note that the vector assignment of 32 is really quite arbitrary. It is set when programming the 8259 (PIC) or APIC, but it's not a bad choice, since 32 is the first vector AFTER the reserved ones. It's certainly better than mixing the hardware interrupts with exception vectors, as DOS would do: there, IRQ0 was mapped to vector 8, so there was no way to tell a General Protection fault (vector 13 in the table above) from IRQ5 (also vector 13, since 5 + 8 = 13). From memory, IRQ5 wasn't particularly well used, something like LPT2: (second parallel port). But it's still a good idea to not overlap them, hence the "reserved" for the vectors 20 to 31.]
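The arithmetic behind that collision can be sketched in a few lines. This is only an illustration: the function names and the small exception table are made up for the example, and the simple "base + IRQ" rule is assumed to apply to IRQ0 to IRQ7 on the master PIC.

```python
# A small subset of the exception vectors defined by the x86 architecture.
EXCEPTIONS = {0: "divide error", 8: "double fault",
              13: "general protection", 14: "page fault"}

def irq_to_vector(irq, base):
    """Vector delivered for a PIC IRQ line when the PIC is programmed with `base`."""
    return base + irq

def collides_with_exception(irq, base):
    """True if the IRQ lands on a vector the CPU also uses for an exception."""
    return irq_to_vector(irq, base) in EXCEPTIONS

DOS_BASE = 8      # IRQ0 -> vector 8, the DOS-era mapping described above
LINUX_BASE = 32   # IRQ0 -> vector 32, the first vector after the reserved range

print(irq_to_vector(5, DOS_BASE), collides_with_exception(5, DOS_BASE))      # 13 True
print(irq_to_vector(0, LINUX_BASE), collides_with_exception(0, LINUX_BASE))  # 32 False
```

With a base of 32 no IRQ can land on an exception vector, which is the whole point of keeping the two ranges apart.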

The IRQ that actually controls the timing of the system is most likely a Local APIC timer, and its vector is not fixed in hardware in the same way as on the original PC.

Also, with the advent of "message signalled interrupts" (MSI), it is entirely possible to have (much) more than 256 distinct interrupts in a system, even though each CPU's vector table still has only 256 entries.

I don't agree with the wording "vector 0-19 are non-maskable interrupts". Aside from NMI (vector 2), they are all EXCEPTIONS (aka TRAPS or FAULTS) - that is, events driven by some error condition in the system. Vector 0 is the result of an integer divide by zero, vector 1 is a "single step" instruction interrupt [and a few other "debug" traps, such as "write to any address matching an enabled debug register"], vector 3 is the result of an "int3" instruction (opcode 0xcc), and vector 4 is the result of executing "INTO" (that's 'o' as in overflow, not 0 as in zero). When accessing a piece of memory not marked as present in the page-tables, vector 14 is used. They are indeed "non-maskable", but they are, with a few exceptions, direct consequences of the instruction executing at the time - so they are synchronous to the program itself.

The exceptions are the "Double fault" exception and "machine check fault".

Double fault is when the processor detects a fault during the handling of another exception - typically because the operating system has done something daft, like set the stack to somewhere invalid, and thus gets a page-fault, tries to use the stack to store the page-fault return address and that fails because the stack is not accessible. Double fault handlers thus tend to be set up as "task switch interrupts" that load a new stack, to make sure the double fault handler can run. If the double fault handler can't run correctly, the processor will "triple-fault". This usually means "reboot" on PC platforms. Double faults are normally not recoverable - the handler will (try to) provide some information about what happened, and how it got into this state, but once that's done, the system either reboots or waits for someone to come and push the reset button.

Machine check fault is where the processor detects an unrecoverable hardware error, such as an uncorrectable memory error or a cache parity error. These are typically also non-recoverable, but they are not DIRECTLY coupled with the instruction being executed; they depend more on a combination of different events (a memory read of an address where the memory content has gone bad, or similar).

How do hardware timers work and affect software performance?

A hardware timer is, at its core, just a count-up counter and a set of comparators (or a count-down counter that uses the borrow of the MSb as an implicit comparison with 0).

Picture it as a register with a specialized Increment (or Decrement) operation that is triggered at every cycle of a clock (the simplest kind of counter with this operation is the ripple counter).

Each cycle the counter value is also fed to the comparator, previously loaded with a value, and its output will be the input to the CPU (as an interrupt or in a specialized pin).

In the case of a count-down counter, the borrow from the MSb acts as the signal that the value rolled over zero.

These timers usually have more functions, like the ability to stop after they reach the desired value (one-shot), to reset (periodic), to alternate the output state low and high (square wave generator), and other fancy features.
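A toy model may make the counter-plus-comparator picture concrete. This is a sketch, not a description of any particular chip: the class and function names are invented for the example, it models a count-up timer with a single comparator, and it assumes the compare value is at least 1.

```python
class CompareTimer:
    """Toy model of a count-up timer with one comparator."""

    def __init__(self, compare, periodic=True):
        self.compare = compare      # value loaded into the comparator
        self.periodic = periodic    # False models one-shot mode
        self.count = 0
        self.running = True

    def clock(self):
        """Advance one clock cycle; return True if the comparator fires."""
        if not self.running:
            return False
        self.count += 1
        if self.count != self.compare:
            return False
        if self.periodic:
            self.count = 0          # reset and keep counting
        else:
            self.running = False    # one-shot: stop at the match
        return True


def fire_cycles(timer, cycles):
    """Clock the timer `cycles` times and list the cycles on which it fired."""
    return [n for n in range(1, cycles + 1) if timer.clock()]


print(fire_cycles(CompareTimer(4, periodic=True), 12))   # [4, 8, 12]
print(fire_cycles(CompareTimer(4, periodic=False), 12))  # [4]
```

In real hardware the "return True" is a signal line to the CPU or the interrupt controller, and the whole thing runs without any software involvement.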

There is no hard limit on how many timers you can put on a package, of course, but even though they are simple circuits, they still have a cost in terms of money and space.

Most MCUs have one or two timers. When there are two, the idea is to use one for generic scheduling and the other for high-priority tasks orthogonal to the OS scheduling.

It's worth noting that having many hardware timers (to be used by the software) is useless unless there are also many CPUs/MCUs since it's easier to use software timers.

On x86 the HPET can actually comprise up to 8 timer blocks, each with up to 32 comparators, for a total of 256 timers as seen from the software POV.

The idea was to assign each timer to a specific application.

Applications in an OS don't use the hardware timers directly, because there can possibly be a lot of applications but just one or two timers.

So what the OS does is share the timer.

It does this by programming the timer to generate an interrupt every X units of time and by registering an ISR (Interrupt Service Routine) for such an event.

When a thread/task/program sets up a timer, the OS appends the timer information (periodic vs one-shot, period, ticks left, and callback) to a priority queue using the absolute expiration time as the key, or to a plain list in simple OSes.

Each time the ISR is called the OS will peek at the queue and see if the element on top is expired.
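The sharing scheme described above can be sketched with a binary heap keyed on the absolute expiration tick. Everything here is illustrative: `SoftTimerQueue`, `add` and `tick` are hypothetical names, `tick()` stands in for the timer ISR, callbacks are run directly (the simplest of the policies discussed next), and periods are assumed to be at least one tick.

```python
import heapq
import itertools

class SoftTimerQueue:
    """Software timers multiplexed onto one periodic tick, keyed by expiry tick."""

    def __init__(self):
        self.now = 0                     # ticks since start
        self._heap = []                  # entries: (expiry, seq, period, callback)
        self._seq = itertools.count()    # tie-breaker so callbacks are never compared

    def add(self, delay, callback, period=None):
        """Register a timer; `period` (in ticks) makes it periodic."""
        heapq.heappush(self._heap,
                       (self.now + delay, next(self._seq), period, callback))

    def tick(self):
        """What the timer ISR would do: advance time, run expired timers."""
        self.now += 1
        while self._heap and self._heap[0][0] <= self.now:   # peek at the top
            expiry, _, period, callback = heapq.heappop(self._heap)
            callback(self.now)
            if period is not None:       # re-arm periodic timers
                heapq.heappush(self._heap,
                               (expiry + period, next(self._seq), period, callback))


log = []
q = SoftTimerQueue()
q.add(3, lambda t: log.append(("one-shot", t)))
q.add(2, lambda t: log.append(("periodic", t)), period=2)
for _ in range(6):
    q.tick()
print(log)
# [('periodic', 2), ('one-shot', 3), ('periodic', 4), ('periodic', 6)]
```

Because the heap is ordered by expiration time, each tick only has to look at the top element; ticks on which nothing expires cost a single comparison.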

What happens when a software timer is expired is OS-dependent.

Some embedded and small OSes may call the timer's callback directly from the context of the ISR.

This is often true if the OS doesn't really have a concept of thread/task (and so of context switch).

Other OSes may append the timer's callback to a list of "to be called soon" functions.

This list will be walked and processed by a specialized task. This is how FreeRTOS does it if the timer task is enabled.

This approach keeps the ISR short and allows programming the hardware timer with a shorter period (in many architectures interrupts are ignored while in an ISR, either by the CPU automatically masking interrupts or by the interrupt controller).

IIRC Windows does something similar: it posts an APC (Asynchronous Procedure Call) in the context of the thread that set the software timer that just expired. When the thread is scheduled, the APC will call the callback (in the form of a window message or not, depending on the specific API used). If the thread was waiting on the timer, I think it is just set in the ready state. In any case, it's not scheduled right away but it may get a priority boost.

Where the ISR will return is still OS-dependent.

An OS may continue executing the interrupted thread/task until it's scheduled out. In this case, you don't have step 4 immediately after step 3; instead, thread3 will run until its quantum expires.

Conversely, an OS may signal the end of the ISR to the hardware and then schedule the thread with the callback.

This approach doesn't work if two or more timers expire in the same tick, so a better approach would be to execute a rescheduling, letting the scheduler pick the most appropriate thread.

The scheduling may also take into account other hints given by the thread during the creation of the software timer.

The OS may also just switch context, execute the callback and get back to the ISR context where it continues peeking at the queue.

The OS may even do any of that based on the period of the timer and other hints.

So it works pretty much like you imagined, except that a thread may not be called immediately upon the timer's expiration.

Updating a timer is not expensive.

While all in all the total work is not much, the timer ISR is meant to be called many many times a second.

In fact, I'm not even sure an OS will allow you to create such a huge number (500k) of timers.

Windows can manage a lot of timers (and their backing threads) but probably not 500k.

The main problem with having a lot of timers is that even if each one performs little work, the total work performed may be too much to keep up with the rate of ticking.

If each X units (e.g. 1ms) of time 100 timers expire, you have X/100 units of time (e.g. 10us) to execute each callback and the callback's code may just be too long to execute in that slice of time.

When this happens the callbacks will be called less often than desired.
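The budget arithmetic above is easy to check with two small helpers. The function names are made up for the example, and the model is deliberately crude: one CPU, no ISR or context-switch overhead, and every callback costing the same.

```python
def per_callback_budget_us(tick_us, timers_per_tick):
    """Average time available to each callback if all must finish within one tick."""
    return tick_us / timers_per_tick

def keeps_up(tick_us, timers_per_tick, callback_cost_us):
    """True if the callbacks expiring in a tick fit inside that tick (one CPU)."""
    return timers_per_tick * callback_cost_us <= tick_us

print(per_callback_budget_us(1000, 100))   # 10.0 (microseconds)
print(keeps_up(1000, 100, 8))              # True
print(keeps_up(1000, 100, 25))             # False
```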

More CPUs/cores will allow some callbacks to execute in parallel and would alleviate the pressure.

In general, you need different timers if they run at different rates, otherwise, a single timer that walks a data structure filled with elements of work/data is fine.

Multi-threading can provide concurrency if your tasks are I/O-bound (files, network, input, and so on) or parallelism if you have a multi-processor system.

Simulate a Hardware Timer Interrupt in C

The closest thing in POSIX terms is probably signal handlers; SIGALRM is fired asynchronously within the process in much the same way that an ISR is. There are significant differences in what's safe to do, though, so I wouldn't go too far with the analogy.
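As a rough illustration of the idea, here is a periodic SIGALRM driven by an interval timer. It is written in Python only to keep it short; Python's signal module wraps the same POSIX facilities (a C version would use sigaction() and setitimer()). The names `on_alarm` and `run_ticks` are invented for the sketch, and it assumes a Unix-like system and that it runs in the main thread.

```python
import signal
import time

ticks = 0

def on_alarm(signum, frame):
    """Signal handler playing the role of the ISR; keep it tiny."""
    global ticks
    ticks += 1

def run_ticks(count, interval=0.01, timeout=5.0):
    """Arm a periodic SIGALRM and wait until `count` ticks were handled."""
    global ticks
    ticks = 0
    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    deadline = time.monotonic() + timeout
    try:
        while ticks < count and time.monotonic() < deadline:
            time.sleep(0.001)            # the "main program" being interrupted
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)     # disarm the timer
        if previous is None:                        # handler was not set from Python
            previous = signal.SIG_DFL
        signal.signal(signal.SIGALRM, previous)     # restore the old handler
    return ticks

print(run_ticks(3) >= 3)
```

The handler only bumps a counter, which mirrors the usual advice for both ISRs and signal handlers: do the minimum asynchronously and leave the real work to the main flow.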

Is using a hardware timer the only way to implement process scheduling?

No. There are two basic methods of implementing multithreading in an operating system:

1) Preemptive Multitasking

With preemptive multitasking you can use any interrupt source to trigger your task switch. Most of the time the task switch is done inside the timer ISR (Interrupt Service Routine), which covers the case where a long-running task is executing and no other hardware events have happened. When other hardware events do happen, one might also switch to blocked threads with higher priority so that they can handle those events.

2) Cooperative Multitasking

In cooperative multitasking the operating system switches threads only when a system call is executed. This can either be a special system call that allows an application to explicitly trigger a task switch (like Yield, used in early multitasking systems such as Windows 3.11, classic Mac OS, etc.) or any other system call. One can also implement cooperative multitasking completely inside user mode.

Today most operating systems take a hybrid approach - they react to hardware events (in case a long running thread never calls the systems routines and no other I/O happens this would be the timer) but they may also switch tasks in a cooperative way whenever applications perform syscalls or call system supplied libraries.
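To show that no timer is needed for the cooperative case, here is a minimal user-mode round-robin scheduler built on generators. It is a sketch with invented names (`task`, `run_round_robin`); each `yield` plays the role of an explicit Yield call, and a task that never yields would starve all the others, which is exactly the weakness preemption fixes.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: does one unit of work, then gives up the CPU."""
    for i in range(steps):
        log.append((name, i))
        yield                      # explicit yield point, like a Yield() call

def run_round_robin(tasks):
    """Run generator-based tasks until all finish, switching only at yields."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task yields
        except StopIteration:
            continue               # task finished; drop it
        ready.append(current)      # back of the queue

log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
# [('A', 0), ('B', 0), ('A', 1), ('B', 1), ('B', 2)]
```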

AVR Timer and Hardware interrupts

The falling edge sets the interrupt flag even while the interrupt is disabled. This is called a "pending" interrupt. As soon as the interrupt is enabled, its service routine is called (given that all other enabling conditions are met).

You need to clear this pending flag before you enable the interrupt.

Does the sleep() function cause a timer interrupt upon completion?

Maybe.

For keeping track of time delays there are two common ways it could be implemented:

a) A timer IRQ occurs at a fixed frequency (e.g. maybe every 1 millisecond). When the IRQ occurs the OS checks if any time delays expired and deals with them. In this case there's a compromise between precision and overhead (to get better precision you need to increase the "IRQs per second" which increases the overhead of dealing with all the IRQs).

b) The OS re-configures the timer to generate an IRQ when the soonest delay should expire whenever necessary (when the soonest delay is cancelled, a sooner delay is created, or the soonest delay expires). This has no "precision vs. overhead" compromise, but has more overhead for re-configuring the timer hardware. This is typically called "tickless" (as there's no regular/fixed frequency "tick").

Note that modern 80x86 systems have a local APIC timer per CPU that supports "IRQ on TSC deadline". For "tickless", this means you can normally get better than 1 nanosecond precision without much need for locks (using "per CPU" structures to keep track of time delays); and the cost of re-configuring the timer is very small (as the timer hardware is built directly into the CPU itself).

For "tickless" (which is likely much better for modern systems) you would end up with a timer IRQ when "sleep()" expires most of the time (unless some other delay expires at the same/similar time).
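The trade-off between options a) and b) can be put into numbers with a small model. The helper names are invented, all times are in microseconds measured from "now", and the tickless model assumes the timer is re-armed once per distinct deadline with no re-programming cost.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def periodic_wakeups(deadlines_us, tick_us):
    """Fixed tick: each delay is noticed at the first tick at or after it."""
    return [ceil_div(d, tick_us) * tick_us for d in deadlines_us]

def periodic_irq_count(deadlines_us, tick_us):
    """Fixed tick: an IRQ fires every tick up to the last deadline, needed or not."""
    return ceil_div(max(deadlines_us), tick_us)

def tickless_irq_count(deadlines_us):
    """Tickless: one IRQ per distinct deadline."""
    return len(set(deadlines_us))

deadlines = [250, 1300, 1300, 4100]
print(periodic_wakeups(deadlines, 1000))     # [1000, 2000, 2000, 5000]
print(periodic_irq_count(deadlines, 1000))   # 5
print(tickless_irq_count(deadlines))         # 3
```

In this example the fixed 1 ms tick wakes sleepers up to 900 us late and takes five IRQs, while the tickless model takes three IRQs and wakes each sleeper at its deadline.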

Does this mean a program using sleep() once awoken will likely cause another program running on one of the CPUs (in a multi-processor) to be removed in favor of the recently awoken program?

Whether a recently unblocked task preempts immediately depends on:

a) The scheduler design. For some schedulers (e.g. naive "round robin") it may never happen immediately.

b) The priorities of the unblocked task and the currently running task/s.

c) Optimizations. Task switches cost overhead so attempts to minimize the number of task switches (e.g. postponing/skipping a task switch if some other task switch is likely to happen soon anyway) are practical. There's also complexity involving load balancing, power management, cache efficiency, memory (NUMA, etc) and other things that may be considered.

Embedded system interrupts

The hardware circuitry that constitutes the timer peripheral within the microcontroller is able to perform a comparison and toggle an output in CTC mode. This logic is performed in hardware, without relying on the CPU to execute software instructions. Therefore, the CTC mode compare and toggle occurs in parallel with whatever the CPU happens to be executing.

I don't understand what you mean by the timer "counts more". More as in more often or faster rate? More as in greater total counts? Regardless, I think the answer is no. The timer counts at the rate of the input clock that is driving it. In CTC mode the timer counts up to the comparison value that you have configured it for.
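A small simulation shows why the output rate depends only on the timer clock and the compare value. It is a sketch under stated assumptions: the counter is modelled as running from 0 up to the compare value and being cleared on the following timer clock (the behaviour AVR-style CTC timers are documented to have), `timer_clock_hz` is the clock after any prescaler, and the function names are invented.

```python
def ctc_toggle_ticks(compare, ticks):
    """Simulate a CTC-style counter; return the timer ticks at which the output toggles."""
    counter = 0
    toggles = []
    for t in range(1, ticks + 1):
        if counter == compare:     # compare match: clear the counter, toggle the pin
            counter = 0
            toggles.append(t)
        else:
            counter += 1
    return toggles

def ctc_output_hz(timer_clock_hz, compare):
    """One output period takes two toggles, each (compare + 1) timer ticks apart."""
    return timer_clock_hz / (2 * (compare + 1))

print(ctc_toggle_ticks(3, 12))        # [4, 8, 12]
print(ctc_output_hz(250_000, 249))    # 500.0
```

Nothing in the model refers to what the CPU is executing, which is the point made above: the compare and toggle happen in the peripheral, in parallel with the software.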

Is the timer interrupt independent of whether the system is in kernel mode or user mode?

The simple answer is that neither the execution of the hardware clock interrupt service routine nor the scheduling of the dynamic timer handlers is affected by the mode the system was in before the hardware clock interrupt. The reason is that the clock timer interrupt is a hardware interrupt that is serviced immediately, regardless of whether the execution is in kernel or user context (assuming, that is, that the timer interrupt is enabled), and the interrupt service routine for the clock timer interrupt itself raises the software interrupt that runs the dynamic timer handlers.

Caveat: 1) I haven't actually proved this empirically. 2) This does not apply to tickless kernels or highres timers.

The Linux kernel code uses the word "timer" to mean several different things:

the hardware timer or clock interrupt that gives the kernel its "ticks"
dynamic timers - software timers used by the kernel and drivers
interval timers - (setitimer and alarm system calls) software timers for user mode processes

The hardware clock or tick timer

On systems that use a hardware clock to provide the "tick", the clock timer interrupt is an architecture dependent hardware interrupt. For example, look for "timer_interrupt" in arch/powerpc/kernel/head_booke.h and then see the interrupt service routine (ISR) timer_interrupt implementation in arch/powerpc/kernel/time.c. This ISR executes immediately when the timer interrupt occurs, regardless of current execution context. This hardware interrupt differs from other hardware interrupts though, in that when it returns, processing does not return to the prior context. Instead, the scheduler is entered.

For a system that is set to produce 1000 clock interrupts per second, there is a chance that clock interrupts will sometimes be masked when other interrupts are being serviced. This is usually called the "lost ticks" problem. Without compensating for lost ticks, a loaded system could have a slowed sense of time. On some architectures the kernel compensates for lost ticks by using a finer grained hardware increment counter, whose value is read and recorded every clock timer interrupt. By comparing the increment counter value of the current tick against the increment counter value of the previous tick, the kernel can tell if a tick has been lost.
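The comparison described above comes down to a subtraction and a division. The following sketch uses invented names and assumes a free-running counter of `counter_bits` bits that wraps at most once between two timer interrupts, with `counts_per_tick` counter increments per nominal tick.

```python
def ticks_elapsed(prev_count, curr_count, counts_per_tick, counter_bits=32):
    """Ticks that passed between two timer interrupts, judged by a free-running counter."""
    delta = (curr_count - prev_count) % (1 << counter_bits)   # tolerates one wrap-around
    # Round to the nearest whole tick; an interrupt always accounts for at least one.
    return max(1, (delta + counts_per_tick // 2) // counts_per_tick)

def lost_ticks(prev_count, curr_count, counts_per_tick, counter_bits=32):
    """Ticks that went by without their interrupt being serviced."""
    return ticks_elapsed(prev_count, curr_count, counts_per_tick, counter_bits) - 1

print(lost_ticks(5000, 6003, 1000))           # 0: the interrupt arrived on time
print(lost_ticks(5000, 8010, 1000))           # 2: two ticks were missed
print(lost_ticks(2**32 - 500, 1500, 1000))    # 1: still correct across a wrap
```

The kernel can then advance its notion of time by the full number of elapsed ticks instead of by one.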

The software timers

The handlers of dynamic timers that have expired (the type of timer you set up with linux/timer.h) are scheduled to run at the end of the clock timer interrupt, before it returns. The sequence is (approximately):

[arch dependent]:timer_interrupt( )
kernel/time/tick-common.c:tick_handle_periodic( )
kernel/time/tick-common.c:tick_periodic( )
kernel/timer.c:update_process_times( )
kernel/timer.c:run_local_timers( )
kernel/softirq.c:raise_softirq(TIMER_SOFTIRQ)

I have omitted the initializations that set the handler for the timer_interrupt to tick_handle_periodic, and the handler for TIMER_SOFTIRQ.

The call to raise_softirq(TIMER_SOFTIRQ) generates a software interrupt that is serviced immediately. The ISR for the interrupt runs the dynamic timer queue. The timer handlers run in softirq context, with hardware interrupts enabled. When the ISR returns, the scheduler is called. This means that if there are a lot of timers set, whatever process happens to be next in the run queue will be delayed.

If there were lost ticks, then the execution of the timer handlers could be delayed. However, the delay does not depend on the context of execution prior to running the clock timer interrupt.

Note about dynamic timer accuracy

"...the kernel cannot ensure that timer functions will start right at their expiration times. It can only ensure that they are executed either at the proper time or after with a delay of up to a few hundred milliseconds." Understanding the Linux Kernel, Bovet and Cesati, 3rd edition, O'reilly.

So, if you need better timer accuracy, you need to use highres timers.

References: Software interrupts and realtime
