How Many Instructions Does Linux Kernel Need in Order to Handle an Interrupt on an Arm Cortex A9

How does a CPU handle asynchronous interrupts?

Interrupts are always taken on instruction boundaries, at least on x86 and ARM microarchitectures, even if that means discarding partial progress and restarting the interrupted instruction after interrupt return. (Some instructions are interruptible: rep movsb updates its architectural registers as it copies, so it can resume where it left off. AVX2 gathers are also interruptible, or at least could be; the mask-updating rules might only ever apply to synchronous exceptions encountered on one element.)

  • Interrupting instruction in the middle of execution
  • Interrupting an assembly instruction while it is operating

There's some evidence that Intel CPUs let one more instruction retire before taking an interrupt, at least for profiling interrupts (from the PMU); those are semi-synchronous but for some events don't have a fixed spot in the program where they must be taken, unlike page faults which have to fault on the faulting instruction.

A multi-uop instruction that's already partially retired would have to be allowed to finish executing and retire the whole instruction, to reach the next consistent architectural state where an interrupt could possibly be taken.

(Another possible reason for letting an instruction finish executing before taking an interrupt is to avoid starvation.)

Otherwise, yes: the ROB and RS contents are discarded and execution rolls back to the retirement state. Keeping interrupt latency low is generally desirable, and waiting for a large ROB full of cache-miss and TLB-miss loads to drain would make the worst-case interrupt latency really bad, so a malicious process could degrade the guarantees of a real-time OS.

  • When an interrupt occurs, what happens to instructions in the pipeline?
  • Estimating of interrupt latency on the x86 CPUs
  • (maybe) Reliability of Xcode Instrument's disassembly time profiling mentions performance event sampling.

What is the irq latency due to the operating system?

Mats and Nemanja give some good information on interrupt latency. There are two more issues I would add to the three given by Mats.

  1. Other simultaneous or near-simultaneous interrupts.
  2. OS latency added due to masking interrupts. Edit: This is in Mats' answer, just not explained as much.

If a single core is processing interrupts, then when multiple interrupts occur at the same time there is usually some resolution priority. However, interrupts are often disabled inside the interrupt handler unless priority (nested) interrupt handling is enabled. So, for example, if a slow NAND flash IRQ is signaled and its handler is running when an Ethernet interrupt occurs, the Ethernet interrupt may be delayed until the NAND flash handler finishes. Of course, if you have priority interrupts and you are concerned about the NAND flash interrupt, things can actually be worse if the Ethernet is given priority.

The second issue is when mainline code sets/clears the interrupt-mask flag in the CPSR. Typically this is done with something like,

mrs   r9, cpsr              @ read the current program status register
biceq r9, r9, #PSR_I_BIT    @ conditionally clear the IRQ-mask bit in the copy
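
For reference, on ARMv6 and later the kernel's IRQ save/restore helpers boil down to roughly the following. This is a paraphrased sketch, not verbatim kernel source:

/* Read CPSR, then mask IRQs with cpsid i; the old value is returned
   so the previous mask state can be restored later. */
static inline unsigned long arch_local_irq_save(void)
{
    unsigned long flags;

    asm volatile(
        "mrs    %0, cpsr        @ remember current state\n"
        "cpsid  i               @ mask IRQs"
        : "=r" (flags) : : "memory", "cc");
    return flags;
}

/* Write the saved value back; if IRQs were enabled before, this
   re-enables them. */
static inline void arch_local_irq_restore(unsigned long flags)
{
    asm volatile(
        "msr    cpsr_c, %0      @ restore interrupt state"
        : : "r" (flags) : "memory", "cc");
}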

Check arch/arm/include/asm/irqflags.h in the Linux source for the many macros used by mainline code. A typical sequence is like this,

lock interrupts;
manipulate some flag in struct;
unlock interrupts;

A very large interrupt latency can be introduced if touching that struct results in a page fault: interrupts stay masked for the duration of the page-fault handler.
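
In kernel C, that sequence typically looks something like the sketch below; local_irq_save()/local_irq_restore() are the real kernel macros, while the struct and its flag field are made up for illustration:

#include <linux/irqflags.h>

struct my_dev_state {           /* hypothetical device state */
    unsigned int pending;       /* flag shared with the IRQ handler */
};

static void mark_pending(struct my_dev_state *s)
{
    unsigned long flags;

    local_irq_save(flags);      /* lock interrupts (mask on this CPU) */
    s->pending = 1;             /* manipulate some flag in struct */
    local_irq_restore(flags);   /* unlock interrupts */
}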

The Cortex-A9 can avoid this with lock-free code that never masks interrupts, because it has better atomic instructions (ldrex/strex) than the old swp/swpb. This second issue is much like the IRQ latency caused by ldm/stm-type instructions (these are simply the longest-running instructions).
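
As an illustration, the same kind of flag update can be done without touching the interrupt mask at all. The sketch below uses the GCC/Clang __atomic builtins, which on a Cortex-A9 compile to an ldrex/strex retry loop; the function name and parameters are made up:

/* Atomically set one bit in a word shared with an IRQ handler.
   On ARMv7 this becomes an ldrex/orr/strex loop, so IRQs stay
   unmasked for the whole update. */
static void set_pending_bit(unsigned int *word, unsigned int bit)
{
    __atomic_fetch_or(word, 1u << bit, __ATOMIC_ACQ_REL);
}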

Finally, a lot of technical discussions assume zero-wait-state RAM. In reality the caches will need to be filled, and if you know your memory's cost per access (maybe 2-4 machine cycles), the worst-case code path should be multiplied by that. As a rough illustration, a 200-instruction worst-case path averaging 3 cycles per instruction due to cache refills costs about 600 cycles instead of 200.

Whether you have SMP interrupt handling, priority interrupts, and a lock-free mainline depends on your kernel configuration and version; these are issues for the OS. Other issues are intrinsic to the CPU/SoC interrupt controller and to the interrupt code itself.

How multiple interrupt handlers share address 0x00000018

An ARM CPU typically has two interrupt inputs (FIQ and IRQ) that are asserted by devices when they want to generate an interrupt. When this happens, the CPU switches modes and jumps to the corresponding exception vector (0x00000018 for IRQ, 0x0000001C for FIQ).

However, because there are usually more devices than interrupt inputs, an interrupt controller sits between the CPU and the devices. You can think of it as a hub for connecting more interrupt sources to the CPU. The interrupt controller can be configured to assert FIQ (rather than IRQ) for certain kinds of interrupts it receives.

The interrupt handler usually asks the interrupt controller which source caused the interrupt, then calls the appropriate handler.

Here's a stripped-down version, without error checking, of interrupt handler code that I used in a small project.

#include <types.h>
#include <irq.h>

/* One handler slot per interrupt line supported by the controller. */
static void (*irq_handlers[32])(void);

/* Entry point reached from the vector at 0x00000018: ask the
   interrupt controller which line fired, then dispatch to the
   registered handler, if any. */
void __attribute__((interrupt)) handle_irq(void) {
    int irq = irq_hw_get_and_ack();

    if (irq_handlers[irq]) {
        irq_handlers[irq]();
    }
}

/* Initialize the interrupt controller and unmask IRQs at the CPU. */
void setup_irq(void) {
    irq_hw_init();
    cpu_enable_irq();
}

/* Register a handler for an interrupt line and enable it at the
   controller. */
void irq_request(int irq, void (*func)(void)) {
    irq_handlers[irq] = func;
    irq_hw_enable(irq);
}

/* Disable the line at the controller and drop its handler. */
void irq_unrequest(int irq) {
    irq_hw_disable(irq);
    irq_handlers[irq] = NULL;
}
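
As a usage example, a driver built on top of this would register its handler once at init time; the interrupt number and handler below are hypothetical:

#define UART0_IRQ 7                 /* hypothetical interrupt line number */

static void uart0_isr(void)
{
    /* read and acknowledge the device's own status registers here */
}

void uart0_init(void)
{
    irq_request(UART0_IRQ, uart0_isr);
}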

