Critical Timing in an ARM Linux Kernel Driver

The L1 cache on the IMX25 (ARM926) is 16 KB code and 16 KB data, with a 32-byte line length, i.e. eight instructions. With the DDR-SDRAM controller running at 133 MHz on a 16-bit bus, the transfer rate is about 300 MB/s. A cache line fill should therefore take only about 100 ns, not 9 µs; that is roughly 100 times too long.
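As a back-of-envelope check (ignoring DDR burst setup, row activation and refresh overhead), the line-fill time works out to roughly:

    32 bytes per line / (133 MHz x 2 bytes per beat) = 32 B / ~266 MB/s ≈ 120 ns

which is consistent with the ~100 ns figure above.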

However, you have four other issues with Linux.

  1. TLB misses and a page table walk.
  2. Data aborts.
  3. DMA masters stealing bandwidth.
  4. FIQ interrupts.

It is unlikely that the LCD master is stealing enough bandwidth unless you have a huge display. Is your display larger than 1/4 VGA? If not, it uses only about 10% of the memory bandwidth, and those fetches will pipeline with the processor's. Do you have either Ethernet or USB active? Those peripherals have higher data rates and could cause this type of contention for the SDRAM.

All of these issues may be avoided by writing your toggler PC-relative and copying it to the IRAM. See iram_alloc.c; this file should be portable to older versions of Linux. The XBAR switch allows fetches from SDRAM and IRAM to proceed simultaneously. The IRAM can still be a target of other DMA masters, though. If you are really pressed, move the code to the ETB buffers, which no other master in the system can access.
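As a rough illustration only (the exact iram_alloc() prototype differs between i.MX kernel trees, and toggler_start/toggler_end are placeholder symbols marking the PC-relative routine), relocating such code into IRAM might look something like this:

    /* Hedged sketch: copy a position-independent toggler into IRAM and call
     * it from there.  Assumes the iram_alloc() helper from the older i.MX
     * iram_alloc.c; the prototype may differ in your tree. */
    #include <linux/kernel.h>
    #include <linux/errno.h>
    #include <linux/string.h>
    #include <linux/io.h>
    #include <asm/cacheflush.h>

    extern void __iomem *iram_alloc(unsigned int size, unsigned long *dma_addr);

    extern char toggler_start[], toggler_end[];   /* PC-relative code to relocate */

    static void (*iram_toggler)(void);

    static int move_toggler_to_iram(void)
    {
            unsigned long phys;
            size_t len = toggler_end - toggler_start;
            void __iomem *iram = iram_alloc(len, &phys);

            if (!iram)
                    return -ENOMEM;

            memcpy((void __force *)iram, toggler_start, len);
            flush_icache_range((unsigned long)iram, (unsigned long)iram + len);

            /* Call iram_toggler() with interrupts masked; fetches now come
             * from IRAM, not SDRAM. */
            iram_toggler = (void (*)(void))iram;
            return 0;
    }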

The TLB miss penalty can actually be quite steep, as the page table walk may need several single-beat SDRAM cycles; still, this should be under 1 µs. You have not posted code, so it is possible that a variable access or something else is causing a data abort, which is not maskable.

If you have any drivers using the FIQ, they may still be running even though you have masked the normal IRQ interrupts. For instance, the ALSA driver for this system normally uses the FIQ.
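To make that concrete: the usual IRQ-masking primitives do not touch the FIQ, so if you need the FIQ quiet as well you must disable it separately. A minimal sketch (local_fiq_disable()/local_fiq_enable() are the ARM-specific helpers):

    #include <linux/irqflags.h>

    static void toggle_pins_atomically(void)
    {
            unsigned long flags;

            local_irq_save(flags);   /* masks normal IRQs only; the FIQ (CPSR F bit) stays enabled */
            local_fiq_disable();     /* ARM-specific: mask the FIQ too, if you really need silence */

            /* ... time-critical GPIO toggling ... */

            local_fiq_enable();
            local_irq_restore(flags);
    }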

Both the ETB and the IRAM have 32-bit data paths and low wait states. Either one will probably give better response than the DDR-SDRAM.

We have achieved sub-microsecond response by using a FIQ and IRAM to toggle GPIOs on an IMX258, bit-banging another protocol.

Interrupt performance on the Linux kernel with RT patches - should it be better?

What you experience here is interrupt jitter. This is to be expected on Linux, because the kernel regularly disables interrupts for various tasks (taking a spinlock, handling an interrupt, etc.).

This will happen regardless of whether you have PREEMPT_RT or not, so expecting to generate a 74 kHz signal (an interrupt roughly every 13.5 µs) with regular interrupts is pretty much unrealistic.

Now, ARM has higher-priority interrupts called FIQs, which the kernel never masks or disables.

Linux doesn't use the FIQ, and is not built to cope with one being in use, so you won't be able to use the generic kernel interrupt framework.

From the Linux driver development point of view, however, it's not really different as long as you keep this in mind: you have to write a handler and associate it with an interrupt. You'll also have to poke the interrupt controller to make it generate a FIQ for the interrupt you want to use. The details of how to do that are platform-dependent: some platforms have functions for it (like the imx25 with mxc_set_irq_fiq), while others (such as the imx23/28) don't, so there you'll have to do it by hand.

The only catch is that the functions to set up a FIQ handler only work with an assembly-written handler, so you'll have to rewrite your handler in assembly (with your current code, that should be trivial though).
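A minimal sketch of that setup, assuming the classic <asm/fiq.h> API and the i.MX-specific mxc_set_irq_fiq() helper (the handler symbols, the IRQ number and the meaning of the second argument to mxc_set_irq_fiq() are assumptions/placeholders here):

    #include <linux/module.h>
    #include <asm/fiq.h>

    /* assembly FIQ handler, copied verbatim into the FIQ vector */
    extern unsigned char my_fiq_start, my_fiq_end;
    /* i.MX-specific helper to route an interrupt source to FIQ instead of IRQ */
    extern int mxc_set_irq_fiq(unsigned int irq, unsigned int type);

    static struct fiq_handler my_fh = {
            .name = "my-gpio-fiq",
    };

    static int install_gpio_fiq(unsigned int irq)
    {
            int ret = claim_fiq(&my_fh);

            if (ret)
                    return ret;

            /* copy the assembly handler into place (it must be position-independent) */
            set_fiq_handler(&my_fiq_start, &my_fiq_end - &my_fiq_start);

            /* tell the interrupt controller to raise this source as a FIQ ... */
            ret = mxc_set_irq_fiq(irq, 1);
            if (ret) {
                    release_fiq(&my_fh);
                    return ret;
            }

            /* ... and unmask it */
            enable_fiq(irq);
            return 0;
    }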

You can find additional details in the blog post Alexandre posted (http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/), where you'll find working code, samples, and explanations of how it all works together.

How does SMP scheduling work in the Linux kernel? (ARM architecture)

ARM SMP systems support two types of interrupts: the SPI (shared peripheral interrupt) and the PPI (private peripheral interrupt). The PPI is a per-CPU interrupt source. A special case of the PPI for SMP is the SGI (software-generated interrupt); this is a CPU-to-CPU interrupt used to signal from one CPU to another in the SMP world (called an IPI). [Note1]
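The kernel's cross-CPU calls ride on exactly those SGIs/IPIs. From driver or core code, the usual way to run something on another CPU is along these lines (a generic illustration, not ARM-specific; function names are placeholders):

    #include <linux/smp.h>

    static void remote_fn(void *info)
    {
            /* runs on the target CPU, in interrupt context, delivered via an IPI/SGI */
    }

    static void poke_cpu(int cpu)
    {
            /* sends an IPI to 'cpu' and, with wait=1, blocks until remote_fn has run there */
            smp_call_function_single(cpu, remote_fn, NULL, 1);
    }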

A PPI timer can be used to allow each CPU to do 'tickless scheduling'; that is, timer interrupts are scheduled via knowledge of future time events (google 'timing wheel', look at the NO_HZ documentation, etc.). The current Linux kernel doesn't use this specific PPI timer for scheduling; it is only used as a delay-loop time source. Instead, the global PPI timer is used. This timer can interrupt each CPU selectively, but its register set is global to all CPUs; a particular CPU may schedule an interrupt for itself, with the time base being global.

The complication is that tasks must be migrated from one CPU to another in order to balance work among CPUs. Also, the Linux kernel's core code/scheduler is written for many CPUs and architectures, which may not have these per-CPU interrupt sources. A definitive answer may depend on your kernel version and the scheduler used (or, more generally, the kernel configuration). Generally, a busy CPU will do the migration; other CPUs may wake on a timer tick just to see if a task in their set should run (perhaps a migrated process). If NO_HZ is in effect, some CPUs may not wake at all; they will get an IPI in the case of migration.

In any case, there is nothing ARM-specific in the CPU scheduling besides the clock source. It is possible for an ARM SMP system not to have a global PPI timer. In this case, every CPU may wake to service an interrupt, but the majority may go back to sleep immediately. This could happen on any system due to a bad timer/interrupt-controller design or a bad system configuration. However, even in these cases, the code would not call into the scheduler except where needed.

See: Linux Scheduler on SMP (which may be a duplicate, although the answer is not great IMO), IBM's Completely Fair Scheduler article, and O'Reilly's Linux kernel scheduler chapter.

Note1: This is actually GIC (generic interrupt controller) terminology. However, most ARM SMP systems use this interrupt controller. It is bundled with Cortex-A CPUs and came as an external soft component for some ARMv6 systems. It is possible for an ARM SMP system to use another controller, but that is probably extremely rare or non-existent.

Edit: There are two ARM on-chip timers; these are useful because every Cortex-A has them, as opposed to SoC-vendor timers. One of them is used instead of a 'counting loop' for delays; this works better in the presence of interrupts. I don't think it is critical to understanding SMP scheduling, so you may ignore that comment and just know that that source file is not used for scheduling. It was the first one I looked at. If you find it really distracting, I will remove that information.

See this paper on timing wheels; it is about 'IP'/networking, but the concept behind NO_HZ is similar. I.e., don't interrupt every 10 ms just to increment ticks. In the NO_HZ case, each CPU can set a future wake-up time based on what sort of requests drivers and sub-systems have made. E.g., if schedule_work() needs to run in 175 ms, then the timer is set to that value for the CPU and we don't wake up 17 times (if the system tick is 10 ms) but just increment ticks by 17. Some CPUs may need a timeout to evict the current process and run another for multi-tasking as well, so the scheduler itself may set a timer.
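For example, the '175 ms' request above would typically come from the delayed-work API; under NO_HZ the CPU can then program its timer for that deadline instead of ticking every 10 ms. A sketch with placeholder names:

    #include <linux/workqueue.h>
    #include <linux/jiffies.h>

    static void my_work_fn(struct work_struct *work)
    {
            /* runs roughly 175 ms after being queued */
    }

    static DECLARE_DELAYED_WORK(my_work, my_work_fn);

    static void queue_future_work(void)
    {
            /* with HZ=100 this is ~17 jiffies away; a tickless CPU can sleep
             * until then and advance the tick count in one step */
            schedule_delayed_work(&my_work, msecs_to_jiffies(175));
    }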


