Critical Timing in an ARM Linux Kernel Driver
The ARM cache on an IMX25 (ARM926) is 16K Code, 16K Data L1 with a 32byte length or eight instructions. With the DDR-SDRAM controller running at 133Mhz and a 16bit bus the transfer rate is about 300MB/s. A cache fill should only take about 100nS, not 9uS; this is about 100 times too long.
However, you have four other issues with Linux.
- TLB misses and a page table walk.
- Data aborts.
- DMA masters stealing.
- FIQ interrupts.
It is unlikely that the LCD master is stealing enough bandwidth, unless you have a huge display. Is your display larger than 1/4VGA? If not, this is only 10% of the memory bandwidth and this will pipeline with the processor. Do you have either Ethernet or USB active? These peripherals are higher data rate and could cause this type of contention with SDRAM.
All of these issues maybe avoided by writing your toggler PC relative and copying it to the IRAM. See: iram_alloc.c; this file should be portable to older versions of Linux. The XBAR switch allows fetches from SDRAM and IRAM simultaneously. The IRAM can still be a target of other DMA masters. If you are really pressed, move the code to the ETB buffers which no other master in the system can access.
The TLB miss can actually be quite steep as it may need to run several single beat SDRAM cycles; still this should be under 1uS. You have not posted code, so it is possible that a variable and/or other is causing a data fault which is not maskable.
If you have any drivers using the FIQ, they may still be running even though you have masked the normal IRQ interrupts. For instance, the ALSA driver for this system normally uses the FIQ.
Both the ETB and the IRAM are 32-bit data paths and low wait state. Either one will probably give better response than the DDR-SDRAM.
We have achieved sub micro-second response by using a FIQ and IRAM to toggle GPIOs on an IMX258 with another protocol using bit banging.
Interrupt performance on linux kernel with RT patches - should be better?
What you experience here is interrupt jitter. This is to be expected on Linux, because the kernel regularly disables the interrupts for various tasks (entering a spinlock, handling an interrupt, etc.).
This will happen, regardless wether you have PREEMPT_RT or not, so expecting to generate 74kHz signal with regular interrupts is pretty much unrealistic.
Now, ARM has higher priority interrupts called FIQs, that will never be masked or disabled.
Linux doesn't use FIQ, and is not built to deal with the fact that an FIQ could be used, so you won't be able to use the generic kernel framework.
From Linux driver development point of view however, it's not really different as long as you keep this in mind: you have to write a handler, and associate it to an IRQ. You'll also have to poke into the interrupt controller to make it generate a FIQ for the interrupt you want to use (the details on how to change it are platform-dependant. Some platforms have functions to do that (like imx25 and mxc_set_irq_fiq), some others don't. imx23/28 don't, so you'll have to do it by hand).
The only thing that the functions to setup a fiq handler only work with a assembly-written handler, so you'll have to rewrite your handler in assembly (with your current code, it should be trivial though).
You can grab additional details to the blog post Alexandre posted (http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/), where you'll find working code, samples, and explanations on how it all works together.
How SMP schedule work in Linux kernel? (ARM architecture)
The ARM SMP systems support two types of interrupts. SPI (shared peripheral interrupt) and PPI (peripheral private interrupts). The PPI is a per-CPU interrupt source. A special case for SMP of the PPI is an SGI (software generated interrupt); this is a CPU-to-CPU interrupt that is used to signal from one CPU to another in the SMP world (called IPI).Note1
A PPI timer can be used to allow each CPU to use 'tickless scheduling'; that is timer interrupts are scheduled via knowledge of future time events (google timing wheel, look at the NO_HZ
documentation, etc). The current Linux kernel doesn't use this specific PPI timer for scheduling. It is only used as a delay loop time source. Instead the Global PPI timer is used. This timer can interrupt each CPU selectively, but the register set is global to all CPUs. A particular CPU may schedule an interrupt for itself; with the time base being global.
The complication is that tasks must be migrated from one CPU to another in order to balance work among CPUs. Also, the Linux kernel's core code/scheduler is written for multiple CPUs (or architectures) and they may not have these per-CPU interrupt sources. An definitive answer may depend on your kernel version and the scheduler used (or more generally kernel configuration). Generally, a busy CPU will do the migration, other CPUs may wake on a timer tick just to see if a task in it's set should run (maybe a migrated process). If NO_HZ
is in effect, some CPUs may not wake at all; they will get an IPI in the case of migration.
In any case, there is nothing that is ARM specific in the CPU scheduling besides the clock source. It is possible for an ARM SMP system to not have the a global PPI timer. In this case, every CPU may wake to service an interrupt, but the majority may sleep immediately. This could happen on any system due to a bad timer/interrupt controller design or a bad system configuration. However, even in these cases, the code would not call into the scheduler except where needed.
See: Linux Scheduler on SMP (which maybe a duplicate although the answer is not great IMO), IBMs completely fair scheduler article and O'Reillys Linux Kernel scheduler chapter.
Note1: This is actually GIC (or generic interrupt controller) terminology. However, most ARM SMP systems use this interrupt controller. It is bundled with Cortex-A CPUs and came as an external soft-component for some ARMv6 systems. It is possible for an ARM SMP systems to use another controller, but it is probably extremely rare or non-existent.
Edit: There are two ARM on-chip timers; these are useful as every Cortex-A has them compared to SOC vender timers. One of them is used instead of a 'counting loop' for a delay. This works better in the case of interrupts. I don't think it is critical to understand SMP scheduling, you may ignore that comment and just know that that source file is not used for scheduling. It was the first one I looked at. If you find it really distracting, I will remove that information.
See this paper on timing wheels; it is about 'IP'/networking, but the concept of NO_HZ
is similar. Ie. Don't interrupt every 10mS, just to increment ticks. In the NO_HZ
case, each CPU can set a future wake-up time based on what sort of requests drivers and sub-systems have given. Ie, schedule_work()
needs to be run in 175ms, then the timer is set to that value for the CPU and we don't wake-up 17 times (if the system tick is 10mS), but just increment ticks by 17. Some CPUs may need a timeout to evict the current process to run another for multi-tasking as well, so the scheduler itself may set a timer.
Related Topics
Is It Safe to Issue Blocking Write() Calls on the Same Tcp Socket from Multiple Threads
Bluetooth Low Energy in C - Using Bluez to Create a Gatt Server
File Size in Human Readable Format
How to Run a Command as a Specific User in an Init Script
How to Gzip Standard in to a File and Also Print Standard in to Standard Out
Add Column to End of CSV File Using 'Awk' in Bash Script
Is Clock_Gettime() Adequate for Submicrosecond Timing
How to Read the Second-To-Last Line in a File Using Bash
Adding Users to Sudoers Through Shell Script
Pass Private Key Password to Openvpn Command Directly in Ubuntu 10.10
How to Compile Linux Kernel with Something Other Than Gcc
Sed: How to Delete Lines Matching a Pattern That Contains Forward Slashes
How Are Percpu Pointers Implemented in the Linux Kernel
Go Http Server Testing Ab VS Wrk So Much Difference in Result