What Is the Downside of Updating ARM TTBR (Translation Table Base Register)

What is the downside of updating ARM TTBR (Translation Table Base Register)?

Updating the TTBR (translation table base register)^Note1 with the MMU enabled has many perils. There are interrupts, page faults, the TLB (MMU cache) and both L1 and L2 caches to be considered. On different systems, the caches may be PIPT or VIVT (physically or virtually tagged), and L1 or L2 caches may or may not exist.

People seem overly concerned about the MMU and TLB for efficiency. They are always dwarfed by the primary L1/L2 caches in performance considerations. It is a smaller impact to update the MMU tables and perform TLB flushes than it is to have unneeded evictions from the L1/L2 code and data caches. At a minimum, one TLB entry covers a 4KB page, which is over 100 cache lines, so repopulating a TLB entry costs far less than repopulating the cache lines it maps. In some cases, a TLB entry may cover 1MB.
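To put a number on that comparison, here is a small sketch of the arithmetic. The function name is made up for illustration, and the 32-byte line size in the usage note is an assumption (many ARM cores use 32 or 64 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Number of cache lines needed to hold the memory mapped by one TLB entry.
 * Both sizes are in bytes; line_size must be non-zero and divide page_size.
 * (Hypothetical helper, not from any library.) */
static size_t cache_lines_per_tlb_entry(size_t page_size, size_t line_size)
{
    assert(line_size != 0 && page_size % line_size == 0);
    return page_size / line_size;
}
```

With 32-byte lines, `cache_lines_per_tlb_entry(4096, 32)` gives 128 lines for a 4KB page, and a 1MB section entry covers 32768 lines.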

Some user-space data/code in the L1/L2 caches may need to be evicted on context switches. However, for very frequent small workloads, a user context switch may keep code and data in the L1/L2. For example, consider a media player doing CPU-intensive decoding and a cron task that periodically checks a server for new email. The switch to the cron task and back may leave the decoder's code in the L2 cache for the video decoding to reuse.

What is the downside of updating ARM TTBR?

Unless the from/to tables are identical you have to keep the system view of memory consistent for the duration of the update.^Note2 This will naturally cause IRQ latency and complexity of implementation as you need to sync up many sub-systems. Also, the Linux MM (memory management) code is architecture agnostic. It handles a great variety of MMU sub-systems. The goal is never to optimize locally (at the architecture level) but optimize globally at the generic layers.

Note1: The TTBR is a pointer to a physical, 16KB-aligned memory region that holds the first-level table of the ARM MMU. Each entry covers 1MB (on 32-bit systems) and may point to another table, often called L2.
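As a rough illustration of Note1, the sketch below computes which first-level entry a virtual address selects, assuming the classic 32-bit layout of 4096 four-byte entries (which is where the 16KB size comes from). The macro and function names are mine, not from the kernel or the ARM manuals.

```c
#include <assert.h>
#include <stdint.h>

#define L1_SECTION_SHIFT 20u    /* each L1 entry covers 1MB = 2^20 bytes */
#define L1_ENTRIES       4096u  /* 4GB of address space / 1MB per entry */
#define L1_ENTRY_BYTES   4u     /* assumption: 32-bit descriptors */
#define L1_TABLE_BYTES   (L1_ENTRIES * L1_ENTRY_BYTES)  /* 16KB */

/* Index of the first-level entry that covers virtual address va. */
static uint32_t l1_index(uint32_t va)
{
    return va >> L1_SECTION_SHIFT;
}

/* Physical address of that entry, given a 16KB-aligned table base. */
static uint32_t l1_descriptor_addr(uint32_t table_base, uint32_t va)
{
    assert((table_base & (L1_TABLE_BYTES - 1u)) == 0);
    return table_base + l1_index(va) * L1_ENTRY_BYTES;
}
```

For example, with a table at the (made-up) physical address 0x80004000, the entry for virtual address 0xC0000000 is index 3072.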

Note2: You might do this in a boot loader or in places where you are migrating system-level code between memory devices. I.e., updating the TTBR with identical tables is of no consequence by itself. It is when the tables differ that weird things will happen.

In ARMv8, where is a process's root page table saved?

TL;DR - Observing a TTBRx switch on a system can be difficult due to the ASID/DACR/pid facilities on the ARM CPU. I.e., the page tables are annotated with 'process information', and in the majority of cases a context switch only updates a single register accessible from privileged mode. This keeps cache entries and the TLB fresh.

As per ARM64 TTBR0/1, there are two table base registers. This is also relevant to ARMv7-A systems. As well, you have an ASID. There are several ASIDs and if your system does not have a lot of active processes, the TTBR1 will not change as the kernel will only flip the active domain (single register write). This is the 'fast path' in check_and_switch_context().

If you have a highly active system with >16 processes contending/active, then you will take the slow path, which updates TTBR0/1. This ends up calling cpu_do_switch_mm(), which you can see does the update.
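The fast/slow decision can be pictured with a toy model of a generation-tagged ASID. This is only a sketch loosely based on the generation-counter idea. The names are hypothetical, 8-bit ASIDs are an assumption, and the real check_and_switch_context() also deals with locking and per-CPU state that is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ASID_BITS 8u   /* assumption: 8-bit hardware ASIDs */

/* Pack a generation counter above the hardware ASID (toy layout). */
static uint64_t make_context_id(uint64_t generation, uint32_t asid)
{
    assert(asid < (1u << ASID_BITS));
    return (generation << ASID_BITS) | asid;
}

/* The fast path is possible only if the id belongs to the current
 * generation; otherwise a new ASID has to be allocated (slow path). */
static bool asid_is_current(uint64_t context_id, uint64_t current_generation)
{
    return (context_id >> ASID_BITS) == current_generation;
}

/* The low bits are what would be handed to the hardware. */
static uint32_t hw_asid(uint64_t context_id)
{
    return (uint32_t)(context_id & ((1u << ASID_BITS) - 1u));
}
```

When the pool of ASIDs is exhausted, the generation is bumped, so every id handed out earlier fails `asid_is_current()` and its owner takes the slow path on its next switch.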

References:

Downside of TTBR updates
Master class
ARM Domains
Update ARM MMU translation table

The pid was an ARMv5 mechanism, which was not accepted into the mainline kernel. DACR (domains (ARMv6)) and ASID are very similar, where ASID is a slight evolution of DACR. A pid was a single value, whereas 'domains' allow a process to have several address space maps, so processes can overlap with shared library code, for instance. TLB and cache entries are annotated with domain information (as well as worlds for TrustZone).

Does ARM use a physical address or a virtual address when entering the vector table?

When paging is enabled and an exception occurs does a translation table walk occur to access the exception vector table at address 0x00000000?

Almost all ARM CPUs have a means to configure the exception table address. So in most systems, the exception vector table is not at address 0x00000000. However, the MMU is enabled when exceptions are taken. The TLB (an MMU/page table cache) will contain the vector table physical address.
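As one example of such a configuration mechanism, on ARMv7-A the SCTLR.V bit selects 'high vectors' at 0xFFFF0000, and cores with the Security Extensions can relocate the low vector table through VBAR. The sketch below only models that selection in plain C, and the function names are mine. The addresses it produces are virtual when the MMU is on, which is the point of the paragraph above.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_V (1u << 13)  /* SCTLR bit 13: 1 = high vectors */

/* Byte offsets of the classic ARM exception vectors. */
enum arm_vector {
    VEC_RESET = 0x00, VEC_UNDEF = 0x04, VEC_SVC = 0x08,
    VEC_PABT  = 0x0C, VEC_DABT  = 0x10, VEC_IRQ = 0x18, VEC_FIQ = 0x1C
};

/* Base (virtual) address of the vector table for given register values.
 * vbar is expected to be 32-byte aligned; its low 5 bits are reserved. */
static uint32_t vector_base(uint32_t sctlr, uint32_t vbar)
{
    assert((vbar & 0x1Fu) == 0);
    if (sctlr & SCTLR_V)
        return 0xFFFF0000u;     /* high vectors; VBAR is not used */
    return vbar;                /* 0x00000000 out of reset */
}

/* Address the CPU fetches from when exception v is taken. */
static uint32_t vector_address(uint32_t sctlr, uint32_t vbar, enum arm_vector v)
{
    return vector_base(sctlr, vbar) + (uint32_t)v;
}
```

With SCTLR.V set, the IRQ vector is fetched from 0xFFFF0018, well away from a user mapping at low addresses.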

In some SoCs the boot vector table may be at 0x0, but this is usually reconfigured by the boot code.

If paging is still enabled then how do user mode processes and the vector table both share address 0x00000000 - the TTBR (translation table base register) does not get updated on exception entry and the TTBR is not a banked register (we are not talking here about switching between secure and non-secure worlds).

If you want the vector table at address 0x00000000, then it is what user space will see unless you prohibit it. Prohibiting access to 0x0 may be a desirable design to prevent NULL pointer use. Many OSes do not have user space run from 0x0, but from an address like 0x8000.

Having user space fault based on a parameter can be very useful as you can trap NULL pointer access while a process is being developed. I would recommend always leaving this on, but some people allow NULL access for production code.

If no then we must enter exceptions using physical addressing in which case is paging now disabled?

No, paging is still enabled, as the cache is probably on as well. The load/store unit of the CPU would be more complex if some accesses were physical and others virtual, especially as caches are populated by virtual address in traditional ARM CPUs.

While forking a process, why does the Linux kernel copy the content of the kernel page table for every newly created process?

Each process has its own copy of the kernel part of the page table (the upper 1GB) to avoid switching the L1 page table (i.e., to avoid updating the TTBR) when switching between user and kernel land. Note that the user/kernel land switch happens quite frequently.
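The idea can be sketched at the hardware-table level as follows. This is a simplified model, not the kernel's pgd_alloc(): it assumes 4096 one-megabyte L1 entries and a 3G/1G split at 0xC0000000, and all names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define L1_ENTRIES      4096u
#define KERNEL_VA_START 0xC0000000u              /* assumption: 3G/1G split */
#define FIRST_KERNEL_L1 (KERNEL_VA_START >> 20)  /* entry 3072 */

/* Build the L1 table of a new process: the user part starts out empty,
 * the kernel part is copied from the master kernel table, so the same
 * table stays valid when the process enters the kernel. */
static void new_process_l1(uint32_t *dst, const uint32_t *kernel_master)
{
    assert(dst != NULL && kernel_master != NULL);
    memset(dst, 0, FIRST_KERNEL_L1 * sizeof(uint32_t));
    memcpy(dst + FIRST_KERNEL_L1, kernel_master + FIRST_KERNEL_L1,
           (L1_ENTRIES - FIRST_KERNEL_L1) * sizeof(uint32_t));
}
```

Only the top 1024 entries are copied, so the per-process cost is a 4KB copy at fork time in exchange for never touching the TTBR on a syscall or interrupt.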

Why avoid updating the TTBR? Details can be found here:
What is the downside of updating ARM TTBR (Translation Table Base Register)?

Linux kernel ARM Translation table base (TTB0 and TTB1)

The TTBR registers are used together to determine addressing for the full 32-bit or 40-bit address space. Which register is used for what address ranges is controlled via the tXsz bits in the TTBCR. There is an entry for t0sz corresponding to TTBR0 and t1sz for TTBR1.

The page tables addressed by each TTBRx register are independent, but you typically find most Linux implementations just use TTBR0. Linux expects to be able to use a 3G/1G address space partitioning scheme, which is not supported by ARM. If you look at page B3-1345 of the ARMv7 Architecture Reference Manual, you'll see that the values of t0sz and t1sz determine the address ranges supported by TTBR0 and TTBR1 respectively. To add confusion to disorientation, it is even possible to have disjoint address spaces where TTBR0 and TTBR1 support ranges that are not contiguous, resulting in a hole in the system address space. Good times!
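For the non-LPAE (short-descriptor) case, the split is governed by the 3-bit TTBCR.N field rather than the tXsz fields. Here is a small model of that rule, with hypothetical function names, which shows why a 3G/1G split cannot be expressed: the TTBR0 region is always a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the table walk for va starts from TTBR1, 0 if from TTBR0.
 * n is the TTBCR.N field (0..7) of the short-descriptor format. */
static int uses_ttbr1(uint32_t va, unsigned n)
{
    assert(n <= 7u);
    if (n == 0)
        return 0;                    /* N == 0: TTBR0 covers everything */
    return (va >> (32u - n)) != 0;   /* any of the top N bits set -> TTBR1 */
}

/* Size in bytes of the region translated through TTBR0. */
static uint64_t ttbr0_region_bytes(unsigned n)
{
    assert(n <= 7u);
    return (uint64_t)1 << (32u - n);
}
```

N = 1 gives a 2G/2G split and N = 2 gives 1G for TTBR0 and 3G for TTBR1; no value of N puts the boundary at 0xC0000000.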

To answer your main question though, it is recommended by ARM that TTBR0 be used to store the offset to the page tables used by USER processes, and TTBR1 be used to store the offset to the page tables used by the KERNEL. I have yet to see a single implementation that actually does this. Almost exclusively TTBR0 is used in all cases, with TTBR1 containing a duplicate copy of the L1 tables.

So how does this work? The value of TTBR is stored as part of the process state and simply restored each time the process is switched back in. This is how it is expected to work. Originally, TTBR1 would hold a constant value for the kernel tables and never be replaced or swapped out, whereas TTBR0 would be changed each time you context switch between processes. Apparently most Linux implementations for ARM have decided to just basically eliminate the use of TTBR1 and stick to using TTBR0 for everything.

If you want to test this theory on your device, try whacking TTBR1 and watch nothing happen. Then try whacking TTBR0 and watch your system crash. I've yet to encounter a single instance that didn't produce this exact result. Long story short, TTBR1 is unused by Linux, and TTBR0 is used almost exclusively and simply swapped out.

Now, once you get to LPAE support, throw all this away and start over again. This is the implementation where you will start to see the value of t0sz and t1sz being something other than zero, and hence N as well.
