Arm Linux Kernel Page Table

ARM Linux page table layout

As you say, in the ARM short-descriptor format each second-level page table is only 1KB (256 four-byte entries). Even with the associated Linux shadow table, which holds the state the hardware entries cannot (such as the young and dirty bits), that only makes 2KB. Half of every 4KB page allocated for second-level tables would therefore be wasted.

Linux therefore pretends that the section size is 2MB rather than the hardware's 1MB. It allocates first-level entries in pairs, so the corresponding pair of second-level tables (plus their shadows) fits together in a single page. That avoids the wastage and keeps the management of page table memory simple.

Arm64 Linux Page Table Walk

I finally solved the problem.

Actually, my code was almost correct. The only thing I missed was a check on the type of each page table entry.

According to the ARMv8 page table design, ARM uses a 4-level page table with the 4KB granule and a 48-bit virtual address space. Levels 0 to 3 (as ARM numbers them) are implemented as pgd, pud, pmd and pte in the Linux code.

In the ARM architecture, an entry at each level can be either a block entry or a table entry (see the AArch64 descriptor format section of the ARM architecture documentation). At level 3, the leaf is a page entry.

If the memory address is mapped by a 4KB page, the walk has to go all the way down to the level 3 entry (pte). If the address belongs to a larger chunk, however, the walk ends early at a block entry. With the 4KB granule, a block entry sits at the pud level (1GB block) or the pmd level (2MB block).

By checking the last 2 bits of the entry at each level, you can tell whether it is a block entry or a table entry. You only keep walking down through table entries.

Here is how to fix my code:

Read the descriptor through the page table pointer (desc = *pgd, then *pud and *pmd at the later levels) and check its last 2 bits:

  • If the descriptor is a table entry (0b11), it points to the next-level table, so keep walking down as my code below does.
  • If it is a block entry (0b01), you can stop at that level and translate the VA to a PA using the descriptor desc you just read.
  • If bit 0 is clear, the entry is invalid and the address is not mapped.
  • At level 3 (pte), 0b11 means a page entry rather than a table entry.

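#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/sched.h>

/*
 * Walk current->mm for addr. Returns 1 if addr is mapped (by a block at
 * pud/pmd level or by a page at pte level), or a negative value naming
 * the level at which the walk failed. Written against a recent kernel
 * (p4d level, pud_leaf()/pmd_leaf()); the caller should hold the mmap lock.
 */
int find_physical_pte(void *addr)
{
	struct mm_struct *mm = current->mm;
	unsigned long address = (unsigned long)addr;
	pgd_t *pgd;
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *ptep;

	if (!mm)	/* kernel threads have no user address space */
		return -1;

	pgd = pgd_offset(mm, address);
	pr_info("pgd is: %p, value: %llx\n", pgd,
		(unsigned long long)pgd_val(*pgd));
	if (pgd_none(*pgd) || pgd_bad(*pgd))
		return -1;

	/* p4d is folded into pgd unless 5-level paging is configured. */
	p4d = p4d_offset(pgd, address);
	if (p4d_none(*p4d) || p4d_bad(*p4d))
		return -1;

	pud = pud_offset(p4d, address);
	pr_info("pud is: %p, value: %llx\n", pud,
		(unsigned long long)pud_val(*pud));
	if (pud_none(*pud))
		return -2;
	if (pud_leaf(*pud))	/* 1GB block entry: the walk ends here */
		return 1;
	if (pud_bad(*pud))	/* checked after pud_leaf(): a block is "bad" as a table */
		return -2;

	pmd = pmd_offset(pud, address);
	pr_info("pmd is: %p, value: %llx\n", pmd,
		(unsigned long long)pmd_val(*pmd));
	if (pmd_none(*pmd))
		return -3;
	if (pmd_leaf(*pmd))	/* 2MB block entry: the walk ends here */
		return 1;
	if (pmd_bad(*pmd))
		return -3;

	ptep = pte_offset_kernel(pmd, address);
	pr_info("pte is: %p, value: %llx\n", ptep,
		(unsigned long long)pte_val(*ptep));
	if (!pte_present(*ptep))
		return -4;

	return 1;
}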

In ARMv8, where is a process's root page table saved?

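TL;DR - On arm64, the root of a process's page tables is recorded in its mm_struct (mm->pgd). On a context switch the kernel loads the physical address of that table into TTBR0, the user-space half of the address space, while TTBR1 keeps pointing at the kernel's tables. Watching for a TTBR switch alone can be misleading, though. TLB entries are tagged with "process information" (the ASID on ARMv6 and later; on ARMv5 the FCSE PID served a similar purpose). As a result, most context switches need only a register update from privileged mode rather than a TLB flush, which keeps the TLB warm.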


ARM64 has two translation table base registers: TTBR0_EL1 for the user half of the address space and TTBR1_EL1 for the kernel half. ARMv7-A also has a TTBR0/TTBR1 pair. In addition, non-global TLB entries are tagged with an ASID. The kernel's tables in TTBR1_EL1 do not change on a context switch. As long as the incoming task's ASID is still valid, the kernel can reuse it without taking a lock or flushing the TLB. This is the "fast path" in check_and_switch_context().

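The task may have no valid ASID, for example because it has never run, or because the ASID space (8 or 16 bits wide on ARMv8) has been used up and rolled over. In that case the slow path allocates a new ASID and may flush the TLB. Both paths then go through cpu_switch_mm(), which calls cpu_do_switch_mm() to write the new ASID and page table base into the TTBRs. The exception is when software PAN emulation (CONFIG_ARM64_SW_TTBR0_PAN) defers the TTBR0 write.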


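The PID was an ARMv5 mechanism (the Fast Context Switch Extension, FCSE), and support for it was not accepted into the mainline kernel. Domains and ASIDs are often mentioned together, but they do different jobs. Domains, controlled through the DACR in the short-descriptor format, group first-level entries into up to 16 domains whose access checking can be changed with a single register write. The ASID, introduced in ARMv6, tags non-global TLB entries with an address-space identifier, so entries from different processes can coexist in the TLB. A PID was a single value per process, whereas domains let different regions of an address space, such as shared library code, be treated differently. Short-descriptor TLB entries also record the domain. With TrustZone, TLB and cache entries are additionally tagged with the security state (the Secure or Non-secure world).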


