ARM Linux Page tables layout
As you say, in the ARM short-descriptor format each second-level page table is 1KB in size. Even with the associated shadow page table that only makes 2KB, meaning 50% of every page allocated for second-level tables would be entirely wasted.
Linux just pretends that the section size is 2MB, rather than the hardware's actual 1MB, by allocating first-level entries in pairs. The corresponding pair of second-level tables (together with their shadow tables) can then be kept in a single 4KB page, which avoids that wastage and keeps the management of page table memory really simple.
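The arithmetic behind that pairing can be sketched in a few lines of C. This is only an illustration: the hardware constants follow the short-descriptor format, the 2MB shift and 512-entry count mirror what the answer describes, and the helper names are hypothetical, not kernel functions.

```c
#include <stdint.h>

/* Hardware geometry of the ARM short-descriptor format. */
#define HW_SECTION_SHIFT   20u   /* one first-level entry covers 1MB */
#define HW_L2_ENTRIES      256u  /* entries per hardware second-level table */
#define HW_L2_ENTRY_BYTES  4u    /* each descriptor is 32 bits wide */

/* Linux's view: first-level entries are handled in pairs (2MB each). */
#define LINUX_PGDIR_SHIFT  21u
#define LINUX_PTRS_PER_PTE 512u  /* two hardware tables back to back */

/* Size of one hardware second-level table: 256 * 4 = 1KB. */
static uint32_t hw_l2_table_bytes(void)
{
    return HW_L2_ENTRIES * HW_L2_ENTRY_BYTES;
}

/* Two hardware tables plus their two shadow tables fill one 4KB page. */
static uint32_t paired_tables_bytes(void)
{
    return 2u * 2u * hw_l2_table_bytes();
}

/* Index of the 1MB first-level entry the hardware actually uses. */
static uint32_t hw_l1_index(uint32_t va)
{
    return va >> HW_SECTION_SHIFT;
}

/* Index of the 2MB "pgd" entry as Linux sees it. */
static uint32_t linux_pgd_index(uint32_t va)
{
    return va >> LINUX_PGDIR_SHIFT;
}

/* Index into the combined 512-entry second-level table. */
static uint32_t linux_pte_index(uint32_t va)
{
    return (va >> 12) & (LINUX_PTRS_PER_PTE - 1u);
}
```

For example, the address 0x00300000 uses hardware first-level entry 3, which is the second half of Linux entry 1, so its second-level index starts at 256 rather than 0.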
Arm64 Linux Page Table Walk
I finally solved the problem.
Actually, my code is correct. The only part I missed is a page table entry check.
According to the page table design of ARMv8, ARM uses a 4-level page table for the 4KB granule case. The levels (level 0-3 defined in the link) are implemented as pgd, pud, pmd, and ptep in Linux code.
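As a rough illustration of how a virtual address is split across those four levels, the sketch below assumes a 48-bit virtual address with the 4KB granule, where each level consumes 9 bits and the low 12 bits are the page offset. The helper names are made up for this example.

```c
#include <stdint.h>

#define GRANULE_SHIFT  12u                          /* 4KB pages */
#define BITS_PER_LEVEL 9u                           /* 512 entries per table */
#define INDEX_MASK     ((1u << BITS_PER_LEVEL) - 1u)

/*
 * Table index used at a given level (0 = pgd ... 3 = pte),
 * assuming a 48-bit VA and a 4KB granule:
 *   level 0: bits [47:39], level 1: bits [38:30],
 *   level 2: bits [29:21], level 3: bits [20:12].
 */
static unsigned table_index(uint64_t va, unsigned level)
{
    unsigned shift = GRANULE_SHIFT + BITS_PER_LEVEL * (3u - level);
    return (unsigned)((va >> shift) & INDEX_MASK);
}

/* Byte offset inside the final 4KB page: bits [11:0]. */
static uint64_t page_offset(uint64_t va)
{
    return va & ((1ull << GRANULE_SHIFT) - 1ull);
}
```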
In the ARM architecture, an entry at each level can be either a block entry or a table entry (see the AArch64 Descriptor Format Section in the link).
If the memory address belongs to a 4KB page, then the walk needs to go all the way down to the level 3 entry (ptep). However, for an address that belongs to a larger chunk, the final entry may be stored at the pgd, pud, or pmd level.
By checking the lowest 2 bits of the entry at each level, you know whether it is a block entry or a table entry, and you only keep walking down for a table entry.
Here is how to improve my code above:
Retrieve the descriptor through the page table pointer (desc = *pgd) and then check the lowest 2 bits of the descriptor.
If the descriptor is a table entry (0b11), then you need to extract the next-level entry as my code shows above.
If you reach a block entry (0b01) at any level, then you can stop there and translate the VA to PA based on the descriptor desc you just got.
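The descriptor check can be sketched as plain C that runs outside the kernel. It assumes the 4KB granule with 48-bit output addresses: bits [1:0] equal to 0b11 mean a table entry at levels 0-2 and a page entry at level 3, 0b01 means a block entry (treated here as valid only at levels 1 and 2), and anything else is invalid. The type and function names are hypothetical.

```c
#include <stdint.h>

enum desc_type { DESC_INVALID, DESC_BLOCK, DESC_TABLE, DESC_PAGE };

/* Classify a descriptor from its two lowest bits and its level. */
static enum desc_type classify_desc(uint64_t desc, unsigned level)
{
    switch (desc & 0x3u) {
    case 0x1u: /* 0b01: block entry, assumed valid at levels 1 and 2 only */
        return (level == 1u || level == 2u) ? DESC_BLOCK : DESC_INVALID;
    case 0x3u: /* 0b11: table entry, or page entry at the last level */
        return (level == 3u) ? DESC_PAGE : DESC_TABLE;
    default:   /* bit 0 clear: invalid entry */
        return DESC_INVALID;
    }
}

/* Bytes mapped by one entry: 1GB at level 1, 2MB at level 2, 4KB at level 3. */
static uint64_t entry_span(unsigned level)
{
    return 1ull << (12u + 9u * (3u - level));
}

/*
 * Translate a VA through a block or page descriptor: keep the output
 * address bits from the descriptor and the low bits from the VA.
 */
static uint64_t leaf_to_pa(uint64_t desc, unsigned level, uint64_t va)
{
    uint64_t span = entry_span(level);
    uint64_t oa_mask = 0x0000FFFFFFFFF000ull & ~(span - 1ull);
    return (desc & oa_mask) | (va & (span - 1ull));
}
```

A walker would call classify_desc() at each level, descend only on DESC_TABLE, and call leaf_to_pa() as soon as it sees DESC_BLOCK or DESC_PAGE.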
int find_physical_pte(void *addr)
{
    pgd_t *pgd;
    pud_t *pud;
    pmd_t *pmd;
    pte_t *ptep;
    unsigned long long address;

    address = (unsigned long long)addr;

    /* Level 0: look up the pgd entry for this address. */
    pgd = pgd_offset(current->mm, address);
    printk(KERN_INFO "\npgd is: %p\n", (void *)pgd);
    printk(KERN_INFO "pgd value: %llx\n", (unsigned long long)pgd_val(*pgd));
    /* Check here whether *pgd is a block entry; stop and translate if it is.
     * Do this before the *_bad() test, which on arm64 rejects non-table entries. */
    if (pgd_none(*pgd) || pgd_bad(*pgd))
        return -1;

    /* Level 1: look up the pud entry. */
    pud = pud_offset(pgd, address);
    printk(KERN_INFO "\npud is: %p\n", (void *)pud);
    printk(KERN_INFO "pud value: %llx\n", (unsigned long long)pud_val(*pud));
    /* Check here whether *pud is a block entry; stop and translate if it is. */
    if (pud_none(*pud) || pud_bad(*pud))
        return -2;

    /* Level 2: look up the pmd entry. */
    pmd = pmd_offset(pud, address);
    printk(KERN_INFO "\npmd is: %p\n", (void *)pmd);
    printk(KERN_INFO "pmd value: %llx\n", (unsigned long long)pmd_val(*pmd));
    /* Check here whether *pmd is a block entry; stop and translate if it is. */
    if (pmd_none(*pmd) || pmd_bad(*pmd))
        return -3;

    /* Level 3: look up the pte; validate the pointer before dereferencing it. */
    ptep = pte_offset_kernel(pmd, address);
    if (!ptep)
        return -4;
    printk(KERN_INFO "\npte is: %p\n", (void *)ptep);
    printk(KERN_INFO "pte value: %llx\n", (unsigned long long)pte_val(*ptep));

    return 1;
}
In ARMv8, where is a process's root page table saved?
TL;DR - Observing a TTBRx switch on a system can be difficult due to the ASID/DACR/pid facilities on the ARM CPU. That is, the page tables are annotated with 'process information', and in the majority of cases a context switch only updates a single register that is accessible from privileged mode. This keeps cache and TLB entries fresh.
As per ARM64 TTBR0/1, there are two translation table base registers. This is also relevant to ARMv7-A systems. In addition, you have an ASID. There are several ASIDs, and if your system does not have a lot of active processes, TTBR1 will not change, as the kernel will only flip the active domain (a single register write). This is the 'fast path' in check_and_switch_context().
If you have a highly active system with more than 16 processes contending/active, then you will take the slow path, which updates TTBR0/1. This ends up calling cpu_do_switch_mm(), which you can see does the update.
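To make the relationship between the root table and the ASID a little more concrete, here is a small sketch of how an AArch64 TTBRn_EL1 value is laid out: the table base address sits in bits [47:1] and the ASID in bits [63:48] (which of TTBR0/TTBR1 supplies the ASID is selected by TCR_EL1.A1). This models only the bit layout in ordinary C; the function names are hypothetical and it is not the kernel's implementation of cpu_do_switch_mm().

```c
#include <stdint.h>

#define TTBR_ASID_SHIFT 48u
#define TTBR_BADDR_MASK 0x0000FFFFFFFFFFFEull  /* bits [47:1] */

/* Combine the physical address of a root table with an ASID. */
static uint64_t ttbr_pack(uint64_t pgd_phys, uint16_t asid)
{
    return ((uint64_t)asid << TTBR_ASID_SHIFT) | (pgd_phys & TTBR_BADDR_MASK);
}

/* Extract the ASID field from a TTBR value. */
static uint16_t ttbr_asid(uint64_t ttbr)
{
    return (uint16_t)(ttbr >> TTBR_ASID_SHIFT);
}

/* Extract the root table base address from a TTBR value. */
static uint64_t ttbr_baddr(uint64_t ttbr)
{
    return ttbr & TTBR_BADDR_MASK;
}
```

Because TLB entries are tagged with the ASID, two processes with different ASIDs can keep their translations cached at the same time, which is why a switch does not always require flushing.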
References:
- Downside of TTBR updates
- Master class
- ARM Domains
- Update ARM MMU translation table
The pid was an ARMv5 mechanism that was not accepted into the mainline kernel. DACR (domains, ARMv6) and ASID are very similar, with ASID being a slight evolution of DACR. A pid was a single value, whereas 'domains' allow a process to have several address space maps, so processes can overlap with shared library code, for instance. TLB and cache entries are annotated with domain information (as well as worlds for TrustZone).