Why Is the Page Size of Linux (x86) 4 KB, and How Is That Calculated?

Why is the page size of Linux (x86) 4 KB, how is that calculated?

The default page size is fixed by what the MMU (memory management unit) of the CPU supports. In 32-bit protected mode, x86 supports two kinds of pages:

  • normal ones, 4 KiB
  • huge ones, 4 MiB

Not all x86 processors support large pages; one needs a CPU with the Page Size Extension (PSE) capability, which excludes pre-Pentium processors. Virtually all current-generation x86 CPUs implement it.

4 KiB is a widely popular page granularity in other architectures too. One could argue that this size comes from the division of a 32-bit virtual address into two 10-bit indexes into the page directories/tables, with the remaining 12 bits (10 + 10 + 12 = 32) giving the 4 KiB page size.
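
To make the split concrete, here is a minimal C sketch of that 10/10/12 decomposition (the variable names are my own):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t vaddr = 0xDEADBEEF;                  /* an arbitrary 32-bit virtual address */

        uint32_t dir_index   = (vaddr >> 22) & 0x3FF; /* top 10 bits: page directory index */
        uint32_t table_index = (vaddr >> 12) & 0x3FF; /* next 10 bits: page table index */
        uint32_t offset      =  vaddr        & 0xFFF; /* low 12 bits: offset in the 4 KiB page */

        printf("dir=%u table=%u offset=0x%03x\n", dir_index, table_index, offset);
        return 0;
    }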

Calculating Page Table Size

Since we have a virtual address space of 2^32 bytes and each page is 2^12 bytes, we can store 2^32 / 2^12 = 2^20 pages. Since each page table entry is 4 bytes, the table takes 2^20 * 4 bytes = 4 MB. So the page table takes up 4 MB in memory.
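
As a sanity check, here is the same arithmetic in a small C sketch (the constants are the classic 32-bit / 4 KiB / 4-byte-entry case described above):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint64_t vspace     = 1ULL << 32;          /* 2^32-byte virtual address space */
        uint64_t page_size  = 1ULL << 12;          /* 4 KiB pages */
        uint64_t entries    = vspace / page_size;  /* 2^20 pages to map */
        uint64_t entry_size = 4;                   /* 4-byte page table entries */

        printf("entries = %llu, table = %llu MB\n",
               (unsigned long long)entries,
               (unsigned long long)(entries * entry_size >> 20)); /* 2^20 * 4 = 4 MB */
        return 0;
    }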

What is page table entry size?

A page table is a table of translations from virtual to physical addresses that the OS keeps for each process; combined with paging to disk, it lets the OS present a process with more address space than the main memory actually available in the system.

Physical memory is the actual data stored at addresses in memory (DRAM), while virtual memory is where the OS "lies" to processes about where their data lives, in order to do things like present a 2^64-byte address space even though 2^34 bytes is about the most RAM normally installed. (2^32 bytes is 4 GB, so 2^34 bytes is 16 GB.)
The default page size is 4096 bytes (4 KB) for most processes, but the number of page table entries grows if a process needs more address space. Pages can also be allocated in smaller or larger sizes; it's just that 4 KB is usually the best size for most processes.

Note that a page table is a table of page table entries. The two have different sizes: entries are typically 4 bytes each, while a full 32-bit page table is most commonly 4096 KB, i.e. 4 MB, and a page table grows by adding more entries.
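
For illustration, here is a sketch of the flag bits in a classic 32-bit x86 page table entry (the macro names are mine; the bit positions follow the Intel SDM):

    #include <stdint.h>

    /* A 32-bit x86 PTE: the low bits are flags, while bits 12-31
     * hold the physical frame base address. */
    #define PTE_PRESENT  (1u << 0)   /* page is mapped */
    #define PTE_RW       (1u << 1)   /* writable */
    #define PTE_USER     (1u << 2)   /* reachable from user mode */
    #define PTE_ACCESSED (1u << 5)   /* set by the CPU on access */
    #define PTE_DIRTY    (1u << 6)   /* set by the CPU on write */

    static inline uint32_t pte_frame(uint32_t pte) {
        return pte & 0xFFFFF000u;    /* bits 12-31: physical page base */
    }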

How long is a virtual page number in a 48-bit system with 4 KB page size?

Normally the virtual page number would be everything except the low bits of the address that give the offset within the page. With 4 KB (= 2^12 byte) pages, the offset is the low 12 bits, so the virtual page number is the remaining 48 - 12 = 36 bits. That's 0x5641ba0c7 in your example. (This number is likely to be broken up further as indexes into several levels of page tables.)

There's nothing universal about 20 bits; it's just that if you have 32-bit virtual addresses then 32-12=20, so the high 20 bits give the page number in that case.
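
To make the split concrete, here is a small C sketch (the address is a hypothetical one whose page number matches the answer's 0x5641ba0c7 example):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint64_t vaddr  = 0x5641ba0c7123ULL; /* hypothetical 48-bit virtual address */
        uint64_t offset = vaddr & 0xFFF;     /* low 12 bits: offset within the page */
        uint64_t vpn    = vaddr >> 12;       /* remaining 36 bits: virtual page number */

        printf("vpn = 0x%llx, offset = 0x%03llx\n",
               (unsigned long long)vpn,
               (unsigned long long)offset);  /* prints vpn = 0x5641ba0c7 */
        return 0;
    }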

How to get the Linux kernel page size programmatically

This is what I finally did:

  • Reworked my current module to take a new module parameter called page_shift and used that to calculate PAGE_SIZE (PAGE_SIZE = 1 << PAGE_SHIFT)
  • Created a module loader wrapper that obtains the current system page shift with getconf and passes it to the module as a parameter (a sketch of the idea follows below)

Right now the module is being loaded on different architectures with different PAGE_SIZE values without any problems.
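
The wrapper itself isn't shown here, but a minimal userspace sketch of the idea, using sysconf(_SC_PAGESIZE) from libc (the same value the getconf PAGE_SIZE command reports), could look like this:

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        long page_size  = sysconf(_SC_PAGESIZE); /* current system page size */
        int  page_shift = 0;

        while ((1L << page_shift) < page_size)   /* derive log2(page_size) */
            page_shift++;

        /* page_shift is what the wrapper would pass as the module parameter */
        printf("PAGE_SIZE=%ld PAGE_SHIFT=%d\n", page_size, page_shift);
        return 0;
    }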

Why in x86-64 are virtual addresses 4 bits shorter than physical addresses (48 bits vs. 52)?

I believe you are talking about x86-64; my answer is based on that architecture.


When operating in 64-bit mode, the CPU translates virtual addresses into physical addresses with a revamped version of a feature known as PAE (Physical Address Extension).

Originally invented to break the 4 GiB limit while still using 32-bit pointers, this feature involves the use of four levels of tables.

Each table holds pointers to the next level's table, down to the last one, whose entries give the upper bits of the physical address. To get an idea, look at this picture from the AMD64 Architecture Programmer's Manual:

4-Level paging, PAE, in long mode

The rationale behind all those tables is sparsity: the metadata for translating virtual addresses into physical addresses is huge - if we were to use 4 KiB pages only, we'd need 2^(64 - 12) = 2^52 entries to cover the whole 64-bit address space.

Tables allow for a sparse approach: only the entries that are actually needed are populated in memory.

This design is reflected in how the virtual address is divided (and thus, indirectly, in the number of levels): runs of 9 bits are used to index the tables at each level.

Starting from bit 12 (inclusive), this gives: level 1 -> bits 12-20, level 2 -> bits 21-29, level 3 -> bits 30-38, level 4 -> bits 39-47.

This explains the current implementation limit of only 48 bits of virtual address space.
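
As an illustration, here is a small C sketch of that decomposition, using the common names for the four levels (PML4/PDPT/PD/PT):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint64_t vaddr = 0x00007f1234567abcULL;  /* an arbitrary canonical address */

        uint64_t pml4 = (vaddr >> 39) & 0x1FF;   /* level 4: bits 39-47 */
        uint64_t pdpt = (vaddr >> 30) & 0x1FF;   /* level 3: bits 30-38 */
        uint64_t pd   = (vaddr >> 21) & 0x1FF;   /* level 2: bits 21-29 */
        uint64_t pt   = (vaddr >> 12) & 0x1FF;   /* level 1: bits 12-20 */
        uint64_t off  =  vaddr        & 0xFFF;   /* offset within the 4 KiB page */

        printf("pml4=%llu pdpt=%llu pd=%llu pt=%llu off=0x%03llx\n",
               (unsigned long long)pml4, (unsigned long long)pdpt,
               (unsigned long long)pd,   (unsigned long long)pt,
               (unsigned long long)off);
        return 0;
    }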

Note that at the instruction level, where logical addresses are used, we have full support for 64-bit addresses.

Full support is also available at the segmentation level, the part that translates logical addresses into linear addresses.

So the limitation comes from PAE.

My personal opinion is that AMD rushed to be the first to ship an x86 CPU with 64-bit support and reused PAE, patching it with a new level of indirection to translate up to 48 bits.

Note that both Intel and AMD allow a future implementation to use 64 bits for the virtual address (probably with more tables).

However, both companies set a hard limit of 52 bits for the physical address.
Why?

The answer can still be found in how paging works.

In 32-bit mode, each entry in each table is 32 bits wide; the low bits are used as flags (the alignment requirements make them useless for the translation process anyway), while the higher bits are all used for the translation, giving a 32/32 virtual/physical translation.

It's important to stress that all 32 bits were put to use: some of the lower bits were not used as flags, but Intel marked them as "Ignored" or "Available", meaning the OS was free to use them.

When Intel introduced PAE, they needed 4 more bits (PAE physical addresses were 36 bits back then), and the logical thing to do was to double the size of each entry, since this creates a more efficient layout than, say, a 40-bit table entry.

This gave Intel a lot of spare space, which they marked as reserved (this is easier to see in older versions of the Intel SDM).

With time, new attributes were needed in each entry, the most famous being the XD/NX bit.

Protection keys are also a relatively new feature that takes space in an entry.
This shows that a full 64/64-bit virtual/physical translation is no longer possible with the current ISA.
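
To see where the 52-bit ceiling lives in the entry format, here is a minimal sketch (the mask follows the bit layout in the Intel/AMD manuals; the helper name is mine):

    #include <stdint.h>

    /* A 64-bit long-mode PTE: the physical frame base can only occupy
     * bits 12-51, which is exactly the 52-bit physical address limit.
     * The bits above it are flags (e.g. XD/NX) or reserved. */
    #define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL  /* bits 12-51: physical frame */
    #define PTE_NX        (1ULL << 63)           /* XD/NX: no-execute */

    static inline uint64_t pte_phys(uint64_t pte) {
        return pte & PTE_ADDR_MASK;  /* at most 52 bits of physical address */
    }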

For a visual reference, here is the format of the 64-bit PAE table entries:

Intel 64-bit PAE table entries

It shows that a 64-bit physical address is not possible (for huge pages there might still be a way to squeeze it in, but given the layout of the bits that seems unlikely), but it doesn't explain why AMD set the limit at 52 bits.

Well, it's hard to say.

Certainly, the size of the physical address space carries some hardware cost: more pins (though the integrated memory controller mitigates this, as the DDR specs multiplex a lot of signals) and more space in the caches/TLBs.

In this question (similar, but not similar enough to make this one a duplicate), an answer cites Wikipedia, which in turn allegedly cites AMD, claiming that AMD's engineers set the limit at 52 bits after due consideration of the benefits and costs.

I share what Hans Passant wrote more than six years ago: the current paging mechanism is not suitable for full 64-bit physical addressing, and that's probably why neither Intel nor AMD bothered to keep the high bits of each entry reserved for it.

Both companies know that by the time technology approaches the 52-bit limit, it will also be very different from its current form.

By that time they will have designed a better mechanism for memory in general, so they avoided over-engineering the existing one.


