Are Kernel Virtual Memory Pages Swappable
Kernel space pages are never paged in or out by design; they are pinned in memory. Pages in the kernel can usually be trusted from a security point of view, while user space pages should NOT be trusted.
For this reason you can access kernel buffers directly in your code without worrying about handling page faults. The same is not true of user space buffers.
Kernel space pages cannot be paged out by design: consider what would happen if the page containing the instructions for handling a page fault were itself paged out!
Physical storage of the kernel data
- The first GB of physical memory is mapped linearly to the highest GB of virtual addresses, but the kernel can modify these mappings.
- Yes, it is.
- No, the Linux kernel is not swappable. Only user process memory can be swapped out.
Note that this is only valid for 32-bit systems. Mappings on 64-bit systems are different.
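The linear mapping described in the first bullet amounts to adding or subtracting a fixed offset. Below is a minimal user-space sketch of that arithmetic, assuming the classic 32-bit x86 3G/1G split with the kernel window starting at 0xC0000000. The names `MODEL_PAGE_OFFSET`, `model_phys_to_virt` and `model_virt_to_phys` are hypothetical stand-ins loosely modeled on the kernel's `PAGE_OFFSET`, `__va()` and `__pa()`; this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: classic 32-bit x86 3G/1G split, where the kernel's
 * linear mapping starts at virtual address 0xC0000000. */
#define MODEL_PAGE_OFFSET 0xC0000000u

/* Hypothetical helper modeled on __va(): physical -> kernel virtual. */
uint32_t model_phys_to_virt(uint32_t phys)
{
    /* Only physical addresses below 1 GiB fit in the linear window. */
    assert(phys < 0x40000000u);
    return phys + MODEL_PAGE_OFFSET;
}

/* Hypothetical helper modeled on __pa(): kernel virtual -> physical. */
uint32_t model_virt_to_phys(uint32_t virt)
{
    /* Addresses below the offset belong to user space, not the window. */
    assert(virt >= MODEL_PAGE_OFFSET);
    return virt - MODEL_PAGE_OFFSET;
}
```

For example, physical address 0x00100000 (1 MiB) would appear at kernel virtual address 0xC0100000 under this model. Because the translation is a constant offset, no page table walk is needed to convert between the two.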
Allocate swappable memory in the Linux kernel
You can create a file in the internal shm (shared memory) filesystem.
#include <linux/shmem_fs.h>    /* shmem_file_setup() */

const char *name = "example";
loff_t size = PAGE_SIZE;
unsigned long flags = 0;       /* may include VM_NORESERVE */
struct file *filp = shmem_file_setup(name, size, flags);
/* assert(!IS_ERR(filp)); */
The file isn't actually linked, so the name isn't visible. The flags may include VM_NORESERVE to skip accounting up front, instead accounting as pages are allocated. Now you have a shmem file. You can map a page like so:
struct address_space *mapping = filp->f_mapping;
pgoff_t index = 0;
struct page *p = shmem_read_mapping_page(mapping, index);
/* assert(!IS_ERR(p)); */
void *data = page_to_virt(p);
memset(data, 0, PAGE_SIZE);
There is also shmem_read_mapping_page_gfp(..., gfp_t) to specify how the page is allocated. Don't forget to put the page back when you're done with it.
put_page(p);
Ditto with the file.
fput(filp);
How does the kernel know which pages in the virtual address space correspond to a swapped-out physical page frame?
Linux:
When a page is written out to swap, its Page Table Entry is replaced with one that is marked as invalid (not present) and holds information about where the page is saved in swap. That is: an index into the swap_info array and an offset within the swap_map.
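The idea of storing a swap location inside a non-present entry can be sketched in user-space C. The bit layout below is an illustrative assumption (the real layout differs between architectures and kernel versions), and every `model_*` / `MODEL_*` name is hypothetical. It only shows how an index into swap_info and an offset into swap_map can share one 32-bit word while the present bit stays clear.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout (an assumption, not the layout of any specific kernel):
 *   bit 0      present flag, 0 for a swapped-out page
 *   bits 1-5   swap type: index into the swap_info array
 *   bits 8-31  offset: slot within that swap area's swap_map
 */
#define MODEL_PRESENT      0x1u
#define MODEL_TYPE_SHIFT   1
#define MODEL_TYPE_MASK    0x1fu
#define MODEL_OFFSET_SHIFT 8
#define MODEL_MAX_OFFSET   0x00ffffffu

/* Build a non-present entry that records where the page went. */
uint32_t model_make_swap_pte(uint32_t type, uint32_t offset)
{
    assert(type <= MODEL_TYPE_MASK);
    assert(offset <= MODEL_MAX_OFFSET);
    return (type << MODEL_TYPE_SHIFT) | (offset << MODEL_OFFSET_SHIFT);
}

/* Nonzero if the entry maps a resident page. */
int model_pte_present(uint32_t pte)
{
    return (pte & MODEL_PRESENT) != 0;
}

/* Recover the swap_info index from a non-present entry. */
uint32_t model_swap_type(uint32_t pte)
{
    return (pte >> MODEL_TYPE_SHIFT) & MODEL_TYPE_MASK;
}

/* Recover the swap_map offset from a non-present entry. */
uint32_t model_swap_offset(uint32_t pte)
{
    return pte >> MODEL_OFFSET_SHIFT;
}
```

On a page fault, a handler following this model would see that the present bit is clear, extract the type and offset, and use them to locate the saved page.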
Example from a (somewhat old) description of the Page Table Entry type (pte_t) on x86. Some of the bits are used as flags by the hardware:
Bit              Function
_PAGE_PRESENT    Page is resident in memory and not swapped out
_PAGE_PROTNONE   Page is resident but not accessible
_PAGE_RW         Set if the page may be written to
_PAGE_USER       Set if the page is accessible from user space
_PAGE_DIRTY      Set if the page is written to
_PAGE_ACCESSED   Set if the page is accessed
Table 3.1: Page Table Entry Protection and Status Bits
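As a rough illustration of the table, the sketch below decodes the hardware-defined x86 flag bits from a raw 32-bit entry. The bit positions used (present = bit 0, read/write = bit 1, user = bit 2, accessed = bit 5, dirty = bit 6) are the ones the x86 hardware defines. _PAGE_PROTNONE is left out because it is a software convention whose bit position depends on the kernel version. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86 hardware flag bits in a page table entry. */
#define MODEL_PAGE_PRESENT  0x001u  /* bit 0 */
#define MODEL_PAGE_RW       0x002u  /* bit 1 */
#define MODEL_PAGE_USER     0x004u  /* bit 2 */
#define MODEL_PAGE_ACCESSED 0x020u  /* bit 5 */
#define MODEL_PAGE_DIRTY    0x040u  /* bit 6 */

/* Hypothetical decoded view of an entry's flags. */
struct model_pte_flags {
    int present, writable, user, accessed, dirty;
};

struct model_pte_flags model_decode_pte(uint32_t pte)
{
    struct model_pte_flags f = {0, 0, 0, 0, 0};

    f.present = (pte & MODEL_PAGE_PRESENT) != 0;
    if (!f.present)
        return f;  /* the hardware ignores the other bits in this case */

    f.writable = (pte & MODEL_PAGE_RW) != 0;
    f.user     = (pte & MODEL_PAGE_USER) != 0;
    f.accessed = (pte & MODEL_PAGE_ACCESSED) != 0;
    f.dirty    = (pte & MODEL_PAGE_DIRTY) != 0;
    return f;
}

/* Usage: a present, user-accessible, read-only page that has been read. */
void model_decode_example(void)
{
    struct model_pte_flags f = model_decode_pte(0x025u);

    assert(f.present && f.user && f.accessed);
    assert(!f.writable && !f.dirty);
}
```

The early return for a non-present entry reflects that the remaining bits are free for the kernel to reuse, for example to hold a swap location.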
See also another SO answer with a diagram of the x86-64 page table format. When the low bit = 0, the hardware ignores all the other bits, so the kernel can use them for anything. Even in a "present" entry, there are some guaranteed-ignored bits that aren't reserved for future hardware use, so the kernel can use them for its own purposes.
Presumably other architectures are similar.
In simple terms: a process points to a page, and when the page is updated, every process pointing to it is in effect updated too. When the physical page is requested, it is swapped in, and it becomes available again to all of those processes. The point is that the Page Table Entry is not removed when memory is swapped out.
You might find some of this useful:
- Gustavo Duarte: How The Kernel Manages Your Memory.
- Mel Gorman's book (2007), included with the kernel documentation:
  - 11.2 Mapping Page Table Entries to Swap Entries
  - 3.2 Describing a Page Table Entry
- Red Hat on VM's Life of a page.