How to Tell Linux to Keep a Page and Not Evict It

How can I tell Linux to keep a page and not evict it?

I think you are looking for mlock.
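For example, something along these lines; a minimal sketch assuming a plain heap buffer (the size is arbitrary, and the call can fail if RLIMIT_MEMLOCK is too low):

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4 * 4096;              /* a few pages, arbitrary */
    void *buf = malloc(len);
    if (buf == NULL)
        return 1;

    /* Pin the pages backing buf into RAM so they are not evicted or swapped. */
    if (mlock(buf, len) != 0) {
        perror("mlock");                /* e.g. RLIMIT_MEMLOCK too low */
        return 1;
    }

    /* ... use buf ... */

    munlock(buf, len);                  /* release the lock when done */
    free(buf);
    return 0;
}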

How can I tell Windows to keep a page and not evict it?

Have a look at the VirtualLock function:

Locks the specified region of the process's virtual address space into physical memory, ensuring that subsequent access to the region will not incur a page fault.

There's an example on this page: Creating Guard Pages.
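A minimal sketch of the Windows side, assuming a small committed buffer (the size is arbitrary; VirtualLock can fail if the region exceeds the process's minimum working set size, see SetProcessWorkingSetSize):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T len = 4 * 4096;  /* a few pages, arbitrary */

    /* Reserve and commit a small region of private memory. */
    LPVOID buf = VirtualAlloc(NULL, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (buf == NULL)
        return 1;

    /* Lock the region into physical memory so accessing it never page-faults. */
    if (!VirtualLock(buf, len)) {
        printf("VirtualLock failed: %lu\n", GetLastError());
        return 1;
    }

    /* ... use buf ... */

    VirtualUnlock(buf, len);
    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}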

How to guarantee that the memory page cannot be evicted if the corresponding memory request is being processed in CPU?

Speaking from a Linux perspective, there are a few mechanisms that help make sure this scenario can be handled safely.

1) The way Linux handles page replacement policies. Linux uses several LRU lists for its page cache: there are separate active and inactive lists for file-backed pages and for anonymous pages. When a page is used, the CPU marks it as used through the PAGE_BIT_ACCESSED flag in the corresponding page table entry (pte). When this happens, Linux can put the page on the active list. After some time the page replacement handler tries to kick the page out, but the page must go through a few rounds and steps before being evicted: its accessed bit must be turned off, then it must be moved to the inactive list. Each time the handler executes one of these steps, it lets the page live and moves on through the rest of the list looking for pages to evict, perhaps until it reaches our page again. Since each of these steps involves several iterations through the LRU lists, presumably any CPU operations that made our page active in the first place will have finished by the time the handler comes back around to evict it. If, on the other hand, the CPU accesses the page again while it sits on the inactive list waiting to be evicted, its accessed bit is turned on and the page can be placed on the active list again. The key here is that Linux generally tries to keep a certain number of pages on the inactive list, so that when it is time to evict, only inactive pages are chosen. This is a way to prevent the situation you describe, where an active page is kicked out, from occurring.

2) Swapping. Suppose your scenario actually happened. The way you describe eviction makes it seem like evicting active memory would be a fatal error. In reality, when a page is evicted it is either a file-backed page that is written back to disk or an anonymous page that is written to swap space (assuming you have swap space). Thus, barring any power failures, memory won't be lost during eviction. Additionally, any cache entries that hold data corresponding to this page will be invalidated. So when the CPU tries to read or write the page, it will find the mapping invalid and raise a page fault. As with any other page fault, the page is brought back into memory and up through the cache/memory hierarchy. (A good explanation of the CPU's behavior during a page fault.)

3) Page locking. If you truly need to lock pages, Linux has ways to do that. From within the kernel there are page-locking functions (like lock_page()) that are used during I/O and page/page-table operations. There is also the mlock() family of functions that can be called from user space to lock specific memory into RAM. Notice that mlock doesn't guarantee that the memory stays at the same physical address, just that it stays in RAM (guaranteeing at worst a soft page fault).
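To see the effect from user space, mlock() can be combined with mincore(), which reports whether each page of a range is resident. A small sketch (the region size is arbitrary; mmap is used because mincore() wants a page-aligned address):

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 8 * page;

    /* mmap gives page-aligned memory, which mincore() requires. */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    if (mlock(buf, len) != 0) {
        perror("mlock");
        return 1;
    }

    /* One status byte per page; bit 0 set means the page is resident in RAM. */
    unsigned char vec[8];
    if (mincore(buf, len, vec) == 0) {
        for (int i = 0; i < 8; i++)
            printf("page %d: %s\n", i, (vec[i] & 1) ? "resident" : "not resident");
    }

    munlock(buf, len);
    munmap(buf, len);
    return 0;
}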

Are .text pages swapped-out?

All pages are, in the end, considered for being swapped out. In Linux this starts by freeing cache pages, followed by clean, not-recently-used pages (which just require unmapping rather than a write to the swap device). After this it will try to flush dirty file-backed pages to their respective backing devices before finally reaching the point where it must start swapping anonymously backed process pages (this includes the stack, writable data, the heap, etc.). Any non-kernel page is always a candidate for being swapped out; it just depends on the memory pressure on the system.

Pages that already have a backing store are simply unmapped, or, if they are dirty, flushed to their backing store. They are not written to swap, for obvious reasons.

How does the OS update the appropriate page table when it evicts a victim page?

Actually, the thing you are asking about is called reverse mapping. In Linux, for example, you can find the useful function try_to_unmap_anon. Inside the page descriptor there is a field called mapping; for anonymous pages this field points to an anon_vma. As you can see, this is not just an ordinary struct but also a list entry. There may be several vmas mapping one page (see try_to_unmap_anon):

list_for_each_entry(vma, &anon_vma->head, anon_vma_node)

exactly one per mapping of the page. All these vmas are linked into a list; that is how the kernel knows which processes (and which page tables) are involved.
As for how the kernel determines the virtual address, the answer can again be found in vma_address:

pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
unsigned long address;

address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);

So we can now answer your question briefly: to avoid scanning page tables, the kernel stores everything it needs for a quick lookup in the page descriptor (struct page).
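To make that arithmetic concrete, here is the same calculation as a standalone sketch; the field values are made up, and PAGE_CACHE_SHIFT is assumed to equal PAGE_SHIFT (so the first shift in the kernel code above is a no-op):

#include <stdio.h>

#define PAGE_SHIFT 12  /* assuming 4 KiB pages */

int main(void)
{
    /* Hypothetical values read out of a page descriptor and one of its vmas. */
    unsigned long page_index = 100;               /* page->index: the page's offset within its mapping */
    unsigned long vm_start   = 0x7f0000000000UL;  /* vma->vm_start */
    unsigned long vm_pgoff   = 64;                /* vma->vm_pgoff: mapping offset where the vma begins */

    /* Same formula as vma_address(): step from the vma start by the
       difference in page offsets, converted back to bytes. */
    unsigned long address = vm_start + ((page_index - vm_pgoff) << PAGE_SHIFT);

    printf("virtual address of the page in this vma: %#lx\n", address);  /* 0x7f0000024000 */
    return 0;
}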

Read file without evicting from OS page cache

Using posix_fadvise you can hint to the OS that it should drop certain file blocks from the cache. Together with information from mincore, which tells us which blocks are currently cached, we can alter applications to work without disturbing the buffer cache.

This delightful workaround for [un]implemented kernel features is described in detail:

http://insights.oetiker.ch/linux/fadvise/

[Edit] Implications of kernel read-ahead

For full read performance, you should make sure to only drop the pages you've already read. Otherwise you'll drop the pages that the kernel helpfully reads in advance :). (I think this should be detected as a readahead mis-predict, which would disable it and at least avoid lots of wasted IO. But read-ahead is seriously helpful, so you want to avoid disabling it).

Also, I bet if you test the pages just ahead of your last read then they always show as in-core. It won't tell you whether anyone else was using them or not. All it will show is that kernel read-ahead is working :).

The code in the linked rsync patch should be fine (ignoring the "array of all the fds" hack). It tests the whole file before the first read. That's reasonable because it only requires an in-core allocation of 1 byte per 4 kB file page.
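Putting this together, a rough sketch of the basic idea (it omits the mincore bookkeeping from the article and simply drops only the range already read, so kernel read-ahead stays useful; the 64 KiB chunk size is arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char buf[1 << 16];  /* 64 KiB chunks, arbitrary */
    off_t done = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* ... process buf ... */
        done += n;
        /* Tell the kernel we won't need the part we have already read,
           so it can drop those page-cache pages instead of evicting others. */
        posix_fadvise(fd, 0, done, POSIX_FADV_DONTNEED);
    }

    close(fd);
    return 0;
}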

Can I tell Linux not to swap out a particular process's memory?

You can do this via the mlockall(2) system call under Linux; this will work for the whole process, but do read about the argument you need to pass.
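A minimal sketch of the call (assuming the process has CAP_IPC_LOCK or a sufficiently high RLIMIT_MEMLOCK):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Lock everything currently mapped and everything mapped in the future. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }

    /* ... from here on the process's pages stay in RAM ... */

    munlockall();
    return 0;
}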

Do you really need to pull the whole thing in-core? If it's a java app, you would presumably lock the whole JVM in-core. I don't know of a command-line method for doing this, but you could write a trivial program to call fork, call mlockall, then exec.

You might also look to see if one of the access pattern notifications in madvise(2) meets your needs. Advising the VM subsystem about a better paging strategy might work out better if it's applicable for you.
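For example, if the application maps a file and reads it front to back, a sketch along these lines could apply (the advice flags used here are just one plausible combination):

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
        return 1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return 1;

    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        return 1;

    /* Describe the access pattern: sequential reads, and we would like the
       data faulted in ahead of time. */
    madvise(map, st.st_size, MADV_SEQUENTIAL);
    madvise(map, st.st_size, MADV_WILLNEED);

    /* ... read through map ... */

    munmap(map, st.st_size);
    close(fd);
    return 0;
}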

Note that a long time ago now under SunOS, there was a mechanism similar to madvise called vadvise(2).


