Mmap: Will the Mapped File Be Loaded into Memory Immediately

mmap: will the mapped file be loaded into memory immediately?

No, yes, maybe. It depends.

Calling mmap generally only means that to your application, the mapped file's contents are mapped to its address space as if the file was loaded there. Or, as if the file really existed in memory, as if they were one and the same (which includes changes being written back to disk, assuming you have write access).

No more, no less. It has no notion of loading something, nor does the application know what this means.

An application does not truly have knowledge of any such thing as memory, although the virtual memory system makes it appear like that. The memory that an application can "see" (and access) may or may not correspond to actual physical memory, and this can in principle change at any time, without prior warning, and without an obvious reason (obvious to your application).

Other than possibly experiencing a small delay due to a page fault, an application is (in principle) entirely unaware of any such thing happening and has little or no control over it1.

Applications will, generally, load pages from mapped files (including the main executable!) on demand, as a consequence of encountering a fault. However, an operating system will usually try to speculatively prefetch data to optimize performance.

In practice, calling mmap will immediately begin to (asynchronously) prefetch pages from the beginning of the mapping, up to a certain implementation-specified size. Which means, in principle, for small files the answer would be "yes", and for larger files it would be "no".

However, mmap does not block to wait for completion of the readahead, which means that you have no guarantee that any of the file is in RAM immediately after mmap returns (not that you have that guarantee at any time anyway!). Insofar, the answer is "maybe".

Under Linux, last time I looked, the default prefetch size was 31 blocks (~127k) -- but this may have changed, plus it's a tuneable parameter. As pages near or at the end of the prefetched area are touched, more pages are being prefetched asynchronously.

If you have hinted MADV_RANDOM to madvise, prefetching is "less likely to happen", under Linux this completely disables prefetch.

On the other hand, giving the MADV_SEQUENTIAL hint will asynchronously prefetch "more aggressively" beginning from the beginning of the mapping (and may discard accessed pages quicker). Under Linux, "more aggressively" means twice the normal amount.

Giving the MADV_WILLNEED hint suggests (but does not guarantee) that all pages in the given range are loaded as soon as possible (since you're saying you're going to access them). The OS may ignore this, but under Linux, it is treated rather as an order than a hint, up to the process' maximum RSS limit, and an implementation-specified limit (if I remember correctly, 1/2 the amount of physical RAM).

Note that MADV_DONTNEED is arguably implemented wrongly under Linux. The hint is not interpreted in the way specified by POSIX, i.e. you're OK with pages being paged out for the moment, but rather that you mean to discard them. Which makes no big difference for readonly mapped pages (other than a small delay, which you said would be OK), but it sure does matter for everything else.

In particular, using MADV_DONTNEED thinking Linux will release unneeded pages after the OS has written them lazily to disk is not how things work! You must explicitly sync, or prepare for a surprise.

Having called readahead on the file descriptor prior to calling mmap (or alternatively, having had read/written the file previously), the file's contents will in practice indeed be in RAM immediately.

This is, however, only an implementation detail (unified virtual memory system), and subject to memory pressure on the system.

Calling mlock will -- assuming it succeeds2 -- immediately load the requested pages into RAM. It blocks until all pages are physically present, and you have the guarantee that the pages will stay in RAM until you unlock them.



1 There exist functionality to query (mincore) whether any or all of the pages in a particular range are actually present at the very moment, and functionality to hint the OS about what you would like to see happening without any hard guarantees (madvise), and finally functionality to force a limited subset of pages to be present in memory (mlock) for privilegued processes.

2 It might not, both for lack of privilegues and for exceeding quotas or the amount of physical RAM present.

Why does the memory mapped file ever need to be flushed when access is RDWR?

The program maps a region of memory using mmap. It then modifies the mapped region. The system isn't required to write those modifications back to the underlying file immediately, so a read call on that file (in ioutil.ReadAll) could return the prior contents of the file.

The system will write the changes to the file at some point after you make the changes.
It is allowed to write the changes to the file any time after the changes are made, but by default makes no guarantees about when it writes those changes. All you know is that (unless the system crashes), the changes will be written at some point in the future.

If you need to guarantee that the changes have been written to the file at some point in time, then you must call msync.

The mmap.Flush function calls msync with the MS_SYNC flag. When that system call returns, the system has written the modifications to the underlying file, so that any subsequent call to read will read the modified file.

The COPY option sets the mapping to MAP_PRIVATE, so your changes will never be written back to the file, even if you using msync (via the Flush function).

Read the POSIX documentation about mmap and msync for full details.

Memory mapping and file I/O

It depends on the platform, but in general it'll be treated like other memory (swapped out when not in use, swapped in when required), except that instead of using the normal swap files/partitions it swaps from the original file on disk.

mmap and memory usage

Mmap can help you in some ways, I'll explain with some hypothetical examples:

First thing: Let's say you're running out of memory, and your application that have a 100MB chunk of malloc'ed memory get 50% of it swapped out, that means that the OS had to write 50MB to the swapfile, and if you need to read it back, you have written, occupied and then read it back again 50MB of your swapfile.

In case the memory was just mmap'ed, the operating system will not write that piece of information to the swapfile (as it knows that that data is identical to the file itself), instead, it will just scratch 50MB of information (again: supposing you have not written anything for now) and that's that. If you ever need that memory to be read again, the OS will fetch the contents not from the swapfile, but from the original file you've mmaped, so if any other program needs 50MB of swap, they're available. Also there is not overhead with swapfile manipulation at all.

Let's say you read a 100MB chunk of data, and according to the initial 1MB of header data, the information that you want is located at offset 75MB, so you don't need anything between 1~74.9MB! You have read it for nothing but to make your code simpler. With mmap, you will only read the data you have actually accessed (rounded 4kb, or the OS page size, which is mostly 4kb), so it would only read the first and the 75th MB. I think it's very hard to make a simpler and more effective way to avoid disk reading than mmaping files.
And if by some reason you need the data at offset 37MB, you can just use it! You don't have to mmap it again, as the whole file is accessible in memory (of course limited by your process' memory space).

All files mmap'ed are backed up by themselves, not by the swapfile, the swapfile is made to grant data that doesn't have a file to back up, which usually is data malloc'ed or data that is backed up by a file, but it was altered and [can not/shall not] be written back to it before the program actually tells the OS to do so via a msync call.

Beware that you don't need to map the whole file in the memory, you can map any amount (2nd arg is "size_t length") starting from any place (6th arg - "off_t offset"), but unless your file is likely to be enormous, you can safely map 1GB of data with no fear, even if the system only packs 64mb of physical memory, but that's for reading, if you plan on writing then you should be more conservative and map only the stuff that you need.

Mapping files will help you making your code simpler (you already have the file contents on the memory, ready to use, with much less memory overhead since it's not anonymous memory) and faster (you will only read the data that your program accessed).

mmap a 10 GB file and load it into memory

Read the man-page for mmap:

MAP_POPULATE (since Linux 2.5.46)


Populate (prefault) page tables for a mapping. For a file
mapping, this causes read-ahead on the file. Later accesses
to the mapping will not be blocked by page faults.
MAP_POPULATE is supported for private mappings only since
Linux 2.6.23

Issue your request, and be prepared for a short wait (unless you exceed the processes limits) (depending on disk-bandwidth and cache).

How does mmap work when 2 programs map the same file


So, what happens when 2 programs map the same file? Are there 2 entries in the page table, one for each program?

In modern operating systems, each process has its own page table for its memory, that may point to pages of physical memory shared with other user and kernel processes.

With MAP_SHARED, this mapping is shared: updates to the mapping are visible to other processes that map this file, and are carried through to the underlying file. The file may not actually be updated until msync(2) or munmap() is called.

This seems very interesting, but there are numerous caveats:

  • The actual pages mmapped by both processes for the same file may reside at the same address or at a different address in each process, storing pointers into this shared memory may not allow the other process to use them as they might point to inconsistent addresses.

  • The implementation may use the same physical memory pages for both mappings or not: for subtile reasons (cache strategies, out of sync reading...), even if it is the same physical memory, modifications done by one process to its memory may not be immediately reflected in the memory of the other process.

So the modification may or may not be visible to the other processes mmapping the file nor reading it via read or the FILE* stream API.

If one of the processes calls msync(), the modifications should be visible in all maps and for all yet unread portions of the file, bearing in mind that the FILE* streaming APIs may have buffered some data in internal unshared buffers: modifications in this area will not be reflected.

Conclusion: it is risky and unreliable to use these mechanisms to implement inter process communication. The behavior may depend on system specific characteristics such as the OS strategies, the CPU and cache architectures, the type of RAM in use, the clock speed, and who knows what else. It is safer to rely on proven APIs that may indeed be implemented using mmapped memory, but only if it is know to provide the correct semantics.



Related Topics



Leave a reply



Submit