Increasing a File's Size Using Mmap

Increasing a file's size using mmap

On POSIX systems at least, mmap() cannot be used to increase (or decrease) the size of a file. mmap()'s function is to memory map a portion of a file. It's logical that the thing you request to map should actually exist! Frankly, I'm really surprised that you would actually be able to do this under MS Windows.

If you want to grow a file, just ftruncate() it before you mmap() it.
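A minimal sketch of that order of operations, assuming a POSIX system. The helper name map_grown_file and its parameters are made up for illustration, and error handling is reduced to returning NULL:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set the size of the file at `path` to `new_size`
 * bytes (creating it if needed), then map all of it read/write.
 * Returns NULL on failure. */
void *map_grown_file(const char *path, size_t new_size)
{
    assert(new_size > 0);               /* mmap() rejects a zero length */

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return NULL;

    /* Resize first; bytes added by ftruncate() read back as zero. */
    if (ftruncate(fd, (off_t)new_size) == -1) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}
```

The caller is expected to munmap() the returned pointer with the same length when done.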

Fast resize of a mmap file

It is hard for me to imagine a case where you don't know the upper bound on how large the file can be. Assuming that's true, you could "reserve" the address space for the maximum size of the file by providing that size when the file is first mapped in with mmap(). Of course, any accesses beyond the actual size of the file will cause an access violation (SIGBUS on POSIX systems), but that's how you want it to work anyway -- you could argue that reserving the extra address space ensures the access violation rather than leaving that address range open to being used by other calls to things like mmap() or malloc().

Anyway, the point is with my solution, you never move the address range, you only change its size and now your locking is around the data structure that provides the current valid size to each thread.

My solution doesn't work if you have so many files that the maximum mapping for each file runs you out of address space, but this is the age of the 64-bit address space so hopefully your maximum mapping size is no problem.

(Just to make sure I wasn't forgetting something stupid, I did write a small program to convince myself creating the larger-than-file-size mapping gives an access violation when you try to access beyond the file size, and then works fine once you ftruncate() the file to be larger, all with the same address returned from the first mmap() call.)
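Here is a rough sketch of that scheme, not the small program mentioned above. The struct and function names (growable_map, growable_map_init, growable_map_grow) are hypothetical, and it assumes a system such as Linux that accepts a mapping longer than the file:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one growable mapping. */
struct growable_map {
    int    fd;        /* open file backing the mapping                     */
    char  *base;      /* address returned by the one and only mmap() call  */
    size_t reserved;  /* bytes of address space reserved (the upper bound) */
    size_t valid;     /* current file size; only [0, valid) may be touched */
};

/* Open `path`, size it to `initial` bytes, and map `reserved` bytes of it. */
int growable_map_init(struct growable_map *m, const char *path,
                      size_t initial, size_t reserved)
{
    assert(reserved > 0 && initial <= reserved);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)initial) == -1) {
        close(fd);
        return -1;
    }

    /* The length is the maximum size, not the current file size. */
    void *p = mmap(NULL, reserved, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    m->fd = fd;
    m->base = p;
    m->reserved = reserved;
    m->valid = initial;
    return 0;
}

/* Grow the file. The mapping itself is untouched, so m->base never moves. */
int growable_map_grow(struct growable_map *m, size_t new_size)
{
    if (new_size < m->valid || new_size > m->reserved)
        return -1;
    if (ftruncate(m->fd, (off_t)new_size) == -1)
        return -1;
    m->valid = new_size;   /* in threaded code, publish this under your lock */
    return 0;
}
```

Only the valid field changes on growth, which is why the locking can be limited to whatever publishes that value to other threads.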

Can I mmap a file with length greater than the size of the file?

Can I pass a length greater than the size of file fd to mmap? After doing so, can I write and read the memory that exceeds the size of the file but is within length?

This is all documented in the mmap POSIX specification:

The system shall always zero-fill any partial page at the end of an
object. Further, the system shall never write out any modified
portions of the last page of an object which are beyond its end.
References within the address range starting at pa and continuing for
len bytes to whole pages following the end of an object shall result
in delivery of a SIGBUS signal.

  1. Yes, you can mmap length that is greater than the size of file, and
  2. Access to any pages beyond the end of the file, except the last (possibly partial) page will result in SIGBUS.
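To see both rules in action, here is a small sketch that maps two pages of a one-byte file and probes individual bytes. The helper names probe_byte and map_two_pages_of_tiny_file are invented for this example, and catching SIGBUS with siglongjmp() is only a demonstration trick, not something to build production code on:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;

static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* jump back into probe_byte() */
}

/* Read one byte at `p`. Returns its value (0..255), or -1 if the read
 * raised SIGBUS. */
int probe_byte(const volatile unsigned char *p)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int result;
    if (sigsetjmp(probe_env, 1) == 0)
        result = *p;            /* may fault */
    else
        result = -1;            /* we arrived here from the signal handler */

    sigaction(SIGBUS, &old, NULL);
    return result;
}

/* Create a 1-byte file at `path` and map two pages of it read-only. */
unsigned char *map_two_pages_of_tiny_file(const char *path, size_t page)
{
    assert(page > 0);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return NULL;
    if (write(fd, "A", 1) != 1) {
        close(fd);
        return NULL;
    }
    /* len (two pages) is deliberately larger than the file (one byte). */
    void *p = mmap(NULL, 2 * page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}
```

With the page size taken from sysconf(_SC_PAGESIZE), probing offset 0 should return 'A', any other offset in the first page should return 0 (the zero-filled tail of the partial page), and any offset in the second page should return -1.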

mmap file with larger fixed length with zero padding?

This is one of the few reasonable use cases for MAP_FIXED, to remap part of an existing mapping to use a new backing file.

A simple solution here is to unconditionally mmap 64 MB of anonymous memory (or explicitly mmap /dev/zero), without MAP_FIXED, and store the resulting pointer.

Next, mmap 64 MB or your actual file size (whichever is less) of your actual file, passing in the result of the anonymous/zero mmap and passing the MAP_FIXED flag. The pages corresponding to your file will no longer be anonymous/zero mapped, and instead will be backed by your file's data; the remaining pages will be backed by the anonymous/zero pages.

When you're done, a single munmap call will unmap all 64 MB at once (you don't need to separately unmap the real file pages and the zero backed pages).

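Extremely simple example (error handling is only stubbed out, please fill it in yourself; needs <fcntl.h>, <sys/mman.h>, <sys/stat.h> and <unistd.h>):

// Reserve 64 MB of contiguous addresses; anonymous mappings are always zero backed
const size_t maplen = 64 * 1024 * 1024;
void *mapping = mmap(NULL, maplen, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (mapping == MAP_FAILED) { /* handle error */ }

// Open file and check size
struct stat sb;
int fd = open(myfilename, O_RDONLY);
if (fd == -1 || fstat(fd, &sb) == -1) { /* handle error */ }
// Use smaller of file size or 64 MB
size_t filemapsize = (size_t)sb.st_size > maplen ? maplen : (size_t)sb.st_size;
// Remap up to 64 MB of pages, replacing some or all of the original anonymous pages.
// mmap() fails with EINVAL for a zero length, so an empty file keeps all 64 MB zero backed.
// The return value is only checked, not stored, so a failure cannot clobber `mapping`.
if (filemapsize > 0 &&
    mmap(mapping, filemapsize, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) { /* handle error */ }
close(fd);

// ... do stuff with mapping ...
munmap(mapping, maplen);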

Why does mmap of a file result in using more memory than the file's size?

One thing is certain: the results depend on the state of the system and not only on the running application. On my machine, the increase in RES was 136 kB the first two times I ran the program, but the subsequent runs didn't involve any increase at all - probably the OS already had the whole file in cache. Interestingly, the values themselves differed significantly between runs. In the first run the jump in RES was from 344 to 480 kB, but the later runs had a RES value of 348 kB all the time. There was a similar change in SHR: a jump of 136 kB the first time and no change later.

I was able to force the original case (with the 136 kB jump) at will by overwriting the file which is later mapped with zeros using dd before running the app.

I looked at pmap's output but it was exactly the same in both cases and didn't change after the call to mmap().

I can't reproduce the oversized RES jump here, but here's what you can do. Suppose your binary is compiled as a.out. Insert a 10 second sleep right after the mmap() and another 10 second sleep just before munmap(). This gives a time window to dump interesting information. We will read from /proc exactly which files are resident in memory. In order to do this, open up two tabs in your terminal, in one run

./a.out

and then immediately in the other tab:

pid=$(pgrep -n -x a.out); for ((i=0;i<4;i++)); do cat /proc/$pid/smaps > smaps.$i; sleep 5; done

This will create 4 snapshots of the program's map states in four separate files. The difference between two consecutive snapshots should show what changes during the surge in RES size. On my machine during a sample run, the difference was between snapshots 1 and 2, and the change was [note I changed the name of mapped file but it's not important here]:

user@machine:~$ diff -u smaps.{1,2}
--- smaps.1 2012-04-19 00:01:46.000000000 +0200
+++ smaps.2 2012-04-19 00:01:51.000000000 +0200
@@ -84,13 +84,13 @@
MMUPageSize: 4 kB
b782f000-b7851000 r--p 00000000 08:05 429102 /tmp/tempfile
Size: 136 kB
-Rss: 0 kB
-Pss: 0 kB
+Rss: 136 kB
+Pss: 136 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
-Private_Clean: 0 kB
+Private_Clean: 136 kB
Private_Dirty: 0 kB
-Referenced: 0 kB
+Referenced: 136 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB

What happens is exactly what should happen: the mapped file is initially not resident at all and 136 kB are resident later on.

On your system, the diff should lead you to the source of the additional change in RES - you should be able to find out the name of the other file(s) whose Rss value changes. Some entries are not files, but other memory areas, for example you may find markers such as [heap] and [stack]. This should also prove or disprove nos' suggestion about system libraries being loaded and stack usage growing.
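If you would rather take the snapshot from inside the program than from a second terminal, something along these lines can pull a single Rss figure out of /proc/self/smaps. This is only a sketch: rss_kb_of is a made-up name, it is Linux-specific, and it assumes the smaps layout shown in the diff above (a header line starting with the address range, followed by "Rss: N kB"):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Return the Rss value in kB that /proc/self/smaps reports for the mapping
 * containing `addr`, or -1 if it cannot be determined. */
long rss_kb_of(const void *addr)
{
    assert(addr != NULL);

    FILE *f = fopen("/proc/self/smaps", "r");
    if (!f)
        return -1;

    uintptr_t target = (uintptr_t)addr;
    char line[512];
    int in_target = 0;      /* are we inside the entry that contains addr? */
    long rss = -1;

    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        long kb;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            /* Header line of an entry, e.g. "b782f000-b7851000 r--p ..." */
            in_target = (target >= start && target < end);
        } else if (in_target && sscanf(line, "Rss: %ld kB", &kb) == 1) {
            rss = kb;
            break;
        }
    }
    fclose(f);
    return rss;
}
```

Calling it with the pointer returned by mmap() once right after mapping and once after touching the pages should show the same 0 kB to 136 kB style jump as the diff.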

How to increase the size of memory region allocated with mmap()

On Linux, use the Linux-specific mremap(2) system call without MREMAP_MAYMOVE to extend the existing mapping in place. Without that flag, the kernel will not consider remapping those physical pages to a different virtual address where there's enough room for the larger mapping.

It will return an error if some other mapping already exists for the pages you want to grow into. (Unlike mmap(MAP_FIXED) which will silently replace those mappings.)
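From C, that call might look like the sketch below. The wrapper name grow_in_place is invented here; mremap() itself needs _GNU_SOURCE with glibc:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Try to grow the mapping at `addr` from old_len to new_len bytes without
 * moving it. Returns 0 on success, or -1 with errno set by mremap()
 * (ENOMEM when the pages just past the mapping are already in use). */
int grow_in_place(void *addr, size_t old_len, size_t new_len)
{
    assert(new_len >= old_len);

    /* flags == 0: MREMAP_MAYMOVE is deliberately not set. */
    void *p = mremap(addr, old_len, new_len, 0);
    if (p == MAP_FAILED)
        return -1;

    assert(p == addr);      /* without MREMAP_MAYMOVE the address cannot change */
    return 0;
}
```

On failure the original mapping is left as it was, so the caller can fall back to MREMAP_MAYMOVE or to allocating elsewhere.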

If you're writing in asm, portability to non-Linux is barely relevant; other OSes will have different call numbers and maybe ABIs, so just look up __NR_mremap in asm/unistd.h, and get the flags patterns from sys/mman.h.


With just portable POSIX calls, mmap() with a non-NULL hint address = right after your existing mapping, but without MAP_FIXED; it will pick that address if the pages are free (and as @datenwolf says, merge with the earlier mapping into one long extent). Otherwise it will pick somewhere else. (Then you have to munmap that mapping that ended up not where you wanted it.)

There is a Linux-specific mmap option: MAP_FIXED_NOREPLACE will return an error instead of mapping at an address different from the hint. Kernels older than 4.17 don't know about that flag and will typically treat the address as a plain hint, as if the flag weren't there, so you should still check the return value against the hint.

Do not use MAP_FIXED_NOREPLACE | MAP_FIXED; that would act as MAP_FIXED on old kernels, and maybe also on new kernels that do know about MAP_FIXED_NOREPLACE.
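Both variants (plain hint and MAP_FIXED_NOREPLACE) end with the same check of the returned address, so they can share one sketch. The function name extend_at is hypothetical, the mapping is assumed to be anonymous and private, and the fallback #define only covers headers that predate the flag:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0   /* old headers: degrade to a plain hint */
#endif

/* Try to map `len` bytes of anonymous memory exactly at `end`, the first
 * address past an existing mapping. Never replaces an existing mapping.
 * Returns 0 on success, -1 otherwise. */
int extend_at(void *end, size_t len)
{
    assert(end != NULL && len > 0);

    void *p = mmap(end, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    if (p == MAP_FAILED)
        return -1;          /* newer kernels: the range is occupied (EEXIST) */

    if (p != end) {
        /* Old kernel or plain hint: we got some other address, give it back. */
        munmap(p, len);
        return -1;
    }
    return 0;
}
```

Comparing the result against the hint is what keeps this safe on kernels that silently ignore the flag.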

Assuming you know the start of the mapping you want to extend, and the desired new total size, mremap is a better choice than mmap(MAP_FIXED_NOREPLACE). It's been supported since at least Linux 2.4, i.e. decades, and keeps the existing mapping flags and permissions automatically (e.g. MAP_PRIVATE, PROT_READ|PROT_WRITE).

If you only knew the end address of the existing mapping, mmap(MAP_FIXED_NOREPLACE) might be a good choice.

Increase the size of the memory mapped file

The documentation says two things.

Firstly (in the "Remarks" section),

If an application specifies a size for the file mapping object that is larger than the size of the actual named file on disk and if the page protection allows write access (that is, the flProtect parameter specifies PAGE_READWRITE or PAGE_EXECUTE_READWRITE), then the file on disk is increased to match the specified size of the file mapping object. If the file is extended, the contents of the file between the old end of the file and the new end of the file are not guaranteed to be zero; the behavior is defined by the file system.

This basically means that your file on disk gets resized when you call CreateFileMapping() with a size larger than the file, and that the newly added part of the file holds unspecified contents.

Secondly (in the "Return Value" section),

If the object exists before the function call, the function returns a handle to the existing object (with its current size, not the specified size), and GetLastError returns ERROR_ALREADY_EXISTS.

To me, this means your call to resize_file() will have no effect if your file is already mapped. You have to unmap it, call resize_file(), and then remap it, which may or may not be what you want.

How do you Mmap() a file bigger than 2GB in Go?

Look in http://golang.org/src/pkg/syscall/syscall_unix.go at the Mmap method on mmapper. You should be able to copy that code and adapt it as required.

Where int is 32 bits (every platform before Go 1.1, and 32-bit platforms since), you won't be able to mmap to a []byte of that size, since slice lengths are defined to be "int". There you could mmap to a larger element type (e.g. []int32), or just muck with the pointer to the memory, but it won't be a drop-in replacement for syscall.Mmap. Since Go 1.1, int is 64 bits on 64-bit platforms, so a []byte can exceed 2 GB there.


