How to Provide Extend-On-Write Functionality for Memory Mapped Files in Linux

How to provide extend-on-write functionality for memory mapped files in Linux?

This is very similar to a homework I once did. Basically I had a list of "pages" and a list of "frames", with associated information. Using SIGSEGV I would catch faults and alter the memory protection bits as necessary. I'll include parts that you may find useful.

Create mapping. Initially it has no permissions.

int w_create_mapping(size_t size, void **addr)
{

    *addr = mmap(NULL,
            size * w_get_page_size(),
            PROT_NONE,
            MAP_ANONYMOUS | MAP_PRIVATE,
            -1,
            0
    );

    if (*addr == MAP_FAILED) {
        perror("mmap");
        return FALSE;
    }

    return TRUE;
}

Install signal handler

int w_set_exception_handler(w_exception_handler_t handler)
{
    static struct sigaction sa;
    sa.sa_sigaction = handler;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGSEGV);
    sa.sa_flags = SA_SIGINFO;

    if (sigaction(SIGSEGV, &sa, &previous_action) < 0)
        return FALSE;

    return TRUE;
}

Exception handler

static void fault_handler(int signum, siginfo_t *info, void *context)
{
    void *address;      /* the address that faulted */

    /* Memory location which caused fault */
    address = info->si_addr;

    if (FALSE == page_fault(address)) {
        _exit(1);
    }
}

Increasing protection

int w_protect_mapping(void *addr, size_t num_pages, w_prot_t protection)
{
    int prot;

    switch (protection) {
    case PROTECTION_NONE:
        prot = PROT_NONE;
        break;
    case PROTECTION_READ:
        prot = PROT_READ;
        break;
    case PROTECTION_WRITE:
        prot = PROT_READ | PROT_WRITE;
        break;
    }

    if (mprotect(addr, num_pages * w_get_page_size(), prot) < 0)
        return FALSE;

    return TRUE;
}

I can't publicly make it all available since the team is likely to use that same homework again.

memory map file with growing size

After some experimentations, I found a way to make it work.

First mmap the file with PROT_NONE and a large enough size. For 64-bit systems, it can be as large 1L << 46 (64TB). This does NOT consume physical memory* (at least on Linux). It will consume address space (virtual memory) for this process.

void* ptr = mmap(NULL, (1L << 40), PROT_NONE, MAP_SHARED, fd, 0);

Then, give read (and/or write) permission to the part of memory within file length using mprotect. Note that size need to be aligned with page size (which can be obtained by sysconf(_SC_PAGESIZE), usually 4096).

mprotect(ptr, aligned_size, PROT_READ | PROT_WRITE);

However, if file size is not page-size aligned, reading the portion within mapped region (with PROT_READ permission) but beyond file length will trigger a bus error, as documented on mmap manual.

Then you can use either file descriptor fd or the mapped memory to read and write file. Remember to use fsync or msync to persist the data after writing to it. The memory-mapped page with PROT_READ permission should get the latest file content (if you write to it)**. The newly mapped page with mprotect will also get the newly updated page.

Depending on the application, you might want to use ftruncate to make the file size aligned to system page size for the best performance. You might also want to use madvise with MADV_SEQUENTIAL to improve performance when reading those pages.

*This behavior is not mentioned on the manual of mmap. However, since PROT_NONE implies those pages are not accessible in anyway, it's trivial for any OS implementation to not allocating any physical memory to it at all.

**This behavior of memory region mapped before a file write getting updated after the write is completed (fsync or msync) is also not mentioned on the manual (or at least I did not see it). But it seems to be the case at least on recent Linux kernels (4.x onward).

How to dynamically expand a Memory Mapped File

Once you map a file in memory, you cannot increase its size. This is a known limitation of memory mapped files.

...you must calculate or estimate the size of the finished file because file mapping objects are static in size; once created, their size cannot be increased or decreased.

One strategy would be to use chunks stored in non-persisted memory mapped files of a given size, say 1GB or 2GB. You would manage these through a top level ViewAccessor of your own design (probably doing basic passthru of the methods you need from the MemoryMappedViewAccessor).

Edit: or you could just create a non-persisted memory mapped file of a maximal size you expect to use (say 8GB to start, with a parameter to tune it on start-up of your application) and retrieve MemoryMappedViewAccessor's per logical chunk. The non-persisted file will not use physical resources until each view is requested.

memory mapped files

If you want your changes to be reflected in the on-disk file, you must map the file as MAP_SHARED, not MAP_PRIVATE.

Additionally, you cannot extend the file simply by writing beyond the end of the mapping. You must use ftruncate() to extend the file to the new size, then change the mapping to include the new portion of the file. The portable way to change the mapping is to unmap the mapping then recreate it with the new size; on Linux you can instead use mremap().

Your len and len_file variables should be of type size_t, and you should use memcpy() rather than strcat(), since you know exactly the length of the string, exactly where you want to copy it, and you don't want to copy the null-terminator.

The following modification of your code works on Linux (using mremap()) :

#define _GNU_SOURCE
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include<sys/mman.h>
#include<fcntl.h>
#define FILEMODE S_IRWXU | S_IRGRP | S_IROTH
#define MAX 150

int main(int argc,char *argv[])
{
    int fd, ret;
    size_t len_file, len;
    struct stat st;
    char *addr;
    char buf[MAX];

    if (argc < 2)
    {
        printf("Usage a.out <filename>\n");
        return EXIT_FAILURE;
    }

    if ((fd = open(argv[1],O_RDWR | O_CREAT, FILEMODE)) < 0)
    {
        perror("Error in file opening");
        return EXIT_FAILURE;
    }

    if ((ret = fstat(fd,&st)) < 0)
    {
        perror("Error in fstat");
        return EXIT_FAILURE;
    }

    len_file = st.st_size;

    /*len_file having the total length of the file(fd).*/

    if ((addr = mmap(NULL,len_file,PROT_READ|PROT_WRITE,MAP_SHARED,fd,0)) == MAP_FAILED)
    {
        perror("Error in mmap");
        return EXIT_FAILURE;
    }

    while ((fgets(buf,MAX,stdin)) != NULL)
    {
        len = len_file;
        len_file += strlen(buf);
        if (ftruncate(fd, len_file) != 0)
        {
            perror("Error extending file");
            return EXIT_FAILURE;
        }
        if ((addr = mremap(addr, len, len_file, MREMAP_MAYMOVE)) == MAP_FAILED)
        {
            perror("Error extending mapping");
            return EXIT_FAILURE;
        }
        memcpy(addr+len, buf, len_file - len);
        printf( "Val:%s\n",addr ) ; //Checking purpose
    }
    if((msync(addr,len,MS_SYNC)) < 0)
        perror("Error in msync");

    if (munmap(addr,len) == -1)
        perror("Error in munmap");

    if (close(fd))
        perror("Error in close");

    return 0;
}

appending to a memory-mapped file

Boost.IOStreams has fixed-size only memory mapped files, so it won't help with your specific problem. Linux has an interface mremap which works as follows:

void *new_mapping = mremap(mapping, size, size + GROWTH, MREMAP_MAYMOVE);
if (new_mapping == MAP_FAILED)
    // handle error
mapping = new_mapping;

This is non-portable, however (and poorly documented). Mac OS X seems not to have mremap.

In any case, you don't need to reopen the file, just munmap it and mmap it again:

void *append(int fd, char const *data, size_t nbytes, void *map, size_t &len)
{
    // TODO: check for errors here!
    ssize_t written = write(fd, data, nbytes);
    munmap(map, len);
    len += written;
    return mmap(NULL, len, PROT_READ, 0, fd, 0);
}

A pre-allocation scheme may be very useful here. Be sure to keep track of the file's actual length and truncate it once more before closing.

Is boost memory mapped file zeroed on Linux

A memory mapped file contains whatever was in the file.

If it is a new file, it has been extended to the right size and the extension will contain zeros. Extending a file is usually done with the ftruncate function.

The ftruncate manpage says:

   If  the  file  previously  was larger than this size, the extra data is
   lost.  If the file previously was shorter,  it  is  extended,  and  the
   extended part reads as null bytes ('\0').

So yes, zeros are guaranteed.

Increasing a file's size using mmap

On POSIX systems at least, mmap() cannot be used to increase (or decrease) the size of a file. mmap()'s function is to memory map a portion of a file. It's logical that the thing you request to map should actually exist! Frankly, I'm really surprised that you would actually be able to do this under MS Windows.

If you want to grow a file, just ftruncate() it before you mmap() it.

mmap for writing sequential log file for speed?

I wrote my bachelor thesis about the comparism of fwrite VS mmap ("An Experiment to Measure the Performance Trade-off between Traditional I/O and Memory-mapped Files"). First of all, for writing, you don't have to go for memory-mapped files, espacially for large files. fwrite is totally fine and will nearly always outperform approaches using mmap. mmap will give you the most performance boosts for parallel data reading; for sequential data writing your real limitation with fwrite is your hardware.

In my examples remapSize is the initial size of the file and the size by which the file gets increased on each remapping.
fileSize keeps track of the size of the file, mappedSpace represents the size of the current mmap (it's length), alreadyWrittenBytes are the bytes that have already been written to the file.

Here is the example initalization:

void init() {
  fileDescriptor = open(outputPath, O_RDWR | O_CREAT | O_TRUNC, (mode_t) 0600); // Open file
  result = ftruncate(fileDescriptor, remapSize); // Init size
  fsync(fileDescriptor); // Flush
  memoryMappedFile = (char*) mmap64(0, remapSize, PROT_WRITE, MAP_SHARED, fileDescriptor, 0); // Create mmap
  fileSize = remapSize; // Store mapped size
  mappedSpace = remapSize; // Store mapped size
}

Ad Q1:

I used an "Unmap-Remap"-mechanism.

Unmap

first flushes (msync)
and then unmaps the memory-mapped file.

This could look the following:

void unmap() {
  msync(memoryMappedFile, mappedSpace, MS_SYNC); // Flush
  munmap(memoryMappedFile, mappedSpace)
}

For Remap, you have the choice to remap the whole file or only the newly appended part.

Remap basically

increases the file size
creates the new memory map

Example implementation for a full remap:

void fullRemap() {
  ftruncate(fileDescriptor, mappedSpace + remapSize); // Make file bigger
  fsync(fileDescriptor); // Flush file
  memoryMappedFile = (char*) mmap64(0, mappedSpace + remapSize, PROT_WRITE, MAP_SHARED, fileDescriptor, 0); // Create new mapping on the bigger file
  fileSize += reampSize;
  mappedSpace += remapSize; // Set mappedSpace to new size
}

Example implementation for the small remap:

void smallRemap() {
  ftruncate(fileDescriptor, fileSize + remapSize); // Make file bigger
  fsync(fileDescriptor); // Flush file
  remapAt = alreadyWrittenBytes % pageSize == 0 
            ? alreadyWrittenBytes 
            : alreadyWrittenBytes - (alreadyWrittenBytes % pageSize); // Adjust remap location to pagesize
  memoryMappedFile = (char*) mmap64(0, fileSize + remapSize - remapAt, PROT_WRITE, MAP_SHARED, fileDescriptor, remapAt); // Create memory-map
  fileSize += remapSize;
  mappedSpace = fileSize - remapAt;
}

There is a mremap function out there, yet it states

This call is Linux-specific, and should not be used in programs
intended to be portable.

Ad Q2:

I'm not sure if I understood that point right. If you want to tell the kernel "and now load the next page", then no, this is not possible (at least to my knowledge). But see Ad Q3 on how to advise the kernel.

Ad Q3:

You can use madvise with the flag MADV_SEQUENTIAL, yet keep in mind that this does not enforce the kernel to read ahead, but only advices it.

Excerp form the man:

This may cause the kernel to aggressively read-ahead

Personal conclusion:

Do not use mmap for sequential data writing. It will just cause much more overhead and will lead to much more "unnatural" code than a simple writing alogrithm using fwrite.

Use mmap for random access reads to large files.

This are also the results that were obtained during my thesis. I was not able to achieve any speedup by using mmap for sequential writing, in fact, it was always slower for this purpose.

Mapping file into memory and writing beyong end of file

The definitive reference in these matters is POSIX, which in its rationale section for mmap has to say:

The mmap() function can be used to map a region of memory that is
larger than the current size of the object. [... snip discussion on
sending SIGBUS if possible, i.e. when accessing a page beyond the end
of the file ...] written data may be lost and read data may not
reflect actual data in the object.

So, POSIX says that doing this can result in lost data. Also, the portability is questionable at best (think about no-MMU systems, the interaction with hugepages, platforms with different pagesizes...)

How to Provide Extend-On-Write Functionality for Memory Mapped Files in Linux