How to provide extend-on-write functionality for memory mapped files in Linux?
This is very similar to a homework I once did. Basically I had a list of "pages" and a list of "frames", with associated information. Using SIGSEGV, I would catch faults and alter the memory protection bits as necessary. I'll include parts that you may find useful.
Create mapping. Initially it has no permissions.
int w_create_mapping(size_t size, void **addr)
{
*addr = mmap(NULL,
size * w_get_page_size(),
PROT_NONE,
MAP_ANONYMOUS | MAP_PRIVATE,
-1,
0
);
if (*addr == MAP_FAILED) {
perror("mmap");
return FALSE;
}
return TRUE;
}
Install signal handler
int w_set_exception_handler(w_exception_handler_t handler)
{
static struct sigaction sa;
sa.sa_sigaction = handler;
sigemptyset(&sa.sa_mask);
sigaddset(&sa.sa_mask, SIGSEGV);
sa.sa_flags = SA_SIGINFO;
if (sigaction(SIGSEGV, &sa, &previous_action) < 0)
return FALSE;
return TRUE;
}
Exception handler
static void fault_handler(int signum, siginfo_t *info, void *context)
{
void *address; /* the address that faulted */
/* Memory location which caused fault */
address = info->si_addr;
if (FALSE == page_fault(address)) {
_exit(1);
}
}
Increasing protection
int w_protect_mapping(void *addr, size_t num_pages, w_prot_t protection)
{
int prot;
switch (protection) {
case PROTECTION_NONE:
prot = PROT_NONE;
break;
case PROTECTION_READ:
prot = PROT_READ;
break;
case PROTECTION_WRITE:
prot = PROT_READ | PROT_WRITE;
break;
default:
return FALSE; /* unknown value: never pass an uninitialized prot to mprotect() */
}
if (mprotect(addr, num_pages * w_get_page_size(), prot) < 0)
return FALSE;
return TRUE;
}
I can't publicly make it all available since the team is likely to use that same homework again.
memory map file with growing size
After some experimentations, I found a way to make it work.
First mmap the file with PROT_NONE and a large enough size. For 64-bit systems, it can be as large as 1L << 46 (64TB). This does NOT consume physical memory* (at least on Linux). It will consume address space (virtual memory) for this process.
void* ptr = mmap(NULL, (1L << 40), PROT_NONE, MAP_SHARED, fd, 0);
Then, give read (and/or write) permission to the part of memory within the file length using mprotect. Note that the size needs to be aligned to the page size (which can be obtained with sysconf(_SC_PAGESIZE), usually 4096).
mprotect(ptr, aligned_size, PROT_READ | PROT_WRITE);
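However, if the file size is not page-size aligned, be careful with the region beyond the file length: the remainder of the last, partial page reads as zeros, but accessing a page with PROT_READ permission that lies entirely beyond the end of the file triggers a bus error (SIGBUS), as documented in the mmap manual.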
Then you can use either the file descriptor fd or the mapped memory to read and write the file. Remember to use fsync or msync to persist the data after writing to it. A memory-mapped page with PROT_READ permission should see the latest file content (if you write to the file through fd)**. A page newly made accessible with mprotect will also see the updated content.
Depending on the application, you might want to use ftruncate to make the file size aligned to the system page size for the best performance. You might also want to use madvise with MADV_SEQUENTIAL to improve performance when reading those pages.
*This behavior is not mentioned in the mmap manual. However, since PROT_NONE implies those pages are not accessible in any way, it's trivial for any OS implementation not to allocate any physical memory for them at all.
**This behavior, where a memory region mapped before a file write sees the new data once the write is completed (fsync or msync), is also not mentioned in the manual (or at least I did not see it). But it seems to be the case at least on recent Linux kernels (4.x onward).
How to dynamically expand a Memory Mapped File
Once you map a file in memory, you cannot increase its size. This is a known limitation of memory mapped files.
...you must calculate or estimate the size of the finished file because file mapping objects are static in size; once created, their size cannot be increased or decreased.
One strategy would be to use chunks stored in non-persisted memory mapped files of a given size, say 1GB or 2GB. You would manage these through a top-level ViewAccessor of your own design (probably doing basic pass-through of the methods you need from the MemoryMappedViewAccessor).
Edit: or you could just create a non-persisted memory mapped file of the maximal size you expect to use (say 8GB to start, with a parameter to tune it on start-up of your application) and retrieve a MemoryMappedViewAccessor per logical chunk. The non-persisted file will not use physical resources until each view is requested.
memory mapped files
If you want your changes to be reflected in the on-disk file, you must map the file as MAP_SHARED, not MAP_PRIVATE.
Additionally, you cannot extend the file simply by writing beyond the end of the mapping. You must use ftruncate() to extend the file to the new size, then change the mapping to include the new portion of the file. The portable way to change the mapping is to unmap the mapping and then recreate it with the new size; on Linux you can instead use mremap().
Your len and len_file variables should be of type size_t, and you should use memcpy() rather than strcat(), since you know exactly the length of the string and exactly where you want to copy it, and you don't want to copy the null terminator.
The following modification of your code works on Linux (using mremap()):
#define _GNU_SOURCE /* for mremap() and MREMAP_MAYMOVE */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>

#define FILEMODE (S_IRWXU | S_IRGRP | S_IROTH)
#define MAX 150

int main(int argc, char *argv[])
{
    int fd;
    size_t len_file, len;
    struct stat st;
    char *addr = NULL;
    char buf[MAX];

    if (argc < 2) {
        printf("Usage: a.out <filename>\n");
        return EXIT_FAILURE;
    }
    if ((fd = open(argv[1], O_RDWR | O_CREAT, FILEMODE)) < 0) {
        perror("Error in file opening");
        return EXIT_FAILURE;
    }
    if (fstat(fd, &st) < 0) {
        perror("Error in fstat");
        return EXIT_FAILURE;
    }
    len_file = st.st_size; /* current length of the file */

    /* mmap() rejects a zero length, so an empty file is mapped on its first growth */
    if (len_file > 0 &&
        (addr = mmap(NULL, len_file, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)) == MAP_FAILED) {
        perror("Error in mmap");
        return EXIT_FAILURE;
    }

    while (fgets(buf, MAX, stdin) != NULL) {
        size_t n = strlen(buf);
        if (n == 0)
            continue;
        len = len_file;
        len_file += n;
        /* grow the file first, then the mapping that covers it */
        if (ftruncate(fd, len_file) != 0) {
            perror("Error extending file");
            return EXIT_FAILURE;
        }
        if (len == 0)
            addr = mmap(NULL, len_file, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        else
            addr = mremap(addr, len, len_file, MREMAP_MAYMOVE);
        if (addr == MAP_FAILED) {
            perror("Error extending mapping");
            return EXIT_FAILURE;
        }
        memcpy(addr + len, buf, n);
        printf("Val:%.*s\n", (int)len_file, addr); /* the mapping is not NUL-terminated */
    }

    if (len_file > 0) {
        /* sync and unmap the final length, not the previous one */
        if (msync(addr, len_file, MS_SYNC) < 0)
            perror("Error in msync");
        if (munmap(addr, len_file) == -1)
            perror("Error in munmap");
    }
    if (close(fd))
        perror("Error in close");
    return 0;
}
appending to a memory-mapped file
Boost.IOStreams has only fixed-size memory mapped files, so it won't help with your specific problem. Linux has an interface, mremap, which works as follows:
void *new_mapping = mremap(mapping, size, size + GROWTH, MREMAP_MAYMOVE);
if (new_mapping == MAP_FAILED) {
    // handle error (the old mapping is left unchanged)
} else {
    mapping = new_mapping;
}
This is non-portable, however (and poorly documented). Mac OS X does not appear to have mremap.
In any case, you don't need to reopen the file; just munmap it and mmap it again:
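void *append(int fd, char const *data, size_t nbytes, void *map, size_t &len)
{
    // Assumes fd is positioned at the end of the file (e.g. opened with O_APPEND).
    ssize_t written = write(fd, data, nbytes);
    if (written < 0)
        return MAP_FAILED; // the existing mapping is left untouched
    munmap(map, len);
    len += written;
    // MAP_SHARED (or MAP_PRIVATE) is mandatory; a flags value of 0 makes mmap fail
    return mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
}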
A pre-allocation scheme may be very useful here. Be sure to keep track of the file's actual length and truncate it once more before closing.
Is boost memory mapped file zeroed on Linux
A memory mapped file contains whatever was in the file.
If it is a new file, it has been extended to the right size and the extension will contain zeros. Extending a file is usually done with the ftruncate function.
The ftruncate manpage says:
If the file previously was larger than this size, the extra data is
lost. If the file previously was shorter, it is extended, and the
extended part reads as null bytes ('\0').
So yes, zeros are guaranteed.
Increasing a file's size using mmap
On POSIX systems at least, mmap() cannot be used to increase (or decrease) the size of a file. mmap()'s function is to memory-map a portion of a file. It's logical that the thing you request to map should actually exist! Frankly, I'm really surprised that you would actually be able to do this under MS Windows.
If you want to grow a file, just ftruncate() it before you mmap() it.
mmap for writing sequential log file for speed?
I wrote my bachelor thesis about the comparison of fwrite vs. mmap ("An Experiment to Measure the Performance Trade-off between Traditional I/O and Memory-mapped Files"). First of all, for writing, you don't have to go for memory-mapped files, especially for large files. fwrite is totally fine and will nearly always outperform approaches using mmap. mmap will give you the biggest performance boosts for parallel data reading; for sequential data writing, your real limitation with fwrite is your hardware.
In my examples, remapSize is the initial size of the file and the size by which the file gets increased on each remapping. fileSize keeps track of the size of the file, mappedSpace represents the size of the current mmap (its length), and alreadyWrittenBytes are the bytes that have already been written to the file.
Here is the example initialization:
void init() {
fileDescriptor = open(outputPath, O_RDWR | O_CREAT | O_TRUNC, (mode_t) 0600); // Open file
result = ftruncate(fileDescriptor, remapSize); // Init size
fsync(fileDescriptor); // Flush
memoryMappedFile = (char*) mmap64(0, remapSize, PROT_WRITE, MAP_SHARED, fileDescriptor, 0); // Create mmap
fileSize = remapSize; // Store mapped size
mappedSpace = remapSize; // Store mapped size
}
Ad Q1:
I used an "Unmap-Remap" mechanism.
Unmap first flushes (msync) and then unmaps the memory-mapped file.
This could look like the following:
void unmap() {
msync(memoryMappedFile, mappedSpace, MS_SYNC); // Flush
munmap(memoryMappedFile, mappedSpace); // Unmap
}
For Remap, you have the choice to remap the whole file or only the newly appended part.
Remap basically
- increases the file size
- creates the new memory map
Example implementation for a full remap:
void fullRemap() {
ftruncate(fileDescriptor, mappedSpace + remapSize); // Make file bigger
fsync(fileDescriptor); // Flush file
memoryMappedFile = (char*) mmap64(0, mappedSpace + remapSize, PROT_WRITE, MAP_SHARED, fileDescriptor, 0); // Create new mapping on the bigger file
fileSize += remapSize; // Track new file size
mappedSpace += remapSize; // Set mappedSpace to new size
}
Example implementation for the small remap:
void smallRemap() {
ftruncate(fileDescriptor, fileSize + remapSize); // Make file bigger
fsync(fileDescriptor); // Flush file
remapAt = alreadyWrittenBytes % pageSize == 0
? alreadyWrittenBytes
: alreadyWrittenBytes - (alreadyWrittenBytes % pageSize); // Adjust remap location to pagesize
memoryMappedFile = (char*) mmap64(0, fileSize + remapSize - remapAt, PROT_WRITE, MAP_SHARED, fileDescriptor, remapAt); // Create memory-map
fileSize += remapSize;
mappedSpace = fileSize - remapAt;
}
There is an mremap function out there, yet its man page states:
This call is Linux-specific, and should not be used in programs
intended to be portable.
Ad Q2:
I'm not sure if I understood that point right. If you want to tell the kernel "and now load the next page", then no, this is not possible (at least to my knowledge). But see Ad Q3 on how to advise the kernel.
Ad Q3:
You can use madvise with the flag MADV_SEQUENTIAL, yet keep in mind that this does not force the kernel to read ahead; it only advises it.
Excerpt from the man page:
This may cause the kernel to aggressively read-ahead
Personal conclusion:
Do not use mmap for sequential data writing. It will just cause much more overhead and will lead to much more "unnatural" code than a simple writing algorithm using fwrite.
Use mmap for random access reads to large files.
These are also the results that were obtained during my thesis. I was not able to achieve any speedup by using mmap for sequential writing; in fact, it was always slower for this purpose.
Mapping file into memory and writing beyond end of file
The definitive reference in these matters is POSIX, which in its rationale section for mmap has to say:
The mmap() function can be used to map a region of memory that is
larger than the current size of the object. [... snip discussion on
sending SIGBUS if possible, i.e. when accessing a page beyond the end
of the file ...] written data may be lost and read data may not
reflect actual data in the object.
So, POSIX says that doing this can result in lost data. Also, the portability is questionable at best (think about no-MMU systems, the interaction with hugepages, platforms with different pagesizes...)