Mmap() for Large File I/O

mmap() for large file I/O?

On a 32-bit machine your process is limited to 2-3 GB of user address space. This means that (allowing for other memory use) you won't be able to map more than ~1 GB of your file at a time. This does NOT mean that you cannot use mmap() for very large files - just that you need to map only part of the file at a time.
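For illustration, here is a minimal sketch of the windowed approach, assuming a 256 MB window (which is page-aligned) and a hypothetical file name:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    const off_t window = 256 * 1024 * 1024;   // assumed window size; a multiple of the page size
    int fd = open("huge.dat", O_RDONLY);      // hypothetical file name
    if (fd < 0) return 1;
    struct stat sb;
    fstat(fd, &sb);
    for (off_t off = 0; off < sb.st_size; off += window) {
        size_t len = (size_t)(sb.st_size - off < window ? sb.st_size - off : window);
        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
        if (p == MAP_FAILED) return 1;
        // ... process len bytes at p ...
        munmap(p, len);
    }
    close(fd);
    return 0;
}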

That being said, mmap() can still be a big win for large files. The most significant advantage is that you don't waste memory by keeping the data TWICE - one copy in the system cache, one copy in a private buffer of your application - and you don't burn CPU time making those copies. It can be an even bigger win for random access - but the "random" part must be limited in range to your current mapping(s).

Mmap() an entire large file

MAP_PRIVATE mappings require a memory reservation, as writing to these pages may result in copy-on-write allocations. This means that you can't map something much larger than your physical RAM + swap. Try using a MAP_SHARED mapping instead. This means that writes to the mapping will be reflected on disk - as such, the kernel knows it can always free up memory by doing writeback, so it won't limit you.

I also note that you're mapping with PROT_WRITE, but you then go on and read from the memory mapping. You also opened the file with O_RDONLY - this itself may be another problem for you; you must specify O_RDWR if you want to use PROT_WRITE with MAP_SHARED.

As for PROT_WRITE only, this happens to work on x86, because x86 doesn't support write-only mappings, but may cause segfaults on other platforms. Request PROT_READ|PROT_WRITE - or, if you only need to read, PROT_READ.
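Putting those flag fixes together, here is a minimal sketch of a writable shared mapping with matching open flags (the file name is a placeholder):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDWR);   // O_RDWR is required for PROT_WRITE with MAP_SHARED
    if (fd < 0) return 1;
    struct stat sb;
    fstat(fd, &sb);
    // Ask for PROT_READ | PROT_WRITE, never PROT_WRITE alone
    char *p = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;
    p[0] ^= 0;                           // reads and writes are both legal; writes reach the file via writeback
    munmap(p, sb.st_size);
    close(fd);
    return 0;
}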

On my system (VPS with 676MB RAM, 256MB swap), I reproduced your problem; changing to MAP_SHARED results in an EACCES error (since I'm not allowed to write to the backing file opened with O_RDONLY). Changing to PROT_READ and MAP_SHARED allows the mapping to succeed.

If you need to modify bytes in the file, one option would be to make private just the ranges of the file you're going to write to. That is, munmap and remap with MAP_PRIVATE the areas you intend to modify. Of course, if you intend to write to the entire file then you need 8GB of memory to do so.
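A sketch of remapping one range as private; on Linux, MAP_FIXED replaces the old pages in place, so a separate munmap is not strictly needed (the function name and parameters here are illustrative):

#include <sys/mman.h>
#include <sys/types.h>

// Turn one page-aligned range [off, off + len) of an existing MAP_SHARED
// mapping of fd (starting at base) into a private, copy-on-write range.
int make_range_private(char *base, int fd, off_t off, size_t len) {
    void *p = mmap(base + off, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED, fd, off);
    return p == MAP_FAILED ? -1 : 0;
}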

Alternatively, you can write 1 to /proc/sys/vm/overcommit_memory. This will allow the mapping request to succeed; however, keep in mind that if you actually try to use the full 8GB of COW memory, your program (or some other program!) will be killed by the OOM killer.

When should I use mmap for file access?

mmap is great if you have multiple processes accessing data in a read only fashion from the same file, which is common in the kind of server systems I write. mmap allows all those processes to share the same physical memory pages, saving a lot of memory.

mmap also allows the operating system to optimize paging operations. For example, consider two programs: program A, which reads a 1MB file into a buffer created with malloc, and program B, which mmaps the 1MB file into memory. If the operating system has to swap part of A's memory out, it must write the contents of the buffer to swap before it can reuse the memory. In B's case, any unmodified mmap'd pages can be reused immediately because the OS knows how to restore them from the existing file they were mmap'd from. (The OS can detect which pages are unmodified by initially marking writable mmap'd pages as read only and catching the resulting faults, similar to a copy-on-write strategy.)
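To make the comparison concrete, here is a rough sketch of the two approaches (the file name and the 1MB size are assumptions):

#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.1mb", O_RDONLY);   // hypothetical 1MB file
    if (fd < 0) return 1;

    // Program A's approach: a private copy the OS must write to swap before reclaiming
    char *buf = malloc(1 << 20);
    read(fd, buf, 1 << 20);

    // Program B's approach: clean pages can be dropped and later re-read from the file
    char *map = mmap(NULL, 1 << 20, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) return 1;

    // ... use buf / map ...
    munmap(map, 1 << 20);
    free(buf);
    close(fd);
    return 0;
}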

mmap is also useful for inter process communication. You can mmap a file as read / write in the processes that need to communicate and then use synchronization primitives in the mmap'd region (this is what the MAP_HASSEMAPHORE flag is for).
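MAP_HASSEMAPHORE is a BSD flag; on POSIX systems a process-shared pthread mutex inside the mapped region achieves the same thing. A minimal sketch, where the file name is a placeholder and real code should let only one process run the initialization:

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    pthread_mutex_t lock;   // process-shared mutex living inside the mapping
    int counter;            // example shared state
} shared_region;

int main(void) {
    int fd = open("ipc.bin", O_RDWR | O_CREAT, 0600);   // placeholder backing file
    if (fd < 0) return 1;
    ftruncate(fd, sizeof(shared_region));

    shared_region *r = mmap(NULL, sizeof *r, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (r == MAP_FAILED) return 1;

    // In real code, only the first process should perform this initialization
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&r->lock, &attr);

    pthread_mutex_lock(&r->lock);
    r->counter++;            // visible to every process that maps the same file
    pthread_mutex_unlock(&r->lock);

    munmap(r, sizeof *r);
    close(fd);
    return 0;
}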

One place mmap can be awkward is if you need to work with very large files on a 32 bit machine. This is because mmap has to find a contiguous block of addresses in your process's address space that is large enough to fit the entire range of the file being mapped. This can become a problem if your address space becomes fragmented: you might have 2 GB of address space free, but no individual range of it can fit a 1 GB file mapping. In this case you may have to map the file in smaller chunks than you would like to make it fit.

Another potential awkwardness with mmap as a replacement for read / write is that you have to start your mapping at an offset that is a multiple of the page size. If you just want to get some data at offset X, you will need to fix up that offset so it's compatible with mmap.
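A small helper sketch for that fixup (the function name and parameters are illustrative):

#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Map len bytes starting at an arbitrary file offset X by rounding the
// mapping start down to a page boundary. The caller's data begins at
// (char *)result + *adjust.
void *map_at_offset(int fd, off_t X, size_t len, size_t *adjust) {
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = X & ~((off_t)page - 1);   // round down to a page multiple
    *adjust = (size_t)(X - aligned);
    void *p = mmap(NULL, len + *adjust, PROT_READ, MAP_SHARED, fd, aligned);
    return p == MAP_FAILED ? NULL : p;
}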

And finally, read / write is the only way you can work with some types of files. mmap can't be used on things like pipes and ttys.

Randomly read large file with mmap and huge pages

Linux does not support huge pages for the page cache (and neither do other OSes).

The most important reason is that the page cache is used (shared) by every process in the system and by the kernel itself.

Consider the following scenario: your process maps a file using 2MB huge pages, but then another process maps the same file using regular 4KB pages. The only way to reconcile the two is to switch your process to 4KB pages on the fly, which makes starting with 2MB pages pointless in the first place.

What you actually need is to ask the kernel to start prefetching data, using either fadvise with FADV_WILLNEED or madvise with MADV_WILLNEED. Doing a syscall is not "free", but if you know you are going to access a 2MB region soon, one hint per region should be perfect.
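A minimal sketch of the madvise variant; the 2MB window size is from the discussion above, while the helper name and the requirement that 'offset' be page-aligned are my assumptions:

#include <sys/mman.h>
#include <sys/types.h>

// Hint that the next 2MB window of an existing file-backed mapping will be
// needed soon, so the kernel can start prefetching it. 'base' is the start
// of the mapping and 'offset' must be page-aligned.
static inline void prefetch_window(char *base, size_t offset) {
    madvise(base + offset, 2 * 1024 * 1024, MADV_WILLNEED);
}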

For additional information you can read this to get more insight about what kernel developers think (thought) about huge pages.

How to use file input/output functions efficiently on large files (using limited size of memory)

Here's an efficient way to do this:

Open your source file and access your data with mmap(). This way you are accessing the OS disk cache directly and you eliminate copying the memory from kernel mode to user mode. If your files are really big, it is best to use smaller mmap'd views to prevent the creation of large page tables.

Depending on the number of distinct patterns you are using, you have the following options:

If the number of patterns is small enough to fit in memory:

  • If the values are sparse: store them in a map with pattern/count pairs.
  • If the values are mostly contiguous: store the counts in a vector, where the position is the value of your pattern, adjusted by an offset if needed.

If the number of patterns can get big:

(you're talking about 1 billion patterns - it depends on how unique they are), you could create an mmap'd output file and store the counts there, but make sure that all the values (or pairs) have the same width, i.e. store everything in binary (you can then use the file just as you would use an array).

If most of the values are distinct, store them at the position of your pattern value - for example, if pattern (32bit?) + count is 8 bytes, store them at position pattern-value * 8 for quick access. In case there are large gaps in your pattern values, but you want to avoid inserting and moving data, consider using a (temporary) sparse file to store the values directly at the right position.

If you only need a count, you could store just the 32-bit counts at their specific positions, but if you need a sort you'll also need the pattern values somehow.
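A minimal sketch of the count-at-position idea on a 64-bit system, assuming 32-bit patterns and 32-bit counts (the file name is a placeholder; the 16 GB file stays sparse on disk because untouched pages are never written):

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    // One 32-bit count slot for every possible 32-bit pattern value
    int fd = open("counts.bin", O_RDWR | O_CREAT, 0600);
    if (fd < 0) return 1;
    size_t size = (size_t)UINT32_MAX * 4 + 4;
    ftruncate(fd, (off_t)size);

    uint32_t *counts = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (counts == MAP_FAILED) return 1;

    uint32_t pattern = 0xDEADBEEF;   // example pattern value
    counts[pattern]++;               // the count lives at position pattern * 4

    munmap(counts, size);
    close(fd);
    return 0;
}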

To sort them, I would prefer using radix sort.
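For reference, a generic LSD radix sort sketch for 32-bit values (my own illustration, not code from the answer):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// LSD radix sort on 32-bit values, processing one byte per pass.
void radix_sort_u32(uint32_t *a, size_t n) {
    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return;
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {0};
        for (size_t i = 0; i < n; i++)         // histogram of the current byte
            count[((a[i] >> shift) & 0xFF) + 1]++;
        for (int b = 0; b < 256; b++)          // prefix sums -> bucket start indices
            count[b + 1] += count[b];
        for (size_t i = 0; i < n; i++)         // stable scatter into tmp
            tmp[count[(a[i] >> shift) & 0xFF]++] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
}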

mmap file with larger fixed length with zero padding?

This is one of the few reasonable use cases for MAP_FIXED, to remap part of an existing mapping to use a new backing file.

A simple solution here is to unconditionally mmap 64 MB of anonymous memory (or explicitly mmap /dev/zero), without MAP_FIXED and store the resulting pointer.

Next, mmap the lesser of 64 MB or the file's actual size from your real file, passing the result of the anonymous/zero mmap as the address and adding the MAP_FIXED flag. The pages corresponding to your file will no longer be anonymous/zero mapped, and instead will be backed by your file's data; the remaining pages will stay backed by the anonymous/zero pages.

When you're done, a single munmap call will unmap all 64 MB at once (you don't need to separately unmap the real file pages and the zero backed pages).

Extremely simple example (no error checking, please add it yourself):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Reserve 64 MB of contiguous addresses; anonymous mappings are always zero backed
void *mapping = mmap(NULL, 64 * 1024 * 1024, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

// Open file and check size
struct stat sb;
int fd = open(myfilename, O_RDONLY);
fstat(fd, &sb);
// Use smaller of file size or 64 MB
size_t filemapsize = sb.st_size > 64 * 1024 * 1024 ? 64 * 1024 * 1024 : sb.st_size;
// Remap up to 64 MB of pages, replacing some or all of original anonymous pages
mapping = mmap(mapping, filemapsize, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0);
close(fd);

// ... do stuff with mapping ...
munmap(mapping, 64 * 1024 * 1024);

