Linux: Large int array: mmap vs seek file?
I'd say performance should be similar if access is truly random. The OS will use a similar caching strategy whether the data page is mapped from a file or the file data is simply cached without an association to the process's address space.
Assuming the cache is ineffective:
- You can use fadvise to declare your access pattern in advance and disable readahead.
- Due to address space layout randomization, there might not be a contiguous block of 4 TB in your virtual address space.
- If your data set ever expands, the address space issue might become more pressing.
So I'd go with explicit reads.
mmaping large files (for persistent large arrays)
All pointers stored inside the mmap'd region should be stored as offsets from the base of the region, not as real pointers! You won't necessarily get the same base address when you mmap the region on the next run of the program. (I have had to clean up code that made incorrect assumptions about the constancy of the mmap region's base address.)
When should I use mmap for file access?
mmap is great if you have multiple processes accessing data in a read-only fashion from the same file, which is common in the kind of server systems I write. mmap allows all those processes to share the same physical memory pages, saving a lot of memory.
mmap also allows the operating system to optimize paging operations. For example, consider two programs: program A, which reads a 1MB file into a buffer created with malloc, and program B, which mmaps the 1MB file into memory. If the operating system has to swap part of A's memory out, it must write the contents of the buffer to swap before it can reuse the memory. In B's case, any unmodified mmap'd pages can be reused immediately because the OS knows how to restore them from the existing file they were mmap'd from. (The OS can detect which pages are unmodified by initially marking writable mmap'd pages as read-only and catching seg faults, similar to a copy-on-write strategy.)
mmap is also useful for inter-process communication. You can mmap a file as read/write in the processes that need to communicate and then use synchronization primitives in the mmap'd region (this is what the MAP_HASSEMAPHORE flag is for).
One place mmap can be awkward is if you need to work with very large files on a 32-bit machine. This is because mmap has to find a contiguous block of addresses in your process's address space that is large enough to fit the entire range of the file being mapped. This can become a problem if your address space becomes fragmented: you might have 2 GB of address space free, but no individual range of it can fit a 1 GB file mapping. In this case you may have to map the file in smaller chunks than you would like to make it fit.
Another potential awkwardness with mmap as a replacement for read/write is that you have to start your mapping on offsets that are multiples of the page size. If you just want to get some data at offset X, you will need to fix up that offset so it's compatible with mmap.
And finally, read/write are the only way you can work with some types of files; mmap can't be used on things like pipes and ttys.
Mmap() an entire large file
MAP_PRIVATE mappings require a memory reservation, as writing to these pages may result in copy-on-write allocations. This means that you can't map something much larger than your physical RAM + swap. Try using a MAP_SHARED mapping instead. This means that writes to the mapping will be reflected on disk; as such, the kernel knows it can always free up memory by doing writeback, so it won't limit you.
I also note that you're mapping with PROT_WRITE, but you then go on and read from the memory mapping. You also opened the file with O_RDONLY; this itself may be another problem for you, since you must specify O_RDWR if you want to use PROT_WRITE with MAP_SHARED.
As for PROT_WRITE only, this happens to work on x86, because x86 doesn't support write-only mappings, but it may cause segfaults on other platforms. Request PROT_READ|PROT_WRITE, or, if you only need to read, PROT_READ.
On my system (a VPS with 676MB RAM and 256MB swap), I reproduced your problem; changing to MAP_SHARED results in an EPERM error (since I'm not allowed to write to the backing file opened with O_RDONLY). Changing to PROT_READ and MAP_SHARED allows the mapping to succeed.
If you need to modify bytes in the file, one option would be to make private just the ranges of the file you're going to write to. That is, munmap and remap with MAP_PRIVATE the areas you intend to write to. Of course, if you intend to write to the entire file, then you need 8GB of memory to do so.
Alternately, you can write 1 to /proc/sys/vm/overcommit_memory. This will allow the mapping request to succeed; however, keep in mind that if you actually try to use the full 8GB of COW memory, your program (or some other program!) will be killed by the OOM killer.
In the following case, which one is better? fread() or mmap()?
My question is: which method is more efficient, fread() or mmap()?
First, let's look at how fread and mmap work on Linux.

fread: say we work with an ext4 file system (without encryption). fread uses an internal buffer, and when there is no data in it, it calls read. read executes a system call, and after some time we reach fs/read_write.c::vfs_read, and after more work we reach mm/filemap.c::generic_file_read_iter. In this function the kernel fills the inode page cache and reads the file data into that page cache.

So fread basically does the same thing mmap does. The difference is that in the fread case we do not work with the pages directly; we just copy a portion of data from the kernel's inode page cache into a user-space buffer, whereas with mmap the page cache appears directly in the program's address space. In addition, with fread, when a page is missing from the inode page cache we simply read it, but with mmap the access causes a page fault, and only after that is the page read.

Both variants use a read-ahead strategy. A possible difference lies in the cache policy, which we can control in the mmap case with madvise and the flags of mmap.

So I suppose the answer is: they are almost the same in terms of speed in a sequential-read case like yours.
How can I keep multiple copies of a very large dataset in memory?
This sounds like a good use case for mmap. The mmap function can be used to take an open file and map it to a region of memory. Reads and writes to the file via the returned pointer are handled internally, although you can periodically flush to disk manually. This will allow you to manipulate a data structure larger than the physical memory of the system.
This also has the advantage that you don't need to worry about moving data back and forth from disk manually. The kernel will take care of it for you.
So for each of these large arrays, you can create a memory mapping backed by a file on disk.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>

#define DATA_LEN 30000000000LL

int main(void)
{
    int array1_fd = open("/tmp/array1", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (array1_fd < 0) {
        perror("open failed");
        exit(1);
    }

    // make sure file is big enough
    if (lseek(array1_fd, DATA_LEN, SEEK_SET) == -1) {
        perror("seek to len failed");
        exit(1);
    }
    if (write(array1_fd, "x", 1) == -1) {
        perror("write at end failed");
        exit(1);
    }
    if (lseek(array1_fd, 0, SEEK_SET) == -1) {
        perror("seek to 0 failed");
        exit(1);
    }

    char *array1 = mmap(NULL, DATA_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, array1_fd, 0);
    if (array1 == MAP_FAILED) {
        perror("mmap failed");
        exit(1);
    }

    // Use array1

    munmap(array1, DATA_LEN);
    close(array1_fd);
    return 0;
}
The important part of the mmap call is the MAP_SHARED flag. This means that updates to the mapped memory region are carried through to the underlying file.
memory map file with growing size
After some experimentations, I found a way to make it work.
First, mmap the file with PROT_NONE and a large enough size. For 64-bit systems, it can be as large as 1L << 46 (64TB). This does NOT consume physical memory* (at least on Linux). It will consume address space (virtual memory) for this process.
void* ptr = mmap(NULL, (1L << 40), PROT_NONE, MAP_SHARED, fd, 0);
Then, give read (and/or write) permission to the part of memory within the file length using mprotect. Note that the size needs to be aligned to the page size (which can be obtained via sysconf(_SC_PAGESIZE), usually 4096).
mprotect(ptr, aligned_size, PROT_READ | PROT_WRITE);
However, if the file size is not page-aligned, reading the portion within the mapped region (with PROT_READ permission) but beyond the file length will trigger a bus error, as documented in the mmap manual.
Then you can use either the file descriptor fd or the mapped memory to read and write the file. Remember to use fsync or msync to persist the data after writing to it. The memory-mapped page with PROT_READ permission should get the latest file content (if you write to it)**. A page newly exposed with mprotect will also see the newly updated data.
Depending on the application, you might want to use ftruncate to make the file size aligned to the system page size for the best performance. You might also want to use madvise with MADV_SEQUENTIAL to improve performance when reading those pages.
*This behavior is not mentioned in the mmap manual. However, since PROT_NONE implies those pages are not accessible in any way, it is trivial for any OS implementation not to allocate any physical memory for them at all.
**This behavior, where a memory region mapped before a file write sees the update after the write is completed (fsync or msync), is also not mentioned in the manual (or at least I did not see it). But it seems to be the case at least on recent Linux kernels (4.x onward).
Is that possible to mmap a very big file and using qsort?
If the file will fit in a contiguous mapping in your address space, you can do this. If it won't, you can't.
As to the differences:
- If the file just about fits, and then you add some more data, the mmap will fail. A normal external sort won't suddenly stop working because you have a little more data.
- If you don't map it with MAP_PRIVATE, sorting will mutate the original file. A normal external sort won't (necessarily).
- If you do map it with MAP_PRIVATE, you could crash at any time if the VM doesn't have room to duplicate the whole file. Again, a strictly external sort's memory requirements don't scale linearly with the data size.
tl;dr
It is possible, but it may fail unpredictably and unrecoverably; you almost certainly shouldn't do it.