How to mmap() a large file without risking the OOM killer?
After much further experimentation, I determined that the OOM-killer was visiting me not because the system had run out of RAM, but because RAM would occasionally become sufficiently fragmented that the kernel couldn't find a set of physically-contiguous RAM pages large enough to meet its immediate needs. When this happened, the kernel would invoke the OOM-killer to free up some RAM to avoid a kernel panic, which is all well and good for the kernel but not so great when it kills a process that the user was relying on to get their work done. :/
After trying and failing to find a way to convince Linux not to do that (I think enabling a swap partition would avoid the OOM-killer, but doing that is not an option for me on these particular machines), I came up with a hack work-around: I added some code to my program that periodically checks the amount of memory fragmentation reported by the Linux kernel, and if the memory fragmentation starts looking too severe, preemptively orders a memory-defragmentation to occur, so that the OOM-killer will (hopefully) not become necessary. If the defragmentation passes don't appear to be improving matters, i.e. if 20 consecutive checks still report fragmentation, the code also drops the VM page cache as a way to free up contiguous physical RAM. This is all very ugly, but not as ugly as getting a phone call at 3AM from a user who wants to know why their server program just crashed. :/
The gist of the work-around implementation is below; note that DefragTick(Milliseconds) is expected to be called periodically (preferably once per second).
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Returns how safe we are from the fragmentation-based-OOM-killer visits.
// Returns -1 if we can't read the data for some reason.
static int GetFragmentationSafetyLevel()
{
   int ret = -1;
   FILE * fpIn = fopen("/sys/kernel/debug/extfrag/extfrag_index", "r");
   if (fpIn)
   {
      char buf[512];
      while(fgets(buf, sizeof(buf), fpIn))
      {
         // Match "DMA " including its trailing space, so that a "DMA32" line isn't misparsed as the DMA zone
         const char * dma = (strncmp(buf, "Node 0, zone", 12) == 0) ? strstr(buf+12, "DMA ") : NULL;
         if (dma)
         {
            // dma= e.g.: "DMA -1.000 -1.000 -1.000 -1.000 0.852 0.926 0.963 0.982 0.991 0.996 0.998 0.999 1.000 1.000"
            const char * s = dma+4;  // skip past "DMA ";
            ret = 0;  // ret now becomes a count of "safe values in a row"; a safe value is any number less than 0.500, per me
            while((s)&&((*s == '-')||(*s == '.')||(isdigit((unsigned char) *s))))  // cast keeps isdigit() well-defined for any char value
            {
               const float fVal = atof(s);
               if (fVal < 0.500f)
               {
                  ret++;
                  // Advance (s) to the next number in the list
                  const char * space = strchr(s, ' ');  // to the next space
                  s = space ? (space+1) : NULL;
               }
               else break;  // oops, a dangerous value!  Run away!
            }
         }
      }
      fclose(fpIn);
   }
   return ret;
}
// should be called periodically (e.g. once per second)
void DefragTick(Milliseconds current_time_in_milliseconds)
{
   if ((current_time_in_milliseconds-m_last_fragmentation_check_time) >= Milliseconds(1000))
   {
      m_last_fragmentation_check_time = current_time_in_milliseconds;
      const int fragmentationSafetyLevel = GetFragmentationSafetyLevel();
      if (fragmentationSafetyLevel < 9)
      {
         m_defrag_pending = true;  // trouble seems to start at level 8
         m_fragged_count++;        // note that we still seem fragmented
      }
      else m_fragged_count = 0;    // we're in the clear!
      if ((m_defrag_pending)&&((current_time_in_milliseconds-m_last_defrag_time) >= Milliseconds(5000)))
      {
         if (m_fragged_count >= 20)
         {
            // FogBugz #17882
            FILE * fpOut = fopen("/proc/sys/vm/drop_caches", "w");
            if (fpOut)
            {
               const char * warningText = "Persistent Memory fragmentation detected -- dropping filesystem PageCache to improve defragmentation.";
               printf("%s (fragged count is %i)\n", warningText, m_fragged_count);
               fprintf(fpOut, "3");
               fclose(fpOut);
               m_fragged_count = 0;
            }
            else
            {
               const char * errorText = "Couldn't open /proc/sys/vm/drop_caches to drop filesystem PageCache!";
               printf("%s\n", errorText);
            }
         }
         FILE * fpOut = fopen("/proc/sys/vm/compact_memory", "w");
         if (fpOut)
         {
            const char * warningText = "Memory fragmentation detected -- ordering a defragmentation to avoid the OOM-killer.";
            printf("%s (fragged count is %i)\n", warningText, m_fragged_count);
            fprintf(fpOut, "1");
            fclose(fpOut);
            m_defrag_pending = false;
            m_last_defrag_time = current_time_in_milliseconds;
         }
         else
         {
            const char * errorText = "Couldn't open /proc/sys/vm/compact_memory to trigger a memory-defragmentation!";
            printf("%s\n", errorText);
         }
      }
   }
}
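The line-parsing step can also be pulled out so it can be tried against sample text, without needing debugfs or root. The following is only a sketch under assumptions, not part of the original work-around: the function name CountLeadingSafeOrders and its zone and threshold parameters are hypothetical, and the line layout is assumed to match the sample shown in the comment above ("Node 0, zone", the zone name, then space-separated values).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts how many leading fragmentation-index values on
 * one extfrag_index line are below `threshold`, stopping at the first value
 * that is not. Returns -1 if the line is not for the zone named `zone`.
 * Assumed line layout: "Node 0, zone      DMA -1.000 -1.000 0.852 0.926" */
static int CountLeadingSafeOrders(const char * line, const char * zone, double threshold)
{
    assert(line != NULL && zone != NULL);

    const char * p = strstr(line, "zone");
    if (p == NULL) return -1;
    p += 4;                                   /* skip the word "zone" */
    while (*p == ' ') p++;                    /* skip padding before the zone name */

    /* Require an exact token match, so "DMA" does not match "DMA32" */
    const size_t n = strlen(zone);
    if (strncmp(p, zone, n) != 0 || p[n] != ' ') return -1;
    p += n;

    int safe = 0;
    for (;;)
    {
        char * end;
        const double v = strtod(p, &end);     /* strtod skips leading whitespace */
        if (end == p) break;                  /* no more numbers on this line */
        if (v >= threshold) break;            /* first unsafe value ends the count */
        safe++;
        p = end;
    }
    return safe;
}
```

Fed the sample line from the comment in GetFragmentationSafetyLevel() with a threshold of 0.5, this counts the four leading -1.000 entries and stops at 0.852, which DefragTick() would treat as fragmented since it is below 9.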
How do you Mmap() a file bigger than 2GB in Go?
Look in http://golang.org/src/pkg/syscall/syscall_unix.go at the Mmap method on mmapper. You should be able to copy that code and adapt it as required.
When this answer was written, you couldn't mmap such a file to a []byte, since slice lengths are defined to be "int", which was 32-bit everywhere at the time. Since Go 1.1, int is 64 bits wide on 64-bit platforms, so the restriction now only applies to 32-bit platforms. There, you could mmap to a larger element type (e.g. []int32), or just muck with the pointer to the memory, but it won't be a drop-in replacement for syscall.Mmap.
When the system runs out of memory, is mmap'd memory swapped to the swap area or to the mapped file?
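The mmap'd memory should be written back to the mapped file rather than to swap, I think. That holds for shared (MAP_SHARED) file-backed mappings; pages modified through a MAP_PRIVATE mapping are private copies, so like anonymous memory they can only go to swap.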
/proc/sys/vm/swappiness is one of the tunables that let you influence when the system should swap and when it should not. The default is 60; on a recent kernel, 0 tells the kernel to avoid swapping for as long as it can, and 1 will minimize it.
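Where a modified page ends up depends on the mapping type, and that part is easy to observe. The sketch below does not simulate memory pressure; it only shows that a write through a MAP_SHARED mapping reaches the file while a write through a MAP_PRIVATE mapping does not. The function name ByteInFileAfterMappedWrite and the temp-file path are made up for illustration, and the `_DEFAULT_SOURCE` define assumes glibc.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes mkstemp() and pread() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: writes `newByte` at offset 0 of a fresh temp file
 * through a mapping created with `mapFlags`, then returns the byte that
 * pread() finds at offset 0 of the file afterwards, or -1 on any error. */
static int ByteInFileAfterMappedWrite(int mapFlags, char newByte)
{
    assert(mapFlags == MAP_SHARED || mapFlags == MAP_PRIVATE);

    char path[] = "/tmp/mmap_demo_XXXXXX";    /* illustrative temp-file template */
    const int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                             /* the file goes away once fd is closed */

    int result = -1;
    const char original[4] = {'o', 'r', 'i', 'g'};
    if (write(fd, original, sizeof(original)) == (ssize_t) sizeof(original))
    {
        char * p = mmap(NULL, sizeof(original), PROT_READ | PROT_WRITE, mapFlags, fd, 0);
        if (p != MAP_FAILED)
        {
            p[0] = newByte;                        /* dirties the page (shared) or a private copy */
            msync(p, sizeof(original), MS_SYNC);   /* flushes shared dirty pages to the file */
            munmap(p, sizeof(original));

            char readBack;
            if (pread(fd, &readBack, 1, 0) == 1) result = (unsigned char) readBack;
        }
    }
    close(fd);
    return result;
}
```

With MAP_SHARED the file's first byte becomes the new value; with MAP_PRIVATE it stays 'o', because the modified page is a private copy that has no file to go back to.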
How can I map a file with mmap while allocating an empty page before it?
You're on the right track. The missing piece you need is MAP_FIXED:
int* data_pointer = mmap((void*) 0, length + PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if(data_pointer == MAP_FAILED) {
    perror("mmap");
    exit(1);
}
// Cast to char* first: arithmetic on an int* would advance by PAGE_SIZE ints, not PAGE_SIZE bytes
if(mmap((char*) data_pointer + PAGE_SIZE, length, PROT_READ, MAP_PRIVATE|MAP_FIXED, fd, 0) == MAP_FAILED) {
    perror("mmap");
    exit(1);
}
You correctly pointed out that normally, the address is "more of a hint than an order", but passing MAP_FIXED makes it an order.
If you're worried about safety, man 2 mmap says:
"The only safe use for MAP_FIXED is where the address range specified by addr and length was previously reserved using another mapping"
And that's exactly this use.
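For reference, here is the same reserve-then-MAP_FIXED pattern as a self-contained function. This is a sketch under assumptions rather than a definitive version: the name MapFileAfterEmptyPage is hypothetical, the page size is queried with sysconf(_SC_PAGESIZE) instead of relying on a PAGE_SIZE macro, and the `_DEFAULT_SOURCE` define assumes glibc.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes MAP_ANONYMOUS and mkstemp() under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserves one zero-filled page plus room for `length`
 * bytes, then maps `fd` over the second part with MAP_FIXED. Returns the
 * address of the file data (exactly one page past the start of the
 * reservation), or NULL on failure. */
static char * MapFileAfterEmptyPage(int fd, size_t length)
{
    assert(length > 0);
    const size_t pageSize = (size_t) sysconf(_SC_PAGESIZE);

    /* Step 1: one anonymous mapping reserves the whole address range */
    char * base = mmap(NULL, length + pageSize, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;

    /* Step 2: replace everything after the first page with the file.
     * base is a char*, so base + pageSize is pageSize bytes further on. */
    char * file = mmap(base + pageSize, length, PROT_READ,
                       MAP_PRIVATE | MAP_FIXED, fd, 0);
    if (file == MAP_FAILED)
    {
        munmap(base, length + pageSize);      /* release the reservation */
        return NULL;
    }
    return file;
}
```

The returned pointer addresses the file contents, and the byte just before it (file[-1]) is the last byte of the empty page. Since MAP_FIXED only ever lands inside the range the first mmap() reserved, it cannot clobber an unrelated mapping.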