How to Mmap() a Large File Without Risking The Oom Killer

How to mmap() a large file without risking the OOM killer?

After much further experimentation, I determined that the OOM-killer was visiting me not because the system had run out of RAM, but because RAM would occasionally become sufficiently fragmented that the kernel couldn't find a set of physically-contiguous RAM pages large enough to meet its immediate needs. When this happened, the kernel would invoke the OOM-killer to free up some RAM to avoid a kernel panic, which is all well and good for the kernel but not so great when it kills a process that the user was relying on to get his work done. :/

After trying and failing to find a way to convince Linux not to do that (I think enabling a swap partition would avoid the OOM-killer, but doing that is not an option for me on these particular machines), I came up with a hack work-around; I added some code to my program that periodically checks the amount of memory fragmentation reported by the Linux kernel, and if the memory fragmentation starts looking too severe, preemptively orders a memory-defragmentation to occur, so that the OOM-killer will (hopefully) not become necessary. If the memory-defragmentation pass doesn't appear to be improving matters any, then after 20 consecutive attempts, we also drop the VM Page cache as a way to free up contiguous physical RAM. This is all very ugly, but not as ugly as getting a phone call at 3AM from a user who wants to know why their server program just crashed. :/

The gist of the work-around implementation is below; note that DefragTick(Milliseconds) is expected to be called periodically (preferably once per second).

 // Returns how safe we are from the fragmentation-based-OOM-killer visits.
// Returns -1 if we can't read the data for some reason.
static int GetFragmentationSafetyLevel()
{
int ret = -1;
FILE * fpIn = fopen("/sys/kernel/debug/extfrag/extfrag_index", "r");
if (fpIn)
{
char buf[512];
while(fgets(buf, sizeof(buf), fpIn))
{
const char * dma = (strncmp(buf, "Node 0, zone", 12) == 0) ? strstr(buf+12, "DMA") : NULL;
if (dma)
{
// dma= e.g.: "DMA -1.000 -1.000 -1.000 -1.000 0.852 0.926 0.963 0.982 0.991 0.996 0.998 0.999 1.000 1.000"
const char * s = dma+4; // skip past "DMA ";
ret = 0; // ret now becomes a count of "safe values in a row"; a safe value is any number less than 0.500, per me
while((s)&&((*s == '-')||(*s == '.')||(isdigit(*s))))
{
const float fVal = atof(s);
if (fVal < 0.500f)
{
ret++;

// Advance (s) to the next number in the list
const char * space = strchr(s, ' '); // to the next space
s = space ? (space+1) : NULL;
}
else break; // oops, a dangerous value! Run away!
}
}
}
fclose(fpIn);
}
return ret;
}

// should be called periodically (e.g. once per second)
void DefragTick(Milliseconds current_time_in_milliseconds)
{
if ((current_time_in_milliseconds-m_last_fragmentation_check_time) >= Milliseconds(1000))
{
m_last_fragmentation_check_time = current_time_in_milliseconds;

const int fragmentationSafetyLevel = GetFragmentationSafetyLevel();
if (fragmentationSafetyLevel < 9)
{
m_defrag_pending = true; // trouble seems to start at level 8
m_fragged_count++; // note that we still seem fragmented
}
else m_fragged_count = 0; // we're in the clear!

if ((m_defrag_pending)&&((current_time_in_milliseconds-m_last_defrag_time) >= Milliseconds(5000)))
{
if (m_fragged_count >= 20)
{
// FogBugz #17882
FILE * fpOut = fopen("/proc/sys/vm/drop_caches", "w");
if (fpOut)
{
const char * warningText = "Persistent Memory fragmentation detected -- dropping filesystem PageCache to improve defragmentation.";
printf("%s (fragged count is %i)\n", warningText, m_fragged_count);
fprintf(fpOut, "3");
fclose(fpOut);

m_fragged_count = 0;
}
else
{
const char * errorText = "Couldn't open /proc/sys/vm/drop_caches to drop filesystem PageCache!";
printf("%s\n", errorText);
}
}

FILE * fpOut = fopen("/proc/sys/vm/compact_memory", "w");
if (fpOut)
{
const char * warningText = "Memory fragmentation detected -- ordering a defragmentation to avoid the OOM-killer.";
printf("%s (fragged count is %i)\n", warningText, m_fragged_count);
fprintf(fpOut, "1");
fclose(fpOut);

m_defrag_pending = false;
m_last_defrag_time = current_time_in_milliseconds;
}
else
{
const char * errorText = "Couldn't open /proc/sys/vm/compact_memory to trigger a memory-defragmentation!";
printf("%s\n", errorText);
}
}
}
}

How do you Mmap() a file bigger than 2GB in Go?

Look in http://golang.org/src/pkg/syscall/syscall_unix.go at the Mmap method on mmapper. You should be able to copy that code and adapt it as required.

Of course you won't be able to mmap to a []byte, since slice lengths are defined to be "int" (which is 32-bit everywhere at the moment). You could mmap to a larger element type (e.g. []int32), or just muck with the pointer to the memory, but it won't be a drop-in replacement to syscall.Mmap.

when system run out of memory, the mmap memory is swapped to swap area or the mapping file?

The mmap memory should be swapped back to the mapping file, i think.

/proc/sys/vm/swappiness is one of the tuneables that let you decide, when the system should swap and when not. Default is 60, on a recent kernel 0 will disable swapping, 1 will minimize it.

How can I map a file with mmap while allocating an empty page before it?

You're on the right track. The missing piece you need is MAP_FIXED:

int* data_pointer = mmap((void*) 0, length + PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if(data_pointer == MAP_FAILED) {
perror("mmap");
exit(1);
}
if(mmap(data_pointer + PAGE_SIZE, length, PROT_READ, MAP_PRIVATE|MAP_FIXED, fd, 0) == MAP_FAILED) {
perror("mmap");
exit(1);
}

You correctly pointed out that normally, the address is "more of a hint than an order", but passing MAP_FIXED makes it an order.

If you're worried about safety, man 2 mmap says:

The only safe use for MAP_FIXED is where the address range specified by addr and length was previously reserved using another mapping

And that's exactly this use.



Related Topics



Leave a reply



Submit