How to Return Memory from a Process to the OS

Force memory release to the OS

You can't, and shouldn't.

Virtual memory allocation is complicated, and cannot be sufficiently understood by simply watching a number in System Monitor. It may appear as if a process is using more memory than it should, but this is just an artefact of the way virtual memory addressing works.

Rest assured, if you have freed this memory properly, and the OS really needed it back, it would be reassigned.

The only real actionable point here is to stop using System Monitor as if it were an accurate measure of physical RAM in use by your process!
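If you do want a trustworthy number on Linux, query the kernel's own accounting for your process rather than a GUI monitor. A minimal sketch (Linux-specific; VmRSS is the resident set size field in /proc/self/status):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* /proc/self/status is the kernel's view of the current process. */
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];

    if (f == NULL)
        return 1;
    while (fgets(line, sizeof line, f) != NULL)
        if (strncmp(line, "VmRSS:", 6) == 0)
            fputs(line, stdout);  /* e.g. "VmRSS:  1234 kB" */
    fclose(f);
    return 0;
}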

Force free() to return malloc memory back to the OS

With glibc's malloc, try calling the malloc_trim function. It is not well documented, and its internals changed around 2007 (glibc 2.9) - https://stackoverflow.com/a/42281428.

Since that change, the function iterates over all malloc memory arenas (used in multithreaded applications), performing trimming and fastbin consolidation, and releases all fully freed, page-aligned (4 KB) regions.

https://sourceware.org/git/?p=glibc.git;a=commit;f=malloc/malloc.c;h=68631c8eb92ff38d9da1ae34f6aa048539b199cc

Ulrich Drepper
Sun, 16 Dec 2007 22:53:08 +0000 (22:53 +0000)

  • malloc/malloc.c (public_mTRIm): Iterate over all arenas and call mTRIm for all of them.

  • (mTRIm): Additionally iterate over all free blocks and use madvise to free memory for all those blocks which contain at least one memory page.

https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=malloc/malloc.c;h=c54c203cbf1f024e72493546221305b4fd5729b7;hp=1e716089a2b976d120c304ad75dd95c63737ad75;hb=68631c8eb92ff38d9da1ae34f6aa048539b199cc;hpb=52386be756e113f20502f181d780aecc38cbb66a

+  malloc_consolidate (av);
...
+  for (int i = 1; i < NBINS; ++i)
...
+    for (mchunkptr p = last (bin); p != bin; p = p->bk)
+      {
...
+        /* See whether the chunk contains at least one unused page. */
+        char *paligned_mem = (char *) (((uintptr_t) p
+                                        + sizeof (struct malloc_chunk)
+                                        + psm1) & ~psm1);
...
+        /* This is the size we could potentially free. */
+        size -= paligned_mem - (char *) p;
+
+        if (size > psm1)
+          {
...
+            madvise (paligned_mem, size & ~psm1, MADV_DONTNEED);

So, calling malloc_trim will release almost all freed memory back to the OS; only pages that still contain live data are kept. The OS may or may not unmap a physical page when it is madvised with MADV_DONTNEED; Linux usually does unmap it. Madvised pages still count toward VSIZE (the total virtual memory size of the process), but they usually reduce RSS (the amount of physical memory used by the process).
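As a minimal sketch of how this is used (the block count and sizes are arbitrary; malloc_trim is a glibc extension declared in <malloc.h>):

#include <malloc.h>  /* malloc_trim (glibc extension) */
#include <stdlib.h>

int main(void)
{
    enum { N = 10000 };
    static char *blocks[N];

    /* Allocate many small blocks; being far below the mmap threshold,
       these are served from the heap. */
    for (int i = 0; i < N; ++i)
        blocks[i] = malloc(4096);

    /* free() returns them to the allocator, not to the OS. */
    for (int i = 0; i < N; ++i)
        free(blocks[i]);

    /* Ask glibc to release fully freed, page-aligned regions back to
       the OS; the argument is the slack to keep at the top of the heap. */
    malloc_trim(0);
    return 0;
}

Watching the process's RSS (e.g. VmRSS in /proc/self/status) before and after the malloc_trim call should show the drop.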

Alternatively, you can switch to an alternative malloc library: tcmalloc (from gperftools, formerly google-perftools) or jemalloc (from Facebook). Both have more aggressive rules for returning freed memory to the OS (using madvise with MADV_DONTNEED or even MADV_FREE), and both can usually be dropped in without recompiling, for example via LD_PRELOAD.

Python3: Give unused interpreter memory back to the OS

It's not Python or NumPy

As already indicated in the comments to the question, the observed effect is not specific to Python (or NumPy, as the memory is actually used for large ndarrays). Instead, it is a feature of the C runtime used, in this case the glibc.

Heap and memory mapping

When memory is requested using malloc() (as NumPy does when an array is allocated), the runtime decides whether to use the brk syscall for smaller chunks or mmap for larger chunks. sbrk is used to increase the heap size. Allocated heap space can be given back to the OS, but only if there is enough contiguous free space at the top of the heap. This means that even a few bytes for an object that happens to sit at the top of the heap can effectively prevent the process from giving any heap memory back to the OS. The memory is not wasted, since the runtime will reuse freed space on the heap for subsequent calls to malloc(), but it remains assigned to the process and is therefore not reported as available until the process terminates.
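A sketch of this "pinned top of heap" effect (illustrative only: whether the last allocation really lands at the top of the heap depends on the allocator's internal state; malloc_stats is a glibc extension):

#include <malloc.h>  /* malloc_stats (glibc extension) */
#include <stdlib.h>

int main(void)
{
    enum { N = 100000 };
    static char *blocks[N];

    /* ~12 MiB of small allocations, served from the heap via brk/sbrk. */
    for (int i = 0; i < N; ++i)
        blocks[i] = malloc(128);

    /* This small object now (likely) sits at the top of the heap. */
    char *pin = malloc(16);

    for (int i = 0; i < N; ++i)
        free(blocks[i]);

    /* "in use bytes" is now tiny, but "system bytes" stays large:
       the heap cannot shrink past the live `pin` object. */
    malloc_stats();

    free(pin);
    return 0;
}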

Allocating memory pages via mmap() is less efficient, but it has the benefit that such pages can be given back to the OS when they are no longer needed. The performance hit comes from the kernel being involved whenever pages are mapped or unmapped, in particular because the kernel has to zero out newly mapped pages for security reasons.

The mmap threshold

malloc() uses a threshold on the requested amount of memory to decide if it should use the heap or mmap(). This threshold is dynamic in recent versions of the glibc, but it may be changed using the mallopt() function:

M_MMAP_THRESHOLD

[...]

Note: Nowadays, glibc uses a dynamic mmap threshold by default. The initial value of the threshold is 128*1024, but when blocks larger than the current threshold and less than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the threshold is adjusted upwards to the size of the freed block. When dynamic mmap thresholding is in effect, the threshold for trimming the heap is also dynamically adjusted to be twice the dynamic mmap threshold. Dynamic adjustment of the mmap threshold is disabled if any of the M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or M_MMAP_MAX parameters is set.

The threshold can be adjusted in at least two ways:

  1. Using a call to mallopt() (see the C sketch after this list).

  2. By setting the environment variable MALLOC_MMAP_THRESHOLD_ (note the trailing underscore).
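A minimal C version of option 1 (the 1 MiB value mirrors the experiment below; mallopt returns 1 on success):

#include <malloc.h>  /* mallopt, M_MMAP_THRESHOLD (glibc) */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Allocations of 1 MiB and above now use mmap(), so free() can
       return them to the OS via munmap(). */
    if (mallopt(M_MMAP_THRESHOLD, 1024 * 1024) != 1)
        fprintf(stderr, "mallopt failed\n");

    char *big = malloc(16 * 1024 * 1024);  /* 16 MiB: mmap'ed */
    free(big);                             /* munmap'ed, back to the OS */
    return 0;
}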

Applied to the example code

The example allocates (and deallocates) memory in chunks of 2**24 bytes, i.e. 16 MiB. According to the theory, a fixed mmap threshold somewhat below this value should therefore ensure that all large arrays are allocated using mmap(), allowing them to be unmapped and given back to the OS.

First a run without modification:

$ ./test_mem.py 
Iteration 0
Available memory: 21.45 GiB
Available memory: 2.17 GiB
Result: 1235
Available memory: 21.50 GiB
Iteration 0 ends
Iteration 1
Available memory: 21.50 GiB
Available memory: 2.13 GiB
Result: 1238
Available memory: 3.95 GiB
Iteration 1 ends
Iteration 2
Available memory: 4.02 GiB
Available memory: 4.02 GiB
Result: 232
Available memory: 4.02 GiB
Iteration 2 ends
Program done.
Available memory: 4.02 GiB

The memory is not returned in iterations 1 and 2.

Let's set a fixed threshold of 1MiB now:

$ MALLOC_MMAP_THRESHOLD_=1048576 ./test_mem.py 
Iteration 0
Available memory: 21.55 GiB
Available memory: 2.13 GiB
Result: 1241
Available memory: 21.52 GiB
Iteration 0 ends
Iteration 1
Available memory: 21.52 GiB
Available memory: 2.11 GiB
Result: 1240
Available memory: 21.52 GiB
Iteration 1 ends
Iteration 2
Available memory: 21.51 GiB
Available memory: 2.12 GiB
Result: 1239
Available memory: 21.53 GiB
Iteration 2 ends
Program done.
Available memory: 21.53 GiB

As can be seen, the memory is successfully given back to the OS in all three iterations.
As an alternative, the setting can also be integrated into the Python script by a call to mallopt() using the ctypes module:

#!/usr/bin/env python3

import ctypes
import psutil
import time
import queue

import numpy as np

libc = ctypes.cdll.LoadLibrary("libc.so.6")
M_MMAP_THRESHOLD = -3  # mallopt parameter number, from glibc's malloc.h

# Set the malloc mmap threshold to 1 MiB.
libc.mallopt(M_MMAP_THRESHOLD, 2**20)

# ...

Disclaimer: These solutions/workarounds are far from being platform-independent, as they make use of specific glibc features.

Note

The above text mainly answers the more important second question, "How can I force the interpreter to give reserved memory back to the OS, so that the reported available memory actually increases?". As for the first question, "How can I query the interpreter's memory management to find out how much memory is used by referenced objects and how much is just reserved for future use and not given back to the OS?", I was not able to find a satisfactory answer. Calling malloc_stats():

libc = ctypes.cdll.LoadLibrary("libc.so.6")
# ... script here ...
libc.malloc_stats()

gives some numbers, but those results:

Arena 0:
system bytes     =  1632264192
in use bytes     =     4629984
Total (incl. mmap):
system bytes     =  1632858112
in use bytes     =     5223904
max mmap regions =        1236
max mmap bytes   = 20725514240

for a script run without changing the mmap threshold seem a bit confusing to me. 5 MiB could plausibly be the memory actually in use when the script ends, but what about the "system bytes"? The process still uses almost 20 GiB at this point, so the reported 1.6 GiB doesn't fit the picture at all.
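One avenue that might help (untested here) is glibc's malloc_info(3), which writes the allocator's state as XML, broken down per arena; it is available since glibc 2.10. A minimal C sketch:

#include <malloc.h>  /* malloc_info (glibc >= 2.10) */
#include <stdio.h>

int main(void)
{
    /* Dump the allocator state as XML; the options argument must be 0. */
    malloc_info(0, stdout);
    return 0;
}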

See also

  • allocated memory cache for numpy

  • Reduce memory fragmentation with MALLOC_MMAP_THRESHOLD_ and MALLOC_MMAP_MAX_

  • glibc allocator doesn't release all free()ed memory

  • Releasing memory in Python

  • Freeing Memory Allocated with malloc

  • Tips of malloc & free

When should a free() implementation give memory back to the OS?

Asking the OS for memory and returning it are (relatively) expensive operations, because each requires a context switch from user to kernel mode and back. For that reason, most implementations of malloc only ask the OS for large chunks, allocate from those chunks internally, and track freed memory in a free-block list. Memory is returned to the OS only when a complete chunk ends up on the free list.
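A toy sketch of that policy, assuming Linux mmap/munmap and fixed-size blocks (the names toy_alloc and toy_free are made up; there is no free list or thread safety, and a real allocator would manage many chunks):

#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>

#define CHUNK_SIZE (1 << 20)  /* ask the OS for 1 MiB at a time */
#define BLOCK_SIZE 256        /* hand out fixed-size blocks */

static char  *chunk;   /* current chunk obtained from the OS */
static size_t offset;  /* bump pointer into the chunk */
static size_t live;    /* blocks handed out and not yet freed */

void *toy_alloc(void)
{
    if (chunk == NULL) {
        /* The expensive part: one syscall per chunk, not per block. */
        chunk = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (chunk == MAP_FAILED) {
            chunk = NULL;
            return NULL;
        }
        offset = 0;
    }
    if (offset + BLOCK_SIZE > CHUNK_SIZE)
        return NULL;  /* chunk exhausted; a real allocator grabs another */
    void *p = chunk + offset;
    offset += BLOCK_SIZE;
    live++;
    return p;
}

void toy_free(void *p)
{
    (void)p;  /* a real allocator would put p on a free list for reuse */
    /* Only when the whole chunk is unused is it returned to the OS. */
    if (--live == 0) {
        munmap(chunk, CHUNK_SIZE);
        chunk = NULL;
    }
}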

For a custom implementation, the rule for returning memory to the system is up to the programmer (you...).

Java VM - does the freed memory return to the OS?

That depends on the JVM implementation and isn't specified in the specification.

The Sun JVM will hold onto it as a first step. Once a certain (configurable) percentage of allocated memory is unused, it will return some of it to the OS (the behavior is influenced by the MinHeapFreeRatio and MaxHeapFreeRatio settings).
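For illustration, these ratios are set with the usual -XX switches on HotSpot (the values here are arbitrary):

java -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 MyApp

Whether memory is actually returned also depends on the garbage collector in use.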


