Force memory release to the OS
You can't, and shouldn't.
Virtual memory allocation is complicated, and cannot be sufficiently understood by simply watching a number in System Monitor. It may appear as if a process is using more memory than it should, but this is just an artefact of the way virtual memory addressing works.
Rest assured, if you have freed this memory properly, and the OS really needed it back, it would be reassigned.
The only real actionable point here is to stop using System Monitor as if it were an accurate measure of physical RAM in use by your process!
Force free() to return malloc memory back to OS
With glibc's malloc, try calling the malloc_trim function. It is not well documented, and its internals changed around 2007 (glibc 2.9): https://stackoverflow.com/a/42281428.
Since that change, the function will iterate over all malloc memory arenas (used in multithreaded applications), performing trim and fastbin consolidation, and will release all fully freed, aligned (4 KB) pages.
https://sourceware.org/git/?p=glibc.git;a=commit;f=malloc/malloc.c;h=68631c8eb92ff38d9da1ae34f6aa048539b199cc
Ulrich Drepper
Sun, 16 Dec 2007 22:53:08 +0000 (22:53 +0000)
- malloc/malloc.c (public_mTRIm): Iterate over all arenas and call mTRIm for all of them.
(mTRIm): Additionally iterate over all free blocks and use madvise
to free memory for all those blocks which contain at least one
memory page.
https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=malloc/malloc.c;h=c54c203cbf1f024e72493546221305b4fd5729b7;hp=1e716089a2b976d120c304ad75dd95c63737ad75;hb=68631c8eb92ff38d9da1ae34f6aa048539b199cc;hpb=52386be756e113f20502f181d780aecc38cbb66a
+ malloc_consolidate (av);
...
+ for (int i = 1; i < NBINS; ++i)
...
+ for (mchunkptr p = last (bin); p != bin; p = p->bk)
+ {
...
+ /* See whether the chunk contains at least one unused page. */
+ char *paligned_mem = (char *) (((uintptr_t) p
+ + sizeof (struct malloc_chunk)
+ + psm1) & ~psm1);
...
+ /* This is the size we could potentially free. */
+ size -= paligned_mem - (char *) p;
+
+ if (size > psm1)
+ {
...
+ madvise (paligned_mem, size & ~psm1, MADV_DONTNEED);
So, calling malloc_trim will release almost all freed memory back to the OS. Only pages that still contain live (not yet freed) data are kept. The OS may or may not unmap a physical page that is madvised with MADV_DONTNEED; Linux usually does unmap it. Madvised pages still count toward VSIZE (the total virtual memory size of the process), but they usually help reduce RSS (the amount of physical memory used by the process).
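Since the document's Python answer below already drives glibc through ctypes, the same approach works here; a minimal sketch, assuming Linux with glibc (the 64 MiB buffer merely stands in for an application's freed data):

```python
import ctypes

# "libc.so.6" assumes Linux with glibc; malloc_trim is glibc-specific.
libc = ctypes.CDLL("libc.so.6")

# Allocate a large buffer through malloc, then drop it so the freed
# pages sit in glibc's arenas instead of going back to the OS.
buf = ctypes.create_string_buffer(64 * 1024 * 1024)  # 64 MiB
del buf

# malloc_trim(0): trim every arena as far as possible.
# Returns 1 if any memory was released to the system, 0 otherwise.
result = libc.malloc_trim(0)
print("malloc_trim released memory:", bool(result))
```

Whether the call returns 1 depends on what the allocator was able to consolidate, so treat the return value as informational rather than guaranteed.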
Alternatively, you can switch to an alternative malloc library: tcmalloc (gperftools / google-perftools) or jemalloc (Facebook). Both have more aggressive policies for returning freed memory to the OS (using madvise with MADV_DONTNEED or even MADV_FREE).
Python3: Give unused interpreter memory back to the OS
It's not Python or NumPy
As already indicated in the comments to the question, the observed effect is not specific to Python (or NumPy, as the memory is actually used for large ndarrays). Instead, it is a feature of the C runtime used, in this case the glibc.
Heap and memory mapping
When memory is requested using malloc() (as done by NumPy when an array is allocated), the runtime decides whether to use the brk syscall for smaller chunks or mmap for larger chunks. sbrk is used to increase the heap size. Allocated heap space can be given back to the OS, but only if there is enough contiguous free space at the top of the heap. This means that even a few bytes belonging to an object that happens to sit at the top of the heap can effectively prevent the process from giving any heap memory back to the OS. The memory is not wasted, as the runtime will reuse other freed space on the heap for subsequent calls to malloc(), but it remains charged to the process and is therefore never reported as available until the process terminates.
Allocating memory pages via mmap() is less efficient, but it has the benefit that such pages can be given back to the OS when no longer needed. The performance hit comes from the fact that the kernel is involved whenever memory pages are mapped or unmapped, especially since the kernel has to zero out newly mapped pages for security reasons.
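Both properties of mmap-backed memory can be observed directly with Python's mmap module, which creates anonymous mappings much like glibc does for large allocations (a sketch; the sizes are arbitrary):

```python
import mmap

PAGE = mmap.PAGESIZE  # typically 4096 bytes on Linux x86-64

# An anonymous private mapping, as glibc creates for large malloc() calls.
size = 16 * 1024 * 1024  # 16 MiB, a whole number of pages
m = mmap.mmap(-1, size)

# Freshly mapped pages read as zeros: the kernel zero-fills them for
# security, which is part of the cost of mmap-based allocation.
assert m[:PAGE] == b"\x00" * PAGE

m[0:5] = b"hello"  # touching pages makes them count toward RSS
m.close()          # munmap: the pages go straight back to the OS
```

After close(), the pages no longer belong to the process at all, which is exactly the behavior heap-top allocations cannot guarantee.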
The mmap threshold
malloc() uses a threshold on the requested amount of memory to decide whether it should use the heap or mmap(). This threshold is dynamic in recent versions of glibc, but it may be changed using the mallopt() function:
M_MMAP_THRESHOLD
[...]
Note: Nowadays, glibc uses a dynamic mmap threshold by default. The initial value of the threshold is 128*1024, but when blocks larger than the current threshold and less than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the threshold is adjusted upwards to the size of the freed block. When dynamic mmap thresholding is in effect, the threshold for trimming the heap is also dynamically adjusted to be twice the dynamic mmap threshold. Dynamic adjustment of the mmap threshold is disabled if any of the M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or M_MMAP_MAX parameters is set.
The threshold can be adjusted in at least two ways:
1. By calling mallopt().
2. By setting the environment variable MALLOC_MMAP_THRESHOLD_ (note the trailing underscore).
Applied to the example code
The example allocates (and deallocates) memory in chunks of 2**24 bytes, or 16 MiB. According to the theory, a fixed MMAP_THRESHOLD somewhat below this value should therefore ensure that all large arrays are allocated using mmap(), allowing them to be unmapped and given back to the OS.
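The question's test_mem.py script is not reproduced here; a hypothetical stand-in that follows the same pattern might look like the code below (it allocates 16 MiB blocks, computes a throwaway result, frees everything, and reads available memory from /proc/meminfo instead of psutil to stay dependency-free; block count and output format are assumptions):

```python
#!/usr/bin/env python3
# Hypothetical stand-in for test_mem.py: allocate many 16 MiB blocks,
# free them, and report how much memory the OS considers available.

def available_gib():
    # Parse MemAvailable from /proc/meminfo (Linux-specific), in GiB.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 ** 2)  # kB -> GiB
    return float("nan")

def iteration(n_blocks=16):
    blocks = [bytearray(2 ** 24) for _ in range(n_blocks)]  # 16 MiB each
    result = sum(len(b) for b in blocks) // 2 ** 24         # dummy work
    print(f"Result: {result}")
    del blocks                                              # free everything

for i in range(3):
    print(f"Iteration {i}")
    print(f"Available memory: {available_gib():.2f} GiB")
    iteration()
    print(f"Available memory: {available_gib():.2f} GiB")
    print(f"Iteration {i} ends")
print("Program done.")
```

Each bytearray goes through the same glibc malloc() path as a NumPy ndarray of the same size, so the mmap-threshold effect described above applies equally.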
First a run without modification:
$ ./test_mem.py
Iteration 0
Available memory: 21.45 GiB
Available memory: 2.17 GiB
Result: 1235
Available memory: 21.50 GiB
Iteration 0 ends
Iteration 1
Available memory: 21.50 GiB
Available memory: 2.13 GiB
Result: 1238
Available memory: 3.95 GiB
Iteration 1 ends
Iteration 2
Available memory: 4.02 GiB
Available memory: 4.02 GiB
Result: 232
Available memory: 4.02 GiB
Iteration 2 ends
Program done.
Available memory: 4.02 GiB
The memory is not returned in iterations 1 and 2.
Let's set a fixed threshold of 1MiB now:
$ MALLOC_MMAP_THRESHOLD_=1048576 ./test_mem.py
Iteration 0
Available memory: 21.55 GiB
Available memory: 2.13 GiB
Result: 1241
Available memory: 21.52 GiB
Iteration 0 ends
Iteration 1
Available memory: 21.52 GiB
Available memory: 2.11 GiB
Result: 1240
Available memory: 21.52 GiB
Iteration 1 ends
Iteration 2
Available memory: 21.51 GiB
Available memory: 2.12 GiB
Result: 1239
Available memory: 21.53 GiB
Iteration 2 ends
Program done.
Available memory: 21.53 GiB
As can be seen, the memory is successfully given back to the OS in all three iterations.
As an alternative, the setting can also be integrated into the Python script by a call to mallopt()
using the ctypes module:
#!/usr/bin/env python3
import ctypes
import psutil
import time
import queue
import numpy as np
libc = ctypes.cdll.LoadLibrary("libc.so.6")
M_MMAP_THRESHOLD = -3
# Set malloc mmap threshold.
libc.mallopt(M_MMAP_THRESHOLD, 2**20)
# ...
Disclaimer: These solutions/workarounds are far from being platform-independent, as they make use of specific glibc features.
Note
The above text mainly answers the more important second question, "How can I force the interpreter to give back reserved memory back to the OS, so that the reported available memory actually increases?". As for the first question, "How can I query the interpreter's memory management to find out how much memory is used by referenced objects and how much is just reserved for future use and not given back to the OS?", I was not able to find a satisfactory answer. Calling malloc_stats():
libc = ctypes.cdll.LoadLibrary("libc.so.6")
# ... script here ...
libc.malloc_stats()
gives some numbers, but those results:
Arena 0:
system bytes = 1632264192
in use bytes = 4629984
Total (incl. mmap):
system bytes = 1632858112
in use bytes = 5223904
max mmap regions = 1236
max mmap bytes = 20725514240
for a script run without changing the mmap threshold seem a bit confusing to me. 5 MiB could be the actual used memory when the script ends, but what about the "system bytes"? The process still uses almost 20 GiB at this time, so the indicated 1.6 GiB somehow doesn't fit into the picture at all.
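A related introspection option is glibc's malloc_info(3) (glibc >= 2.10), which writes a per-arena XML report including total and in-use sizes. A sketch that calls it via ctypes, routing the output through a temporary file because the function takes a FILE* (glibc-specific, not portable):

```python
import ctypes
import os
import tempfile

libc = ctypes.CDLL("libc.so.6")  # assumes Linux with glibc

# Declare the C signatures so pointer sizes are handled correctly.
libc.fopen.restype = ctypes.c_void_p
libc.fopen.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.malloc_info.argtypes = [ctypes.c_int, ctypes.c_void_p]
libc.fclose.argtypes = [ctypes.c_void_p]

path = os.path.join(tempfile.mkdtemp(), "malloc_info.xml")
fp = libc.fopen(path.encode(), b"w")   # FILE* opened by libc itself
assert fp, "fopen failed"
rc = libc.malloc_info(0, fp)           # 0 on success
libc.fclose(fp)                        # flush and close the report

with open(path) as f:
    report = f.read()
print(report[:200])  # XML starting with something like <malloc version="...">
```

The per-arena <total> and <system> elements in the XML distinguish memory reserved from the OS from memory handed out to the application, which is closer to what the first question asks for than malloc_stats() output.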
See also
allocated memory cache for numpy
Reduce memory fragmentation with MALLOC_MMAP_THRESHOLD_ and MALLOC_MMAP_MAX_
glibc allocator doesn't release all free()ed memory
Releasing memory in Python
Freeing Memory Allocated with malloc
Tips of malloc & free
When should a free() implementation give memory back to the OS?
Asking the OS for memory, or returning it, is a (relatively) expensive operation, because it requires a context switch between user and kernel mode and back. For that reason, most implementations of malloc only ask the OS for large chunks, allocate internally from those chunks, and manage freed memory with a free-block list. In that case, memory is only returned to the OS when a complete chunk ends up on the free list.
For a custom implementation, the rule for returning memory to the system is up to the programmer (you...).
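The chunk-plus-free-list idea can be illustrated with a toy pool allocator (a sketch, not a real malloc): it hands out fixed-size blocks from one large chunk and only releases the whole chunk once every block has been freed, which mirrors why a single live allocation can pin an entire chunk in the process:

```python
class Pool:
    """Toy allocator: one big chunk, fixed-size blocks, a free list.

    The chunk is only released (here: dropped for the garbage collector,
    standing in for returning it to the OS) once every block is freed.
    One live block keeps the whole chunk resident. Single-use: once the
    chunk is released, the pool must not be used again.
    """

    def __init__(self, block_size=4096, n_blocks=256):
        self.block_size = block_size
        self.chunk = bytearray(block_size * n_blocks)  # one big "chunk"
        self.free_list = list(range(n_blocks))         # all blocks free
        self.in_use = 0

    def alloc(self):
        if not self.free_list:
            raise MemoryError("pool exhausted")
        self.in_use += 1
        return self.free_list.pop()                    # block index

    def free(self, block):
        self.free_list.append(block)
        self.in_use -= 1
        if self.in_use == 0:
            self.chunk = bytearray(0)  # whole chunk free: release it

pool = Pool()
a, b = pool.alloc(), pool.alloc()
pool.free(a)
print(len(pool.chunk))  # 1048576: chunk still held because b is live
pool.free(b)
print(len(pool.chunk))  # 0: whole chunk released
```

Real allocators refine this with multiple chunk sizes, per-thread arenas, and partial trimming (as malloc_trim does above), but the pinning behavior is the same.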
Java VM - does the freed memory return to the OS?
That depends on the JVM implementation and isn't specified in the specification.
The Sun JVM will hold onto it as a first step. Once a certain (configurable) percentage of allocated memory is unused, it will return some of it to the OS (the behavior is influenced by the MinHeapFreeRatio and MaxHeapFreeRatio settings).