Release Memory in R

How to clean up R memory without needing to restart the R session

Garbage collection is "complicated". If x is a variable bound in an environment e, then rm(x, pos = e); gc() does not necessarily free object.size(e$x) bytes for use by the OS.

That is because R objects are just pointers to blocks of memory. If multiple objects point to the same block of memory, then you need to remove all of them to make that memory available for garbage collection. That can be hard to do if your global environment binds a large number of variables—possibly recursively, if you make frequent use of lists (including data frames), pairlists, and environments (including function evaluation environments).
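
You can observe the sharing directly with tracemem, which returns the address of the block backing an object. (A minimal sketch; it assumes an R build with memory profiling enabled, which is the default for CRAN binaries.)

## 'y <- x' copies the pointer, not the data, so both names
## report the same address
x <- double(1e+06)
y <- x
identical(tracemem(x), tracemem(y))
## [1] TRUE
untracemem(x)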

Here is an example, which I've run on a machine with 8 GB RAM running Ubuntu 20.04. (It should be reproducible on most Unix-alikes, but not on Windows due to the Unix command in the system call.)

$ R --vanilla
## Force garbage collection then output the amount of memory
## being used by R, as seen by R ('gc') and by the OS ('ps')
usage <- function() {
    m1 <- sum(gc(FALSE)[, "(Mb)"])
    m2 <- as.double(system(paste("ps -p", Sys.getpid(), "-o pmem="), intern = TRUE))
    c(`gc (MiB)` = m1, `ps (%)` = m2)
}
usage()
## gc (MiB) ps (%)
## 19.0 0.6

## Allocate a large block of memory and create multiple
## references to it
x <- double(1e+08)
y <- x
l <- list(x = x)
p <- pairlist(x = x)
e <- new.env(); e$x <- x
f <- (function(x) {force(x); function(x) x})(x)

usage()
## gc (MiB) ps (%)
## 786.1 10.3

## Apply 'object.size' to each object in the current environment
## and scale from bytes to mebibytes
0x1p-20 * unlist(eapply(environment(), object.size))
##            x            y        usage            e            f            l            p
## 7.629395e+02 7.629395e+02 1.787567e-02 5.340576e-05 1.106262e-03 7.629398e+02 7.629396e+02

## Remove references to 'double(1e+08)' one by one
rm(x); usage()
## gc (MiB) ps (%)
## 786.1 10.3

rm(y); usage()
## gc (MiB) ps (%)
## 786.1 10.3

l$x <- NULL; usage()
## gc (MiB) ps (%)
## 786.1 10.3

p$x <- NULL; usage()
## gc (MiB) ps (%)
## 786.1 10.3

rm(x, pos = e); usage()
## gc (MiB) ps (%)
## 786.1 10.3

rm(x, pos = environment(f)); usage()
## gc (MiB) ps (%)
## 23.2 0.6

This example shows that object.size is not a reliable means of determining what variables you need to remove in order to return a certain block of memory to the OS. To actually free the ~760 MiB (~800 MB) allocated for double(1e+08), it was necessary to remove six references: x, y, l$x, p$x, e$x, and environment(f)$x.
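
(An aside, assuming you can install a package: lobstr::obj_size counts a shared block only once, unlike utils::object.size, which makes this kind of sharing visible.)

## install.packages("lobstr")
x <- double(1e+08)
y <- x
lobstr::obj_size(x, y)           # ~800 MB: the shared block is counted once
object.size(x) + object.size(y)  # ~1.6 GB: the same block is counted twice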

Your observation that gc appears to do nothing only in long-running R processes with many variables bound in the global environment makes me suspect that you have removed some but not all references to the blocks of memory that you are trying to free. I wouldn't jump to the conclusion that the garbage collector is behaving incorrectly, especially without a minimal reproducible example.

That said...

Issues with memory deallocation on Linux have been discussed on the R-devel mailing list and on Bugzilla. It is even covered in the R FAQ. Here are the most relevant links:

  1. Why is R apparently not releasing memory? R FAQ 7.42
  2. Help to create bugzilla account, R-devel [1] ... very poorly titled
  3. Issue with memory deallocation/fragmentation on systems which use glibc, R-devel [2], [3]
  4. R doesn't release memory to the system, BR 14611
  5. glibc malloc doesn't release memory to the system, BR 17505

To summarize, it turns out that there is an issue on Linux, but it is due to a limitation of glibc that is outside of R's control. Specifically, when glibc allocates then deallocates many small blocks of memory, you can end up with a fragmented heap from which the OS is unable to reclaim unused memory.

Minimal reproducible example

We can reproduce the issue in R by creating a long list of short atomic vectors, rather than one very long atomic vector:

$ R --vanilla
usage <- function() {
    m1 <- sum(gc(FALSE)[, "(Mb)"])
    m2 <- as.double(system(paste("ps -p", Sys.getpid(), "-o pmem="), intern = TRUE))
    c(`gc (MiB)` = m1, `ps (%)` = m2)
}
usage()
## gc (MiB) ps (%)
## 19.0 0.6

x <- replicate(1e+06, runif(100), simplify = FALSE)
usage()
## gc (MiB) ps (%)
## 847.1 15.9

rm(x)
usage()
## gc (MiB) ps (%)
## 23.2 15.8

Indeed, the OS is unable to reclaim most of the memory that was occupied by x and its elements. It continues to reserve ~15% of RAM for the R process, even though only ~23 MiB of that memory is used.

(That is on my Linux machine. On my Mac, which has twice as much RAM, the percentage memory used as reported by the OS changes from 0.4 to 6.2 to 1.2.)

Possible fixes

A few work-arounds were suggested in the mailing list threads:

  1. Set environment variables to tune the behaviour of glibc. No advice or example was provided, so you'll have to do a deep dive to figure this out. You might start with the mallopt man-page (a hedged sketch follows this list).

  2. Instruct R to use an allocator other than glibc's malloc, such as jemalloc or tcmalloc. Luke Tierney wrote:

    ... it is possible to use alternate malloc implementations, either rebuilding R to use them or using LD_PRELOAD. On Ubuntu for example, you can have R use jemalloc with

    sudo apt-get install libjemalloc1
    env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 R

    This does not seem to hold onto memory to the same degree, but I don't know about any other aspect of its performance.

  3. Explicitly call the glibc utility malloc_trim to instruct the OS to reclaim unused memory where possible. The malloc_trim man-page says:

    Since glibc 2.8 this function frees memory in all arenas and in all chunks with whole free pages.

    which seems promising!
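
Regarding point 1, here is an untested sketch: the mallopt(3) man-page documents tuning variables such as MALLOC_TRIM_THRESHOLD_ and MALLOC_ARENA_MAX, so one could ask glibc to trim eagerly and cap the number of arenas (whether this actually helps is workload-dependent):

$ env MALLOC_TRIM_THRESHOLD_=0 MALLOC_ARENA_MAX=2 R --vanilla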

Dmitry Selivanov compared malloc, jemalloc, tcmalloc, and malloc+malloc_trim here. They showed convincingly that all of jemalloc, tcmalloc, and malloc+malloc_trim can help mitigate fragmentation issues seen with malloc. Some caveats:

  • They only tested on Ubuntu 16.04.
  • They didn't share what versions of glibc, libjemalloc1, and libtcmalloc-minimal4 they had installed.
  • They showed that none of the malloc alternatives is a panacea. They rarely performed worse than malloc, but they did not always perform better.

Some experiments

I retried the above replicate example using each of the malloc alternatives in turn. In this (nongeneralizable) experiment, jemalloc and tcmalloc did not perform much better than malloc, while malloc+malloc_trim allowed the OS to reclaim all deallocated memory. Here are the libraries that I used:

libc6                 version 2.31-0ubuntu9.2
libjemalloc2          version 5.2.1-1ubuntu1
libtcmalloc-minimal4  version 2.7-1ubuntu2

See below for results.

jemalloc

$ sudo apt install libjemalloc2
$ env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 R --vanilla
usage <- function() {
    m1 <- sum(gc(FALSE)[, "(Mb)"])
    m2 <- as.double(system(paste("ps -p", Sys.getpid(), "-o pmem="), intern = TRUE))
    c(`gc (MiB)` = m1, `ps (%)` = m2)
}
usage()
## gc (MiB) ps (%)
## 19.0 0.6

x <- replicate(1e+06, runif(100), simplify = FALSE)
usage()
## gc (MiB) ps (%)
## 847.1 13.9

rm(x)
usage()
## gc (MiB) ps (%)
## 23.2 9.4

tcmalloc

$ sudo apt install libtcmalloc-minimal4
$ env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 R --vanilla
usage <- function() {
    m1 <- sum(gc(FALSE)[, "(Mb)"])
    m2 <- as.double(system(paste("ps -p", Sys.getpid(), "-o pmem="), intern = TRUE))
    c(`gc (MiB)` = m1, `ps (%)` = m2)
}
usage()
## gc (MiB) ps (%)
## 19.0 0.7

x <- replicate(1e+06, runif(100), simplify = FALSE)
usage()
## gc (MiB) ps (%)
## 847.1 13.8

rm(x)
usage()
## gc (MiB) ps (%)
## 23.2 13.8

malloc+malloc_trim, via Simon Urbanek's mallinfo::malloc.trim

$ R --vanilla
usage <- function() {
    m1 <- sum(gc(FALSE)[, "(Mb)"])
    m2 <- as.double(system(paste("ps -p", Sys.getpid(), "-o pmem="), intern = TRUE))
    c(`gc (MiB)` = m1, `ps (%)` = m2)
}
usage()
## gc (MiB) ps (%)
## 19.0 0.7

x <- replicate(1e+06, runif(100), simplify = FALSE)
usage()
## gc (MiB) ps (%)
## 847.1 15.9

rm(x)
usage()
## gc (MiB) ps (%)
## 23.2 15.8

## install.packages("mallinfo", repos = "http://www.rforge.net/")
mallinfo::malloc.trim(0L)
usage()
## gc (MiB) ps (%)
## 23.2 0.6

How do I clean up R memory without restarting my PC?

Maybe you can try the function gc(). A call to gc() causes a garbage collection to take place, and it can be useful after a large object has been removed, as this may prompt R to return memory to the operating system. gc() also returns a summary of the memory in use.
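
A minimal illustration:

x <- numeric(1e+08)  # allocate ~763 MiB
rm(x)                # drop the only reference
gc()                 # collect, and print a summary of memory in use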

clear memory allocated by R session (gc() doesnt help !)

The best solution I found is restarting the R session. In RStudio: Ctrl+Shift+F10.

And if you don't want to save the workspace:

## Create an active binding 'refresh' that launches a fresh R process
## and quits the current one (the path 'bin/i386/R' is platform-specific)
makeActiveBinding("refresh", function() {
    system(paste0(R.home(), "/bin/i386/R --no-save"))  # --save would save the workspace
    q("no")
}, .GlobalEnv)
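
With this binding in place, simply typing refresh at the prompt evaluates the binding, starting the new process and quitting the current session.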

cheers.

Delete global variable and release memory from a function

I have found some code that works:

library(data.table)
DT = data.table(col1 = 1:1e6)
cols = paste0('col', 2:100)
for (col in cols) DT[, (col) := 1:1e6]  # same effect as the deprecated 'with = FALSE' form

rm_and_release <- function() {
    dt <- copy(DT)
    dt <- dt[sample(1e6, 9e5, FALSE)]
    print(gc())
    rm(DT, envir = globalenv())
    print(gc())
}

rm_and_release()

It results in

           used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   865272  46.3    1442291   77.1   1280599   68.4
Vcells 96733883 738.1  167167064 1275.4 147681076 1126.8
           used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   865173  46.3    1442291   77.1   1280599   68.4
Vcells 46731629 356.6  133733651 1020.4 147681076 1126.8

I think it's ugly not passing DT as an argument to the function, but at least in this scenario the memory use is lowered from 738 Mb to 356 Mb, which is crucial for what I'm doing.
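
If the reliance on the global variable bothers you, here is an untested variant (a sketch; rm_and_release2 is a hypothetical name) that passes the variable's name, so the function can still remove the caller's binding:

rm_and_release2 <- function(name, envir = globalenv()) {
    dt <- copy(get(name, envir = envir))  # deep copy of the caller's table
    dt <- dt[sample(1e6, 9e5, FALSE)]
    rm(list = name, envir = envir)        # drop the original binding
    gc()                                  # prompt R to return memory to the OS
    dt
}
DT2 <- rm_and_release2("DT")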

Why does gc() not free memory?

How do you check memory usage? Normally a virtual machine allocates some chunk of memory that it uses to store its data. Some of that allocation may be unused and marked as free. What GC does is discover data that is no longer referenced from anywhere and mark the corresponding chunks of memory as unused; this does not mean that the memory is released to the OS. From the VM's perspective, though, there is now more free memory available for further computation.

As others have asked: did you experience out-of-memory errors? If not, then there's nothing to worry about.

EDIT:
This and this should be enough to understand how memory allocation and garbage collection work in R.

From the first document:

Occasionally an attempt is made to release unused pages back to the
operating system. When pages are released, a number of free nodes
equal to R_MaxKeepFrac times the number of allocated nodes for each
class is retained. Pages not needed to meet this requirement are
released. An attempt to release pages is made every R_PageReleaseFreq level 1
or level 2 collections.
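
In practice, a single gc() call will not necessarily coincide with a release attempt, since pages are released only every R_PageReleaseFreq level 1 or level 2 collections. Calling gc() a few times in a row is a cheap, though unguaranteed, heuristic:

## Give the collector several chances to reach a page-release cycle
for (i in 1:3) invisible(gc(FALSE))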

EDIT2:

To see the memory used, try running gc() with verbose set to TRUE:

gc(verbose = TRUE)

Here's a result with an array of 10'000'000 integers in memory:

Garbage collection 9 = 1+0+8 (level 2) ... 
10.7 Mbytes of cons cells used (49%)
40.6 Mbytes of vectors used (72%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  198838 10.7     407500 21.8   350000 18.7
Vcells 5311050 40.6    7421749 56.7  5311504 40.6

And here's the result after discarding the reference to it:

Garbage collection 10 = 1+0+9 (level 2) ... 
10.7 Mbytes of cons cells used (49%)
2.4 Mbytes of vectors used (5%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  198821 10.7     407500 21.8   350000 18.7
Vcells  310987  2.4    5937399 45.3  5311504 40.6

As you can see, the memory used by Vcells fell from 40.6 Mb to 2.4 Mb.

Release memory in R

  1. The command to release memory is gc(); it will show up as a change in memory consumption in Task Manager as well. You probably don't actually need to call it to make the memory available again, though I'm not completely sure about that. All I know is that Task Manager is not reliable in these cases; there is specialized software that measures this better.

  2. I have no idea what ORE is. Do you have an actual problem where memory is the limiting factor, or are you just trying to get a feel for how these things work in R?


