Forcing Garbage Collection to Run in R With the gc() Command

Forcing garbage collection to run in R with the gc() command

"Probably." I do it too, and often even in a loop as in

cleanMem <- function(n=10) { for (i in 1:n) gc() }

Yet that does not, in my experience, restore memory to a pristine state.

So what I usually do is keep the tasks at hand in script files and execute those using the 'r' frontend (on Unix, from the 'littler' package). Rscript is an alternative on Windows.

That workflow happens to agree with

  • workflow-for-statistical-analysis-and-report-writing
  • tricks-to-manage-the-available-memory-in-an-r-session

which we covered here before.
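For what it's worth, the same fresh-process idea can be sketched from inside R itself. This is only an illustration, not the workflow described above verbatim: the script contents and the names script, exe, rscript and out are made up here, and it assumes the Rscript binary sits in R.home("bin").

```r
# Hypothetical task script, written to a temporary file for illustration only.
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- rnorm(1e6)   # memory allocated here belongs to the child process",
  "cat('done\\n')"
), script)

# Assumption: Rscript lives in R's own bin directory (with .exe on Windows).
exe <- if (.Platform$OS.type == "windows") "Rscript.exe" else "Rscript"
rscript <- file.path(R.home("bin"), exe)

# Run the task in a separate R process. Whatever that process allocated is
# handed back to the operating system when it exits, with no gc() involved.
out <- system2(rscript, c("--vanilla", shQuote(script)), stdout = TRUE)
print(out)
```

The point of the design is that the parent session never holds the large objects at all, so there is nothing to clean up afterwards.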

Garbage Collection in R

Calling gc() is largely pointless, as R calls it automatically when more memory is needed. The only reason I can think of for calling gc() explicitly is if another program needs memory that R is hogging.

What is the difference between gc() and rm()

First, it is important to note that the two are very different: gc() does not delete any variables that you are still using. It only frees the memory of objects that you no longer have access to (whether removed using rm() or, say, created in a function that has since returned). Running gc() will never make you lose variables.
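A small sketch of that distinction, with made-up names (x, make_sum, tmp, s) chosen only for illustration:

```r
x <- runif(1e6)        # still bound to a name, so still reachable
invisible(gc())        # force a collection
stopifnot(exists("x"), length(x) == 1e6)   # x survived untouched

make_sum <- function() {
  tmp <- runif(1e6)    # local to this call
  sum(tmp)
}
s <- make_sum()        # tmp is unreachable once the call has returned
invisible(gc())        # only unreachable memory, such as tmp, can be reclaimed
stopifnot(is.numeric(s), exists("x"))      # x is still there
```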

The question of whether you should call gc() after calling rm(), though, is a good one. The documentation for gc helpfully notes:

A call of gc causes a garbage collection to take place. This will also take place automatically without user intervention, and the primary purpose of calling gc is for the report on memory usage.

However, it can be useful to call gc after a large object has been removed, as this may prompt R to return memory to the operating system.

So the answer is that it can be good to call gc() (and at the very least, can't hurt), even though it would likely be triggered anyway (if not right away, then soon).
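Here is a rough way to watch this happen. It is a sketch that assumes the second column of the matrix returned by gc() is the "used (Mb)" column, as it is in current R versions. The helper used_mb and the variable names are made up for this example. Note that it reports R's own heap accounting, which is not necessarily what the operating system shows for the process.

```r
# Hypothetical helper: Mb currently in use (Ncells + Vcells) according to gc().
# Calling it also forces a collection, so each reading is taken on a clean heap.
used_mb <- function() sum(gc()[, 2])

before <- used_mb()
big <- numeric(1e7)    # 1e7 doubles at 8 bytes each, roughly 76 Mb
during <- used_mb()    # big is still reachable, so it is counted
rm(big)                # drop the only reference
after <- used_mb()     # the collection inside used_mb() reclaims it

cat("before:", before, "during:", during, "after:", after, "\n")
```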

Speed up garbage collection in lapply

Not really an answer, but longer than a comment. Ben, this

fun0 = function(x) sum(x, gc())

defines a function that calculates the sum of "x and the value returned by gc()". This

fun1 = function(x) sum(x); gc()

defines a function that returns the sum of x. gc() is run after the function is defined, but is not part of the function definition.

fun2 = function(x) {
result = sum(x)
gc()
result
}

defines a function that calculates the sum of x and saves it to a variable result that exists inside the function. It then evaluates the function gc(). It then returns the value contained in result, i.e., the sum of x. It's worth comparing results in addition to times:

test_case = 1:5
identical(sum(test_case), fun0(test_case)) # FALSE
identical(sum(test_case), fun1(test_case)) # TRUE, but no garbage collection
identical(sum(test_case), fun2(test_case)) # TRUE

Invoking gc() in fun2 doesn't really accomplish anything after the first time fun2 is evaluated. There is no memory that has been allocated but is no longer associated with a symbol, so there is no garbage to collect. Here's a case where we allocate some memory, use it, remove a reference to it, and then run the garbage collector to release the memory.

fun3 = function(x) {
m = rnorm(length(x))
result = sum(m * x)
rm(m)
gc()
result
}

BUT EXPLICIT GARBAGE COLLECTION DOES NOT DO ANYTHING USEFUL HERE -- the garbage collector automatically runs when R needs more memory than it has available. If fun3 has been invoked several times, then there will be memory used inside each invocation that is no longer referenced by a symbol, and hence will be collected when the garbage collector runs automatically. By invoking gc() directly, you're asserting that your naive garbage collection strategy (do it all the time) is better than R's (do it when more memory is needed).

One might be able to do that (write a better garbage collector), but that isn't the case here.
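To get a feel for what the explicit call costs, one can time the same work with and without it. This is a sketch only: with_gc and without_gc are made-up names, the repetition count of 50 is arbitrary, and the actual timings depend on the machine and on how much is loaded in the session.

```r
# Same computation as fun3 above, once with an explicit gc() and once without.
with_gc <- function(x) {
  m <- rnorm(length(x))
  result <- sum(m * x)
  rm(m)
  gc()                 # explicit full collection on every call
  result
}
without_gc <- function(x) {
  m <- rnorm(length(x))
  sum(m * x)           # m becomes garbage on return; R collects it when needed
}

x <- 1:1000

# With the same seed, both versions return the same value.
set.seed(1); a <- with_gc(x)
set.seed(1); b <- without_gc(x)
print(identical(a, b))

# Compare elapsed time over repeated calls.
t_with <- system.time(for (i in 1:50) with_gc(x))[["elapsed"]]
t_without <- system.time(for (i in 1:50) without_gc(x))[["elapsed"]]
cat("with gc():", t_with, "s; without:", t_without, "s\n")
```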

I mentioned that it often pays when confronted with performance or memory issues to step back and look at your algorithm and implementation. I know this is a 'toy' example, but let's look anyway. What you're calculating is the cumulative sum of the elements of x. I'd have written your implementation as

fun4 = function(i, x) sum(x[seq_len(i)])
sapply(seq_along(test_case), fun4, test_case)

which gives

> x0 <- sapply(seq_along(test_case), fun4, test_case)
> x0
[1] 1 3 6 10 15

But R has a function cumsum that does this more efficiently in terms of both memory and speed.

> x1 <- cumsum(test_case)
> identical(x0, x1)
[1] TRUE
> test_case = seq_len(10000)
> system.time(x0 <- sapply(seq_along(test_case), fun4, test_case))
   user  system elapsed
  2.508   0.000   2.517
> system.time(x1 <- cumsum(test_case))
   user  system elapsed
  0.004   0.000   0.002

How to force garbage collector to run?

In .NET (not R), System.GC.Collect() forces the garbage collector to run. This is not recommended, but it can be used if the situation calls for it. The R counterpart is gc().

Why does R's gc() function report higher memory usage than Windows Task Manager

Assuming that Windows' native Task Manager only shows physical RAM statistics, it is pretty likely (and in my experience it really is the case) that the rest of the memory used by R (your "missing" 5-6 Gb) has been moved to the swap file by Windows, which makes it very slow. You can check this yourself with, e.g., Process Explorer, which I use and which also shows virtual memory (including the part on disk).

Allocation fails before physical RAM is completely exhausted, presumably to protect the system from crashing. In my experience Windows doesn't swap R memory at all; at some point its limit is simply reached and you get an error such as Error: cannot allocate vector of size 200 Mb (see also this question).

I guess you'd like to free memory with gc(), though whether calling it explicitly is useful is debated.

If you have no other machine with more RAM and don't want to upgrade your laptop's RAM, you could look into cloud computing.

I hope this might help you.


