Forcing garbage collection to run in R with the gc() command
"Probably." I do it too, and often even in a loop as in
cleanMem <- function(n=10) { for (i in 1:n) gc() }
Yet that does not, in my experience, restore memory to a pristine state.
So what I usually do is keep the tasks at hand in script files and execute those using the 'r' frontend (on Unix, from the 'littler' package). Rscript is an alternative, including on Windows.
That workflow happens to agree with
- workflow-for-statistical-analysis-and-report-writing
- tricks-to-manage-the-available-memory-in-an-r-session
which we covered here before.
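The script-per-task workflow can also be driven from an interactive session. Here is a minimal sketch, assuming Rscript is installed alongside the running R (it ships with standard R builds). The expression passed with -e is only a hypothetical stand-in for a real script file. Each call starts a fresh R process, and all of its memory goes back to the operating system when that process exits.

```r
# Path to the Rscript binary that belongs to the running R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Hypothetical stand-in for a real task script: compute a sum and print it.
task <- "cat(sum(1:10))"

# Run the task in a separate, short-lived R process and capture its output.
# --vanilla skips profiles and saved workspaces so the child starts clean.
out <- system2(rscript, c("--vanilla", "-e", shQuote(task)), stdout = TRUE)
print(out)
```

In practice the -e expression would be replaced by the path to the script file for the task.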
Garbage Collection in R
Calling gc() is largely pointless, as R calls it automatically when more memory is needed. The only reason I can think of for calling gc() explicitly is if another program needs memory that R is hogging.
What is the difference between gc() and rm()
First, it is important to note that the two are very different in that gc() does not delete any variables that you are still using; it only frees up the memory for ones that you no longer have access to (whether removed using rm() or, say, created in a function that has since returned). Running gc() will never make you lose variables.
The question of whether you should call gc() after calling rm(), though, is a good one. The documentation for gc helpfully notes:
A call of gc causes a garbage collection to take place. This will also take place automatically without user intervention, and the primary purpose of calling gc is for the report on memory usage.
However, it can be useful to call gc after a large object has been removed, as this may prompt R to return memory to the operating system.
So the answer is that it can be good to call gc() (and at the very least, can't hurt), even though it would likely be triggered anyway (if not right away, then soon).
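The difference can be seen in the report that gc() returns. The following is a rough sketch, not a benchmark: vcells_mb is a hypothetical helper that reads the "used (Mb)" figure for vector cells, and the exact numbers will vary between sessions.

```r
# Megabytes currently used by vector cells, taken from gc()'s report.
vcells_mb <- function() gc()["Vcells", 2]

keep <- c(1, 2, 3)         # a variable we are still using
baseline <- vcells_mb()

big <- numeric(1e7)        # roughly 76 MB of doubles
with_big <- vcells_mb()    # big is still referenced, so gc() cannot free it

rm(big)                    # drop the only reference to the vector
after_rm <- vcells_mb()    # this gc() call can now reclaim the memory

cat("baseline:", baseline, "with big:", with_big, "after rm:", after_rm, "\n")
```

Here rm() only removes the name; the memory shows up as freed in the gc() report that follows. The variable keep is untouched throughout.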
Speed up garbage collection in lapply
Not really an answer, but longer than a comment. Ben, this
fun0 = function(x) sum(x, gc())
defines a function that calculates the sum of "x and the value returned by gc()". This
fun1 = function(x) sum(x); gc()
defines a function that returns the sum of x. gc() is run after the function is defined, but is not part of the function definition.
fun2 = function(x) {
    result = sum(x)
    gc()
    result
}
defines a function that calculates the sum of x and saves it to a variable result that exists inside the function. It then evaluates the function gc(). It then returns the value contained in result, i.e., the sum of x. It's worth comparing results in addition to times:
test_case = 1:5
identical(sum(test_case), fun0(test_case)) # FALSE
identical(sum(test_case), fun1(test_case)) # TRUE, but no garbage collection
identical(sum(test_case), fun2(test_case)) # TRUE
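The fun1 pitfall comes down to how R parses the semicolon. Here is a small sketch that uses parse() to count the top-level expressions on that line; the names exprs, second and fun1_body are only for illustration.

```r
# The semicolon ends the assignment, so the line holds two expressions.
exprs <- parse(text = "fun1 = function(x) sum(x); gc()")
n_exprs <- length(exprs)

# The second expression is the stand-alone call to gc().
second <- deparse(exprs[[2]])

# Define fun1 as the first expression does and inspect its body.
fun1 <- function(x) sum(x)
fun1_body <- deparse(body(fun1))

cat(n_exprs, second, fun1_body, sep = "\n")
```

The body of fun1 contains only sum(x), which is why calling it never triggers a collection.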
Invoking gc() in fun2 doesn't really accomplish anything after the first time fun2 is evaluated. There is no memory that has been allocated but is no longer associated with a symbol, so there is no garbage to collect. Here's a case where we allocate some memory, use it, remove a reference to it, and then run the garbage collector to release the memory.
fun3 = function(x) {
    m = rnorm(length(x))
    result = sum(m * x)
    rm(m)
    gc()
    result
}
BUT EXPLICIT GARBAGE COLLECTION DOES NOT DO ANYTHING USEFUL HERE -- the garbage collector automatically runs when R needs more memory than it has available. If fun3 has been invoked several times, then there will be memory used inside each invocation that is no longer referenced by a symbol, and hence will be collected when the garbage collector runs automatically. By invoking gc() directly, you're asserting that your naive garbage collection strategy (do it all the time) is better than R's (do it when more memory is needed).
One might be able to do that (write a better garbage collector), but that isn't the case here.
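To make "memory no longer referenced by a symbol" concrete, here is a minimal sketch using reg.finalizer(), which asks R to call a function when an object is collected. The environment e and the flag collected are hypothetical names for illustration. gc() is called explicitly only so that the timing is predictable; an automatic collection would reclaim the object in the same way at some later point.

```r
collected <- FALSE

# An environment stands in for any object that occupies memory.
e <- new.env()

# Ask R to run this function when e is garbage collected.
reg.finalizer(e, function(obj) collected <<- TRUE)

rm(e)            # the object is now unreachable: it has become garbage
invisible(gc())  # a collection reclaims it and runs the finalizer

print(collected)
```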
I mentioned that it often pays when confronted with performance or memory issues to step back and look at your algorithm and implementation. I know this is a 'toy' example, but let's look anyway. What you're calculating is the cumulative sum of the elements of x. I'd have written your implementation as
fun4 = function(i, x) sum(x[seq_len(i)])
sapply(seq_along(test_case), fun4, test_case)
which gives
> x0 <- sapply(seq_along(test_case), fun4, test_case)
> x0
[1] 1 3 6 10 15
But R has a function cumsum that does this more efficiently in terms of both memory and speed.
> x1 <- cumsum(test_case)
> identical(x0, x1)
[1] TRUE
> test_case = seq_len(10000)
> system.time(x0 <- sapply(seq_along(test_case), fun4, test_case))
user system elapsed
2.508 0.000 2.517
> system.time(x1 <- cumsum(test_case))
user system elapsed
0.004 0.000 0.002
How to force garbage collector to run?
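System.GC.Collect()
forces the garbage collector to run, but this is the .NET call rather than an R function; in R the equivalent is gc(). Forcing a collection is not recommended, but it can be used if the situation calls for it.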
Why does R's gc() function report higher memory usage than Windows Task Manager
Assuming that Windows' native Task Manager only shows physical RAM statistics, it is quite likely (and in my experience it really is the case) that the rest of the memory used by R (your "missing" 5-6 GB) has been moved to the swap file by Windows, which is then really slow. You could check this yourself, e.g. with Process Explorer, which I use and which also shows virtual memory (including the part on disk).
The allocation limit is reached before RAM is fully used, presumably to protect the system from crashing. In my experience Windows doesn't swap R memory at all; at some point its limit is simply reached and you get an error such as Error: cannot allocate vector of size 200 Mb
-- see also this question.
I guess you'd like to free memory with gc(), though whether running it helps is controversial.
If you have no other machine with more RAM and don't want to upgrade your laptop's RAM, you could look into cloud computing.
I hope this might help you.