Difference between gc() and rm()

What is the difference between gc() and rm()?

First, it is important to note that the two are very different: gc() does not delete any variables that you are still using. It only frees the memory of objects that you can no longer reach (whether removed using rm() or, say, created in a function that has since returned). Running gc() will never make you lose variables.

The question of whether you should call gc() after calling rm(), though, is a good one. The documentation for gc helpfully notes:

A call of gc causes a garbage collection to take place. This will also take place automatically without user intervention, and the primary purpose of calling gc is for the report on memory usage.

However, it can be useful to call gc after a large object has been removed, as this may prompt R to return memory to the operating system.

So the answer is that it can be good to call gc() (and at the very least, can't hurt), even though it would likely be triggered anyway (if not right away, then soon).
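To make the distinction concrete, here is a minimal sketch. The names summarise_noise, keep and drop_me are made up for illustration. It shows that only rm() removes a variable, while gc() just reclaims memory that nothing refers to any more.

```r
# Hypothetical helper: allocates a large temporary that becomes
# unreachable as soon as the function returns.
summarise_noise <- function(n) {
  tmp <- rnorm(n)        # local object, unreachable after return
  mean(tmp)
}

keep    <- seq_len(1e6)  # still referenced, so gc() must leave it alone
drop_me <- numeric(1e6)  # will be removed explicitly below

m <- summarise_noise(1e6)

rm(drop_me)              # rm() removes the binding
invisible(gc())          # gc() reclaims memory of unreachable objects only

exists("keep")           # TRUE: gc() did not touch a live variable
length(keep)             # 1000000
exists("drop_me")        # FALSE: because of rm(), not because of gc()
```

The explicit gc() call here is optional. R would collect the temporary from summarise_noise() and the removed vector on its own at some later point.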

Why does gc() not free memory?

How do you check memory usage? Normally a virtual machine allocates a chunk of memory that it uses to store its data. Some of the allocated memory may be unused and marked as free. What the GC does is discover data that is no longer referenced from anywhere and mark the corresponding chunks of memory as unused; this does not mean that the memory is released to the OS. Still, from the VM's perspective there is now more free memory that can be used for further computation.

As others asked, did you experience out-of-memory errors? If not, then there's nothing to worry about.

EDIT:
This and this should be enough to understand how memory allocation and garbage collection work in R.

From the first document:

Occasionally an attempt is made to release unused pages back to the
operating system. When pages are released, a number of free nodes
equal to R_MaxKeepFrac times the number of allocated nodes for each
class is retained. Pages not needed to meet this requirement are
released. An attempt to release pages is made every R_PageReleaseFreq level 1
or level 2 collections.

EDIT2:

To see used memory try running gc() with verbose set to TRUE:

gc(verbose = TRUE)

Here's a result with an array of 10'000'000 integers in memory:

Garbage collection 9 = 1+0+8 (level 2) ... 
10.7 Mbytes of cons cells used (49%)
40.6 Mbytes of vectors used (72%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  198838 10.7     407500 21.8   350000 18.7
Vcells 5311050 40.6    7421749 56.7  5311504 40.6

And here's after discarding reference to it:

Garbage collection 10 = 1+0+9 (level 2) ... 
10.7 Mbytes of cons cells used (49%)
2.4 Mbytes of vectors used (5%)
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 198821 10.7     407500 21.8   350000 18.7
Vcells 310987  2.4    5937399 45.3  5311504 40.6

As you can see, the memory used by Vcells fell from 40.6 Mb to 2.4 Mb.
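If you would rather compute the difference than read it off the printed report, something like the following sketch should work. It relies on gc() returning its report as a matrix with rows "Ncells" and "Vcells", where the second column is the "(Mb)" figure for memory in use. The variable names are made up for illustration, and the exact numbers will differ from the ones above depending on what else is in your session.

```r
ints <- integer(1e7)       # 10,000,000 integers at 4 bytes each
with_ints <- gc()          # report while the vector is still referenced

rm(ints)                   # discard the only reference
without_ints <- gc()       # report after the vector has been collected

# Column 2 holds the "(Mb)" value for memory currently in use.
freed_mb <- with_ints["Vcells", 2] - without_ints["Vcells", 2]
freed_mb                   # about 38 Mb is expected: 1e7 * 4 bytes / 2^20
```

Note that this measures memory R considers in use, not what the operating system reports for the process.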

Forcing garbage collection to run in R with the gc() command

"Probably." I do it too, and often even in a loop as in

cleanMem <- function(n = 10) { for (i in seq_len(n)) gc() }

Yet that does not, in my experience, restore memory to a pristine state.

So what I usually do is keep the tasks at hand in script files and execute those using the 'r' frontend (on Unix, from the 'littler' package), so that each run starts in a fresh R process. Rscript, which ships with R, is the alternative on Windows.

That workflow happens to agree with

  • workflow-for-statistical-analysis-and-report-writing
  • tricks-to-manage-the-available-memory-in-an-r-session

which we covered here before.

How does Rust's memory management differ from compile-time garbage collection?

Compile-time garbage collection is commonly defined as follows:

A complementary form of automatic memory management is compile-time memory management (CTGC), where the decisions for memory management are taken at compile-time instead of at run-time. The compiler determines the life-time of the variables that are created during the execution of the program, and thus also the memory that will be associated with these variables. Whenever the compiler can guarantee that a variable, or more precisely, parts of the memory resources that this variable points to at run-time, will never ever be accessed beyond a certain program instruction, then the compiler can add instructions to deallocate these resources at that particular instruction without compromising the correctness of the resulting code.

(From Compile-Time Garbage Collection for the Declarative Language Mercury by Nancy Mazur)

Rust handles memory by using a concept of ownership and borrow checking. Ownership and move semantics describe which variable owns a value. Borrowing describes which references are allowed to access a value. These two concepts allow the compiler to "drop" the value when it is no longer accessible, causing the program to call the drop method from the Drop trait.

However, the compiler itself doesn't handle dynamically allocated memory at all. It only handles drop checking (figuring out when to call drop) and inserting the .drop() calls. The drop implementation is responsible for determining what happens at this point, whether that is deallocating some dynamic memory (which is what Box's drop does, for example), or doing anything else. The compiler therefore never really enforces garbage collection, and it doesn't enforce deallocating unused memory. So we can't claim that Rust implements compile-time garbage collection, even if what Rust has is very reminiscent of it.


