.NET vs. Java Garbage Collector

Garbage collection in C# and Java

Within a function, memory for non-object variables is usually allocated on the stack. When the function finishes executing, its stack frame is popped and that memory is freed.

For objects, memory is allocated on the heap (think of malloc() and free() in C). In Java and C#, however, the garbage collector does the job of free() for you, so you don't have to worry about it.

So even inside a function, an object held in a local variable is stored on the heap, not on the stack, which makes it different from a plain int i. When the function completes, those objects go out of scope. If nothing else still references them, you no longer have access to them, but their memory is not freed until the garbage collector runs and reclaims them.

How and when a garbage collector runs depends on the algorithm it uses. The algorithm may differ even between implementations of Java (e.g. Sun's Java may use a different algorithm than another Java implementation).
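A minimal Java sketch of the point above, assuming an ordinary JVM. The class and method names are made up for illustration. It holds an object only through a WeakReference so the program can observe whether the collector has reclaimed it yet. System.gc() is just a request, so the second printed value may differ between JVMs and between runs.

```java
import java.lang.ref.WeakReference;

class ScopeVersusCollection {
    // Allocates on the heap and returns only a weak reference,
    // so no strong reference survives once this method returns.
    static WeakReference<byte[]> allocateAndForget() {
        byte[] local = new byte[1024];      // the array itself lives on the heap
        return new WeakReference<>(local);  // 'local' is a stack slot that vanishes on return
    }

    // Requests collections until the referent is cleared or the attempts run out.
    static boolean clearedAfterGc(WeakReference<?> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc();  // a hint only; the JVM decides whether and when to collect
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<byte[]> ref = allocateAndForget();
        // Out of scope, but usually not yet reclaimed:
        System.out.println("cleared right after return: " + (ref.get() == null));
        // Reclaimed only once the collector actually runs:
        System.out.println("cleared after requesting GC: " + clearedAfterGc(ref, 50));
    }
}
```

An object that is still strongly referenced is never cleared, however many times a collection is requested.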

What are the fundamental differences between garbage collection in C# and Java?

The advice you've been given is, broadly speaking, a load of hooey.

Both C# and Java have GCs that attempt to optimise the fast recovery of lots of small objects. They're designed to solve the same problem and do it in slightly different ways, but as a user the technical differences in how you use them are minimal, even non-existent for the majority of users.

IDisposable has nothing to do with the GC as such. It's a standard way of naming methods that would otherwise be called close, destroy, dispose, etc., and often are called that in Java. Java 7 added something very similar to C#'s using keyword: the try-with-resources statement, which calls a resource's close method automatically.
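For illustration, Java 7 and later provide the try-with-resources statement, which plays roughly the role of C#'s using: it calls close() on an AutoCloseable when the block exits, independently of the garbage collector. The sketch below uses a hypothetical TrackedResource class that records what happens to it.

```java
import java.util.ArrayList;
import java.util.List;

class DeterministicCleanup {
    // Hypothetical resource, for illustration only: it logs its own lifecycle.
    static final class TrackedResource implements AutoCloseable {
        private final List<String> log;

        TrackedResource(List<String> log) {
            this.log = log;
            log.add("open");
        }

        void use() {
            log.add("use");
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (TrackedResource r = new TrackedResource(log)) {
            r.use();
        } // close() runs here, whether or not the body threw
        log.add("after");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [open, use, close, after]
    }
}
```

The cleanup happens at a known point in the program, which is why this pattern is about resource management rather than about the GC.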

"Destructors" in C# refers to finalizers - this was done deliberately to confuse C++ programmers. :) The CLR spec itself calls them finalizers, exactly as the JVM does.

There are many ways in which Java and C#/CLR differ (user-defined value types, properties, generics and the whole family of related features known as LINQ), but the GC is one area where you can develop a substantial amount of software before you need to worry much about the difference between them.

Is a garbage collector (.net/java) an issue for real-time systems?

To be precise, garbage collectors are a problem for real-time systems. To be even more precise, it is possible to write real-time software in languages that have automatic memory management.

The Real-Time Specification for Java (RTSJ) describes one approach to achieving real-time behavior in Java. The idea behind RTSJ is very simple: do not use the heap. RTSJ provides new varieties of Runnable objects that ensure threads do not access heap memory of any kind. Threads can access either scoped memory (nothing unusual here; values are destroyed when the scope is closed) or immortal memory (which exists throughout the application's lifetime). Variables in immortal memory are overwritten with new values time and again.

Through the use of immortal memory, RTSJ ensures that threads do not access the heap and, more importantly, that no garbage collector preempts the program's execution by those threads.
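The "allocate once, overwrite many times" pattern can be sketched in plain Java. This is only an analogy: it runs on the ordinary garbage-collected heap and does not use the RTSJ API (the javax.realtime package, which needs an RTSJ implementation). The PreallocatedSamples class is hypothetical.

```java
class PreallocatedSamples {
    private final double[] samples; // allocated once, reused for the object's lifetime
    private int next = 0;

    PreallocatedSamples(int capacity) {
        samples = new double[capacity];
    }

    // Overwrites the oldest slot; performs no allocation.
    void record(double value) {
        samples[next] = value;
        next = (next + 1) % samples.length;
    }

    // Returns the most recently recorded value (0.0 before the first record call).
    double latest() {
        int last = (next + samples.length - 1) % samples.length;
        return samples[last];
    }

    public static void main(String[] args) {
        PreallocatedSamples buffer = new PreallocatedSamples(2);
        buffer.record(1.0);
        buffer.record(2.0);
        buffer.record(3.0); // overwrites the slot that held 1.0
        System.out.println(buffer.latest()); // prints 3.0
    }
}
```

Because record() never allocates, the steady-state loop creates no garbage for a collector to reclaim, which is the property RTSJ's immortal memory enforces rather than merely encourages.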

More details are available in the paper "Project Golden Gate: Towards Real-Time Java in Space Missions" published by JPL and Sun.

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

You seem to be asking two things:

  • have GCs improved since that research was performed, and
  • can I use the conclusions of the paper as a formula to predict required memory.

The answer to the first is that there have been no major breakthroughs in GC algorithms that would invalidate the general conclusions:

  • GC'ed memory management still requires significantly more virtual memory.
  • If you try to constrain the heap size, GC performance drops significantly.
  • If real memory is restricted, the GC'ed memory management approach results in substantially worse performance due to paging overheads.

However, the conclusions cannot really be used as a formula:

  • The original study was done with JikesRVM rather than a Sun JVM.
  • The Sun JVM's garbage collectors have improved in the ~5 years since the study.
  • The study does not seem to take into account that Java data structures take more space than equivalent C++ data structures for reasons that are not GC related.

On the last point, I have seen a presentation on Java memory overheads. For instance, it found that the minimum representation size of a Java String is something like 48 bytes. (A String consists of two primitive objects: one an Object with 4 word-sized fields, and the other an array with a minimum of 1 word of content. Each primitive object also has 3 or 4 words of overhead.) Java collection data structures similarly use far more memory than people realize.
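The "something like 48 bytes" figure can be reproduced from the numbers in the parenthetical. The sketch below is only that arithmetic, not a measurement; it assumes a 4-byte word (a 32-bit JVM), which the text does not state, and the class name is made up.

```java
class StringFootprintEstimate {
    static final int STRING_FIELD_WORDS = 4;      // from the text: 4 word-sized fields
    static final int MIN_ARRAY_CONTENT_WORDS = 1; // from the text: at least 1 word of content

    // Words used by the String object plus its backing array.
    static int minWords(int headerWords) {
        int stringObject = headerWords + STRING_FIELD_WORDS;
        int charArray = headerWords + MIN_ARRAY_CONTENT_WORDS;
        return stringObject + charArray;
    }

    static int minBytes(int headerWords, int bytesPerWord) {
        return minWords(headerWords) * bytesPerWord;
    }

    public static void main(String[] args) {
        int bytesPerWord = 4; // assumption: 32-bit JVM
        System.out.println(minBytes(3, bytesPerWord)); // prints 44
        System.out.println(minBytes(4, bytesPerWord)); // prints 52
    }
}
```

With 3 or 4 words of header per object, the estimate lands between 44 and 52 bytes, which brackets the quoted 48.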

These overheads are not GC-related per se. Rather they are direct and indirect consequences of design decisions in the Java language, JVM and class libraries. For example:

  • Each Java primitive object header [1] reserves one word for the object's "identity hashcode" value, and one or more words for representing the object lock.
  • The representation of a String has to use a separate "array of characters" because of JVM limitations. Two of the three other fields are an attempt to make the substring operation less memory intensive.
  • The Java collection types use a lot of memory because collection elements cannot be directly chained. So for example, the overheads of a (hypothetical) singly linked list collection class in Java would be 6 words per list element. By contrast, an optimal C/C++ linked list (i.e. with each element having a "next" pointer) has an overhead of one word per list element.

[1] In fact, the overheads are less than this on average. The JVM only "inflates" a lock following use and contention, and similar tricks are used for the identity hashcode. The fixed overhead is only a few bits. However, these bits add up to a measurably larger object header, which is the real point here.

Explicitly calling garbage collection in .NET

This depends on how you trigger the GC.

GCCollectionMode:

  • Default: The default setting for this enumeration, which is currently Forced.
  • Forced: Forces the garbage collection to occur immediately.
  • Optimized: Allows the garbage collector to determine whether the current time is optimal to reclaim objects.

If you call the parameterless overload or pass GCCollectionMode.Default it currently forces a GC, but in theory that behaviour may change in future versions of .NET.

If you pass GCCollectionMode.Forced it forces an immediate GC.

If you pass GCCollectionMode.Optimized it's only a hint. I don't know how seriously the runtime treats this hint.

So if you want to either force a GC or make sure that it's only a hint, use the GC.Collect(int generation, GCCollectionMode mode) overload and pass the mode explicitly.

Is memory cleared before garbage collection?

Practically speaking, no, this doesn't happen. Overwriting memory you've just freed takes time, so there are performance penalties. "Secure" objects like SecureString are just wiping themselves, not relying on the GC.

More broadly, it depends very much on the particular implementation of the particular language. Every language that assumes the existence of a GC (like C#) specifies its own rules about how and when garbage collection should happen.

To take your C# example, the C# specification does not require that objects be overwritten after being freed, and it doesn't forbid it either:

"Finally, at some time after the object becomes eligible for collection, the garbage collector frees the memory associated with that object." (§3.9, C# 5.0 Language Specification)

If the memory is later assigned to a reference type, you'll have a constructor that does your own custom initialization. If the memory is later assigned to a value type, it gets zeroed out before you can start reading from it:

"Initialization to default values is typically done by having the memory manager or garbage collector initialize memory to all-bits-zero before it is allocated for use. For this reason, it is convenient to use all-bits-zero to represent the null reference." (§5.2, C# 5.0 Language Specification)

Additionally, there are at least two implementations of C#, Microsoft's and Mono's, so just saying "C#" isn't specific enough. Each implementation might decide to overwrite memory (or not).


