Understanding garbage collection in .NET

You are being tripped up here and drawing very wrong conclusions because you are using a debugger. You'll need to run your code the way it runs on your user's machine. Switch to the Release build first with Build + Configuration manager, change the "Active solution configuration" combo in the upper left corner to "Release". Next, go into Tools + Options, Debugging, General and untick the "Suppress JIT optimization" option.

Now run your program again and tinker with the source code. Note how the extra braces have no effect at all. And note how setting the variable to null makes no difference at all. It will always print "1". It now works the way you hoped and expected it would work.

Which does leave me with the task of explaining why it works so differently when you run the Debug build. That requires explaining how the garbage collector discovers local variables and how that's affected by having a debugger present.

First off, the jitter performs two important duties when it compiles the IL for a method into machine code. The first one is very visible in the debugger: you can see the machine code with the Debug + Windows + Disassembly window. The second duty is, however, completely invisible. It also generates a table that describes how the local variables inside the method body are used. That table has an entry for each method argument and local variable, with two addresses: the address where the variable will first store an object reference, and the address of the machine code instruction where that variable is no longer used. It also records whether that variable is stored on the stack frame or in a CPU register.

This table is essential to the garbage collector, it needs to know where to look for object references when it performs a collection. Pretty easy to do when the reference is part of an object on the GC heap. Definitely not easy to do when the object reference is stored in a CPU register. The table says where to look.

The "no longer used" address in the table is very important. It makes the garbage collector very efficient. It can collect an object even if it is referenced inside a method that hasn't finished executing yet. Which is very common, your Main() method for example will only ever stop executing just before your program terminates. Clearly you would not want any object references used inside that Main() method to live for the duration of the program, that would amount to a leak. The garbage collector can use the table to discover that such a local variable is no longer useful, depending on how far the program has progressed inside that Main() method before it made a call.
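The effect of the "no longer used" entry can be observed with a small experiment. The following is only a sketch: the names LifetimeDemo, Allocate and IsAliveAfterCollect are made up for illustration, and the result depends on the build configuration, the runtime version and whether a debugger is attached, so no particular output is promised.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class LifetimeDemo
{
    // Allocates in a separate, non-inlined method so that no hidden
    // temporary in the caller holds a second reference to the object.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static object Allocate(out WeakReference tracker)
    {
        object obj = new object();
        tracker = new WeakReference(obj); // does not keep obj alive
        return obj;
    }

    // Reports whether the object survived a forced collection although
    // the local 'obj' is never read again after it was assigned.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool IsAliveAfterCollect()
    {
        object obj = Allocate(out WeakReference tracker);

        GC.Collect();
        GC.WaitForPendingFinalizers();

        // 'obj' is still in scope here, but scope is not what the GC looks at.
        return tracker.IsAlive;
    }
}
```

Calling Console.WriteLine(LifetimeDemo.IsAliveAfterCollect()) from Main() under the debugger with the Debug build should keep the object alive. An optimized build without a debugger is free to report that it was collected. Runtimes that first compile methods without optimization may also keep it alive, which is one more reason not to rely on this behavior.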

An almost magic method that is related to that table is GC.KeepAlive(). It is a very special method, it doesn't generate any code at all. Its only duty is to modify that table. It extends the lifetime of the local variable, preventing the reference it stores from getting garbage collected. The only time you need to use it is to stop the GC from being too eager with collecting a reference, which can happen in interop scenarios where a reference is passed to unmanaged code. The garbage collector cannot see such references being used by such code since it wasn't compiled by the jitter, so it doesn't have the table that says where to look for the reference. Passing a delegate object to an unmanaged function like EnumWindows() is the boilerplate example of when you need to use GC.KeepAlive().
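The lifetime extension can be sketched without any unmanaged code. The example below is an illustration only: KeepAliveDemo and its members are hypothetical names, and a WeakReference stands in for the unmanaged code that would otherwise be the only user of the object.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class KeepAliveDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static object Allocate(out WeakReference tracker)
    {
        object obj = new object();
        tracker = new WeakReference(obj); // observes obj without rooting it
        return obj;
    }

    // Returns whether the object was alive after a forced collection
    // when GC.KeepAlive(obj) appears later in the method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool IsAliveWithKeepAlive()
    {
        object obj = Allocate(out WeakReference tracker);

        GC.Collect();
        GC.WaitForPendingFinalizers();

        bool alive = tracker.IsAlive;

        // The local is reported as in use up to this call, so the
        // collection above has to treat the object as reachable.
        GC.KeepAlive(obj);
        return alive;
    }
}
```

Because the local is used by the GC.KeepAlive() call after the collection, IsAliveWithKeepAlive() returns true regardless of build configuration. In an interop scenario the same call would be placed after the point where unmanaged code is done with the reference.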

So, as you can tell from your sample snippet after running it in the Release build, local variables can get collected early, before the method finished executing. Even more powerfully, an object can get collected while one of its methods runs if that method no longer refers to this. There is a problem with that, it is very awkward to debug such a method. Since you may well put the variable in the Watch window or inspect it. And it would disappear while you are debugging if a GC occurs. That would be very unpleasant, so the jitter is aware of there being a debugger attached. It then modifies the table and alters the "last used" address. And changes it from its normal value to the address of the last instruction in the method. Which keeps the variable alive as long as the method hasn't returned. Which allows you to keep watching it until the method returns.

This now also explains what you saw earlier and why you asked the question. It prints "0" because the GC.Collect call cannot collect the reference. The table says that the variable is in use past the GC.Collect() call, all the way up to the end of the method. Forced to say so by having the debugger attached and by running the Debug build.

Setting the variable to null does have an effect now because the GC will inspect the variable and will no longer see a reference. But make sure you don't fall into the trap that many C# programmers have fallen into: actually writing that code was pointless. It makes no difference whatsoever whether or not that statement is present when you run the code in the Release build. In fact, the jitter optimizer will remove that statement since it has no effect whatsoever. So be sure to not write code like that, even though it seemed to have an effect.


One final note about this topic: this is what gets programmers into trouble who write small programs to do something with an Office app. The debugger usually gets them on the Wrong Path. They want the Office program to exit on demand. The appropriate way to do that is by calling GC.Collect(). But they'll discover that it doesn't work when they debug their app, which leads them into never-never land by calling Marshal.ReleaseComObject(). That is manual memory management, and it rarely works properly because they'll easily overlook an invisible interface reference. GC.Collect() actually works, just not when you debug the app.

Understanding garbage collector behavior for a local variable

@Porges' comments explain everything very well:

Try building & running it in Release mode, without the debugger attached. I get the expected behaviour there but not in Debug.

...

ie. running with Ctrl-F5, not just F5. It collects it instantly for me in each of .NET 4/4.5/4.5.1. But yes, you can't really rely on this behaviour.

The Release build and Ctrl-F5 brought back the expected behavior. I urge @Porges to post this as an answer, which I'd up-vote and accept with thanks.

As a follow-up, I'd like to feature the following interesting behavior. Now with Release + Ctrl-F5, even if I un-comment the // GC.KeepAlive(callback) line in my code, the callback still gets garbage-collected. Apparently, this is because the compiler recognizes this line as unreachable due to the while (true) loop and still doesn't emit a strong reference on callback.

The following is the correct pattern:

static void Test(CancellationToken token)
{
    Callback callback = new Callback();

    try
    {
        while (true)
        {
            token.ThrowIfCancellationRequested();

            // for the GC
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();

            Thread.Sleep(100);
        }
    }
    finally
    {
        GC.KeepAlive(callback);
    }
}

It's also interesting to look at the GC.KeepAlive implementation:

[MethodImpl(MethodImplOptions.NoInlining)]
public static void KeepAlive(object obj)
{
}

As expected, it does nothing and merely serves as a hint to the compiler to generate IL code which keeps a strong reference to the object, up to the point where KeepAlive is called. MethodImplOptions.NoInlining is very relevant here to prevent any optimizations like above.

What makes C# Garbage Collection slow?

There's no single answer to this question, but there is a general "one rule to remember" from Maoni Stephens, the lead developer of the .NET GC:

"What survives usually determines how much work GC needs to do; what doesn't survive usually determines how often a GC is triggered."

The relation between "how much work GC needs to do" and the pauses that it introduces (as it seems you are mostly concerned about this aspect) is a different story and heavily depends on the specific GC implementation.
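The rule can be explored with a rough measurement. The sketch below is an assumption-laden illustration, not a benchmark: GcPressureDemo and Run are hypothetical names, and the numbers it returns vary with the machine, the GC mode and the runtime version.

```csharp
using System;
using System.Diagnostics;

public static class GcPressureDemo
{
    // Allocates 'count' small arrays. With 'retain' set they are stored and
    // therefore survive collections; otherwise each one dies right away.
    // Returns the number of gen 0 collections observed and the elapsed time.
    public static (int Gen0Collections, long ElapsedMs) Run(int count, bool retain)
    {
        byte[][] survivors = retain ? new byte[count][] : null;
        int before = GC.CollectionCount(0);
        Stopwatch watch = Stopwatch.StartNew();

        for (int i = 0; i < count; i++)
        {
            byte[] block = new byte[1024];
            if (survivors != null)
            {
                survivors[i] = block; // keeps the block reachable
            }
        }

        watch.Stop();
        int collections = GC.CollectionCount(0) - before;

        GC.KeepAlive(survivors); // survivors stay reachable during the loop
        return (collections, watch.ElapsedMilliseconds);
    }
}
```

Comparing Run(1000000, false) with Run(1000000, true) gives a feel for the difference: both allocate the same amount, but only the second one leaves the GC with survivors to process.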

As I feel you are interested in GC internals, I strongly recommend reading the linked document and also watching my ".NET GC Internals" series.

.NET server garbage collection and object lifetime

There's quite a bit going on in this question, so I'll tackle the overarching issues first.

  1. The variable declared in a using statement will not be garbage collected before the end of the block, for exactly the reason you indicated - a reference is held in order to call Dispose() in the implicit finally block.

  2. If you find yourself writing a finalizer in C#, you are probably doing something wrong. If your finalizer in C# calls Stream.Dispose(), you are definitely doing something wrong. Outside of an implementation of .NET itself, I have seen hundreds of misused finalizers, and exactly 1 finalizer that was actually needed. For more information, see DG Update: Dispose, Finalization, and Resource Management.

  3. ObjectDisposedException has nothing to do with finalization. This exception generally occurs when code calls Dispose() on an object (not finalize), and then a later call is made to do something with the object.

    Sometimes it's not obvious when code is disposing of a Stream. One case that surprised me was using StreamContent as part of sending an HTTP request using HttpClient. The implementation calls Stream.Dispose() after sending the request, so I had to write a wrapper Stream class called DelegatingStream to preserve our library's original behavior during the conversion from HttpWebRequest to HttpClient.

One scenario where you could see an ObjectDisposedException is if the getStream() method caches a Stream instance and returns it for multiple future calls. If Request.Dispose() disposes of the stream, or if DoStuff2(Stream) disposes of the stream, then the next time you attempt to use the stream you'll get an ObjectDisposedException.
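The cached-stream scenario can be reproduced in a few lines. This is a minimal sketch under stated assumptions: CachedStreamDemo and GetStream are hypothetical stand-ins for the question's getStream() method, and a MemoryStream replaces whatever stream the real code uses.

```csharp
using System;
using System.IO;

public static class CachedStreamDemo
{
    private static readonly MemoryStream Cached =
        new MemoryStream(new byte[] { 1, 2, 3 });

    // Hands out the same cached instance on every call.
    public static Stream GetStream()
    {
        return Cached;
    }

    // Returns true when the second use of the cached stream fails with
    // ObjectDisposedException because the first user disposed it.
    public static bool SecondUseThrows()
    {
        using (Stream first = GetStream())
        {
            first.ReadByte();
        } // Dispose() runs here, on the shared instance

        try
        {
            GetStream().ReadByte();
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

No finalizer and no garbage collection is involved here: the exception comes purely from one caller disposing an instance that another caller still expects to use.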

How does garbage collection and scoping work in C#?

The .NET GC is a mark-and-sweep engine rather than a reference-counting engine like the one you're used to in Python. The system doesn't maintain a count of references to an object, but rather runs a "collection" when it needs to reclaim memory, marking all of the currently reachable objects and reclaiming all the objects that aren't reachable (and are therefore no longer usable by the program).

You can find out more about how it works here:

http://msdn.microsoft.com/en-us/library/ee787088.aspx

The system finds "reachable" objects by starting at specific "root" locations, like global objects and objects on the stack, and traces all objects referenced by those, and all the objects referenced by those, etc., until it has built a complete graph of reachable objects. This is faster than it sounds.
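The tracing idea can be sketched with a toy object graph. This is not how the CLR is implemented; Node, MarkDemo, Mark and Example are made-up names, and the sketch only shows the mark phase described above.

```csharp
using System;
using System.Collections.Generic;

public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();

    public Node(string name)
    {
        Name = name;
    }
}

public static class MarkDemo
{
    // Mark phase of a toy tracing collector: returns every node that can
    // be reached from the roots by following references.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var pending = new Stack<Node>(roots);

        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!marked.Add(current))
            {
                continue; // already visited, which also handles cycles
            }

            foreach (Node next in current.References)
            {
                pending.Push(next);
            }
        }

        return marked;
    }

    // Builds a graph where a -> b is reachable from the root, while c and d
    // only reference each other. Returns the number of marked nodes.
    public static int Example()
    {
        var a = new Node("a");
        var b = new Node("b");
        var c = new Node("c");
        var d = new Node("d");

        a.References.Add(b);
        c.References.Add(d);
        d.References.Add(c);

        return Mark(new[] { a }).Count; // 2: only a and b are reachable
    }
}
```

The cycle between c and d is the interesting part. A pure reference counter would never free them because each keeps the other's count above zero, while a tracing collector never marks them and can therefore reclaim both.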

Explicitly calling garbage collection in .NET

This depends on how you trigger the GC.

GCCollectionMode:

Default: The default setting for this enumeration, which is currently Forced.

Forced: Forces the garbage collection to occur immediately.

Optimized: Allows the garbage collector to determine whether the current time is optimal to reclaim objects.

If you call the parameterless overload or pass GCCollectionMode.Default it currently forces a GC, but in theory that behaviour may change in future versions of .NET.

If you pass GCCollectionMode.Forced it forces an immediate GC.

If you pass GCCollectionMode.Optimized it's only a hint. I don't know how seriously the runtime treats this hint.

So if you want to either force a GC or make sure that it's only a hint, use the Collect(int generation, GCCollectionMode mode) overload.
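The difference between the two modes can be observed through GC.CollectionCount(). The following is a small sketch; CollectModesDemo and its method names are hypothetical, and what the Optimized request does is up to the runtime.

```csharp
using System;

public static class CollectModesDemo
{
    // Requests a collection of all generations in the given mode and
    // returns how many gen 2 collections were counted while doing so.
    private static int Gen2CollectionsDuring(GCCollectionMode mode)
    {
        int before = GC.CollectionCount(GC.MaxGeneration);
        GC.Collect(GC.MaxGeneration, mode);
        return GC.CollectionCount(GC.MaxGeneration) - before;
    }

    // Forced: the collection is expected to happen before the call returns.
    public static int ForcedGen2Collections()
    {
        return Gen2CollectionsDuring(GCCollectionMode.Forced);
    }

    // Optimized: only a hint, so the runtime may decide to do nothing.
    public static int OptimizedGen2Collections()
    {
        return Gen2CollectionsDuring(GCCollectionMode.Optimized);
    }
}
```

ForcedGen2Collections() should report at least one collection, while OptimizedGen2Collections() may well report zero when the runtime judges a collection not worthwhile.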


