Garbage Collection in C++ -- Why

I keep hearing people complaining that C++ doesn't have garbage collection.

I am so sorry for them. Seriously.

C++ has RAII, and I always end up complaining about finding no RAII (or only a crippled RAII) in garbage-collected languages.

What advantages could garbage collection offer an experienced C++ developer?

Another tool.

Matt J put it quite well in his post (Garbage Collection in C++ -- why?): We don't need C++ features, as most of them could be coded in C, and we don't need C features, as most of them could be coded in assembly, and so on. C++ must evolve.

As a developer, I don't care about GC. I tried both RAII and GC, and I find RAII vastly superior. As Greg Rogers said in his post (Garbage Collection in C++ -- why?), memory leaks are not so terrible (at least in C++, where they are rare if C++ is used properly) as to justify GC instead of RAII. GC has non-deterministic deallocation and finalization, and is just a way to write code that doesn't care about specific memory choices.
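To make the "deterministic" part concrete, here is a minimal sketch of RAII. The ScopedResource class, the event log, and raii_demo are made up for illustration; the point is only that each destructor runs at a known place, the closing brace of its scope.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource wrapper: records acquire/release events in a log
// so the order of destruction can be observed.
class ScopedResource {
public:
    ScopedResource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~ScopedResource() { log_.push_back("release " + name_); }

    // Non-copyable: exactly one owner releases the resource.
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Returns the event log produced by two nested scopes.
std::vector<std::string> raii_demo() {
    std::vector<std::string> log;
    {
        ScopedResource file("file", log);
        {
            ScopedResource lock("lock", log);
        }  // "lock" is released here, at the closing brace
        assert(log.back() == "release lock");
        log.push_back("inner scope done");
    }  // "file" is released here
    return log;
}
```

Calling raii_demo() yields the events in this order: acquire file, acquire lock, release lock, inner scope done, release file. With a tracing GC, the two "release" events would happen at some unspecified later time.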

This last sentence is important: it is important to be able to write code that "just doesn't care". In the same way that in C++ we don't care about freeing resources because RAII does it for us, or about object initialization because constructors do it for us, it is sometimes important to just write code without caring about who owns which memory, or what kind of pointer (shared, weak, etc.) we need for this or that piece of code. There seems to be a need for GC in C++ (even if I personally fail to see it).

An example of good GC use in C++

Sometimes, in an app, you have "floating data". Imagine a tree-like structure of data, but no one is really "owner" of the data (and no one really cares about when exactly it will be destroyed). Multiple objects can use it, and then, discard it. You want it to be freed when no one is using it anymore.

The C++ approach is to use a smart pointer. boost::shared_ptr comes to mind, so each piece of data is owned by its own shared pointer. Cool. The problem is that each piece of data can refer to another piece of data. You cannot use shared pointers everywhere, because they rely on a reference counter, which cannot handle circular references (A points to B, and B points to A). So you must now think hard about where to use weak pointers (boost::weak_ptr) and where to use shared pointers.
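A small sketch of the circular-reference problem, using std::shared_ptr and std::weak_ptr (the standardized equivalents of the Boost classes; that substitution is mine). The node types, the live_nodes counter, and the function names are hypothetical and exist only to make the leak observable.

```cpp
#include <cassert>
#include <memory>

// Counts live nodes so a leak can be observed (illustrative only).
static int live_nodes = 0;

struct StrongNode {
    std::shared_ptr<StrongNode> other;  // owning link in both directions
    StrongNode() { ++live_nodes; }
    ~StrongNode() { --live_nodes; }
};

struct TreeNode {
    std::shared_ptr<TreeNode> child;  // owning link: parent -> child
    std::weak_ptr<TreeNode> parent;   // non-owning link: child -> parent
    TreeNode() { ++live_nodes; }
    ~TreeNode() { --live_nodes; }
};

// A <-> B with shared_ptr both ways: neither reference count reaches zero.
int leaked_after_strong_cycle() {
    live_nodes = 0;
    std::weak_ptr<StrongNode> observer;
    {
        auto a = std::make_shared<StrongNode>();
        auto b = std::make_shared<StrongNode>();
        a->other = b;
        b->other = a;
        observer = a;
    }
    const int leaked = live_nodes;  // both nodes outlived their scope
    // Break the cycle by hand so this demo does not really leak.
    if (auto a = observer.lock()) a->other.reset();
    assert(live_nodes == 0);
    return leaked;
}

// Same shape, but the back link is weak: everything is freed at scope exit.
int leaked_after_weak_link() {
    live_nodes = 0;
    {
        auto a = std::make_shared<TreeNode>();
        auto b = std::make_shared<TreeNode>();
        a->child = b;
        b->parent = a;
    }
    return live_nodes;
}
```

Here leaked_after_strong_cycle() returns 2 and leaked_after_weak_link() returns 0. Deciding which link is the weak one is exactly the ownership thinking that a tracing GC would spare you.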

With a GC, you just use the tree structured data.

The downside is that you cannot care about when the "floating data" will actually be destroyed, only that it will be destroyed.

Conclusion

So in the end, if done properly and made compatible with the current idioms of C++, GC would be Yet Another Good Tool for C++.

C++ is a multiparadigm language. Adding a GC will perhaps make some C++ fanboys cry treason, but in the end it could be a good idea. I guess the C++ Standards Committee won't let this kind of major feature break the language, so we can trust them to do the work needed to enable a correct C++ GC that won't interfere with C++. As always in C++, if you don't need a feature, don't use it and it will cost you nothing.

Why does C not require a garbage collector?

First of all, let's be clear about what garbage is.

The Java definition of garbage is objects that are no longer reachable. The precise meaning of reachable is a bit abstruse, but a practical definition is that if you can get to an object by following references (pointers) from well known places like thread stacks or static variables, then it may be reachable. (In practice, some imprecision is OK, so long as objects that are reachable don't get deleted.)

You could try to apply the same definition to C and C++. An object is garbage if it cannot be reached.

However, the practical problem with this definition ... and garbage collection ... in C or C++ is whether a "pointer like" value is actually a valid pointer. For instance:

An uninitialized C variable can contain a random value that looks like a pointer to an object.
When a C union type overlays a pointer with a long, a garbage collector cannot be sure whether the union contains one or the other ... or both.
When C application code "compresses" pointers to word-aligned heap nodes by dividing them by 4 or 8, a garbage collector won't detect them as "pointer like". Or if it does, it will misinterpret them.
A similar issue arises when C application code represents pointers as offsets relative to something else.
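The cases above can be sketched in a few lines. All names here (PtrOrInt, compress, decompress, to_offset, from_offset, hidden_pointer_demo) are made up for illustration, and the sketch assumes a mainstream platform where an 8-byte-aligned address converts to an integer that is a multiple of 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A union overlaying a pointer and an integer: the stored bits alone do not
// tell a collector which member the program last wrote.
union PtrOrInt {
    void* ptr;
    std::uintptr_t num;
};

// "Compressed" pointer: an 8-byte-aligned address divided by 8.
std::uintptr_t compress(void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / 8;
}
void* decompress(std::uintptr_t packed) {
    return reinterpret_cast<void*>(packed * 8);
}

// Pointer stored as an offset relative to a base address.
std::ptrdiff_t to_offset(const char* base, const char* p) { return p - base; }
const char* from_offset(const char* base, std::ptrdiff_t off) { return base + off; }

bool hidden_pointer_demo() {
    alignas(8) static char arena[64];
    char* node = arena + 16;  // stays 8-byte aligned
    assert(reinterpret_cast<std::uintptr_t>(node) % 8 == 0);

    PtrOrInt slot;
    slot.num = 42;    // the slot holds an integer...
    slot.ptr = node;  // ...and now the same storage holds a pointer
    assert(slot.ptr == node);

    // The packed value is not a valid address, so a scan for "pointer like"
    // words would not recognise it, yet the program can still recover node.
    const std::uintptr_t packed = compress(node);
    const bool looks_different = packed != reinterpret_cast<std::uintptr_t>(node);
    const bool round_trips = decompress(packed) == node;

    // The offset is just a small integer (16 here), not an address at all.
    const std::ptrdiff_t off = to_offset(arena, node);
    const bool offset_ok = off == 16 && from_offset(arena, off) == node;

    return looks_different && round_trips && offset_ok;
}
```

In each case the program can still reach the node, but nothing in memory looks like a plain pointer to it.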

However, it is clear that a C program can call malloc, forget to call free, and then forget the address of the heap node. That node is garbage.

There are two reasons why C / C++ doesn't have garbage collection.

It is "culturally inappropriate". The culture of these languages is to leave storage management to the programmer.
It would be technically difficult (and expensive) to implement a precise garbage collector for C / C++. Indeed, doing this would involve things that would make the language implementation slow.
Imprecise (i.e. conservative) garbage collectors are practical, but they have performance and (I have heard) reliability issues. (For instance, a conservative collector cannot move non-garbage objects.)

It would be simpler if the implementer (of a C / C++ garbage collector) could assume that the programmer only wrote code that strictly conformed to the C / C++ specs. But they don't.

But your question seems to be: why did they design C like that?

Questions like that can only be answered authoritatively by the designers (in this case, the late Dennis Ritchie) or their writings.

As you point out in the question, C was designed to be simple and "close to the hardware".

However, C was designed in the early 1970s. In those days, programming languages that required a garbage collector were rare, and GC techniques were not as advanced as they are now.

And even now, it is still a fact that garbage collected languages (like Java) are not suitable for applications that require predictable "real-time" performance.

In short, I suspect that the designers were of the view that garbage collection would make the language impractical for its intended purpose.

Was there a specific reason why garbage collection was not designed for C?

Don't listen to the "C is old and that's why it doesn't have GC" folks. There are fundamental problems with GC that cannot be overcome which make it incompatible with C.

The biggest problem is that accurate garbage collection requires the ability to scan memory and identify any pointers encountered. Some higher level languages limit integers not to use all the bits available, so that high bits can be used to distinguish object references from integers. Such languages may then store strings (which could contain arbitrary octet sequences) in a special string zone where they can't be confused with pointers, and all is well. A C implementation, however, cannot do this because bytes, larger integers, pointers, and everything else can be stored together in structures, unions, or as part of chunks returned by malloc.

What if you throw away the accuracy requirement and decide you're okay with a few objects never getting freed because some non-pointer data in the program has the same bit pattern as these objects' addresses? Now suppose your program receives data from the outside world (network/files/etc.). I claim I can make your program leak an arbitrary amount of memory, and eventually run out of memory, as long as I can guess enough pointers and emulate them in the strings I feed your program. This gets a lot easier if you apply De Bruijn Sequences.
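To illustrate the attack, here is a toy model of a conservative scan. It is not how any real collector is implemented: the "heap" is just a set of made-up addresses, and conservative_scan and retained_by_input_demo are hypothetical names. It only shows that input bytes matching an object's address are enough to keep that object alive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <vector>

// Toy conservative scan: every aligned machine word in `memory` whose value
// equals the address of a tracked object is treated as a reference to it.
std::set<std::uintptr_t> conservative_scan(
        const std::vector<unsigned char>& memory,
        const std::set<std::uintptr_t>& heap_objects) {
    std::set<std::uintptr_t> retained;
    const std::size_t word_size = sizeof(std::uintptr_t);
    for (std::size_t i = 0; i + word_size <= memory.size(); i += word_size) {
        std::uintptr_t word;
        std::memcpy(&word, memory.data() + i, word_size);
        if (heap_objects.count(word) != 0) retained.insert(word);
    }
    return retained;
}

// Simulates a buffer of untrusted input containing one guessed address.
std::size_t retained_by_input_demo() {
    // Made-up object addresses; nothing is actually allocated there.
    const std::set<std::uintptr_t> heap_objects = {0x1000, 0x2000, 0x3000};

    // Four zeroed words of "network data", one overwritten with a guess.
    std::vector<unsigned char> input(4 * sizeof(std::uintptr_t), 0);
    const std::uintptr_t guess = 0x2000;
    std::memcpy(input.data() + sizeof(std::uintptr_t), &guess, sizeof guess);

    const std::set<std::uintptr_t> retained = conservative_scan(input, heap_objects);
    assert(retained.count(0x2000) == 1);
    return retained.size();
}
```

The object at 0x2000 is retained even though the program holds no pointer to it. An attacker who can guess more addresses can pin more objects the same way.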

Aside from that, garbage collection is just plain slow. You can find hundreds of academics who like to claim otherwise, but that won't change the reality. The performance issues of GC can be broken down into 3 main categories:

Unpredictability
Cache pollution
Time spent walking all memory

The people who will claim GC is fast these days are simply comparing it to the wrong thing: poorly written C and C++ programs which allocate and free thousands or millions of objects per second. Yes, these will also be slow, but at least predictably slow in a way you can measure and fix if necessary. A well-written C program will spend so little time in malloc/free that the overhead is not even measurable.

Why to use garbage collector?

To be more productive. In other words, the programmer can focus on writing the bits that are unique to his particular problem.

How does garbage collection and scoping work in C#?

The .NET GC engine is a mark-and-sweep engine rather than a reference-counting engine like the one you're used to in Python. The system doesn't maintain a count of references to an object; instead it runs a "collection" when it needs to reclaim RAM, marking all of the currently reachable objects and reclaiming those that aren't reachable (and therefore can no longer be used).

You can find out more about how it works here:

http://msdn.microsoft.com/en-us/library/ee787088.aspx

The system finds "reachable" objects by starting at specific "root" locations, like global objects and objects on the stack, and tracing all objects referenced by those, and all the objects referenced by those, etc., until it has built the complete graph of reachable objects. This is faster than it sounds.
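The tracing idea can be sketched as a toy mark-and-sweep pass. The sketch is in C++ to match the rest of this page and has nothing to do with the real .NET implementation (which is, among other things, generational). Obj, ToyHeap, and mark_sweep_demo are made-up names, and objects refer to each other by index rather than by pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One heap object: outgoing references (as indices), plus bookkeeping flags.
struct Obj {
    std::vector<std::size_t> refs;
    bool marked = false;
    bool alive = true;
};

struct ToyHeap {
    std::vector<Obj> objects;
    std::vector<std::size_t> roots;  // stands in for stack slots and globals

    std::size_t alloc() {
        objects.emplace_back();
        return objects.size() - 1;
    }

    // Mark phase: flag everything transitively reachable from the roots.
    void mark() {
        std::vector<std::size_t> worklist(roots);
        while (!worklist.empty()) {
            const std::size_t i = worklist.back();
            worklist.pop_back();
            if (objects[i].marked) continue;  // already visited
            objects[i].marked = true;
            for (std::size_t r : objects[i].refs) worklist.push_back(r);
        }
    }

    // Sweep phase: reclaim unmarked objects and clear marks for the next run.
    std::size_t sweep() {
        std::size_t freed = 0;
        for (Obj& o : objects) {
            if (o.alive && !o.marked) {
                o.alive = false;
                o.refs.clear();
                ++freed;
            }
            o.marked = false;
        }
        return freed;
    }

    std::size_t collect() {
        mark();
        return sweep();
    }
};

// a is a root and points to b; c and d point at each other but are unreachable.
std::size_t mark_sweep_demo() {
    ToyHeap heap;
    const std::size_t a = heap.alloc(), b = heap.alloc();
    const std::size_t c = heap.alloc(), d = heap.alloc();
    heap.roots.push_back(a);
    heap.objects[a].refs.push_back(b);
    heap.objects[c].refs.push_back(d);
    heap.objects[d].refs.push_back(c);

    const std::size_t freed = heap.collect();
    assert(heap.objects[a].alive && heap.objects[b].alive);
    assert(!heap.objects[c].alive && !heap.objects[d].alive);
    return freed;
}
```

mark_sweep_demo() returns 2: the c/d cycle is reclaimed because nothing reachable from a root refers to it, which is the case a pure reference counter cannot handle.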

Can we manually operate the garbage collector in C or C++?

There is no garbage collector in either language. At least, not as part of standard compliant implementations.

Note that C++ had language restrictions which made it hard to implement garbage collection. Some of those rules were relaxed in C++11, so in principle it became possible to implement a standards-compliant C++ garbage collector.

The standard approach in C++ is to use smart pointers to automatically manage memory.

There's an interesting article here, containing some useful links. From the comments you might be able to see how difficult it is to reconcile GC with idiomatic C++.

Garbage collection in C# not carried out. Why?

The .NET garbage collector is a highly optimized, complicated beast of software. It is optimized to make your program run as fast as possible without using too much memory in doing so.

Because the process of freeing memory takes some time, the garbage collector often waits to run it until your program uses a whole lot of memory. Then it does all the work at once, which results in a small delay of your program after a relatively long time (instead of many smaller delays earlier, which would slow down your program).

All this means that the time at which the garbage collector runs is not predictable.

You may call your test several times (with some Sleep() in the loop) and watch memory usage slowly building up. When your program begins to consume a significant portion of available physical memory its memory usage will suddenly drop to near-zero.

There are a couple of functions (like GC.Collect()) which force several levels of garbage collection, but it's strongly advised not to use them unless you know what you are doing, because this tends to make your software slower and prevents the garbage collector from doing its work in an optimal way.

Is garbage collection automatic in standard C++?

The long answer is that every time new is called, delete must be called somewhere, somehow, or some other deallocation function (depending on the memory allocator, etc.).

But you don't need to be the one supplying the delete call:

There is garbage collection for C++, in the form of the Boehm garbage collector (by Hans Boehm and others). There are probably other garbage collection libraries as well.
You can use smart pointers, which use RAII (and reference counting if the pointer allows shared access) to determine when to delete the object. A good smart pointer library is Boost's smart pointer. Smart pointers in the vast majority of cases can replace raw pointers.
Some application frameworks, like Qt, build object trees, such that there is a parent-child relationship between the framework's heap-allocated objects. As a result, all that is needed is for delete to be called on an object, and all its children will automatically be deleted as well.
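As a rough sketch of the object-tree idea from the last point: this is not Qt's actual implementation (QObject keeps raw child pointers and deletes them in its destructor). The TreeObject class below is hypothetical and uses std::unique_ptr to get the same effect, with a counter to show that a single delete cleans up the whole tree.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical framework-style object: each parent owns its children.
class TreeObject {
public:
    explicit TreeObject(int& live_count) : live_(live_count) { ++live_; }
    ~TreeObject() { --live_; }  // children_ is destroyed right after this body

    // Creates a child owned by this object and returns a non-owning pointer.
    TreeObject* add_child() {
        children_.push_back(std::unique_ptr<TreeObject>(new TreeObject(live_)));
        return children_.back().get();
    }

private:
    int& live_;  // shared counter of live objects, for the demo only
    std::vector<std::unique_ptr<TreeObject>> children_;
};

// Builds a small tree on the heap and deletes only the root.
int object_tree_demo() {
    int live = 0;
    TreeObject* root = new TreeObject(live);
    TreeObject* child = root->add_child();
    child->add_child();  // grandchild
    root->add_child();   // second child
    assert(live == 4);

    delete root;  // one delete; children and grandchild go with it
    return live;
}
```

object_tree_demo() returns 0, meaning no object survived the single delete.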

If you don't want to use any of these techniques, you can try using a memory checking tool to safeguard against memory leaks. Valgrind is particularly good, although it mainly targets Linux and other Unix-like systems and does not run natively on Windows.

As for .NET, yes, allocating using gcnew means that the memory is tracked by .NET, so no leaks. Other resources however, like file handles etc. are not managed by the GC.

Why doesn't C++ have a garbage collector?

Implicit garbage collection could have been added, but it just didn't make the cut, probably not only because of implementation complications, but also because people could not reach a general consensus fast enough.

A quote from Bjarne Stroustrup himself:

"I had hoped that a garbage collector which could be optionally enabled would be part of C++0x, but there were enough technical problems that I have to make do with just a detailed specification of how such a collector integrates with the rest of the language, if provided. As is the case with essentially all C++0x features, an experimental implementation exists."

There is a good discussion of the topic here.

General overview:

C++ is very powerful and allows you to do almost anything. For this reason it doesn't automatically push many things onto you that might impact performance. Garbage collection can be easily implemented with smart pointers (objects that wrap pointers with a reference count, which auto delete themselves when the reference count reaches 0).

C++ was built with competitors in mind that did not have garbage collection. Efficiency was the main point on which C++ had to fend off criticism in comparison to C and others.

There are 2 types of garbage collection...

Explicit garbage collection:

C++0x has garbage collection via pointers created with shared_ptr

If you want it you can use it, if you don't want it you aren't forced into using it.

For versions before C++0x, boost::shared_ptr exists and serves the same purpose.
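A minimal sketch of that "explicit" collection, using std::shared_ptr. The Tracked type and refcount_demo are made up for illustration. The object is destroyed at the exact moment the last shared_ptr lets go of it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload that reports its own destruction through a flag.
struct Tracked {
    explicit Tracked(bool& destroyed) : destroyed_(destroyed) { destroyed_ = false; }
    ~Tracked() { destroyed_ = true; }
    bool& destroyed_;
};

// Returns true if the object was destroyed when its last owner released it.
bool refcount_demo() {
    bool destroyed = false;
    std::shared_ptr<Tracked> first = std::make_shared<Tracked>(destroyed);
    assert(first.use_count() == 1);
    {
        std::shared_ptr<Tracked> second = first;  // copying adds an owner
        assert(first.use_count() == 2);
    }  // second goes away, the count drops back to 1
    assert(first.use_count() == 1 && !destroyed);

    first.reset();  // last owner gone: the object is deleted right here
    return destroyed;
}
```

refcount_demo() returns true. Code that only uses raw pointers or values pays nothing for this, which is the "if you don't want it you aren't forced into using it" part.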

Implicit garbage collection:

C++ does not have transparent garbage collection, however. It will be a focus point for future C++ specs.

Why doesn't TR1 have implicit garbage collection?

There are a lot of things that TR1 and C++0x should have had. Bjarne Stroustrup stated in previous interviews that TR1 didn't have as much as he would have liked.

Are there practical uses of C++11's Garbage Collection ABI?

No, there is currently no practical use for the C++11 GC interface, as no compiler fully supports this API at the moment. Also, the C++11 standard declares this API as optional, and there is no visible movement to implement it in the major compilers (but as Jesse Good notes, MSVC already does support it).

You should also look at this post, which has related information: Why garbage collection when RAII is available?
