Garbage Collection Libraries in C++

C++ garbage collection

This question cannot be answered in general. There are different systems that may be regarded as garbage collection for C++; for example, Herb Sutter's deferred_ptr is basically a garbage collecting smart pointer. I've personally implemented another version of this idea, similar to Sutter's but less fancy.

I can answer about Boehm, however. During its "mark" phase, the Boehm garbage collector recognizes pointers by scanning memory and assuming that anything that looks like a pointer is a pointer.

The garbage collector knows all the areas of memory where user data lives, and it knows every pointer it has handed out and how big each allocation was. It looks for chains of pointers starting from the "root segments" defined below, where "look" means explicitly scanning memory for 64-bit values that match one of the allocations the GC has made.

From the Boehm collector's documentation:

Since it cannot generally tell where pointer variables are located, it scans the following root segments for pointers:

  • The registers. Depending on the architecture, this may be done using assembly code, or by calling a setjmp-like function which saves register contents on the stack.
  • The stack(s). In the case of a single-threaded application, on most platforms this is done by scanning the memory between (an approximation of) the current stack pointer and GC_stackbottom. (For Itanium, the register stack scanned separately.) The GC_stackbottom variable is set in a highly platform-specific way depending on the appropriate configuration information in gcconfig.h. Note that the currently active stack needs to be scanned carefully, since callee-save registers of client code may appear inside collector stack frames, which may change during the mark process. This is addressed by scanning some sections of the stack "eagerly", effectively capturing a snapshot at one point in time.
  • Static data region(s). In the simplest case, this is the region between DATASTART and DATAEND, as defined in gcconfig.h. However, in most cases, this will also involve static data regions associated with dynamic libraries. These are identified by the mostly platform-specific code in dyn_load.c.

The address space for 64-bit pointers is huge, so false positives will be rare. When one does occur, it is just a leak, and it lasts only as long as the memory scanned by the mark phase happens to contain some other value that exactly matches a 64-bit pointer allocated by the garbage collector.
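To make the "looks like a pointer" idea concrete, here is a minimal sketch of a conservative root scan. It is not Boehm's code: the Allocation record, conservative_mark and demo_false_positive are hypothetical names, the addresses are made up, and accepting addresses inside a block (not only its start) is an assumption of the sketch. It covers only the scan of one root segment, not the follow-up tracing through the marked blocks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one block handed out by a toy collector.
struct Allocation {
    std::uintptr_t start;  // address of the first byte of the block
    std::size_t size;      // length of the block in bytes
    bool marked;           // set when a word in a root segment hits the block
};

// Treats every word in `root` as a potential pointer and marks any
// allocation whose address range contains that value (assumption: addresses
// inside a block count as well as its start).
// Returns the number of allocations that ended up marked.
std::size_t conservative_mark(const std::vector<std::uintptr_t>& root,
                              std::vector<Allocation>& heap) {
    for (std::uintptr_t word : root) {
        for (Allocation& a : heap) {
            if (word >= a.start && word - a.start < a.size) {
                a.marked = true;
            }
        }
    }
    return static_cast<std::size_t>(
        std::count_if(heap.begin(), heap.end(),
                      [](const Allocation& a) { return a.marked; }));
}

// Two blocks, neither referenced by a real pointer. The root segment holds
// plain integers, one of which happens to fall inside the first block.
std::size_t demo_false_positive() {
    std::vector<Allocation> heap = {
        {0x1000, 64, false},
        {0x2000, 64, false},
    };
    std::vector<std::uintptr_t> root = {42, 0x1010, 7};  // 0x1010 is "just a number"
    return conservative_mark(root, heap);
}
```

Calling demo_false_positive() reports one marked block: the integer 0x1010 is indistinguishable from a pointer into the first allocation, so that block would be retained until the value disappears from the scanned memory.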

Garbage Collection in C++ -- why?

I keep hearing people complaining that C++ doesn't have garbage collection.

I am so sorry for them. Seriously.

C++ has RAII, and I always complain when I find no RAII (or a crippled RAII) in garbage-collected languages.

What advantages could garbage collection offer an experienced C++ developer?

Another tool.

Matt J put it quite well in his post (Garbage Collection in C++ -- why?): we don't need C++ features, as most of them could be coded in C, and we don't need C features, as most of them could be coded in assembly, etc. C++ must evolve.

As a developer: I don't care about GC. I tried both RAII and GC, and I find RAII vastly superior. As Greg Rogers said in his post (Garbage Collection in C++ -- why?), memory leaks are not so terrible (at least in C++, where they are rare if C++ is really used) as to justify GC instead of RAII. GC has non-deterministic deallocation/finalization and is just a way to write code that doesn't care about specific memory choices.

This last sentence is important: it is important to be able to write code that "just doesn't care". In the same way that in C++ we don't care about freeing resources because RAII does it for us, or about object initialization because the constructor does it for us, it is sometimes important to just code without caring about who owns what memory, and what kind of pointer (shared, weak, etc.) we need for this or that piece of code. There seems to be a need for GC in C++ (even if I personally fail to see it).
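For contrast with non-deterministic finalization, here is a minimal sketch of the deterministic release that RAII gives. ScopedResource and run_scope are hypothetical names introduced for illustration; the "resource" is only a pair of log entries so that the order of events can be inspected.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource wrapper: records acquisition and release in a log
// so the order of events can be checked afterwards.
class ScopedResource {
public:
    ScopedResource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~ScopedResource() { log_.push_back("release " + name_); }

    // Non-copyable: exactly one object is responsible for the release.
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Acquires two resources, does some work, and lets the scope end.
std::vector<std::string> run_scope() {
    std::vector<std::string> log;
    {
        ScopedResource file("file", log);
        ScopedResource lock("lock", log);
        log.push_back("work");
    }  // destructors run here, in reverse order of construction
    return log;
}
```

The log returned by run_scope() reads: acquire file, acquire lock, work, release lock, release file. The release point is fixed by the closing brace, which is exactly what a tracing collector does not promise.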

An example of good GC use in C++

Sometimes, in an app, you have "floating data". Imagine a tree-like structure of data, but no one is really "owner" of the data (and no one really cares about when exactly it will be destroyed). Multiple objects can use it, and then, discard it. You want it to be freed when no one is using it anymore.

The C++ approach is to use a smart pointer. boost::shared_ptr comes to mind: each piece of data is owned by its own shared pointer. Cool. The problem arises when each piece of data can refer to another piece of data. You cannot use shared pointers everywhere, because they rely on a reference counter, which cannot reclaim circular references (A points to B, and B points to A). So you must now think a lot about where to use weak pointers (boost::weak_ptr) and where to use shared pointers.
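The circular reference problem can be sketched as follows. This example uses std::shared_ptr and std::weak_ptr, the standard counterparts of the Boost types named above; the Node type, its alive counter and both function names are hypothetical, introduced only for this illustration.

```cpp
#include <memory>

// Hypothetical node type; `alive` counts the instances currently constructed.
struct Node {
    static int alive;
    std::shared_ptr<Node> strong_peer;  // owning link
    std::weak_ptr<Node> weak_peer;      // non-owning link
    Node() { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

// A and B own each other, so neither reference count ever reaches zero.
// Returns how many nodes survived the end of the scope.
int leaked_after_strong_cycle() {
    const int before = Node::alive;
    Node* raw = nullptr;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->strong_peer = a;
        raw = a.get();
    }
    const int leaked = Node::alive - before;  // both nodes are still alive
    raw->strong_peer.reset();  // break the cycle by hand so the demo itself does not leak
    return leaked;
}

// Same shape, but the back link is weak: both nodes die with the scope.
int leaked_after_weak_back_link() {
    const int before = Node::alive;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_peer = b;
        b->weak_peer = a;
    }
    return Node::alive - before;
}
```

With two strong links, both nodes outlive the scope; with a weak back link, none do. Deciding which link gets to be the weak one is the design work the paragraph above complains about.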

With a GC, you just use the tree structured data.

The downside is that you have to accept not knowing when the "floating data" will really be destroyed, only that it will be destroyed.

Conclusion

So in the end, if done properly and in a way compatible with the current idioms of C++, GC would be Yet Another Good Tool for C++.

C++ is a multiparadigm language. Adding a GC will perhaps make some C++ fanboys cry treason, but in the end it could be a good idea. I guess the C++ Standards Committee won't let this kind of major feature break the language, so we can trust them to do the work needed to enable a correct C++ GC that won't interfere with the rest of C++. As always in C++, if you don't need a feature, don't use it and it will cost you nothing.

Was there a specific reason why garbage collection was not designed for C?

Don't listen to the "C is old and that's why it doesn't have GC" folks. There are fundamental problems with GC that cannot be overcome which make it incompatible with C.

The biggest problem is that accurate garbage collection requires the ability to scan memory and identify any pointers encountered. Some higher level languages limit integers not to use all the bits available, so that high bits can be used to distinguish object references from integers. Such languages may then store strings (which could contain arbitrary octet sequences) in a special string zone where they can't be confused with pointers, and all is well. A C implementation, however, cannot do this because bytes, larger integers, pointers, and everything else can be stored together in structures, unions, or as part of chunks returned by malloc.
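As a rough illustration of the tagging scheme described above, here is a sketch in which the top bit of each 64-bit word says whether the word is an integer or an object reference. Nothing here is taken from a real runtime: Word, kIntTag, make_int, make_ref, is_ref, collect_refs and demo_precise_scan are hypothetical names, and references are modeled as plain handles rather than machine addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical value representation: the top bit of a 64-bit word tells an
// integer (bit set) from an object reference (bit clear).
using Word = std::uint64_t;
constexpr Word kIntTag = Word{1} << 63;

// Integers are limited to 63 bits so the tag bit stays free.
Word make_int(std::uint64_t value) {
    assert((value & kIntTag) == 0 && "integers are limited to 63 bits");
    return value | kIntTag;
}

// References are stored untagged (assumption: handles never use the top bit).
Word make_ref(std::uint64_t handle) {
    assert((handle & kIntTag) == 0 && "handles must leave the tag bit clear");
    return handle;
}

bool is_ref(Word w) { return (w & kIntTag) == 0; }

// Precise scan: returns exactly the references stored in `memory`,
// never an integer that happens to carry the same payload.
std::vector<std::uint64_t> collect_refs(const std::vector<Word>& memory) {
    std::vector<std::uint64_t> refs;
    for (Word w : memory) {
        if (is_ref(w)) {
            refs.push_back(w);
        }
    }
    return refs;
}

// The integer 0x1000 and the reference 0x1000 share a payload, yet the tag
// keeps them apart.
std::vector<std::uint64_t> demo_precise_scan() {
    std::vector<Word> memory = {make_int(0x1000), make_ref(0x1000),
                                make_int(7), make_ref(0x2000)};
    return collect_refs(memory);
}
```

demo_precise_scan() returns only the two references, 0x1000 and 0x2000. A C struct, union or malloc'd chunk carries no such tag, so a collector for C has no equivalent of is_ref to rely on.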

What if you throw away the accuracy requirement and decide you're okay with a few objects never getting freed because some non-pointer data in the program has the same bit pattern as these objects' addresses? Now suppose your program receives data from the outside world (network/files/etc.). I claim I can make your program leak an arbitrary amount of memory, and eventually run out of memory, as long as I can guess enough pointers and emulate them in the strings I feed your program. This gets a lot easier if you apply De Bruijn Sequences.

Aside from that, garbage collection is just plain slow. You can find hundreds of academics who like to claim otherwise, but that won't change the reality. The performance issues of GC can be broken down into 3 main categories:

  • Unpredictability
  • Cache pollution
  • Time spent walking all memory

The people who will claim GC is fast these days are simply comparing it to the wrong thing: poorly written C and C++ programs which allocate and free thousands or millions of objects per second. Yes, these will also be slow, but at least predictably slow in a way you can measure and fix if necessary. A well-written C program will spend so little time in malloc/free that the overhead is not even measurable.

Does C++ have a Garbage Collector (GC)?

Native C++ has no such thing by default (the closest thing is smart pointers, but those are still something entirely different). That doesn't prevent you from writing your own garbage collection solution, or from using a third-party one.

Managed C++ (and its successor C++/CLI) of course use .NET garbage collection for managed resources (though native resources are not garbage collected and have to be managed manually as in native C++).

Visual C++ GC interface: how to enable it and which library to include

Visual C++ does not implement garbage collection, so the questions of whether/how to enable it or which libraries it requires are moot.

The presence of the listed functions does not mean that a GC exists. It only means that VC++ implements the interfaces mandated by C++11 that would allow a GC to work. No collector is provided as of the latest version (2019), and the VC++ implementations of those functions are no-ops, with get_pointer_safety() returning pointer_safety::relaxed, i.e. no pointer safety at all. Quoting from the VC++ memory header:

// GARBAGE COLLECTION
enum class pointer_safety { relaxed, preferred, strict };

inline void declare_reachable(void*) {}

template <class _Ty>
_Ty* undeclare_reachable(_Ty* _Ptr) {
    return _Ptr;
}

inline void declare_no_pointers(char*, size_t) {}

inline void undeclare_no_pointers(char*, size_t) {}

inline pointer_safety get_pointer_safety() noexcept {
    return pointer_safety::relaxed;
}

From the "GC ABI" entry of Stroustrup's C++11 FAQ:

relaxed: safely-derived and not safely-derived pointers are treated equivalently; like C and C++98 [...]

More on SO about C++11 GC:

  • Garbage Collection in C++11
  • C++11: what is its GC interface, and how to implement?
  • How to use Minimal GC in VC++ 2013?


