C++ Standard Library and Boehm Garbage Collector

To partly answer my own question, the following code

// file myvec.cc
#include <gc/gc.h>
#include <gc/gc_cpp.h>
#include <gc/gc_allocator.h>
#include <vector>

class Myvec {
  std::vector<int, gc_allocator<int> > _vec;
public:
  Myvec(size_t sz = 0) : _vec(sz) {}
  Myvec(const Myvec& v) : _vec(v._vec) {}
  const Myvec& operator=(const Myvec& rhs)
  { if (this != &rhs) _vec = rhs._vec; return *this; }
  void resize(size_t sz = 0) { _vec.resize(sz); }
  int& operator[](size_t ix) { return _vec[ix]; }
  const int& operator[](size_t ix) const { return _vec[ix]; }
  ~Myvec() {}
};

extern "C" Myvec* myvec_make(size_t sz = 0) { return new(GC) Myvec(sz); }
extern "C" void myvec_resize(Myvec* vec, size_t sz) { vec->resize(sz); }
extern "C" int myvec_get(Myvec* vec, size_t ix) { return (*vec)[ix]; }
extern "C" void myvec_put(Myvec* vec, size_t ix, int v) { (*vec)[ix] = v; }

when compiled with g++ -O3 -Wall -c myvec.cc, produces an object file with the following symbols:

 % nm -C myvec.o
U GC_free
U GC_malloc
U GC_malloc_atomic
U _Unwind_Resume
0000000000000000 W std::vector<int, gc_allocator<int> >::_M_fill_insert(__gnu_cxx::__normal_iterator<int*, std::vector<int, gc_allocator<int> > >, unsigned long, int const&)
U std::__throw_length_error(char const*)
U __gxx_personality_v0
U memmove
00000000000000b0 T myvec_get
0000000000000000 T myvec_make
00000000000000c0 T myvec_put
00000000000000d0 T myvec_resize

So there is no direct call to plain malloc or ::operator new in the generated code.

Hence, by using gc_allocator and new(GC), I can apparently be sure that plain ::operator new or malloc is not used without my knowledge, and I don't need to redefine ::operator new.
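The nm listing above is a static check. As a complementary runtime check, here is a minimal sketch that replaces the global ::operator new with a counting version and measures how many calls filling a vector triggers. MallocAllocator is a hypothetical stand-in for gc_allocator, used only so the sketch builds without libgc; with the collector installed you would plug in gc_allocator<int> instead. Note that the counter only sees ::operator new, not direct malloc calls (the stand-in itself uses malloc).

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <vector>

// Number of times the replaceable global ::operator new(size_t) has run.
static std::size_t g_global_new_calls = 0;

// Program-wide replacement: count the call, then allocate with malloc.
void* operator new(std::size_t n) {
    ++g_global_new_calls;
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical stand-in for gc_allocator<T>: gets memory without ::operator new.
template <class T>
struct MallocAllocator {
    using value_type = T;
    MallocAllocator() = default;
    template <class U> MallocAllocator(const MallocAllocator<U>&) noexcept {}
    T* allocate(std::size_t n) {
        if (void* p = std::malloc(n * sizeof(T))) return static_cast<T*>(p);
        throw std::bad_alloc();
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};
template <class T, class U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return false; }

// Fills a std::vector<int, Alloc> with n elements and reports how many
// ::operator new calls that caused.
template <class Alloc>
std::size_t global_new_calls_while_filling(std::size_t n) {
    const std::size_t before = g_global_new_calls;
    {
        std::vector<int, Alloc> v;
        for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
        if (!v.empty()) {
            volatile int sink = v.back();  // read the data so the vector is not optimized away
            (void)sink;
        }
    }
    return g_global_new_calls - before;
}
```

With std::allocator<int> the count is positive; with the stand-in allocator it should stay at zero.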


Addendum (January 2017)

For future reference (thanks to Sergey Zubkov for mentioning it in a comment on Quora), see also N2670 and the garbage collection support in <memory> (std::declare_reachable, std::declare_no_pointers, std::pointer_safety, etc.). However, at least in current GCC and Clang, that support has not been implemented, except in the trivial but acceptable way of making it a no-op.

Is it possible to use the Boehm garbage collector for only part of a program?

The example in the manual states:

It is usually best not to mix garbage-collected allocation with the system malloc-free. If you do, you need to be careful not to store pointers to the garbage-collected heap in memory allocated with the system malloc.

And more specifically for C++:

In the case of C++, you need to be especially careful not to store pointers to the garbage-collected heap in areas that are not traced by the collector. The collector includes some alternate interfaces to make that easier.

Looking at the source code in the manual, you will see that garbage-collected memory is handled through specific calls (GC_MALLOC and friends), so it is managed separately from memory you allocate and free manually. As long as your library handles its internals properly and doesn't expose collected memory, you should be fine. After all, you don't know how other libraries manage their memory, yet you can use them too, can't you? :)

Precise mode in Boehm Garbage Collector

The file doc/gcinterface.html in the garbage collector distribution states:

void * GC_MALLOC_ATOMIC(size_t nbytes)
Allocates nbytes of storage. Requires (amortized) time
proportional to nbytes. The resulting object will be automatically
deallocated when unreferenced. The client promises that the resulting
object will never contain any pointers. The memory is not cleared.
This is the preferred way to allocate strings, floating point arrays,
bitmaps, etc. More precise information about pointer locations can be
communicated to the collector using the interface in gc_typed.h in the
distribution.

So it looks like there is a "precise" interface that can be used: gc_typed.h lets you tell the collector which words of an object hold pointers.

Garbage collector in C++

Is there a reason you're rolling your own garbage collector? If all your objects are created dynamically, why aren't you using Boost's smart pointers (like boost::shared_ptr), which use RAII and reference counting to give you a well-tested form of automatic memory management?

I ask because, over a project's life cycle, you usually end up fixing bugs in the code you wrote yourself (most of the time, at least). So is there a reason to reinvent the wheel?
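A small sketch of what std::shared_ptr (the standardized descendant of boost::shared_ptr) gives you, and of its main limitation compared with a tracing collector: reference counting cannot reclaim a cycle by itself, so back links should be std::weak_ptr. Node and leaked_nodes are hypothetical names used only for illustration.

```cpp
#include <memory>

// Node that tracks how many instances are alive, so frees are observable.
struct Node {
    static int live;
    std::shared_ptr<Node> next;         // owning forward link
    std::shared_ptr<Node> strong_prev;  // owning back link: creates a cycle
    std::weak_ptr<Node> weak_prev;      // non-owning back link: no cycle
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Links two nodes both ways, drops the local owners, and returns how many
// of the two nodes are still alive (i.e. leaked) afterwards.
int leaked_nodes(bool use_weak_back_link) {
    const int before = Node::live;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        if (use_weak_back_link) b->weak_prev = a;
        else b->strong_prev = a;  // a <-> b keep each other alive
    }
    return Node::live - before;
}
```

leaked_nodes(true) returns 0; leaked_nodes(false) returns 2, because the two nodes hold each other's reference counts above zero.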

Was there a specific reason why garbage collection was not designed for C?

Don't listen to the "C is old and that's why it doesn't have GC" folks. There are fundamental problems with GC that cannot be overcome which make it incompatible with C.

The biggest problem is that accurate garbage collection requires the ability to scan memory and identify every pointer encountered. Some higher-level languages restrict integers to fewer than all the available bits, so that the spare bits can be used to distinguish object references from integers. Such languages may then store strings (which could contain arbitrary octet sequences) in a special string zone where they can't be confused with pointers, and all is well. A C implementation, however, cannot do this, because bytes, larger integers, pointers, and everything else can be stored together in structures, unions, or as part of chunks returned by malloc.
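To make the problem concrete: nothing stops a C or C++ program from keeping the only reference to an object in a form that is not a pointer at all. In this sketch (hide_pointer and reveal_pointer are hypothetical helpers), an address is XORed with a key and later restored. The round trip through std::uintptr_t is valid, yet while the value is hidden no word in memory holds the original address, so no scanning collector could find it.

```cpp
#include <cstdint>

// Disguise a pointer as an integer whose value differs from the address
// (for any nonzero key). A scanning collector would no longer recognize it.
std::uintptr_t hide_pointer(const int* p, std::uintptr_t key) {
    return reinterpret_cast<std::uintptr_t>(p) ^ key;
}

// Undo the disguise: XOR with the same key restores the original bits, and
// converting uintptr_t back yields the original pointer value.
const int* reveal_pointer(std::uintptr_t hidden, std::uintptr_t key) {
    return reinterpret_cast<const int*>(hidden ^ key);
}
```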

What if you throw away the accuracy requirement and decide you're okay with a few objects never getting freed because some non-pointer data in the program has the same bit pattern as these objects' addresses? Now suppose your program receives data from the outside world (network, files, etc.). I claim I can make your program leak an arbitrary amount of memory, and eventually run out of memory, as long as I can guess enough pointers and emulate them in the strings I feed your program. This gets a lot easier if you apply de Bruijn sequences.
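A toy model of why this works, assuming a deliberately simplified conservative scan: any word-sized value in scanned memory that falls inside the heap's address range is treated as a possible pointer. Real collectors such as Boehm also check that the value points into an allocated block and blacklist suspicious addresses, so this is an illustration of the principle, not of their algorithm. count_pointer_lookalikes is a hypothetical helper, and the heap range in the test is made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scans `bytes` in word-sized slots at word-sized offsets and counts values
// that fall inside [heap_lo, heap_hi), i.e. that a conservative collector
// would have to treat as possible pointers.
std::size_t count_pointer_lookalikes(const unsigned char* bytes, std::size_t len,
                                     std::uintptr_t heap_lo, std::uintptr_t heap_hi) {
    std::size_t hits = 0;
    for (std::size_t off = 0; off + sizeof(std::uintptr_t) <= len;
         off += sizeof(std::uintptr_t)) {
        std::uintptr_t word;
        std::memcpy(&word, bytes + off, sizeof word);  // safe even if unaligned
        if (word >= heap_lo && word < heap_hi) ++hits;
    }
    return hits;
}
```

A buffer of zeros yields 0; copying a single forged in-range value into it yields 1, which is all an attacker needs to keep an object pinned.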

Aside from that, garbage collection is just plain slow. You can find hundreds of academics who like to claim otherwise, but that won't change the reality. The performance issues of GC can be broken down into 3 main categories:

  • Unpredictability
  • Cache pollution
  • Time spent walking all memory

The people who will claim GC is fast these days are simply comparing it to the wrong thing: poorly written C and C++ programs which allocate and free thousands or millions of objects per second. Yes, these will also be slow, but at least predictably slow in a way you can measure and fix if necessary. A well-written C program will spend so little time in malloc/free that the overhead is not even measurable.

Does C++ have a Garbage Collector (GC)?

Native C++ has no such thing by default (the closest things are smart pointers, which are still something entirely different), but that doesn't prevent you from writing your own garbage collector or using a third-party one.

Managed C++ (and its successor C++/CLI) of course use .NET garbage collection for managed resources (though native resources are not garbage collected and have to be managed manually as in native C++).

Can we manually operate the garbage collector in C or C++?

Neither language has a garbage collector, at least not as part of a standard-compliant implementation.

Note that C++ had language restrictions which made it hard to implement garbage collection. Some of those rules were changed in C++11 (the pointer-safety rules from N2670), so in principle it became possible to implement a standard-compliant C++ garbage collector (that support was later removed again in C++23).

The standard approach in C++ is to use smart pointers to automatically manage memory.
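A minimal sketch of that approach using std::unique_ptr, here with a custom deleter so that even memory obtained from the C allocator is released automatically when the owner goes out of scope, including during exception unwinding. FreeDeleter, CString and copy_c_string are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter that returns memory to the C allocator instead of calling delete.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Owning handle to a malloc'd C string; freed automatically by RAII.
using CString = std::unique_ptr<char, FreeDeleter>;

// Copies `s` into freshly malloc'd memory owned by the returned handle.
// Returns an empty handle if the allocation fails.
CString copy_c_string(const char* s) {
    const std::size_t len = std::strlen(s) + 1;  // include the terminator
    CString out(static_cast<char*>(std::malloc(len)));
    if (out) std::memcpy(out.get(), s, len);
    return out;
}
```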

There's an interesting article here, containing some useful links. From the comments you might be able to see how difficult it is to reconcile GC with idiomatic C++.
