RAII and Smart Pointers in C++

A simple (and perhaps overused) example of RAII is a File class. Without RAII, the code might look something like this:

File file("/path/to/file");
// Do stuff with file
file.close();

In other words, we must make sure that we close the file once we've finished with it. This has two drawbacks. Firstly, wherever we use File, we have to call File::close(); if we forget to do this, we hold onto the file longer than we need to. Secondly, what happens if an exception is thrown before we close the file?

Java solves the second problem using a finally clause:

// Declared outside the try block so that it is in scope in finally
File file = new File("/path/to/file");
try {
    // Do stuff with file
} finally {
    file.close();
}

or, since Java 7, a try-with-resources statement:

try (File file = new File("/path/to/file")) {
    // Do stuff with file
}

C++ solves both problems using RAII - that is, closing the file in the destructor of File. So long as the File object is destroyed at the right time (which it should be anyway), closing the file is taken care of for us. So, our code now looks something like:

File file("/path/to/file");
// Do stuff with file
// No need to close it - destructor will do that for us

This cannot be done in Java, since there is no guarantee when the object will be destroyed, so we cannot guarantee when a resource such as a file will be freed.
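The File class itself is never shown above, so here is a minimal sketch of what it might look like. The std::FILE* handle, the mode parameter and the write() member are assumptions made for illustration, not part of the original example:

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper around a C stdio handle
class File {
public:
    // Acquire the resource in the constructor; throw if that fails
    File(const std::string& path, const char* mode)
        : handle_(std::fopen(path.c_str(), mode)) {
        if (!handle_) {
            throw std::runtime_error("could not open " + path);
        }
    }

    // Release the resource in the destructor; this runs on normal exit
    // from the scope and during stack unwinding after an exception
    ~File() { std::fclose(handle_); }

    // Copying would close the same handle twice, so forbid it
    File(const File&) = delete;
    File& operator=(const File&) = delete;

    void write(const std::string& text) {
        std::fputs(text.c_str(), handle_);
    }

private:
    std::FILE* handle_;
};
```

With something like this, File file("/path/to/file", "a"); followed by file.write("...") needs no matching close call, even if write() or later code throws.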

Onto smart pointers - a lot of the time, we just create objects on the stack. For instance (and stealing an example from another answer):

void foo() {
    std::string str;
    // Do cool things to or using str
}

This works fine - but what if we want to return str? We could write this:

std::string foo() {
    std::string str;
    // Do cool things to or using str
    return str;
}

So, what's wrong with that? Well, the return type is std::string, which means we're returning by value. This means that we copy str and actually return the copy. This can be expensive, and we might want to avoid the cost of copying it. Therefore, we might come up with the idea of returning by reference or by pointer.

std::string* foo() {
    std::string str;
    // Do cool things to or using str
    return &str;
}

Unfortunately, this code doesn't work. We're returning a pointer to str, but str was created on the stack, so it will be destroyed once we exit foo(). In other words, by the time the caller gets the pointer, it's useless (and arguably worse than useless, since using it is undefined behaviour and could cause all sorts of funky errors).

So, what's the solution? We could create str on the heap using new - that way, when foo() is completed, str won't be destroyed.

std::string* foo() {
    std::string* str = new std::string();
    // Do cool things to or using str
    return str;
}

Of course, this solution isn't perfect either. The reason is that we've created str, but we never delete it. This might not be a problem in a very small program, but in general, we want to make sure we delete it. We could just say that the caller must delete the object once they have finished with it. The downside is that the caller has to manage memory, which adds extra complexity, and they might get it wrong, leading to a memory leak, i.e. not deleting an object even though it is no longer required.

This is where smart pointers come in. The following example uses shared_ptr - I suggest you look at the different types of smart pointers to learn what you actually want to use.

shared_ptr<std::string> foo() {
    // The raw-pointer constructor is explicit, so "= new ..." would not compile
    shared_ptr<std::string> str(new std::string());
    // Do cool things to or using str
    return str;
}

Now, shared_ptr will count the number of references to str. For instance:

shared_ptr<std::string> str = foo();
shared_ptr<std::string> str2 = str;

Now there are two references to the same string. Once there are no remaining references to str, it will be deleted. As such, you no longer have to worry about deleting it yourself.
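To make the counting visible, here is a small sketch using std::shared_ptr and its use_count() member. The Counts struct and the countReferences() function are made up for this illustration:

```cpp
#include <memory>
#include <string>

// Reference counts observed at three points in countReferences()
struct Counts {
    long one;         // while only str owns the string
    long two;         // while str and str2 share it
    long afterReset;  // after str2 has given up its share
};

Counts countReferences() {
    Counts c{};
    std::shared_ptr<std::string> str = std::make_shared<std::string>("hello");
    c.one = str.use_count();
    std::shared_ptr<std::string> str2 = str;   // copy: both now own the string
    c.two = str.use_count();
    str2.reset();                              // str2 releases its share
    c.afterReset = str.use_count();
    return c;
}   // str is the last owner, so the string is deleted when it goes out of scope
```

Calling countReferences() yields the counts 1, 2 and 1 in that order.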

Quick edit: as some of the comments have pointed out, this example isn't perfect for (at least!) two reasons. Firstly, depending on the implementation of strings, copying a string can be inexpensive. Secondly, due to what's known as named return value optimisation (NRVO), returning by value may not be expensive, since the compiler can construct the result directly in the caller's storage and skip the copy altogether.

So, let's try a different example using our File class.

Let's say we want to use a file as a log. This means we want to open our file in append only mode:

File file("/path/to/file", File::append);
// The exact semantics of this aren't really important,
// just that we've got a file to be used as a log

Now, let's set our file as the log for a couple of other objects:

// foo and bar are modified, so they are taken by non-const reference
void setLog(Foo & foo, Bar & bar) {
    File file("/path/to/file", File::append);
    foo.setLogFile(file);
    bar.setLogFile(file);
}

Unfortunately, this example ends horribly - file will be closed as soon as this method ends, meaning that foo and bar now have an invalid log file. We could construct file on the heap, and pass a pointer to file to both foo and bar:

void setLog(Foo & foo, Bar & bar) {
    File* file = new File("/path/to/file", File::append);
    foo.setLogFile(file);
    bar.setLogFile(file);
}

But then who is responsible for deleting file? If neither deletes file, then we have both a memory leak and a resource leak. We don't know whether foo or bar will finish with the file first, so we can't expect either of them to delete the file. For instance, if foo deletes the file before bar has finished with it, bar now has an invalid pointer.

So, as you may have guessed, we could use smart pointers to help us out.

void setLog(Foo & foo, Bar & bar) {
    // The raw-pointer constructor is explicit, so "= new ..." would not compile
    shared_ptr<File> file(new File("/path/to/file", File::append));
    foo.setLogFile(file);
    bar.setLogFile(file);
}

Now, nobody needs to worry about deleting file - once both foo and bar have finished and no longer have any references to file (probably due to foo and bar being destroyed), file will automatically be deleted.
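Here is a self-contained sketch of that behaviour. Foo, Bar and File are stand-ins invented for the illustration (this File only flips a flag instead of touching a real file), and the extra openFlag parameter exists purely so the effect can be observed:

```cpp
#include <memory>
#include <utility>

// Stand-in for the File class; it only records whether it is still open
struct File {
    explicit File(bool& openFlag) : open(openFlag) { open = true; }
    ~File() { open = false; }   // "closes" the file
    bool& open;
};

// Hypothetical holders that each keep a share of the log file
struct Foo {
    void setLogFile(std::shared_ptr<File> f) { log = std::move(f); }
    std::shared_ptr<File> log;
};

struct Bar {
    void setLogFile(std::shared_ptr<File> f) { log = std::move(f); }
    std::shared_ptr<File> log;
};

void setLog(Foo& foo, Bar& bar, bool& openFlag) {
    std::shared_ptr<File> file = std::make_shared<File>(openFlag);
    foo.setLogFile(file);
    bar.setLogFile(file);
}   // the local pointer goes away, but foo and bar keep the File alive
```

After setLog() returns, the flag stays true for as long as either foo or bar still holds its log pointer, and turns false once the last of them is destroyed.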

The difference between RAII and smart pointers in C++

RAII is the idea of using C++'s automatic call of a destructor, to release resources acquired in a constructor.

The acronym, Resource Acquisition Is Initialization, indicates that only vaguely.

A smart pointer is a class that overloads at least operator-> and the dereference operator (operator*) to enable use with pointer notation. Typically a smart pointer will use RAII techniques to automatically deallocate memory, but it can do other things. It is, however, implicit that a smart pointer deals somehow with "ownership" of a contained raw pointer. For example, a simple iterator class overloads operator-> and operator* but is not regarded as a smart pointer.

Is smart pointer a good practice of RAII?

From cppreference:

Resource Acquisition Is Initialization or RAII, is a C++ programming technique which binds the life cycle of a resource that must be acquired before use (allocated heap memory, thread of execution, open socket, open file, locked mutex, disk space, database connection—anything that exists in limited supply) to the lifetime of an object.

std::shared_ptr is definitely RAII, as it acquires a resource and binds its lifetime to its own, thus taking over the responsibility of releasing/destructing the resource. This is the core principle of RAII.

The term RRID (Resource Release Is Destruction) is rarely seen and its meaning seems to be somewhat ambiguous. Mostly it is used with the same meaning as RAII.

IMHO many discussions revolving around variants of RAII stem from interpreting the meaning of the term too literally. RAII is meant to represent a concept of object lifetime management.

Any reason to use raw pointers to do RAII? C++11/14

Are there any reasons to still use raw pointers (for managed
resources) in C++11/14?

I assume that by "managed resources" you mean "owned resources".

Yes there are reasons:

  1. As you are implying in your question, sometimes you want to refer to an object or to none, and that can change over time. You have to use a raw pointer in this case, because there is no alternative right now for this specific case. There might be one later, as there is a proposal to add a "dumb" non-owning pointer which just clarifies the role of the pointer (observe/refer, not own). Meanwhile the recommendation is to avoid new/delete if you can, and to use raw pointers only as a "refers to without owning" kind of re-assignable reference.
  2. You still need raw pointers to implement non-raw pointers and any low-level RAII construct. Not everybody needs to work on fundamental libraries, but of course if you do, then you need basic constructs to work with. For example, in my domain I often have to build custom "object pool" systems in different ways. At some point in the implementation you have to manipulate raw memory which does not contain objects yet, so you need to use raw pointers to work with that.
  3. When communicating with C interfaces, you have no choice but to pass raw pointers to functions taking them. A lot of C++ developers have to do this, but it is only in some very specific regions of your code that you will have to use them.
  4. Most companies using C++ don't have "modern C++" experience and work with code that uses a lot of pointers where they are not actually necessary. So most of the time when you add code to their codebase, you might be forced by the code environment, politics, peer pressure and conventions of the company to use pointers where, in a more "modern C++" kind of company, it would not pass peer review. So consider the political/historic/social/knowledge-base-of-coworkers context too when choosing your techniques. Or make sure the companies/projects you work in match your way of doing things (this might be harder).
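A short sketch of points 1 and 3. The function names findName() and lengthViaC() are invented for the example; std::strlen simply stands in for an arbitrary C interface:

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Point 1: a non-owning, re-assignable reference that may also be "none"
const std::string* findName(const std::vector<std::string>& names,
                            const std::string& wanted) {
    for (const std::string& name : names) {
        if (name == wanted) {
            return &name;   // refers to an element owned by the vector
        }
    }
    return nullptr;         // no matching object
}

// Point 3: handing a raw pointer to a C function (std::strlen takes const char*)
std::size_t lengthViaC(const std::unique_ptr<char[]>& buffer) {
    // get() lends the pointer out; ownership stays with the unique_ptr
    return std::strlen(buffer.get());
}
```

In neither function does the raw pointer own anything, so there is nothing to delete and nothing to leak.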

Should resource member variables in a class be held in their own smart
pointers for automatic RAII without need for cleanup in destructor?

Resource member variables, in the best case, should just be members without being obvious pointers, not even smart pointers. Smart pointers are a "bridge" between code manipulating raw pointers and pure RAII style. If you have total control over some code and it's new code, you can totally avoid making any use of smart pointers in your interfaces. Maybe you will need them in your implementations though. Keep in mind that there are no actual rules, only recommendations of what you could result in if you

Is the implementation of smart pointers inlined that there is no
overhead in doing so?

The implementation of the standard smart pointers is as efficient as it can be, so yes, most of their code is inlined. However, they are not always free; it depends on what they actually do. For example, in almost all cases, unique_ptr is exactly one raw pointer, with just additional checks around its places of use. So it's "free". shared_ptr, on the other hand, has to maintain a counter of how many other shared_ptrs refer to the same object. That counter can be changed by several threads making copies of the shared_ptr, so it has to be atomic. Changing the value of an atomic counter is not always free, and you should always assume that there is a higher cost than copying a raw pointer.

So "it depends".
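One way to see part of the difference is to compare sizes. The helper functions below are made up for the illustration, and the standard does not mandate the results, but on the common standard library implementations a unique_ptr with the default deleter is the size of one raw pointer, while a shared_ptr is larger because it also refers to a control block holding the counters:

```cpp
#include <cstddef>
#include <memory>

// Sizes as reported by the implementation in use; these are
// implementation details, not guarantees made by the standard
std::size_t rawSize()    { return sizeof(int*); }
std::size_t uniqueSize() { return sizeof(std::unique_ptr<int>); }
std::size_t sharedSize() { return sizeof(std::shared_ptr<int>); }
```

Size is only one aspect of the cost: the atomic counter updates when copying a shared_ptr do not show up here at all.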

Just:

  • use RAII as much as you can, without exposing any kind of pointer in your interfaces (smart or not);
  • use standard smart pointers in implementations if you must use owning pointers;
  • use raw pointers only if you need to refer to an object, to null, or to different objects over time, without owning them;
  • avoid raw pointers in interfaces, except to allow passing optional objects (when nullptr is a correct argument).

You will end up with code that, from the user's perspective, doesn't seem to manipulate pointers. If you have several layers of code following these rules, the code will be easier to follow and highly maintainable.

On a related note from: When to use references vs. pointers

Avoid pointers until you can't.

Also note that Sean Parent, in his recent talks, also considers smart pointers to be raw pointers. They can indeed be encapsulated as implementation details of value-semantic types corresponding to the actual concept being manipulated. Additionally, using type-erasure techniques in implementations but never exposing them to the user helps the extensibility of some library constructs.
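As a sketch of what "encapsulated as implementation details of value-semantic types" can mean in practice, here is a hypothetical Document class. It stores its data behind a unique_ptr, but users only ever see a type that copies and assigns like an ordinary value:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical value-semantic type: copies are independent, and the
// unique_ptr used for storage never appears in the public interface
class Document {
public:
    explicit Document(std::string text)
        : impl_(new Impl{std::move(text)}) {}

    // Deep copy, so a Document behaves like an ordinary value
    Document(const Document& other) : impl_(new Impl(*other.impl_)) {}
    Document& operator=(const Document& other) {
        *impl_ = *other.impl_;
        return *this;
    }

    const std::string& text() const { return impl_->text; }
    void append(const std::string& more) { impl_->text += more; }

private:
    struct Impl { std::string text; };
    std::unique_ptr<Impl> impl_;
};
```

A caller can write Document b = a; b.append("y"); and a is unaffected, without ever knowing that a pointer is involved.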

What is the relation between RAII and shared_ptr?

std::shared_ptr<T> extends RAII to resources with multiple owners. Rather than having to figure out yourself when to delete a shared object, you simply destroy your shared pointer, which destroys the shared object only if it was the last reference.

It is helpful not to think of the object pointed to by a shared pointer as an object owned by that shared pointer object. Instead, one could think of it as collectively owned by all shared pointers pointing to it. The resource acquired by the shared pointer object is not only the object itself, but also its reference counter. Releasing the object is equivalent to decreasing the reference counter, with the caveat that once the reference count drops to zero, an additional operation of deleting the object must follow.

RAII vs. Garbage Collector

If I strictly follow RAII rules, which seems to be a good thing, why would that be any different from having a garbage collector in C++?

While both deal with allocations, they do so in completely different manners. If you are referring to a GC like the one in Java, that adds its own overhead, removes some of the determinism from the resource release process, and handles circular references.

You can implement GC though for particular cases, with much different performance characteristics. I implemented one once for closing socket connections, in a high-performance/high-throughput server (just calling the socket close API took too long and borked the throughput performance). This involved no memory, but network connections, and no cyclic dependency handling.

I know that with RAII the programmer is in full control of when the resources are freed again, but is that in any case beneficial to just having a garbage collector?

This determinism is a feature that GC simply doesn't allow. Sometimes you want to be able to know that after some point, a cleanup operation has been performed (deleting a temporary file, closing a network connection, etc).

In such cases GC doesn't cut it, which is the reason that in C# (for example) you have the IDisposable interface.
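A small sketch of that determinism. Guard is a made-up class that just records when it is acquired and released; in real code it would delete the temporary file or close the connection:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical guard that logs its acquisition and release
class Guard {
public:
    Guard(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~Guard() { log_.push_back("release " + name_); }

    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

std::vector<std::string> run() {
    std::vector<std::string> log;
    {
        Guard file("temp file", log);
        Guard socket("socket", log);
        log.push_back("work");
    }   // both guards are released here, in reverse order of construction
    log.push_back("after scope");
    return log;
}
```

The log returned by run() is always: acquire temp file, acquire socket, work, release socket, release temp file, after scope. By the time "after scope" is recorded, the cleanup is guaranteed to have happened.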

I even heard that having a garbage collector can be more efficient, as it can free larger chunks of memory at a time instead of freeing small memory pieces all over the code.

Can be ... depends on the implementation.

Understanding RAII object

Like smart pointers (e.g. shared_ptr), it takes a pointer to the resource and then manages it. Is that correct?

Not quite. shared_ptrs take part in ownership of the object to which that pointer points, while unique_ptr takes exclusive ownership. Of smart pointers, weak_ptr doesn't take ownership immediately, but it does join as an observer of an object owned by shared_ptrs and allows sharing of ownership to be attempted later.

The point is that those smart pointers take ownership of an existing object indicated by the pointer they're given.
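The weak_ptr behaviour in particular is easier to see in code. This is a sketch; the Result struct and observe() function are invented for the illustration:

```cpp
#include <memory>

struct Result {
    long ownersSeen;          // use_count() while a weak_ptr is observing
    bool lockedWhileAlive;    // lock() succeeded while an owner existed
    bool expiredAfterwards;   // lock() failed once all owners were gone
};

Result observe() {
    Result r{};
    std::weak_ptr<int> observer;
    {
        std::shared_ptr<int> owner = std::make_shared<int>(42);
        observer = owner;                     // observes, does not own
        r.ownersSeen = owner.use_count();     // the weak_ptr is not counted
        // lock() attempts to obtain a share of the ownership
        std::shared_ptr<int> second = observer.lock();
        r.lockedWhileAlive = (second != nullptr) && (*second == 42);
    }   // owner and second are gone, so the int is destroyed
    r.expiredAfterwards = observer.expired() && (observer.lock() == nullptr);
    return r;
}
```

Here ownersSeen is 1, and both flags come out true: the weak_ptr never kept the int alive, but it could be promoted to a shared_ptr while an owner still existed.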

std::string(const char*), on the other hand, makes a copy of the NUL-terminated string to which the pointer points, which it then has exclusive ownership of. The original text to which the constructor's pointer argument pointed is of no ongoing relevance to the string object constructed; for example, modifications to the string do not affect that text. Separately, the std::string object may internally keep a pointer to a dynamically allocated buffer storing the copy of the text, and that buffer can be resized and updated (other times, for sufficiently short text, it may be stored directly in the std::string object as an optimisation). On destruction, std::string will free any internal buffer it is still managing. A std::string never leaks memory.
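A quick sketch of that independence (copyIsIndependent() is a made-up helper):

```cpp
#include <string>

// Returns true if modifying the std::string left the source text untouched
bool copyIsIndependent() {
    char text[] = "hello";   // the original NUL-terminated text
    std::string s(text);     // s makes its own copy of the characters
    s[0] = 'J';              // modifies the copy only
    s += ", world";          // may reallocate the string's internal buffer
    return std::string(text) == "hello" && s == "Jello, world";
}
```

The function returns true: the array text still reads "hello" after the string has been changed and grown.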

C++ Order of class members when smart pointers are involved matters

Order of members matters always.

The presence of the smart pointer is a red herring. The crux in your example is just that the initialization of one member depends on some other member already being initialized.

Members are initialized in the order they appear in the class definition. Always.

Even if you list them in different order in the member initializer list, they are still initialized in the order they appear in the class definition. And usually compilers warn when the order is different.

You get a similar issue with:

struct foo {
    int x;
    int y;
    foo() : x(1), y(this->x * 2) {}
};

Swapping the declaration order of x and y would make the behaviour undefined, because y would be initialized from an uninitialized x.

But is this design not somewhat brittle?

Yes it is. You need to be extra careful when initialization of members depends on each other.

What if someone changes the member order?

You will get a compiler warning, which you had better not ignore.

Is there a better/canonical design where there is no need of relying on the member order, or is it just like it is with RAII?

You probably do not need a reference and a smart pointer. Get rid of one of them. As they are both public there is really no point in having them both.

In general, what was suggested in a comment may be a solution. If you refactor one member to be a member of a base class, then there is no doubt about the order of initialization, because base classes are initialized first.
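The code from the question is not reproduced here, so the following is only a sketch of the base-class approach with invented names (Storage, Holder). The owning pointer lives in a base class, and a reference member of the derived class is bound to the object it owns:

```cpp
#include <memory>

// The resource lives in a base class, so it is initialized before any
// member of the derived class, however those members are ordered
struct Storage {
    explicit Storage(int value) : owned(new int(value)) {}
    std::unique_ptr<int> owned;
};

struct Holder : private Storage {
    explicit Holder(int value)
        : Storage(value), ref(*owned) {}   // owned is already initialized here

    int viaPointer() const { return *owned; }

    int& ref;   // refers to the int owned by the base class
};
```

Reordering or adding members in Holder can no longer break the initialization of ref, because the base class subobject is always constructed first.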


