What Is Copy Elision and How Does It Optimize the Copy-And-Swap Idiom

What is copy elision and how does it optimize the copy-and-swap idiom?

The copy constructor exists to make copies. In theory when you write a line like:

CLASS c(foo());

The compiler would have to call the copy constructor to copy the return of foo() into c.

Copy elision is a technique to skip calling the copy constructor so as not to pay for the overhead.

For example, the compiler can arrange that foo() will directly construct its return value into c.

Here's another example. Let's say you have a function:

void doit(CLASS c);

If you call it with an actual argument, the compiler has to invoke the copy constructor so that the original parameter cannot be modified:

CLASS c1;
doit(c1);

But now consider a different example, let's say you call your function like this:

doit(c1 + c1);

operator+ is going to have to create a temporary object (an rvalue). Instead of invoking the copy constructor before calling doit(), the compiler can pass the temporary that was created by operator+ and pass that to doit() instead.

What is the copy-and-swap idiom?

Overview

Why do we need the copy-and-swap idiom?

Any class that manages a resource (a wrapper, like a smart pointer) needs to implement The Big Three. While the goals and implementation of the copy-constructor and destructor are straightforward, the copy-assignment operator is arguably the most nuanced and difficult. How should it be done? What pitfalls need to be avoided?

The copy-and-swap idiom is the solution, and elegantly assists the assignment operator in achieving two things: avoiding code duplication, and providing a strong exception guarantee.

How does it work?

Conceptually, it works by using the copy-constructor's functionality to create a local copy of the data, then takes the copied data with a swap function, swapping the old data with the new data. The temporary copy then destructs, taking the old data with it. We are left with a copy of the new data.

In order to use the copy-and-swap idiom, we need three things: a working copy-constructor, a working destructor (both are the basis of any wrapper, so should be complete anyway), and a swap function.

A swap function is a non-throwing function that swaps two objects of a class, member for member. We might be tempted to use std::swap instead of providing our own, but this would be impossible; std::swap uses the copy-constructor and copy-assignment operator within its implementation, and we'd ultimately be trying to define the assignment operator in terms of itself!

(Not only that, but unqualified calls to swap will use our custom swap operator, skipping over the unnecessary construction and destruction of our class that std::swap would entail.)

An in-depth explanation

The goal

Let's consider a concrete case. We want to manage, in an otherwise useless class, a dynamic array. We start with a working constructor, copy-constructor, and destructor:

#include <algorithm> // std::copy
#include <cstddef> // std::size_t

class dumb_array
{
public:
    // (default) constructor
    dumb_array(std::size_t size = 0)
        : mSize(size),
          mArray(mSize ? new int[mSize]() : nullptr)
    {
    }

    // copy-constructor
    dumb_array(const dumb_array& other)
        : mSize(other.mSize),
          mArray(mSize ? new int[mSize] : nullptr)
    {
        // note that this is non-throwing, because of the data
        // types being used; more attention to detail with regards
        // to exceptions must be given in a more general case, however
        std::copy(other.mArray, other.mArray + mSize, mArray);
    }

    // destructor
    ~dumb_array()
    {
        delete [] mArray;
    }

private:
    std::size_t mSize;
    int* mArray;
};

This class almost manages the array successfully, but it needs operator= to work correctly.

A failed solution

Here's how a naive implementation might look:

// the hard part
dumb_array& operator=(const dumb_array& other)
{
    if (this != &other) // (1)
    {
        // get rid of the old data...
        delete [] mArray; // (2)
        mArray = nullptr; // (2) *(see footnote for rationale)

        // ...and put in the new
        mSize = other.mSize; // (3)
        mArray = mSize ? new int[mSize] : nullptr; // (3)
        std::copy(other.mArray, other.mArray + mSize, mArray); // (3)
    }

    return *this;
}

And we say we're finished; this now manages an array, without leaks. However, it suffers from three problems, marked sequentially in the code as (n).

The first is the self-assignment test.

This check serves two purposes: it's an easy way to prevent us from running needless code on self-assignment, and it protects us from subtle bugs (such as deleting the array only to try and copy it). But in all other cases it merely serves to slow the program down, and act as noise in the code; self-assignment rarely occurs, so most of the time this check is a waste.

It would be better if the operator could work properly without it.

The second is that it only provides a basic exception guarantee. If new int[mSize] fails, *this will have been modified. (Namely, the size is wrong and the data is gone!)

For a strong exception guarantee, it would need to be something akin to:

 dumb_array& operator=(const dumb_array& other)
 {
     if (this != &other) // (1)
     {
         // get the new data ready before we replace the old
         std::size_t newSize = other.mSize;
         int* newArray = newSize ? new int[newSize]() : nullptr; // (3)
         std::copy(other.mArray, other.mArray + newSize, newArray); // (3)

         // replace the old data (all are non-throwing)
         delete [] mArray;
         mSize = newSize;
         mArray = newArray;
     }

     return *this;
 }

The code has expanded! Which leads us to the third problem: code duplication.

Our assignment operator effectively duplicates all the code we've already written elsewhere, and that's a terrible thing.

In our case, the core of it is only two lines (the allocation and the copy), but with more complex resources this code bloat can be quite a hassle. We should strive to never repeat ourselves.

(One might wonder: if this much code is needed to manage one resource correctly, what if my class manages more than one?

While this may seem to be a valid concern, and indeed it requires non-trivial try/catch clauses, this is a non-issue.

That's because a class should manage one resource only!)

A successful solution

As mentioned, the copy-and-swap idiom will fix all these issues. But right now, we have all the requirements except one: a swap function. While The Rule of Three successfully entails the existence of our copy-constructor, assignment operator, and destructor, it should really be called "The Big Three and A Half": any time your class manages a resource it also makes sense to provide a swap function.

We need to add swap functionality to our class, and we do that as follows†:

class dumb_array
{
public:
    // ...

    friend void swap(dumb_array& first, dumb_array& second) // nothrow
    {
        // enable ADL (not necessary in our case, but good practice)
        using std::swap;

        // by swapping the members of two objects,
        // the two objects are effectively swapped
        swap(first.mSize, second.mSize);
        swap(first.mArray, second.mArray);
    }

    // ...
};

(Here is the explanation why public friend swap.) Now not only can we swap our dumb_array's, but swaps in general can be more efficient; it merely swaps pointers and sizes, rather than allocating and copying entire arrays. Aside from this bonus in functionality and efficiency, we are now ready to implement the copy-and-swap idiom.

Without further ado, our assignment operator is:

dumb_array& operator=(dumb_array other) // (1)
{
    swap(*this, other); // (2)

    return *this;
}

And that's it! With one fell swoop, all three problems are elegantly tackled at once.

Why does it work?

We first notice an important choice: the parameter argument is taken by-value. While one could just as easily do the following (and indeed, many naive implementations of the idiom do):

dumb_array& operator=(const dumb_array& other)
{
    dumb_array temp(other);
    swap(*this, temp);

    return *this;
}

We lose an important optimization opportunity. Not only that, but this choice is critical in C++11, which is discussed later. (On a general note, a remarkably useful guideline is as follows: if you're going to make a copy of something in a function, let the compiler do it in the parameter list.‡)

Either way, this method of obtaining our resource is the key to eliminating code duplication: we get to use the code from the copy-constructor to make the copy, and never need to repeat any bit of it. Now that the copy is made, we are ready to swap.

Observe that upon entering the function that all the new data is already allocated, copied, and ready to be used. This is what gives us a strong exception guarantee for free: we won't even enter the function if construction of the copy fails, and it's therefore not possible to alter the state of *this. (What we did manually before for a strong exception guarantee, the compiler is doing for us now; how kind.)

At this point we are home-free, because swap is non-throwing. We swap our current data with the copied data, safely altering our state, and the old data gets put into the temporary. The old data is then released when the function returns. (Where upon the parameter's scope ends and its destructor is called.)

Because the idiom repeats no code, we cannot introduce bugs within the operator. Note that this means we are rid of the need for a self-assignment check, allowing a single uniform implementation of operator=. (Additionally, we no longer have a performance penalty on non-self-assignments.)

And that is the copy-and-swap idiom.

What about C++11?

The next version of C++, C++11, makes one very important change to how we manage resources: the Rule of Three is now The Rule of Four (and a half). Why? Because not only do we need to be able to copy-construct our resource, we need to move-construct it as well.

Luckily for us, this is easy:

class dumb_array
{
public:
    // ...

    // move constructor
    dumb_array(dumb_array&& other) noexcept ††
        : dumb_array() // initialize via default constructor, C++11 only
    {
        swap(*this, other);
    }

    // ...
};

What's going on here? Recall the goal of move-construction: to take the resources from another instance of the class, leaving it in a state guaranteed to be assignable and destructible.

So what we've done is simple: initialize via the default constructor (a C++11 feature), then swap with other; we know a default constructed instance of our class can safely be assigned and destructed, so we know other will be able to do the same, after swapping.

(Note that some compilers do not support constructor delegation; in this case, we have to manually default construct the class. This is an unfortunate but luckily trivial task.)

Why does that work?

That is the only change we need to make to our class, so why does it work? Remember the ever-important decision we made to make the parameter a value and not a reference:

dumb_array& operator=(dumb_array other); // (1)

Now, if other is being initialized with an rvalue, it will be move-constructed. Perfect. In the same way C++03 let us re-use our copy-constructor functionality by taking the argument by-value, C++11 will automatically pick the move-constructor when appropriate as well. (And, of course, as mentioned in previously linked article, the copying/moving of the value may simply be elided altogether.)

And so concludes the copy-and-swap idiom.

Footnotes

*Why do we set mArray to null? Because if any further code in the operator throws, the destructor of dumb_array might be called; and if that happens without setting it to null, we attempt to delete memory that's already been deleted! We avoid this by setting it to null, as deleting null is a no-operation.

†There are other claims that we should specialize std::swap for our type, provide an in-class swap along-side a free-function swap, etc. But this is all unnecessary: any proper use of swap will be through an unqualified call, and our function will be found through ADL. One function will do.

‡The reason is simple: once you have the resource to yourself, you may swap and/or move it (C++11) anywhere it needs to be. And by making the copy in the parameter list, you maximize optimization.

††The move constructor should generally be noexcept, otherwise some code (e.g. std::vector resizing logic) will use the copy constructor even when a move would make sense. Of course, only mark it noexcept if the code inside doesn't throw exceptions.

What are copy elision and return value optimization?

Introduction

For a technical overview - skip to this answer.

For common cases where copy elision occurs - skip to this answer.

Copy elision is an optimization implemented by most compilers to prevent extra (potentially expensive) copies in certain situations. It makes returning by value or pass-by-value feasible in practice (restrictions apply).

It's the only form of optimization that elides (ha!) the as-if rule - copy elision can be applied even if copying/moving the object has side-effects.

The following example taken from Wikipedia:

struct C {
  C() {}
  C(const C&) { std::cout << "A copy was made.\n"; }
};
 
C f() {
  return C();
}
 
int main() {
  std::cout << "Hello World!\n";
  C obj = f();
}

Depending on the compiler & settings, the following outputs are all valid:

Hello World!

A copy was made.

A copy was made.

Hello World!

A copy was made.

Hello World!

This also means fewer objects can be created, so you also can't rely on a specific number of destructors being called. You shouldn't have critical logic inside copy/move-constructors or destructors, as you can't rely on them being called.

If a call to a copy or move constructor is elided, that constructor must still exist and must be accessible. This ensures that copy elision does not allow copying objects which are not normally copyable, e.g. because they have a private or deleted copy/move constructor.

C++17: As of C++17, Copy Elision is guaranteed when an object is returned directly:

struct C {
  C() {}
  C(const C&) { std::cout << "A copy was made.\n"; }
};
 
C f() {
  return C(); //Definitely performs copy elision
}
C g() {
    C c;
    return c; //Maybe performs copy elision
}
 
int main() {
  std::cout << "Hello World!\n";
  C obj = f(); //Copy constructor isn't called
}

C++17 copy elision and object destruction

The symmetric case where the source outlives the target is when the prvalue is a parameter:

struct A {
  static int *data;
  A() {if(!refs++) data=new int(42);}
  A(const A&) {++refs;}  // not movable
  ~A() {if(!--refs) delete data;}
private:
  static int refs;
};
int A::refs,*A::data;
int* f(A) {return A::data;}
A returnA();
int returnInt() {return *f(returnA());} // ok

Because the result of returnA() is a temporary, its lifetime extends to the end of the return statement’s full-expression. The implementation may identify it with f’s parameter, but may not destroy it when f returns, so the dereference in returnInt is valid. (Note that parameters may survive that long anyway.)

The adjustment in C++17 (along with such elision being guaranteed) is that if you (would) move the prvalue, it may be destroyed when the parameter is (since you shouldn’t be relying on its contents anyway). If that’s when f returns, the (ill-advised) code above becomes invalid if A is made movable.

How is memory managed during C++ copy elision?

Consider this line: A b = beta();. How does a compiler implement that?

Well, beta is some other function. And beta has a return value that is an A. So there are two possible ways to implement this. The compiler could make beta allocate the stack space for its A return value, but that could be problematic since the caller needs to use that return value. So instead, the compiler makes the caller allocate the stack space for the return value. After all, the caller does know the size/alignment of the return value, so it has everything it needs to know to allocate that space.

So let's go with the latter. This means that when the compiler calls beta, it passes in the address where beta's return value will be. But this also means that the compiler, for this particular call of beta could just give beta's return value the same address as it will give b.

So right there, we've elided the copy from the return value of the function into b.

So when the compiler goes to compile beta, it knows that the caller is going to give it a pointer to where the return value should go. So the return a; statement semantically copies from the a variable into this return value object.

But, the compiler can see the entirety of beta. And it can see that the a variable is a local variable, and it gets returned on all control paths. So instead of giving a a separate stack address, the compiler could just put a in the memory provided by the caller for the return value.

So again, we have elided a copy from a into the return value.

Move constructor vs Copy elision

There are plenty of cases where the move constructor will still get called and copy elision is not being used:

// inserting existing objects into a container
MyObject myobject;
std::vector<MyObject> myvector;
myvector.push_back(std::move(myobject));

// inserting temporary objects into a container
myvector.push_back(MyObject());

// swapping
MyObject other;
std::swap(myobject, other);

// calling functions with existing objects
void foo(MyObject x);

foo(std::move(myobject));

... and many more.

The only instance where there is mandatory copy elision (since C++17) is when constructing values from the result of a function call or a constructor. In such cases, the compiler isn't even allowed to use the move constructor. For example:

MyObject bar() {
    return MyObject();
}

void example() {
    MyObject x = bar(); // copy elision here
    MyObject y = MyObject(); // also here
}

In general, the purpose of copy elision is not to eliminate move construction alltogether, but to avoid unnecessary constructions when initializing variables from prvalues.

See cppreference on Copy Elision.

copy elision in c++03

Yes, copy ellision is permitted in C++03 and C++98. That's the paragraph for C++98 and C++03:

Non-mandatory elision of copy operations
Under the following circumstances, the compilers are permitted, but
not required to omit the copy construction of
class objects even if the copy constructor and the
destructor have observable side-effects. The objects are constructed
directly into the storage where they would otherwise be copied
to. This is an optimization: even when it takes place and the
copy constructor is not called, it still must be
present and accessible (as if no optimization happened at all),
otherwise the program is ill-formed:
In a return statement, when the operand is the name of a non-volatile object with automatic storage duration, which isn't a
function parameter or a catch clause parameter, and which is of the
same class type (ignoring cv-qualification) as the function return
type. This variant of copy elision is known as NRVO, "named return
value optimization".
In the initialization of an object, when the source object is a nameless temporary and is of the same class type (ignoring
cv-qualification) as the target object. When the nameless temporary is
the operand of a return statement, this variant of copy elision is
known as RVO, "return value optimization".
When copy elision occurs, the implementation treats the source and target of the omitted copy operation as simply two different ways of referring to the same object, and the destruction of that object occurs at the later of the times when the two objects would have been destroyed without the optimization

cppreference

I removed everything that's only valid since C++11.

The only differences between C++98, C++03 and C++11 regarding ellision are move operations and exception handling.

std::move versus copy elision

The two versions

m_thread = std::thread(&ThreadRunner::runInThread, this);

and

m_thread = std::move(std::thread(&ThreadRunner::runInThread, this));

behave identically. No elision is possible in either case, since this is assignment to, not initialization of, m_thread. The temporary object must be constructed and then there will be a move assignment from it in either version.

The hint is still correct though, since std::move on a temporary either doesn't have any effect at all (as here) or prevents elision if used in a context where copy elision would otherwise be allowed/mandatory, for example if this was the initializer of m_thread instead of an assignment to it.

What Is Copy Elision and How Does It Optimize the Copy-And-Swap Idiom