Should I Copy a std::function, or Can I Always Take a Reference to It?

Should I copy an std::function or can I always take a reference to it?

Can I store the function as a reference since std::function is just a function-pointer and the 'executable code' of the function is guaranteed to stay in memory?

std::function is very much not just a function pointer. It's a wrapper around an arbitrary callable object, and manages the memory used to store that object. As with any other type, it's safe to store a reference only if you have some other way to guarantee that the referred object is still valid whenever that reference is used.

Unless you have a good reason for storing a reference, and a way to guarantee that it remains valid, store it by value.

Passing by const reference to the constructor is safe, and probably more efficient than passing a value. Passing by non-const reference is a bad idea, since it prevents you from passing a temporary, so the user can't directly pass a lambda, the result of bind, or any other callable object except std::function<int(int)> itself.
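
As a minimal sketch of that advice, the following takes the callable by const reference and stores its own copy. The class name Transformer and the helper functions are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class for illustration: accepts the callable by const
// reference and keeps its own copy, so the member can never dangle.
class Transformer {
public:
    explicit Transformer(const std::function<int(int)>& fn) : fn_(fn) {}
    int apply(int value) const { return fn_(value); }

private:
    std::function<int(int)> fn_;  // stored by value, not by reference
};

// The lambda is converted to a temporary std::function for the call.
// That temporary is destroyed afterwards, but the stored copy lives on.
inline Transformer make_doubler() {
    return Transformer([](int v) { return v * 2; });
}

inline int run_example() {
    Transformer doubler = make_doubler();
    int offset = 5;
    Transformer shifter([offset](int v) { return v + offset; });
    return doubler.apply(21) + shifter.apply(1);  // 42 + 6
}
```

Had the constructor taken std::function<int(int)>& instead, neither lambda above could have been passed directly, because a temporary cannot bind to a non-const lvalue reference.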

std::function internal memory organization and copies; passing reference vs value

(Note: the whole discussion below is a little simplified. AFAIK, none of it is wrong, but I did omit some details and edge cases and definitions and implementation stuff.)

The std::function does not copy any executable code. The executable code is always merely pointed to, by std::function. And when the std::function gets copied, the pointer gets duplicated (which is completely fine, because executable code is never freed either.) So far, there is no difference between a plain old function pointer and a std::function.

But that's not the whole story.

Contrary to function pointers, instances of std::function can carry around "state" as well as a pointer to the executable code, and the whole hubbub about std::function having to allocate/deallocate and copy/move data around is about this extra state, not the function pointer.

(Note that although I've used a lambda below, the following explanation would have been equally applicable for "functors" and "function objects" and "bind results" and other forms of callable things in C++, all except plain old function pointers.)

Suppose that you have code like this:

int x = 42, y = 17;
std::function<int()> f = [x, y] {return x + y;};

Here, f not only stores the pointer to the executable code for return x + y;, but it also has to remember the values of x and y. Since the amount of state that you can "capture" in this way is not limited, the std::function must, in general, be prepared to allocate memory from the heap upon construction, and to deallocate, copy, and move that memory at appropriate times. Again, it is this extra "state" that gets copied, not the code.
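
To make the "state gets copied" point concrete, here is a small sketch using a mutable lambda whose captured counter serves as the state. The function name copy_has_own_state is made up for this example.

```cpp
#include <cassert>
#include <functional>

// Shows that copying a std::function duplicates the captured state:
// the copy and the original count independently afterwards.
inline int copy_has_own_state() {
    int n = 10;
    // The by-value capture of n is the "state" held by the std::function.
    std::function<int()> original = [n]() mutable { return ++n; };

    original();                            // original's n is now 11
    std::function<int()> copy = original;  // duplicates the captured n (11)
    copy();                                // copy's n is now 12
    copy();                                // copy's n is now 13

    return original();                     // original's n becomes 12
}
```

If the two objects shared their state, the last call would have returned 14 rather than 12.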

Let's review: each std::function needs to be able to store at least a pointer to executable code, and 0 or more bytes of extra captured state. If there is no captured state, a std::function is essentially the same as a function pointer (although in practice, std::functions are usually implemented polymorphically and have other stuff in there.)

Some (most) implementations of std::function that I'm aware of employ an optimization that is called "Small Object Optimization". In these implementations, in addition to the space for the pointer to code, the std::function object has some more (fixed amount of) space inside its instance (i.e. as a member of its class, as opposed to somewhere else on the heap) and will use that area if the total number of bytes of the captured state would fit in there. This eliminates the heap allocation, which is important in some use cases and would balance out the additional memory used (when there is no or little state to capture.)

Should I assign a ref or a copy to a value returning function?

Your colleague is trying to do the compiler's job instead of trusting it, and is potentially pessimizing as a result. NRVO is very well supported, and if the functions are written with value semantics, NRVO can elide multiple copies. Binding to a reference will prevent that, since a reference variable will not satisfy the conditions for this optimization. A simple test to demonstrate:

#include <iostream>

struct Test {
    Test() { std::cout << "Created\n"; }
    Test(const Test&) { std::cout << "Copied\n"; }
};

Test foo() {
    Test t;
    return t;
}

Test bar_good() {
    const auto t = foo();
    return t;
}

Test bar_bad() {
    const auto& t = foo();
    return t;
}

int main() {
    const auto good = bar_good(); (void)good;

    std::cout << "===========\n";

    const auto& bad = bar_bad();  (void)bad;
}

Which gives the output:

Created
===========
Created
Copied

One object total when utilizing value semantics, but a redundant copy when using references. Depending on how expensive the copy (or even move) is, you could see a noticeable performance difference.

Advantages of pass-by-value and std::move over pass-by-reference

Did I understand correctly what is happening here?

Yes.

Is there any upside of using std::move over passing by reference and just calling m_name{name}?

An easy to grasp function signature without any additional overloads. The signature immediately reveals that the argument will be copied - this saves callers from wondering whether a const std::string& reference might be stored as a data member, possibly becoming a dangling reference later on. And there is no need to overload on std::string&& name and const std::string& arguments to avoid unnecessary copies when rvalues are passed to the function. Passing an lvalue

std::string nameString("Alex");
Creature c(nameString);

to the function that takes its argument by value causes one copy and one move construction. Passing an rvalue to the same function

std::string nameString("Alex");
Creature c(std::move(nameString));

causes two move constructions. In contrast, when the function parameter is const std::string&, there will always be a copy, even when passing an rvalue argument. Passing by value is therefore clearly an advantage as long as the argument type is cheap to move-construct (this is the case for std::string).
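
The copy and move counts above can be checked with a small sketch. Tracked is a hypothetical stand-in for std::string that counts its own copy and move constructions, and Holder plays the role of Creature; neither name comes from the original code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for std::string that counts copies and moves.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Takes its argument by value and moves it into the member,
// like the Creature constructor discussed above.
struct Holder {
    explicit Holder(Tracked value) : m_value(std::move(value)) {}
    Tracked m_value;
};

struct Counts { int copies; int moves; };

inline Counts count_for_lvalue() {
    Tracked t;
    Tracked::copies = Tracked::moves = 0;
    Holder h(t);             // copy into the parameter, move into m_value
    (void)h;
    return {Tracked::copies, Tracked::moves};
}

inline Counts count_for_rvalue() {
    Tracked t;
    Tracked::copies = Tracked::moves = 0;
    Holder h(std::move(t));  // move into the parameter, move into m_value
    (void)h;
    return {Tracked::copies, Tracked::moves};
}
```

With a const Tracked& parameter, both calls would cost exactly one copy, so the by-value version trades one extra move for the chance to skip the copy entirely.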

But there is a downside to consider: the reasoning doesn't work for functions that assign the function argument to another variable (instead of initializing it):

void setName(std::string name)
{
    m_name = std::move(name);
}

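will cause a deallocation of the resource that m_name currently owns before it is reassigned, whereas copy-assigning from a const std::string& parameter could reuse m_name's existing buffer when its capacity is sufficient. I recommend reading Item 41 in Effective Modern C++ and also this question.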

Why is passing by value (if a copy is needed) recommended in C++11 if a const reference only costs a single copy as well?

When consuming data, you'll need an object you can consume. When you get a std::string const& you will have to copy the object regardless of whether the caller still needs the argument afterward.

When the object is passed by value the object will be copied if it has to be copied, i.e., when the object passed is not a temporary. However, if it happens to be a temporary the object may be constructed in place, i.e., any copies may have been elided and you just pay for a move construction. That is, there is a chance that no copy actually happens.

Should I store a reference to map in a member variable?

A reference member is allowed, but dangerous. It has to refer to an object outside the class object, and some other code must make sure that the referred-to object exists for as long as the class member might still be used.

And if you changed just the member type and had:

class Device {
private:
    int stateID;
public:
    const std::map<int, State>& possibleStates;
    Device(std::map<int, State> states) : possibleStates(states) {}
};

then this would almost certainly be wrong. The reference would bind to the parameter states of the constructor, and the lifetime of states ends as soon as the constructor body finishes, leaving possibleStates a dangling reference!

If you also changed the constructor parameter type, then it becomes sort of usable...

class Device {
private:
    int stateID;
public:
    const std::map<int, State>& possibleStates;
    Device(const std::map<int, State>& states) : possibleStates(states) {}
};

... but now every bit of code which creates a Device object needs to be careful and make sure the map argument passed in is one which will exist for as long as it's needed. Users of the class may not be expecting this, and even if it's known, it's too easy to accidentally get wrong, like by creating a Device from an unnamed temporary std::map object, which will very quickly be destroyed.

The real solution to avoid unnecessary copies is:

#include <map>
#include <utility>

class Device {
private:
    int stateID;
public:
    const std::map<int, State> possibleStates;
    Device(std::map<int, State> states) : possibleStates(std::move(states)) {}
};

The std::move lets the map constructor know that this code doesn't care about keeping a consistent value in the states object. So the move constructor for std::map<int, State> is permitted to "steal" allocated memory from states to create possibleStates, rather than duplicating all the memory for the map's internal nodes and the contents of each State. This makes sense since states is local to the constructor definition and about to be destroyed, and nothing else even names the parameter variable again afterward.

Keeping the parameter type in Device(std::map<int, State>); as an object type rather than a reference type also allows the code which creates a Device to determine whether it should create that parameter object by copy, or by move, or using some other map constructor entirely.
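
As a rough sketch of what that choice looks like at the call site, the following shows one caller that copies and one that moves. The State definition and the two build_* helpers are assumptions made for this example, since the original snippets do not show them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical State type; its real definition is not shown above.
struct State {
    std::string name;
};

class Device {
public:
    const std::map<int, State> possibleStates;
    explicit Device(std::map<int, State> states)
        : possibleStates(std::move(states)) {}
};

// The caller wants to keep its map: the parameter is copy-constructed,
// then moved into the member (one deep copy in total).
inline std::size_t build_by_copy() {
    std::map<int, State> shared{{0, State{"idle"}}, {1, State{"running"}}};
    Device d(shared);
    return shared.size() + d.possibleStates.size();  // both still hold 2 entries
}

// The caller is done with its map: the parameter is move-constructed,
// then moved again into the member (no deep copy).
inline std::size_t build_by_move() {
    std::map<int, State> local{
        {0, State{"idle"}}, {1, State{"running"}}, {2, State{"stopped"}}};
    Device d(std::move(local));
    return d.possibleStates.size();
}
```

After build_by_move hands local to the constructor, local is left in a valid but unspecified state, so the sketch deliberately does not read from it again.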