Why Are By-Value Parameters Excluded from NRVO

Are value parameters implicitly moved when returned by value?

If there is a move ctor for Foo, it should be selected.

Function parameters are explicitly excluded from copy elision in return statements (FDIS §12.8p31, first bullet):

in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter)

However, the next paragraph explicitly brings move ctors back into consideration:

When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. …

(Emphasis is mine in both quotes.)

Does introducing a new variable defeat return value optimisation?

No. Copy elision can still be applied here. In this specific case, it is called NRVO (named return value optimisation). You don't need a move constructor for copy elision to be performed; copy elision has been in the standard since C++98/03, when we only had copy constructors.

To maximise your chance of copy elision being used, you should make sure that either: all code paths return the same object (NRVO) or all code paths return a temporary object (RVO).

If you mix and match NRVO and RVO inside the same function it becomes hard to apply the optimisation.

Sample code demonstrating NRVO.

C++ Named Return Value Optimization with nested function calls

This happens in Test1 because the compiler is explicitly not allowed to apply NRVO to by-value function parameters. In Test1 you accept a W instance by value as a function parameter, so the compiler cannot elide the move on return.

See "Why are by-value parameters excluded from NRVO?" and my discussion with Howard Hinnant about the issue in the comments on "Why does for_each return function by move".

You cannot make Test1 work as efficiently as you did in the earlier case because of this.

The relevant quote from the standard

15.8.3 Copy/move elision [class.copy.elision]

When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, ...

in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function parameter or a variable introduced by the exception-declaration of a handler (18.3)) with the same type (ignoring cv-qualification) as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function call’s return object

Why does Return Value Optimization not happen if no destructor is defined?

The language rule which allows this in case of returning a prvalue (the second example) is:

[class.temporary]
When an object of class type X is passed to or returned from a function, if X has at least one eligible copy or move constructor ([special]), each such constructor is trivial, and the destructor of X is either trivial or deleted, implementations are permitted to create a temporary object to hold the function parameter or result object.
The temporary object is constructed from the function argument or return value, respectively, and the function's parameter or return object is initialized as if by using the eligible trivial constructor to copy the temporary (even if that constructor is inaccessible or would not be selected by overload resolution to perform a copy or move of the object).
[Note: This latitude is granted to allow objects of class type to be passed to or returned from functions in registers.
— end note
]

Why does Return Value Optimization not happen [in some cases]?

The motivation for the rule is explained in the note of the quoted rule. Essentially, RVO is sometimes less efficient than no RVO.

If a destructor is defined by enabling the #if above, then the RVO does happen (and it also happens in some other cases such as defining a virtual method or adding a std::string member).

In the second case, this is explained by the rule because creating the temporary is only allowed when the destructor is trivial.

In the NRVO case, I suppose this is up to the language implementation.

Why do neither move semantics nor RVO work as expected?

By-value parameters aren't subject to NRVO (see "Why are by-value parameters excluded from NRVO?"), so they are moved instead (see "Are value parameters implicitly moved when returned by value?").

A fairly simple solution is to take both operands by const reference and copy within the function body:

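// Member of Foo: both operands are taken by reference, the copy is explicit.
Foo operator+(Foo const& rhs) const
{
    std::cout << "Summing Foo objects" << std::endl;
    Foo res{*this};  // named local, so the return below is eligible for NRVO
    res += rhs;
    return res;
}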

Why isn't the copy-constructor called when returning LOCAL variable

The compiler optimizes away the return copy. This is known as NRVO (Named Return Value Optimization).

in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value

The compiler is allowed to do this, even if the copy constructor has side effects.

When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects.

That is, if you give your copy/move constructors side effects, your program has multiple valid execution paths, depending on whether your compiler wants to optimize or not.

Copy elision, std::move, and chained function calls

Copy elision is the technique the compiler uses to avoid unneeded copies. Basically, the caller reserves memory for the result outside of the function and passes it in to be used. In the case of your temporary, that memory will be on the stack.

Adding std::move to the return statement doesn't help. You are already returning by value, so you already have an rvalue. Casting it to an rvalue with std::move should be a no-op.
I'm not aware of the details, but in some cases adding it can hurt performance.

Focusing on 2:
Adding std::move around the function call only has an effect when the function returns a non-const reference. In that case you have most likely written a bug, as the original object will be moved from.

For number 3:
My favorite is f(Arg &&a), as this requires all callers to pass an rvalue. If performance is less important (for example, the function did not show up in profiling), a by-value parameter (some callers will copy) or even a const reference (the function can't modify the argument, so it has to copy) might do.

As indicated by the comments, the implementation of the function should also write auto result = std::move(a), as your parameter doesn't benefit from NRVO.

Recent versions of Clang have very good warnings about when std::move should be used and when to remove it. I suggest enabling them. GCC might have some similar warnings, however I'm not up to date with it.

In short: your original code is the best version to use, and if your compiler warns about std::move here, trust it.

why copy constructor is always called when return by value

I have tried gcc 4.8.4 and vc++ 2015, copy constructor of MyClass is called twice for (2) in both compilers.

Because that is how it is supposed to work!

Really, for the case where you want this to be optimized, C++ has the reference mechanism. In your case the copy must not be optimized away, because "I checked" implies that you relied on a side effect of the constructor to show you that it is being called. How should the compiler know you don't functionally need that side effect?