Why Can a T* Be Passed in Register, But a Unique_Ptr<T> Cannot

Why can a T* be passed in register, but a unique_ptrT cannot?

Is this actually an ABI requirement, or maybe it's just some pessimization in certain scenarios?

One example is System V Application Binary Interface AMD64 Architecture Processor Supplement. This ABI is for 64-bit x86-compatible CPUs (Linux x86_64 architecure). It is followed on Solaris, Linux, FreeBSD, macOS, Windows Subsystem for Linux:

If a C++ object has either a non-trivial copy constructor or a non-trivial
destructor, it is passed by invisible reference (the object is replaced in the
parameter list by a pointer that has class INTEGER).

An object with either a non-trivial copy constructor or a non-trivial destructor cannot be
passed by value because such objects must have well defined addresses. Similar issues apply
when returning an object from a function.

Note, that only 2 general purpose registers can be used for passing 1 object with a trivial copy constructor and a trivial destructor, i.e. only values of objects with sizeof no greater than 16 can be passed in registers. See Calling conventions by Agner Fog for a detailed treatment of the calling conventions, in particular §7.1 Passing and returning objects. There are separate calling conventions for passing SIMD types in registers.

There are different ABIs for other CPU architectures.

There is also Itanium C++ ABI which most compilers comply with (apart from MSVC), which requires:

If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.

A type is considered non-trivial for the purposes of calls if:

it has a non-trivial copy constructor, move constructor, or destructor, or

all of its copy and move constructors are deleted.

This definition, as applied to class types, is intended to be the complement of the definition in [class.temporary]p3 of types for which an extra temporary is allowed when passing or returning a type. A type which is trivial for the purposes of the ABI will be passed and returned according to the rules of the base C ABI, e.g. in registers; often this has the effect of performing a trivial copy of the type.

Why is the ABI like that? That is, if the fields of a struct/class fit within registers, or even a single register - why should we not be able to pass it within that register?

It is an implementation detail, but when an exception is handled, during stack unwinding, the objects with automatic storage duration being destroyed must be addressable relative to the function stack frame because the registers have been clobbered by that time. Stack unwinding code needs objects' addresses to invoke their destructors but objects in registers do not have an address.

Pedantically, destructors operate on objects:

An object occupies a region of storage in its period of construction ([class.cdtor]), throughout its lifetime, and in its period of destruction.

and an object cannot exist in C++ if no addressable storage is allocated for it because object's identity is its address.

When an address of an object with a trivial copy constructor kept in registers is needed the compiler can just store the object into memory and obtain the address. If the copy constructor is non-trivial, on the other hand, the compiler cannot just store it into memory, it rather needs to call the copy constructor which takes a reference and hence requires the address of the object in the registers. The calling convention probably cannot depend whether the copy constructor was inlined in the callee or not.

Another way to think about this, is that for trivially copyable types the compiler transfers the value of an object in registers, from which an object can be recovered by plain memory stores if necessary. E.g.:

void f(long*);
void g(long a) { f(&a); }

on x86_64 with System V ABI compiles into:

g(long):                             // Argument a is in rdi.
        push    rax                  // Align stack, faster sub rsp, 8.
        mov     qword ptr [rsp], rdi // Store the value of a in rdi into the stack to create an object.
        mov     rdi, rsp             // Load the address of the object on the stack into rdi.
        call    f(long*)             // Call f with the address in rdi.
        pop     rax                  // Faster add rsp, 8.
        ret                          // The destructor of the stack object is trivial, no code to emit.

In his thought-provoking talk Chandler Carruth mentions that a breaking ABI change may be necessary (among other things) to implement the destructive move that could improve things. IMO, the ABI change could be non-breaking if the functions using the new ABI explicitly opt-in to have a new different linkage, e.g. declare them in extern "C++20" {} block (possibly, in a new inline namespace for migrating existing APIs). So that only the code compiled against the new function declarations with the new linkage can use the new ABI.

Note that ABI doesn't apply when the called function has been inlined. As well as with link-time code generation the compiler can inline functions defined in other translation units or use custom calling conventions.

Does passing a `unique_ptr` by value have a performance penalty compared to a plain pointer?

System V ABI uses Itanium C++ ABI and refers to it. In particular, C++ Itanium ABI specifies that

If the parameter type is non-trivial for the purposes of calls, the
caller must allocate space for a temporary and pass that temporary by
reference.
Specifically:
...
If the type has a non-trivial destructor, the caller calls that destructor after control returns to it (including when the caller throws an exception), at the end of enclosing full-expression.

So a simple answer to question "why it is not passed into register" is "because it can't".

Now, an interesting question might be 'why did C++ Itanium ABI decided to go with that'.

While I wouldn't claim that I have intimate knowledge with rationale, two things come to mind:

This allows for copy elision if the argument to the function is a temporary
This makes tail-call optimizations more powerful. If callee would need to call destructors of it's arguments, TCO wouldn't be possible for any function which accepts non-trivial arguments.

Cannot pass std::unique_ptr in std::function

The std::function requires the function object to be Copy-Constructible, so you can't expect a lamdba to be moved to it. On initialization, it attempts to copy the lambda and so the std::unique_ptr with it, which is a member of this lambda, and, expectedly, fails to do so. What you can do is store your lambda in a variable and pass it to function that accepts const std::function& using std::ref like that:

        void foo(const std::function<void()>& f); // function declaration
        auto a = [h = std::move(handle)]() mutable
        {
            std::cout << *h << std::endl;
        };
        foo(std::ref(a));

This is a related question with much more detailed answers: How to create an std::function from a move-capturing lambda expression?

Tring to create a unique pointer gives me an error

std::unique_ptr can't be copied, it doesn't have copy-constructor but has move constructor. You can use std::move to convert boid to rvalue then the move constructor could be used.

std::unique_ptr<Boid> boid = std::make_unique<Boid>
    (
    olc::vf2d(rand() % 600 * 1.0f, rand() % 300 * 1.0f),
    rand() % 7 * 1.0f,
    olc::Pixel(0, 0, (rand() % 150) + 100)
    );
boids.push_back(std::move(boid));

Or pass the tempoary (which is also rvalue) directly.

boids.push_back(std::make_unique<Boid>
    (
    olc::vf2d(rand() % 600 * 1.0f, rand() % 300 * 1.0f),
    rand() % 7 * 1.0f,
    olc::Pixel(0, 0, (rand() % 150) + 100)
    ));

Why this dead store of unique_ptr cannot be eliminated?

In both of these cases, the answer is: because the object you moved from will still be destroyed.

If you look at the code generated for a call to

void f(unique_ptr<int> u);

you will notice that the caller creates the object for parameter u and calls its destructor afterwards as mandated by the calling convention. In case the call to f() is inlined, the compiler will most likely be able to optimize this away. But the code generated for f() has no control over the destructor of u and, thus, has to set the internal pointer of u to zero assuming that the destructor of u will run after the function returns.

In your second example, we have sort of the inverse situation:

void h(int x) {
    auto p = make_unique<int>(x);
    f(move(p));
}

Contrary to what the name may suggest, std::move() does not actually move an object. All it does is cast to an rvalue reference which allows the recipient of that reference to move from the object referred to—if he so choses. The actual move only happens, e.g., when another object is constructed from the given argument via a move constructor. Since the compiler does not know anything about what happens inside f() at the point of definition of h(), it can't assume that f() will always move from the given object. For example, f() could simply return or move only in some cases and not in others. Therefore, the compiler has to assume that the function might return without moving from the object and has to emit the delete for the destructor. The function could also perform a move assignment instead of a move construction, in which case the outer destructor would still be needed to release ownership of the object previously held by whatever was assigned ownership of the new object…

Why Can a T* Be Passed in Register, But a Unique_Ptr<T> Cannot