How can std::unique_ptr have no size overhead?
The reason is that the default deleter, typename _Dp = default_delete<_Tp>, is an empty class, and the tuple template that stores it employs the empty base class optimization.
If you instantiate the unique_ptr with a non-default deleter that carries state (a function pointer, for example), you should see the size increase.
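A minimal sketch of that size difference, assuming a mainstream standard library (libstdc++, libc++, or MSVC's STL); the standard itself does not guarantee these sizes. CountingDeleter is a hypothetical deleter introduced only for illustration.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical stateful deleter: it must remember where to count deletions.
struct CountingDeleter {
    int* deletions;
    void operator()(int* p) const {
        ++*deletions;
        delete p;
    }
};

// The default deleter has no data members on common implementations.
static_assert(std::is_empty<std::default_delete<int>>::value,
              "default_delete<int> is expected to be an empty class");

// Empty deleter: the smart pointer is typically exactly pointer-sized.
static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*),
              "expected no size overhead with the default deleter");

// Stateful deleter: its pointer member has to live inside the unique_ptr.
static_assert(sizeof(std::unique_ptr<int, CountingDeleter>) > sizeof(int*),
              "expected a size increase with a stateful deleter");
```

If any of these assertions fails on a given toolchain, that implementation simply lays out unique_ptr differently; the checks describe common practice, not a requirement.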
How can std::unique_ptr have no size overhead when using a lambda?
A capture-less lambda does not need to have any subobjects; it's just a type that has an operator() overload. As such, it can be (but is not required to be) an empty type. unique_ptr is allowed (but not required) to optimize the way it "contains" the deleter type so that, if the deleter type is an empty class type, it can use various techniques to make sure that this type does not take up storage within the unique_ptr instance itself.
There are several ways to do this. The unique_ptr can inherit from the type, relying on EBO to optimize away the base class. With C++20, it can just make it a member subobject, relying on the [[no_unique_address]] attribute to provide empty member optimization. In either case, the only actual storage the unique_ptr<T> needs is for the pointer to T.
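The inheritance technique can be sketched with two hand-written owner types. This is only an illustration of the layout effect, not how any particular standard library implements unique_ptr; IntDeleter, OwnerWithMember and OwnerWithEmptyBase are hypothetical names.

```cpp
#include <type_traits>

// Hypothetical stateless deleter, standing in for std::default_delete<int>.
struct IntDeleter {
    void operator()(int* p) const { delete p; }
};

// Naive layout: the deleter is an ordinary member. Even an empty class has
// sizeof >= 1, so the member (plus padding) makes the owner larger than a pointer.
struct OwnerWithMember {
    int* ptr;
    IntDeleter del;
};

// EBO layout: the empty deleter is a base class, so it can share the address
// of the derived object and contribute no storage of its own.
// (In C++20 a member marked [[no_unique_address]] can achieve the same effect.)
struct OwnerWithEmptyBase : private IntDeleter {
    int* ptr;
};

static_assert(std::is_empty<IntDeleter>::value, "the deleter has no state");
static_assert(sizeof(OwnerWithMember) > sizeof(int*),
              "a member deleter costs space");
static_assert(sizeof(OwnerWithEmptyBase) == sizeof(int*),
              "an empty base costs nothing");
```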
By contrast, a function pointer is a function pointer. It's a scalar type that has to have storage, because it could point to any function with that signature. A class type essentially carries the member function to call as part of the type itself; a function pointer does not. An instance of the class type doesn't actually need storage to find its operator().
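The contrast can be checked directly. The sketch below assumes a common standard library that applies the empty-deleter optimization; lambda_deleter, plain_deleter and sum_through_both are hypothetical names used only here.

```cpp
#include <memory>

// Capture-less lambda: the call operator is part of the closure type itself.
auto lambda_deleter = [](int* p) { delete p; };

// Ordinary function: an owner that uses it must store its address at run time.
inline void plain_deleter(int* p) { delete p; }

using LambdaOwner = std::unique_ptr<int, decltype(lambda_deleter)>;
using FnPtrOwner  = std::unique_ptr<int, void (*)(int*)>;

// Typical result: the closure type is empty, so nothing extra is stored.
static_assert(sizeof(LambdaOwner) == sizeof(int*),
              "lambda deleter usually adds no storage");
// The function pointer is data, so it takes up room next to the owned pointer.
static_assert(sizeof(FnPtrOwner) > sizeof(int*),
              "function-pointer deleter is stored in the object");

// Both owners behave the same when used; only their layout differs.
inline int sum_through_both() {
    // Before C++20 closure types are not default-constructible,
    // so the lambda object is passed to the constructor.
    LambdaOwner a(new int(20), lambda_deleter);
    FnPtrOwner b(new int(22), &plain_deleter);
    return *a + *b;
}
```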
How can unique_ptr have no overhead if it needs to store the deleter?
std::unique_ptr<T> is quite likely to be zero-overhead (with any sane standard-library implementation). std::unique_ptr<T, D>, for an arbitrary D, is not in general zero-overhead.
The reason is simple: Empty-Base Optimisation can be used to eliminate storage of the deleter in case it's an empty (and thus stateless) type (such as std::default_delete instantiations).
Memory footprint of unique_ptr
As @JoachimPileborg suggested, with GCC 4.8 (x64) this code
std::cout << "sizeof(unique_ptr) = " << sizeof(std::unique_ptr<int>) << '\n';
produces this output:
sizeof(unique_ptr) = 8
So, under this implementation, the answer is yes.
This is not astonishing: after all, unique_ptr doesn't add any state to a raw pointer (e.g. a counter, as shared_ptr does; in fact, if I print sizeof(shared_ptr<int>), the result this time is 16). unique_ptr simply takes care of some aspects of pointer management for you.
By the way, since a unique_ptr is a different type from a raw pointer, the generated code will differ depending on which one you use. In particular, if a unique_ptr goes out of scope in your code, the compiler will generate code for the destructor of that particular specialization and will use that code every time a unique_ptr of that type is destroyed (which is exactly what you want).
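A small sketch of that destructor call at scope exit. Tracked and destroyed_by_scope_exit are hypothetical names introduced for this illustration.

```cpp
#include <memory>

// Hypothetical type that records how many instances have been destroyed.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Returns how many Tracked objects were destroyed by leaving the inner scope.
inline int destroyed_by_scope_exit() {
    const int before = Tracked::destroyed;
    {
        std::unique_ptr<Tracked> p(new Tracked);
        // No explicit delete: ~unique_ptr<Tracked>() runs at the closing
        // brace and deletes the owned object.
    }
    return Tracked::destroyed - before;
}
```

Calling destroyed_by_scope_exit() reports one destruction, which a raw Tracked* left to go out of scope would not produce.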
std::unique_ptr memory and performance
There likely is no "raw ptr in smart ptr". The smart pointer will just be the raw pointer. There is nothing you need that indirection would give you.
How much is the overhead of smart pointers compared to normal pointers in C++?
std::unique_ptr has memory overhead only if you provide it with a deleter that carries state.
std::shared_ptr always has memory overhead for the reference counter, though it is very small.
std::unique_ptr has time overhead only during construction (if it has to copy the provided deleter and/or null-initialize the pointer) and during destruction (to destroy the owned object).
std::shared_ptr has time overhead in the constructor (to create the reference counter), in the destructor (to decrement the reference counter and possibly destroy the object) and in the copy assignment operator (to increment the reference counter). Due to the thread-safety guarantees of std::shared_ptr, these increments/decrements are atomic, thus adding some more overhead.
Note that none of them has time overhead in dereferencing (in getting the reference to owned object), while this operation seems to be the most common for pointers.
To sum up, there is some overhead, but it shouldn't make the code slow unless you continuously create and destroy smart pointers.
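The points above can be made concrete with a short sketch. The size comparison assumes a common standard library and is not guaranteed by the standard; owners_after_copy and deref_both are hypothetical helper names.

```cpp
#include <memory>

// shared_ptr typically holds two pointers (object and control block),
// while a default-deleter unique_ptr holds one.
static_assert(sizeof(std::shared_ptr<int>) > sizeof(std::unique_ptr<int>),
              "shared_ptr is expected to be larger than unique_ptr");

// Copying a shared_ptr increments the shared reference count.
inline long owners_after_copy() {
    std::shared_ptr<int> a = std::make_shared<int>(42);
    std::shared_ptr<int> b = a;  // reference count goes from 1 to 2
    return a.use_count();
}

// Dereferencing either smart pointer just reads through the stored pointer.
inline int deref_both() {
    std::unique_ptr<int> u(new int(1));
    std::shared_ptr<int> s = std::make_shared<int>(2);
    return *u + *s;
}
```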
unique_ptr deleter overhead
If there is only ever one lib_api object, then you can have your deleter get a static pointer to it.
If there can be more than one lib_api object, then you have no choice but to store a pointer to it in the Deleter.
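Both options can be sketched as follows. The real lib_api from the question is not shown in this text, so the lib_api and Resource definitions below are hypothetical stand-ins, as are StaticDeleter, StoredDeleter and release_through_both. The size checks assume a common standard library.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical stand-ins for the library types in the question.
struct Resource { int id; };
struct lib_api {
    int released = 0;
    Resource* acquire(int id) { return new Resource{id}; }
    void release(Resource* r) { ++released; delete r; }
};

// Option 1: a single API object, reached through a static pointer.
// The deleter itself is empty, so the unique_ptr stays pointer-sized.
struct StaticDeleter {
    static lib_api* api;
    void operator()(Resource* r) const { api->release(r); }
};
lib_api* StaticDeleter::api = nullptr;

// Option 2: several API objects may exist, so each deleter stores its own.
struct StoredDeleter {
    lib_api* api;
    void operator()(Resource* r) const { api->release(r); }
};

static_assert(std::is_empty<StaticDeleter>::value, "no per-object state");
static_assert(sizeof(std::unique_ptr<Resource, StaticDeleter>) == sizeof(Resource*),
              "static-pointer deleter adds no storage");
static_assert(sizeof(std::unique_ptr<Resource, StoredDeleter>) > sizeof(Resource*),
              "stored-pointer deleter adds storage");

// Releases one resource through each kind of deleter and counts the releases.
inline int release_through_both() {
    lib_api shared_api, other_api;
    StaticDeleter::api = &shared_api;
    {
        std::unique_ptr<Resource, StaticDeleter> a(shared_api.acquire(1));
        std::unique_ptr<Resource, StoredDeleter> b(other_api.acquire(2),
                                                   StoredDeleter{&other_api});
    }  // both deleters run here
    StaticDeleter::api = nullptr;
    return shared_api.released + other_api.released;
}
```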
Why can a T* be passed in a register, but a unique_ptr<T> cannot?
- Is this actually an ABI requirement, or maybe it's just some pessimization in certain scenarios?
One example is the System V Application Binary Interface AMD64 Architecture Processor Supplement. This ABI is for 64-bit x86-compatible CPUs (Linux x86_64 architecture). It is followed on Solaris, Linux, FreeBSD, macOS, and Windows Subsystem for Linux:
If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference (the object is replaced in the parameter list by a pointer that has class INTEGER).
An object with either a non-trivial copy constructor or a non-trivial destructor cannot be passed by value because such objects must have well defined addresses. Similar issues apply when returning an object from a function.
Note that only 2 general-purpose registers can be used for passing 1 object with a trivial copy constructor and a trivial destructor, i.e. only values of objects with sizeof no greater than 16 can be passed in registers. See Calling conventions by Agner Fog for a detailed treatment of the calling conventions, in particular §7.1 Passing and returning objects. There are separate calling conventions for passing SIMD types in registers.
There are different ABIs for other CPU architectures.
There is also the Itanium C++ ABI, which most compilers (apart from MSVC) comply with, and which requires:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.
A type is considered non-trivial for the purposes of calls if:
- it has a non-trivial copy constructor, move constructor, or destructor, or
- all of its copy and move constructors are deleted.
This definition, as applied to class types, is intended to be the complement of the definition in [class.temporary]p3 of types for which an extra temporary is allowed when passing or returning a type. A type which is trivial for the purposes of the ABI will be passed and returned according to the rules of the base C ABI, e.g. in registers; often this has the effect of performing a trivial copy of the type.
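The properties these rules look at can be inspected with standard type traits. The sketch below checks them individually rather than reimplementing the ABI's exact definition, and the pointer-size check assumes a common standard library. Owner, TwoLongs, take_raw and take_owner are hypothetical names.

```cpp
#include <memory>
#include <type_traits>

using Owner = std::unique_ptr<int>;

// A raw pointer has trivial copy construction and destruction,
// so the C ABI rules apply and it can travel in a register.
static_assert(std::is_trivially_copy_constructible<int*>::value, "trivial copy");
static_assert(std::is_trivially_destructible<int*>::value, "trivial destructor");

// unique_ptr is the same size, but its special members are not trivial.
static_assert(sizeof(Owner) == sizeof(int*), "same size as a raw pointer");
static_assert(!std::is_copy_constructible<Owner>::value, "copy is deleted");
static_assert(!std::is_trivially_move_constructible<Owner>::value,
              "move constructor is user-provided");
static_assert(!std::is_trivially_destructible<Owner>::value,
              "destructor deletes the owned object");

// A small trivially copyable aggregate still qualifies for register passing.
struct TwoLongs { long a; long b; };
static_assert(std::is_trivially_copyable<TwoLongs>::value, "trivially copyable");
static_assert(sizeof(TwoLongs) <= 16, "fits in two general-purpose registers");

void take_raw(int* p);     // argument passed directly, typically in a register
void take_owner(Owner p);  // caller creates a temporary and passes its address
```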
- Why is the ABI like that? That is, if the fields of a struct/class fit within registers, or even a single register - why should we not be able to pass it within that register?
It is an implementation detail, but when an exception is handled, during stack unwinding, the objects with automatic storage duration being destroyed must be addressable relative to the function stack frame because the registers have been clobbered by that time. Stack unwinding code needs objects' addresses to invoke their destructors but objects in registers do not have an address.
Pedantically, destructors operate on objects:
An object occupies a region of storage in its period of construction ([class.cdtor]), throughout its lifetime, and in its period of destruction.
and an object cannot exist in C++ if no addressable storage is allocated for it, because an object's identity is its address.
When the address of an object with a trivial copy constructor kept in registers is needed, the compiler can just store the object into memory and obtain the address. If the copy constructor is non-trivial, on the other hand, the compiler cannot just store it into memory; it instead needs to call the copy constructor, which takes a reference and hence requires the address of the object in the registers. The calling convention probably cannot depend on whether the copy constructor was inlined in the callee or not.
Another way to think about this is that for trivially copyable types the compiler transfers the value of an object in registers, from which an object can be recovered by plain memory stores if necessary. E.g.:
void f(long*);
void g(long a) { f(&a); }
on x86_64 with System V ABI compiles into:
g(long): // Argument a is in rdi.
push rax // Align stack, faster sub rsp, 8.
mov qword ptr [rsp], rdi // Store the value of a in rdi into the stack to create an object.
mov rdi, rsp // Load the address of the object on the stack into rdi.
call f(long*) // Call f with the address in rdi.
pop rax // Faster add rsp, 8.
ret // The destructor of the stack object is trivial, no code to emit.
In his thought-provoking talk Chandler Carruth mentions that a breaking ABI change may be necessary (among other things) to implement the destructive move that could improve things. IMO, the ABI change could be non-breaking if the functions using the new ABI explicitly opt in to a new, different linkage, e.g. by being declared in an extern "C++20" {} block (possibly in a new inline namespace for migrating existing APIs), so that only code compiled against the new function declarations with the new linkage can use the new ABI.
Note that the ABI doesn't apply when the called function has been inlined. Likewise, with link-time code generation the compiler can inline functions defined in other translation units or use custom calling conventions.