"Constructing" a Trivially-Copyable Object With Memcpy

The standard leaves this unspecified, a reading supported by N3751: Object Lifetime, Low-level Programming, and
memcpy, which says, among other things:

The C++ standards is currently silent on whether the use of memcpy to
copy object representation bytes is conceptually an assignment or an
object construction. The difference does matter for semantics-based
program analysis and transformation tools, as well as optimizers,
tracking object lifetime. This paper suggests that

uses of memcpy to copy the bytes of two distinct objects of two different trivial copyable tables (but otherwise of the same size) be
allowed

such uses are recognized as initialization, or more generally as (conceptually) object construction.

Recognition as object construction will support binary IO, while still
permitting lifetime-based analyses and optimizers.

I cannot find any meeting minutes in which this paper was discussed, so it seems to still be an open issue.

The C++14 draft standard currently says in 1.8 [intro.object]:

[...]An object is created by a definition (3.1), by a new-expression
(5.3.4) or by the implementation (12.2) when needed.[...]

None of these applies in the malloc case, and the cases the standard covers for copying trivially copyable types seem to refer only to already existing objects, in section 3.9 [basic.types]:

For any object (other than a base-class subobject) of trivially
copyable type T, whether or not the object holds a valid value of type
T, the underlying bytes (1.7) making up the object can be copied into
an array of char or unsigned char.⁴² If the content of the array of
char or unsigned char is copied back into the object, the object shall
subsequently hold its original value[...]

and:

For any trivially copyable type T, if two pointers to T point to
distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a
base-class subobject, if the underlying bytes (1.7) making up obj1 are
copied into obj2,⁴³ obj2 shall subsequently hold the same value as
obj1.[...]

which is basically what the proposal says, so that should not be surprising.
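For concreteness, the two situations can be sketched as below. The Point struct and the function names are made up for illustration. The first function copies between two objects that already exist, which is the case the quoted 3.9 wording covers; the second is the malloc pattern in question, which that wording does not obviously reach.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical example type; any trivially copyable type would do.
struct Point {
    int x;
    int y;
};
static_assert(std::is_trivially_copyable<Point>::value,
              "Point must be trivially copyable");

// Case covered by 3.9 [basic.types]: both objects already exist.
void copy_into_existing(Point& dst, const Point& src) {
    std::memcpy(&dst, &src, sizeof(Point));
}

// The pattern in question: no definition and no new-expression ever
// creates a Point in the malloc'ed storage before the bytes are copied in.
Point* copy_into_malloc(const Point& src) {
    void* raw = std::malloc(sizeof(Point));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &src, sizeof(Point));
    return static_cast<Point*>(raw);  // caller releases with std::free
}

// Small usage check.
void demo() {
    Point src{3, 4};
    Point dst{0, 0};
    copy_into_existing(dst, src);
    assert(dst.x == 3 && dst.y == 4);

    Point* p = copy_into_malloc(src);
    assert(p != nullptr && p->x == 3 && p->y == 4);
    std::free(p);
}
```

Every mainstream implementation handles both functions the same way; the open question is only whether the wording quoted above blesses the second one.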

dyp points out a fascinating discussion on this topic from the ub mailing list: [ub] Type punning to avoid copying.

Proposal P0593: Implicit creation of objects for low-level object manipulation

The proposal P0593 attempts to solve this issue but, AFAIK, has not been reviewed yet.

This paper proposes that objects of sufficiently trivial types be created on-demand as necessary within newly-allocated storage to give programs defined behavior.

It has some motivating examples which are similar in nature including a current std::vector implementation which currently has undefined behavior.

It proposes the following ways to implicitly create an object:

We propose that at minimum the following operations be specified as implicitly creating objects:
Creation of an array of char, unsigned char, or std::byte implicitly creates objects within that array.

A call to malloc, calloc, realloc, or any function named operator new or operator new[] implicitly creates objects in its returned storage.

std::allocator::allocate likewise implicitly creates objects in its returned storage; the allocator requirements should require other allocator implementations to do the same.

A call to memmove behaves as if it

copies the source storage to a temporary area

implicitly creates objects in the destination storage, and then

copies the temporary storage to the destination storage.

This permits memmove to preserve the types of trivially-copyable objects, or to be used to reinterpret a byte representation of one object as that of another object.

A call to memcpy behaves the same as a call to memmove except that it introduces an overlap restriction between the source and destination.

A class member access that nominates a union member triggers implicit object creation within the storage occupied by the union member. Note that this is not an entirely new rule: this permission already existed in [P0137R1] for cases where the member access is on the left side of an assignment, but is now generalized as part of this new framework. As explained below, this does not permit type punning through unions; rather, it merely permits the active union member to be changed by a class member access expression.
A new barrier operation (distinct from std::launder, which does not create objects) should be introduced to the standard library, with semantics equivalent to a memmove with the same source and destination storage. As a strawman, we suggest:
// Requires: [start, (char*)start + length) denotes a region of allocated
// storage that is a subset of the region of storage reachable through start.
// Effects: implicitly creates objects within the denoted region.
void std::bless(void *start, size_t length);
In addition to the above, an implementation-defined set of non-standard memory allocation and mapping functions, such as mmap on POSIX systems and VirtualAlloc on Windows systems, should be specified as implicitly creating objects.

Note that a pointer reinterpret_cast is not considered sufficient to trigger implicit object creation.
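A minimal sketch in the spirit of the paper's motivating examples follows; the type X and the function make_x are illustrative names, not a quotation of the paper. Under the proposed rules, the call to malloc implicitly creates an X in the returned storage, so the member writes have defined behavior. Under the earlier wording, no X object exists at the point of those writes.

```cpp
#include <cassert>
#include <cstdlib>

// Illustrative type: trivial, so eligible for implicit creation.
struct X {
    int a;
    int b;
};

// No new-expression and no definition creates the X; under the P0593
// rules, std::malloc does so implicitly because that gives the program
// defined behavior.
X* make_x() {
    X* p = static_cast<X*>(std::malloc(sizeof(X)));
    if (p == nullptr) return nullptr;
    p->a = 1;
    p->b = 2;
    return p;  // caller releases with std::free
}

// Small usage check.
void demo() {
    X* p = make_x();
    assert(p != nullptr);
    assert(p->a == 1 && p->b == 2);
    std::free(p);
}
```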

Is memcpy of a trivially-copyable type construction or assignment?

It is clear to me that using std::memcpy results in neither construction nor assignment. It is not construction, since no constructor will be called. Nor is it assignment, as the assignment operator will not be called. Given that a trivially copyable object has trivial destructors, (copy/move) constructors, and (copy/move) assignment operators, the point is rather moot.

You seem to have quoted ¶2 from §3.9 [basic.types]. ¶3 states:

For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes (1.7) making up obj1 are copied into obj2,⁴¹ obj2 shall subsequently hold the same value as obj1. [ Example:

  T* t1p;
  T* t2p;
          // provided that t2p points to an initialized object ...
  std::memcpy(t1p, t2p, sizeof(T));
          // at this point, every subobject of trivially copyable type in *t1p contains
          // the same value as the corresponding subobject in *t2p

— end example ]

41) By using, for example, the library functions (17.6.1.2) std::memcpy or std::memmove.

Clearly, the standard intended to allow *t1p to be useable in every way *t2p would be.
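As a concrete instance of the standard's example, here is a sketch with a nested type. Outer, Inner, and copy_object are hypothetical names chosen for illustration; the point is only that each subobject of the destination ends up with the value of the corresponding subobject of the source.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical nested type; every subobject is trivially copyable.
struct Inner {
    short s;
    char c;
};
struct Outer {
    int id;
    Inner in;
    double d[2];
};
static_assert(std::is_trivially_copyable<Outer>::value,
              "Outer must be trivially copyable");

// Mirrors the standard's example: *t2p is initialized and *t1p already
// exists as a distinct Outer object.
void copy_object(Outer* t1p, const Outer* t2p) {
    std::memcpy(t1p, t2p, sizeof(Outer));
}

// Small usage check: each subobject matches after the copy.
void demo() {
    Outer src{7, {2, 'x'}, {1.5, 2.5}};
    Outer dst{};
    copy_object(&dst, &src);
    assert(dst.id == 7);
    assert(dst.in.s == 2 && dst.in.c == 'x');
    assert(dst.d[0] == 1.5 && dst.d[1] == 2.5);
}
```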

Continuing on to ¶4:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.⁴²

42) The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.

The use of the word "the" in front of both defined terms implies that any given type has only one object representation and a given object has only one value representation. Your hypothetical morphing internal type should not exist. The footnote makes it clear that the intention is for trivially copyable types to have a memory layout compatible with C. The expectation, then, is that even an object with non-standard layout will remain usable after being copied around.

Using std::memcpy on an object of non-trivially copyable type

You asked:

What potential problem we could get if we applied that function to an object of non-trivially copyable type?

Here's a very simple example that illustrates the problem of using std::memcpy for objects of non-trivially copyable type.

#include <cstring>

struct A
{
   A(int size) : size_(size), data_(new int[size]) {}
   ~A() { delete [] data_; }

   // The copy constructor and the copy assignment operator need
   // to be implemented for the class too. They have been omitted
   // to keep the code here minimal.

   int size_;
   int* data_;
};

int main()
{
   A a1(10);
   A a2(20);
   std::memcpy(&a1, &a2, sizeof(A));

   // When we return from the function, the original data_ of a1
   // is a memory leak. The data_ of a2 is deleted twice.

   return 0;
}
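One way to catch this mistake at compile time is to route byte copies through a small wrapper that checks the type trait. The sketch below is only an illustration: checked_memcpy, Plain, and Owner are hypothetical names, and Owner plays the role of A above.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Refuses to compile for types that are not trivially copyable.
template <typename T>
void checked_memcpy(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "checked_memcpy requires a trivially copyable type");
    std::memcpy(&dst, &src, sizeof(T));
}

// Trivially copyable: safe to copy byte-wise.
struct Plain {
    int size;
    int values[4];
};

// Owns heap memory, like A above; the user-provided destructor makes it
// non-trivially copyable, so checked_memcpy(o1, o2) would not compile.
struct Owner {
    explicit Owner(int n) : data(new int[n]) {}
    ~Owner() { delete[] data; }
    Owner(const Owner&) = delete;
    Owner& operator=(const Owner&) = delete;
    int* data;
};
static_assert(!std::is_trivially_copyable<Owner>::value,
              "Owner is expected to be rejected by checked_memcpy");

// Small usage check.
void demo() {
    Plain a{1, {1, 2, 3, 4}};
    Plain b{};
    checked_memcpy(b, a);
    assert(b.size == 1 && b.values[3] == 4);
}
```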

Does memcpy preserve a trivial object's validity?

Copying an object of a trivial type with std::memcpy into properly sized and aligned storage will implicitly begin the lifetime of a new object at that location.

There is a category of types called implicit-lifetime types, whose requirements are:

a scalar type, or
an array type, or
an aggregate class type, or
a class type that has
  at least one trivial eligible constructor, and
  a trivial, non-deleted destructor,
or a cv-qualified version of one of the above types.

Trivial class types meet these requirements.

Objects of implicit-lifetime type have the property that their lifetime can be started implicitly by several functions or operations:

operations that begin the lifetime of an array of type char, unsigned char, or std::byte (since C++17), in which case such objects are created in the array,
a call to one of the following allocating functions, in which case such objects are created in the allocated storage:
  operator new
  operator new[]
  std::malloc
  std::calloc
  std::realloc
  std::aligned_alloc (since C++17)
a call to one of the following object representation copying functions, in which case such objects are created in the destination region of storage or the result:
  std::memcpy
  std::memmove
  std::bit_cast (since C++20)
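The requirements listed above can be roughly approximated with standard type traits. The sketch below assumes C++17 (for std::is_aggregate), and looks_implicit_lifetime is a made-up name. It is an approximation only: "at least one trivial eligible constructor" is checked through the default, copy, and move constructors, which is not the exact wording.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Approximation of the implicit-lifetime requirements (assumes C++17).
template <typename T>
constexpr bool looks_implicit_lifetime() {
    using U = std::remove_cv_t<T>;  // cv-qualified versions also qualify
    constexpr bool trivial_ctor =
        std::is_trivially_default_constructible_v<U> ||
        std::is_trivially_copy_constructible_v<U> ||
        std::is_trivially_move_constructible_v<U>;
    return std::is_scalar_v<U> || std::is_array_v<U> ||
           std::is_aggregate_v<U> ||
           (std::is_class_v<U> && trivial_ctor &&
            std::is_trivially_destructible_v<U>);
}

// Hypothetical example types.
struct Pair {
    int a;
    int b;
};
struct Named {
    std::string name;  // non-trivial member, but Named is still an aggregate
};

static_assert(looks_implicit_lifetime<int>(), "scalar");
static_assert(looks_implicit_lifetime<int[3]>(), "array");
static_assert(looks_implicit_lifetime<const Pair>(), "trivial class");
static_assert(looks_implicit_lifetime<Named>(), "aggregate class");
static_assert(!looks_implicit_lifetime<std::string>(),
              "neither an aggregate nor trivially destructible");

// Small usage check at run time.
void demo() {
    assert(looks_implicit_lifetime<Pair>());
    assert(!looks_implicit_lifetime<std::string>());
}
```

Note that being an implicit-lifetime type is not the same as being trivially copyable: Named passes the aggregate test here, yet copying it with std::memcpy would still be wrong.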

Copy trivially copyable types using temporary storage areas: is it allowed?

It reads fine to me.

You've copied the underlying bytes of obj1 into obj2. Both are trivial and of the same type. The prose you quote permits this explicitly.

The fact that said underlying bytes were temporarily stored in a correctly-sized and correctly-aligned holding area, via an also-explicitly-permitted reinterpretation as char*, doesn't seem to change that. They're still "those bytes". There's no rule that says copying must be "direct" in order to satisfy features like this.

Indeed, this is not only a completely common pattern when dealing with network transfer (conventional use of course doesn't make it right on its own), but also a historically normal thing to do that the standard would be mad not to account for (which gives me all the assurance I need that it is indeed intended).

I can see how there may be doubt, given that the rule is first given for copying those bytes back into the original object, then given again for copying those bytes into a new object. But I can't detect any logical difference between the two circumstances, and therefore find the first quoted wording to be largely redundant. It's possible the author just wanted to be crystal clear that this safety applies identically in both cases.
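A minimal sketch of the pattern under discussion, with a hypothetical Sample type and function name: the bytes of obj1 pass through an unsigned char holding area before landing in a distinct object obj2.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type.
struct Sample {
    int id;
    float weight;
};
static_assert(std::is_trivially_copyable<Sample>::value,
              "Sample must be trivially copyable");

// Copies obj1 into obj2 by way of a temporary byte buffer.
void copy_via_buffer(Sample& obj2, const Sample& obj1) {
    unsigned char buffer[sizeof(Sample)];         // temporary holding area
    std::memcpy(buffer, &obj1, sizeof(Sample));   // object -> bytes
    std::memcpy(&obj2, buffer, sizeof(Sample));   // bytes -> other object
}

// Small usage check.
void demo() {
    Sample obj1{42, 1.5f};
    Sample obj2{0, 0.0f};
    copy_via_buffer(obj2, obj1);
    assert(obj2.id == 42 && obj2.weight == 1.5f);
}
```

In a network setting the buffer would be filled by a receive call rather than a local memcpy, but the second step is the same.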

Is std::memcpy between different trivially copyable types undefined behavior?

The standard may fail to say properly that this is allowed, but it's almost certainly supposed to be, and to the best of my knowledge, all implementations will treat this as defined behaviour.

In order to facilitate the copying into an actual char[N] object, the bytes making up the f object can be accessed as if they were a char[N]. This part, I believe, is not in dispute.

Bytes from a char[N] that represent a uint32_t value may be copied into an uint32_t object. This part, I believe, is also not in dispute.

Equally undisputed, I believe, is that e.g. fwrite may have written the bytes in one run of the program, and fread may have read them back in another run, or even another program entirely.

Because of that last part, I believe it does not matter where the bytes came from, as long as they form a valid representation of some uint32_t object. You could have cycled through all float values, using memcmp on each until you got the representation you wanted, that you knew would be identical to that of the uint32_t value you're interpreting it as. You could even have done that in another program, a program that the compiler has never seen. That would have been valid.

If from the implementation's perspective, your code is indistinguishable from unambiguously valid code, your code must be seen as valid.
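The case being argued can be sketched as follows. The function names are made up, and the code assumes float and std::uint32_t have the same size (checked with a static_assert); the specific bit pattern mentioned in the comment further assumes IEEE 754 binary32.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Copies the bytes of a float into a uint32_t object.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// The reverse direction: bytes of a uint32_t into a float object.
float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Small usage check. On IEEE 754 binary32 platforms, 1.0f has the
// representation 0x3F800000.
void demo() {
    assert(bits_to_float(float_bits(3.25f)) == 3.25f);
}
```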

Is copying trivially copyable objects always defined in C++14?

There are a couple things at play here:

an expression evaluating to an indeterminate value causes undefined behavior, with certain exceptions (8.5p12)
unsigned char (and possibly char, if unsigned) is the exception
variables with automatic storage duration and whose types have trivial default initialization initially have indeterminate values (5.3.4p17)

This means that

unsigned char is fine, no matter whether using memcpy or memmove or copy-assignment or copy-constructor
memcpy and memmove are presumably fine for all types, because the result is not "produced by an evaluation" (to meet this requirement, an implementation can use unsigned char internally, or take advantage of implementation-specific guarantees made for other types)
copy-constructor and copy-assignment for other types will fail if the right-hand-side is an indeterminate value

Of course, even the valid methods for copying an indeterminate value create another indeterminate value.
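To illustrate the distinction, here is a sketch with a partially initialized object; Record and copy_record are hypothetical names. Following the reasoning above, the byte-wise copy never evaluates the indeterminate member as an int, whereas reading that member directly would.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical type with trivial default initialization.
struct Record {
    int ready;
    int scratch;
};

// Byte-wise copy: src.scratch is never evaluated as an int.
void copy_record(Record& dst, const Record& src) {
    std::memcpy(&dst, &src, sizeof(Record));
}

// Small usage check.
void demo() {
    Record src;     // automatic storage: both members start indeterminate
    src.ready = 1;  // scratch is left indeterminate on purpose
    Record dst;
    copy_record(dst, src);
    assert(dst.ready == 1);
    // dst.scratch is now indeterminate as well; an expression such as
    // `int x = dst.scratch;` would evaluate an indeterminate value.
}
```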

Paragraph numbers correspond to draft N4527.
