Why Would the Behavior of std::memcpy Be Undefined for Objects That Are Not TriviallyCopyable

Why would the behavior of std::memcpy be undefined for objects that are not TriviallyCopyable?

Why would the behavior of std::memcpy itself be undefined when used with non-TriviallyCopyable objects?

It's not! However, once you copy the underlying bytes of one object of a non-trivially copyable type into another object of that type, the target object is not alive. We destroyed it by reusing its storage, and haven't revitalized it by a constructor call.

Using the target object - calling its member functions, accessing its data members - is clearly undefined ([basic.life]/6), and so is a subsequent implicit destructor call ([basic.life]/4) for target objects having automatic storage duration. Note how undefined behavior is retroactive. [intro.execution]/5:

However, if any such execution contains an undefined operation, this
International Standard places no requirement on the implementation
executing that program with that input (not even with regard to
operations preceding the first undefined operation).

If an implementation can see that an object is dead and is necessarily subject to further operations that are undefined, it may react by altering your program's semantics from the memcpy call onward. This consideration becomes very practical once we think of optimizers and the assumptions they make.

It should be noted that standard libraries are able and allowed to optimize certain standard library algorithms for trivially copyable types, though. std::copy on pointers to trivially copyable types usually calls memcpy on the underlying bytes. So does swap.

So simply stick to using normal generic algorithms and let the compiler do any appropriate low-level optimizations - this is partly what the idea of a trivially copyable type was invented for in the first place: Determining the legality of certain optimizations. Also, this avoids hurting your brain by having to worry about contradictory and underspecified parts of the language.
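As a minimal sketch of that advice (the Pixel type and the duplicate helper are hypothetical names introduced here, and the code assumes C++17), the same std::copy call works for both a trivially copyable element type and std::string. Whether the library lowers the first case to memcpy is an implementation detail that the caller does not need to know about.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical example type: trivially copyable, so a library is allowed
// to implement std::copy over it as a bytewise copy.
struct Pixel {
    unsigned char r, g, b, a;
};
static_assert(std::is_trivially_copyable_v<Pixel>);
static_assert(!std::is_trivially_copyable_v<std::string>);

// Hypothetical helper: one generic call site for both kinds of element type.
// T must be default-constructible for the destination vector.
template <class T>
std::vector<T> duplicate(const std::vector<T>& src) {
    std::vector<T> dst(src.size());
    std::copy(src.begin(), src.end(), dst.begin());
    return dst;
}
```

For std::string elements the copy goes through the copy assignment operator of each element, so no object lifetime is ever bypassed.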

Is std::memcpy between different trivially copyable types undefined behavior?

The standard may fail to say properly that this is allowed, but it's almost certainly supposed to be, and to the best of my knowledge, all implementations will treat this as defined behaviour.

In order to facilitate the copying into an actual char[N] object, the bytes making up the f object can be accessed as if they were a char[N]. This part, I believe, is not in dispute.

Bytes from a char[N] that represent a uint32_t value may be copied into an uint32_t object. This part, I believe, is also not in dispute.

Equally undisputed, I believe, is that e.g. fwrite may have written the bytes in one run of the program, and fread may have read them back in another run, or even another program entirely.

Because of that last part, I believe it does not matter where the bytes came from, as long as they form a valid representation of some uint32_t object. You could have cycled through all float values, using memcmp on each until you got the representation you wanted, that you knew would be identical to that of the uint32_t value you're interpreting it as. You could even have done that in another program, a program that the compiler has never seen. That would have been valid.

If from the implementation's perspective, your code is indistinguishable from unambiguously valid code, your code must be seen as valid.
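A small sketch of the float/uint32_t case discussed above. The helper names float_bits and float_from_bits are made up for illustration, and the static_asserts spell out the assumptions (a 32-bit float in IEEE 754 format) under which the byte pattern is a valid uint32_t representation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 floats");

// Hypothetical helper: copy the bytes of a float into a uint32_t object.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Hypothetical helper: the reverse direction.
float float_from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

Every 32-bit pattern is a valid uint32_t value, which is why the direction float to uint32_t is the uncontroversial one; the reverse direction relies on the pattern being a valid float representation.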

Why does std::memcpy (as an alternative to type-punning) not cause undefined behaviour?

Why does type-punning using std::memcpy not cause undefined behaviour?

Because the language says so (latest draft):

[basic.types]

For any object (other than a potentially-overlapping subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char, unsigned char, or std::byte ([cstddef.syn]).
If the content of that array is copied back into the object, the object shall subsequently hold its original value.

Note however the condition on that rule. Your code can potentially have undefined behaviour, but not (unless some other rule says so) in case the copied value was originally copied from another double, or in practice, if the value could have been copied from a double.
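The quoted rule can be illustrated with a round trip through an unsigned char array. This is only a sketch; round_trip is a hypothetical helper name, not something from the question.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copy the bytes of a double into an unsigned char
// array, clobber the object, then copy the bytes back into the same object.
double round_trip(double value) {
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &value, sizeof value);   // object -> byte array
    value = 0.0;                                // overwrite the original value
    std::memcpy(&value, bytes, sizeof value);   // byte array -> same object
    return value;                               // holds its original value again
}
```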

If it was not a double, but a complex class object, accessing it surely would not be defined either, would it?

Depends on what you mean by complexity. The conditions where this applies are in the quoted rule.

Why (if that is the case) does the standard say that copying uninitialized memory with memcpy is UB?

Is moving an object with an uninitialized, trivially copyable member indeed defined as UB by the standard?

Depends on the type of the member. The standard says:

[basic.indet]

When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced ([expr.ass]).

If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

  • If an indeterminate value of unsigned ordinary character type ([basic.fundamental]) or std::byte type ([cstddef.syn]) is produced by the evaluation of:

    • the second or third operand of a conditional expression,
    • the right operand of a comma expression,
    • the operand of a cast or conversion ([conv.integral], [expr.type.conv], [expr.static.cast], [expr.cast]) to an unsigned ordinary character type or std::byte type ([cstddef.syn]), or
    • a discarded-value expression,

    then the result of the operation is an indeterminate value.

  • If an indeterminate value of unsigned ordinary character type or std::byte type is produced by the evaluation of the right operand of a simple assignment operator ([expr.ass]) whose first operand is an lvalue of unsigned ordinary character type or std::byte type, an indeterminate value replaces the value of the object referred to by the left operand.

  • If an indeterminate value of unsigned ordinary character type is produced by the evaluation of the initialization expression when initializing an object of unsigned ordinary character type, that object is initialized to an indeterminate value.

  • If an indeterminate value of unsigned ordinary character type or std::byte type is produced by the evaluation of the initialization expression when initializing an object of std::byte type, that object is initialized to an indeterminate value.

None of the exceptional cases apply to your example object, so UB applies.



with memcpy is UB?

It is not. std::memcpy interprets the object as an array of bytes, in which exceptional case there is no UB. You still have UB if you attempt to read the indeterminate copy (unless the exceptions above apply).
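A hedged sketch of that distinction, using a hypothetical Record type with one member deliberately left uninitialized. Per the reasoning above, the bytewise copy itself is fine; only a later read of the indeterminate member would be undefined.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical example type.
struct Record {
    int id;
    int scratch;  // deliberately never initialized in this sketch
};

// Hypothetical helper: copies a partially initialized Record bytewise and
// reads back only the member that was actually given a value.
int copy_id_only() {
    Record src;
    src.id = 42;                          // src.scratch stays indeterminate
    Record dst;
    std::memcpy(&dst, &src, sizeof dst);  // copies the bytes, indeterminate ones included
    return dst.id;                        // fine; reading dst.scratch here would be UB
}
```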



why?

The C++ standard doesn't include a rationale for most rules. This particular rule has existed since the first standard. It is slightly stricter than the related C rule, which is about trap representations. To my understanding, there is no established convention for trap handling, so the authors didn't wish to restrict implementations by specifying it and instead opted to make it UB. This also has the effect of allowing the optimiser to assume that indeterminate values will never be read.



I might want to move this object into a container:

Moving an uninitialised object into a container is typically a logic error. It is unclear why you might want to do such a thing.

Using std::memcpy on an object of non-trivially copyable type

You asked:

What potential problem we could get if we applied that function to an object of non-trivially copyable type?

Here's a very simple example that illustrates the problem of using std::memcpy for objects of non-trivially copyable type.

#include <cstring>

struct A
{
    A(int size) : size_(size), data_(new int[size]) {}
    ~A() { delete [] data_; }

    // The copy constructor and the copy assignment operator need
    // to be implemented for the class too. They have been omitted
    // to keep the code here minimal.

    int size_;
    int* data_;
};

int main()
{
    A a1(10);
    A a2(20);

    // Overwrites a1.size_ and a1.data_ with the bytes of a2.
    std::memcpy(&a1, &a2, sizeof(A));

    // When we return from the function, the original data_ of a1
    // is a memory leak. The data_ of a2 is deleted twice.

    return 0;
}
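For contrast, here is a sketch of what the omitted copy operations might look like. SafeA is a hypothetical name for a variant of the class above, not code from the original answer; with it, ordinary copy assignment replaces the memcpy call and each object keeps its own allocation.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical variant of A with the copy operations filled in.
class SafeA {
public:
    explicit SafeA(int size) : size_(size), data_(new int[size]()) {}

    SafeA(const SafeA& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    SafeA& operator=(const SafeA& other) {
        if (this != &other) {
            // Allocate and fill the new array first, so *this is left
            // untouched if the allocation throws.
            int* fresh = new int[other.size_];
            std::copy(other.data_, other.data_ + other.size_, fresh);
            delete[] data_;  // release the old array instead of leaking it
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }

    ~SafeA() { delete[] data_; }

    int size() const { return size_; }
    int* data() { return data_; }

private:
    int size_;
    int* data_;
};
```

After `a1 = a2;` both objects have equal contents but distinct data_ pointers, so each destructor releases its own array exactly once.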

Does memcpy preserve a trivial object's validity?

Copying an object of a trivial type with std::memcpy into properly sized and aligned storage will implicitly begin the lifetime of a new object at that location.

There is a category of types called implicit-lifetime types. A type is an implicit-lifetime type if it is:

  • a scalar type, or
  • an array type, or
  • an aggregate class type, or
  • a class type that has
    • at least one trivial eligible constructor, and
    • a trivial, non-deleted destructor,
  • or a cv-qualified version of one of the above types.

Trivial class types meet these requirements.

Objects of implicit-lifetime type have the property that their lifetime can be started implicitly by several functions or operations:

  • operations that begin the lifetime of an array of type char, unsigned char, or std::byte (since C++17), in which case such objects are created in the array,
  • calls to the following allocating functions, in which case such objects are created in the allocated storage:
    • operator new
    • operator new[]
    • std::malloc
    • std::calloc
    • std::realloc
    • std::aligned_alloc (since C++17)
  • calls to the following object representation copying functions, in which case such objects are created in the destination region of storage or the result:
    • std::memcpy
    • std::memmove
    • std::bit_cast (since C++20)
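A sketch of how two of the listed operations combine, assuming the C++20 implicit object creation rules. Point and clone_point are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical aggregate; aggregates are implicit-lifetime types.
struct Point {
    int x;
    int y;
};
static_assert(std::is_trivially_copyable_v<Point>);

// Hypothetical helper: returns a heap copy of src, or nullptr if the
// allocation fails. The caller releases the result with std::free.
Point* clone_point(const Point& src) {
    void* raw = std::malloc(sizeof(Point));  // may implicitly create a Point
    if (raw == nullptr) {
        return nullptr;
    }
    std::memcpy(raw, &src, sizeof(Point));   // copies the object representation
    return static_cast<Point*>(raw);         // no placement new involved
}
```

No constructor or placement new appears anywhere, yet under those rules the returned pointer refers to a live Point.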

Why is it not undefined behavior that `std::uninitialized_copy` typically dereferences an iterator to uninitialized memory?

I'm aware that dereferencing a pointer or iterator that points to uninitialized memory is illegal

Not quite. The indirection alone is not illegal. Behaviour is only undefined in case of performing operations such as those that depend on the value.

std::addressof does not access the value of the object it refers to. It only takes its address. This is allowed for objects before and after their lifetime, as long as their storage is allocated.

Even if this weren't true due to some technicality in the rules, a standard library implementation is not necessarily limited by the rules of the language.


Standard quotes (latest draft):

[basic.life]

Before the lifetime of an object has started but after the storage which the object will occupy has been allocated ... any pointer that represents the address of the storage location where the object will be ... located may be used but only in limited ways.
For an object under construction or destruction, see [class.cdtor].
Otherwise, such a pointer refers to allocated storage ([basic.stc.dynamic.allocation]), and using the pointer as if the pointer were of type void* is well-defined.
Indirection through such a pointer is permitted but the resulting lvalue may only be used in limited ways, as described below.
The program has undefined behavior if: (no cases that apply here)

Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated ... any glvalue that refers to the original object may be used but only in limited ways.
For an object under construction or destruction, see [class.cdtor].
Otherwise, such a glvalue refers to allocated storage ([basic.stc.dynamic.allocation]), and using the properties of the glvalue that do not depend on its value is well-defined.
The program has undefined behavior if: (no cases that apply here)
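To make the pattern concrete, here is a simplified, hypothetical stand-in for std::uninitialized_copy (assuming C++17 for std::destroy). It is not any particular library's implementation; it only shows that the destination is dereferenced solely to take its address before an object is constructed there.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>

// Hypothetical, simplified sketch; real implementations differ in detail.
template <class InputIt, class T>
T* naive_uninitialized_copy(InputIt first, InputIt last, T* dest) {
    T* current = dest;
    try {
        for (; first != last; ++first, ++current) {
            // *current refers to allocated storage with no live object yet.
            // Only its address is used; its value is never read.
            ::new (static_cast<void*>(std::addressof(*current))) T(*first);
        }
    } catch (...) {
        std::destroy(dest, current);  // destroy what was already constructed
        throw;
    }
    return current;
}
```

A caller would hand it raw storage, for example memory obtained from operator new, and later destroy the constructed objects before releasing that storage.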


