Why Isn't memcpy Guaranteed to Be Safe for Non-POD Types

memcpy a non-POD object

The reason a trivially copyable class (C++11 mostly uses the concepts of trivial class and standard-layout class instead of POD) can be memcpy'ed is not dynamic allocation, contrary to what other answers and comments suggest. Granted, if you make a shallow copy of a type that owns dynamically allocated memory, you are inviting trouble. But a type can very well hold a pointer and allocate through it in a user-provided constructor and still qualify as a trivial class, as long as it also has a trivial default constructor (for example, one declared = default).
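To make the point concrete, here is a minimal sketch. The Buffer type and the function name are hypothetical, invented for illustration: the class allocates in a user-provided constructor, still passes the trivial and trivially copyable checks, and a memcpy of it is well-defined but only shallow.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical type: allocates in a user-provided (non-default) constructor,
// yet its default constructor, copy operations and destructor are all trivial.
struct Buffer {
    Buffer() = default;  // trivial default constructor
    explicit Buffer(int n) : data(new int[n]()), size(n) {}

    int* data;
    int  size;
};

static_assert(std::is_trivially_copyable<Buffer>::value,
              "Buffer has only trivial copy/move operations and destructor");
static_assert(std::is_trivially_default_constructible<Buffer>::value,
              "together with the line above, this makes Buffer a trivial class");

// memcpy is well-defined for Buffer, but it copies the pointer, not the array.
bool shallow_copy_shares_allocation() {
    Buffer a(4);
    assert(a.data != nullptr);

    Buffer b;
    std::memcpy(&b, &a, sizeof b);
    const bool shared = (b.data == a.data) && (b.size == 4);

    delete[] a.data;  // release exactly once; b.data dangles from here on
    return shared;
}
```

Calling shallow_copy_shares_allocation() returns true: both objects end up pointing at the same array, which is the "inviting trouble" part, even though the copy itself is legal.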

The actual reason a memcpy can be guaranteed is that trivially copyable (and also standard-layout) types are required to occupy contiguous bytes of storage whereas other objects are not.

N3690, 1.8/5:

Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size. An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.
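A small sketch of how the guarantee can be used defensively. The helper copy_bytes and the Point type are hypothetical names chosen for this example; the static_assert rejects any type for which a byte-wise copy is not guaranteed to work.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies the object representation of src into dst,
// and refuses to compile for types that are not trivially copyable.
template <typename T>
void copy_bytes(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only guaranteed for trivially copyable types");
    assert(&dst != &src);  // memcpy requires non-overlapping ranges
    std::memcpy(&dst, &src, sizeof(T));
}

struct Point {
    int    x;
    int    y;
    double weight;
};

bool copy_bytes_preserves_value() {
    Point a = {1, 2, 3.5};
    Point b = {0, 0, 0.0};
    copy_bytes(b, a);
    return b.x == 1 && b.y == 2 && b.weight == 3.5;
}
```

Instantiating copy_bytes with a type such as std::string would fail at compile time instead of silently corrupting the object.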

C++ guarantee and name for POD-like data, memcpy capable

In C++11 (known as C++0x before publication), the concept of PODness is broken out into several individually useful categories:

A trivially copyable class is a class that (draft N3242, section [class]):

  • has no non-trivial copy constructors (12.8),
  • has no non-trivial move constructors (12.8),
  • has no non-trivial copy assignment operators (13.5.3, 12.8),
  • has no non-trivial move assignment operators (13.5.3, 12.8), and
  • has a trivial destructor (12.4).

A trivial class is a class that has a trivial default constructor (12.1) and is trivially copyable.

[ Note: In particular, a trivially copyable or trivial class does not have virtual functions or virtual base
classes. — end note ]

A standard-layout class is a class that:

  • has no non-static data members of type non-standard-layout class (or array of such types) or reference,
  • has no virtual functions (10.3) and no virtual base classes (10.1),
  • has the same access control (Clause 11) for all non-static data members,
  • has no non-standard-layout base classes,
  • either has no non-static data members in the most derived class and at most one base class with
    non-static data members, or has no base classes with non-static data members, and
  • has no base classes of the same type as the first non-static data member.

The requirements for trivial constructors, assignment operators, and destructor are scattered throughout section 12 "Special Member Functions" [special].
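The categories above can be queried with the standard type traits. The following is a sketch only; the four example structs and the categorize helper are made-up names, each struct chosen to fail a different requirement.

```cpp
#include <cassert>
#include <type_traits>

struct Plain { int a; double b; };             // trivial and standard-layout
struct MixedAccess {                           // trivial, but two access levels
    int a;
private:
    int b;
};
struct WithCtor { WithCtor() : a(0) {} int a; };        // non-trivial default ctor
struct Polymorphic { virtual ~Polymorphic() {} int a; }; // virtual function

struct Categories {
    bool trivially_copyable;
    bool trivial;
    bool standard_layout;
};

// Hypothetical helper that reports which categories T falls into.
template <typename T>
Categories categorize() {
    return Categories{
        std::is_trivially_copyable<T>::value,
        std::is_trivially_copyable<T>::value &&
            std::is_trivially_default_constructible<T>::value,
        std::is_standard_layout<T>::value};
}

void verify_examples() {
    assert(categorize<Plain>().trivial);
    assert(categorize<Plain>().standard_layout);
    assert(categorize<MixedAccess>().trivial);
    assert(!categorize<MixedAccess>().standard_layout);
    assert(categorize<WithCtor>().trivially_copyable);
    assert(!categorize<WithCtor>().trivial);
    assert(!categorize<Polymorphic>().trivially_copyable);
    assert(!categorize<Polymorphic>().standard_layout);
}
```

Note that WithCtor is still trivially copyable: a user-provided default constructor only costs it the "trivial class" label, not the ability to be copied with memcpy.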

Is it safe to use sizeof operator on non-POD types in C++

Of course it's safe. And note also that sizeof is evaluated at compile-time.

The sizeof of a non-POD type will never be less than the total size of all its non-static data members, apart from any empty base class optimisation.

It could well be greater than that total, because the compiler may insert padding for alignment and because even a class with no members has a non-zero sizeof.
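These statements can be checked at compile time. The sketch below uses hypothetical struct names and deliberately avoids asserting exact sizes, since the amount of padding is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>

struct Empty {};                             // no members at all
struct Padded { char tag; double value; };   // padding usually follows tag
struct DerivedFromEmpty : Empty { int n; };  // empty base may take no space

static_assert(sizeof(Empty) >= 1, "a complete object never has size zero");
static_assert(sizeof(Padded) >= sizeof(char) + sizeof(double),
              "never smaller than the sum of the members");
static_assert(sizeof(DerivedFromEmpty) >= sizeof(int),
              "the empty base may or may not add to the size");

// Number of bytes in Padded that belong to no member (implementation-defined).
std::size_t padding_bytes() {
    const std::size_t members = sizeof(char) + sizeof(double);
    assert(sizeof(Padded) >= members);
    return sizeof(Padded) - members;
}
```

On common 64-bit ABIs padding_bytes() is non-zero because double is aligned to 8 bytes, but that is a property of the platform, not a guarantee of the language.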

Can I use memcpy in C++ to copy classes that have no pointers or virtual functions

According to the Standard, if no copy constructor is provided by the programmer for a class, the compiler will synthesize a constructor which exhibits default memberwise initialization. (12.8.8) However, in 12.8.1, the Standard also says,

A class object can be copied in two ways, by initialization (12.1, 8.5), including for function argument passing (5.2.2) and for function value return (6.6.3), and by assignment (5.17). Conceptually, these two operations are implemented by a copy constructor (12.1) and copy assignment operator (13.5.3).

The operative word here is "conceptually," which, according to Lippman, gives compiler designers an 'out' from actually doing memberwise initialization in "trivial" (12.8.6) implicitly defined copy constructors.

In practice, then, compilers have to synthesize copy constructors for these classes that behave as if they were doing memberwise initialization. But if the class exhibits "Bitwise Copy Semantics" (Lippman, p. 43), the compiler does not have to synthesize a copy constructor (which would result in a function call, possibly inlined) and can do a bitwise copy instead. This claim is apparently backed up in the ARM, but I haven't looked this up yet.

Using a compiler to validate that something is Standard-compliant is always a bad idea, but compiling your code and viewing the resulting assembly seems to verify that the compiler is not doing memberwise initialization in a synthesized copy constructor, but doing a memcpy instead:

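class MyClass
{
public:
    // Initialize every member so that copying c below never reads
    // indeterminate values.
    MyClass() : a(0), b(0), c(0), x(0.0), y(0.0), z(0.0) {}
    int a, b, c;
    double x, y, z;
};

int main()
{
    MyClass c;
    MyClass d = c;  // the copy whose generated code is shown below

    return 0;
}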

The assembly generated for MyClass d = c; is:

000000013F441048  lea         rdi,[d]
000000013F44104D  lea         rsi,[c]
000000013F441052  mov         ecx,28h
000000013F441057  rep movs    byte ptr [rdi],byte ptr [rsi]

...where 28h (40 decimal) is sizeof(MyClass).

This was compiled under MSVC9 in Debug mode.

EDIT:

The long and the short of this post is that:

1) So long as doing a bitwise copy will exhibit the same side effects as memberwise copy would, the Standard allows trivial implicit copy constructors to do a memcpy instead of memberwise copies.

2) Some compilers actually do memcpys instead of synthesizing a trivial copy constructor which does memberwise copies.
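Point 1 can be illustrated directly: for a class like the one above, the implicit copy constructor and an explicit memcpy produce objects with identical member values. This is a sketch under the assumption that the class is trivially copyable (checked by the static_assert); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

class MyClass {
public:
    MyClass() : a(1), b(2), c(3), x(1.5), y(2.5), z(3.5) {}
    int a, b, c;
    double x, y, z;
};

static_assert(std::is_trivially_copyable<MyClass>::value,
              "the implicit copy constructor of MyClass is trivial");

bool same_members(const MyClass& l, const MyClass& r) {
    return l.a == r.a && l.b == r.b && l.c == r.c &&
           l.x == r.x && l.y == r.y && l.z == r.z;
}

bool memcpy_matches_copy_constructor() {
    MyClass src;
    src.a = 42;      // make src differ from a default-constructed object
    src.z = -1.0;

    MyClass by_ctor = src;  // implicit (trivial) copy constructor

    MyClass by_memcpy;
    assert(!same_members(by_memcpy, src));  // destination starts out different
    std::memcpy(&by_memcpy, &src, sizeof(MyClass));

    return same_members(by_ctor, src) && same_members(by_memcpy, src);
}
```

The example compares member values only; whether a given compiler actually emits a block copy for the copy constructor remains an implementation detail.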

Is serializing object representations by memcpy without creating objects safe as long as you don't directly access the values they contain?

std::allocator<unsigned char>::allocate is specified as calling ::operator new (C++03 lib.allocator.members/3). So this question is substantially similar to "constructing" a trivially copyable object with memcpy, although without the attempt to alias a value afterwards.

If we replaced the call to memcpy with a char assignment loop:

unsigned char *p = (unsigned char *)&d;
for (int i = 0; i < sizeof d; ++i) buf[i] = p[i];

then it is definitely undefined behaviour, since the assignment operator only has defined behaviour when the left hand side refers to an object that exists. See this answer for more detail.

However for the memcpy version, the question is: is memcpy the same as this char assignment loop, or something else?

The C++03 standard only defines memcpy by deferring to ISO C90, which says:

The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

But it is unclear how to interpret this, since C has a different object model to C++. In C "object" means storage, whereas in C++ "object" and "storage" mean different things, and in the code in this question there is storage with no objects.

The answer by Shafik Yaghmour therefore describes the situation as "unspecified", although I think "not specified" or "unclear" would be better descriptions, since the term "unspecified" has a specific meaning in C++.

Footnote: Nothing substantial changed on this topic as of C++17. In C++20, however, this is well-defined; see the accepted proposal for details.
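For comparison, here is a sketch of a byte-level round trip that avoids the "storage with no objects" question altogether. It is not the code from the question: the buffer is a std::vector<unsigned char>, whose elements are real unsigned char objects, and the bytes are copied back into an existing object. The Sample type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

struct Sample {
    int    id;
    double value;
};

static_assert(std::is_trivially_copyable<Sample>::value,
              "byte-wise copies are only guaranteed for such types");

// Copy the object representation of s into a buffer of unsigned char objects.
std::vector<unsigned char> to_bytes(const Sample& s) {
    std::vector<unsigned char> buf(sizeof(Sample));
    std::memcpy(buf.data(), &s, sizeof(Sample));
    return buf;
}

// Copy the bytes back into an existing Sample object.
Sample from_bytes(const std::vector<unsigned char>& buf) {
    assert(buf.size() == sizeof(Sample));
    Sample s;
    std::memcpy(&s, buf.data(), sizeof(Sample));
    return s;
}

bool buffer_round_trip_ok() {
    const Sample original = {7, 2.5};
    const Sample restored = from_bytes(to_bytes(original));
    return restored.id == original.id && restored.value == original.value;
}
```

Because both ends of each memcpy are existing objects, this pattern is covered by the trivially copyable rules regardless of how the raw-storage case is interpreted.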

How does binary I/O of POD types not break the aliasing rules?

Strict aliasing is about accessing an object through a pointer/reference to a type other than that object's actual type. However, the rules of strict aliasing permit accessing an object of any type through a pointer to a byte type (char or unsigned char, and std::byte since C++17). This rule has been around since at least C++14.

Now, that doesn't mean much, since something has to define what such an access means. For that (in terms of writing), we only really have two rules: [basic.types]/2 and /3, which cover copying the bytes of Trivially Copyable types. The question ultimately boils down to this:

Are you reading the "the underlying bytes making up [an] object" from the file?

If the data you're reading into your s was in fact copied from the bytes of a live instance of S, then you're 100% fine. It's clear from the standard that performing fwrite writes the given bytes to a file, and performing fread reads those bytes from the file. Therefore, if you write the bytes of an existing S instance to a file, and read those written bytes into an existing S, you have performed the equivalent of copying those bytes.
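A minimal sketch of that round trip within a single program run. The struct S here is a stand-in for the type in the question, the function names are hypothetical, and std::tmpfile is used only so the example needs no file name.

```cpp
#include <cassert>
#include <cstdio>
#include <type_traits>

struct S {
    int    id;
    double value;
};

static_assert(std::is_trivially_copyable<S>::value,
              "binary I/O of the raw bytes relies on this");

// Write the underlying bytes of an existing S to the file.
bool write_object(std::FILE* f, const S& s) {
    assert(f != nullptr);
    return std::fwrite(&s, sizeof s, 1, f) == 1;
}

// Read those bytes back into an existing S.
bool read_object(std::FILE* f, S& s) {
    assert(f != nullptr);
    return std::fread(&s, sizeof s, 1, f) == 1;
}

bool file_round_trip_ok() {
    std::FILE* f = std::tmpfile();  // temporary binary file, removed on close
    if (f == nullptr) return false;

    const S original = {7, 2.5};
    S restored = {0, 0.0};

    bool ok = write_object(f, original);
    std::rewind(f);
    ok = ok && read_object(f, restored);
    std::fclose(f);

    return ok && restored.id == original.id && restored.value == original.value;
}
```

The bytes read here really did come from a live S written by the same program, which is the uncontroversial case described above.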

Where you run into technical issues is when you start getting into the weeds of interpretation. It is reasonable to interpret the standard as defining the behavior of such a program even when the writing and the reading happen in different invocations of the same program.

Concerns arise in one of two cases:

1: When the program that wrote the data is actually a different program from the one that reads it.

2: When the program that wrote the data did not actually write an object of type S, but instead wrote bytes that just so happen to be legitimately interpretable as an S.

The standard doesn't govern interoperability between two programs. However, C++20 does provide a tool that effectively says "if the bytes in this memory contain a legitimate object representation of a T, then I'll return a copy of what that object would look like." It's called std::bit_cast; you can pass it an array of bytes of sizeof(T), and it'll return a copy of that T.

You get undefined behavior if you're a liar, that is, if the bytes do not correspond to a valid value of T. And bit_cast doesn't even compile if T is not trivially copyable.
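A sketch of the bit_cast approach. The struct S and the function names are hypothetical, and the memcpy branch is an assumption added so the example also builds on toolchains without C++20 library support; it is not part of what bit_cast itself offers.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <type_traits>
#if __has_include(<bit>)
#include <bit>  // std::bit_cast (C++20)
#endif

struct S {
    int   id;
    float value;
};

using Bytes = std::array<unsigned char, sizeof(S)>;

static_assert(std::is_trivially_copyable<S>::value,
              "bit_cast requires trivially copyable types");
static_assert(sizeof(Bytes) == sizeof(S), "bit_cast requires equal sizes");

// Produce an S from bytes that are claimed to be an object representation of S.
S from_bytes(const Bytes& bytes) {
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<S>(bytes);
#else
    S s;  // fallback for pre-C++20 toolchains: copy into an existing object
    std::memcpy(&s, bytes.data(), sizeof s);
    return s;
#endif
}

Bytes to_bytes(const S& s) {
    Bytes bytes;
    std::memcpy(bytes.data(), &s, sizeof s);
    return bytes;
}

bool bit_cast_round_trip_ok() {
    const S original = {7, 2.5f};
    const S restored = from_bytes(to_bytes(original));
    assert(sizeof(restored) == sizeof(original));
    return restored.id == original.id && restored.value == original.value;
}
```

Unlike a memcpy into a live object, bit_cast returns a fresh S by value, so there is no pre-existing destination object whose lifetime needs to be reasoned about.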

However, to do a byte copy directly into a live S from a source that wasn't technically an S but totally could be an S, is a different matter. There isn't wording in the standard to make that work.

Our friend P0593 proposes a mechanism for explicitly declaring such an assumption, but that mechanism didn't quite make it into C++20 (it was adopted later, for C++23, as std::start_lifetime_as).


