Is it UB to access a member by casting an object pointer to 'char *', then doing '*(member_type*)(pointer + offset)'?

Can I manually access fields by their raw offset in C++?

Your proposed solution contains multiple instances of Undefined Behavior related to pointer arithmetic.

First, (char*)(&s2) - (char*)(this) is Undefined Behavior. This expression is governed by expr.add#5: since the pointers aren't null and don't point to elements of the same array, the behavior is undefined.

Second, ((char*)(this) + offset) is Undefined Behavior. This time the applicable paragraph is expr.add#4. Since (char*)(this) doesn't point to an element of an array, the only legal value for offset would be 0; any other value is Undefined Behavior.

But C++ already provides the tool you need for the problem you are describing: pointers to data members. These point to a member of a type rather than to a member of an instance. A pointer to data member can be combined with a pointer to an instance (in this case, the this pointer) to access that member of that particular instance.

Here is your example modified to use a pointer to data member (https://godbolt.org/z/161vT158q):

#include <cstddef>
#include <iostream>
#include <string>

class Test {
    std::string s1{"s1"}, s2{"s2"};

    // A pointer to an `std::string` member of the type `Test`
    using t_member_pointer = std::string Test::*;

    // Points to `Test::s2`
    t_member_pointer s_ptr = &Test::s2;

public:
    std::string& get() { 
        // Combine the data member pointer with an instance to get an object
        return (this->*s_ptr);
    }
};

int main() {
    Test test1;
    Test test2 = test1; // note the copy: s_ptr is copied by value
    std::cout << test2.get() << '\n'; // prints "s2"
}

Notice that s_ptr points to Test::s2, not to this->s2. The value of a pointer to data member is independent of any instance and works with every instance of that type. It therefore does not need to be adjusted during a copy or move; copying it by value between instances behaves as expected.
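The instance independence can be seen in a smaller sketch. The struct Pair, the alias StringMember and the functions select_member and access are hypothetical names chosen for illustration only; the point is that one member pointer value is applied unchanged to several objects, including a copy.

```cpp
#include <cassert>
#include <string>

// Hypothetical type used only for this illustration.
struct Pair {
    std::string first{"a"};
    std::string second{"b"};
};

// A pointer to a std::string member of Pair; it names a member of the
// type, not of any particular Pair object.
using StringMember = std::string Pair::*;

// Choose a member at run time; the result refers to no instance.
StringMember select_member(bool use_second) {
    return use_second ? &Pair::second : &Pair::first;
}

// Combine the member pointer with a concrete instance.
std::string& access(Pair& p, StringMember member) {
    return p.*member;
}
```

Because a value like select_member(true) carries no address of its own, copying a Pair (or the member pointer) never requires any fix-up.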

Is reinterpret_cast<char*>(myTypePtr) assumed to point to an array?

auto ptr = reinterpret_cast<char*>(myTypePtr);

The standard permits this conversion, per [expr.reinterpret.cast]:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). [ Note: Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note ]

So, the conversion is equivalent to:

(assuming the type myTypePtr points to has no cv-qualifiers):

auto ptr = static_cast<char*>(static_cast<void*>(myTypePtr));
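A small sketch can check both halves of the quoted rule: the two spellings give the same pointer value, and converting back yields the original pointer. MyType is a hypothetical stand-in for whatever myTypePtr actually points to; neither function performs any pointer arithmetic.

```cpp
#include <cassert>

// Hypothetical type standing in for the pointee of myTypePtr.
struct MyType {
    int a;
    double b;
};

// reinterpret_cast<char*> and the static_cast-through-void* form
// are specified to produce the same result.
bool casts_agree(MyType* myTypePtr) {
    char* via_reinterpret = reinterpret_cast<char*>(myTypePtr);
    char* via_void = static_cast<char*>(static_cast<void*>(myTypePtr));
    return via_reinterpret == via_void;
}

// char has no stricter alignment than MyType, so the round trip
// back to MyType* yields the original pointer value.
bool round_trips(MyType* myTypePtr) {
    char* p = reinterpret_cast<char*>(myTypePtr);
    return reinterpret_cast<MyType*>(p) == myTypePtr;
}
```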

You are also permitted to dereference ptr to access the stored value of the object that myTypePtr points to, per [basic.lval]:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
a char, unsigned char, or std::byte type.
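Under the reading above, a single dereference of such a pointer, with no arithmetic, is covered by the quoted aliasing rule. The sketch below reads the first byte of an object's representation that way and, for comparison, through std::memcpy; the function names are hypothetical, and the byte's value depends on the platform's object representation.

```cpp
#include <cassert>
#include <cstring>

// Read the first byte through an unsigned char glvalue; no pointer
// arithmetic is performed, so [expr.add] is not involved.
template <typename T>
unsigned char first_byte_via_pointer(const T& obj) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    return *p;
}

// Same byte, obtained by copying it out with std::memcpy.
template <typename T>
unsigned char first_byte_via_memcpy(const T& obj) {
    unsigned char b;
    std::memcpy(&b, &obj, 1);
    return b;
}
```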

If myTypePtr does not point to an array of char, then applying any addition to ptr results in undefined behavior, per [expr.add]:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i - j] if 0 ≤ i - j ≤ n; otherwise, the behavior is undefined.

For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where T and the array element type are not similar, the behavior is undefined.

Because the object myTypePtr points to is not an element of an array of char, applying addition to ptr results in undefined behavior.

Or maybe std::memcpy must be used for that purpose?

Yes, provided the object that myTypePtr points to is covered by the following rules:

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char, unsigned char, or std::byte ([cstddef.syn]). If the content of that array is copied back into the object, the object shall subsequently hold its original value.

For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes ([intro.memory]) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1.
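Both quoted rules can be exercised with std::memcpy in a short sketch. Point, copy_bytes and save_and_restore are hypothetical names; the static_assert documents the trivially copyable precondition the rules rely on (std::is_trivially_copyable_v needs C++17).

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type for the illustration.
struct Point {
    int x;
    int y;
};
static_assert(std::is_trivially_copyable_v<Point>,
              "the memcpy rules only apply to trivially copyable types");

// Second rule: copy the bytes of one object into a distinct object.
void copy_bytes(Point& dst, const Point& src) {
    std::memcpy(&dst, &src, sizeof(Point));
}

// First rule: copy the bytes into an unsigned char array and back.
void save_and_restore(Point& p) {
    unsigned char buffer[sizeof(Point)];
    std::memcpy(buffer, &p, sizeof(Point));
    p.x = -1; // clobber the value in between
    p.y = -1;
    std::memcpy(&p, buffer, sizeof(Point));
}
```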

Unfortunately, however, such a memcpy cannot be implemented by hand under the current standard.

Standard layout, taking address of member and indexing past it to next member

As I understand it, if we have an instance of X, we can take its address and cast it to char* and look at the content of X as if it was an array of bytes.

The standard doesn't actually allow this currently. Casting to char* will not change the pointer value (per [expr.static.cast]/13) and as a result you will not be allowed to apply pointer arithmetic on it as it violates [expr.add]/4 and/or [expr.add]/6.

This is however often assumed to be allowed in practice and probably considered a defect in the standard. The paper P1839 by Timur Doumler and Krystian Stasiowski is trying to address that.

But even applying the proposed wording in this paper (revision P1839R5)

X x;
char* p = reinterpret_cast<char*>(&x.obj) + sizeof(Object);
*((int*)p) = 1234; // write an int into buf

will have undefined behavior, at least assuming I am interpreting its proposed wording and examples correctly. (I might not be though.)

First of all, there is no guarantee that buf will be correctly aligned for an int. If it isn't, then the cast (int*)p will produce an unspecified pointer value. But also, there is no guarantee in general that there is no padding between obj and buf.

Even if you assume correct alignment and no padding, because e.g. you have guarantees from your ABI or compiler, there are still problems.

First, the proposal would only allow unsigned char*, not char* or std::byte*, to access the object representation. See the "Known issues" section of the paper.

Second, after fixing that, p would be a pointer one past the end of the object representation of obj, so it doesn't point to an object. As a consequence, the cast (int*)p cannot yield a pointer to any int object that might have been implicitly created in buf when the lifetime of x started. Instead, [expr.static.cast]/13 applies and the value of the pointer remains unchanged.

Trying to dereference the int* pointer pointing one-past-the-end of the object representation of obj will then cause undefined behavior (as it is not pointing to an object).

You also can't save this using std::launder on the pointer, because a pointer to an int nested inside buf would give you access to bytes which are not reachable through a pointer to the object representation of buf, violating std::launder's precondition, see [ptr.launder]/4.

In a broader picture, if you look at how e.g. std::launder is specified, it seems to me that the intention is definitely not to allow this. The way it is specified, it is impossible to use a pointer (in)to a member of a class (except the first, if the class is standard-layout) to access the memory of other (non-overlapping) members. This specifically seems intended to allow a compiler to optimize via pointer analysis, assuming that these other members are unreachable. (I don't know whether any compiler actually does this, though.)
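If the underlying goal is only to store an int's bytes in buf, one way to sidestep the problems listed above is to name the member directly and copy with std::memcpy. This is a sketch under assumptions: the question's Object and X are not shown here, so the definitions below are hypothetical, and it does not make the original pointer-arithmetic code valid.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the question's Object and X.
struct Object {
    int value;
};

struct X {
    Object obj;
    unsigned char buf[sizeof(int)];
};

// The destination pointer comes from x.buf itself, not from arithmetic
// past x.obj, and std::memcpy imposes no alignment requirement.
void store_int(X& x, int v) {
    std::memcpy(x.buf, &v, sizeof v);
}

// Read the bytes back into a properly aligned int.
int load_int(const X& x) {
    int v;
    std::memcpy(&v, x.buf, sizeof v);
    return v;
}
```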

Is adding to a char * pointer UB, when it doesn't actually point to a char array?

See CWG 1314

According to 6.9 [basic.types] paragraph 4,

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).

and 4.5 [intro.object] paragraph 5,

An object of trivially copyable or standard-layout type (6.9 [basic.types]) shall occupy contiguous bytes of storage.

Do these passages make pointer arithmetic (8.7 [expr.add] paragraph 5) within a standard-layout object well-defined (e.g., for writing one's own version of memcpy)?

Rationale (August, 2011):

The current wording is sufficiently clear that this usage is permitted.

I strongly disagree with CWG's statement that "the current wording is sufficiently clear", but nevertheless, that's the ruling we have.

I interpret CWG's response as suggesting that a pointer to unsigned char into an object of trivially copyable or standard-layout type, for the purposes of pointer arithmetic, ought to be interpreted as a pointer to an array of unsigned char whose size equals the size of the object in question. I don't know whether they intended that it would also work using a char pointer or (as of C++17) a std::byte pointer. (Maybe if they had decided to actually clarify it instead of claiming the existing wording was clear enough, then I would know the answer.)
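The kind of code CWG 1314 appears to bless is a hand-written memcpy that walks an object with unsigned char pointers. The sketch below (byte_copy is a hypothetical name) relies entirely on that interpretation; as argued above, it is not clear the current wording actually supports it, and it deliberately uses unsigned char rather than char or std::byte.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hand-written byte copy for trivially copyable objects. The pointer
// arithmetic on d and s treats each object as an array of sizeof(T)
// unsigned char, which is the reading CWG 1314's rationale suggests.
template <typename T>
void byte_copy(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "byte_copy requires a trivially copyable type");
    unsigned char* d = reinterpret_cast<unsigned char*>(&dst);
    const unsigned char* s = reinterpret_cast<const unsigned char*>(&src);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        d[i] = s[i];
    }
}
```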

(A separate issue is whether std::launder is required to make the OP's code well-defined. I won't go into this here; I think it deserves a separate question.)
