Pointers to Members Representations


Danny Kalev explains this quite nicely:

The Underlying Representation of Pointers to Members
Although pointers to members behave like ordinary pointers, behind the scenes their representation is quite different. In fact, a pointer to member usually consists of a struct containing up to four fields in certain cases. This is because pointers to members have to support not only ordinary member functions, but also virtual member functions, member functions of objects that have multiple base classes, and member functions of virtual base classes. Thus, the simplest member function can be represented as a set of two pointers: one holding the physical memory address of the member function, and a second pointer that holds the this pointer. However, in cases like a virtual member function, multiple inheritance and virtual inheritance, the pointer to member must store additional information. Therefore, you can't cast pointers to members to ordinary pointers nor can you safely cast between pointers to members of different types.
To get a notion of how your compiler represents pointers to members, use the sizeof operator. In the following example, the sizes of a pointer to data member and a pointer to a member function are taken. As you can see, they have different sizes, hence, different representations:

struct A
{
 int x;
 void f();
};
int A::*pmi = &A::x;
void (A::*pmf)() = &A::f;
int n = sizeof (pmi); // 8 bytes with my compiler
int m = sizeof (pmf); // 12 bytes with my compiler

Note that each of these pointers may have a different representation, depending on the class in question and whether the member function is virtual.

member function pointers to virtual functions

A non-virtual class method is, basically, an ordinary function, so a pointer to a non-virtual class method is functionally equivalent to an ordinary function pointer, the function's address.

Every non-static class method, whether virtual or not, receives an internal pointer. You know it as "this". This is, typically, an additional, hidden function parameter.

Every class with virtual functions or virtual base classes typically has a hidden internal pointer as one of its class members. The compiler generates it, along with the code that initializes it when an instance of the class is created. This pointer points to compiler-generated metadata that, amongst other things, records the pointer to the metadata for the instantiated class and what all the real overridden virtual functions are, for that instance of the class.

A pointer to a virtual class method is an address of a function that digs into this and uses its virtual function dispatch metadata to look up the actual, instantiated class. It then uses the hidden pointer to that class's metadata to look up the appropriate virtual function override for this object and (after a few more bookkeeping procedures) jumps to the real virtual function.

So the address of a virtual function is, typically, also an address of a function, except that it's not any specific virtual function, but rather a compiler-generated function that figures out what "real" object it's being invoked for and its appropriate overridden virtual function.

This is the capsule summary of a typical compiler implementation. Some details have been omitted. There are minor variations that differ from compiler to compiler.

Pointer to class data member ::*

It's a "pointer to member" - the following code illustrates its use:

#include <iostream>
using namespace std;

class Car
{
    public:
    int speed;
};

int main()
{
    int Car::*pSpeed = &Car::speed;

    Car c1;
    c1.speed = 1;       // direct access
    cout << "speed is " << c1.speed << endl;
    c1.*pSpeed = 2;     // access via pointer to member
    cout << "speed is " << c1.speed << endl;
    return 0;
}

As to why you would want to do that, well it gives you another level of indirection that can solve some tricky problems. But to be honest, I've never had to use them in my own code.

Edit: I can't think off-hand of a convincing use for pointers to member data. Pointers to member functions can be used in pluggable architectures, but once again producing an example in a small space defeats me. The following is my best (untested) try: an Apply function that would do some pre- and post-processing before and after applying a user-selected member function to an object:

void Apply( SomeClass * c, void (SomeClass::*func)() ) {
    // do hefty pre-call processing
    (c->*func)();  // call user specified function
    // do hefty post-call processing
}

The parentheses around c->*func are necessary because the ->* operator has lower precedence than the function call operator.

How to get string representation for the member function?

cout << &MyStruct::member;
outputs 1 though in debugger I can see the address.

There is no overload for ostream::operator<<(decltype(&MyStruct::member)). However, the member function pointer is implicitly convertible to bool and for that, there exists an overload and that is the best match for overload resolution. The converted value is true if the pointer is not null. true is output as 1.

string s{ to_string(reinterpret_cast<uintptr_t>(&MyStruct::member)) };
Gives a compile-time error cannot convert. So it seems that not any pointer can be converted.

Perhaps confusingly, in standardese pointer is not an umbrella term for object pointers, pointers-to-members, pointers-to-functions and pointers-to-member-functions. Pointers mean just data pointers specifically.

So, the quoted rule does not apply to pointers-to-member-functions. It only applies to (object) pointers.

What else can I do to get a string representation?

You can use a buffer of unsigned char, big enough to represent the pointer, and use std::memcpy. Then print it in the format of your own choice. I recommend hexadecimal.

As Martin Bonner points out, the pointer-to-member may contain padding in which case two values that point to the same member may actually have a different value in the buffer. Therefore the printed value is not of much use because two values are not comparable without knowing which bits (if any) are padding - which is implementation defined.

Unfortunately I need a robust solution, so because of this padding I can't use it.

No portable robust solution exists.

As Jonathan Wakely points out, there is no padding in the Itanium ABI, so if your compiler uses that, then the suggested memcpy method would work.

How to access an object representation according to the c++ standard?

1) Your approach works:

Working with const pointers ensures that constness is not cast away:

5.2.10/2 The reinterpret_cast operator shall not cast away constness.

The pointer conversion is safe because char does not have a stricter alignment requirement than some_type, so you may convert rep back to a some_type*:

5.2.10/7 An object pointer can be explicitly converted to an object pointer of a different type. (...) Converting a prvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value.

Edit: In my understanding, there is no doubt about inter-convertibility between the pointer to an object and the pointer to its representation:

1.8/6: Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first
byte it occupies.

3.9/4: The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T,
where N equals sizeof(T).

I understand "taken up" to be a synonym of "occupies". Note also that the & operator guarantees that:

5.3.1/3: (...) if the type of the expression is T, the result has type “pointer to T” and is a prvalue that is the address of the
designated object

2) The object representation is initialized with the object:

This follows from the definition of the value representation, taken together with the memory model and the object lifecycle.

However, your example is more complex:

Despite this property, rep[0] may remain an indeterminate value if it is composed solely of padding bits. This is the case in your example: the object has a size of at least 1, but because it has no members, the value representation is empty.
rep[1] can be undefined behavior if sizeof(some_type) < 2, because dereferencing a pointer past the last element of an array is UB.

3) What is the object representation (in plain language) ?

Let's take a simple example:

class some_other_type {
    int a;
    std::string s;
};

There is an ambiguity when speaking about the memory occupied by an object:

is it only the fixed-size contiguous memory corresponding to its type (i.e. an int, some size_t for the string's length and some pointer to the chars in the string, like it would be done in C)?
or is it all the values stored in memory for the object, including values stored in memory allocated somewhere else (e.g. also the bytes required to store the value of our string)?

The object representation corresponds to the first part. For objects that are not trivially copyable, the object representation is not self-sufficient (i.e. in our example, the bytes stored in the string are not necessarily part of the object representation).

The value representation corresponds to the second part (and would include the bytes required to store the value of the string).

In plain words, this means that the address of an object is the address of its representation, but the object representation may contain padding and may not be sufficient to hold all the data that belongs to the object.

pointer to member function of incomplete type

The MSVC compiler uses different sizes for pointers to member functions as an optimization. This optimization violates the Standard. Kudos to Igor Tandetnik for mentioning reinterpret_cast in an MSDN forum post, [expr.reinterpret.cast]p10

A prvalue of type “pointer to member of X of type T1” can be
explicitly converted to a prvalue of a different type “pointer to
member of Y of type T2” if T1 and T2 are both function types
or both object types. The null member pointer value is converted to
the null member pointer value of the destination type. The result of
this conversion is unspecified, except in the following cases:

converting a prvalue of type “pointer to member function” to a different pointer to member function type and back to its original
type yields the original pointer to member value.

So there's a round-trip guarantee, which effectively forces conforming implementations to use the same size for all pointer to member function types.

The MSVC optimization is performed if the /vmb switch is set. For the case of single inheritance, the optimised pointer to member function requires only a void*-sized storage, see The Old New Thing: Pointers to member functions are very strange animals.

If you only forward-declare the type CL and then form a pointer-to-member function, the optimization hopefully is deactivated (I could not find any documentation on that, unfortunately). Otherwise, you might get inconsistent sizes before and after the definition of CL.

By the way, you can get inconsistent sizes for enumerations in VS2010, if you forward-declare them without specifying an underlying type and later explicitly define the underlying type for the definition of the enum. This works only with language extensions activated.
