Pointers to Virtual Member Functions. How Does It Work

Pointers to virtual member functions. How does it work?

Here is way too much information about member function pointers. There's some stuff about virtual functions under "The Well-Behaved Compilers", although IIRC when I read the article I was skimming that part, since the article is actually about implementing delegates in C++.

http://www.codeproject.com/KB/cpp/FastDelegate.aspx

The short answer is that it depends on the compiler, but one possibility is that the member function pointer is implemented as a struct containing a pointer to a "thunk" function which makes the virtual call.

How do pointers to member functions work?

This of course depends on the compiler and the target architecture, and there is more than one way to do it. But I'll describe how it works on the system I use most, g++ for Linux x86_64.

g++ follows the Itanium C++ ABI, which specifies one way that many C++ features, including virtual functions, can be implemented behind the scenes on most architectures.

The ABI says this about pointers to member functions, in section 2.3:

A pointer to member function is a pair as follows:

   ptr:

For a non-virtual function, this field is a simple function pointer. ... For a virtual function, it is 1 plus the virtual table offset (in bytes) of the function, represented as a ptrdiff_t. The value zero represents a NULL pointer, independent of the adjustment field value below.

   adj:

The required adjustment to this, represented as a ptrdiff_t.

It has the size, data size, and alignment of a class containing those two members, in that order.

The +1 to ptr for a virtual function helps detect whether or not the function is virtual, since for most platforms all function pointer values and vtable offsets are even. It also makes sure a null member function pointer has a distinct value from any valid member function pointer.

The vtable / vptr setup for your class A will work something like this C code:

struct A;

struct A__virt_funcs {
    int (*func2)(struct A*, int);
};

struct A__vtable {
    ptrdiff_t offset_to_top;
    const std__typeinfo* typeinfo;
    struct A__virt_funcs funcs;
};

struct A {
    const struct A__virt_funcs* vptr;
};

int A__func1(struct A* self, int v) {
    std__operator__ltlt(&std__cout, "fun1");
    return v;
}

int A__func2(struct A* self, int v) {
    std__operator__ltlt(&std__cout, "fun2");
    return v;
}

extern const std__typeinfo A__typeinfo;

const struct A__vtable vt_for_A = { 0, &A__typeinfo, { &A__func2 } };

void A__initialize(struct A* a) {
    a->vptr = &vt_for_A.funcs;
}

(Yes, a real name mangling scheme would need to do something with function parameter types to allow for overloading, and more things since the operator<< involved is actually a function template specialization. But that's beside the point here.)

Now let's look at the assembly I get for your main() (with options -O0 -fno-stack-protector). My comments are added.

Dump of assembler code for function main:
// Standard stack adjustment for function setup.
0x00000000004007e6 <+0>: push %rbp
0x00000000004007e7 <+1>: mov %rsp,%rbp
0x00000000004007ea <+4>: push %rbx
0x00000000004007eb <+5>: sub $0x38,%rsp
// Put argc in the stack at %rbp-0x34.
0x00000000004007ef <+9>: mov %edi,-0x34(%rbp)
// Put argv in the stack at %rbp-0x40.
0x00000000004007f2 <+12>: mov %rsi,-0x40(%rbp)
// Construct "a" on the stack at %rbp-0x20.
// 0x4009c0 is &vt_for_A.funcs.
0x00000000004007f6 <+16>: mov $0x4009c0,%esi
0x00000000004007fb <+21>: mov %rsi,-0x20(%rbp)
// Check if argc is more than 2.
// In both cases, "pf" will be on the stack at %rbp-0x30.
0x00000000004007ff <+25>: cmpl $0x2,-0x34(%rbp)
0x0000000000400803 <+29>: jle 0x400819 <main+51>
// if (argc > 2) {
// Initialize pf to { &A__func1, 0 }.
0x0000000000400805 <+31>: mov $0x4008ce,%ecx
0x000000000040080a <+36>: mov $0x0,%ebx
0x000000000040080f <+41>: mov %rcx,-0x30(%rbp)
0x0000000000400813 <+45>: mov %rbx,-0x28(%rbp)
0x0000000000400817 <+49>: jmp 0x40082b <main+69>
// } else { [argc <= 2]
// Initialize pf to { 1, 0 }.
0x0000000000400819 <+51>: mov $0x1,%eax
0x000000000040081e <+56>: mov $0x0,%edx
0x0000000000400823 <+61>: mov %rax,-0x30(%rbp)
0x0000000000400827 <+65>: mov %rdx,-0x28(%rbp)
// }
// Test whether pf.ptr is even or odd:
0x000000000040082b <+69>: mov -0x30(%rbp),%rax
0x000000000040082f <+73>: and $0x1,%eax
0x0000000000400832 <+76>: test %rax,%rax
0x0000000000400835 <+79>: jne 0x40083d <main+87>
// int (*funcaddr)(A*, int); [will be in %rax]
// if (is_even(pf.ptr)) {
// Just do:
// funcaddr = pf.ptr;
0x0000000000400837 <+81>: mov -0x30(%rbp),%rax
0x000000000040083b <+85>: jmp 0x40085c <main+118>
// } else { [is_odd(pf.ptr)]
// Compute A* a2 = (A*)((char*)&a + pf.adj); [in %rax]
0x000000000040083d <+87>: mov -0x28(%rbp),%rax
0x0000000000400841 <+91>: mov %rax,%rdx
0x0000000000400844 <+94>: lea -0x20(%rbp),%rax
0x0000000000400848 <+98>: add %rdx,%rax
// Compute funcaddr =
// *(int(**)(A*,int)) ((char*)(a2->vptr) + (pf.ptr-1));
0x000000000040084b <+101>: mov (%rax),%rax
0x000000000040084e <+104>: mov -0x30(%rbp),%rdx
0x0000000000400852 <+108>: sub $0x1,%rdx
0x0000000000400856 <+112>: add %rdx,%rax
0x0000000000400859 <+115>: mov (%rax),%rax
// }
// Compute A* a3 = (A*)((char*)&a + pf.adj); [in %rcx]
0x000000000040085c <+118>: mov -0x28(%rbp),%rdx
0x0000000000400860 <+122>: mov %rdx,%rcx
0x0000000000400863 <+125>: lea -0x20(%rbp),%rdx
0x0000000000400867 <+129>: add %rdx,%rcx
// Call int r = (*funcaddr)(a3, argc);
0x000000000040086a <+132>: mov -0x34(%rbp),%edx
0x000000000040086d <+135>: mov %edx,%esi
0x000000000040086f <+137>: mov %rcx,%rdi
0x0000000000400872 <+140>: callq *%rax
// Standard stack cleanup for function exit.
0x0000000000400874 <+142>: add $0x38,%rsp
0x0000000000400878 <+146>: pop %rbx
0x0000000000400879 <+147>: pop %rbp
// Return r.
0x000000000040087a <+148>: retq
End of assembler dump.

But then what's the deal with the member function pointer's adj value? The assembly added it to the address of a before doing the vtable lookup and also before calling the function, whether the function was virtual or not. But both cases in main set it to zero, so we haven't really seen it in action.

The adj value comes in when we have multiple inheritance. So now suppose we have:

class B
{
public:
virtual void func3() {}
int n;
};

class C : public B, public A
{
public:
int func4(int v) { return v; }
int func2(int v) override { return v; }
};

The layout of an object of type C contains a B subobject (which contains another vptr and an int) and then an A subobject. So the address of the A contained in a C is not the same as the address of the C itself.

As you might be aware, any time code implicitly or explicitly converts a (non-null) C* pointer to an A* pointer, the C++ compiler accounts for this difference by adding the correct offset to the address value. C++ also allows converting from a pointer to member function of A to a pointer to member function of C (since any member of A is also a member of C), and when that happens (for a non-null member function pointer), a similar offset adjustment needs to be made. So if we have:

int (A::*pf1)(int) = &A::func1;
int (C::*pf2)(int) = pf1;

the values within the member function pointers under the hood would be pf1 = { &A__func1, 0 }; and pf2 = { &A__func1, offset_A_in_C };.

And then if we have

C c;
int n = (c.*pf2)(3);

the compiler will implement the call to the member function pointer by adding the offset pf2.adj to the address &c to find the implicit "this" parameter, which is good because then it will be a valid A* value as A__func1 expects.

The same thing goes for a virtual function call, except that as the disassembly dump showed, the offset is needed both to find the implicit "this" parameter and to find the vptr which contains the actual function code address. There's an added twist to the virtual case, but it's one which is needed for both ordinary virtual calls and calls using a pointer to member function: The virtual function func2 will be called with an A* "this" parameter since that's where the original overridden declaration is, and the compiler won't in general be able to know if the "this" argument is actually of any other type. But the definition of override C::func2 expects a C* "this" parameter. So when the most derived type is C, the vptr within the A subobject will point at a vtable which has an entry pointing not at the code for C::func2 itself, but at a tiny "thunk" function, which does nothing but subtract offset_A_in_C from the "this" parameter and then pass control to the actual C::func2.

member function pointers to virtual functions

A non-virtual class method is, basically, an ordinary function, so a pointer to a non-virtual class method is functionally equivalent to an ordinary function pointer, the function's address.

Every non-static class method, whether virtual or not, receives an internal pointer. You know it as "this". This is, typically, an additional, hidden function parameter.

Every class with virtual functions has a hidden internal pointer as one of its members. The compiler generates it, along with the code that initializes it whenever an instance of the class is created. This pointer points to compiler-generated metadata that, among other things, identifies the instantiated class and records which overrides of the virtual functions actually apply to that instance.

A pointer to a virtual class method is the address of a function that digs into this: it uses the object's virtual function dispatch metadata to find the actual, instantiated class, looks up the appropriate override of the virtual function for this object, and then (after a little more bookkeeping) jumps to that real function.

So the address of a virtual function is, typically, also an address of a function, except that it's not any specific virtual function, but rather a compiler-generated function that figures out what "real" object it's being invoked for and its appropriate overridden virtual function.

This is the capsule summary of a typical compiler implementation. Some details have been omitted. There are minor variations that differ from compiler to compiler.

Pointer-to-member-function performs virtual dispatch?

Refer to [expr.call], specifically:

[If the selected function is virtual], its final overrider in the dynamic type of the object expression is called; such a call is referred to as a virtual function call

Whether you call the function through a pointer or by class member access makes no difference; the actual function called for a virtual function ultimately depends on the actual type of the object it is being called on.

A few (non-normative) notes in the standard under [class.virtual] say much the same thing:

[Note 3: The interpretation of the call of a virtual function depends on the type of the object for which it is called (the dynamic type)

[Note 4: [...] a virtual function call relies on a specific object for determining which function to invoke.


If you would like to know how the virtual function dispatch takes place, you will need to seek out a specific implementation, because the how is not standardized.

(I really enjoyed the article A basic glance at the virtual table, which shows one possible way you could implement it using C)

Pointers to virtual member functions

Example:

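#include <iostream>
#include <functional>

class BASE
{
public:
    virtual void test_f() { std::cout << "BASE::test_f\n"; }
};

class DERIVED : public BASE
{
public:
    void test_f() override { std::cout << "DERIVED::test_f\n"; }
};

int main()
{
    // Both calls print DERIVED::test_f: a call through a pointer to a
    // virtual member function still dispatches on the dynamic type.
    void (BASE::*p_base)() = &BASE::test_f;
    DERIVED a;
    (a.*p_base)();
    auto f = std::mem_fn(&BASE::test_f);  // std::mem_fun was removed in C++17
    f(&a);

    // Both calls print BASE::test_f: a qualified name suppresses virtual dispatch.
    a.BASE::test_f();
    auto callLater = [&a]() { a.BASE::test_f(); };
    callLater();
}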

Pointer to a base virtual member function

You might define a non-virtual function real_g and have g call it, so code:

struct Base
{
    void real_g() const {
        std::cout << "Base" << std::endl;
    }
    virtual void g() const { real_g(); }
};

Then later in main

std::mem_fn(&Base::real_g)(d);

See the wiki page on virtual method tables, this C++ reference, and the C++ standard (n3337 or later). Also read a good C++ programming book and the documentation of your C++ compiler, e.g. GCC.

See also this answer (a naive explanation of what vtables are, in simple cases).


