What Is a Vtable in C++

What is a vtable in C++

V-tables (or virtual tables) are how most C++ implementations do polymorphism. For each class that has virtual methods, there is a table of function pointers to all of its virtual methods. A pointer to this table (the virtual table pointer) exists as a hidden data member in every object of that class. When a virtual method is called, the implementation looks up the object's v-table and calls the appropriate derived class method.

When is a vtable created in C++?

Beyond "vtables are implementation-specific" (which they are), if a vtable is used: there will be unique vtables for each of your classes. Even though B::f and C::f are not declared virtual, because there is a matching signature on a virtual method from a base class (A in your code), B::f and C::f are both implicitly virtual. Because each class has at least one unique virtual method (B::f overrides A::f for B instances and C::f similarly for C instances), you need three vtables.

You generally shouldn't worry about such details. What matters is whether you have virtual dispatch or not. You can bypass virtual dispatch by explicitly qualifying which function to call, but this is generally only useful when implementing a virtual method (such as to call the base's method). Example:

struct B {
    virtual void f() {}
    virtual void g() {}
};

struct D : B {
    virtual void f() { // would be implicitly virtual even if not declared virtual
        B::f();
        // do D-specific stuff
    }
    virtual void g() {}
};

int main() {
    {
        B b; b.g(); b.B::g(); // both call B::g
    }
    {
        D d;
        B& b = d;
        b.g(); // calls D::g
        b.B::g(); // calls B::g

        // b.D::g(); // not allowed: D is not a base of B, so this does not compile
        d.D::g(); // calls D::g

        void (B::*p)() = &B::g;
        (b.*p)(); // calls D::g
        // calls through a pointer to member function always use virtual dispatch
        // (if the pointed-to function is virtual)
    }
    return 0;
}

Some concrete rules that may help; but don't quote me on these, I've likely missed some edge cases:

  • If a class has virtual methods or virtual bases, even if inherited, then instances must have a vtable pointer.
  • If a class declares non-inherited virtual methods (such as when it doesn't have a base class), then it must have its own vtable.
  • If a class has a different set of overriding methods than its first base class, then it must have its own vtable, and cannot reuse the base's. (Destructors commonly require this.)
  • If a class has multiple base classes, with the second or later base having virtual methods:

    • If no earlier bases have virtual methods and the Empty Base Optimization was applied to all earlier bases, then treat this base as the first base class.
    • Otherwise, the class must have its own vtable.
  • If a class has any virtual base classes, it must have its own vtable.

Remember that a vtable is similar to a static data member of a class, and instances have only pointers to these.

Also see the comprehensive article "C++: Under the Hood" (March 1994) by Jan Gray.

Example of reusing a vtable:

struct B {
    virtual void f();
};
struct D : B {
    // does not override B::f
    // does not have other virtuals of its own
    void g(); // still might have its own non-virtuals
    int n; // and data members
};

In particular, notice B's dtor isn't virtual (and this is likely a mistake in real code), but in this example, D instances will point to the same vtable as B instances.

How are virtual functions and vtable implemented?

How are virtual functions implemented at a deep level?

From "Virtual Functions in C++":

Whenever a program has a virtual function declared, a v-table is constructed for the class. The v-table consists of addresses to the virtual functions for classes that contain one or more virtual functions. The object of the class containing the virtual function contains a virtual pointer that points to the base address of the virtual table in memory. Whenever there is a virtual function call, the v-table is used to resolve to the function address. An object of the class that contains one or more virtual functions contains a virtual pointer called the vptr at the very beginning of the object in the memory. Hence the size of the object in this case increases by the size of the pointer. This vptr contains the base address of the virtual table in memory. Note that virtual tables are class specific, i.e., there is only one virtual table for a class irrespective of the number of virtual functions it contains. This virtual table in turn contains the base addresses of one or more virtual functions of the class. At the time when a virtual function is called on an object, the vptr of that object provides the base address of the virtual table for that class in memory. This table is used to resolve the function call as it contains the addresses of all the virtual functions of that class. This is how dynamic binding is resolved during a virtual function call.

Can the vtable be modified or even directly accessed at runtime?

Universally, I believe the answer is "no". You could do some memory mangling to find the vtable but you still wouldn't know what the function signature looks like to call it. Anything that you would want to achieve with this ability (that the language supports) should be possible without access to the vtable directly or modifying it at runtime. Also note, the C++ language spec does not specify that vtables are required - however that is how most compilers implement virtual functions.

Does the vtable exist for all objects, or only those that have at least one virtual function?

I believe the answer here is "it depends on the implementation" since the spec doesn't require vtables in the first place. However, in practice, I believe all modern compilers only create a vtable if a class has at least 1 virtual function. There is a space overhead associated with the vtable and a time overhead associated with calling a virtual function vs a non-virtual function.

Do abstract classes simply have a NULL for the function pointer of at least one entry?

The answer is it is unspecified by the language spec so it depends on the implementation. Calling the pure virtual function results in undefined behavior if it is not defined (which it usually isn't) (ISO/IEC 14882:2003 10.4-2). In practice it does allocate a slot in the vtable for the function but does not assign an address to it. This leaves the vtable incomplete which requires the derived classes to implement the function and complete the vtable. Some implementations do simply place a NULL pointer in the vtable entry; other implementations place a pointer to a dummy method that does something similar to an assertion.

Note that an abstract class can define an implementation for a pure virtual function, but that function can only be called with a qualified-id syntax (ie., fully specifying the class in the method name, similar to calling a base class method from a derived class). This is done to provide an easy to use default implementation, while still requiring that a derived class provide an override.

Does having a single virtual function slow down the whole class or only the call to the function that is virtual?

This is getting to the edge of my knowledge, so someone please help me out here if I'm wrong!

I believe that only the functions that are virtual in the class experience the time performance hit related to calling a virtual function vs. a non-virtual function. The space overhead for the class is there either way. Note that if there is a vtable, there is only 1 per class, not one per object.

Does the speed get affected if the virtual function is actually overridden or not, or does this have no effect so long as it is virtual?

I don't believe the execution time of a virtual function that is overridden decreases compared to calling the base virtual function. However, there is an additional space overhead for the class associated with defining another vtable for the derived class vs the base class.

Additional Resources:

http://www.codersource.net/published/view/325/virtual_functions_in.aspx (via the Wayback Machine)

http://en.wikipedia.org/wiki/Virtual_table

http://www.codesourcery.com/public/cxx-abi/abi.html#vtable

Object oriented C: Building vtables

You can cast the pointer a to the concrete struct type to reach that type's members, for example ((Dog *)a)->weight = 42.0; or ((Cat *)a)->numberOfLives = 9;

Implementing basic vtable in C

A basic vtable is nothing more than an ordinary struct containing function pointers, which can be shared between object instances. There are two basic ways one can implement them. One is to make the vtable pointer an ordinary struct member (this is how it works in C++ under the hood):

#include <stdio.h>
#include <stdlib.h>

typedef struct Person Person;
typedef struct Person_VTable Person_VTable;

struct Person {
    int id;
    char *name;
    const Person_VTable *vtable; /* shared by all Person instances */
};

struct Person_VTable {
    void (*print)(Person *self);
};

void print_name(Person *person) {
    printf("Hello %s\n", person->name);
}

static const Person_VTable vtable_Person = {
    .print = print_name
};

Person *init_person(void) {
    Person *person = malloc(sizeof(Person));
    if (person == NULL)
        return NULL;
    person->id = 0;
    person->name = NULL;
    person->vtable = &vtable_Person;
    return person;
}

int main(void) {
    Person *p = init_person();
    if (p == NULL)
        return 1;
    p->name = "Greg";
    p->vtable->print(p);
    free(p);
    return 0;
}

Another is to use fat pointers (this is how it’s implemented in Rust):

#include <stdio.h>
#include <stdlib.h>

typedef struct Person Person;
typedef struct Person_VTable Person_VTable;

/* The fat pointer: object pointer and vtable pointer travel together. */
typedef struct Person_Ptr {
    Person *self;
    const Person_VTable *vtable;
} Person_Ptr;

/* The object itself no longer needs a vtable member. */
struct Person {
    int id;
    char *name;
};

struct Person_VTable {
    void (*print)(Person_Ptr self);
};

void print_name(Person_Ptr person) {
    printf("Hello %s\n", person.self->name);
}

static const Person_VTable vtable_Person = {
    .print = print_name
};

Person_Ptr init_person(void) {
    Person_Ptr person;
    person.self = malloc(sizeof(Person));
    person.vtable = &vtable_Person;
    return person;
}

int main(void) {
    Person_Ptr p = init_person();
    if (p.self == NULL)
        return 1;
    p.self->name = "Greg";
    p.vtable->print(p);
    free(p.self);
    return 0;
}

In C, the preferred way is the former, but that’s mostly for syntax reasons: passing structs between functions by value doesn’t have a widely-agreed-upon ABI, while passing two separate pointers is rather unwieldy syntactically. The other method is useful when attaching a vtable to an object whose memory layout is not under your control.

In essence, the only advantages of vtables over ordinary function pointer members are that they conserve memory (each instance of the struct only needs to carry one vtable pointer) and protect against memory corruption (the vtables themselves can reside in read-only memory).

What is nm telling me regarding the vtable in this program compiled with gcc on Linux?

The vtable stores the addresses of the implemented virtual methods. If all methods of a class are pure-virtual and none are implemented, then no vtable needs to be generated yet*, because there is no way to instantiate such a class by itself (in debug mode the vtable may still be generated, pointing everything to a trap function).

When you compile Derived.cpp with Base.h having a non-pure virtual function, it references the vtable of Base.

When you subsequently change Base.h to have only pure virtual functions and rebuild Base.o, the vtable from Base.o disappears. At this point you need to rebuild Derived.o, otherwise it will keep on referencing the non-existing vtable.

When you rebuild Derived.o, the compiler sees that Base is a pure-virtual class and generates a vtable for it in Derived.o itself because it knows there isn't one in Base.o.

Another potential issue arises after reordering virtual functions in the base class. Then derived classes, if not rebuilt, can end up invoking the wrong functions in their parent class.

That's why it is important to get the dependency chain right to make sure dependent object files are rebuilt when necessary.

Derived.o: Derived.cpp Derived.h Base.h

* the gory details are compiler-dependent but the way GCC does it is: since it's impossible to instantiate a pure-virtual class, the vtable generation is actually postponed until there is at least one implementation, because only then it is actually possible to have an instance of the class. So the vtable is generated with every derived implementation and exported as a "weak" object (type V) to allow for potential duplicates to be merged at link time.

Why do we need a virtual table?

Without virtual tables you wouldn't be able to make runtime polymorphism work since all references to functions would be bound at compile time. A simple example:

struct Base {
    virtual void f() { }
};

struct Derived : public Base {
    virtual void f() { }
};

void callF( Base *o ) {
    o->f();
}

int main() {
    Derived d;
    callF( &d );
}

Inside the function callF, you only know that o points to a Base object. However, at runtime, the code should call Derived::f (since Base::f is virtual). At compile time, the compiler can't know which code is going to be executed by the o->f() call since it doesn't know what o points to.

Hence, you need something called a "virtual table" which is basically a table of function pointers. Each object that has virtual functions has a "v-table pointer" that points to the virtual table for objects of its type.

The code in the callF function above then only needs to look up the entry for Base::f in the virtual table (which it finds based on the v-table pointer in the object), and then it calls the function that table entry points to. That might be Base::f but it is also possible that it points to something else - Derived::f, for instance.

This means that due to the virtual table, you're able to have polymorphism at runtime because the actual function being called is determined at runtime by looking up a function pointer in the virtual table and then calling the function via that pointer - instead of calling the function directly (as is the case for non-virtual functions).

How does the compiler know which entry in vtable corresponds to a virtual function?

I'll modify your example a little so it shows more interesting aspects of object orientation.

Suppose we have the following:

#include <iostream>

struct Animal
{
    int age;
    Animal(int a) : age {a} {}
    virtual int setAge(int);
    virtual void sayHello() const;
};

int
Animal::setAge(int a)
{
    int prev = this->age;
    this->age = a;
    return prev;
}

void
Animal::sayHello() const
{
    std::cout << "Hello, I'm an " << this->age << " year old animal.\n";
}

struct Tiger : Animal
{
    int stripes;
    Tiger(int a, int s) : Animal {a}, stripes {s} {}
    virtual void sayHello() const override;
    virtual void doTigerishThing();
};

void
Tiger::sayHello() const
{
    std::cout << "Hello, I'm a " << this->age << " year old tiger with "
              << this->stripes << " stripes.\n";
}

void
Tiger::doTigerishThing()
{
    this->stripes += 1;
}

int
main()
{
    Tiger * tp = new Tiger {7, 42};
    Animal * ap = tp;
    tp->sayHello();         // call overridden function via derived pointer
    tp->doTigerishThing();  // call child function via derived pointer
    tp->setAge(8);          // call parent function via derived pointer
    ap->sayHello();         // call overridden function via base pointer
}

I'm ignoring the good advice that classes with virtual function members should have a virtual destructor for the purpose of this example. I'm going to leak the object anyway.

Let's see how we can translate this example into good old C where there are no member functions, let alone virtual ones. All of the following code is C, not C++.

The struct animal is simple:

struct animal
{
    const void * vptr;
    int age;
};

In addition to the age member, we have added a vptr that will be the pointer to the vtable. I'm using a void pointer for this because we'll have to do ugly casts anyway and using void * reduces the ugliness a little.

Next, we can implement the member functions.

static int
animal_set_age(void * p, int a)
{
    struct animal * this = (struct animal *) p;
    int prev = this->age;
    this->age = a;
    return prev;
}

Note the additional 0-th argument: the this pointer that is passed implicitly in C++. Again, I'm using a void * pointer as it will simplify things later on. Note that inside any member function, we always know the type of the this pointer statically so the cast is no problem. (And at the machine level, it doesn't do anything at all anyways.)

The sayHello member is defined likewise except that the this pointer is const qualified this time.

static void
animal_say_hello(const void * p)
{
    const struct animal * this = (const struct animal *) p;
    printf("Hello, I'm an %d year old animal.\n", this->age);
}

Time for the animal vtable. First we have to give it a type, which is straight-forward.

struct animal_vtable_type
{
    int (*setAge)(void *, int);
    void (*sayHello)(const void *);
};

Then we create a single instance of the vtable and set it up with the correct member functions. If Animal had a pure virtual member, the corresponding entry would have a NULL value and had better not be dereferenced.

static const struct animal_vtable_type animal_vtable = {
    .setAge = animal_set_age,
    .sayHello = animal_say_hello,
};

Note that animal_set_age and animal_say_hello were declared static. That's okay because they will never be referred to by name but only via the vtable (and the vtable only via the vptr so it can be static too).

We can now implement the constructor for Animal

void
animal_ctor(void * p, int age)
{
    struct animal * this = (struct animal *) p;
    this->vptr = &animal_vtable;
    this->age = age;
}

…and the corresponding operator new:

void *
animal_new(int age)
{
    void * p = malloc(sizeof(struct animal));
    if (p != NULL)
        animal_ctor(p, age);
    return p;
}

About the only thing interesting is the line where the vptr is set in the constructor.

Let's move on to tigers.

Tiger inherits from Animal so it gets a struct animal sub-object. I'm doing this by placing a struct animal as the first member. It is essential that this is the first member because it means that the first member of that object, the vptr, has the same address as our object. We'll need this later when we do some tricky casting.

struct tiger
{
    struct animal base;
    int stripes;
};

We could also have simply copied the members of struct animal lexically at the beginning of the definition of struct tiger but that might be harder to maintain. A compiler doesn't care about such stylistic issues.

We already know how to implement the member functions for tigers.

void
tiger_say_hello(const void * p)
{
    const struct tiger * this = (const struct tiger *) p;
    printf("Hello, I'm a %d year old tiger with %d stripes.\n",
           this->base.age, this->stripes);
}

void
tiger_do_tigerish_thing(void * p)
{
    struct tiger * this = (struct tiger *) p;
    this->stripes += 1;
}

Note that we are casting the this pointer to struct tiger this time. If a tiger function is called, the this pointer had better point to a tiger, even if we are called through a base pointer.

Next to the vtable:

struct tiger_vtable_type
{
    int (*setAge)(void *, int);
    void (*sayHello)(const void *);
    void (*doTigerishThing)(void *);
};

Note that the first two members are exactly the same as for animal_vtable_type. This is essential and basically the direct answer to your question: a virtual function keeps the same slot position in the vtable of every derived class. It would have been more explicit, perhaps, if I had placed a struct animal_vtable_type as the first member. I want to emphasize that the object layout would have been exactly the same except that we couldn't play our nasty casting tricks in this case. Again, these are aspects of the C language, not present at machine level so a compiler is not bothered by this.

Create a vtable instance:

static const struct tiger_vtable_type tiger_vtable = {
    .setAge = animal_set_age,
    .sayHello = tiger_say_hello,
    .doTigerishThing = tiger_do_tigerish_thing,
};

And implement the constructor:

void
tiger_ctor(void * p, int age, int stripes)
{
    struct tiger * this = (struct tiger *) p;
    animal_ctor(this, age);
    this->base.vptr = &tiger_vtable;
    this->stripes = stripes;
}

The first thing the tiger constructor does is call the animal constructor. Remember how the animal constructor sets the vptr to &animal_vtable? This is the reason why calling virtual member functions from a base class constructor often surprises people. Only after the base class constructor has run do we re-assign the vptr to the derived type and then do our own initialization.

operator new is just boilerplate.

void *
tiger_new(int age, int stripes)
{
    void * p = malloc(sizeof(struct tiger));
    if (p != NULL)
        tiger_ctor(p, age, stripes);
    return p;
}

We're done. But how do we call a virtual member function? For this, I'll define a helper macro.

#define INVOKE_VIRTUAL_ARGS(STYPE, THIS, FUNC, ...)                     \
(*((const struct STYPE ## _vtable_type * *) (THIS)))->FUNC( THIS, __VA_ARGS__ )

Now, this is ugly. It takes the static type STYPE, a this pointer THIS, the name of the member function FUNC, and any additional arguments to pass to the function.

Then, it constructs the type name of the vtable from the static type. (The ## is the preprocessor's token pasting operator. For example, if STYPE is animal, then STYPE ## _vtable_type will expand to animal_vtable_type.)

Next, the THIS pointer is cast to a pointer to a pointer to the just derived vtable type. This works because we've made sure to put the vptr as the first member in every object so it has the same address. This is essential.

Once this is done, we can dereference the pointer (to get the actual vptr) and then ask for its FUNC member and finally call it. (__VA_ARGS__ expands to the additional variadic macro arguments.) Note that we also pass the THIS pointer as the 0-th argument to the member function.

Now, the actual truth is that I had to define an almost identical macro again for functions that take no arguments, because an empty variadic argument list would leave a dangling comma in the expanded call. So shall it be.

#define INVOKE_VIRTUAL(STYPE, THIS, FUNC)                               \
(*((const struct STYPE ## _vtable_type * *) (THIS)))->FUNC( THIS )

And it works:

#include <stdio.h>
#include <stdlib.h>

/* Insert all the code from above here... */

int
main(void)
{
    struct tiger * tp = tiger_new(7, 42);
    struct animal * ap = (struct animal *) tp;
    INVOKE_VIRTUAL(tiger, tp, sayHello);
    INVOKE_VIRTUAL(tiger, tp, doTigerishThing);
    INVOKE_VIRTUAL_ARGS(tiger, tp, setAge, 8);
    INVOKE_VIRTUAL(animal, ap, sayHello);
    return 0;
}

You might be wondering what happens in the

INVOKE_VIRTUAL_ARGS(tiger, tp, setAge, 8);

call. Here we invoke the non-overridden setAge member of Animal on a Tiger object referred to via a struct tiger pointer. This pointer is first implicitly converted to a void pointer and as such passed as the this pointer to animal_set_age. That function then casts it to a struct animal pointer. Is this correct? It is, because we were careful to put the struct animal as the very first member in struct tiger so the address of the struct tiger object is the same as the address for the struct animal sub-object. It's the same trick (only one level less) we were playing with the vptr.


