How Are Virtual Functions and Vtable Implemented

How are virtual functions and vtable implemented?

How are virtual functions implemented at a deep level?

From "Virtual Functions in C++":

Whenever a program has a virtual function declared, a v - table is constructed for the class. The v-table consists of addresses to the virtual functions for classes that contain one or more virtual functions. The object of the class containing the virtual function contains a virtual pointer that points to the base address of the virtual table in memory. Whenever there is a virtual function call, the v-table is used to resolve to the function address. An object of the class that contains one or more virtual functions contains a virtual pointer called the vptr at the very beginning of the object in the memory. Hence the size of the object in this case increases by the size of the pointer. This vptr contains the base address of the virtual table in memory. Note that virtual tables are class specific, i.e., there is only one virtual table for a class irrespective of the number of virtual functions it contains. This virtual table in turn contains the base addresses of one or more virtual functions of the class. At the time when a virtual function is called on an object, the vptr of that object provides the base address of the virtual table for that class in memory. This table is used to resolve the function call as it contains the addresses of all the virtual functions of that class. This is how dynamic binding is resolved during a virtual function call.

Can the vtable be modified or even directly accessed at runtime?

Universally, I believe the answer is "no". You could do some memory mangling to find the vtable but you still wouldn't know what the function signature looks like to call it. Anything that you would want to achieve with this ability (that the language supports) should be possible without access to the vtable directly or modifying it at runtime. Also note, the C++ language spec does not specify that vtables are required - however that is how most compilers implement virtual functions.

Does the vtable exist for all objects, or only those that have at least one virtual function?

I believe the answer here is "it depends on the implementation" since the spec doesn't require vtables in the first place. However, in practice, I believe all modern compilers only create a vtable if a class has at least 1 virtual function. There is a space overhead associated with the vtable and a time overhead associated with calling a virtual function vs a non-virtual function.

Do abstract classes simply have a NULL for the function pointer of at least one entry?

The answer is it is unspecified by the language spec so it depends on the implementation. Calling the pure virtual function results in undefined behavior if it is not defined (which it usually isn't) (ISO/IEC 14882:2003 10.4-2). In practice it does allocate a slot in the vtable for the function but does not assign an address to it. This leaves the vtable incomplete which requires the derived classes to implement the function and complete the vtable. Some implementations do simply place a NULL pointer in the vtable entry; other implementations place a pointer to a dummy method that does something similar to an assertion.

Note that an abstract class can define an implementation for a pure virtual function, but that function can only be called with a qualified-id syntax (ie., fully specifying the class in the method name, similar to calling a base class method from a derived class). This is done to provide an easy to use default implementation, while still requiring that a derived class provide an override.

Does having a single virtual function slow down the whole class or only the call to the function that is virtual?

This is getting to the edge of my knowledge, so someone please help me out here if I'm wrong!

I believe that only the functions that are virtual in the class experience the time performance hit related to calling a virtual function vs. a non-virtual function. The space overhead for the class is there either way. Note that if there is a vtable, there is only 1 per class, not one per object.

Does the speed get affected if the virtual function is actually overridden or not, or does this have no effect so long as it is virtual?

I don't believe the execution time of a virtual function that is overridden decreases compared to calling the base virtual function. However, there is an additional space overhead for the class associated with defining another vtable for the derived class vs the base class.

Additional Resources:

http://www.codersource.net/published/view/325/virtual_functions_in.aspx (via way back machine)

http://en.wikipedia.org/wiki/Virtual_table

http://www.codesourcery.com/public/cxx-abi/abi.html#vtable

How is virtual function table generated in subclass

Simple answer is: it depends on the compiler. The standard does not specify how this should be implemented, only how it should behave.

In practice, implementations tend to simply have a derived vtable which begins with the same structure as the base vtable, than appends the derived class methods on the end (that is to say new methods in the derived class, not overrides).

The vtable pointer just points the the beginning of the whole table. If the object is being accessed through a base pointer type, then no one should ever look beyond the end of the base class methods.

If the pointer is of the derived type, then the same pointer will allow access further down the table to virtual methods declared in the derived class.

ADDENDUM: Multiple Inheritance

Multiple inheritance follows the same basic concepts, but quickly becomes complex for obvious reasons. But there's one important feature to bear in mind.

A multiply derived class has one vtable pointer for each of its base classes, pointing to different vtables or different locations in the same vtable (implementation dependent).

But it's important to remember it's one per immediate base class per object.

Thus if you had a multiply derived class with one int of data and three immediate base classes, the size of each object would actually be 16 bytes (on a 32-bit system; more on 64-bit). 4 for the int and four each for each of the vtable pointers. Plus, of course, the sizes of each of the base classes themselves.

That means that in C++, interfaces are not cheap. (Obviously there are no true interfaces in C++, but a base class with no data and only pure virtual methods emulates it.) Each such interface costs the size of a pointer per object.

In languages like C# and Java, where interfaces are part of the language, there is a slightly different mechanism whereby all interfaces are routed through a single vtable pointer. This is slightly slower, but means only one vtable pointer per object, however many interfaces are implemented.

I'd still follow an interface style approach in C++ for design reasons, but always be aware of this additional overhead.

(And none of this even touches on virtual inheritance.)

How does a virtual table handle pure virtual functions

The c++ standard doesn't specify the implementation of virtual methods.

Generally it's implemented as something like an array of function pointers, pure virtual functions are often pointers to some special function which throws an error.

You have to override pure virtual functions as otherwise when something tries to call those functions what would happen? If you don't want to have to override a particular function don't make it pure virtual in the base class.

For example you could emulate virtual functions with code like this:

#include <iostream>
#include <string>
#include <vector>

class A
{
public:
    A() : vtable(2)
    {
        vtable[0] = &A::aimpl;
        // B is pure virtual
        vtable[1] = &A::pureVirtualFunction;
    }

    void a()
    {
        ((*this).*(vtable[0]))();
    }

    void b()
    {
        ((*this).*(vtable[1]))();
    }

protected:
    std::vector<void (A::*)()> vtable;

private:
    void aimpl()
    {
        std::cout << "A::a\n";
    }

    void pureVirtualFunction()
    {
        throw std::runtime_error("function is pure virtual"); 
    }
};

class B : public A
{
public:
    B()
    {
        // Note: undefined behaviour!!! Don't do this in real code
        vtable[1] = reinterpret_cast<void (A::*)()>(&B::bimpl);
    }

private:
    void bimpl()
    {
        std::cout << "B::b\n";
    }
};

int main()
{
    A a;
    a.a();
    try
    {
        a.b();
    }
    catch (std::exception& ex)
    {
        std::cout << ex.what() << "\n";
    }
    B b;
    b.a();
    b.b();
    return 0;
}

The real implementation is more complex with derived classes being able to add to the vtable, merging vtables from multiple inheritance etc.

Why does virtual inheritance need a vtable even if no virtual functions are involved?

The complete class inheritance hierarchy is already known in compile time.

True enough; so if the compiler knows the type of a most derived object, then it knows the offset of every subobject within that object. For such a purpose, a vtable is not needed.

For example, if B and C both virtually derive from A, and D derives from both B and C, then in the following code:

D d;
A* a = &d;

the conversion from D* to A* is, at most, adding a static offset to the address.

However, now consider this situation:

A* f(B* b) { return b; }
A* g(C* c) { return c; }

Here, f must be able to accept a pointer to any B object, including a B object that may be a subobject of a D object or of some other most derived class object. When compiling f, the compiler doesn't know the full set of derived classes of B.

If the B object is a most derived object, then the A subobject will be located at a certain offset. But what if the B object is part of a D object? The D object only contains one A object and it can't be located at its usual offsets from both the B and C subobjects. So the compiler has to pick a location for the A subobject of D, and then it has to provide a mechanism so that some code with a B* or C* can find out where the A subobject is. This depends solely on the inheritance hierarchy of the most derived type---so a vptr/vtable is an appropriate mechanism.

How can C++ virtual functions be implemented except vtable?

Another known mechanism is type dispatch functions. Effectively, you replace the vtable pointer by a typeid (small enum). The (dyanmic) linker collects all overrides of a given virtual function, and wraps them in one big switch statement on the typeid field.

The theoretical justifcation is that this replaces an indirect jump (non-predicatble) by lots of predicatable jumps. With some smarts in choosing enum values, the switch statement can be fairly effective too (i.e. better than lineair)

How could one implement C++ virtual functions in C

Stolen from here.

From the C++ class

class A {
protected:
    int a;
public:
    A() {a = 10;}
    virtual void update() {a++;}
    int access() {update(); return a;}
};

a C code fragment can be derived. The three C++ member functions of class A are rewritten using out-of-line (standalone) code and collected by address into a struct named A_functable. The data members of A and combined with the function table into a C struct named A.

struct A;

typedef struct {
    void (*A)(struct A*);
    void (*update)(struct A*);
    int (*access)(struct A*);
} A_functable;

typedef struct A{
    int a;
    A_functable *vmt;
} A;

void A_A(A *this);
void A_update(A* this);
int A_access(A* this);

A_functable A_vmt = {A_A, A_update, A_access};

void A_A(A *this) {this->vmt = &A_vmt; this->a = 10;}
void A_update(A* this) {this->a++;}
int A_access(A* this) {this->vmt->update(this); return this->a;}

/*
class B: public A {
public:
    void update() {a--;}
};
*/

struct B;

typedef struct {
    void (*B)(struct B*);
    void (*update)(struct B*);
    int (*access)(struct A*);
} B_functable;

typedef struct B {
    A inherited;
} B;

void B_B(B *this);
void B_update(B* this);

B_functable B_vmt = {B_B, B_update, A_access};

void B_B(B *this) {A_A(this); this->inherited.vmt = &B_vmt; }
void B_update(B* this) {this->inherited.a--;}
int B_access(B* this) {this->inherited.vmt->update(this); return this->inherited.a;}

int main() {
    A x;
    B y;
    A_A(&x);
    B_B(&y);
    printf("%d\n", x.vmt->access(&x));
    printf("%d\n", y.inherited.vmt->access(&y));
}

More elaborate than necessary, but it gets the point across.

Why do we need virtual functions in C++?

Without "virtual" you get "early binding". Which implementation of the method is used gets decided at compile time based on the type of the pointer that you call through.

With "virtual" you get "late binding". Which implementation of the method is used gets decided at run time based on the type of the pointed-to object - what it was originally constructed as. This is not necessarily what you'd think based on the type of the pointer that points to that object.

class Base
{
  public:
            void Method1 ()  {  std::cout << "Base::Method1" << std::endl;  }
    virtual void Method2 ()  {  std::cout << "Base::Method2" << std::endl;  }
};

class Derived : public Base
{
  public:
    void Method1 ()  {  std::cout << "Derived::Method1" << std::endl;  }
    void Method2 ()  {  std::cout << "Derived::Method2" << std::endl;  }
};

Base* basePtr = new Derived ();
  //  Note - constructed as Derived, but pointer stored as Base*

basePtr->Method1 ();  //  Prints "Base::Method1"
basePtr->Method2 ();  //  Prints "Derived::Method2"

EDIT - see this question.

Also - this tutorial covers early and late binding in C++.

How Are Virtual Functions and Vtable Implemented