Using Extern Template (C++11)


You should only use extern template to force the compiler to not instantiate a template when you know that it will be instantiated somewhere else. It is used to reduce compile time and object file size.

For example:

// header.h

template<typename T>
void ReallyBigFunction()
{
    // Body
}

// source1.cpp

#include "header.h"
void something1()
{
    ReallyBigFunction<int>();
}

// source2.cpp

#include "header.h"
void something2()
{
    ReallyBigFunction<int>();
}

This will result in the following object files:

source1.o
    void something1()
    void ReallyBigFunction<int>()    // Compiled first time

source2.o
    void something2()
    void ReallyBigFunction<int>()    // Compiled second time

If both files are linked together, one void ReallyBigFunction<int>() will be discarded, resulting in wasted compile time and object file size.

To avoid wasting compile time and object file size, you can declare the instantiation extern. This tells the compiler not to instantiate the function template in that translation unit. Use it if and only if you know the same instantiation is provided somewhere else in the same binary.

Changing source2.cpp to:

// source2.cpp

#include "header.h"
extern template void ReallyBigFunction<int>();
void something2()
{
    ReallyBigFunction<int>();
}

Will result in the following object files:

source1.o
    void something1()
    void ReallyBigFunction<int>() // compiled just one time

source2.o
    void something2()
    // No ReallyBigFunction<int> here because of the extern

When these two object files are linked together, the second one simply uses the symbol from the first. Nothing has to be discarded, and no compile time or object file space is wasted.

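This should only be used within a project. For example, when you use a template specialization like vector<int> in several source files, you should use extern in all but one of them. Strictly speaking, the standard also requires that some translation unit contain an explicit instantiation definition (for the example above, template void ReallyBigFunction<int>(); in source1.cpp); otherwise the program is ill-formed, no diagnostic required. Relying on an implicit instantiation there, as the example does, usually links in practice, but it is not guaranteed, because an optimizing compiler is free to inline the function and emit no out-of-line copy of it.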
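Below is a single-file sketch of that split. A hypothetical counter, g_calls, stands in for the "really big" body so that the calls can be observed. The comments mark which part would live in header.h, source1.cpp and source2.cpp. Within one translation unit an explicit instantiation definition may follow the matching extern declaration, and that is what lets this compile on its own. Unlike the example above, the source1.cpp part here uses an explicit instantiation definition instead of relying on an implicit one.

```cpp
#include <cassert>

// Hypothetical counter so the effect of each call is observable.
int g_calls = 0;

// --- header.h ---
template<typename T>
void ReallyBigFunction()
{
    ++g_calls;  // stand-in for the real (big) body
}

// --- source2.cpp (and every other user except one) ---
extern template void ReallyBigFunction<int>();  // do not instantiate here

void something2()
{
    ReallyBigFunction<int>();  // only a call to the external symbol
}

// --- source1.cpp: the single explicit instantiation definition ---
template void ReallyBigFunction<int>();  // emits ReallyBigFunction<int> once

void something1()
{
    ReallyBigFunction<int>();
}
```

In a real project the extern line would typically sit in a header next to the template, and exactly one .cpp file would carry the `template void ReallyBigFunction<int>();` line.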

This also applies to class templates as well as function templates, and even to member function templates.
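Here is a minimal sketch of the class and member-template cases. Box and half are hypothetical names. The members are defined outside the class so that they are not implicitly inline, which means the declarations really do suppress them. An explicit instantiation of the class covers its ordinary member functions but not its member templates, so the member template specialization needs its own declaration and definition.

```cpp
#include <cassert>

// Hypothetical class template with an out-of-class member function
// and an out-of-class member function template.
template<typename T>
struct Box {
    T value;
    T get() const;
    template<typename U> U as() const;
};

template<typename T>
T Box<T>::get() const { return value; }

template<typename T>
template<typename U>
U Box<T>::as() const { return static_cast<U>(value); }

// In a shared header: explicit instantiation declarations.
extern template struct Box<int>;                      // covers Box<int>::get
extern template double Box<int>::as<double>() const;  // member template, separately

// In exactly one .cpp file: the matching definitions (same file here only
// so the sketch is self-contained; each must follow its declaration).
template struct Box<int>;
template double Box<int>::as<double>() const;

double half(const Box<int>& b) { return b.as<double>() / 2.0; }
```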

Understanding extern templates in C++

The link fails because the constructor and destructor are not defined anywhere; you only declare them. You correctly explicitly instantiate the template in Foo.cpp, but an explicit instantiation can only instantiate member definitions that it can see, so the functions are still not defined.

If you are only going to use Foo<int>, then you can define the constructor and destructor in Foo.cpp, above the explicit instantiation:

template<typename T>
Foo<T>::Foo() { /* ... */ }  // or: Foo<T>::Foo() = default;

and

template<typename T>
Foo<T>::~Foo() { /* ... */ }

Because you explicitly instantiate the template in Foo.cpp, the linker will then find the definitions. Otherwise, you need to provide the definitions in the header file.
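Here is a single-file sketch of that arrangement, with comments marking what would go in Foo.h and Foo.cpp. The question's actual code is not shown here, so the members of Foo (a value() accessor and a v_ field) and the presence of an extern template line in the header are assumptions made for illustration. The important point is that the member definitions come before `template class Foo<int>;`, because an explicit instantiation definition only instantiates members that are already defined at that point.

```cpp
#include <cassert>

// --- Foo.h (hypothetical layout): declarations only ---
template<typename T>
class Foo {
public:
    Foo();
    ~Foo();
    T value() const;
private:
    T v_;
};

// Promise that Foo<int> is instantiated in some other translation unit.
extern template class Foo<int>;

// --- Foo.cpp: the definitions, placed ABOVE the explicit instantiation ---
template<typename T>
Foo<T>::Foo() : v_{} {}

template<typename T>
Foo<T>::~Foo() {}

template<typename T>
T Foo<T>::value() const { return v_; }

// Explicit instantiation definition: emits Foo<int>'s member functions,
// which is possible only because their definitions are visible here.
template class Foo<int>;
```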

Using `extern template` with third-party header-only library

Unfortunately, there’s no way to avoid these instantiations. An explicit instantiation declaration of a class template doesn’t prevent (implicit) instantiation of that template; it merely prevents instantiating its non-inline, non-template member functions (which is often none of them!) because some other translation unit will supply the actual function symbols and object code.

It’s not that seeing the template definition causes instantiation (which specialization would be instantiated?). The reason is that code which requires that the class be complete still needs to know its layout and member function declarations (for overload resolution), and in general there’s no way to know those short of instantiating the class:

template<class T> struct A : T::B {
  typename std::conditional<sizeof(T)<8,long,short>::type first;
  typename T::X second;
  A() noexcept(T::y)=default;  // perhaps deleted
  using T::B::foo;
  void foo(T);
  // and so on…
};

void f() {A<C> a; a.foo(a.first);}  // …maybe?

This “transparency” extends to several other kinds of templated entities as well: if compilation needs the definition of a template, the symbols generated for the linker are irrelevant.
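The sketch below shows this with a hypothetical class template, Lib, standing in for one from a header-only library. After the explicit instantiation declaration, `Lib<int>` is still a complete type whose size can be taken. Its in-class (and therefore inline) member function is still instantiated wherever it is used. According to the note quoted in the next answer, the declaration is only meant to stop an out-of-line copy from being emitted in this translation unit.

```cpp
#include <cassert>

// Hypothetical stand-in for a class template from a header-only library.
template<class T>
struct Lib {
    T a;
    T b;
    T sum() const { return a + b; }  // defined in-class: implicitly inline
};

// Explicit instantiation declaration, as one might add in a project header.
extern template struct Lib<int>;

// sizeof requires a complete type, so this compiles only because the
// class Lib<int> is still implicitly instantiated despite the declaration.
static_assert(sizeof(Lib<int>) >= 2 * sizeof(int), "Lib<int> is complete");

// sum() is inline, so its implicit instantiation is not suppressed either.
int add(int x, int y) { return Lib<int>{x, y}.sum(); }

// The matching explicit instantiation definition (normally in one .cpp file).
template struct Lib<int>;
```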

The good news is that C++20’s modules should help with situations like this: an explicit instantiation definition in a module interface will cause a typical implementation to cache the instantiated class definition with the rest of the module interface data, avoiding both parsing and instantiation in importing translation units. Modules also remove the implicit inline on class members and friends defined in the class (which hasn’t meant much in a long time anyway), increasing the number (or, put differently, the convenience) of functions for which explicit instantiation declarations do prevent implicit instantiation.

Using `extern template` to prevent implicit instantiation of a template class

Why does my code snippet link? What is actually happening here?

Well, there's nothing to link. To see why, one has to consider the effects of the explicit instantiation declaration. From n3337:

[temp.explicit] (emphasis mine)

10 Except for inline functions and class template
specializations, explicit instantiation declarations have the effect
of suppressing the implicit instantiation of the entity to which they
refer. [ Note: The intent is that an inline function that is the
subject of an explicit instantiation declaration will still be
implicitly instantiated when odr-used ([basic.def.odr]) so that the
body can be considered for inlining, but that no out-of-line copy of
the inline function would be generated in the translation unit. — end
note ]

So the implicit instantiation of the class template specialization X<int> is not suppressed. It's also an aggregate, so its initialization occurs inline, and we get nothing to link against. However, if it had any member functions, paragraph 8 would make the explicit instantiation declaration apply to them as well, so their implicit instantiation would be suppressed (unless they are inline):

An explicit instantiation that names a class template specialization
is also an explicit instantiation of the same kind (declaration or
definition) of each of its members (not including members inherited
from base classes) that has not been previously explicitly specialized
in the translation unit containing the explicit instantiation, except
as described below.

So if, instead of an aggregate, you had something akin to this:

template <typename>
struct X {
    X();
};

template <typename T>
X<T>::X() {}     

extern template struct X<int>;

int main()
{
    X<int>{};
}

That would fail to link, as you expect, since it odr-uses a constructor whose definition is never instantiated. The constructor's declaration is instantiated, because the enclosing specialization is instantiated, as mentioned above. But we never get a definition, because the explicit instantiation declaration suppresses it.
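To make that program link, some translation unit must supply the definition that the declaration promises. The sketch below does this with an explicit instantiation definition. It puts everything in one file, which is allowed because the definition follows the declaration. The `value` member and the `make_value` function are hypothetical additions so that the result can be observed.

```cpp
#include <cassert>

template <typename>
struct X {
    X();
    int value;  // hypothetical member so the constructor's effect is visible
};

template <typename T>
X<T>::X() : value(42) {}

// As in the question: suppress implicit instantiation of X<int>'s members.
extern template struct X<int>;

// The fix (normally in exactly one .cpp file): an explicit instantiation
// definition, which emits X<int>::X() so the odr-use below can link.
template struct X<int>;

int make_value() { return X<int>{}.value; }
```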

extern template declaration with alias payload

An explicit instantiation declaration of a specialization of a class template doesn’t prevent instantiating that specialization. After all, you still need to be able to refer to members of the resulting class, which means knowing everything about its layout. What it does do is prevent instantiation of its (non-inline) member functions, so that no code need be emitted for them.

Additionally, an explicit instantiation declaration must precede any source of implicit instantiation to be effective. If you want to name that specialization’s template argument just once, introduce a type alias for it before the explicit instantiation declaration and the definition of Derived.
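Here is a minimal sketch of that ordering. Base, Payload and get are hypothetical names; only Derived comes from the question. The alias names the argument once, the explicit instantiation declaration comes next, and only after that does Derived name `Base<Payload>` as a base class, which is the point where the specialization is implicitly instantiated.

```cpp
#include <cassert>

// Hypothetical base class template; get() is defined out of class so it is
// not implicitly inline and the declaration really suppresses it.
template<typename T>
struct Base {
    T payload;
    T get() const;
};

template<typename T>
T Base<T>::get() const { return payload; }

// Name the template argument exactly once.
using Payload = int;

// Must come before anything that implicitly instantiates Base<Payload>,
// such as using it as a base class below.
extern template struct Base<Payload>;

struct Derived : Base<Payload> {};

// In exactly one .cpp file: the matching explicit instantiation definition.
template struct Base<Payload>;
```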

How does extern template actually generate code?

It is actually quite straightforward. Here is a header file that defines the class template foo<T>:

foo.hpp

#ifndef FOO_HPP
#define FOO_HPP

template<typename T>
struct foo
{
    T const & get() const {
        return _t;
    }
    void set(T const & t) {
        _t = t;
    }

private:
    T _t;
};

#endif

Here is a source file that explicitly instantiates the definition of class foo<int>:

foo_int.cpp

#include "foo.hpp"

// An explicit instantiation definition
template struct foo<int>;

When we compile foo_int.cpp to foo_int.o, that object file will define all
the symbols that accrue from instantiating foo<int>:

$ g++ -Wall -Wextra -pedantic -c foo_int.cpp

$ nm --defined-only foo_int.o
0000000000000000 W _ZN3fooIiE3setERKi
0000000000000000 W _ZNK3fooIiE3getEv

which with de-mangling is:

$ nm -C --defined-only foo_int.o
0000000000000000 W foo<int>::set(int const&)
0000000000000000 W foo<int>::get() const

(Note that symbols are defined weakly
- W - just as they would be as a result of implicit instantiation. Note too that
the compiler saw no need to generate any definitions at all for any of the implicitly defaulted
special member functions.)

Here is a header file that declares an explicit instantiation of foo<int> such
as we just defined in foo_int.o:

foo_int.hpp

#ifndef FOO_INT_HPP
#define FOO_INT_HPP

#include "foo.hpp"

// An explicit instantiation declaration
extern template struct foo<int>;

#endif

Here is a source file that references the explicit instantiation of foo<int>
that we declared in foo_int.hpp:

make_foo_int.cpp

#include "make_foo_int.hpp"

foo<int> make_foo_int(int i)
{
    foo<int> fi;
    fi.set(i);
    return fi;
}

and the associated header file:

make_foo_int.hpp

#ifndef MAKE_FOO_INT_HPP
#define MAKE_FOO_INT_HPP
#include "foo_int.hpp"

foo<int> make_foo_int(int i = 0);

#endif

Note that make_foo_int.cpp is a translation unit of the sort that puzzles
you. It #includes make_foo_int.hpp, which #includes foo_int.hpp,
which #includes foo.hpp - the template definition. And then it "does stuff" with
foo<int>.

When we compile make_foo_int.cpp to make_foo_int.o, that object file
will only contain undefined references to any symbols that accrue from
the instantiation of foo<int>:

$ g++ -Wall -Wextra -pedantic -c make_foo_int.cpp

$ nm -C --defined-only make_foo_int.o
0000000000000000 T make_foo_int(int)

$ nm -C --undefined-only make_foo_int.o
                 U _GLOBAL_OFFSET_TABLE_
                 U __stack_chk_fail
                 U foo<int>::set(int const&)

Does the compiler simply not generate any code involving Foo<int> when compiling this translation unit?

The compiler generates a call to the undefined external function foo<int>::set(int const&). Here is
the assembly:

make_foo_int.s

    .file   "make_foo_int.cpp"
    .text
    .globl  _Z12make_foo_inti
    .type   _Z12make_foo_inti, @function
_Z12make_foo_inti:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movl    %edi, -20(%rbp)
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    leaq    -20(%rbp), %rdx
    leaq    -12(%rbp), %rax
    movq    %rdx, %rsi
    movq    %rax, %rdi
    call    _ZN3fooIiE3setERKi@PLT
    movl    -12(%rbp), %eax
    movq    -8(%rbp), %rcx
    xorq    %fs:40, %rcx
    je  .L3
    call    __stack_chk_fail@PLT
.L3:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size   _Z12make_foo_inti, .-_Z12make_foo_inti
    .ident  "GCC: (Ubuntu 8.2.0-7ubuntu1) 8.2.0"
    .section    .note.GNU-stack,"",@progbits

in which:

call    _ZN3fooIiE3setERKi@PLT

is the call to foo<int>::set(int const&) through the Procedure Linkage Table,
just as it might generate a call to any undefined external function that is to be
resolved at link time.

Now here is a source file for a program that calls make_foo_int and also foo<int>::get:

main.cpp

#include "make_foo_int.hpp"
#include <iostream>

int main()
{
    std::cout << make_foo_int(42).get() << std::endl;
    return 0;
}

If we compile main.cpp, the object file will also contain only undefined references
to symbols that accrue from the instantiation of foo<int>:

$ g++ -Wall -Wextra -pedantic -c main.cpp

$ nm -C --defined-only main.o | grep foo; echo Done
Done

$ nm -C --undefined-only main.o | grep foo; echo Done
                 U make_foo_int(int)
                 U foo<int>::get() const
Done

If we attempt to link a program using just main.o and make_foo_int.o:

$ g++ -o prog main.o make_foo_int.o
/usr/bin/ld: main.o: in function `main':
main.cpp:(.text+0x2c): undefined reference to `foo<int>::get() const'
/usr/bin/ld: make_foo_int.o: in function `make_foo_int(int)':
make_foo_int.cpp:(.text+0x29): undefined reference to `foo<int>::set(int const&)'
collect2: error: ld returned 1 exit status

it fails with undefined references to foo<int>::get() and foo<int>::set(int const&).

If we relink with the addition of the necessary foo_int.o and ask the linker to
report the references and definitions of those symbols:

$ g++ -o prog main.o make_foo_int.o foo_int.o -Wl,-trace-symbol=_ZN3fooIiE3setERKi,-trace-symbol=_ZNK3fooIiE3getEv
/usr/bin/ld: main.o: reference to _ZNK3fooIiE3getEv
/usr/bin/ld: make_foo_int.o: reference to _ZN3fooIiE3setERKi
/usr/bin/ld: foo_int.o: definition of _ZNK3fooIiE3getEv
/usr/bin/ld: foo_int.o: definition of _ZN3fooIiE3setERKi

we succeed, and see that the linker finds a reference to foo<int>::get() in main.o,
a reference to foo<int>::set(int const&) in make_foo_int.o and
the definitions of both symbols in foo_int.o. foo<int> was instantiated
only once, in foo_int.o.

Later...

Per your comments, you still don't see how the function make_foo_int(int) can be
compiled without the compiler instantiating foo<int> if only for the purpose
of computing the size that the automatic object foo<int> fi that is defined
in the function will occupy on the stack.

The better to address that, I first need to draw out a point that was probably insufficiently
clear before when I noted that the explicit instantiation:

template struct foo<int>;

in foo_int.cpp generates only definitions of the member functions that are defined
by the template, as shown by:

$ nm -C --defined-only foo_int.o
0000000000000000 W foo<int>::set(int const&)
0000000000000000 W foo<int>::get() const

and does not generate definitions of the implicitly defaulted special members of
the class - constructors, etc.

So, a problem very like yours is: How can the function make_foo_int(int) be compiled without the compiler instantiating at least the default constructor that is
executed by:

foo<int> fi;

? The answer is: it does instantiate that constructor, inline, just as it usually would.
(At least, it will do if the constructor is not a no-op). But it only does so because
we did not define that constructor in the template that we explicitly instantiated
in foo_int.cpp.

Let's change the template slightly too:

foo.hpp (2)

#ifndef FOO_HPP
#define FOO_HPP

template<typename T>
struct foo
{
    T const & get() const {
        return _t;
    }
    void set(T const & t) {
        _t = t;
    }

private:
    T _t = 257;  // <- Default initializer
};

#endif

Then recompile make_foo_int.cpp, saving the assembly:

$ g++ -Wall -Wextra -pedantic -c make_foo_int.cpp -save-temps

which now makes it obvious that the default constructor foo<int>()
is inlined, whereas foo<int>::set(T const &) is called externally:

make_foo_int.s (2)

    .file   "make_foo_int.cpp"
    .text
    .globl  _Z12make_foo_inti
    .type   _Z12make_foo_inti, @function
_Z12make_foo_inti:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movl    %edi, -20(%rbp)
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    movl    $257, -12(%rbp) # <- Default initializer
    leaq    -20(%rbp), %rdx
    leaq    -12(%rbp), %rax
    movq    %rdx, %rsi
    movq    %rax, %rdi
    call    _ZN3fooIiE3setERKi@PLT  # <- External call
    movl    -12(%rbp), %eax
    movq    -8(%rbp), %rcx
    xorq    %fs:40, %rcx
    je  .L3
    call    __stack_chk_fail@PLT
.L3:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size   _Z12make_foo_inti, .-_Z12make_foo_inti
    .ident  "GCC: (Ubuntu 8.2.0-7ubuntu1) 8.2.0"
    .section    .note.GNU-stack,"",@progbits

The compiler is able to inline, as usual, any special member functions of
foo<int> that we have not defined in the template because the template
definition must be available to it whenever it sees:

extern template struct foo<int>;

as we can verify by changing foo_int.hpp to:

foo_int.hpp (2)

#ifndef FOO_INT_HPP
#define FOO_INT_HPP

//#include "foo.hpp"  <- Hide the template definition

template <typename T> struct foo;

// An explicit instantiation declaration
extern template struct foo<int>;

#endif

and attempting:

$ g++ -Wall -Wextra -pedantic -c make_foo_int.cpp -save-temps
In file included from make_foo_int.hpp:3,
                 from make_foo_int.cpp:1:
foo_int.hpp:9:24: error: explicit instantiation of ‘struct foo<int>’ before definition of template
 extern template struct foo<int>;
                        ^~~~~~~~

So here it is quite true to say that the compiler, as you surmised, is "at least partially instantiating foo<int>"
in make_foo_int.o. But it is only instantiating a part - the default constructor - that
is not provided as an external reference by:

 extern template struct foo<int>;

and that default constructor is not so provided because we did not define it in
template struct foo<T>.

If we do define constructors in the template, say:

foo.hpp (3)

#ifndef FOO_HPP
#define FOO_HPP

template<typename T>
struct foo
{
    foo()
    : _t{257}{}
    foo(foo const & other)
    : _t{other._t}{}
    T const & get() const {
        return _t;
    }
    void set(T const & t) {
        _t = t;
    }

private:
    T _t;
};

#endif

then we will find them defined in foo_int.o:

$ g++ -Wall -Wextra -pedantic -c foo_int.cpp
$ nm -C foo_int.o
0000000000000000 W foo<int>::set(int const&)
0000000000000000 W foo<int>::foo(foo<int> const&)
0000000000000000 W foo<int>::foo()
0000000000000000 W foo<int>::foo(foo<int> const&)
0000000000000000 W foo<int>::foo()
0000000000000000 n foo<int>::foo(foo<int> const&)
0000000000000000 n foo<int>::foo()
0000000000000000 W foo<int>::get() const

(It looks as if they are multiply defined, but this is an illusion and a distraction! [1]) And if we
recompile make_foo_int.cpp with foo.hpp (3) and our original foo_int.hpp,
and inspect the new assembly:

$ g++ -Wall -Wextra -pedantic -O0 -c make_foo_int.cpp -save-temps
$ mv make_foo_int.s make_foo_int.s.before   # Save that for later
$ cat make_foo_int.s.before
    .file   "make_foo_int.cpp"
    .text
    .globl  _Z12make_foo_inti
    .type   _Z12make_foo_inti, @function
_Z12make_foo_inti:
.LFB4:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movq    %rdi, -24(%rbp)
    movl    %esi, -28(%rbp)
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    movq    -24(%rbp), %rax
    movq    %rax, %rdi
    call    _ZN3fooIiEC1Ev@PLT      # <- External ctor call
    leaq    -28(%rbp), %rdx
    movq    -24(%rbp), %rax
    movq    %rdx, %rsi
    movq    %rax, %rdi
    call    _ZN3fooIiE3setERKi@PLT  # <- External `set` call
    nop
    movq    -24(%rbp), %rax
    movq    -8(%rbp), %rcx
    xorq    %fs:40, %rcx
    je  .L3
    call    __stack_chk_fail@PLT
.L3:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE4:
    .size   _Z12make_foo_inti, .-_Z12make_foo_inti
    .ident  "GCC: (Ubuntu 8.2.0-7ubuntu1) 8.2.0"
    .section    .note.GNU-stack,"",@progbits

we see now that the default constructor _ZN3fooIiEC1Ev as well
as the set member function _ZN3fooIiE3setERKi are called externally.

Relinking our original program, we find that it runs:

$ g++ -Wall -Wextra -pedantic -O0 -o prog main.cpp make_foo_int.cpp foo_int.cpp
$ ./prog
42

Which eventually preps us for the question: How can the compiler know the size
of the object foo<int> fi in order to compile the function make_foo_int, without
instantiating foo<int>?

As make_foo_int.s.before makes clear, make_foo_int itself never reserves storage for fi.
With foo.hpp (3), foo<int> has a user-provided copy constructor, so it is not trivially
copyable, and the x86-64 System V ABI returns it in memory: the caller of make_foo_int
passes the address of a return slot in %rdi (saved at -24(%rbp)), and, thanks to the named
return value optimization, fi is constructed directly in that slot. (Contrast the earlier
make_foo_int.s listings, where foo<int> was trivially copyable: there fi lived at -12(%rbp)
in make_foo_int's own frame and was returned in %eax, so the compiler did use the 4-byte size
of foo<int> to lay out that frame.) C++ classes and instances of classes are unknown as such
in assembly and object code; there are only functions and the memory they read and write.
The C++ statement:

foo<int> fi;

in the body of make_foo_int does not compile to reserving space for fi on make_foo_int's
stack. It compiles to a call of the default constructor of foo<int> (possibly inlined, possibly
called externally; it doesn't matter), passing the return-slot address as its this pointer,
and the constructor stores 257 at that address. It is the caller, main, that must allocate
storage of the right size, and it can, because main.cpp also sees the template definition:
an explicit instantiation declaration never suppresses implicit instantiation of the class
itself, so the layout and size of foo<int> are known in every translation unit that includes
foo.hpp. It only suppresses the definitions of the non-inline member functions.
To see that make_foo_int is indifferent to that size, we could redefine template struct foo<T>
in a (rather insane) way that makes foo<int> 1000 times bigger:

foo.hpp (4)

#ifndef FOO_HPP
#define FOO_HPP

template<typename T>
struct foo
{
    foo() {
        for (unsigned i = 0; i < 1000; ++i) {
            _t[i] = 257;
        }
    }
    foo(foo const & other) {
        for (unsigned i = 0; i < 1000; ++i) {
            _t[i] = other._t[i];
        }
    }
    T const & get() const {
        return _t[999];
    }
    void set(T const & t) {
        _t[0] = t;
    }

private:
    T _t[1000];
};

#endif

then recompile make_foo_int.cpp:

$ g++ -Wall -Wextra -pedantic -O0 -c make_foo_int.cpp -save-temps
$ mv make_foo_int.s make_foo_int.s.after

and it makes no difference at all to the assembly of make_foo_int.o:

$ diff make_foo_int.s.before make_foo_int.s.after; echo Done
Done

$ cat make_foo_int.s.after
    .file   "make_foo_int.cpp"
    .text
    .globl  _Z12make_foo_inti
    .type   _Z12make_foo_inti, @function
_Z12make_foo_inti:
.LFB4:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movq    %rdi, -24(%rbp)
    movl    %esi, -28(%rbp)
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    movq    -24(%rbp), %rax
    movq    %rax, %rdi
    call    _ZN3fooIiEC1Ev@PLT
    leaq    -28(%rbp), %rdx
    movq    -24(%rbp), %rax
    movq    %rdx, %rsi
    movq    %rax, %rdi
    call    _ZN3fooIiE3setERKi@PLT
    nop
    movq    -24(%rbp), %rax
    movq    -8(%rbp), %rcx
    xorq    %fs:40, %rcx
    je  .L3
    call    __stack_chk_fail@PLT
.L3:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE4:
    .size   _Z12make_foo_inti, .-_Z12make_foo_inti
    .ident  "GCC: (Ubuntu 8.2.0-7ubuntu1) 8.2.0"
    .section    .note.GNU-stack,"",@progbits

although it makes a difference to our program:

$ g++ -Wall -Wextra -pedantic -O0 -o prog main.cpp make_foo_int.cpp foo_int.cpp
$ ./prog
257

I readily recant my opening comment that "It is actually quite straightforward" :)
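This single-file sketch reuses foo<T> as defined in foo.hpp (3). It checks that foo<int> is a complete type with a compile-time size in a translation unit that sees `extern template struct foo<int>;`. The comments mark what foo_int.hpp and foo_int.cpp would contribute. Folding them into one file is an assumption made only so that the sketch links on its own.

```cpp
#include <cassert>

// foo<T> as in foo.hpp (3).
template<typename T>
struct foo
{
    foo() : _t{257} {}
    foo(foo const & other) : _t{other._t} {}
    T const & get() const { return _t; }
    void set(T const & t) { _t = t; }
private:
    T _t;
};

// What foo_int.hpp contributes: the explicit instantiation declaration.
extern template struct foo<int>;

// sizeof needs a complete type: the class foo<int> is still implicitly
// instantiated here; only its member function definitions are suppressed.
static_assert(sizeof(foo<int>) >= sizeof(int), "foo<int> is complete");

// What foo_int.cpp contributes (kept in this file so the sketch links).
template struct foo<int>;

foo<int> make_foo_int(int i)
{
    foo<int> fi;  // constructed in the caller-provided return slot (NRVO)
    fi.set(i);
    return fi;
}
```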

[1] The output:

$ nm -C foo_int.o
0000000000000000 W foo<int>::set(int const&)
0000000000000000 W foo<int>::foo(foo<int> const&)
0000000000000000 W foo<int>::foo()
0000000000000000 W foo<int>::foo(foo<int> const&)
0000000000000000 W foo<int>::foo()
0000000000000000 n foo<int>::foo(foo<int> const&)
0000000000000000 n foo<int>::foo()
0000000000000000 W foo<int>::get() const

seems to say that each of the constructors has two weakly global definitions
and is additionally defined as a comdat symbol! But if we disable demangling
this appearance disappears:

$ nm foo_int.o
0000000000000000 W _ZN3fooIiE3setERKi
0000000000000000 W _ZN3fooIiEC1ERKS0_
0000000000000000 W _ZN3fooIiEC1Ev
0000000000000000 W _ZN3fooIiEC2ERKS0_
0000000000000000 W _ZN3fooIiEC2Ev
0000000000000000 n _ZN3fooIiEC5ERKS0_
0000000000000000 n _ZN3fooIiEC5Ev
0000000000000000 W _ZNK3fooIiE3getEv

and we see that all the symbols are in reality distinct. The demangler
maps all three of:

_ZN3fooIiEC1ERKS0_
_ZN3fooIiEC2ERKS0_
_ZN3fooIiEC5ERKS0_

to foo<int>::foo(foo<int> const&), and similarly all of:

_ZN3fooIiEC1Ev
_ZN3fooIiEC2Ev
_ZN3fooIiEC5Ev

to foo<int>::foo(). In the GCC recipe for compiling these constructors,
the C1 (complete-object constructor) and C2 (base-object constructor)
variants are in fact equivalent here but are logically distinguished in
the ABI spec, and the C5 variant simply names a section group in which the
compiler places the function section where the constructor is defined.

Why does extern template instantiation not work on move-only types?

Explicit instantiation definition (aka template class ...) will instantiate all member functions (that are not templated themselves).

Among other things, it will try to instantiate the copy constructor for the vector (and other functions requiring copyability), and will fail at it for obvious reasons.

It could be prevented with requires, but std::vector doesn't use it. Interestingly, Clang ignores requires in this case, so I reported a bug.
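The same failure can be reproduced without std::vector by using a hypothetical wrapper, Holder, that has one member needing a copyable T and one that only moves. Explicitly instantiating the whole class for a move-only type would also instantiate the copying member and fail. One pre-C++20 workaround, sketched below under that assumption, is to declare and define explicit instantiations only for the individual members that actually compile.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical wrapper: clone() needs a copyable T, take() only moves.
template<typename T>
struct Holder {
    T value;
    Holder clone() const { return Holder{value}; }  // copies T
    T take() { return std::move(value); }             // moves T
};

using MoveOnly = std::unique_ptr<int>;

// template struct Holder<MoveOnly>;  // error: also instantiates clone(),
//                                    // which tries to copy a unique_ptr

// Header: declare only the member that will be explicitly instantiated.
extern template MoveOnly Holder<MoveOnly>::take();

// One .cpp file: instantiate just that member.
template MoveOnly Holder<MoveOnly>::take();
```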
