How Does C++ Link Template Instances

How does the linker handle identical template instantiations across translation units?

C++ requires that the definition of an inline function
be present in every translation unit that references the function. Member
functions defined inside a class template are implicitly inline, but by default
they are also instantiated with external linkage. Hence the duplicate
definitions that will be visible to the linker when
the same template is instantiated with the same template arguments in different
translation units. How the linker copes with this duplication is your question.

Your C++ compiler is subject to the C++ Standard, but your linker is not subject
to any codified standard as to how it shall link C++: it is a law unto itself,
rooted in computing history and indifferent to the source language of the object
code it links. Your compiler has to work with what a target linker
can and will do so that you can successfully link your programs and see them do
what you expect. So I'll show you how the GCC C++ compiler interworks with
the GNU linker to handle identical template instantiations in different translation units.

This demonstration exploits the fact that while the C++ Standard requires -
by the One Definition Rule
- that the instantiations in different translation units of the same template with
the same template arguments shall have the same definition, the compiler -
of course - cannot enforce any requirement like that on relationships between different
translation units. It has to trust us.

So we'll instantiate the same template with the same arguments in different
translation units, but we'll cheat by injecting a macro-controlled difference into
the implementations in different translation units that will subsequently show
us which definition the linker picks.

If you suspect this cheat invalidates the demonstration, remember: the compiler
cannot know whether the ODR is ever honoured across different translation units,
so it cannot behave differently on that account, and there's no such thing
as "cheating" the linker. Anyhow, the demo will demonstrate that it is valid.

First we have our cheat template header:

thing.hpp

#ifndef THING_HPP
#define THING_HPP
#ifndef ID
#error ID undefined
#endif

template<typename T>
struct thing
{
    T id() const {
        return T{ID};
    }
};

#endif

The value of the macro ID is the tracer value we can inject.

Next a source file:

foo.cpp

#define ID 0xf00
#include "thing.hpp"

unsigned foo()
{
    thing<unsigned> t;
    return t.id();
}

It defines function foo, in which thing<unsigned> is
instantiated to define t, and t.id() is returned. By being a function with
external linkage that instantiates thing<unsigned>, foo serves two purposes:

- obliging the compiler to do that instantiating at all
- exposing the instantiation in linkage so we can then probe what the
  linker does with it.

Another source file:

boo.cpp

#define ID 0xb00
#include "thing.hpp"

unsigned boo()
{
    thing<unsigned> t;
    return t.id();
}

which is just like foo.cpp except that it defines boo in place of foo and
sets ID = 0xb00.

And lastly a program source:

main.cpp

#include <iostream>

extern unsigned foo();
extern unsigned boo();

int main()
{
    std::cout << std::hex 
    << '\n' << foo()
    << '\n' << boo()
    << std::endl;
    return 0;
}

This program will print, as hex, the return value of foo() - which our cheat should make
= f00 - then the return value of boo() - which our cheat should make = b00.

Now we'll compile foo.cpp, and we'll do it with -save-temps because we want
a look at the assembly:

g++ -c -save-temps foo.cpp

This writes the assembly in foo.s and the portion of interest there is
the definition of thing<unsigned int>::id() const (mangled = _ZNK5thingIjE2idEv):

    .section    .text._ZNK5thingIjE2idEv,"axG",@progbits,_ZNK5thingIjE2idEv,comdat
    .align 2
    .weak   _ZNK5thingIjE2idEv
    .type   _ZNK5thingIjE2idEv, @function
_ZNK5thingIjE2idEv:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    %rdi, -8(%rbp)
    movl    $3840, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

Three of the directives at the top are significant:

.section    .text._ZNK5thingIjE2idEv,"axG",@progbits,_ZNK5thingIjE2idEv,comdat

This one puts the function definition in a linkage section of its own called
.text._ZNK5thingIjE2idEv that will be output, if it's needed, merged into the
.text (i.e. code) section of the program in which the object file is linked. A
linkage section like that, i.e. .text.<function_name>, is called a function-section.
It's a code section that contains only the definition of function <function_name>.

The directive:

.weak   _ZNK5thingIjE2idEv

is crucial. It classifies thing<unsigned int>::id() const as a weak symbol.
The GNU linker recognises strong symbols and weak symbols. For a strong symbol, the
linker will accept only one definition in the linkage. If there are more, it will give a
multiple-definition error. But for a weak symbol, it will tolerate any number of definitions,
and pick one. If a weakly defined symbol also has (just one) strong definition in the linkage then the
strong definition will be picked. If a symbol has multiple weak definitions and no strong definition,
then the linker can pick any one of the weak definitions, arbitrarily.

The directive:

.type   _ZNK5thingIjE2idEv, @function

classifies thing<unsigned int>::id() as referring to a function - not data.

Then in the body of the definition, the code is assembled at the address
labelled by the weak global symbol _ZNK5thingIjE2idEv, the same one locally
labelled .LFB2. The code returns 3840 ( = 0xf00).

Next we'll compile boo.cpp the same way:

g++ -c -save-temps boo.cpp

and look again at how thing<unsigned int>::id() is defined in boo.s:

    .section    .text._ZNK5thingIjE2idEv,"axG",@progbits,_ZNK5thingIjE2idEv,comdat
    .align 2
    .weak   _ZNK5thingIjE2idEv
    .type   _ZNK5thingIjE2idEv, @function
_ZNK5thingIjE2idEv:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    %rdi, -8(%rbp)
    movl    $2816, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

It's identical, except for our cheat: this definition returns 2816 ( = 0xb00).

While we're here, let's note something that might or might not go without saying:
once we're in assembly (or object code), classes have evaporated. Here,
we're down to:

- data
- code
- symbols, which can label data or label code.

So nothing here specifically represents the instantiation of thing<T> for
T = unsigned. All that's left of thing<unsigned> in this instance is
the definition of _ZNK5thingIjE2idEv, a.k.a. thing<unsigned int>::id() const.
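To check by hand which C++ entity a mangled symbol stands for, you can pipe it through c++filt, or demangle it in code. The following is a minimal sketch that assumes a GCC or Clang toolchain using the Itanium C++ ABI, where <cxxabi.h> provides abi::__cxa_demangle; the helper name demangle is made up for this illustration.

```cpp
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI
// mangled name, or the input unchanged if demangling fails.
inline std::string demangle(const char* mangled)
{
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the returned buffer was allocated with malloc
    return result;
}
```

Calling demangle("_ZNK5thingIjE2idEv") should give back thing<unsigned int>::id() const, the name quoted above.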

So now we know what the compiler does about instantiating thing<unsigned>
in a given translation unit. If it is obliged to instantiate a thing<unsigned>
member function, then it assembles the definition of the instantiated member
function at a weakly global symbol that identifies the member function, and it
puts this definition into its own function-section.

Now let's see what the linker does.

First we'll compile the main source file.

g++ -c main.cpp

Then link all the object files, requesting a diagnostic trace on _ZNK5thingIjE2idEv,
and a linkage map file:

g++ -o prog main.o foo.o boo.o -Wl,--trace-symbol='_ZNK5thingIjE2idEv',-M=prog.map
foo.o: definition of _ZNK5thingIjE2idEv
boo.o: reference to _ZNK5thingIjE2idEv

So the linker tells us that the program gets the definition of _ZNK5thingIjE2idEv from
foo.o and calls it in boo.o.

Running the program shows it's telling the truth:

./prog

f00
f00

Both foo() and boo() are returning the value of thing<unsigned>().id()
as instantiated in foo.cpp.

What has become of the other definition of thing<unsigned int>::id() const
in boo.o? The map file shows us:

prog.map

...
Discarded input sections
 ...
 ...
 .text._ZNK5thingIjE2idEv
                0x0000000000000000        0xf boo.o
 ...
 ...

The linker chucked away the function-section in boo.o that
contained the other definition.

Let's now link prog again, but this time with foo.o and boo.o in the
reverse order:

g++ -o prog main.o boo.o foo.o -Wl,--trace-symbol='_ZNK5thingIjE2idEv',-M=prog.map
boo.o: definition of _ZNK5thingIjE2idEv
foo.o: reference to _ZNK5thingIjE2idEv

This time, the program gets the definition of _ZNK5thingIjE2idEv from boo.o and
calls it in foo.o. The program confirms that:

./prog

b00
b00

And the map file shows:

...
Discarded input sections
 ...
 ...
 .text._ZNK5thingIjE2idEv
                0x0000000000000000        0xf foo.o
 ...
 ...

that the linker chucked away the function-section .text._ZNK5thingIjE2idEv
from foo.o.

That completes the picture.

The compiler emits, in each translation unit, a weak definition of
each instantiated template member in its own function section. The linker
then just picks the first of those weak definitions that it encounters
in the linkage sequence when it needs to resolve a reference to the weak
symbol. Because each of the weak symbols addresses a definition, any
one of them - in particular, the first one - can be used to resolve all references
to the symbol in the linkage, and the rest of the weak definitions are
expendable. The surplus weak definitions must be ignored, because
the linker can only link one definition of a given symbol. And the surplus
weak definitions can be discarded by the linker, with no collateral
damage to the program, because the compiler placed each one in a linkage section all by itself.

By picking the first weak definition it sees, the linker is effectively
picking at random, because the order in which object files are linked is arbitrary.
But this is fine, as long as we obey the ODR across multiple translation units,
because if we do, then all of the weak definitions are indeed identical. The usual
practice of #include-ing a class template everywhere from a header file (and not
macro-injecting any local edits when we do so) is a fairly robust way of obeying the rule.
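The selection rule just described can be sketched as a toy model. This is not the GNU linker's code or API: every name below (Binding, Definition, resolve) is hypothetical, and real linkers also deal with COMDAT groups, archives and much more. It only mirrors the rule as stated: the first weak definition in link order is kept, a strong definition beats a weak one, and two strong definitions are an error.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only; all names here are hypothetical.
enum class Binding { Weak, Strong };

struct Definition {
    std::string symbol;       // e.g. "_ZNK5thingIjE2idEv"
    Binding binding;
    std::string object_file;  // e.g. "foo.o"
};

// Walks the definitions in link order and returns, for each symbol,
// the object file whose definition is kept.
inline std::map<std::string, std::string>
resolve(const std::vector<Definition>& link_order)
{
    std::map<std::string, Definition> kept;
    for (const Definition& def : link_order) {
        auto it = kept.find(def.symbol);
        if (it == kept.end()) {
            kept.emplace(def.symbol, def);  // first definition seen
        } else if (def.binding == Binding::Strong) {
            if (it->second.binding == Binding::Strong) {
                throw std::runtime_error("multiple definition of " + def.symbol);
            }
            it->second = def;  // a strong definition replaces a weak one
        }
        // Otherwise this is a later weak definition: it is discarded.
    }
    std::map<std::string, std::string> result;
    for (const auto& entry : kept) {
        result[entry.first] = entry.second.object_file;
    }
    return result;
}
```

Fed two weak definitions of _ZNK5thingIjE2idEv in the order foo.o, boo.o, the model keeps the one from foo.o; with the order reversed it keeps the one from boo.o, which matches the two link runs above.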

Linking template class functions

The short answer is no, because a class template is not a class until you instantiate it with specific template arguments. So there is no Foo::AddMistery() to implement.

Think of it this way: member functions have knowledge of the dependent type, because they have an implicit first parameter, which is a pointer to an object of their class. So

Foo<int> f;
f.AddMistery();

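is conceptually equivalent to the following (pseudocode, not valid C++: it only spells out the implicit this argument)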

Foo<int> f;
Foo<int>::AddMistery(&f);

If you have instantiations of the template for certain types, then you could implement, say, Foo<int>::AddMistery in a .cpp file, but this has nothing to do with the function not requiring the template argument in its body.
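As a minimal sketch of that last point: the original Foo is not shown, so the class below, the int return type and the returned values are all assumptions made for illustration. The member function is declared for every T but only defined, as explicit specializations, for the types that are actually supported.

```cpp
// Hypothetical reconstruction of Foo; only the name AddMistery comes
// from the question.
template <typename T>
struct Foo {
    int AddMistery() const;  // declared for every T, defined per type below
};

// Explicit specializations of the member function. In a real project
// these definitions could sit in one .cpp file, with the matching
// declarations (template <> int Foo<int>::AddMistery() const;) in the
// header so that every user sees them.
template <>
int Foo<int>::AddMistery() const { return 1; }

template <>
int Foo<long>::AddMistery() const { return 2; }
```

Using Foo<double>::AddMistery() with this sketch would fail at link time, because no definition exists for that type.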

Template issue causes linker error (C++)

You have to have your template definitions available at the calling site. In practice that usually means putting them in the header rather than in a separate .cpp file.

The reason is templates cannot be compiled. Think of functions as cookies, and the compiler is an oven.

A template is only a cookie cutter, because it doesn't know what type of cookie it makes. It only tells the compiler how to make the function when given a type; by itself it can't be used, because there is no concrete type being operated on. You can't cook a cookie cutter. Only when you have the tasty cookie dough ready (i.e., you have given the compiler the dough [type]) can you cut the cookie and cook it.

Likewise, only when you actually use the template with a certain type can the compiler generate the actual function, and compile it. It can't do this, however, if the template definition is missing. You have to move it into the header file, so the caller of the function can make the cookie.
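In code, the cookie cutter and the cookie look roughly like this. It is only a sketch: the names twice and twice_of_21 are invented here, and the first part stands for what would go in the header.

```cpp
// Would live in a header (say, a hypothetical twice.hpp): declaration and
// definition together, so every caller can instantiate the template.
template <typename T>
T twice(T value)
{
    return value + value;
}

// A caller that includes the header. Using twice with an int argument is
// what makes the compiler generate the concrete function twice<int>.
inline int twice_of_21()
{
    return twice(21);
}
```

If only the declaration of twice were visible to the caller, the call would compile but the linker would not find twice<int> unless some other translation unit had instantiated it.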

Implementing specific template instances in source with inherited template parameters

If you want to stick to templates, you can implement this 'mirrored inheritance' as follows. Make B<B_data> inherit from A<A_data> instead of A<B_data>: declare in B_data (or any other possible template argument type) an alias base naming whatever type you want to treat as its base for your purposes (this is ambiguous in general, since a class may have many bases), and make B<C> inherit from A<C::base>. Then, in B<C>, d is of type C::base*, so it can still hold a pointer to a C, which is derived from C::base (assuming C::base is set adequately and is an actual base).

In this case B needs a custom default constructor so that d ends up pointing to a C object rather than the C::base object created by the default constructor of A<C::base>, which is all that an implicitly defaulted default constructor of B<C> would give us. There is no need to provide a destructor, since delete d in the destructor of A<C::base> works correctly if C::base has a virtual destructor. Finally, since inside B<C> we know how we constructed the object that d points to, we can safely assume its dynamic type and access it through a static_cast-ed pointer (for safety you could, for example, add a dynamic_cast check inside an assert for debug builds).

Example implementation (see also comments, in particular for some changes of other aspects of your code):

// A.hpp

class A_data {
public:
    int foo = 1;

    virtual ~A_data() = default;
};

template <class C = A_data> class A {
public:
    int foo() const;

    A() { d = new C; }
    // 'virtual' is not strictly needed in our particular case since we don't delete B
    // through A*, but still it is safer and best practice with it than without
    virtual ~A() { delete d; }

    C* d;
};

class B_data : public A_data {
public:
    using base = A_data;
    int bar = 2;

    // virtual ~B_data() = default; implicitly defaulted and virtual
};

template <class C = B_data> class B : public A<typename C::base> {
public:
    B() { delete this->d; this->d = new C; } // replace the C::base object created by A's constructor; 'this->' triggers dependent name lookup
    int bar() const;
};

// A.cpp

#include "A.hpp"

template <> int A<A_data>::foo() const { return d->foo; }
template <> int B<B_data>::bar() const { return static_cast<B_data*>(d)->bar; }

// main.cpp

#include <iostream>

#include "A.hpp"

int main() {
    B b; // C++17 class template argument deduction; before C++17 write B<> b;
    std::cout << b.bar();
    std::cout << b.foo();
}

Note also that if you want to use different template arguments C, you may want to avoid code duplication in the specialization definitions in A.cpp. There, you can define the member functions of the templates in the usual way (not as specializations) and force the instantiations you need with explicit instantiation definitions. E.g. suppose there's a B_data_2 class defined like this:

// in A.hpp
class B_data_2 : public A_data {
public:
    using base = A_data;
    int bar = 3;
};

It also has a public data member bar, but with a different default initializer. Now we can add another specialization for B<B_data_2>::bar() to A.cpp, but it will be basically the same as B<B_data>::bar(), so code duplication arises:

// in A.cpp
template <> int A<A_data>::foo() const { return d->foo; }
template <> int B<B_data>::bar() const { return static_cast<B_data*>(d)->bar; }
template <> int B<B_data_2>::bar() const { return static_cast<B_data_2*>(d)->bar; }

Instead, if we replace specializations with general definitions followed by explicit instantiation definitions of methods, we have:

// in A.cpp
template<class C> int A<C>::foo() const { return d->foo; }
template int A<A_data>::foo() const;

template<class C> int B<C>::bar() const {
    return static_cast<C*>(this->d)->bar; // Again 'this' to enable dependent name lookup
}
template int B<B_data>::bar() const;
template int B<B_data_2>::bar() const;

This obviously scales better, but to avoid writing an explicit instantiation for every member function of the same specialization, we can do even better: explicitly instantiate whole class specializations, which will, in particular, automatically instantiate all (non-templated) member functions of those specializations:

// in A.cpp
template<class C> int A<C>::foo() const { return d->foo; }

template<class C> int B<C>::bar() const {
    return static_cast<C*>(this->d)->bar; // Yet again 'this' to enable dependent name lookup
}

template class A<A_data>;
template class B<B_data>;
template class B<B_data_2>;

Now we can test and see that it compiles and works:

// main.cpp
#include <iostream>

#include "A.hpp"

int main() {
    B b; // C++17 class template argument deduction; before C++17 write B<> b;
    B<B_data_2> b_2;
    std::cout << b.bar() << b.foo() << std::endl; // 21
    std::cout << b_2.bar() << b_2.foo() << std::endl; // 31
}

How does C++ partial compilation with templates work?

This is a slightly more complicated question than most people realize.

In the general and simplest case, the template definition is present in the header, and its functions behave like inline functions. The compiler will generate the code for the functions needed in each translation unit that needs them. Then the linker will resolve the duplicate symbols by removing all but one. Since the standard requires that the definitions be exactly equivalent, the linker can pick any one from the list.

If the template need only work with a couple of types, you can move the definition to a single translation unit and explicitly instantiate the template for those types there. This would behave as a non-inline function in the general case.

Somewhere in between, if the template can be instantiated with any type but it is commonly instantiated with a few of them, the implementor of the template can use a mixed approach, where the template and the members are defined in the header, but explicit instantiations are also declared. Then in a single translation unit, those explicit instantiations can be done.

This approach can be used, for example, to minimize compile and link time when using std::string (which is really std::basic_string<char, std::char_traits<char>, std::allocator<char> >). The compiler can, in a single translation unit provide all of the functions for the common instantiation, but still provide the definition of the template functions in the header so that if you opt to use a different instantiation of the basic_string template it will still work for you. In all translation units that only use std::string, the compiler knows not to generate the code for all members as those will be available to the linker.
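A minimal sketch of the mixed approach, using a made-up Box template rather than basic_string (the names Box and box.hpp/box.cpp are assumptions for illustration). Both parts are shown in one listing; the comments mark what would go in the header and what would go in the single source file.

```cpp
#include <cstddef>

// --- would live in a header, e.g. a hypothetical box.hpp ---
template <typename T>
class Box {
public:
    explicit Box(T value) : value_(value) {}
    T get() const { return value_; }
    std::size_t size_bytes() const { return sizeof(T); }
private:
    T value_;
};

// Explicit instantiation declaration: tells every includer that Box<int>
// is instantiated elsewhere, so it need not emit code for its members.
extern template class Box<int>;

// --- would live in exactly one source file, e.g. box.cpp ---
// Explicit instantiation definition: emits all members of Box<int> here.
template class Box<int>;
```

Because the full definition stays in the header, a less common instantiation such as Box<double> still works and is instantiated implicitly wherever it is used.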

Does the linker usually optimize away duplicated code from different c++ template instances?

The gold linker does exactly that.

Safe ICF: Pointer Safe and Unwinding Aware Identical Code Folding in Gold:

We have found that large C++ applications and shared libraries tend to have many functions whose code is identical with another function. As much as 10% of the code could theoretically be eliminated by merging such identical functions into a single copy. This optimization, Identical Code Folding (ICF), has been implemented in the gold linker. At link time, ICF detects functions with identical object code and merges them into a single copy.
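To make the idea concrete, here is a hedged sketch of the kind of code that gives ICF something to fold. The template add_one is invented for this example. On a typical target the two instantiations below are expected to compile to the same instructions, although that depends on the compiler and its options. Whether they actually get merged depends on the linker and on ICF being enabled (with gold, via its --icf option), which is an assumption about your toolchain.

```cpp
#include <cstdint>

// Hypothetical example template. The two instantiations are distinct
// functions in C++, but for same-sized integer types their object code
// is typically identical.
template <typename T>
T add_one(T value)
{
    return value + 1;
}

// Explicit instantiation definitions, so both functions are emitted
// even if nothing in this file calls them.
template std::int32_t add_one<std::int32_t>(std::int32_t);
template std::uint32_t add_one<std::uint32_t>(std::uint32_t);
```

Note that this is a different mechanism from the discarding of duplicate definitions of the same instantiation: here the symbols differ and only the code bytes match.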

Where are template functions instantiated?

In what object file is func<int> instantiated?

In every object file (aka translation unit) that invokes it or takes an address of it when the template definition is available.

Why will the One Definition Rule not be violated?

Because the standard says so in [basic.def.odr].13.

Also see https://en.cppreference.com/w/cpp/language/definition

There can be more than one definition in a program of each of the following: class type, enumeration type, inline function, inline variable (since C++17), templated entity (template or member of template, but not full template specialization), as long as all of the following is true...

For this basic application of templates, is there any benefit to explicitly instantiating func<int> separate from its use?

In this case you get no inlining but possibly smaller code. If you use link-time code generation, then inlining may still happen.
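A sketch of that arrangement, with an invented body for func since the question's own definition is not shown. Callers see only the declaration, so they cannot inline the call without link-time code generation; the single explicit instantiation supplies the one copy of func<int>.

```cpp
// --- would live in a header: declaration only ---
template <typename T>
T func(T value);

// --- would live in one source file ---
// Hypothetical body; doubling the argument is an assumption.
template <typename T>
T func(T value)
{
    return value * 2;
}

// Explicit instantiation definition: func<int> is emitted here whether
// or not this file calls it.
template int func<int>(int);
```

Any other translation unit that calls func with an int then resolves to this one definition at link time.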

Why can templates only be implemented in the header file?

Caveat: It is not necessary to put the implementation in the header file, see the alternative solution at the end of this answer.

Anyway, the reason your code is failing is that, when instantiating a template, the compiler creates a new class with the given template argument. For example:

template<typename T>
struct Foo
{
    T bar;
    void doSomething(T param) {/* do stuff using T */}
};

// somewhere in a .cpp
Foo<int> f;

When reading this line, the compiler will create a new class (let's call it FooInt), which is equivalent to the following:

struct FooInt
{
    int bar;
    void doSomething(int param) {/* do stuff using int */}
};

Consequently, the compiler needs to have access to the implementation of the methods, to instantiate them with the template argument (in this case int). If these implementations were not in the header, they wouldn't be accessible, and therefore the compiler wouldn't be able to instantiate the template.

A common solution to this is to write the template declaration in a header file, then implement the class in an implementation file (for example .tpp), and include this implementation file at the end of the header.

Foo.h

template <typename T>
struct Foo
{
    void doSomething(T param);
};

#include "Foo.tpp"

Foo.tpp

template <typename T>
void Foo<T>::doSomething(T param)
{
    //implementation
}

This way, implementation is still separated from declaration, but is accessible to the compiler.

Alternative solution

Another solution is to keep the implementation separated, and explicitly instantiate all the template instances you'll need:

Foo.h

// no implementation
template <typename T> struct Foo { ... };

Foo.cpp

// implementation of Foo's methods

// explicit instantiations
template class Foo<int>;
template class Foo<float>;
// You will only be able to use Foo with int or float
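Filled in, the alternative could look like the sketch below. The method body is a placeholder assumption (it just stores the argument in bar), and the two files are shown in one listing with comments marking the split.

```cpp
// --- Foo.h: declarations only, no method bodies ---
template <typename T>
struct Foo
{
    T bar{};
    void doSomething(T param);
};

// --- Foo.cpp: method definitions plus explicit instantiations ---
template <typename T>
void Foo<T>::doSomething(T param)
{
    bar = param;  // placeholder body, assumed for illustration
}

// Only these two instantiations exist in the program.
template struct Foo<int>;
template struct Foo<float>;
```

A translation unit that tries Foo<double>::doSomething would compile against Foo.h but fail to link, since no definition was instantiated for double.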

If my explanation isn't clear enough, you can have a look at the C++ Super-FAQ on this subject.

Linking templates with g++

If you are making template classes, it is not a good idea to put them in .cpp files and compile separately. The right way is to put them in .h files (both declaration and definition) and include them where you need them.

The reason is that a template isn't actually compiled into code until it is instantiated with concrete template arguments.

(Intentionally avoiding mentioning the export keyword.)