Addresses of Identical Function Template Instantiations Across Compilation Units

This is covered under the one definition rule:

3.2 One definition rule [basic.def.odr]

Paragraph 5:

There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14), non-static function template (14.5.6), static data member of a class template (14.5.1.3), member function of a class template (14.5.1.1), or template specialization for which some template parameters are not specified (14.7, 14.5.5) in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then

A whole list of criteria follows that must be adhered to, or else the behavior is undefined. In the case above these do hold. Then ...

If the definitions of D satisfy all these requirements, then the program shall behave as if there were a single definition of D.

So technically you can have a copy of the function in each translation unit.

The wording of that last sentence, though, makes it a requirement that they all behave as a single definition. This means taking the address of the function in any translation unit should yield the same address.
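A minimal single-file sketch of that address guarantee follows. The names twice, address_seen_by_a and address_seen_by_b are made up for illustration; in the real multi-file case the two helper functions would sit in different source files that both include the template's header.

```cpp
#include <cassert>

// A function template with external linkage and no inline specifier.
template <typename T>
T twice(T x) { return x + x; }

using int_fn = int (*)(int);

// Two separate functions, each taking the address of twice<int>.
// In the multi-file case these would live in different source files.
int_fn address_seen_by_a() { return &twice<int>; }
int_fn address_seen_by_b() { return &twice<int>; }

// Both expressions name the same function, so the pointers compare equal.
void check_same_address() {
    assert(address_seen_by_a() == address_seen_by_b());
}
```

Calling check_same_address() from main should pass silently. This only illustrates the rule; it does not prove what any particular toolchain does across object files.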

How does the linker handle identical template instantiations across translation units?

C++ requires that an inline function definition
be present in every translation unit that references the function. Member
functions defined inside a class template are implicitly inline, but by default are
also instantiated with external
linkage. Hence the duplication of definitions that will be visible to the linker when
the same template is instantiated with the same template arguments in different
translation units. How the linker copes with this duplication is your question.

Your C++ compiler is subject to the C++ Standard, but your linker is not subject
to any codified standard as to how it shall link C++: it is a law unto itself,
rooted in computing history and indifferent to the source language of the object
code it links. Your compiler has to work with what a target linker
can and will do so that you can successfully link your programs and see them do
what you expect. So I'll show you how the GCC C++ compiler interworks with
the GNU linker to handle identical template instantiations in different translation units.

This demonstration exploits the fact that while the C++ Standard requires -
by the One Definition Rule
- that the instantiations in different translation units of the same template with
the same template arguments shall have the same definition, the compiler -
of course - cannot enforce any requirement like that on relationships between different
translation units. It has to trust us.

So we'll instantiate the same template with the same parameters in different
translation units, but we'll cheat by injecting a macro-controlled difference into
the implementations in different translation units that will subsequently show
us which definition the linker picks.

If you suspect this cheat invalidates the demonstration, remember: the compiler
cannot know whether the ODR is ever honoured across different translation units,
so it cannot behave differently on that account, and there's no such thing
as "cheating" the linker. In any case, the demo itself will show that it is valid.

First we have our cheat template header:

thing.hpp

#ifndef THING_HPP
#define THING_HPP
#ifndef ID
#error ID undefined
#endif

template<typename T>
struct thing
{
    T id() const {
        return T{ID};
    }
};

#endif

The value of the macro ID is the tracer value we can inject.

Next a source file:

foo.cpp

#define ID 0xf00
#include "thing.hpp"

unsigned foo()
{
    thing<unsigned> t;
    return t.id();
}

It defines function foo, in which thing<unsigned> is
instantiated to define t, and t.id() is returned. By being a function with
external linkage that instantiates thing<unsigned>, foo serves two purposes:

- obliging the compiler to do that instantiating at all
- exposing the instantiation in linkage so we can then probe what the
  linker does with it.

Another source file:

boo.cpp

#define ID 0xb00
#include "thing.hpp"

unsigned boo()
{
    thing<unsigned> t;
    return t.id();
}

which is just like foo.cpp except that it defines boo in place of foo and
sets ID = 0xb00.

And lastly a program source:

main.cpp

#include <iostream>

extern unsigned foo();
extern unsigned boo();

int main()
{
    std::cout << std::hex 
    << '\n' << foo()
    << '\n' << boo()
    << std::endl;
    return 0;
}

This program will print, as hex, the return value of foo() - which our cheat should make
= f00 - then the return value of boo() - which our cheat should make = b00.

Now we'll compile foo.cpp, and we'll do it with -save-temps because we want
a look at the assembly:

g++ -c -save-temps foo.cpp

This writes the assembly in foo.s and the portion of interest there is
the definition of thing<unsigned int>::id() const (mangled = _ZNK5thingIjE2idEv):

    .section    .text._ZNK5thingIjE2idEv,"axG",@progbits,_ZNK5thingIjE2idEv,comdat
    .align 2
    .weak   _ZNK5thingIjE2idEv
    .type   _ZNK5thingIjE2idEv, @function
_ZNK5thingIjE2idEv:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    %rdi, -8(%rbp)
    movl    $3840, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

Three of the directives at the top are significant:

.section    .text._ZNK5thingIjE2idEv,"axG",@progbits,_ZNK5thingIjE2idEv,comdat

This one puts the function definition in a linkage section of its own called
.text._ZNK5thingIjE2idEv that will be output, if it's needed, merged into the
.text (i.e. code) section of the program in which the object file is linked. A
linkage section like that, i.e. .text.<function_name>, is called a function-section.
It's a code section that contains only the definition of function <function_name>.

The directive:

.weak   _ZNK5thingIjE2idEv

is crucial. It classifies thing<unsigned int>::id() const as a weak symbol.
The GNU linker recognises strong symbols and weak symbols. For a strong symbol, the
linker will accept only one definition in the linkage. If there are more, it will give a
multiple-definition error. But for a weak symbol, it will tolerate any number of definitions,
and pick one. If a weakly defined symbol also has (just one) strong definition in the linkage then the
strong definition will be picked. If a symbol has multiple weak definitions and no strong definition,
then the linker can pick any one of the weak definitions, arbitrarily.
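To see the same classification written by hand rather than generated for a template, here is a small sketch. It assumes GCC or Clang targeting ELF: the weak attribute is a compiler extension, not standard C++, and the function name tracer is invented for this example.

```cpp
#include <cassert>

// GCC/Clang extension (not standard C++): emit this definition as a weak
// symbol, much as the compiler does implicitly for template instantiations.
__attribute__((weak)) unsigned tracer() { return 0xf00; }

// An ordinary caller; it refers to tracer only through the symbol.
unsigned call_tracer() { return tracer(); }

// With no other definition in the linkage, the weak one is used.
void check_tracer() { assert(call_tracer() == 0xf00u); }
```

If another object file in the same link supplied an ordinary (strong) definition of unsigned tracer(), the rules described above say the linker would take that one instead, with no multiple-definition error.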

The directive:

.type   _ZNK5thingIjE2idEv, @function

classifies thing<unsigned int>::id() const as referring to a function - not data.

Then in the body of the definition, the code is assembled at the address
labelled by the weak global symbol _ZNK5thingIjE2idEv, the same one locally
labelled .LFB2. The code returns 3840 ( = 0xf00).

Next we'll compile boo.cpp the same way:

g++ -c -save-temps boo.cpp

and look again at how thing<unsigned int>::id() const is defined in boo.s:

    .section    .text._ZNK5thingIjE2idEv,"axG",@progbits,_ZNK5thingIjE2idEv,comdat
    .align 2
    .weak   _ZNK5thingIjE2idEv
    .type   _ZNK5thingIjE2idEv, @function
_ZNK5thingIjE2idEv:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    %rdi, -8(%rbp)
    movl    $2816, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

It's identical, except for our cheat: this definition returns 2816 ( = 0xb00).

While we're here, let's note something that might or might not go without saying:
Once we're in assembly (or object code), classes have evaporated. Here,
we're down to:

- data
- code
- symbols, which can label data or label code.

So nothing here specifically represents the instantiation of thing<T> for
T = unsigned. All that's left of thing<unsigned> in this instance is
the definition of _ZNK5thingIjE2idEv, a.k.a. thing<unsigned int>::id() const.

So now we know what the compiler does about instantiating thing<unsigned>
in a given translation unit. If it is obliged to instantiate a thing<unsigned>
member function, then it assembles the definition of the instantiated member
function at a weakly global symbol that identifies the member function, and it
puts this definition into its own function-section.

Now let's see what the linker does.

First we'll compile the main source file.

g++ -c main.cpp

Then link all the object files, requesting a diagnostic trace on _ZNK5thingIjE2idEv,
and a linkage map file:

g++ -o prog main.o foo.o boo.o -Wl,--trace-symbol='_ZNK5thingIjE2idEv',-M=prog.map
foo.o: definition of _ZNK5thingIjE2idEv
boo.o: reference to _ZNK5thingIjE2idEv

So the linker tells us that the program gets the definition of _ZNK5thingIjE2idEv from
foo.o and calls it in boo.o.

Running the program shows it's telling the truth:

./prog

f00
f00

Both foo() and boo() are returning the value of thing<unsigned>().id()
as instantiated in foo.cpp.

What has become of the other definition of thing<unsigned int>::id() const
in boo.o? The map file shows us:

prog.map

...
Discarded input sections
 ...
 ...
 .text._ZNK5thingIjE2idEv
                0x0000000000000000        0xf boo.o
 ...
 ...

The linker chucked away the function-section in boo.o that
contained the other definition.

Let's now link prog again, but this time with foo.o and boo.o in the
reverse order:

g++ -o prog main.o boo.o foo.o -Wl,--trace-symbol='_ZNK5thingIjE2idEv',-M=prog.map
boo.o: definition of _ZNK5thingIjE2idEv
foo.o: reference to _ZNK5thingIjE2idEv

This time, the program gets the definition of _ZNK5thingIjE2idEv from boo.o and
calls it in foo.o. The program confirms that:

./prog

b00
b00

And the map file shows:

...
Discarded input sections
 ...
 ...
 .text._ZNK5thingIjE2idEv
                0x0000000000000000        0xf foo.o
 ...
 ...

that the linker chucked away the function-section .text._ZNK5thingIjE2idEv
from foo.o.

That completes the picture.

The compiler emits, in each translation unit, a weak definition of
each instantiated template member in its own function section. The linker
then just picks the first of those weak definitions that it encounters
in the linkage sequence when it needs to resolve a reference to the weak
symbol. Because each of the weak symbols addresses a definition, any
one of them - in particular, the first one - can be used to resolve all references
to the symbol in the linkage, and the rest of the weak definitions are
expendable. The surplus weak definitions must be ignored, because
the linker can only link one definition of a given symbol. And the surplus
weak definitions can be discarded by the linker, with no collateral
damage to the program, because the compiler placed each one in a linkage section all by itself.

By picking the first weak definition it sees, the linker is effectively
picking at random, because the order in which object files are linked is arbitrary.
But this is fine, as long as we obey the ODR across multiple translation units,
because if we do, then all of the weak definitions are indeed identical. The usual practice of #include-ing a class template everywhere from a header file (and not macro-injecting any local edits when we do so) is a fairly robust way of obeying the rule.
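For contrast, here is a sketch of an ODR-respecting variant of the demo, shown in a single file for brevity. The tracer value moves out of the macro and into a template argument (the extra parameter Id is my addition, not part of the original thing.hpp). Each value then names a different specialization with its own symbol, so there are no competing definitions for the linker to choose between.

```cpp
#include <cassert>

// Same shape as thing<T>, but the tracer is a template argument, so the
// template definition itself is token-for-token identical wherever it is
// included.
template <typename T, T Id>
struct thing {
    T id() const { return Id; }
};

// thing<unsigned, 0xf00u> and thing<unsigned, 0xb00u> are distinct types.
unsigned foo() { thing<unsigned, 0xf00u> t; return t.id(); }
unsigned boo() { thing<unsigned, 0xb00u> t; return t.id(); }

void check_tracers() {
    assert(foo() == 0xf00u);
    assert(boo() == 0xb00u);
}
```

With this version foo() and boo() keep their own values whatever the link order, because no two definitions claim the same symbol.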

Identity of function template instantiation in multiple translation units

I'm wondering whether the same applies to function template instantiations without the inline specifier.

The same applies to templates. See §3.2 One definition rule:

There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14), non-static function template (14.5.6), static data member of a class template (14.5.1.3), member function of a class template (14.5.1.1), or template specialization for which some template parameters are not specified (14.7, 14.5.5) in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. ... If D is a template and is defined in more than one translation unit, then the preceding requirements shall apply both to names from the template’s enclosing scope used in the template definition (14.6.3), and also to dependent names at the point of instantiation (14.6.2). If the definitions of D satisfy all these requirements, then the program shall behave as if there were a single definition of D. If the definitions of D do not satisfy these requirements, then the behavior is undefined.
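One observable consequence of "as if there were a single definition" is that a function-local static inside a non-inline function template exists once per specialization. The sketch below is single-file and uses invented names (next_ticket and the ticket_* helpers); in a multi-file program the helpers would be spread over several source files.

```cpp
#include <cassert>

// Non-inline function template with a function-local static.
template <typename T>
int next_ticket() {
    static int counter = 0;  // one counter per specialization
    return ++counter;
}

// Independent callers of the same specialization share its counter.
int ticket_from_a() { return next_ticket<int>(); }
int ticket_from_b() { return next_ticket<int>(); }

// A different specialization has its own, separate counter.
int ticket_for_long() { return next_ticket<long>(); }

void check_shared_counter() {
    int a = ticket_from_a();
    int b = ticket_from_b();
    assert(b == a + 1);  // both calls advanced the same counter
}
```

The check is written relative to the first result so that it holds no matter how many tickets were drawn earlier in the program.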

Separate compilation and template explicit instantiation

After another look at the standard, it seems to me that the only reasonable option is to use a single explicit instantiation of the class template, combined with explicit instantiations of a small number of "difficult" member functions.

This (according to 14.7.2p9) will instantiate the class and all members which have been defined up to this point (which should include everything except "difficult" members). Then those selected members can be explicitly instantiated in other translation units containing their definitions.

That would make my example look like the following (assuming that TA1.cpp contains the easy functions and the only "difficult" function in TA is func2):

file TA1.cpp:

template <typename T>
void TA<T>::func1() { /* "simple" function definition */ }

template class TA<sometype>; /* expl. inst. of class */

file TA2.cpp:

template <typename T>
void TA<T>::func2() { /* "difficult" function definition */ }

template void TA<sometype>::func2(); /* expl. inst. of member */

This method requires us to write an explicit instantiation definition for every "difficult" function, which is tedious but also makes us think twice about whether we really want to keep it separate or not.

Disclaimer

When can that be useful? Not often. As other people here mentioned, it is not advised to split definitions of classes over several files. In my particular case the "difficult" functions contain complicated mathematical operations on instances of non-trivial classes. C++ templates are not famous for fast compilation speeds, but in this case it was unbearable. These functions call each other, which sends the compiler on a long and memory-consuming journey of expanding/inlining overloaded operators/templates/etc. to optimize everything it sees, with pretty much zero improvement, but making compilation last for hours. This trick of isolating some functions in separate files speeds up compilation 20 times (and allows it to be parallelized as well).

Is a template class compiled more than once in different compilation units in C++?

In general, yes, template classes are usually compiled every time they're encountered by the compiler.

Where are template functions instantiated?

In what object file is func<int> instantiated?

In every object file (i.e. every compiled translation unit) that invokes it or takes its address while the template definition is available.

Why will the One Definition Rule not be violated?

Because the standard says so in [basic.def.odr]/13.

Also see https://en.cppreference.com/w/cpp/language/definition

There can be more than one definition in a program of each of the following: class type, enumeration type, inline function, inline variable (since C++17), templated entity (template or member of template, but not full template specialization), as long as all of the following is true...

For this basic application of templates, is there any benefit to explicitly instantiating func<int> separate from its use?

In this case you get no inlining but possibly smaller code. If you use link-time code generation, then inlining may still happen.
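As a rough sketch of what explicit instantiation separate from use looks like (the body of func is invented here, since the question's definition is not shown), the explicit instantiation definition goes in exactly one source file:

```cpp
#include <cassert>

// The template definition; the body is a placeholder for illustration.
template <typename T>
T func(T x) { return x * x; }

// Explicit instantiation definition: forces func<int> to be emitted in this
// translation unit even if nothing here calls it.
template int func<int>(int);

// A caller; in a multi-file build this would live in another source file.
int use_func(int x) { return func<int>(x); }

void check_func() { assert(use_func(7) == 49); }
```

Other translation units could then carry the explicit instantiation declaration extern template int func<int>(int); (C++11 and later), which tells the compiler not to instantiate func<int> implicitly there and to rely on the copy emitted here.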

Explicit template instantiation - when is it used?

Directly copied from https://learn.microsoft.com/en-us/cpp/cpp/explicit-instantiation:

You can use explicit instantiation to create an instantiation of a templated class or function without actually using it in your code. Because this is useful when you are creating library (.lib) files that use templates for distribution, uninstantiated template definitions are not put into object (.obj) files.

(For instance, libstdc++ contains the explicit instantiation of std::basic_string<char,char_traits<char>,allocator<char> > (which is std::string), so every time you use functions of std::string, the same function code doesn't need to be copied into your object files. The compiler only needs to refer to (link against) the copies in libstdc++.)
