What Is a "Translation Unit" in C++

What is a translation unit in C++?

According to standard C++ (wayback machine link):
A translation unit is the basic unit of compilation in C++. It consists of the contents of a single source file, plus the contents of any header files directly or indirectly included by it, minus those lines that were ignored using conditional preprocessing statements.

A single translation unit can be compiled into an object file, library, or executable program.

The notion of a translation unit is most often mentioned in the contexts of the One Definition Rule, and templates.

Translation unit in C and C++

A translation unit is not "a header and a source file". It could include a thousand header files (and a thousand source files too).

A translation unit is simply what is commonly known as "a source file" or a ".cpp file" after it has been preprocessed. If the source file #includes other files, the preprocessor inserts the text of those files into the translation unit. There is no difference between C and C++ on this matter.

C: clarification on translation unit

Here's what the C standard has to say about that:

A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. [...] Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.

(Source: C99 draft standard, 5.1.1.1 §1)

So in both of your cases you have two translation units. One of them comes from the compiler preprocessing main.c and everything that is included through #include directives—that is, sub.h and probably <stdio.h> and other headers. The second comes from the compiler doing the same thing with sub.c.

The difference from your first to your second example is that in the latter you are explicitly storing the "different translated translation units" as object files.

Notice that no rule ties an object file to a fixed number of translation units. The GNU linker is one example of a linker that can join two .o files together.

The standard, as far as I know, does not specify the extension of source files. In practice, you are free to #include a .c file into another, or to place your entire program in a .h file. With gcc you can use the option -x c to force a .h file to be treated as the starting point of a translation unit.

The distinction made here:

A source file together with all the headers and source files included via the preprocessing directive #include [...]

exists because a header need not be a source file. Similarly, the contents of <...> in an #include directive need not be a valid file name. How exactly the compiler uses the named headers <...> and "..." is implementation-defined.

Example of a translation unit vs file scope

A translation unit is a source file along with all of its included headers that is compiled as a single unit.

In this example, myprogram.c along with the header stdio.h is one translation unit. The file friendsprogram.c is another translation unit.

Note that this doesn't change when you compile like this:

gcc myprogram.c friendsprogram.c -o out

This is because the command line simply combines compiling and linking into a single step. A temporary object file is created for myprogram.c and another for friendsprogram.c, then those object files are linked to create the file "out".

C++ Translation Unit

A translation unit is basically the chunk of code that you give to a compiler to process. The compiler processes it and produces object code for the linker. The linker combines the object code from all of your translation units to form the executable. (Sometimes you'll see details that vary from this, such as not seeing a file for the object code when you have only one translation unit. The concept is still valid even though implementation details may vary.)

So typically, there is a one-to-one correspondence between translation units and the .o (or .obj) files produced when compiling. Also typically, you get one .o file for each .cpp file. Hence, it's usually reasonable to consider each .cpp file to be its own translation unit, until you do something unconventional.

When you use an #include directive, you tell the compiler to replace that one line with the entire contents of the included file. That is, the chunk of code given to the compiler includes the code from both the original file and the included file. If you include one .cpp file into another, the chunk of code given to the compiler will include the code from two .cpp files, breaking the equivalence between .cpp files and translation units. This is generally considered a Bad idea.

Let's look at an example. Suppose you had a file named ext.cpp that contained the following:

namespace
{
    void extFunction()
    {
        std::cout << "Called Unnamed Namespace's function.\n";
    }
}

Also suppose you had a file named main.cpp that contained the following:

#include <iostream>
#include "ext.cpp"


int main()
{
    extFunction();
    return 0;
}

If you were to compile main.cpp, one of the first things the compiler would do is preprocess main.cpp. This does not alter the file on disk, but it changes the text that the compiler sees. After preprocessing, the chunk of code that the compiler will process would look like the following.

[lots of code from the library header named "iostream"]
namespace
{
    void extFunction()
    {
        std::cout << "Called Unnamed Namespace's function.\n";
    }
}


int main()
{
    extFunction();
    return 0;
}

At this point, there is no problem calling extFunction since the compiler sees the unnamed namespace in the chunk of code it is processing.

Here is another example that shows how unnamed namespaces are used. It resembles the previous one but differs in an important way. Suppose you had a file named ext.cpp that contained the following:

#include <iostream>
namespace
{
    void extFunction()
    {
        std::cout << "Called Unnamed Namespace's function in EXT.\n";
    }
}

void extPublic()
{
    extFunction();
}

Let's also provide a header (ext.h) that will declare the function that has external linkage.

void extPublic();

Now move on to main.cpp:

#include <iostream>
#include "ext.h"  // <-- Including the header, not the source.

namespace
{
    void extFunction()
    {
        std::cout << "Called Unnamed Namespace's function in MAIN.\n";
    }
}
int main()
{
    extFunction();
    extPublic();
    return 0;
}

Look at that! There are two definitions for the function named extFunction! Won't the linker get confused? Not at all. Those functions are not seen outside their translation units, so there is no conflict. If you compile main.cpp, compile ext.cpp, and link main.o and ext.o into a single executable, you get the following output.

Called Unnamed Namespace's function in MAIN.
Called Unnamed Namespace's function in EXT.

One benefit of an unnamed namespace is that you don't have to worry about conflicting with names in another source file's unnamed namespace. (This becomes a much bigger benefit when your project grows to encompass hundreds of source files.)

Translation unit vs Compilation unit vs object file vs executable vs.... in C++

A translation unit is the same as a compilation unit (that is, your source file and all the header files it includes).

An object file, in typical cases, is the result of compiling the compilation unit.

An executable file is the result of linking the object file(s) of a project together with the runtime library functions.

Exactly what files are actually generated during compilation depends on the compiler, but most modern compilers will simply read the source file and headers, then produce the object file, which is passed to the linker directly if you only have one source file. This produces the executable file.

Older compilers would "preprocess" as a separate step, so you'd end up with the whole compilation unit in a temporary file. Similarly, in the old days, instead of generating machine code in the object file, the compiler would output assembler code, which was then run through an assembler to produce the object file. Linking remains similar.

Note that this is just common practice; there is nothing in the C or C++ standards about executable files or object files. It's up to each compiler implementation to handle those things in whatever fashion it likes.

C++ compilation vs translation unit

First, translation units and compilation units are the same thing; the term "translation unit" is used more often than "compilation unit". It basically means your source file together with all of the header files it includes.

Second, we (and by "we" I mean good C++ books) use the terms "function template" and "class template" rather than "template function" and "template class".

From the C++ standard:

The text of the program is kept in units called source files in this International Standard. A source file together with all the headers (17.6.1.2) and source files included (16.2) via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion (16.1) preprocessing directives, is called a translation unit.

Also note that the same document uses the term "compilation unit". If you read its uses of that term carefully, you will see that it means the same thing as "translation unit".

Now, to summarize the points above:

A compilation unit and a translation unit are the same thing.
A .cpp file alone (without its headers) does not constitute a translation unit (or compilation unit, since they mean the same thing). A .cpp file together with all of the headers it includes does constitute a translation/compilation unit.
