What is a translation unit in C++?
From here: (wayback machine link)
According to standard C++ (wayback machine link):
A translation unit is the basic unit of compilation in C++. It consists of the contents of a single source file, plus the contents of any header files directly or indirectly included by it, minus those lines that were ignored using conditional preprocessing statements.
A single translation unit can be compiled into an object file, library, or executable program.
The notion of a translation unit is most often mentioned in the contexts of the One Definition Rule, and templates.
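To make the "minus those lines that were ignored" part concrete, here is a minimal sketch. The macro USE_FAST_PATH and the function active_path are hypothetical names invented for this illustration; they do not come from the quoted text.

```cpp
#include <string>

// Hypothetical feature switch; in a real project it would usually come
// from a header or from a -D option on the compiler command line.
#define USE_FAST_PATH 1

#if USE_FAST_PATH
// Kept: these lines are part of the translation unit.
std::string active_path() { return "fast"; }
#else
// Skipped: the preprocessor drops these lines, so the compiler proper
// never sees this second definition.
std::string active_path() { return "slow"; }
#endif
```

Only one definition of active_path survives preprocessing, which is why the two definitions do not clash.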
Translation unit in C and C++
A translation unit is not "a header and a source file". It could include a thousand header files (and a thousand source files too).
A translation unit is simply what is commonly known as "a source file" or a ".cpp file" after being preprocessed. If the source file #includes other files, the text of those files gets included in the translation unit by the preprocessor. There is no difference between C and C++ on this matter.
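Because inclusion is purely textual, the same header can end up pasted into one translation unit more than once. The sketch below imitates that in a single file by writing out the contents of a hypothetical header "point.h" twice, the way the preprocessor would if two different headers both included it. The names Point, POINT_H and sum are assumptions made for this example.

```cpp
// --- first textual copy of the hypothetical header "point.h" ---
#ifndef POINT_H
#define POINT_H
struct Point { int x; int y; };
#endif

// --- second copy, as if another header had included "point.h" again ---
#ifndef POINT_H
#define POINT_H
struct Point { int x; int y; };   // skipped: POINT_H is already defined
#endif

// The translation unit ends up with exactly one definition of Point.
int sum(const Point& p) { return p.x + p.y; }
```

Without the include guard, the second copy would redefine Point inside the same translation unit and the compile would fail.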
C: clarification on translation unit
Here's what the C standard has to say about that:
A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. [..] Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.
(Source: C99 draft standard, 5.1.1.1 §1)
So in both of your cases you have two translation units. One of them comes from the compiler preprocessing main.c and everything that is included through #include directives (that is, sub.h and probably <stdio.h> and other headers). The second comes from the compiler doing the same thing with sub.c.
The difference from your first to your second example is that in the latter you are explicitly storing the "different translated translation units" as object files.
Notice that there is no rule associating one object file with any number of translation units. The GNU linker is one example of a linker that is capable of joining two .o files together.
The standard, as far as I know, does not specify the extension of source files. In practice, you are free to #include a .c file into another, or to place your entire program in a .h file. With gcc you can use the option -x c to force a .h file to be treated as the starting point of a translation unit.
The distinction made here:
A source file together with all the headers and source files included via the preprocessing directive #include [...]
is because a header need not be a source file. Similarly, the contents of <...> in an #include directive need not be a valid file name. How exactly the compiler uses the named headers <...> and "..." is implementation-defined.
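The quoted passage says that translation units communicate through identifiers with external linkage. A rough single-file sketch of that idea follows, written in C++ for consistency with the rest of this page. The file names in the comments and the identifiers shared_counter, bump and bump_twice are hypothetical; in a real program the three parts would live in separate files.

```cpp
// What a hypothetical header "counter.h" would declare:
extern int shared_counter;   // object whose identifier has external linkage
void bump();                 // function whose identifier has external linkage

// What a hypothetical "counter.cpp" would define (one translation unit):
int shared_counter = 0;
void bump() { ++shared_counter; }

// What a second translation unit could do after including "counter.h":
int bump_twice() {
    bump();
    bump();
    return shared_counter;
}
```

The second translation unit only needs the declarations to compile; the linker later connects its calls to the single definition.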
Example of a translation unit vs file scope
A translation unit is a source file along with all of its included headers that is compiled as a single unit.
In this example, myprogram.c along with the header stdio.h is one translation unit. The file friendsprogram.c is another translation unit.
Note that this doesn't change when you compile like this:
gcc myprogram.c friendsprogram.c -o out
This command line simply combines compiling and linking into a single step. A temporary object file is created for myprogram.c and another for friendsprogram.c, then those object files are linked to create the file "out".
C++ Translation Unit
A translation unit is basically the chunk of code that you give to a compiler to process. The compiler processes it and produces object code for the linker. The linker combines the object code from all of your translation units to form the executable. (Sometimes you'll see details that vary from this, such as not seeing a file for the object code when you have only one translation unit. The concept is still valid even though implementation details may vary.)
So typically, there is a one-to-one correspondence between .o (or .obj) files produced when compiling and translation units. Also typically, you get one .o file for each .cpp file. Hence, it's typically reasonable to consider each .cpp file to be its own translation unit, until you do something unconventional.
When you use an #include directive, you tell the compiler to replace that one line with the entire contents of the included file. That is, the chunk of code given to the compiler includes the code from both the original file and the included file. If you include one .cpp file into another, the chunk of code given to the compiler will include the code from two .cpp files, breaking the equivalence between .cpp files and translation units. This is generally considered a bad idea.
Let's look at an example. Suppose you had a file named ext.cpp that contained the following:
// No #include <iostream> here: main.cpp includes it before including this file.
namespace
{
    void extFunction()
    {
        std::cout << "Called Unnamed Namespace's function.\n";
    }
}
Also suppose you had a file named main.cpp that contained the following:
#include <iostream>
#include "ext.cpp"
int main()
{
    extFunction();
    return 0;
}
If you were to compile main.cpp, one of the first things the compiler would do is preprocess main.cpp. This does not change the file on disk, but it does change the text that the compiler sees. After preprocessing, the chunk of code that the compiler will process would look like the following.
[lots of code from the library header named "iostream"]
namespace
{
    void extFunction()
    {
        std::cout << "Called Unnamed Namespace's function.\n";
    }
}
int main()
{
    extFunction();
    return 0;
}
At this point, there is no problem calling extFunction since the compiler sees the unnamed namespace in the chunk of code it is processing.
Here is another example, this time showing how unnamed namespaces behave across separate translation units. Suppose you had a file named ext.cpp that contained the following:
#include <iostream>
namespace
{
    void extFunction()
    {
        std::cout << "Called Unnamed Namespace's function in EXT.\n";
    }
}
void extPublic()
{
    extFunction();
}
Let's also provide a header (ext.h) that will declare the function that has external linkage.
void extPublic();
Now move on to main.cpp:
#include <iostream>
#include "ext.h" // <-- Including the header, not the source.
namespace
{
    void extFunction()
    {
        std::cout << "Called Unnamed Namespace's function in MAIN.\n";
    }
}
int main()
{
    extFunction();
    extPublic();
    return 0;
}
Look at that! There are two definitions for the function named extFunction! Won't the linker get confused? Not at all. Those functions are not seen outside their translation units, so there is no conflict. If you compile main.cpp, compile ext.cpp, and link main.o and ext.o into a single executable, you get the following output.
Called Unnamed Namespace's function in MAIN.
Called Unnamed Namespace's function in EXT.
One benefit of an unnamed namespace is that you don't have to worry about conflicting with names in another source file's unnamed namespace. (This becomes a much bigger benefit when your project grows to encompass hundreds of source files.)
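As a small sketch of what "not seen outside their translation units" means in code: both an unnamed namespace and the older static keyword give a function internal linkage. The names helper, legacy_helper and public_api are hypothetical and chosen only for this illustration.

```cpp
namespace {
    // Internal linkage via an unnamed namespace: invisible to other
    // translation units, so another file may define its own helper().
    int helper() { return 1; }
}

// Internal linkage via static: the same effect for a free function.
static int legacy_helper() { return 2; }

// External linkage: the only name other translation units can call.
int public_api() { return helper() + legacy_helper(); }
```

Only public_api would need a declaration in a header; the two helpers stay private to this file.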
Translation unit vs Compilation unit vs object file vs executable vs.... in C++
A translation unit is the same as a compilation unit (so your source file and all the header files it includes).
An object file, in typical cases, is the result of compiling a compilation unit.
An executable file is the result of linking the object file(s) of a project, together with the runtime library functions.
Exactly what files are actually generated during compilation depends on the compiler, but most modern compilers will simply read the source file and headers, then produce the object file, which is passed to the linker directly if you only have one source file. This produces the executable file.
Older compilers would "preprocess" as a separate step, so you'd end up with the whole compilation unit in a temporary file. Similarly, in the old days, instead of generating machine code in the object file, the compiler would output assembler code, which was then run through an assembler to make the object file. Linking remains similar.
Note that this is just common practice. There is nothing in the C or C++ standards about executable files or object files. It's up to the compiler implementation to handle those things in whatever fashion it likes.
C++ compilation vs translation unit
First, translation and compilation units are the same thing. The phrase "translation unit" is used more often than "compilation unit". It basically means your source file including all of its header files.
Second, we (and by "we" I mean good C++ books) use the terms "function template" and "class template" rather than "template function" and "template class".
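For readers unsure what the two terms refer to, here is a minimal sketch. The names max_of and Box are hypothetical and exist only to illustrate the terminology.

```cpp
// A function template: a template from which functions are generated.
template <typename T>
T max_of(T a, T b) { return (a < b) ? b : a; }

// A class template: a template from which classes are generated.
template <typename T>
class Box {
public:
    explicit Box(T value) : value_(value) {}
    T get() const { return value_; }
private:
    T value_;
};
```

Calling max_of(2, 5) makes the compiler generate a function for int, and Box<int> names a class generated from the class template.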
From the documentation:
The text of the program is kept in units called source files in this International Standard. A source file together with all the headers (17.6.1.2) and source files included (16.2) via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion (16.1) preprocessing directives, is called a translation unit.
Also note in the same document they have used the term compilation unit. And if you read carefully the use of the word compilation unit in that document, you will see that they mean the same thing as translation unit.
Now, to sum up the above:
- a compilation unit and a translation unit are the same thing.
- a cpp file alone (without its headers) does not constitute a translation unit (or compilation unit, since they mean the same thing). On the other hand, a cpp file with all of its headers included does constitute a translation/compilation unit.