C++ Header Files, Code Separation

C header files and compilation/linking

Uchia Itachi gave the answer. It's the linker.

Using the GNU C compiler, gcc, you would compile a one-file program like this:

gcc hello.c -o hello # generating the executable hello

To compile a program made of two (or more) files, as described in your example, you would do the following:

gcc -c func.c # generates the object file func.o
gcc -c main.c # generates the object file main.o
gcc func.o main.o -o main # generates the executable main

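Each object file has external symbols (you may think of them as public members). In C, both functions and file-scope (global) variables have external linkage by default. You can give either one internal linkage by declaring it static:

static int func(int i) { /* internal linkage: visible only in this file */
    return ++i;
}

To use a global variable that is defined in another module, declare it there with extern:

/* defined once, in exactly one source file */
int global_variable = 10;

/* declared in every other module (object file) that uses it */
extern int global_variable;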

When it encounters a call to a function that is not defined in the main module, the linker searches all the object files (and libraries) provided as input for the module where the called function is defined. By default you probably have some libraries linked to your program. That's how you can use printf: it's already compiled into a library.

If you are really interested, try some assembly programming. Symbol names are the equivalent of labels in assembly code.
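To make the linkage rules concrete, here is a minimal sketch with the two files collapsed into one listing. The file names counter.h and counter.cpp, and the names call_count, bump and next_id, are made up for illustration. In a real project each marked section would be its own file.

```cpp
// ---- counter.h (hypothetical header) ----
extern int call_count;   // declaration only: no storage is allocated here
int next_id();           // declaration: functions have external linkage by default

// ---- counter.cpp (hypothetical source file) ----
int call_count = 0;      // the single definition of call_count

// 'static' gives bump internal linkage, so no other object file can
// link against it, even if that file declares a function of the same name.
static int bump(int i) {
    return i + 1;
}

// External linkage: the linker can resolve calls to next_id from other files.
int next_id() {
    call_count = bump(call_count);
    return call_count;
}
```

Any other source file that includes the header can call next_id() and read call_count, but a call to bump() from another file would fail at link time.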

Why does C++ need a separate header file?

You seem to be asking about separating definitions from declarations, although there are other uses for header files.

The answer is that C++ doesn't "need" this. If you mark everything inline (which is automatic anyway for member functions defined in a class definition), then there is no need for the separation. You can just define everything in the header files.
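As a rough sketch of that "everything in the header" style: the header name temperature.h and the names celsius_to_fahrenheit and Thermometer are invented for this example.

```cpp
// ---- temperature.h (hypothetical header-only module) ----
#ifndef TEMPERATURE_H
#define TEMPERATURE_H

// 'inline' allows this definition to appear in every translation unit
// that includes the header without causing a multiple-definition error.
inline double celsius_to_fahrenheit(double celsius) {
    return celsius * 9.0 / 5.0 + 32.0;
}

class Thermometer {
public:
    explicit Thermometer(double celsius) : celsius_(celsius) {}

    // Defined inside the class definition, so it is implicitly inline.
    double fahrenheit() const { return celsius_to_fahrenheit(celsius_); }

private:
    double celsius_;
};

#endif
```

No .cpp file is needed for this module. The cost is that every file including the header compiles these bodies again.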

The reasons you might want to separate are:

  1. To improve build times.
  2. To link against code without having the source for the definitions.
  3. To avoid marking everything "inline".

If your more general question is, "why isn't C++ identical to Java?", then I have to ask, "why are you writing C++ instead of Java?" ;-p

More seriously, though, the reason is that the C++ compiler can't just reach into another translation unit and figure out how to use its symbols, in the way that javac can and does. The header file is needed to declare to the compiler what it can expect to be available at link time.

So #include is a straight textual substitution. If you define everything in header files, the preprocessor ends up creating an enormous copy and paste of every source file in your project and feeding that into the compiler. The fact that the C++ standard was ratified in 1998 has nothing to do with this. The cause is that the compilation environment for C++ is based so closely on that of C.

Converting my comments to answer your follow-up question:

How does the compiler find the .cpp file with the code in it?

It doesn't, at least not at the time it compiles the code that used the header file. The functions you're linking against don't even need to have been written yet, never mind the compiler knowing what .cpp file they'll be in. Everything the calling code needs to know at compile time is expressed in the function declaration. At link time you will provide a list of .o files, or static or dynamic libraries, and the header in effect is a promise that the definitions of the functions will be in there somewhere.
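A small sketch of that "promise", again collapsed into one listing. The names square and sum_of_squares and the file names in the comments are hypothetical. The point is that the caller compiles with only the declaration in view.

```cpp
// ---- what a header would contribute: a declaration only ----
int square(int x);   // promise: some object file or library defines this

// ---- caller.cpp: compiles knowing only the declaration above ----
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}

// ---- square.cpp: stands in for a separately compiled file ----
// In a real build this definition would sit in another .o file or library,
// and the linker would match it to the calls in caller.cpp.
int square(int x) {
    return x * x;
}
```

If the definition of square were missing from everything handed to the linker, caller.cpp would still compile. The failure would show up as an "undefined reference" (or similar) at link time.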

Separating class code into a header and cpp file

The class declaration goes into the header file. It is important that you add the #ifndef include guards. Most compilers now also support #pragma once. I have also omitted the private keyword, because C++ class members are private by default.

// A2DD.h
#ifndef A2DD_H
#define A2DD_H

class A2DD
{
    int gx;
    int gy;

public:
    A2DD(int x, int y);
    int getSum();
};

#endif

and the implementation goes in the CPP file:

// A2DD.cpp
#include "A2DD.h"

// The member initializer list sets gx and gy directly.
A2DD::A2DD(int x, int y)
    : gx(x), gy(y)
{
}

int A2DD::getSum()
{
    return gx + gy;
}
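For completeness, here is one possible variant with a caller, shown as a single listing. It is only a sketch: the const qualifier on getSum and the sumOf helper in a hypothetical main.cpp are my additions, not part of the code above.

```cpp
// ---- A2DD.h (variant) ----
#ifndef A2DD_H
#define A2DD_H

class A2DD
{
    int gx;
    int gy;

public:
    A2DD(int x, int y);
    int getSum() const;   // const: callable on const objects, does not modify them
};

#endif

// ---- A2DD.cpp (would #include "A2DD.h") ----
A2DD::A2DD(int x, int y)
    : gx(x), gy(y)
{
}

int A2DD::getSum() const
{
    return gx + gy;
}

// ---- main.cpp (hypothetical caller, would also #include "A2DD.h") ----
int sumOf(int x, int y)
{
    const A2DD pair(x, y);   // the caller only needs the class declaration
    return pair.getSum();
}
```

With real files you would compile A2DD.cpp and main.cpp separately and link the two object files, exactly as with any other multi-file program.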


Why are function bodies in C/C++ placed in separate source code files instead of headers?

Function bodies are placed into .cpp files to achieve the following:

  1. To make the compiler parse and compile them only once, as opposed to forcing it to compile them again and again everywhere the header file is included. Additionally, when functions are implemented in headers, the linker later has to detect and eliminate identical external-linkage functions arriving in different object files.

    Header pre-compilation facilities implemented by many modern compilers might significantly reduce the wasted effort required for repetitive recompilation of the same header file, but they don't entirely eliminate the issue.

  2. To hide the implementations of these functions from the future users of the module or library. Implementation hiding techniques help to enforce certain programming discipline, which reduces parasitic inter-dependencies between modules and thus leads to cleaner code and faster compilation times.

    I'd even say that even if users have access to the full source code of the library (i.e. nothing is really "hidden" from them), a clean separation between what is supposed to be visible through header files and what is not benefits the library's self-documenting properties (although such separation is achievable in header-only libraries as well).

  3. To make some functions "invisible" to the outside world (i.e. internal linkage, not immediately relevant to your example with class methods).

  4. Non-inline functions residing in a specific translation unit can be subjected to certain context-dependent optimizations. For example, two different functions with identical tail portions can end up "sharing" the machine code implementing these identical tails.

    Functions declared as inline in header files are compiled multiple times in different translation units (i.e. in different contexts) and have to be eliminated by the linker later, which makes it more difficult (if at all possible) to take advantage of such optimization opportunities.

  5. Other reasons I might have missed.
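Points 2 and 3 can be sketched roughly like this. The file names and the names checksum and mix are hypothetical. The unnamed namespace is one common C++ way to get internal linkage (static on the function would work too).

```cpp
// ---- checksum.h (hypothetical): the only thing users of the module see ----
unsigned checksum(const char* text);

// ---- checksum.cpp (hypothetical) ----
namespace {

// Lives in an unnamed namespace, so it has internal linkage:
// other translation units cannot call it or depend on it.
unsigned mix(unsigned accumulator, unsigned char byte) {
    return accumulator * 31u + byte;
}

}  // namespace

// Public entry point with external linkage, declared in the header.
unsigned checksum(const char* text) {
    unsigned accumulator = 0;
    for (; *text != '\0'; ++text) {
        accumulator = mix(accumulator, static_cast<unsigned char>(*text));
    }
    return accumulator;
}
```

Because mix never appears in the header, its body can be changed or removed without forcing any user of checksum.h to recompile.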

C - properly partitioning the code into multiple files

There is no language-defined rule about file organization. In fact, all the files included into your source file form a so-called "translation unit" (also called a compilation unit), which is produced by the preprocessor and treated by the compiler as a single file. The preprocessor's #include directive does literally that: it includes the other file into the file currently being processed. Thus it is essential to ensure that files are included only once.

A classic way to do so is to use the preprocessor's #ifndef directive. Most compilers also support the #pragma once directive, but it is not a standard tool, and it sometimes behaves unpredictably in the case of circular includes.
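Here is a sketch of what the guard buys you, with the header text pasted twice by hand to imitate two #include directives reaching the same file. The header name point.h and the names Point and manhattan are invented for the example.

```cpp
#include <cstdlib>   // std::abs for int

// ---- first inclusion of point.h (hypothetical header) ----
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif

// ---- second inclusion of the same text ----
// POINT_H is already defined, so the preprocessor skips everything up to
// #endif and the compiler never sees a second definition of Point.
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif

// Uses Point, which was defined exactly once despite the double inclusion.
int manhattan(Point p) {
    return std::abs(p.x) + std::abs(p.y);
}
```

Without the #ifndef lines, the second copy would be a redefinition of struct Point and the translation unit would not compile.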

It is accepted practice to call files that contain reusable declarations and definitions "headers" and to give them the extension .h (or sometimes .hpp). C++ standard headers have no extension at all, to avoid mixing them up with C headers. E.g. stdio.h is superseded by cstdio, and the former is best avoided in C++. Sometimes there is a requirement to include large repetitive definitions, and those files can get different extensions such as .inc or .incl.

So why do people say "never include .c files"? It's related to the toolchain, the set of utilities that controls the application's build process. You would almost never run the compiler manually. There is usually a build tool that decides what to do with each file based on its extension. .c files are usually considered to be separate compilation modules, so the tool runs the compiler for EACH of them before trying to link them together.

Headers should not define functions or variables; they can declare them for external linkage. Why? If you include a header that defines a variable into several translation units, you'll get an error from the linker, because all of the units will contain the same symbol. A definition must be unique, or the program is considered ill-formed. Functions have external linkage by default.

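/* myheader.h */
#ifndef MYHEADER_H /* names starting with underscores are reserved */
#define MYHEADER_H

extern int globalVariable; /* declaration only */

typedef struct MyType {
    /* members go here */
} MyType;

int foo(MyType param);

#endif

/* myheader.cpp */
#include "myheader.h"

int globalVariable = 5; /* the single definition */

int foo(MyType param)
{
    /* body of foo */
    return 0;
}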

How files are organized in a project is up to the designer. If you're part of a team, you're expected to follow the recommendations of that team's lead designer or their approved documentation.

Is it a good practice to place C++ definitions in header files?

Your coworker is wrong. The common way is, and always has been, to put code in .cpp files (or whatever extension you like) and declarations in headers.

There is occasionally some merit to putting code in the header, since this can allow more clever inlining by the compiler. But at the same time, it can destroy your compile times, since all the code has to be processed by the compiler every time the header is included.

Finally, circular object relationships (which are sometimes desired) are often annoying to handle when all the code is in the headers.

Bottom line, you were right, he is wrong.

EDIT: I have been thinking about your question. There is one case where what he says is true: templates. Many newer "modern" libraries such as boost make heavy use of templates and are often "header only." However, this should only be done when dealing with templates, where it is generally the only practical way to do it.
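A minimal sketch of the template case, using a made-up header clamp_to.h and function clamp_to. The compiler needs the full body in view at the point of use in order to instantiate it for each type, which is why the definition sits in the header.

```cpp
// ---- clamp_to.h (hypothetical header) ----
#ifndef CLAMP_TO_H
#define CLAMP_TO_H

// The definition lives in the header: each translation unit that calls
// clamp_to<T> instantiates it for the T it uses.
template <typename T>
T clamp_to(T value, T low, T high) {
    if (value < low) {
        return low;
    }
    if (high < value) {
        return high;
    }
    return value;
}

#endif
```

A caller can write clamp_to(5, 0, 3) or clamp_to(-1.5, 0.0, 1.0), and a separate instantiation is generated for int and for double.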

EDIT: Some people would like a little more clarification, here's some thoughts on the downsides to writing "header only" code:

If you search around, you will see quite a lot of people trying to find a way to reduce compile times when dealing with boost. For example: How to reduce compilation times with Boost Asio, which is seeing a 14s compile of a single 1K file with boost included. 14s may not seem to be "exploding", but it is certainly a lot longer than typical and can add up quite quickly when dealing with a large project. Header only libraries do affect compile times in a quite measurable way. We just tolerate it because boost is so useful.

Additionally, there are many things which cannot be done in headers only (even boost has libraries you need to link to for certain parts such as threads, filesystem, etc). A primary example is that you cannot have simple global objects in header-only libs (unless you resort to the abomination that is a singleton), as you will run into multiple definition errors. NOTE: C++17's inline variables make this particular example doable.
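A sketch of the C++17 inline variable approach, assuming a compiler in C++17 mode or later. The header name settings.h and the names verbosity, set_verbosity and is_verbose are invented for illustration.

```cpp
// ---- settings.h (hypothetical header-only module, needs C++17) ----
#ifndef SETTINGS_H
#define SETTINGS_H

// 'inline' on a variable: every translation unit that includes this header
// may contain the definition, and the program still ends up with one object.
inline int verbosity = 1;

inline void set_verbosity(int level) {
    verbosity = level;
}

inline bool is_verbose() {
    return verbosity >= 2;
}

#endif
```

Before C++17, the same header without the inline keyword on the variable would produce multiple definition errors as soon as two source files included it.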

As a final point, when using boost as an example of header only code, a huge detail often gets missed.

Boost is a library, not user-level code, so it doesn't change that often. In user code, if you put everything in headers, every little change will force you to recompile the entire project. That's a monumental waste of time (and is not the case for libraries that don't change from compile to compile). When you split things between header and source and, better yet, use forward declarations to reduce includes, you can save hours of recompiling when added up across a day.
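A rough sketch of the forward declaration trick, with three hypothetical files (car.h, engine.h, car.cpp) shown in one listing. The class names Car and Engine are made up.

```cpp
// ---- car.h (hypothetical) ----
class Engine;   // forward declaration instead of #include "engine.h"

class Car {
public:
    explicit Car(Engine* engine) : engine_(engine) {}
    int horsepower() const;   // defined in car.cpp, where Engine is complete

private:
    Engine* engine_;          // a pointer to an incomplete type is allowed
};

// ---- engine.h (hypothetical) ----
class Engine {
public:
    explicit Engine(int horsepower) : horsepower_(horsepower) {}
    int horsepower() const { return horsepower_; }

private:
    int horsepower_;
};

// ---- car.cpp (would #include "car.h" and "engine.h") ----
// Only this file needs the full definition of Engine.
int Car::horsepower() const {
    return engine_->horsepower();
}
```

Files that include only car.h do not need to be recompiled when engine.h changes, because car.h never pulls it in.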


