Why Have Header Files and .Cpp Files

Why have header files and .cpp files?

Well, the main reason would be for separating the interface from the implementation. The header declares "what" a class (or whatever is being implemented) will do, while the cpp file defines "how" it will perform those features.
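A minimal sketch of that split, shown as a single listing for brevity. The Counter class and the file names counter.h and counter.cpp are hypothetical, not taken from the question; in a real project the two parts would live in separate files.

```cpp
// ---- counter.h (hypothetical): the interface, i.e. "what" ----
#ifndef COUNTER_H
#define COUNTER_H

class Counter {
public:
    void increment();    // declared here, defined elsewhere
    int value() const;

private:
    int count_ = 0;
};

#endif // COUNTER_H

// ---- counter.cpp (hypothetical): the implementation, i.e. "how" ----
// In a real project this file would start with: #include "counter.h"

void Counter::increment() { ++count_; }

int Counter::value() const { return count_; }
```

Code that uses Counter only needs the first half. The bodies in the second half can change without touching the header.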

This reduces dependencies so that code that uses the header doesn't necessarily need to know all the details of the implementation and any other classes/headers needed only for that. This will reduce compilation times and also the amount of recompilation needed when something in the implementation changes.

It's not perfect, and you would usually resort to techniques like the Pimpl Idiom to properly separate interface and implementation, but it's a good start.
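For reference, here is a rough sketch of what the Pimpl idiom can look like. Widget, its name() accessor, and the file names are made up for illustration, and both halves are shown in one listing although they would normally be separate files.

```cpp
#include <memory>
#include <string>
#include <utility>

// ---- widget.h (hypothetical): clients see only this part ----
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                       // defined where Impl is complete
    const std::string& name() const;

private:
    struct Impl;                     // forward declaration only
    std::unique_ptr<Impl> impl_;     // opaque pointer to the hidden state
};

// ---- widget.cpp (hypothetical): private details live here ----
struct Widget::Impl {
    std::string name;
};

Widget::Widget(std::string name) : impl_(new Impl{std::move(name)}) {}

Widget::~Widget() = default;         // Impl is a complete type at this point

const std::string& Widget::name() const { return impl_->name; }
```

Because the data members sit inside Impl, adding or removing one changes only the .cpp part, so files that include the header do not need to be recompiled.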

Why does C++ need a separate header file?

You seem to be asking about separating definitions from declarations, although there are other uses for header files.

The answer is that C++ doesn't "need" this. If you mark everything inline (which is automatic anyway for member functions defined in a class definition), then there is no need for the separation. You can just define everything in the header files.
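A small sketch of such a header-only setup. The file name geometry.h and the names square and Rect are hypothetical examples.

```cpp
// ---- geometry.h (hypothetical): everything lives in the header ----
#ifndef GEOMETRY_H
#define GEOMETRY_H

// A free function defined in a header is marked inline so that several
// translation units can include it without a multiple-definition link error.
inline int square(int v) { return v * v; }

class Rect {
public:
    Rect(int w, int h) : w_(w), h_(h) {}

    // Member functions defined inside the class body are implicitly inline.
    int area() const { return w_ * h_; }

private:
    int w_;
    int h_;
};

#endif // GEOMETRY_H
```

The trade-off is that every file including this header recompiles these bodies whenever they change.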

The reasons you might want to separate are:

To improve build times.
To link against code without having the source for the definitions.
To avoid marking everything "inline".

If your more general question is, "why isn't C++ identical to Java?", then I have to ask, "why are you writing C++ instead of Java?" ;-p

More seriously, though, the reason is that the C++ compiler can't just reach into another translation unit and figure out how to use its symbols, in the way that javac can and does. The header file is needed to declare to the compiler what it can expect to be available at link time.

So #include is a straight textual substitution. If you define everything in header files, the preprocessor ends up creating an enormous copy and paste of every source file in your project and feeding that into the compiler. The fact that the C++ standard was ratified in 1998 has nothing to do with this; the cause is that the compilation environment for C++ is based so closely on that of C.

Converting my comments to answer your follow-up question:

How does the compiler find the .cpp file with the code in it

It doesn't, at least not at the time it compiles the code that used the header file. The functions you're linking against don't even need to have been written yet, never mind the compiler knowing what .cpp file they'll be in. Everything the calling code needs to know at compile time is expressed in the function declaration. At link time you will provide a list of .o files, or static or dynamic libraries, and the header in effect is a promise that the definitions of the functions will be in there somewhere.
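To illustrate the "promise", here is a minimal sketch in one listing. The functions add and add_three are hypothetical; in practice the declaration would come from a header and the definition from another object file or library.

```cpp
// ---- what the calling code sees (normally via a header) ----
int add(int a, int b);               // declaration: a promise to the linker

// The compiler can compile these calls using the declaration alone.
int add_three(int a, int b, int c) {
    return add(add(a, b), c);
}

// ---- what the linker finds later (normally in another .o file or library) ----
int add(int a, int b) { return a + b; }
```

If the definition at the bottom were missing from everything passed to the linker, compilation of add_three would still succeed and the failure would show up only as an unresolved symbol at link time.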

What's the REAL difference between .h and .cpp files?

Practically: the conventions around .h files are in place so that you can safely include that file in multiple other files in your project. Header files are designed to be shared, while code files are not.

Let's take your example of defining functions or variables. Suppose your header file contains the following line:

header.h:

int x = 10;

code.cpp:

#include "header.h"

Now, if you only have one code file and one header file this probably works just fine:

g++ code.cpp -o outputFile

However, if you have two code files this breaks:

header.h:

int x = 10;

code1.cpp:

#include "header.h"

code2.cpp:

#include "header.h"

And then:

g++ code1.cpp -c  (produces code1.o)
g++ code2.cpp -c  (produces code2.o)
g++ code1.o code2.o -o outputFile

This breaks, specifically at the linker step, because now you have two symbols in the same executable that have the same name, and the linker doesn't know what it's supposed to do with that. When you include your header in code1 you get a symbol "x", and when you include your header in code2 you get another symbol "x". The linker doesn't know your intention here, so it throws an error:

code2.o:(.data+0x0): multiple definition of `x'
code1.o:(.data+0x0): first defined here
collect2: error: ld returned 1 exit status

Which again is just the linker saying that it can't resolve the fact that you now have two symbols with the same name in the same executable.
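One common way around this, sketched here under the assumption that x is meant to be a single shared variable, is to put only a declaration in the header and keep the definition in exactly one code file. The helper read_x is hypothetical, and the three parts are shown in one listing for brevity.

```cpp
// ---- header.h, adjusted: declare the variable, do not define it ----
extern int x;   // declaration only; safe to include in any number of files

// ---- code1.cpp: exactly one file provides the definition ----
int x = 10;

// ---- code2.cpp: other files include header.h and simply use x ----
int read_x() { return x; }
```

With this layout each object file refers to the same x, and only code1.o carries its definition. Since C++17, marking the variable inline in the header is another option.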

Why do we include header files and not source files?

The moment you include Bob.h, the compiler has everything it needs to know about PrintSomething(): it only needs a declaration of the function. Frank.cpp does not need to know about Bob.cpp, which defines PrintSomething().

Each of your individual .cpp files is compiled into an object file. These don't do much by themselves until they're all glued together, which is the linker's responsibility.

The linker takes all your object files and fills in the missing parts:

Linker talk:

Hey, I see that Frank.obj uses PrintSomething() and I can't see
its definition in that object file.
Let's check the other object files..
Upon inspecting Bob.obj I can see that this contains a usable
definition for PrintSomething(), let's use that.

This is of course simplified but that's what a linker does in short.

After this is done you get your usable executable.

on top of which if I were to now say in my Frank.cpp file: void MyClass::PrintSomething(){std::cout << "Bye";} and included the Bob.h
file in my main.cpp and called the PrintSomething() function would it
print "Hello" or "Bye"? Is the computer psychic or something?

The linker would find two definitions of PrintSomething() and would emit an error, because it has no way to know which definition is the right one to pick.

Why are .h files included in .cpp files and not vice versa?

This seems to be the right etiquette anyway.
It's not a matter of etiquette: .cpp files are treated as translation units and handled as such by any decent compiler or build system.

The .h files contain all the declarations needed to be seen for use in other translation units.

The compiled .cpp files are stitched together in the final linking phase.

This way though, I'm not including the .cpp file anywhere, and currently my code is complaining about not finding the functions therein. What am I doing wrong?

You probably missed the last step: linking the generated object files together.

As for your edit:

You're showing a template declaration now, which is a more special case. The compiler needs to see the template definitions to get them instantiated correctly.

You can read more about the details here: Why can templates only be implemented in the header file?
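As a rough illustration of the template case, the sketch below keeps the whole function template in the header. The file name largest.h and the function largest are hypothetical.

```cpp
// ---- largest.h (hypothetical): declaration and definition together ----
#ifndef LARGEST_H
#define LARGEST_H

// The full body is visible to every file that includes this header, so the
// compiler can instantiate largest<int>, largest<double>, ... on demand.
template <typename T>
T largest(T a, T b) {
    return (a < b) ? b : a;
}

#endif // LARGEST_H
```

If only the declaration were in the header and the body sat in a .cpp file, other translation units would have nothing to instantiate from, which typically surfaces as an unresolved symbol at link time.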

Why should I not include cpp files and instead use a header?

To the best of my knowledge, the C++ standard knows no difference between header files and source files. As far as the language is concerned, any text file with legal code is the same as any other. However, although not illegal, including source files into your program will pretty much eliminate any advantages you would've got from separating your source files in the first place.

Essentially, what #include does is tell the preprocessor to take the entire file you've specified, and copy it into your active file before the compiler gets its hands on it. So when you include all the source files in your project together, there is fundamentally no difference between what you've done, and just making one huge source file without any separation at all.

"Oh, that's no big deal. If it runs, it's fine," I hear you cry. And in a sense, you'd be correct. But right now you're dealing with a tiny tiny little program, and a nice and relatively unencumbered CPU to compile it for you. You won't always be so lucky.

If you ever delve into the realms of serious computer programming, you'll be seeing projects with line counts that can reach millions, rather than dozens. That's a lot of lines. And if you try to compile one of these on a modern desktop computer, it can take a matter of hours instead of seconds.

"Oh no! That sounds horrible! However can I prevent this dire fate?!" Unfortunately, there's not much you can do about that. If it takes hours to compile, it takes hours to compile. But that only really matters the first time -- once you've compiled it once, there's no reason to compile it again.

Unless you change something.

Now, if you had two million lines of code merged together into one giant behemoth, and need to do a simple bug fix such as, say, x = y + 1, that means you have to compile all two million lines again in order to test this. And if you find out that you meant to do a x = y - 1 instead, then again, two million lines of compile are waiting for you. That's many hours of time wasted that could be better spent doing anything else.

"But I hate being unproductive! If only there was some way to compile distinct parts of my codebase individually, and somehow link them together afterwards!" An excellent idea, in theory. But what if your program needs to know what's going on in a different file? It's impossible to completely separate your codebase unless you want to run a bunch of tiny tiny .exe files instead.

"But surely it must be possible! Programming sounds like pure torture otherwise! What if I found some way to separate interface from implementation? Say by taking just enough information from these distinct code segments to identify them to the rest of the program, and putting them in some sort of header file instead? And that way, I can use the #include preprocessor directive to bring in only the information necessary to compile!"

Hmm. You might be on to something there. Let me know how that works out for you.

What is the difference between a .cpp file and a .h file?

The C++ build system (compiler) knows no difference, so it's all a matter of convention.

The convention is that .h files are declarations, and .cpp files are definitions.

That's why .h files are #included -- we include the declarations.

How is the header file connected to the corresponding .cpp file?

Short answer: there is no relationship between the header and its implementation. One can exist without the other, or the two could be placed in files with unrelated names.

while compiling the main.cpp how will the compiler know to look for the definition of any function mentioned in yum.h in yum.cpp?

The compiler has no idea. Each time it sees a reference to something declared in yum.h, or in any other header file for that matter, it stays on the lookout for the corresponding definition.

If the definition is not there by the time the compiler has reached the end of the translation unit, it writes the unsatisfied references into its main.o output, noting the places they come from. This record is called a symbol table.

Then the compiler compiles yum.cpp, finds definitions from yum.h in it, and writes their positions into yum.o's symbol table.

Once all .cpp files have been processed, the linker grabs all the .o files and builds a combined symbol table from them. If unsatisfied references remain, it issues an error. Otherwise, it links references from main.o with the corresponding symbols from yum.o, completing the process.

Consider an example: let's say yum.h declares a global variable yum (extern int yum;) that is defined in yum.cpp (int yum = 0;), and main.cpp prints that variable. The compiler produces main.o with a symbol table saying "I need int yum's definition at address 1234", and yum.o with a symbol table saying "I have int yum at address 9876". The linker matches the "I need" with the "I have" by placing 9876 at the address 1234.
