Why Are Preprocessor Macros Evil and What Are the Alternatives

Why are preprocessor macros evil and what are the alternatives?

Macros are just like any other tool - a hammer used in a murder is not evil because it's a hammer. It is evil because of the way the person uses it. If you want to hammer in nails, a hammer is a perfect tool.

There are a few aspects to macros that make them "bad" (I'll expand on each later, and suggest alternatives):

  1. You cannot debug macros.
  2. Macro expansion can lead to strange side effects.
  3. Macros have no "namespace", so a macro whose name clashes with a name used elsewhere gets substituted where you didn't want it, and this usually leads to strange error messages.
  4. Macros may affect things you don't realize.

So let's expand a little here:

1) Macros can't be debugged.
When you have a macro that translates to a number or a string, the source code will have the macro name, and many debuggers can't "see" what the macro translates to. So you don't actually know what is going on.

Replacement: Use enum or const T
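
A minimal sketch of that replacement (the names here are just for illustration):

#include <cstddef>

// Instead of an object-like macro, which many debuggers can't show...
#define BUFFER_SIZE 512

// ...use an enum or a typed constant, which shows up like any other value:
enum { max_retries = 5 };
const std::size_t buffer_size = 512;   // constexpr since C++11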

For "function-like" macros, because the debugger works on a "per source line where you are" level, your macro will act like a single statement, no matter if it's one statement or a hundred. Makes it hard to figure out what is going on.

Replacement: Use functions - inline if it needs to be "fast" (but beware that too much inlining is not a good thing).

2) Macro expansions can have strange side effects.

The famous one is #define SQUARE(x) ((x) * (x)) and the use x2 = SQUARE(x++). That expands to x2 = (x++) * (x++);, which, even if it were valid code [1], would almost certainly not be what the programmer wanted. If SQUARE were a function, it would be fine to write SQUARE(x++), and x would only be incremented once.
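
A minimal sketch of the function version, just to illustrate the point above (the square function is purely illustrative):

#include <iostream>

inline int square(int x) { return x * x; }

int main()
{
    int x = 5;
    int x2 = square(x++);                   // the argument is evaluated exactly once
    std::cout << x2 << " " << x << "\n";    // prints "25 6"
}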

Another example is "if else" in macros, say we have this:

#define safe_divide(res, x, y)   if (y != 0) res = x/y;

and then

if (something) safe_divide(b, a, x);
else printf("Something is not set...");

It becomes completely the wrong thing: the macro's trailing semicolon ends the inner if and leaves an extra empty statement behind, so the else has no if to pair with and the code doesn't even compile. Remove that semicolon and it compiles, but then the else binds to the if inside the macro rather than to if (something), so the message is printed when something is true but x is zero, instead of when something is false.

Replacement: real functions.
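
A minimal sketch of what that might look like (the reference parameter is just one reasonable choice of signature):

#include <cstdio>

inline void safe_divide(int &res, int x, int y)
{
    if (y != 0) res = x / y;
}

// The if/else now pairs up exactly as written:
// if (something) safe_divide(b, a, x); else printf("Something is not set...\n");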

3) Macros have no namespace

If we have a macro:

#define begin() x = 0

and we have some code in C++ that uses begin:

std::vector<int> v;

... stuff is loaded into v ...

for (std::vector<int>::iterator it = v.begin(); it != v.end(); ++it)
    std::cout << ' ' << *it;

Now, what error message do you think you get, and where do you look for the error, assuming you have completely forgotten - or didn't even know about - the begin macro that lives in some header file that someone else wrote? (It gets even more fun if you included that macro before the include - you'd be drowning in strange errors that make absolutely no sense when you look at the code itself.)

Replacement: Well, there isn't so much a replacement as a "rule" - only use all-uppercase names for macros, and never use all-uppercase names for anything else.

4) Macros have effects you don't realize

Take this code:

#define begin() x = 0
#define end() x = 17

... a few thousand lines of stuff here ...

void dostuff()
{
    int x = 7;

    begin();

    ... more code using x ...

    printf("x=%d\n", x);

    end();
}

Now, without looking at the macro, you would think that begin is a function, which shouldn't affect x.

This sort of thing, and I've seen much more complex examples, can REALLY mess up your day!

Replacement: Either don't use a macro to set x, or pass x in as an argument.
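
A minimal sketch of the second option, passing x in, so that the effect on x is visible at the call site:

#include <cstdio>

inline void begin(int &x) { x = 0; }
inline void end(int &x) { x = 17; }

void dostuff()
{
    int x = 7;
    begin(x);               // it is now obvious that x may change here
    printf("x=%d\n", x);
    end(x);
}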

There are times when using macros is definitely beneficial. One example is to wrap a function with macros to pass on file/line information:

#define malloc(x) my_debug_malloc(x, __FILE__, __LINE__)
#define free(x) my_debug_free(x, __FILE__, __LINE__)

Now we can use my_debug_malloc as the regular malloc in the code, but it has extra arguments, so at the end, when we scan the list of memory elements that haven't been freed, we can print where each allocation was made so the programmer can track down the leak.
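
A minimal sketch of what such wrappers might look like (the bookkeeping is only hinted at; a real version would record each allocation in a table):

// Note: this file must not define the malloc/free wrapper macros above,
// otherwise the calls to the real malloc/free below would expand recursively.
#include <cstdio>
#include <cstdlib>

void *my_debug_malloc(std::size_t size, const char *file, int line)
{
    void *p = std::malloc(size);
    // A real implementation would record p, size, file and line here.
    std::fprintf(stderr, "alloc %p (%zu bytes) at %s:%d\n", p, size, file, line);
    return p;
}

void my_debug_free(void *p, const char *file, int line)
{
    // A real implementation would remove p from its records here.
    std::fprintf(stderr, "free  %p at %s:%d\n", p, file, line);
    std::free(p);
}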

[1] It is undefined behaviour to update one variable more than once between sequence points. A sequence point is not exactly the same as a statement, but for most intents and purposes, that's what we should consider it to be. So x++ * x++ updates x twice without an intervening sequence point, which is undefined behaviour and will probably produce different values on different systems, as well as a different resulting value in x.

When are C++ macros beneficial?

As wrappers for debug functions, to automatically pass things like __FILE__, __LINE__, etc:

#ifdef DEBUG
#define M_DebugLog( msg ) std::cout << __FILE__ << ":" << __LINE__ << ": " << msg
#else
#define M_DebugLog( msg )
#endif

Since C++20, however, the magic type std::source_location can be used instead of __LINE__ and __FILE__ to implement an analogue as a normal function (template).

Difference between preprocessor macros and std::source_location

Preprocessor macros live outside the type system. Preprocessor macro substitution happens outside the rest of the language. See the answers above for a comprehensive discussion of the disadvantages of using the preprocessor.

std::source_location on the other hand behaves like any other C++ struct. It has plain value fields that are typed and behave like any other values in the language.

Besides that, functionality-wise the two mechanisms are equivalent. There is nothing that one can achieve that cannot be done by the other (apart from the column field in source_location, which has no equivalent in the preprocessor). It's just that the new approach achieves its goals more cleanly.
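
A minimal sketch of the C++20, function-based equivalent of the logging macro above (the function name is just for illustration):

#include <iostream>
#include <source_location>
#include <string_view>

void debug_log(std::string_view msg,
               std::source_location loc = std::source_location::current())
{
    // The default argument is evaluated at the call site,
    // so loc reports the caller's file and line.
    std::cout << loc.file_name() << ":" << loc.line() << ": " << msg << '\n';
}

int main()
{
    debug_log("something happened");
}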

What is the worst real-world macros/pre-processor abuse you've ever come across?

From memory, it looked something like this:

#define RETURN(result) return (result);}

int myfunction1(args) {
int x = 0;
// do something
RETURN(x)

int myfunction2(args) {
int y = 0;
// do something
RETURN(y)

int myfunction3(args) {
int z = 0;
// do something
RETURN(z)

Yes, that's right - no closing braces in any of the functions. Syntax highlighting was a mess, so he used vi to edit (not vim - vim has syntax coloring!).

He was a Russian programmer who had mostly worked in assembly language. He was fanatical about saving as many bytes as possible because he had previously worked on systems with very limited memory. "It was for satellite. Only very few byte, so we use each byte over for many things." (bit fiddling, reusing machine instruction bytes for their numeric values) When I tried to find out what kinds of satellites, I was only able to get "Orbiting satellite. For making to orbit."

He had two other quirks: A convex mirror mounted above his monitor "For knowing who is watching", and an occasional sudden exit from his chair to do a quick ten pushups. He explained this last one as "Compiler found error in code. This is punishment".

What would make C++ preprocessor macros an accepted development tool?

Most preprocessor abuse comes from misunderstanding. To quote Paul Mensonides (the author of the Boost.Preprocessor library):

Virtually all issues related to the misuse of the preprocessor stems from attempting to make object-like macros look like constant variables and function-like macro invocations look like underlying-language function calls. At best, the correlation between function-like macro invocations and function calls should be incidental. It should never be considered to be a goal. That is a fundamentally broken mentality.

As the preprocessor is well integrated into C++, it's easier to blur the line, and most people don't see a difference. For example, ask someone to write a macro to add two numbers together, and most people will write something like this:

#define ADD(x, y) ((x) + (y))

This is completely wrong. Run this through the preprocessor:

#define ADD(x, y) ((x) + (y))
ADD(1, 2) // outputs ((1) + (2))

But the answer should be 3, since adding 1 to 2 gives 3. Yet instead a macro is written that generates a C++ expression. Not only that, it could be thought of as a C++ function, but it's not. This is where it leads to abuse: it's just generating a C++ expression, and a function is a much better way to go.
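
For contrast, a macro that really adds at the preprocessing level has to do the arithmetic in the preprocessor itself, for example with Boost.Preprocessor (a sketch; for ordinary code, the plain function is still the better choice):

#include <boost/preprocessor/arithmetic/add.hpp>

// BOOST_PP_ADD(1, 2) expands to the token 3 during preprocessing.
// The ordinary-code answer, by contrast, is simply a function:
constexpr int add(int x, int y) { return x + y; }

static_assert(BOOST_PP_ADD(1, 2) == add(1, 2), "both give 3");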

Furthermore, macros don't work like functions at all. The preprocessor works through a process of scanning and expanding macros, which is very different from using a call stack to call functions.

There are times when it can be acceptable for macros to generate C++ code, as long as it isn't blurring the lines. Just as you could use Python as a preprocessor to generate code, the C++ preprocessor can do the same, and it has the advantage that it doesn't need an extra build step.

Also, the preprocessor can be used to implement DSLs, but such DSLs have a predefined grammar in the preprocessor that is used to generate C++ code. It's not really blurring the lines, since a different grammar is involved.

Are preprocessor directives processed before macros are expanded?

Yes and no. Each preprocessor directive defines its own interaction with macro replacement. The general rule is (C++11 16/6):

The preprocessing tokens within a preprocessing directive are not subject to macro expansion unless otherwise
stated.

Another relevant general rule is 16/1:

A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints: The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character. The last token in the sequence is the first new-line character that follows the first token in the sequence. A new-line character ends the preprocessing directive even if it occurs within what would otherwise be an invocation of a function-like macro.

(Translation phase 4 is preprocessing).

Some rules for individual directives:

  • #if and #elif expand macros in their arguments, except for arguments of defined (16.1/4).

  • #include expands macros in its arguments; they must eventually expand to a string delimited by "" or <>.

  • #line expands macros in its arguments; they must eventually expand to valid syntax for non-macro arguments to #line (16.4/5)
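
A small sketch of the first two rules in action (the macro names are made up for illustration):

#define VALUE 10
#define HEADER <cstdio>

#if VALUE > 5        // VALUE is macro-expanded before the condition is evaluated
// this branch is compiled
#endif

#include HEADER      // HEADER is macro-expanded to <cstdio> before the file is looked up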

Is it OK to use #define preprocessor directive for function header (declaration/definition) in C?

You can just use a regular typedef to be type safe. It avoids the pitfalls of the pre-processor and doesn't require inventing your own language within a language. Your colleagues have a point.

typedef void EventCB(Event, EventArgs *);

And use it as you'd expect:

static EventCB onEvent1;
static EventCB onEvent2;
static EventCB onEvent3;

You'll need to repeat the prototype when defining the functions, but the compiler will do type checking and warn you of any mistakes. Another plus of working with the type system is the ability to use the same type to declare pointers:

EventCB *func_ptr = &onEvent1;
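
A minimal sketch of how that plays out end to end (EventCB and onEvent1 come from the question; the placeholder types and bodies are made up here):

/* Placeholder types standing in for the real Event and EventArgs. */
typedef int Event;
typedef struct { int code; } EventArgs;

typedef void EventCB(Event, EventArgs *);

static EventCB onEvent1;                          /* declaration via the typedef */

static void onEvent1(Event e, EventArgs *args)    /* definition repeats the signature */
{
    (void)e;
    (void)args;                                   /* ... handle the event ... */
}

int main(void)
{
    EventCB *func_ptr = &onEvent1;                /* the same type declares pointers */
    EventArgs args = { 42 };
    func_ptr(1, &args);                           /* call through the pointer */
    return 0;
}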

