What Is The Reason for Having Unreserved Identifiers as Built-In Macros in Gcc

What is the reason for having unreserved identifiers as built-in macros in gcc?

gcc does not fully conform to any C standard by default.

Invoke it with -ansi, -std=c99, or -std=c1x and unix won't be predefined. (-std=c1x ~~will probably become~~ became -std=c11 in a ~~future~~ more recent gcc release.)

It's a bit confusing that this is documented in the separate manual for the GNU preprocessor, not in the gcc manual.

Quoting the GNU preprocessor documentation (info cpp, version 4.5):

The C standard requires that all system-specific macros be part of
the "reserved namespace". All names which begin with two underscores,
or an underscore and a capital letter, are reserved for the compiler
and library to use as they wish. However, historically
system-specific macros have had names with no special prefix; for
instance, it is common to find `unix' defined on Unix systems. For
all such macros, GCC provides a parallel macro with two underscores
added at the beginning and the end. If `unix' is defined,
`__unix__' will be defined too. There will never be more than two
underscores; the parallel of `_mips' is `__mips__'.

When the `-ansi' option, or any `-std' option that requests strict
conformance, is given to the compiler, all the system-specific
predefined macros outside the reserved namespace are suppressed. The
parallel macros, inside the reserved namespace, remain defined.

We are slowly phasing out all predefined macros which are outside the
reserved namespace. You should never use them in new programs, and we
encourage you to correct older code to use the parallel macros
whenever you find it. We don't recommend you use the system-specific
macros that are in the reserved namespace, either. It is better in
the long run to check specifically for features you need, using a tool
such as `autoconf'.

The current version of the manual is here.

What is the reason for having unreserved identifiers as built-in macros in gcc?

gcc does not fully conform to any C standard by default.

Invoke it with -ansi, -std=c99, or -std=c1x and unix won't be predefined. (-std=c1x ~~will probably become~~ became -std=c11 in a ~~future~~ more recent gcc release.)

It's a bit confusing that this is documented in the separate manual for the GNU preprocessor, not in the gcc manual.

Quoting the GNU preprocessor documentation (info cpp, version 4.5):

The C standard requires that all system-specific macros be part of
the "reserved namespace". All names which begin with two underscores,
or an underscore and a capital letter, are reserved for the compiler
and library to use as they wish. However, historically
system-specific macros have had names with no special prefix; for
instance, it is common to find `unix' defined on Unix systems. For
all such macros, GCC provides a parallel macro with two underscores
added at the beginning and the end. If `unix' is defined,
`__unix__' will be defined too. There will never be more than two
underscores; the parallel of `_mips' is `__mips__'.

When the `-ansi' option, or any `-std' option that requests strict
conformance, is given to the compiler, all the system-specific
predefined macros outside the reserved namespace are suppressed. The
parallel macros, inside the reserved namespace, remain defined.

We are slowly phasing out all predefined macros which are outside the
reserved namespace. You should never use them in new programs, and we
encourage you to correct older code to use the parallel macros
whenever you find it. We don't recommend you use the system-specific
macros that are in the reserved namespace, either. It is better in
the long run to check specifically for features you need, using a tool
such as `autoconf'.

The current version of the manual is here.

C identifier names: What goes with which compiler?

The rules in the 2018 C standard for identifiers include:

Per 6.4.2.1 1, an identifier is a sequence of identifier-nondigit and digit characters, starting with an identifier-nondigit.
An identifier-nodigit is _, a to z, A to Z, a universal-character-name, or “other implementation-defined characters”.
A digit is 0 to 9.
A universal-character-name is \u followed by four hexadecimal digits or \U followed by eight hexadecimal digits, which specify Unicode characters.

So, if an implementation allows $, that is a valid character for that implementation. You may use it, but it may not be portable to other implementations. The C standard requires implementations to accept the specific characters listed, but it allows them to accept more. Generally, the C standard should be viewed as an open field rather than a walled garden: The behavior is defined within the field, but you are not stopped at the barrier; you may go beyond it, at your own risk.

The rules you were taught were rules for what is portable, not rules for what the C standard requires implementations to restrict you to.

The C standard defines strictly conforming code, which is, roughly speaking, code that should work in any C implementation, and conforming code, which is code that works in at least one C implementation. Conforming code is still C code. So the rules you were taught were for strictly conforming code.

Generally, you should prefer to write strictly conforming code and only use additional features when benefit (speed, ease of development on a particular platform, whatever) is worth the cost (loss of portability).

i386 macro predefined in make or gcc?

I believe the C preprocessor (cpp), which is called by gcc, defines i386 (for i386 systems). You can find out what it defines like so:

touch foo.h; cpp -dM foo.h; rm foo.h

This method is described by the cpp man page, under -d, with the character M (so, -dM):

Instead of the normal output, generate a list of #define directives for all the macros defined during the execution of the preprocessor, including predefined macros. This gives you a way of finding out what is predefined in your version of the preprocessor. Assuming you have no file foo.h, the command
    touch foo.h; cpp -dM foo.h
will show all the predefined macros.

make m4 see macro when macro ends with same character as string following macro

Deceptively simple really, all I had to do in the underlying text was change this:

__PREFIX___some_service

for this:

__PREFIX__()_some_service

It looks a bit clunky perhaps, but it is a macro after all and there's no need to change the macro definition. So this can stay as it is:

define(`__PREFIX__', `App_Mnemonic')

use _ and __ in C programs

Here's what the C standard says (section 7.1.3):

All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.

(The section goes on to list specific identifiers and sets of identifiers reserved by certain standard headers.)

What this means is that for example, the implementation (either the compiler or a standard header) can use the name __FOO for anything it likes. If you define that identifier in your own code, your program's behavior is undefined. If you're "lucky", you'll be using an implementation that doesn't happen to define it, and your program will work as expected.

This means you simply should not define any such identifiers in your own code (unless your own code is part of a C implementation -- and if you have to ask, it isn't). There's no need to define such identifiers anyway; there's hardly any shortage of unreserved identifiers.

You can use an identifier like _foo as long as it's defined locally (not at file scope) -- but personally I find it much easier just to avoid using leading underscores at all.

Incidentally, your example of _sqrt doesn't necessarily illustrate the point. An implementation may define the name _sqrt in <math.h> (since anything defined there is at file scope), but there's no particular reason to expect that it will do so. When I compile your program, I get a warning:

c.c:7:1: warning: implicit declaration of function ‘_sqrt’ [-Wimplicit-function-declaration]

because <math.h> on my system doesn't define that identifier, and a link-time fatal error:

/tmp/cc1ixRmL.o: In function `main':
c.c:(.text+0x1a): undefined reference to `_sqrt'

because there's no such symbol in the library.

#define macro for debug printing in C?

If you use a C99 or later compiler

#define debug_print(fmt, ...) \
            do { if (DEBUG) fprintf(stderr, fmt, __VA_ARGS__); } while (0)

It assumes you are using C99 (the variable argument list notation is not supported in earlier versions). The do { ... } while (0) idiom ensures that the code acts like a statement (function call). The unconditional use of the code ensures that the compiler always checks that your debug code is valid — but the optimizer will remove the code when DEBUG is 0.

If you want to work with #ifdef DEBUG, then change the test condition:

#ifdef DEBUG
#define DEBUG_TEST 1
#else
#define DEBUG_TEST 0
#endif

And then use DEBUG_TEST where I used DEBUG.

If you insist on a string literal for the format string (probably a good idea anyway), you can also introduce things like __FILE__, __LINE__ and __func__ into the output, which can improve the diagnostics:

#define debug_print(fmt, ...) \
        do { if (DEBUG) fprintf(stderr, "%s:%d:%s(): " fmt, __FILE__, \
                                __LINE__, __func__, __VA_ARGS__); } while (0)

This relies on string concatenation to create a bigger format string than the programmer writes.

If you use a C89 compiler

If you are stuck with C89 and no useful compiler extension, then there isn't a particularly clean way to handle it. The technique I used to use was:

#define TRACE(x) do { if (DEBUG) dbg_printf x; } while (0)

And then, in the code, write:

TRACE(("message %d\n", var));

The double-parentheses are crucial — and are why you have the funny notation in the macro expansion. As before, the compiler always checks the code for syntactic validity (which is good) but the optimizer only invokes the printing function if the DEBUG macro evaluates to non-zero.

This does require a support function — dbg_printf() in the example — to handle things like 'stderr'. It requires you to know how to write varargs functions, but that isn't hard:

#include <stdarg.h>
#include <stdio.h>

void dbg_printf(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
}

You can also use this technique in C99, of course, but the __VA_ARGS__ technique is neater because it uses regular function notation, not the double-parentheses hack.

Why is it crucial that the compiler always see the debug code?

[Rehashing comments made to another answer.]

One central idea behind both the C99 and C89 implementations above is that the compiler proper always sees the debugging printf-like statements. This is important for long-term code — code that will last a decade or two.

Suppose a piece of code has been mostly dormant (stable) for a number of years, but now needs to be changed. You re-enable debugging trace - but it is frustrating to have to debug the debugging (tracing) code because it refers to variables that have been renamed or retyped, during the years of stable maintenance. If the compiler (post pre-processor) always sees the print statement, it ensures that any surrounding changes have not invalidated the diagnostics. If the compiler does not see the print statement, it cannot protect you against your own carelessness (or the carelessness of your colleagues or collaborators). See 'The Practice of Programming' by Kernighan and Pike, especially Chapter 8 (see also Wikipedia on TPOP).

This is 'been there, done that' experience — I used essentially the technique described in other answers where the non-debug build does not see the printf-like statements for a number of years (more than a decade). But I came across the advice in TPOP (see my previous comment), and then did enable some debugging code after a number of years, and ran into problems of changed context breaking the debugging. Several times, having the printing always validated has saved me from later problems.

I use NDEBUG to control assertions only, and a separate macro (usually DEBUG) to control whether debug tracing is built into the program. Even when the debug tracing is built in, I frequently do not want debug output to appear unconditionally, so I have mechanism to control whether the output appears (debug levels, and instead of calling fprintf() directly, I call a debug print function that only conditionally prints so the same build of the code can print or not print based on program options). I also have a 'multiple-subsystem' version of the code for bigger programs, so that I can have different sections of the program producing different amounts of trace - under runtime control.

I am advocating that for all builds, the compiler should see the diagnostic statements; however, the compiler won't generate any code for the debugging trace statements unless debug is enabled. Basically, it means that all of your code is checked by the compiler every time you compile - whether for release or debugging. This is a good thing!

debug.h - version 1.2 (1990-05-01)

/*
@(#)File:            $RCSfile: debug.h,v $
@(#)Version:         $Revision: 1.2 $
@(#)Last changed:    $Date: 1990/05/01 12:55:39 $
@(#)Purpose:         Definitions for the debugging system
@(#)Author:          J Leffler
*/

#ifndef DEBUG_H
#define DEBUG_H

/* -- Macro Definitions */

#ifdef DEBUG
#define TRACE(x)    db_print x
#else
#define TRACE(x)
#endif /* DEBUG */

/* -- Declarations */

#ifdef DEBUG
extern  int     debug;
#endif

#endif  /* DEBUG_H */

debug.h - version 3.6 (2008-02-11)

/*
@(#)File:           $RCSfile: debug.h,v $
@(#)Version:        $Revision: 3.6 $
@(#)Last changed:   $Date: 2008/02/11 06:46:37 $
@(#)Purpose:        Definitions for the debugging system
@(#)Author:         J Leffler
@(#)Copyright:      (C) JLSS 1990-93,1997-99,2003,2005,2008
@(#)Product:        :PRODUCT:
*/

#ifndef DEBUG_H
#define DEBUG_H

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif /* HAVE_CONFIG_H */

/*
** Usage:  TRACE((level, fmt, ...))
** "level" is the debugging level which must be operational for the output
** to appear. "fmt" is a printf format string. "..." is whatever extra
** arguments fmt requires (possibly nothing).
** The non-debug macro means that the code is validated but never called.
** -- See chapter 8 of 'The Practice of Programming', by Kernighan and Pike.
*/
#ifdef DEBUG
#define TRACE(x)    db_print x
#else
#define TRACE(x)    do { if (0) db_print x; } while (0)
#endif /* DEBUG */

#ifndef lint
#ifdef DEBUG
/* This string can't be made extern - multiple definition in general */
static const char jlss_id_debug_enabled[] = "@(#)*** DEBUG ***";
#endif /* DEBUG */
#ifdef MAIN_PROGRAM
const char jlss_id_debug_h[] = "@(#)$Id: debug.h,v 3.6 2008/02/11 06:46:37 jleffler Exp $";
#endif /* MAIN_PROGRAM */
#endif /* lint */

#include <stdio.h>

extern int      db_getdebug(void);
extern int      db_newindent(void);
extern int      db_oldindent(void);
extern int      db_setdebug(int level);
extern int      db_setindent(int i);
extern void     db_print(int level, const char *fmt,...);
extern void     db_setfilename(const char *fn);
extern void     db_setfileptr(FILE *fp);
extern FILE    *db_getfileptr(void);

/* Semi-private function */
extern const char *db_indent(void);

/**************************************\
** MULTIPLE DEBUGGING SUBSYSTEMS CODE **
\**************************************/

/*
** Usage:  MDTRACE((subsys, level, fmt, ...))
** "subsys" is the debugging system to which this statement belongs.
** The significance of the subsystems is determined by the programmer,
** except that the functions such as db_print refer to subsystem 0.
** "level" is the debugging level which must be operational for the
** output to appear. "fmt" is a printf format string. "..." is
** whatever extra arguments fmt requires (possibly nothing).
** The non-debug macro means that the code is validated but never called.
*/
#ifdef DEBUG
#define MDTRACE(x)  db_mdprint x
#else
#define MDTRACE(x)  do { if (0) db_mdprint x; } while (0)
#endif /* DEBUG */

extern int      db_mdgetdebug(int subsys);
extern int      db_mdparsearg(char *arg);
extern int      db_mdsetdebug(int subsys, int level);
extern void     db_mdprint(int subsys, int level, const char *fmt,...);
extern void     db_mdsubsysnames(char const * const *names);

#endif /* DEBUG_H */

Single argument variant for C99 or later

Kyle Brandt asked:

Anyway to do this so debug_print still works even if there are no arguments? For example:
    debug_print("Foo");

There's one simple, old-fashioned hack:

debug_print("%s\n", "Foo");

The GCC-only solution shown below also provides support for that.

However, you can do it with the straight C99 system by using:

#define debug_print(...) \
            do { if (DEBUG) fprintf(stderr, __VA_ARGS__); } while (0)

Compared to the first version, you lose the limited checking that requires the 'fmt' argument, which means that someone could try to call 'debug_print()' with no arguments (but the trailing comma in the argument list to fprintf() would fail to compile). Whether the loss of checking is a problem at all is debatable.

GCC-specific technique for a single argument

Some compilers may offer extensions for other ways of handling variable-length argument lists in macros. Specifically, as first noted in the comments by Hugo Ideler, GCC allows you to omit the comma that would normally appear after the last 'fixed' argument to the macro. It also allows you to use ##__VA_ARGS__ in the macro replacement text, which deletes the comma preceding the notation if, but only if, the previous token is a comma:

#define debug_print(fmt, ...) \
            do { if (DEBUG) fprintf(stderr, fmt, ##__VA_ARGS__); } while (0)

This solution retains the benefit of requiring the format argument while accepting optional arguments after the format.

This technique is also supported by Clang for GCC compatibility.

Why the do-while loop?

What's the purpose of the do while here?

You want to be able to use the macro so it looks like a function call, which means it will be followed by a semi-colon. Therefore, you have to package the macro body to suit. If you use an if statement without the surrounding do { ... } while (0), you will have:

/* BAD - BAD - BAD */
#define debug_print(...) \
            if (DEBUG) fprintf(stderr, __VA_ARGS__)

Now, suppose you write:

if (x > y)
    debug_print("x (%d) > y (%d)\n", x, y);
else
    do_something_useful(x, y);

Unfortunately, that indentation doesn't reflect the actual control of flow, because the preprocessor produces code equivalent to this (indented and braces added to emphasize the actual meaning):

if (x > y)
{
    if (DEBUG)
        fprintf(stderr, "x (%d) > y (%d)\n", x, y);
    else
        do_something_useful(x, y);
}

The next attempt at the macro might be:

/* BAD - BAD - BAD */
#define debug_print(...) \
            if (DEBUG) { fprintf(stderr, __VA_ARGS__); }

And the same code fragment now produces:

if (x > y)
    if (DEBUG)
    {
        fprintf(stderr, "x (%d) > y (%d)\n", x, y);
    }
; // Null statement from semi-colon after macro
else
    do_something_useful(x, y);

And the else is now a syntax error. The do { ... } while(0) loop avoids both these problems.

There's one other way of writing the macro which might work:

/* BAD - BAD - BAD */
#define debug_print(...) \
            ((void)((DEBUG) ? fprintf(stderr, __VA_ARGS__) : 0))

This leaves the program fragment shown as valid. The (void) cast prevents it being used in contexts where a value is required — but it could be used as the left operand of a comma operator where the do { ... } while (0) version cannot. If you think you should be able to embed debug code into such expressions, you might prefer this. If you prefer to require the debug print to act as a full statement, then the do { ... } while (0) version is better. Note that if the body of the macro involved any semi-colons (roughly speaking), then you can only use the do { ... } while(0) notation. It always works; the expression statement mechanism can be more difficult to apply. You might also get warnings from the compiler with the expression form that you'd prefer to avoid; it will depend on the compiler and the flags you use.

^{TPOP was previously at http://plan9.bell-labs.com/cm/cs/tpop and http://cm.bell-labs.com/cm/cs/tpop but both are now (2015-08-10) broken.}

Code in GitHub

If you're curious, you can look at this code in GitHub in my SOQ (Stack
Overflow Questions) repository as files debug.c, debug.h and mddebug.c in the
src/libsoq
sub-directory.

Is this redundant load/store optimization allowed in C99?

The answer is a definitive NO, this is not allowed.

Consider what happens if foo and bar are mutually recursive. For example, this implementation of bar:

void bar(int *restrict p)
{
    static int q;
    if (p == &q) {
        printf("pointers match!\n");
    } else if (p == NULL) {
        foo(&q);
    }
}

bar never dereferences p, so the restrict qualifier is irrelevant. It is obvious that the static variable q cannot have the same address as the automatic variable tmp in foo. Therefore, foo cannot pass its parameter back to bar, and the given optimization is not allowed.

What Is The Reason for Having Unreserved Identifiers as Built-In Macros in Gcc

What is the reason for having unreserved identifiers as built-in macros in gcc?

What is the reason for having unreserved identifiers as built-in macros in gcc?

C identifier names: What goes with which compiler?

i386 macro predefined in make or gcc?

make m4 see macro when macro ends with same character as string following macro

use _ and __ in C programs

#define macro for debug printing in C?

If you use a C99 or later compiler

If you use a C89 compiler

Why is it crucial that the compiler always see the debug code?

debug.h - version 1.2 (1990-05-01)

debug.h - version 3.6 (2008-02-11)

Single argument variant for C99 or later

GCC-specific technique for a single argument

Why the do-while loop?

Code in GitHub

Is this redundant load/store optimization allowed in C99?

Related Topics

Leave a reply