What Are the Rules About Using an Underscore in a C++ Identifier

What are the rules about using an underscore in a C++ identifier?

The rules (which did not change in C++11):

  • Reserved in any scope, including for use as implementation macros:

    • identifiers beginning with an underscore followed immediately by an uppercase letter
    • identifiers containing adjacent underscores (or "double underscore")
  • Reserved in the global namespace:

    • identifiers beginning with an underscore
  • Also, everything in the std namespace is reserved. (You are allowed to add template specializations, though.)
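
The two underscore rules above are mechanical enough to check in code. The following is only a minimal sketch: `Reservation` and `classify` are hypothetical names invented for illustration, and the function assumes its input is already a syntactically valid identifier.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical helper types; not part of any standard API.
enum class Reservation {
    None,             // free for user code
    GlobalNamespace,  // reserved only as a name in the global namespace
    Everywhere        // reserved to the implementation for any use
};

// Classifies a name according to the underscore rules listed above.
inline Reservation classify(std::string_view name) {
    // Adjacent underscores anywhere in the name.
    if (name.find("__") != std::string_view::npos) {
        return Reservation::Everywhere;
    }
    // Leading underscore followed immediately by an uppercase letter.
    if (name.size() >= 2 && name[0] == '_' &&
        std::isupper(static_cast<unsigned char>(name[1]))) {
        return Reservation::Everywhere;
    }
    // Any other leading underscore: reserved in the global namespace.
    if (!name.empty() && name[0] == '_') {
        return Reservation::GlobalNamespace;
    }
    return Reservation::None;
}
```

With this sketch, `classify("my__var")` and `classify("_Foo")` report `Everywhere`, `classify("_foo")` reports `GlobalNamespace`, and `classify("foo_")` reports `None`.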

From the 2003 C++ Standard:

17.4.3.1.2 Global names [lib.global.names]


Certain sets of names and function signatures are always reserved to the implementation:

  • Each name that contains a double underscore (__) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
  • Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.165)

165) Such names are also reserved in namespace ::std (17.4.3.1).

Because C++ is based on the C standard (1.1/2, C++03) and C99 is a normative reference (1.2/1, C++03) these also apply, from the 1999 C Standard:

7.1.3 Reserved identifiers


Each header declares or defines all identifiers listed in its associated subclause, and
optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.

  • All identifiers that begin with an underscore and either an uppercase letter or another
    underscore are always reserved for any use.
  • All identifiers that begin with an underscore are always reserved for use as identifiers
    with file scope in both the ordinary and tag name spaces.
  • Each macro name in any of the following subclauses (including the future library
    directions) is reserved for use as specified if any of its associated headers is included;
    unless explicitly stated otherwise (see 7.1.4).
  • All identifiers with external linkage in any of the following subclauses (including the
    future library directions) are always reserved for use as identifiers with external
    linkage.154)
  • Each identifier with file scope listed in any of the following subclauses (including the
    future library directions) is reserved for use as a macro name and as an identifier with
    file scope in the same name space if any of its associated headers is included.

No other identifiers are reserved. If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved
identifier as a macro name, the behavior is undefined.

If the program removes (with #undef) any macro definition of an identifier in the first
group listed above, the behavior is undefined.

154) The list of reserved identifiers with external linkage includes errno, math_errhandling, setjmp, and va_end.

Other restrictions might apply. For example, the POSIX standard reserves a lot of identifiers that are likely to show up in normal code:

  • Names beginning with a capital E followed by a digit or uppercase letter:

    • may be used for additional error code names.
  • Names that begin with either is or to followed by a lowercase letter

    • may be used for additional character testing and conversion functions.
  • Names that begin with LC_ followed by an uppercase letter

    • may be used for additional macros specifying locale attributes.
  • Names of all existing mathematics functions suffixed with f or l are reserved

    • for corresponding functions that operate on float and long double arguments, respectively.
  • Names that begin with SIG followed by an uppercase letter are reserved

    • for additional signal names.
  • Names that begin with SIG_ followed by an uppercase letter are reserved

    • for additional signal actions.
  • Names beginning with str, mem, or wcs followed by a lowercase letter are reserved

    • for additional string and array functions.
  • Names beginning with PRI or SCN followed by any lowercase letter or X are reserved

    • for additional format specifier macros
  • Names that end with _t are reserved

    • for additional type names.

While using these names for your own purposes right now might not cause a problem, they do raise the possibility of conflict with future versions of that standard.


Personally I just don't start identifiers with underscores. New addition to my rule: don't use double underscores anywhere, which is easy as I rarely use underscores.

After doing research on this article I no longer end my identifiers with _t, as this is reserved by the POSIX standard.

The rule about any identifier ending with _t surprised me a lot. I think it is a POSIX rule (I'm not sure yet, and am looking for clarification and the official chapter and verse). This is from the GNU libtool manual, listing reserved names.

CesarB provided the following link to the POSIX 2004 reserved symbols and notes 'that many other reserved prefixes and suffixes ... can be found there'. The
POSIX 2008 reserved symbols are defined here. The restrictions are somewhat more nuanced than those above.

What are the rules about using an underscore in a C identifier?

Good-enough rule of thumb

Don't start your identifier with an underscore.

That's it. You might still have a conflict with some file-specific definitions (see below), but those will just get you an error message which you can take care of.

Safe, slightly restrictive, rule of thumb

Don't start your identifier with:

  • An underscore.
  • Any 1-3 letter prefix, followed by an underscore, which isn't a proper word (e.g. a_, st_)
  • memory_ or atomic_.

and don't end your identifier with either _MIN or _MAX.

These rules cover more than is actually reserved, but are easier to remember.
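
As a rough sketch, the rule of thumb can be turned into a small checker. The helper names below are hypothetical, and one part is deliberately simplified: whether a short prefix "is a proper word" cannot be decided mechanically, so this version flags every 1-3 letter prefix that is followed by an underscore.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helpers illustrating the rule of thumb; not a real API.
inline bool starts_with(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}

inline bool ends_with(std::string_view s, std::string_view suffix) {
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

// Returns true when the name breaks the "safe, slightly restrictive" rule.
// Simplification: every 1-3 letter prefix before the first underscore is
// flagged, even when that prefix happens to be a proper word.
inline bool breaks_safe_rule(std::string_view name) {
    if (starts_with(name, "_")) return true;
    if (starts_with(name, "memory_") || starts_with(name, "atomic_")) return true;
    if (ends_with(name, "_MIN") || ends_with(name, "_MAX")) return true;

    const std::size_t underscore = name.find('_');
    if (underscore != std::string_view::npos && underscore <= 3) {
        for (std::size_t i = 0; i < underscore; ++i) {
            if (!std::isalpha(static_cast<unsigned char>(name[i]))) return false;
        }
        return true;  // 1-3 letters followed by an underscore
    }
    return false;
}
```

Under these assumptions, names such as `st_mode`, `atomic_counter` and `BUFFER_MAX` are flagged, while `buffer_size` and `count` pass.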

More detailed rules

This is based on the C2x standard draft (and thus covers previous standards' reservations) and the glibc documentation.

Don't use:

  • The prefix __ (two underscores).
  • A prefix of one underscore followed by a capital letter (e.g. _D).
  • For identifiers visible at file scope - the prefix _.
  • The following prefixes with underscores, when followed by a lowercase letter: atomic_, memory_, memory_order_, cnd_, mtx_, thrd_, tss_
  • The following prefixes with underscores, when followed by an uppercase letter: LC_, SIG_, ATOMIC_, TIME_
  • The suffix _t (that's a POSIX restriction; for C proper, you can use this suffix unless your identifier begins with int or uint)

Additional restrictions are per-library-header-file rather than universal (some of these are POSIX restrictions):

If you use header file...    You can't use identifiers with ...
dirent.h                     Prefix d_
fcntl.h                      Prefixes l_, F_, O_, and S_
grp.h                        Prefix gr_
limits.h                     Suffix _MAX (also probably _MIN)
pwd.h                        Prefix pw_
signal.h                     Prefixes sa_ and SA_
sys/stat.h                   Prefixes st_ and S_
sys/times.h                  Prefix tms_
termios.h                    Prefix c_

use _ and __ in C programs [duplicate]

Here's what the C standard says (section 7.1.3):

  • All identifiers that begin with an underscore and either an uppercase letter or another
    underscore are always reserved for any use.
  • All identifiers that begin with an underscore are always reserved for use as identifiers
    with file scope in both the ordinary and tag name spaces.

(The section goes on to list specific identifiers and sets of identifiers reserved by certain standard headers.)

What this means is that for example, the implementation (either the compiler or a standard header) can use the name __FOO for anything it likes. If you define that identifier in your own code, your program's behavior is undefined. If you're "lucky", you'll be using an implementation that doesn't happen to define it, and your program will work as expected.

This means you simply should not define any such identifiers in your own code (unless your own code is part of a C implementation -- and if you have to ask, it isn't). There's no need to define such identifiers anyway; there's hardly any shortage of unreserved identifiers.

You can use an identifier like _foo as long as it's defined locally (not at file scope) -- but personally I find it much easier just to avoid using leading underscores at all.
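
A small sketch of that distinction, written in C++ (where the corresponding reservation applies to the global namespace rather than to file scope). The function `sum_to` and its variable names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical function; the point is only where the names are declared.
inline int sum_to(int n) {
    int _total = 0;                     // block scope: not reserved
    for (int _i = 1; _i <= n; ++_i) {   // block scope: not reserved
        _total += _i;
    }
    return _total;
}

// int _total = 0;   // at file scope this would be a reserved name
```

Even though the local names are permitted, dropping the leading underscore entirely avoids having to think about which scope a declaration ends up in.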

Incidentally, your example of _sqrt doesn't necessarily illustrate the point. An implementation may define the name _sqrt in <math.h> (since anything defined there is at file scope), but there's no particular reason to expect that it will do so. When I compile your program, I get a warning:

c.c:7:1: warning: implicit declaration of function ‘_sqrt’ [-Wimplicit-function-declaration]

because <math.h> on my system doesn't define that identifier, and a link-time fatal error:

/tmp/cc1ixRmL.o: In function `main':
c.c:(.text+0x1a): undefined reference to `_sqrt'

because there's no such symbol in the library.

What does double underscore ( __const) mean in C?

In C, symbols starting with an underscore followed by either an upper-case letter or another underscore are reserved for the implementation. You as a user of C should not create any symbols that start with the reserved sequences. In C++, the restriction is more stringent; you the user may not create a symbol containing a double-underscore.

Given:

extern int ether_hostton (__const char *__hostname, struct ether_addr *__addr)
__THROW;

The __const notation is there to allow for the possibility (somewhat unlikely) that a compiler that this code is used with supports prototype notations but does not have a correct understanding of the C89 standard keyword const. The autoconf macros can still check whether the compiler has working support for const; this code could be used with a broken compiler that does not have that support.

The use of __hostname and __addr is a protection measure for you, the user of the header. If you compile with GCC and the -Wshadow option, the compiler will warn you when any local variables shadow a global variable. If the function used just hostname instead of __hostname, and if you had a function called hostname(), there'd be a shadowing. By using names reserved to the implementation, there is no conflict with your legitimate code.

The use of __THROW means that the code can, under some circumstances, be declared with some sort of 'throw specification'. This is not standard C; it is more like C++. But the code can be used with a C compiler as long as one of the headers (or the compiler itself) defines __THROW to empty, or to some compiler-specific extension of the standard C syntax.


Section 7.1.3 of the C standard (ISO 9899:1999) says:

7.1.3 Reserved identifiers


Each header declares or defines all identifiers listed in its associated subclause, and
optionally declares or defines identifiers listed in its associated future library directions
subclause and identifiers which are always reserved either for any use or for use as file
scope identifiers.

— All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.

— All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.

— Each macro name in any of the following subclauses (including the future library
directions) is reserved for use as specified if any of its associated headers is included;
unless explicitly stated otherwise (see 7.1.4).

— All identifiers with external linkage in any of the following subclauses (including the
future library directions) are always reserved for use as identifiers with external
linkage.154)

— Each identifier with file scope listed in any of the following subclauses (including the
future library directions) is reserved for use as a macro name and as an identifier with
file scope in the same name space if any of its associated headers is included.

No other identifiers are reserved. If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved
identifier as a macro name, the behavior is undefined.

If the program removes (with #undef) any macro definition of an identifier in the first
group listed above, the behavior is undefined.

Footnote 154) The list of reserved identifiers with external linkage includes errno, math_errhandling,
setjmp, and va_end.


See also What are the rules about using an underscore in a C++ identifier; a lot of the same rules apply to both C and C++, though the embedded double-underscore rule is in C++ only, as mentioned at the top of this answer.


C99 Rationale

The C99 Rationale says:

7.1.3 Reserved identifiers

To give implementors maximum latitude in packing library functions into files, all external
identifiers defined by the library are reserved in a hosted environment. This means, in effect, that no user-supplied external names may match library names, not even if the user function has
the same specification.
Thus, for instance, strtod may be defined in the same object module as printf, with no fear that link-time conflicts will occur. Equally, strtod may call printf, or printf may call strtod, for whatever reason, with no fear that the wrong function will be called.

Also reserved for the implementor are all external identifiers beginning with an underscore, and all other identifiers beginning with an underscore followed by a capital letter or an underscore. This gives a name space for writing the numerous behind-the-scenes non-external macros and functions a library needs to do its job properly.

With these exceptions, the Standard assures the programmer that all other identifiers are available, with no fear of unexpected collisions when moving programs from one implementation to another.5) Note, in particular, that part of the name space of internal identifiers beginning with underscore is available to the user: translator implementors have not been the only ones to find use for “hidden” names. C is such a portable language in many respects that the issue of “name space pollution” has been and is one of the principal barriers to writing completely portable code. Therefore the Standard assures that macro and typedef names are reserved only if the associated header is explicitly included.

5) See §6.2.1 for a discussion of some of the precautions an implementor should take to keep this promise. Note also that any implementation-defined member names in structures defined in <time.h> and <locale.h> must begin with an underscore, rather than following the pattern of other names in those structures.

And the relevant part of the rationale for §6.2.1 Scopes of identifiers is:

Although the scope of an identifier in a function prototype begins at its declaration and ends at the end of that function’s declarator, this scope is ignored by the preprocessor. Thus an identifier
in a prototype having the same name as that of an existing macro is treated as an invocation of that macro. For example:

   #define status 23
   void exit(int status);

generates an error, since the prototype after preprocessing becomes

   void exit(int 23);

Perhaps more surprising is what happens if status is defined

   #define status []

Then the resulting prototype is

   void exit(int []);

which is syntactically correct but semantically quite different from the intent.

To protect an implementation’s header prototypes from such misinterpretation, the implementor must write them to avoid these surprises. Possible solutions include not using identifiers in prototypes, or using names in the reserved name space (such as __status or _Status).
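
The effect described in the rationale can be observed without breaking a real header by stringifying a prototype after macro expansion. This is only a sketch: `STRINGIFY`, `my_exit` and the two helper functions are hypothetical names, and the user macro `status` is undefined again at the end so it cannot leak into later code.

```cpp
#include <cassert>
#include <string>

// Two-step stringification so the argument is macro-expanded first.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// A user macro that happens to share its name with a prototype parameter.
#define status 23

// Text of a prototype with a named parameter, after preprocessing:
// the parameter name has been replaced by the macro body.
inline std::string named_prototype() {
    return STRINGIFY(void my_exit(int status));
}

// Text of the same prototype with the parameter name omitted:
// there is nothing for the macro to replace.
inline std::string unnamed_prototype() {
    return STRINGIFY(void my_exit(int));
}

#undef status
```

The first string no longer contains the word `status`, which is the collision the rationale warns about; the second is untouched, which is why omitting parameter names (or using reserved names, for an implementation) protects a header's prototypes.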

See also P J Plauger The Standard C Library (1992) for an extensive discussion of name space rules and library implementations. The book refers to C90 rather than any later version of the standard, but most of the implementation advice in it remains valid to this day.

What is the reason for underscore in C variable name definition?

Maybe this helps, from C99, 7.1.3 ("Reserved Identifiers"):

  • All identifiers that begin with an underscore and either an uppercase letter or another
    underscore are always reserved for any use.

  • All identifiers that begin with an underscore are always reserved for use as identifiers
    with file scope in both the ordinary and tag name spaces.

Moral: For ordinary user code, it's probably best not to start identifiers with an underscore.

(On a related note, I think you should also steer clear of naming types with a trailing _t, which is reserved for standard types.)

Is _ (single underscore) a valid C++ variable name?

Yes, from The C++ Programming Language, 4th Edition:

A name (identifier) consists of a sequence of letters and digits. The
first character must be a letter. The underscore character, _, is
considered a letter.
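
A minimal sketch of a lone underscore used as a local variable name; `double_it` is a hypothetical function. Note that, by the rules quoted earlier, a name beginning with an underscore is reserved in the global namespace, so this sketch keeps `_` at block scope.

```cpp
#include <cassert>

// Hypothetical function using "_" as an ordinary local variable.
inline int double_it(int value) {
    int _ = value * 2;  // a lone underscore is a valid identifier
    return _;
}
```

Being valid does not make it readable; a descriptive name is usually the better choice.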

What's the meaning of reserved for any use?

In the C standard, the meaning of the term "reserved" is defined by 7.1.3p2, immediately below the bullet list you are quoting:

No other identifiers are reserved. If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.

Emphasis mine: reserved identifiers place a restriction on the program, not the implementation. Thus, the common interpretation (reserved identifiers may be used by the implementation for any purpose) is correct for C.

I have not kept up with the C++ standard and no longer feel qualified to interpret it.

Why do some functions/variables have the character _ in front of them , in C++ ? [duplicate]

Usually, when you see a name with leading underscore, it either

  • belongs to the (C++) implementation, or

  • has been chosen by someone unaware of the first possibility.

It's not advisable to use names with leading underscore in user code.

Any name starting with an underscore is reserved to the implementation in the global namespace, and any name starting with an underscore followed by an uppercase letter is reserved to the implementation anywhere.

As I recall, any name with two successive underscores is also reserved.


A novice programmer may use leading underscore to indicate “data member”.

The usual convention, for those aware of the above, is instead a trailing underscore, and/or a prefix like m or my.

E.g. trailing underscore is, as I recall, used in Boost, while an m or my prefix is used (still as I recall) in MFC.
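
A short sketch of the trailing-underscore convention for data members; the `Account` class and its members are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class showing the trailing-underscore member convention.
class Account {
public:
    Account(std::string owner, int balance)
        : owner_(std::move(owner)), balance_(balance) {}

    const std::string& owner() const { return owner_; }
    int balance() const { return balance_; }
    void deposit(int amount) { balance_ += amount; }

private:
    std::string owner_;  // trailing underscore: not a reserved pattern
    int balance_;
};
```

The trailing underscore also lets accessors such as `owner()` and `balance()` share the natural name with the member they expose.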

Meaning of reserved for the implementation

Implementation here means the combination of the compiler (say gcc, msvc and so on), the standard library (which determines what features are available in the language), the operating system (Windows, Mac etc.) and the hardware (Intel, ARM and so on).

Depending upon the implementation, certain values are defined which the compiler uses to produce the object code that is specific to the implementation. For example

__TARGET_ARCH_ARM is defined by RealView #Matches first case
_M_ARM is defined by Visual Studio #Matches second case

to identify the CPU manufacturer.

In short, these clauses are meant to discourage you from using macros of the mentioned formats.

In fact, N3797 §17.6.5.3 (Restrictions on macro definitions) says that, if you wish to define macros of the aforementioned formats, they are:

suitable for use in #if preprocessing directives, unless explicitly
stated otherwise.

Example :

#ifndef _M_ARM
#define _M_ARM // Say you're compiling for another platform
#endif

Note

Macros, reserved for implementation, are not restricted to the format mentioned in question. For instance __arm__ is defined by gcc to identify the manufacturer.
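
User code can safely test such implementation macros without ever defining them. The sketch below checks only the three macro names mentioned above; `target_family` is a hypothetical function, and which branch is taken depends entirely on the compiler and target.

```cpp
#include <cassert>
#include <string>

// Reports whether one of the ARM macros named above is predefined.
// The code only tests the macros; it never defines or undefines them.
inline std::string target_family() {
#if defined(_M_ARM) || defined(__arm__) || defined(__TARGET_ARCH_ARM)
    return "arm";
#else
    return "other";
#endif
}
```

Testing with `defined(...)` keeps the program on the permitted side of the rule: the reserved names are read, never declared or redefined.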

MISRA C 2012 - Rule 21.1 - Macros starting with underscore

According to the C standard (section 7.1.3), all identifiers starting with _[A-Z] or __ are reserved. As they are reserved, common sense and Rule 21.1 forbid you to modify (redefine or undefine) them (or create your own).

Thus, you should change your code so that it does not use leading underscores, even in include guards, not to mention your other macros.

Some further reading can be found e.g. here: Include guard conventions in C
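
One possible shape for an include guard that stays clear of the reserved patterns is sketched below. The guard name and the function are hypothetical; the scheme assumed here is "project prefix, file name, _H", with no leading underscore and no double underscore. The guard itself is written the same way in C.

```cpp
#include <cassert>

// Guard name is hypothetical: project prefix plus file name,
// no leading underscore and no adjacent underscores.
#ifndef MYPROJECT_RING_BUFFER_H
#define MYPROJECT_RING_BUFFER_H

// Placeholder content standing in for the header's real declarations.
inline int ring_buffer_capacity() { return 64; }

#endif  // MYPROJECT_RING_BUFFER_H
```

A guard such as `_RING_BUFFER_H_` or `__RING_BUFFER_H__` would fall into the reserved set described above, which is what the rule objects to.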


