Why Do We Need extern "C" { #include <foo.h> } in C++

Why do we need extern "C" { #include <foo.h> } in C++?

C and C++ are superficially similar, but they compile to quite different object code. When you include a header file with a C++ compiler, the compiler assumes the declarations in it are C++ code. If the header actually belongs to a C library, the compiler will expect the functions it declares to follow the C++ ABI (Application Binary Interface), while the library itself was compiled to the C one, so the linker chokes. That failure is preferable to silently passing C++ data to a function expecting C data.

(To get into the really nitty-gritty: a C++ ABI generally 'mangles' the names of functions and methods. If you call printf() without flagging its prototype as a C function, the C++ compiler will generate code that calls a mangled symbol, something like _Z6printfPKcz under the Itanium ABI used by GCC and Clang, rather than plain printf.)

So: use extern "C" {...} when including a C header. It's that simple. Otherwise, the compiled code will not match, and the linker will choke. For most headers, however, you won't even need the extern "C", because most system C headers already account for the fact that they might be included by C++ code and wrap their own declarations in extern "C".
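The way a C header typically "accounts for" C++ is a guard on the __cplusplus macro. Here is a minimal single-file sketch of that pattern. The header contents and the function name foo_add are hypothetical, and the definition at the bottom merely stands in for what a real C library would provide in its own object file.

```cpp
// --- what a C header such as foo.h might contain (hypothetical) ---
#ifdef __cplusplus
extern "C" {            // only a C++ compiler sees this line
#endif

int foo_add(int a, int b);   /* hypothetical C function */

#ifdef __cplusplus
}                       // closes the extern "C" block
#endif

// --- stand-in for the C library's compiled definition ---
// In a real project this would live in foo.c and be built by a C compiler.
extern "C" int foo_add(int a, int b) { return a + b; }
```

With the guard in place, C++ code can include the header directly, without wrapping the #include in extern "C" itself.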

What is the effect of extern "C" in C++?

extern "C" makes a function name in C++ have C linkage (the compiler does not mangle the name), so that client C code can link to (use) your function through a C-compatible header file that contains just the declaration of your function. The function definition lives in an object file compiled by your C++ compiler, and the client's linker resolves it using the unmangled C name.

Since C++ has overloading of function names and C does not, the C++ compiler cannot just use the function name as a unique id to link to, so it mangles the name by adding information about the arguments. A C compiler does not need to mangle the name, since you cannot overload function names in C. When you state that a function has extern "C" linkage in C++, the C++ compiler does not add argument/parameter type information to the name used for linkage.
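To make the overloading argument concrete, here is a small sketch. The function names area and c_area are made up for illustration, and the mangled names in the comments are what GCC and Clang typically emit under the Itanium ABI; other compilers use different schemes.

```cpp
// Two overloads: the C++ compiler must emit two distinct symbols.
// On GCC/Clang (Itanium ABI) these are typically _Z4areai and _Z4areaii.
int area(int side) { return side * side; }
int area(int width, int height) { return width * height; }

// C linkage: the symbol is plain "c_area", with no parameter information,
// so a second extern "C" function named c_area could not coexist with it.
extern "C" int c_area(int side) { return area(side); }
```

Running a tool such as nm on the resulting object file is one way to see the mangled and unmangled symbol names side by side.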

Just so you know, you can specify extern "C" linkage on each individual declaration/definition explicitly, or use a block to give a whole group of declarations/definitions that linkage:

extern "C" void foo(int);
extern "C"
{
void g(char);
int i;
}

If you care about the technicalities, they are listed in section 7.5 of the C++03 standard. Here is a brief summary (with emphasis on extern "C"):

  • extern "C" is a linkage-specification
  • Every compiler is required to provide "C" linkage
  • A linkage specification shall occur only in namespace scope
  • All function types, and all function names and variable names with external linkage, have a language linkage (corrected per Richard's comment: names without external linkage have no language linkage)
  • Two function types with distinct language linkages are distinct types even if otherwise identical
  • Linkage specs nest, inner one determines the final linkage
  • extern "C" is ignored for class members
  • At most one function with a particular name can have "C" linkage (regardless of namespace)
  • extern "C" does not force a function to have external linkage: static inside extern "C" is valid, and an entity so declared has internal linkage and therefore no language linkage (corrected per Richard's comment)
  • Linkage from C++ to objects defined in other languages and to objects defined in C++ from other languages is implementation-defined and language-dependent. Only where the object layout strategies of two language implementations are similar enough can such linkage be achieved
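A few of the rules above (nesting, class members, static inside the block) can be seen in one short sketch. All names here (twice, helper, Widget, c_entry) are invented for the example.

```cpp
extern "C" {
    // Linkage specs nest, and the innermost one wins: these two functions
    // have C++ linkage, which is why overloading them is allowed here.
    extern "C++" {
        int twice(int x) { return 2 * x; }
        double twice(double x) { return 2.0 * x; }
    }

    // static inside extern "C" is valid: helper has internal linkage
    // and therefore no language linkage at all.
    static int helper(int x) { return x + 1; }

    // extern "C" is ignored for class members: value() keeps C++ linkage.
    struct Widget {
        int value() const { return 42; }
    };

    // An ordinary function in the block gets C linkage (unmangled name).
    int c_entry(int x) { return helper(x); }
}
```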

Advantage of using extern in a header file

Introduction

There are three forms of declaration of concern here:[1]

extern int x; // Declares x but does not define it.
int x; // Tentative definition of x.
int x = 0; // Defines x.

A declaration makes an identifier (a name, like x) known.

A definition creates an object (such as an int).[2] A definition is also a declaration, since it makes the name known.

A tentative definition without a regular definition in the same translation unit (the source file being compiled, with all of its included files) acts like a definition with an initializer of zero.

The way you should use these normally is:

  • For an object you will access by name in multiple files, write exactly one definition of it in one source file. (It can be a tentative definition[3] if you would like it to be initialized with zero, or it can be a regular definition with an initializer you choose.)
  • In an associated header file (such as foo.h for the source file foo.c), declare the name, using extern as shown above.
  • Include the header file in each file that uses the name, including its associated source file. (The latter is important; when foo.c includes foo.h, the compiler will see both the declaration and the definition in the same compilation and give you a warning if there is a typo that makes the two declarations incompatible.)
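The three steps above can be sketched in a single file, with comments marking where each part would normally live. The names counter and increment_counter are hypothetical. The sketch is written as C++ to match the rest of this page; C++ has no tentative definitions, so the definition carries an explicit initializer, which is also valid C.

```cpp
// --- foo.h (hypothetical header): declarations only ---
extern int counter;          // declares counter, does not define it
void increment_counter();    // function declaration, no extern needed

// --- foo.cpp (hypothetical source; it would #include "foo.h") ---
int counter = 0;             // the single definition in the whole program
void increment_counter() { ++counter; }

// --- any other source file would also #include "foo.h" and then
//     use counter and increment_counter() by name ---
```

Because the source file sees both the declaration and the definition, a mismatch such as declaring extern long counter in the header would be diagnosed at compile time.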

Actually, the way you should use these normally is not to use them at all. Programs generally do not need external identifiers for objects, so you should design the program without them. The rules above are for when you do use them.

Tentative Definitions

In Unix and some other systems, it has been possible to put a tentative definition, int x;, in a header file and include it in multiple source files. Because a tentative definition acts like a definition in the absence of a regular definition, this results in there being multiple definitions in multiple translation units. The C standard does not define the behavior of this. So how does it work in Unix?

Until recently, when you compiled with GCC (as built by default), it created an object file that marked tentatively defined identifiers differently from regularly defined identifiers. The tentatively defined identifiers were marked as “common.” When the linker found multiple definitions of a “common” identifier, it coalesced them into a single definition. Remember, the C standard does not define the behavior. But Unix tools[4] did. So you could put int x; in a header and include it in lots of places, and you would get one int x out of it when linking the entire program.

In version 10 and later, GCC does not do this by default. Tentative definitions are, in the absence of regular definitions, treated more like regular definitions, and linking with multiple definitions of the same identifier will result in an error, even if the definitions arose from tentative definitions. GCC has a switch to select the old behavior, -fcommon.

This is information you should know so that you can understand old source files and headers that took advantage of the “common” behavior. It is not needed in new source code, and you should write only non-definition declarations (using extern) in headers and regular definitions in source files.

Miscellaneous

You do not need extern with a function declaration because a function declaration without a body (the compound statement that contains the code for the function) is automatically a declaration and behaves the same as if it had extern. Functions do not have tentative definitions.
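As a quick illustration (the function name square is made up), the two declarations below mean the same thing, and repeating a declaration is harmless:

```cpp
int square(int x);          // declaration without extern
extern int square(int x);   // same declaration spelled with extern

// The definition: the body is what makes it a definition.
int square(int x) { return x * x; }
```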

Footnotes

[1] This answer addresses only external declarations and external definitions for identifiers of objects, with external linkage. The full rules for C declarations are somewhat complicated, partly due to the history of C’s evolution.

[2] This is for definitions of identifiers that refer to objects. For other kinds of identifiers, what is a definition may be different. For example, typedef int foo; is said to define foo as an alias for the type int, but no object is created.

[3] It may be preferable to also include an initializer, even if it is zero, as this will make it a regular definition and avoid a potential problem where the same name is used for tentative definitions in two different source files for two different things, resulting in the linker not complaining even though this is an error.

[4] I may be being sloppy with terminology here; somebody else could identify precisely where this behavior was specified and what tools it applied to.

When to use extern "C" in C++?

extern "C" prevents names from being mangled.

It is used when:

  1. We need to use a C library from C++

    extern "C" int foo(int);
  2. We need to export some C++ code to C

    extern "C" int foo(int) { /* something */ return 0; }
  3. We need to be able to resolve a symbol in a shared library by name, so we need to get rid of the mangling

    extern "C" int foo(int) { /* something */ return 0; }
    // ... later, in the code that loads the library (handle comes from dlopen):
    typedef int (*foo_type)(int);
    foo_type f = (foo_type)dlsym(handle, "foo");

Is extern "C" only required on the function declaration?

The 'extern "C"' should not be required on the function definition, as long as the declaration has it and has already been seen when the definition is compiled. The standard specifically states (7.5/5 Linkage specifications):

A function can be declared without a linkage specification after an explicit linkage specification has been seen; the linkage explicitly specified in the earlier declaration is not affected by such a function declaration.

However, I generally do put the 'extern "C"' on the definition as well, because it is in fact a function with extern "C" linkage. A lot of people hate when unnecessary, redundant stuff is on declarations (like putting virtual on method overrides), but I'm not one of them.
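A minimal sketch of both styles, using the invented names c_visible and c_explicit. In a real project the declarations would normally come from a shared header.

```cpp
// Declaration with the linkage specification (normally in a header).
extern "C" int c_visible(int x);

// Definition without it: because the earlier declaration was seen,
// c_visible still has C linkage and an unmangled symbol name.
int c_visible(int x) { return x + 1; }

// The redundant-but-explicit style: repeat extern "C" on the definition.
extern "C" int c_explicit(int x);
extern "C" int c_explicit(int x) { return x - 1; }
```

Note that the order matters: if the definition appeared before any extern "C" declaration, it would get C++ linkage, and a later extern "C" declaration of the same function would be an error.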

Exact meaning of extern in C and header file name equivalence to code file name

  1. The C language does not care what you name your source and header files. You can use any names your compiler will accept, and put any function in any .c file you wish. Some other tools may care, but the language does not. Indeed, the language does not care if you name your source file bar.source instead of foo.c (but, again, your compiler may).

  2. extern tells the compiler that the variable is not defined in this compilation unit (the .c file plus all headers it includes), but somewhere else. You pretty much only need to use it when referring to a global variable defined in some other compilation unit. You can also use it with functions, but it is implicit, so not needed.

  3. The syntax you show is the really old syntax for defining functions. It was used before the first C standard, until the late 1980s. Don't use it anymore. The rules for how argument types are handled are archaic and unnecessarily complex, and using new-style function declarations and definitions makes all the bad things go away.

Your example would be better written as:

int function(int param1, int *param2, char param3)
{
    /* function body */
}

The only problem is that with old-style function definitions a char argument is promoted to int, so a char cannot be passed as such; to stay compatible with the old definition, the last parameter should really be int param3 instead.
