Benefits of Inline Functions in C++

Benefits of inline functions in C++?

Inline functions can be faster because an inlined call does not need to push and pop parameters and the return address on the stack; however, inlining does make your binary slightly larger.

Does it make a significant difference? On modern hardware, not noticeably for most programs. But it can make a difference, which is enough for some people.

Marking something inline does not give you a guarantee that it will be inlined. It's just a suggestion to the compiler. Sometimes inlining is not possible, such as when you call a virtual function through dynamic dispatch, or when there is recursion involved. And sometimes the compiler just chooses not to inline.
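A minimal sketch of the virtual function case. The Shape and Triangle types and the count_sides function are hypothetical names made up for illustration:

```cpp
// Hypothetical class hierarchy, for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

// The function that actually runs depends on the dynamic type of `s`,
// which is generally known only at run time. Unless the compiler can
// prove the dynamic type, it has to emit an indirect call here instead
// of pasting in the body of sides().
int count_sides(const Shape& s) {
    return s.sides();
}
```

If the compiler can see the concrete type at the call site (for example, a local Triangle object), it may still be able to resolve the call statically and inline it.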

I could see a situation like this making a detectable difference:

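inline unsigned long long aplusb_pow2(unsigned long long a, unsigned long long b) {
    return (a + b) * (a + b);
}

int main() {
    unsigned long long sum = 0;  // accumulate so the calls are not discarded
    for (unsigned a = 0; a < 900000; ++a)
        for (unsigned b = 0; b < 900000; ++b)
            sum += aplusb_pow2(a, b);  // unsigned arithmetic wraps, so no overflow UB
    return sum == 0;
}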

Benefits of declaring a function as inline?

There are two reasons to use the inline keyword. One is as an optimization hint, and you can safely ignore it; your compiler is likely to ignore it too. The other reason is to allow a function to be defined in multiple translation units, and that usage is strictly necessary. If you define a function in a .h header file, for example, you'd better declare it inline.
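A minimal sketch of the second use. The header name counter.h and the function next_id are hypothetical, chosen only for illustration:

```cpp
// counter.h (hypothetical header, assumed to be included from several .cpp files)
#ifndef COUNTER_H
#define COUNTER_H

// Without `inline`, every .cpp file that includes this header would emit
// its own external definition of next_id, and the linker would typically
// report a multiple-definition error. With `inline`, identical definitions
// in several translation units are allowed and are treated as one function.
inline int next_id() {
    static int id = 0;  // a single counter shared by all translation units
    return ++id;
}

#endif  // COUNTER_H
```

Note that this use of inline is about linkage, not speed: the compiler remains free to emit an ordinary call to next_id.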

__inline functions vs normal functions in C

Inlining a function can have several advantages:

1. It can make the program size smaller. This is typically the case when a function is only used once. Also, see 2. and 3.
2. The compiler can eliminate unused bits of the function if it knows that a variable is constant, or non-NULL, or something like that. This can save size, but also makes the code more efficient at run time.
3. The compiler can eliminate bits of the calling function, or even other inlined functions, because it can see what the function does with/to the data. (Say the code checks the return value and calls an error function if it's NULL; the compiler might be able to rule that out.)
4. It can reduce the call overhead, but in current processors with predictive branching that's not as much of a win as you might think.
5. It can hoist constant bits out of loops, do common subexpression elimination, and apply many other optimizations to make looping code more efficient.
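Points 2 and 5 can be sketched as follows. This is only an illustration, written in C++ for consistency with the rest of the page; scale and sum_doubled are hypothetical names, and what a given compiler actually does depends on its version and optimization flags:

```cpp
#include <cstddef>

// Hypothetical helper with a special case for a zero factor.
inline int scale(int value, int factor) {
    if (factor == 0) {
        return 0;  // dead code whenever the caller passes a nonzero constant
    }
    return value * factor;
}

// The factor is the constant 2 at this call site. A compiler that inlines
// scale() here can see that `factor == 0` is always false, drop the branch,
// and reduce the loop body to a single multiply (or shift) and add.
int sum_doubled(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += scale(data[i], 2);
    }
    return total;
}
```

If scale() were compiled as an opaque out-of-line call, the zero check would have to run on every iteration.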

And then there are the disadvantages:

1. It can make the code larger, obviously.
2. It can increase register pressure within the calling function, which might confuse the compiler and prevent it from optimizing as well.
3. Having one hot function that can live in the CPU cache can be quicker than duplicating it to many places that are not always cached.
4. It can hamper debugging.

The reason that the inline keyword is just a hint is mostly that the C standard does not require the compiler to optimize anything at all. If it weren't a hint, then optimization wouldn't be optional. Also, the fact that a function isn't marked inline doesn't stop the compiler from inlining it if it calculates that doing so would be advantageous.

Inline function (C++) is efficient, why don't we define every function as inline function?

Apart from the call overhead, I would mention that pasting the code in allows the compiler to make further optimizations at the call site.

There are a few cases in which inlining is generally not possible:

Procedures linked from shared objects
Callback functions that are invoked through function pointers
Recursive functions that are not tail recursive (a tail-recursive call does not need to hang waiting for the recursive call to return, so it can be easily and automatically converted to an iterative form)
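The last two cases can be sketched like this. All names below are hypothetical, and whether a particular compiler manages to optimize any of these calls depends on its optimization settings:

```cpp
// Non-tail recursion: the multiplication happens after the recursive call
// returns, so every call has to wait for the one below it.
unsigned long long factorial(unsigned n) {
    return n <= 1 ? 1ULL : n * factorial(n - 1);
}

// Tail-recursive form: the recursive call is the last thing the function
// does, so a compiler may rewrite it as a loop and then treat it like any
// other small function.
unsigned long long factorial_tail(unsigned n, unsigned long long acc = 1) {
    return n <= 1 ? acc : factorial_tail(n - 1, acc * n);
}

// Callback through a function pointer: inside apply(), the target of `op`
// is only known at run time, so there is no body to paste in.
int apply(int (*op)(int), int x) {
    return op(x);
}

int negate_int(int x) {
    return -x;
}
```

A call such as apply(negate_int, 3) passes the callback explicitly; if the compiler inlines apply() itself at such a call site, it may then see the constant pointer and resolve the indirect call as well.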

Inlining also increases the size of the executable, which results in more disk usage and longer load times.

When to use static inline instead of regular functions

Inlining is done for optimization. However, a little-known fact is that inline can also hurt performance: your CPU has an instruction cache with a fixed size, and inlining replicates the function in several places, which makes the instruction cache less efficient.

So, from a performance point of view, it's generally not advisable to declare functions inline unless they are so short that their call is more expensive than their execution.

To put this in relation: a function call takes somewhere between 10 and 30 cycles of CPU time (depending on the number of arguments). Arithmetic operations generally take a single cycle, whereas a memory load from the first-level cache takes something like three to four cycles. So, if your function is more complex than a simple sequence of at most three memory accesses and some arithmetic, there is little point in inlining it.

I usually take this approach:

If a function is as simple as incrementing a single counter, and if it is used all over the place, I inline it. Examples of this are rare, but one valid case is reference counting.
If a function is used only within a single file, I declare it as static, not inline. This has the effect that the compiler can see when such a function is used precisely one time. And if it sees that, it will very likely inline it, no matter how complex it is, since it can prove that there is no downside of inlining.
All other functions are neither static nor inline.
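A rough sketch of the first two rules, written in C++ for consistency with the other examples on this page. Buffer, buffer_retain, sum_bytes and packet_checksum are hypothetical names:

```cpp
#include <cstddef>

// Hypothetical reference-counted object.
struct Buffer {
    int refcount = 1;
};

// Rule 1: a single increment that is called all over the place.
// This is the kind of function worth marking inline in a header.
inline void buffer_retain(Buffer& b) {
    ++b.refcount;
}

// Rule 2: a helper used only in this file. `static` gives it internal
// linkage, so the compiler can see every call site. With only one caller,
// it is very likely to be inlined regardless of its size.
static unsigned sum_bytes(const unsigned char* p, std::size_t n) {
    unsigned total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}

// Rule 3: an ordinary function, neither static nor inline.
unsigned packet_checksum(const unsigned char* p, std::size_t n) {
    return sum_bytes(p, n) & 0xFFu;
}
```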

The example in your question is a borderline example: It contains a function call, thus it seems to be too complex for inlining at first sight.

However, the memcpy() function is special: it is seen more as a part of the language than as a library function. Most compilers will inline it and optimize it heavily when the size is a small compile-time constant, which is the case in the code in question.

With that optimization, the function is indeed reduced to a short, simple sequence. I cannot say whether it touches a lot of memory because I don't know the structure that is copied. If that structure is small, adding the inline keyword seems to be a good idea in this case.
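Since the code from the question is not reproduced here, the following is only a sketch of the kind of function being discussed. The Point structure and copy_point are hypothetical stand-ins:

```cpp
#include <cstring>

// Hypothetical small structure; the structure from the question is not shown.
struct Point {
    int x;
    int y;
};

// sizeof(Point) is a small compile-time constant, so most optimizing
// compilers are expected to replace this memcpy call with a few plain
// load/store instructions rather than a real library call.
inline void copy_point(Point* dst, const Point* src) {
    std::memcpy(dst, src, sizeof(Point));
}
```

With the memcpy expanded in place, copy_point becomes short enough to meet the "a few memory accesses" guideline above.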

Clarification over internal linkage of inline functions in C

TL;DR: GCC releases whose default C dialect is gnu89 (those before GCC 5) use GCC's old semantics of inline, in which an inline function is still compiled as an externally visible entity. Specifying -std=c99 or -std=c11 causes GCC to implement the standard semantics; however, the IBM compiler does not conform to the standard either. So linking will still fail, but with a different error.

Since C99, an inline function declaration with no declared linkage (neither static nor extern) does not generate a function object. The inline definition will only be used for inline substitution, and the compiler is not obliged to perform this optimisation. It is expected that an external definition of the function exists in some other translation unit, and such a definition must exist if the function object is used, either by taking its address or by being called in a context where the compiler chooses not to perform the inline substitution.

If the inline function is declared with either static or extern, then a function object is compiled, with the indicated linkage, thereby satisfying the requirement that the function object be defined.

Prior to C99, inline was not part of the C standard, but many compilers -- particularly GCC -- implemented it as an extension. In the case of GCC, however, the semantics of inline differed slightly from the above exposition.

In C99 (and more recent), an inline function with no linkage specification is only an inline definition ("An inline definition does not provide an external definition for the function, and does not forbid an external definition in another translation unit." §6.7.4p7). But in the GCC extension, an inline function with no linkage specification was given external linkage (just like a non-inline function declaration). GCC then special-cased extern inline to mean "do not generate a function object", which is effectively the same as standard C99's handling of an inline function with neither extern nor static modifiers. See the GCC manual, particularly the last section.

This still matters because GCC releases whose default C dialect is gnu89 (those before GCC 5) use the original inline semantics unless you specify that the compiler should conform to some C standard (using, for example, -std=c11) or disable the GNU inline semantics using -fno-gnu89-inline.

The example code, which I understand is taken from the IBM i7.1 compiler documentation, does not correctly reflect any C standard. The two definitions of foo as inline functions do not generate any actual function named foo, so the use of &foo must refer to some externally-defined foo, and there isn't one in the program. GCC will report this issue if you tell it to use C11/C99 semantics:

$ gcc -std=c99 a.c b.c
/tmp/ccUKlp5g.o: In function `g':
a.c:(.text+0xa): undefined reference to `foo'
a.c:(.text+0x13): undefined reference to `foo'
/tmp/cc2hv17O.o: In function `main':
b.c:(.text+0xa): undefined reference to `foo'
b.c:(.text+0x13): undefined reference to `foo'
collect2: error: ld returned 1 exit status

By contrast, if you ask for GNU inline semantics, both translation units will define foo, and the linker will complain about a duplicate definition:

$ gcc -std=c99 -fgnu89-inline a.c b.c
/tmp/ccAHHqOI.o: In function `foo':
b.c:(.text+0x0): multiple definition of `foo'
/tmp/ccPyQrTO.o:a.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

Also note that GCC does not inline any function by default. You must provide some optimization option in order to enable function inlining. If you do so, and you remove the use of the address operator, you can get the program to compile:

$ cat a2.c
#include <stdio.h>
inline int foo() { return 3; }
void g() {
    printf("foo called from g: return value = %d\n", foo());
}
$ cat b2.c
#include <stdio.h>
inline int foo() { return 4; }
void g();
int main() {
    printf("foo called from main: return value = %d\n", foo());
    g();
    return 0;
}

$ # With no optimisation, an external definition is still needed:
$ gcc -std=c11 a2.c b2.c
/tmp/cccJV9J6.o: In function `g':
a2.c:(.text+0xa): undefined reference to `foo'
/tmp/cct5NcjY.o: In function `main':
b2.c:(.text+0xa): undefined reference to `foo'
collect2: error: ld returned 1 exit status

$ # With inlining enabled, the program works as (possibly) expected:
$ gcc -std=c11 -O a2.c b2.c
$ gcc -std=c11 -O1 a2.c b2.c
$ ./a.out
foo called from main: return value = 4
foo called from g: return value = 3

As indicated by the IBM documentation, the rules for C++ are distinct. This program is not valid C++ because the definitions of foo in the two translation units differ, but the compiler is not obliged to detect this error and the usual Undefined Behaviour rules apply (i.e., the standard doesn't define what will be printed). As it happens, GCC seems to show the same results as i7.1:

$ gcc -std=c++14 -x c++ a.c b.c
$ ./a.out
foo called from main: return value = 3, address = 0x55cd03df5670
foo called from g: return value = 3, address = 0x55cd03df5670
