"Inline" Keyword VS "Inlining" Concept

I wasn't sure about your claim:

Smaller functions are automatically "inlined" by optimizer irrespective of inline is mentioned or not...
It's quite clear that the user doesn't have any control over function "inlining" with the use of keyword inline.

I've heard that compilers are free to ignore your inline request, but I didn't think they disregarded it completely.

I looked through the GitHub repositories for Clang and LLVM to find out. (Thanks, open source software!) I found that the inline keyword does make Clang/LLVM more likely to inline a function.

The Search

Searching for the word inline in the Clang repository leads to the token specifier kw_inline. It looks like Clang uses a clever macro-based system to build the lexer and other keyword-related functions, so there's nothing as direct as if (tokenString == "inline") return kw_inline to be found. But here in ParseDecl.cpp, we see that kw_inline results in a call to DeclSpec::setFunctionSpecInline().

case tok::kw_inline:
  isInvalid = DS.setFunctionSpecInline(Loc, PrevSpec, DiagID);
  break;

Inside that function, we set a bit and emit a warning if it's a duplicate inline:

if (FS_inline_specified) {
  DiagID = diag::warn_duplicate_declspec;
  PrevSpec = "inline";
  return true;
}
FS_inline_specified = true;
FS_inlineLoc = Loc;
return false;

Searching for FS_inline_specified elsewhere, we see it's a single bit in a bitfield, and it's used in a getter function, isInlineSpecified():

bool isInlineSpecified() const {
  return FS_inline_specified | FS_forceinline_specified;
}

Searching for call sites of isInlineSpecified(), we find the codegen, where we convert the C++ parse tree into LLVM intermediate representation:

if (!CGM.getCodeGenOpts().NoInline) {
  for (auto RI : FD->redecls())
    if (RI->isInlineSpecified()) {
      Fn->addFnAttr(llvm::Attribute::InlineHint);
      break;
    }
} else if (!FD->hasAttr<AlwaysInlineAttr>())
  Fn->addFnAttr(llvm::Attribute::NoInline);
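The control flow of that snippet is easy to misread: the braceless else if belongs to the outer NoInline check, so it only runs when inlining is disabled. Here is a minimal sketch of the same decision as a pure function. It is a hypothetical, simplified model, not Clang code: FnAttr, Redecl and attributeFor are illustrative names, and the real code adds attributes to an llvm::Function instead of returning a value.

```cpp
#include <vector>

// Hypothetical, simplified model of the Clang CodeGen snippet above.
// FnAttr, Redecl and attributeFor are illustrative names, not Clang's API.
enum class FnAttr { None, InlineHint, NoInline };

struct Redecl {
    bool inlineSpecified;  // stands in for RI->isInlineSpecified()
};

FnAttr attributeFor(bool noInlineOption, bool hasAlwaysInline,
                    const std::vector<Redecl>& redecls) {
    if (!noInlineOption) {
        // `inline` on any one redeclaration is enough to add the hint.
        for (const Redecl& r : redecls)
            if (r.inlineSpecified)
                return FnAttr::InlineHint;
        return FnAttr::None;
    }
    // Inlining disabled: everything except always_inline functions
    // is marked noinline.
    return hasAlwaysInline ? FnAttr::None : FnAttr::NoInline;
}
```

For example, a function declared once without inline and defined later with inline still gets the hint, because the loop looks at every redeclaration.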

Clang to LLVM

We are done with the C++ parsing stage. Now our inline specifier is converted to an attribute of the language-neutral LLVM Function object. We switch from Clang to the LLVM repository.

Searching for llvm::Attribute::InlineHint yields the method Inliner::getInlineThreshold(CallSite CS) (with a scary-looking braceless if block):

// Listen to the inlinehint attribute when it would increase the threshold
// and the caller does not need to minimize its size.
Function *Callee = CS.getCalledFunction();
bool InlineHint = Callee && !Callee->isDeclaration() &&
Callee->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
Attribute::InlineHint);
if (InlineHint && HintThreshold > thres
&& !Caller->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
Attribute::MinSize))
thres = HintThreshold;

So we already have a baseline inlining threshold from the optimization level and other factors, but if it's lower than the global HintThreshold, we bump it up. (HintThreshold is settable from the command line.)
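The effect of the hint can be modeled as a small pure function. This is a hypothetical sketch, not LLVM code: CallSiteInfo and effectiveThreshold are made-up names, and the numbers in the checks are arbitrary. It shows that the hint can only raise the threshold, and only when the callee has a body in the module and the caller is not marked MinSize.

```cpp
// Hypothetical, simplified model of the threshold logic quoted above.
// The names are illustrative; they are not LLVM's API.
struct CallSiteInfo {
    bool calleeHasInlineHint;  // callee carries the InlineHint attribute
    bool calleeIsDeclaration;  // callee has no body in this module
    bool callerIsMinSize;      // caller carries the MinSize attribute
};

int effectiveThreshold(int baseThreshold, int hintThreshold,
                       const CallSiteInfo& cs) {
    bool inlineHint = cs.calleeHasInlineHint && !cs.calleeIsDeclaration;
    // The hint can only raise the threshold, never lower it,
    // and it is ignored when the caller is optimized for minimum size.
    if (inlineHint && hintThreshold > baseThreshold && !cs.callerIsMinSize)
        return hintThreshold;
    return baseThreshold;
}
```

A larger threshold does not force inlining; it only lets a more expensive callee pass the later cost-vs-threshold comparison.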

getInlineThreshold() appears to have only one call site, a member of SimpleInliner:

InlineCost getInlineCost(CallSite CS) override {
  return ICA->getInlineCost(CS, getInlineThreshold(CS));
}

It calls a virtual method, also named getInlineCost, on its member pointer to an instance of InlineCostAnalysis.

Searching for ::getInlineCost() to find the versions that are class members, we find one that's a member of AlwaysInline - which is a non-standard but widely supported compiler feature - and another that's a member of InlineCostAnalysis. It uses its Threshold parameter here:

CallAnalyzer CA(Callee->getDataLayout(), *TTI, AT, *Callee, Threshold);
bool ShouldInline = CA.analyzeCall(CS);

CallAnalyzer::analyzeCall() is over 200 lines and does the real nitty-gritty work of deciding whether the function is inlineable. It weighs many factors, but as we read through the method we see that all its computations manipulate either the Threshold or the Cost. And at the end:

return Cost < Threshold;

But the return value named ShouldInline is really a misnomer. In fact the main purpose of analyzeCall() is to set the Cost and Threshold member variables on the CallAnalyzer object. The return value only indicates the case when some other factor has overridden the cost-vs-threshold analysis, as we see here:

// Check if there was a reason to force inlining or no inlining.
if (!ShouldInline && CA.getCost() < CA.getThreshold())
  return InlineCost::getNever();
if (ShouldInline && CA.getCost() >= CA.getThreshold())
  return InlineCost::getAlways();

Otherwise, we return an object that stores the Cost and Threshold.

return llvm::InlineCost::get(CA.getCost(), CA.getThreshold());

So we're not returning a yes-or-no decision in most cases. The search continues! Where is this return value of getInlineCost() used?
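The branching above can be summarized in a small model. Again this is a hypothetical sketch (Verdict, InlineResult and packageCost are illustrative names, not LLVM's API): an override is only reported when the analyzer's boolean disagrees with the cost-vs-threshold comparison; otherwise the raw numbers are passed on for the caller to weigh.

```cpp
// Hypothetical, simplified model of how getInlineCost() packages the
// result of CallAnalyzer::analyzeCall(). Names are illustrative only.
enum class Verdict { Never, Always, CostBased };

struct InlineResult {
    Verdict verdict;
    int cost;
    int threshold;
};

InlineResult packageCost(bool analyzerSaysInline, int cost, int threshold) {
    // analyzeCall() normally returns cost < threshold; a mismatch means
    // some other factor overrode the cost-vs-threshold comparison.
    if (!analyzerSaysInline && cost < threshold)
        return {Verdict::Never, cost, threshold};
    if (analyzerSaysInline && cost >= threshold)
        return {Verdict::Always, cost, threshold};
    // Otherwise the caller gets the raw numbers to weigh itself.
    return {Verdict::CostBased, cost, threshold};
}
```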

The Real Decision

It's found in bool Inliner::shouldInline(CallSite CS). Another big function. It calls getInlineCost() right at the beginning.

It turns out that getInlineCost analyzes the intrinsic cost of inlining the function - its argument signature, code length, recursion, branching, linkage, etc. - and some aggregate information about every place the function is used. On the other hand, shouldInline() combines this information with more data about a specific place where the function is used.

Throughout the method there are calls to InlineCost::costDelta(), which uses the InlineCost's Threshold value as computed by analyzeCall(). Finally, we return a bool. The decision is made. In Inliner::runOnSCC():

if (!shouldInline(CS)) {
  emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
                               Twine(Callee->getName() +
                                     " will not be inlined into " +
                                     Caller->getName()));
  continue;
}

// Attempt to inline the function.
if (!InlineCallIfPossible(CS, InlineInfo, InlinedArrayAllocas,
                          InlineHistoryID, InsertLifetime, DL)) {
  emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
                               Twine(Callee->getName() +
                                     " will not be inlined into " +
                                     Caller->getName()));
  continue;
}
++NumInlined;
++NumInlined;

InlineCallIfPossible() does the inlining based on shouldInline()'s decision.

So the Threshold was affected by the inline keyword, and is used in the end to decide whether to inline.

Therefore, your Perception B is partly wrong because at least one major compiler changes its optimization behavior based on the inline keyword.

However, we can also see that inline is only a hint, and other factors may outweigh it.

When to use inline function and when not to use it?

Avoiding the cost of a function call is only half the story.

do:

  • use inline instead of #define
  • very small functions are good candidates for inline: faster code and smaller executables (more chances to stay in the code cache)
  • the function is small and called very often
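One concrete reason to prefer inline over #define is argument evaluation. In this sketch (SQUARE_MACRO, square, callCount and nextValue are hypothetical names) the macro evaluates its argument twice, while the inline function evaluates it once and is type-checked like any other function.

```cpp
// A function-like macro pastes its argument into the expansion,
// so the argument is evaluated once per use.
#define SQUARE_MACRO(x) ((x) * (x))

// An inline function evaluates its argument exactly once, is
// type-checked and scoped, and is still a candidate for inlining.
inline int square(int x) { return x * x; }

// Hypothetical helper that counts how many times it is called.
int callCount = 0;
int nextValue() { return ++callCount; }
```

Calling SQUARE_MACRO(nextValue()) invokes nextValue twice, whereas square(nextValue()) invokes it once.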

don't:

  • large functions: lead to larger executables, which significantly impairs performance regardless of the faster execution that results from avoiding the calling overhead
  • inline functions that are I/O bound
  • the function is seldom used
  • constructors and destructors: even when empty, the compiler generates code for them
  • breaking binary compatibility when developing libraries:

    • inline an existing function
    • change an inline function or make an inline function non-inline: code compiled against the prior version of the library still calls the old implementation

when developing a library, in order to make a class extensible in the future you should:

  • add non-inline virtual destructor even if the body is empty
  • make all constructors non-inline
  • write non-inline implementations of the copy constructor and assignment operator unless the class cannot be copied by value
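The three recommendations above can be sketched with a hypothetical Widget class. The header only declares the special members; their bodies live in the library's .cpp file, so they can change later without forcing clients to recompile. Both "files" are shown in one block for brevity, and the class itself is an illustrative assumption.

```cpp
#include <string>
#include <utility>

// --- widget.h (public header shipped with the library) ---
class Widget {
public:
    Widget();
    explicit Widget(std::string name);
    Widget(const Widget& other);             // non-inline copy constructor
    Widget& operator=(const Widget& other);  // non-inline assignment
    virtual ~Widget();                       // non-inline, even though trivial

    const std::string& name() const;

private:
    std::string name_;
};

// --- widget.cpp (compiled into the library) ---
Widget::Widget() : name_("unnamed") {}
Widget::Widget(std::string name) : name_(std::move(name)) {}
// "= default" outside the class body keeps the definition out of line.
Widget::Widget(const Widget& other) = default;
Widget& Widget::operator=(const Widget& other) = default;
Widget::~Widget() {}
const std::string& Widget::name() const { return name_; }
```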

Remember that the inline keyword is a hint to the compiler: the compiler may decide not to inline a function, and it can decide to inline functions that were not marked inline in the first place. I generally avoid marking functions inline (except perhaps for very, very small functions).

About performance, the wise approach is (as always) to profile the application, then, if needed, inline the set of functions that represents a bottleneck.

References:

  • To Inline or Not To Inline
  • [9] Inline functions
  • Policies/Binary Compatibility Issues With C++
  • GotW #33: Inline
  • Inline Redux
  • Effective C++ - Item 33: Use inlining judiciously

EDIT: Bjarne Stroustrup, The C++ Programming Language:

A function can be defined to be inline. For example:

inline int fac(int n)
{
    return (n < 2) ? 1 : n * fac(n-1);
}

The inline specifier is a hint to the compiler that it should attempt to generate code for a call of fac() inline rather than laying down the code for the function once and then calling through the usual function call mechanism. A clever compiler can generate the constant 720 for a call fac(6). The possibility of mutually recursive inline functions, inline functions that recurse or not depending on input, etc., makes it impossible to guarantee that every call of an inline function is actually inlined. The degree of cleverness of a compiler cannot be legislated, so one compiler might generate 720, another 6 * fac(5), and yet another an un-inlined call fac(6).

To make inlining possible in the absence of unusually clever compilation and linking facilities, the definition–and not just the declaration–of an inline function must be in scope (§9.2). An inline specifier does not affect the semantics of a function. In particular, an inline function still has a unique address and so has static variables (§7.1.2) of an inline function.
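The last point can be checked in a single translation unit. In this sketch (nextId and nextIdPtr are hypothetical names), the inline function's static local is shared by direct calls and calls through a pointer, and taking its address yields one ordinary function address. It cannot show the multi-translation-unit case, where the linker is what guarantees a single instance.

```cpp
// An inline function keeps ordinary function semantics: one address
// and one set of static locals for the whole program.
inline int nextId() {
    static int counter = 0;  // shared by every call, inlined or not
    return ++counter;
}

// Taking the address requires an out-of-line definition to exist
// somewhere; the compiler may still inline the direct calls.
int (*const nextIdPtr)() = &nextId;
```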

EDIT2: ISO/IEC 14882:1998, 7.1.2 Function specifiers

A function declaration (8.3.5, 9.3, 11.4) with an inline specifier declares an inline function. The inline specifier indicates to the implementation that inline substitution of the function body at the point of call is to be preferred to the usual function call mechanism. An implementation is not required to perform this inline substitution at the point of call; however, even if this inline substitution is omitted, the other rules for inline functions defined by 7.1.2 shall still be respected.

When should I write the keyword 'inline' for a function/method?

Oh man, one of my pet peeves.

inline is more like static or extern than a directive telling the compiler to inline your functions. extern, static, inline are linkage directives, used almost exclusively by the linker, not the compiler.

It is said that inline hints to the compiler that you think the function should be inlined. That may have been true in 1998, but a decade later the compiler needs no such hints. Not to mention humans are usually wrong when it comes to optimizing code, so most compilers flat out ignore the 'hint'.

  • static - the variable/function name cannot be used in other translation units. The linker needs to make sure it doesn't accidentally use a statically defined variable/function from another translation unit.

  • extern - use this variable/function name in this translation unit but don't complain if it isn't defined. The linker will sort it out and make sure all the code that tried to use some extern symbol has its address.

  • inline - this function will be defined in multiple translation units, don't worry about it. The linker needs to make sure all translation units use a single instance of the variable/function.

Note: Generally, declaring templates inline is pointless, as they have the linkage semantics of inline already. However, a full explicit specialization of a function template is an ordinary function, so it requires inline if it is defined in a header.
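For the explicit-specialization case, here is a sketch with a hypothetical describe function template: the primary template can be defined in a header as-is, but the full specialization is not a template any more, so it needs inline if its definition also lives in a header that several translation units include.

```cpp
#include <string>

// --- describe.h ---
// The primary template may be defined in a header without `inline`.
template <typename T>
std::string describe(const T&) { return "something"; }

// A full (explicit) specialization is an ordinary function, so without
// `inline` every TU including this header would emit its own definition
// and the link would fail with a multiple-definition error.
template <>
inline std::string describe<int>(const int&) { return "an int"; }
```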


Specific answers to your questions:

  • When should I write the keyword 'inline' for a function/method in C++?

    Only when you want the function to be defined in a header. More exactly, only when the function's definition can show up in multiple translation units. It's a good idea to define small (one-liner) functions in the header file, as it gives the compiler more information to work with while optimizing your code. It also increases compilation time.

  • When should I not write the keyword 'inline' for a function/method in C++?

    Don't add inline just because you think your code will run faster if the compiler inlines it.

  • When will the compiler not know when to make a function/method 'inline'?

    Generally, the compiler will be able to do this better than you. However, the compiler doesn't have the option to inline code if it doesn't have the function definition (unless link-time optimization makes it available). In maximally optimized code, usually all private methods are inlined whether you ask for it or not.

    As an aside, to prevent inlining in GCC, use __attribute__((noinline)), and in Visual Studio, use __declspec(noinline).

  • Does it matter if an application is multithreaded when one writes 'inline' for a function/method?

    Multithreading doesn't affect inlining in any way.

What is the use of the `inline` keyword in C?

Note: when I talk about .c files and .h files in this answer, I assume you have laid out your code correctly, i.e. .c files only include .h files. The distinction is that a .h file may be included in multiple translation units.

static inline void f(void) {} has no practical difference with static void f(void) {}.

In ISO C, this is correct. They are identical in behaviour (assuming you don't redeclare them differently in the same TU, of course!); the only practical effect may be to cause the compiler to optimize differently.

inline void f(void) {} in C doesn't work as the C++ way. How does it work in C? What actually does extern inline void f(void); do?

This is explained by this answer and also this thread.

In ISO C and C++, you can freely use inline void f(void) {} in header files -- although for different reasons!

In ISO C, it does not provide an external definition at all. In ISO C++ it does provide an external definition; however C++ has an additional rule (which C doesn't), that if there are multiple external definitions of an inline function, then the compiler sorts it out and picks one of them.

extern inline void f(void); in a .c file in ISO C is meant to be paired with the use of inline void f(void) {} in header files. It causes the external definition of the function to be emitted in that translation unit. If you don't do this then there is no external definition, and so you may get a link error (it is unspecified whether any particular call of f links to the external definition or not).

In other words, in ISO C you can manually select where the external definition goes; or suppress external definition entirely by using static inline everywhere; but in ISO C++ the compiler chooses if and where an external definition would go.

In GNU C, things are different (more on this below).

To complicate things further, GNU C++ allows you to write static inline and extern inline in C++ code... I wouldn't like to guess at what that does exactly.

I never really found a use of the inline keyword in my C programs, and when I see this keyword in other people's code, it's almost always static inline

Many coders don't know what they're doing and just put together something that appears to work. Another factor here is that the code you're looking at might have been written for GNU C, not ISO C.

In GNU C, plain inline behaves differently to ISO C. It actually emits an externally visible definition, so having a .h file with a plain inline function included from two translation units causes undefined behaviour.

So if the coder wants to supply the inline optimization hint in GNU C, then static inline is required. Since static inline works in both ISO C and GNU C, it's natural that people ended up settling for that and seeing that it appeared to work without giving errors.

...in which I see no difference with just static.

The difference is just in the intent to provide a speed-over-size optimization hint to the compiler. With modern compilers this is superfluous.

Why are C++ inline functions in the header?

The definition of an inline function doesn't have to be in a header file but, because of the one definition rule (ODR) for inline functions, an identical definition for the function must exist in every translation unit that uses it.

The easiest way to achieve this is by putting the definition in a header file.

If you want to put the definition of a function in a single source file then you shouldn't declare it inline. A function not declared inline does not mean that the compiler cannot inline the function.

Whether you should declare a function inline or not is usually a choice that you should make based on which version of the one definition rules it makes most sense for you to follow; adding inline and then being restricted by the subsequent constraints makes little sense.

Is there a way to separate the two meanings of the inline keyword (ODR relaxation vs. function code inlining)?

But __declspec(noinline) isn't portable.

You can make it portable to all implementations that have an analogous attribute by using a platform detection macro. GCC and Clang have __attribute__((noinline)).
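Such a macro might look like the following sketch. MY_NOINLINE and slowPath are hypothetical names; the attributes themselves (__declspec(noinline) for MSVC, __attribute__((noinline)) for GCC and Clang) are the ones mentioned above. Combined with inline, this keeps the ODR relaxation while asking the compiler not to expand the calls.

```cpp
// Hypothetical macro name; pick whatever fits your project's conventions.
#if defined(_MSC_VER)
#  define MY_NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define MY_NOINLINE __attribute__((noinline))
#else
#  define MY_NOINLINE  // no known attribute: expands to nothing
#endif

// Defined in a header, so it is declared `inline` for ODR purposes,
// while the attribute asks the compiler not to inline calls to it.
MY_NOINLINE inline int slowPath(int x) {
    return x * 3 + 1;
}
```

On a compiler with neither attribute, the macro expands to nothing and slowPath is simply an ordinary inline function.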


Another approach is to simply not care. The compiler still has the option to ignore the preference that it perceives to have been implied. If the inline expansion would be expensive (because the function is big), a smart compiler should refrain from expanding it.

When is the inline keyword effective in C?

It has a semantic effect. To simplify, a function marked inline may be defined multiple times in one program, though all definitions must be equivalent to each other, so the presence of inline is required for correctness when including the function definition in headers (which, in turn, makes the definition visible so the compiler can inline it without LTO).

Other than that, for inlining-the-optimization, "never" is a perfectly safe approximation. It probably has some effect in some compilers, but nothing worth losing sleep over, especially not without actual hard data. For example, in the following code, using Clang 3.0 or GCC 4.7, the caller maxArray contains the same code whether work is marked inline or not. The only difference is whether work remains as a stand-alone function for other translation units to link to, or is removed.

void work(double *a, double *b) {
    if (*b > *a) *a = *b;
}

void maxArray(double* x, double* y) {
    for (int i = 0; i < 65536; i++) {
        //if (y[i] > x[i]) x[i] = y[i];
        work(x+i, y+i);
    }
}

