inline keyword vs inlining concept
I wasn't sure about your claim:
Smaller functions are automatically "inlined" by optimizer irrespective of inline is mentioned or not...
It's quite clear that the user doesn't have any control over function "inlining" with the use of keyword inline.
I've heard that compilers are free to ignore your inline request, but I didn't think they disregarded it completely.
I looked through the GitHub repositories for Clang and LLVM to find out. (Thanks, open source software!) I found out that the inline keyword does make Clang/LLVM more likely to inline a function.
The Search
Searching for the word inline in the Clang repository leads to the token specifier kw_inline. It looks like Clang uses a clever macro-based system to build the lexer and other keyword-related functions, so there's nothing direct like if (tokenString == "inline") return kw_inline to be found. But here in ParseDecl.cpp, we see that kw_inline results in a call to DeclSpec::setFunctionSpecInline().
case tok::kw_inline:
isInvalid = DS.setFunctionSpecInline(Loc, PrevSpec, DiagID);
break;
Inside that function, we set a bit and emit a warning if it's a duplicate inline:
if (FS_inline_specified) {
DiagID = diag::warn_duplicate_declspec;
PrevSpec = "inline";
return true;
}
FS_inline_specified = true;
FS_inlineLoc = Loc;
return false;
Searching for FS_inline_specified
elsewhere, we see it's a single bit in a bitfield, and it's used in a getter function, isInlineSpecified()
:
bool isInlineSpecified() const {
return FS_inline_specified | FS_forceinline_specified;
}
Searching for call sites of isInlineSpecified()
, we find the codegen, where we convert the C++ parse tree into LLVM intermediate representation:
if (!CGM.getCodeGenOpts().NoInline) {
for (auto RI : FD->redecls())
if (RI->isInlineSpecified()) {
Fn->addFnAttr(llvm::Attribute::InlineHint);
break;
}
} else if (!FD->hasAttr<AlwaysInlineAttr>())
Fn->addFnAttr(llvm::Attribute::NoInline);
Clang to LLVM
We are done with the C++ parsing stage. Now our inline specifier is converted to an attribute of the language-neutral LLVM Function object. We switch from Clang to the LLVM repository.
Searching for llvm::Attribute::InlineHint yields the method Inliner::getInlineThreshold(CallSite CS) (with a scary-looking braceless if block):
// Listen to the inlinehint attribute when it would increase the threshold
// and the caller does not need to minimize its size.
Function *Callee = CS.getCalledFunction();
bool InlineHint = Callee && !Callee->isDeclaration() &&
Callee->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
Attribute::InlineHint);
if (InlineHint && HintThreshold > thres
&& !Caller->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
Attribute::MinSize))
thres = HintThreshold;
So we already have a baseline inlining threshold from the optimization level and other factors, but if it's lower than the global HintThreshold, we bump it up. (HintThreshold is settable from the command line.)
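The threshold adjustment described above can be modeled in a few lines. This is only a simplified sketch, not LLVM's code: the function name effectiveThreshold and its plain int/bool parameters are made up for illustration and stand in for the attribute lookups in the excerpt.

```cpp
// Simplified model of the threshold bump; this is NOT LLVM code.
// effectiveThreshold and its parameters are hypothetical stand-ins
// for the attribute queries shown in the excerpt above.
int effectiveThreshold(int baseThreshold, int hintThreshold,
                       bool calleeHasInlineHint, bool callerIsMinSize) {
    int thres = baseThreshold;
    // The hint can only raise the threshold, never lower it, and it is
    // ignored when the caller is being optimized for minimum size.
    if (calleeHasInlineHint && hintThreshold > thres && !callerIsMinSize)
        thres = hintThreshold;
    return thres;
}
```

With made-up numbers, effectiveThreshold(225, 325, true, false) gives 325, while the same call without the hint, or with a size-minimizing caller, stays at 225.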
getInlineThreshold()
appears to have only one call site, a member of SimpleInliner
:
InlineCost getInlineCost(CallSite CS) override {
return ICA->getInlineCost(CS, getInlineThreshold(CS));
}
It calls a virtual method, also named getInlineCost
, on its member pointer to an instance of InlineCostAnalysis
.
Searching for ::getInlineCost()
to find the versions that are class members, we find one that's a member of AlwaysInline
- which is a non-standard but widely supported compiler feature - and another that's a member of InlineCostAnalysis
. It uses its Threshold
parameter here:
CallAnalyzer CA(Callee->getDataLayout(), *TTI, AT, *Callee, Threshold);
bool ShouldInline = CA.analyzeCall(CS);
CallAnalyzer::analyzeCall() is over 200 lines and does the real nitty gritty work of deciding if the function is inlineable. It weighs many factors, but as we read through the method we see that all its computations manipulate either the Threshold or the Cost. And at the end:
return Cost < Threshold;
But the return value named ShouldInline is really a misnomer. In fact the main purpose of analyzeCall() is to set the Cost and Threshold member variables on the CallAnalyzer object. The return value only indicates the case when some other factor has overridden the cost-vs-threshold analysis, as we see here:
// Check if there was a reason to force inlining or no inlining.
if (!ShouldInline && CA.getCost() < CA.getThreshold())
return InlineCost::getNever();
if (ShouldInline && CA.getCost() >= CA.getThreshold())
return InlineCost::getAlways();
Otherwise, we return an object that stores the Cost
and Threshold
.
return llvm::InlineCost::get(CA.getCost(), CA.getThreshold());
So we're not returning a yes-or-no decision in most cases. The search continues! Where is this return value of getInlineCost() used?
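To make the three possible outcomes easier to see, here is a toy model of that logic. It is a sketch under my own naming (ToyInlineCost, Kind and classify are invented for illustration), not the real InlineCost class.

```cpp
// Toy model of the result built at the end of getInlineCost(); NOT LLVM code.
// Kind, ToyInlineCost and classify are hypothetical names.
enum class Kind { Always, Never, Variable };

struct ToyInlineCost {
    Kind kind;
    int cost;
    int threshold;
};

// analyzeSaysInline plays the role of the ShouldInline return value.
ToyInlineCost classify(bool analyzeSaysInline, int cost, int threshold) {
    // The numbers alone would allow inlining, but the analysis refused:
    // some other factor forced "never".
    if (!analyzeSaysInline && cost < threshold)
        return {Kind::Never, cost, threshold};
    // The numbers alone would forbid inlining, but the analysis insisted:
    // some other factor forced "always".
    if (analyzeSaysInline && cost >= threshold)
        return {Kind::Always, cost, threshold};
    // Otherwise hand cost and threshold to the caller, which decides later.
    return {Kind::Variable, cost, threshold};
}
```

In the common case, where the return value and the cost-vs-threshold comparison agree, the result is Kind::Variable and the final decision is deferred.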
The Real Decision
It's found in bool Inliner::shouldInline(CallSite CS)
. Another big function. It calls getInlineCost()
right at the beginning.
It turns out that getInlineCost
analyzes the intrinsic cost of inlining the function - its argument signature, code length, recursion, branching, linkage, etc. - and some aggregate information about every place the function is used. On the other hand, shouldInline()
combines this information with more data about a specific place where the function is used.
Throughout the method there are calls to InlineCost::costDelta(), which uses the InlineCost's Threshold value as computed by analyzeCall(). Finally, we return a bool. The decision is made. In Inliner::runOnSCC():
if (!shouldInline(CS)) {
emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
Twine(Callee->getName() +
" will not be inlined into " +
Caller->getName()));
continue;
}
// Attempt to inline the function.
if (!InlineCallIfPossible(CS, InlineInfo, InlinedArrayAllocas,
InlineHistoryID, InsertLifetime, DL)) {
emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
Twine(Callee->getName() +
" will not be inlined into " +
Caller->getName()));
continue;
}
++NumInlined;
InlineCallIfPossible() does the inlining based on shouldInline()'s decision.
So the Threshold was affected by the inline keyword, and is used in the end to decide whether to inline.
Therefore, your Perception B is partly wrong because at least one major compiler changes its optimization behavior based on the inline keyword.
However, we can also see that inline is only a hint, and other factors may outweigh it.
When to use inline function and when not to use it?
Avoiding the cost of a function call is only half the story.
do:
- use inline instead of #define
- very small functions are good candidates for inline: faster code and smaller executables (more chances to stay in the code cache)
- the function is small and called very often
don't:
- large functions: leads to larger executables, which significantly impairs performance regardless of the faster execution that results from avoiding the calling overhead
- inline functions that are I/O bound
- the function is seldom used
- constructors and destructors: even when empty, the compiler generates code for them
- breaking binary compatibility when developing libraries:
- inline an existing function
- change an inline function or make an inline function non-inline: code compiled against a prior version of the library still calls the old implementation
when developing a library, in order to make a class extensible in the future you should:
- add non-inline virtual destructor even if the body is empty
- make all constructors non-inline
- write non-inline implementations of the copy constructor and assignment operator unless the class cannot be copied by value
Remember that the inline keyword is a hint to the compiler: the compiler may decide not to inline a function and it can decide to inline functions that were not marked inline in the first place. I generally avoid marking functions inline (apart maybe when writing very, very small functions).
About performance, the wise approach is (as always) to profile the application, then eventually inline a set of functions representing a bottleneck.
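As an illustration of the "use inline instead of #define" advice, here is a small sketch. The names SQUARE_MACRO, square, next and the two counting helpers are invented for this example; the point is that a macro pastes its argument expression in twice, while an inline function evaluates it once.

```cpp
// Counts how many times next() has been evaluated.
static int calls = 0;

int next() {
    ++calls;
    return calls;
}

// Macro version: the argument expression appears twice after expansion.
#define SQUARE_MACRO(x) ((x) * (x))

// Inline function version: the argument is evaluated exactly once.
inline int square(int x) { return x * x; }

// Number of next() calls triggered by squaring via the macro.
int callsViaMacro() {
    calls = 0;
    int r = SQUARE_MACRO(next());  // expands to ((next()) * (next()))
    (void)r;
    return calls;
}

// Number of next() calls triggered by squaring via the inline function.
int callsViaInline() {
    calls = 0;
    int r = square(next());
    (void)r;
    return calls;
}
```

callsViaMacro() reports two evaluations and callsViaInline() reports one, which is why the function form is the safer replacement for function-like macros.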
References:
- To Inline or Not To Inline
- [9] Inline functions
- Policies/Binary Compatibility Issues With C++
- GotW #33: Inline
- Inline Redux
- Effective C++ - Item 33: Use inlining judiciously
EDIT: Bjarne Stroustrup, The C++ Programming Language:
A function can be defined to be inline. For example:
inline int fac(int n)
{
return (n < 2) ? 1 : n * fac(n-1);
}
The inline specifier is a hint to the compiler that it should attempt to generate code for a call of fac() inline rather than laying down the code for the function once and then calling through the usual function call mechanism. A clever compiler can generate the constant 720 for a call fac(6). The possibility of mutually recursive inline functions, inline functions that recurse or not depending on input, etc., makes it impossible to guarantee that every call of an inline function is actually inlined. The degree of cleverness of a compiler cannot be legislated, so one compiler might generate 720, another 6 * fac(5), and yet another an un-inlined call fac(6).
To make inlining possible in the absence of unusually clever compilation and linking facilities, the definition–and not just the declaration–of an inline function must be in scope (§9.2). An inline specifier does not affect the semantics of a function. In particular, an inline function still has a unique address and so has static variables (§7.1.2) of an inline function.
EDIT2: ISO-IEC 14882-1998, 7.1.2 Function specifiers
A function declaration (8.3.5, 9.3, 11.4) with an inline specifier declares an inline function. The inline specifier indicates to the implementation that inline substitution of the function body at the point of call is to be preferred to the usual function call mechanism. An implementation is not required to perform this inline substitution at the point of call; however, even if this inline substitution is omitted, the other rules for inline functions defined by 7.1.2 shall still be respected.
When should I write the keyword 'inline' for a function/method?
Oh man, one of my pet peeves.
inline is more like static or extern than a directive telling the compiler to inline your functions. extern, static, inline are linkage directives, used almost exclusively by the linker, not the compiler.
It is said that inline hints to the compiler that you think the function should be inlined. That may have been true in 1998, but a decade later the compiler needs no such hints. Not to mention humans are usually wrong when it comes to optimizing code, so most compilers flat out ignore the 'hint'.
static - the variable/function name cannot be used in other translation units. Linker needs to make sure it doesn't accidentally use a statically defined variable/function from another translation unit.
extern - use this variable/function name in this translation unit but don't complain if it isn't defined. The linker will sort it out and make sure all the code that tried to use some extern symbol has its address.
inline - this function will be defined in multiple translation units, don't worry about it. The linker needs to make sure all translation units use a single instance of the variable/function.
Note: Generally, declaring templates inline is pointless, as they have the linkage semantics of inline already. However, explicit (full) specializations of function templates are not implicitly inline, so they still need inline when they are defined in a header.
Specific answers to your questions:
When should I write the keyword 'inline' for a function/method in C++?
Only when you want the function to be defined in a header. More exactly only when the function's definition can show up in multiple translation units. It's a good idea to define small (as in one liner) functions in the header file as it gives the compiler more information to work with while optimizing your code. It also increases compilation time.
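A minimal sketch of what that looks like in practice: the header name math_utils.h and the function clampToByte are hypothetical, chosen only to show a one-liner defined in a header with inline so that several translation units can include it.

```cpp
// Contents of a hypothetical header, math_utils.h.
#ifndef MATH_UTILS_H
#define MATH_UTILS_H

// 'inline' allows this definition to appear in every translation unit
// that includes the header; the program still ends up with one function.
// Without 'inline', including the header from two .cpp files would
// typically produce a "multiple definition" link error.
inline int clampToByte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

#endif // MATH_UTILS_H
```

Whether calls to clampToByte are actually expanded in place is still up to the optimizer; the keyword here is about where the definition may live.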
When should I not write the keyword 'inline' for a function/method in C++?
Don't add inline just because you think your code will run faster if the compiler inlines it.
When will the compiler not know when to make a function/method 'inline'?
Generally, the compiler will be able to do this better than you. However, the compiler doesn't have the option to inline code if it doesn't have the function definition. In maximally optimized code usually all private methods are inlined whether you ask for it or not.
As an aside, to prevent inlining in GCC, use __attribute__(( noinline )), and in Visual Studio, use __declspec(noinline).
Does it matter if an application is multithreaded when one writes 'inline' for a function/method?
Multithreading doesn't affect inlining in any way.
What is the use of the `inline` keyword in C?
Note: when I talk about .c files and .h files in this answer, I assume you have laid out your code correctly, i.e. .c files only include .h files. The distinction is that a .h file may be included in multiple translation units.
static inline void f(void) {} has no practical difference with static void f(void) {}.
In ISO C, this is correct. They are identical in behaviour (assuming you don't re-declare them differently in the same TU, of course!); the only practical effect may be to cause the compiler to optimize differently.
inline void f(void) {} in C doesn't work as the C++ way. How does it work in C? What actually does extern inline void f(void); do?
This is explained by this answer and also this thread.
In ISO C and C++, you can freely use inline void f(void) {}
in header files -- although for different reasons!
In ISO C, it does not provide an external definition at all. In ISO C++ it does provide an external definition; however C++ has an additional rule (which C doesn't), that if there are multiple external definitions of an inline
function, then the compiler sorts it out and picks one of them.
extern inline void f(void);
in a .c
file in ISO C is meant to be paired with the use of inline void f(void) {}
in header files. It causes the external definition of the function to be emitted in that translation unit. If you don't do this then there is no external definition, and so you may get a link error (it is unspecified whether any particular call of f
links to the external definition or not).
In other words, in ISO C you can manually select where the external definition goes; or suppress external definition entirely by using static inline
everywhere; but in ISO C++ the compiler chooses if and where an external definition would go.
In GNU C, things are different (more on this below).
To complicate things further, GNU C++ allows you to write static inline and extern inline in C++ code... I wouldn't like to guess on what that does exactly.
I never really found a use of the inline keyword in my C programs, and when I see this keyword in other people's code, it's almost always static inline
Many coders don't know what they're doing and just put together something that appears to work. Another factor here is that the code you're looking at might have been written for GNU C, not ISO C.
In GNU C, plain inline behaves differently to ISO C. It actually emits an externally visible definition, so having a .h file with a plain inline function included from two translation units causes undefined behaviour.
So if the coder wants to supply the inline optimization hint in GNU C, then static inline is required. Since static inline works in both ISO C and GNU C, it's natural that people ended up settling for that and seeing that it appeared to work without giving errors.
, in which I see no difference with just static.
The difference is just in the intent to provide a speed-over-size optimization hint to the compiler. With modern compilers this is superfluous.
Why are C++ inline functions in the header?
The definition of an inline
function doesn't have to be in a header file but, because of the one definition rule (ODR) for inline functions, an identical definition for the function must exist in every translation unit that uses it.
The easiest way to achieve this is by putting the definition in a header file.
If you want to put the definition of a function in a single source file then you shouldn't declare it inline
. A function not declared inline
does not mean that the compiler cannot inline the function.
Whether you should declare a function inline
or not is usually a choice that you should make based on which version of the one definition rules it makes most sense for you to follow; adding inline
and then being restricted by the subsequent constraints makes little sense.
Is there a way to separate the two meanings of the inline keyword (ODR relaxation vs. function code inlining)
But __declspec(noinline) isn't portable.
You can make it portable to all implementations that have an analogous attribute by using a platform detection macro. GCC and Clang have __attribute__((noinline)).
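A platform detection macro of that kind might look like the following sketch. The macro name MY_NOINLINE and the function answer are my own placeholders; only __declspec(noinline) and __attribute__((noinline)) come from the compilers themselves.

```cpp
// Hypothetical portability macro; MY_NOINLINE is an invented name.
#if defined(_MSC_VER)
#  define MY_NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define MY_NOINLINE __attribute__((noinline))
#else
   // Unknown compiler: expand to nothing and accept its default behaviour.
#  define MY_NOINLINE
#endif

// 'inline' supplies the ODR relaxation (the definition may live in a header),
// while MY_NOINLINE asks the compiler not to expand calls in place.
MY_NOINLINE inline int answer() { return 42; }
```

This keeps the two meanings apart on the compilers that offer such an attribute; elsewhere the macro silently falls back to plain inline.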
Another approach is to simply not care. The compiler still has the option to ignore the preference that it perceives to have been implied. If the inline expansion would be expensive (because the function is big), a smart compiler should refrain from expanding it.
When is the inline keyword effective in C?
It has a semantic effect. To simplify, a function marked inline may be defined multiple times in one program, though all definitions must be equivalent to each other, so presence of inline is required for correctness when including the function definition in headers (which, in turn, makes the definition visible so the compiler can inline it without LTO).
Other than that, for inlining-the-optimization, "never" is a perfectly safe approximation. It probably has some effect in some compilers, but nothing worth losing sleep over, especially not without actual hard data. For example, in the following code, using Clang 3.0 or GCC 4.7, main contains the same code whether work is marked inline or not. The only difference is whether work remains as a stand-alone function for other translation units to link to, or is removed.
void work(double *a, double *b) {
if (*b > *a) *a = *b;
}
void maxArray(double* x, double* y) {
for (int i = 0; i < 65536; i++) {
//if (y[i] > x[i]) x[i] = y[i];
work(x+i, y+i);
}
}