Are Lambdas Inlined Like Functions in C++

How effectively can function-local lambdas be inlined by C++ compilers?

Yes.

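Modern compilers use "static single assignment" (SSA) form as the intermediate representation on which they optimize.

In SSA form, each time you assign to a value or modify it, a conceptually different value is created. Sometimes these conceptually different values share identity (an address that pointers can refer to).

Identity, which comes into play when you take the address of something, is the thing that gets in the way of this: a value whose address escapes has to live in memory, so the compiler cannot freely treat it as a chain of independent values.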

Simple references are turned into aliases for the value they reference; they have no identity. This is part of the original design intent for references, and why you cannot have a pointer to a reference.
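A minimal sketch of that distinction, with made-up function names (sum_via_reference, sum_via_pointer and bump are not from the question). In the first function the reference is only another name for a local, so the optimizer is free to keep the value in a register; in the second, the address is handed to another function, so the object needs a real identity unless that function is itself inlined.

```cpp
#include <cassert>  // only needed for the checks in the usage note

// Hypothetical helper: writes through a pointer, so the pointee needs an address.
void bump(int* p) { *p += 1; }

// `alias` is just another name for `value`; no address escapes, so the
// compiler may treat each modification as a fresh SSA value.
int sum_via_reference() {
    int value = 1;
    int& alias = value;
    alias += 2;    // modifies `value` itself
    return value;
}

// Here the address of `value` is passed to bump(), so `value` has identity
// that the optimizer must respect (or eliminate by inlining bump()).
int sum_via_pointer() {
    int value = 1;
    bump(&value);
    bump(&value);
    return value;
}
```

Both functions return 3, so assert(sum_via_reference() == sum_via_pointer()) holds; the interesting difference is only visible in the generated code.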

Concretely:

#include <cstdio>
#include <sstream>
#include <string>

std::string printSomeNumbers(void)
{
    std::ostringstream ss;
    const auto appendNumber = [&ss](auto number) {
        ss << number << "\n"; // Pretend this is something non-trivial
    };

    printf("hello\n");
    appendNumber(1);
    printf("world\n");
    appendNumber(2.0);
    printf("today\n");

    return ss.str();
}

compiles to:

printSomeNumbers[abi:cxx11]():           # @printSomeNumbers[abi:cxx11]()
push r14
push rbx
sub rsp, 376
mov r14, rdi
mov rbx, rsp
mov rdi, rbx
mov esi, 16
call std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream(std::_Ios_Openmode)
mov edi, offset .Lstr
call puts
mov rdi, rbx
mov esi, 1
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov esi, offset .L.str.3
mov edx, 1
mov rdi, rax
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
mov edi, offset .Lstr.8
call puts
mov rdi, rsp
movsd xmm0, qword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero
call std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
mov esi, offset .L.str.3
mov edx, 1
mov rdi, rax
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
mov edi, offset .Lstr.9
call puts
lea rsi, [rsp + 8]
mov rdi, r14
call std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::str() const
mov rax, qword ptr [rip + VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >]
mov qword ptr [rsp], rax
mov rcx, qword ptr [rip + VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >+24]
mov rax, qword ptr [rax - 24]
mov qword ptr [rsp + rax], rcx
mov qword ptr [rsp + 8], offset vtable for std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >+16
mov rdi, qword ptr [rsp + 80]
lea rax, [rsp + 96]
cmp rdi, rax
je .LBB0_7
call operator delete(void*)
.LBB0_7:
mov qword ptr [rsp + 8], offset vtable for std::basic_streambuf<char, std::char_traits<char> >+16
lea rdi, [rsp + 64]
call std::locale::~locale() [complete object destructor]
lea rdi, [rsp + 112]
call std::ios_base::~ios_base() [base object destructor]
mov rax, r14
add rsp, 376
pop rbx
pop r14
ret

Godbolt

Notice that between the printf calls (in the assembly they appear as puts) there is no call to a lambda: the only calls are made directly to the stream insertion functions (operator<<) of the ostringstream.

Is it possible to inline a lambda expression?

The inline keyword does not actually cause functions to be inlined. Any recent compiler is going to make better decisions with regard to inlining than you will.

In the case of a short lambda, the function will probably be inlined.

If you're trying to use the inline keyword with a lambda, the answer is no, you can't use that.

Inline Lambda variable vs Inline function vs Inline template function with automatic type deduction

All three define something slightly different. Hence, the answer to 2) is: choose the one that does what you want. 1) I'll leave to you, because you can easily try them out to see if they compile. 3) isn't that relevant, because choosing between them is a matter of what you actually need, not of style.

inline auto const myfunc = [&](const auto& val){return something(val);};

The lambda myfunc is of some unnamed type with a templated operator(). myfunc itself is not templated. You can pass myfunc to other functions, because it is an object. You cannot do that easily with the other two.
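A small sketch of "you can pass myfunc to other functions". The something template and the doubled helper are placeholders invented for this example, and the lambda is written with [] instead of [&] because at namespace scope there is nothing to capture. The inline variable needs C++17.

```cpp
#include <algorithm>
#include <cassert>  // only needed for the checks in the usage note
#include <string>
#include <vector>

// Hypothetical stand-in for `something(val)` from the question.
template <class T>
T something(const T& val) { return val + val; }

// One object of an unnamed closure type; its operator() is a template.
inline auto const myfunc = [](const auto& val) { return something(val); };

// Because myfunc is an object, it can be handed straight to an algorithm.
std::vector<int> doubled(std::vector<int> v) {
    std::transform(v.begin(), v.end(), v.begin(), myfunc);
    return v;
}
```

The same object works for unrelated argument types, for example myfunc(std::string("ab")) yields "abab", while a function template would have to be wrapped or explicitly instantiated before it could be passed around like this.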

The difference between

inline auto myfunc(const auto& val)
{
    return something(val);
}

and

template<class T>
inline T myfunc(const T& val)
{
    return something(val);
}

is the return type. With the second, the return type is T. T is either deduced from the parameter or you specify it explicitly, and then it can be different from the type of the argument passed, as long as the argument can convert to const T&. Hence the first is more similar to the lambda (more precisely, to its operator()), because the return type is deduced from something(val), though the lambda additionally captures via &. You cannot do that easily with a function.
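To make the return-type difference concrete, here is a sketch under the assumption that something(val) returns a double (the real something is not shown in the question). The names deduced and fixed are made up; deduced is spelled with an explicit template header so it also builds before C++20, where the `const auto&` parameter spelling is not available.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <type_traits>

// Hypothetical stand-in for `something(val)`: always yields a double.
template <class T>
double something(const T& val) { return val / 2.0; }

// Return type deduced from something(val), like the lambda's operator().
template <class T>
inline auto deduced(const T& val) { return something(val); }

// Return type fixed to T: the result is converted back to T.
template <class T>
inline T fixed(const T& val) { return static_cast<T>(something(val)); }

static_assert(std::is_same<decltype(deduced(3)), double>::value,
              "deduced(3) returns what something(3) returns");
static_assert(std::is_same<decltype(fixed(3)), int>::value,
              "fixed(3) returns T, here int");
```

With these definitions deduced(3) is 1.5, fixed(3) is truncated to 1, and fixed<double>(3) is 1.5 again because T was specified explicitly and the int argument converted to const double&.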

C++11 Performance: Lambda inlining vs Function template specialization

If the compiler can track "this function pointer points to this function", the compiler can inline the call through the function pointer.

Sometimes compilers can do this. Sometimes they cannot.

Unless you store a lambda in a function pointer, std::function, or similar type-erasing wrapper, the compiler at the point where the lambda is called knows the type of the lambda, so knows the body of the lambda. The compiler can trivially inline the function call.
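A sketch of the three situations described above; apply_template, apply_erased, apply_pointer and demo_sum are illustrative names only. All of them compute the same result, and whether the last two get inlined depends on how much the optimizer can see at the call site.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <functional>

// The closure type is a template parameter, so the lambda body is known
// wherever this is instantiated: an easy inlining candidate.
template <class F>
int apply_template(F f, int x) { return f(x); }

// std::function erases the closure type; the call goes through an
// indirection the compiler may or may not be able to see through.
int apply_erased(const std::function<int(int)>& f, int x) { return f(x); }

// A captureless lambda converts to a plain function pointer; same caveat.
int apply_pointer(int (*f)(int), int x) { return f(x); }

int demo_sum() {
    auto add_one = [](int v) { return v + 1; };
    return apply_template(add_one, 1)   // 2
         + apply_erased(add_one, 1)     // 2
         + apply_pointer(add_one, 1);   // 2
}
```

Comparing the assembly of the three apply_* functions when they are called from another translation unit is a quick way to see the difference for a given compiler.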

Nothing about using a function template changes this, except if the function is a compile-time constant, such as a function passed as a non-type template parameter:

template <int func(int, int)>

is an example of that. Here func, in the body of the function template, is guaranteed to be known at compile time.

Pass that func anywhere else, however, and the compiler can lose track of it.
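A minimal sketch of that contrast; add, mul, fold3 and fold3_ptr are names invented for this example. In fold3 the callee is part of the instantiation, while in fold3_ptr it is an ordinary run-time value that the compiler has to track.

```cpp
#include <cassert>  // only needed for the checks in the usage note

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

// func is a non-type template parameter: each instantiation is bound to
// one specific function, known at compile time inside the body.
template <int func(int, int)>
int fold3(int a, int b, int c) {
    return func(func(a, b), c);
}

// Run-time function pointer: inlining depends on the compiler being able
// to prove which function `func` points to at the call site.
int fold3_ptr(int (*func)(int, int), int a, int b, int c) {
    return func(func(a, b), c);
}
```

For example, fold3<add>(1, 2, 3) and fold3_ptr(add, 1, 2, 3) both give 6, and fold3<mul>(2, 3, 4) gives 24.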

In any case, any speed difference is going to be highly context dependent. And sometimes the larger binary size caused by inlining of a lambda will cause more slowdown than the inability to inline a function pointer, so performance can go the other way.

Any universal claim like the one you are trying to make is going to be wrong sometimes.

Do lambdas get inlined?

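Not in C#: there, lambda functions are not inlined but are instead stored as delegates under the hood and incur the same cost of execution as other delegates. This does not carry over to C++, where a lambda is an ordinary closure object rather than a delegate.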

Is there a way to check whether C++ lambda functions are inlined by the compiler?

TL;DR: Not without looking at the compilation output.

First, as other answers point out, C++ lambdas are basically anonymous classes with an operator() method; so, your question is no different than "is there a way to check that a certain invocation of an object's method gets inlined?"

Whether your method invocation is inlined or not is a choice of the compiler, and is not mandated by the language specification (although in some cases it's impossible to inline). This fact is therefore not represented in the language itself (nor by compiler extensions of the language).

What you can do is one of two things:

  • Externally examine the compilation output. The easiest way is by compiling without assembling, e.g. gcc -S or clang++ -S, plus whatever optimization flags and other compilation options you normally use. Bear in mind, though, that even if inlining has not happened during compilation, it may still theoretically occur at link-time.
  • Internally, try to determine side-effects of the inlining choice. For example, you could have a function which gets the address of a function you want to check; then you read - at run-time - the instructions of that function, to see whether it has any function calls, look up the called addresses in the symbol table, and see whether the symbol name comes from some lambda. This is already rather difficult, error-prone, platform-specific and brittle - and there's the fact that you might have two lambdas used in the same function. So I obviously wouldn't recommend doing something like that.
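As a sketch of the first option, here is a small file one might feed to the compiler; the file name inline_check.cpp and the function sum_of_squares are made up for illustration, and -O2 is just one plausible optimization level.

```cpp
// inline_check.cpp (hypothetical file name)
// Compile without assembling, for example:
//   g++ -O2 -S inline_check.cpp
// then open inline_check.s and read the body of sum_of_squares.
#include <cassert>  // only needed for the check in the usage note

int sum_of_squares(const int* data, int n) {
    auto square = [](int x) { return x * x; };
    int total = 0;
    for (int i = 0; i < n; ++i) {
        // If the lambda was inlined, the loop body contains the multiply
        // directly and no `call` instruction targeting a lambda symbol.
        total += square(data[i]);
    }
    return total;
}
```

If the lambda was not inlined, the listing would instead contain a call to a separate symbol for the closure's operator(); its demangled name typically mentions "lambda". A quick functional sanity check is that sum_of_squares on {1, 2, 3} gives 14.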

does passing lambda by value or reference make it easier to inline?

Think of a lambda as a small object with a function call operator:

int foo = 1000;
auto f = [=]() ->int { return foo; };

is somewhat equivalent to:

class FooLambda {
    int foo;
public:
    FooLambda(int foo) : foo(foo) {}
    int operator()() const { return foo; } // const, like a non-mutable lambda's call operator
};
// ...
int foo = 1000;
FooLambda f(foo);

So you see, the function body itself can be inlined if it is seen in the same translation unit as it is called (and possibly if not by some smarter compilers). Since your invoke is a template, it knows the actual type of the lambda, and you don't force it to jump through function-pointer hoops, which are a big inhibitor of inlining.

Taking the callable object by value or reference in invoke determines whether the captured variables are local to the function body or not, which can make a difference if it means they will be in cache.
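A sketch of the two signatures being compared. The original invoke from the question is not shown, so invoke_by_value, invoke_by_ref and demo_invoke are stand-ins written for this example.

```cpp
#include <cassert>  // only needed for the check in the usage note

// By value: the closure, including its captured copy of foo, is copied
// into a local object of this function.
template <class F>
int invoke_by_value(F f) { return f(); }

// By reference: the function operates on the caller's closure object.
template <class F>
int invoke_by_ref(const F& f) { return f(); }

int demo_invoke() {
    int foo = 1000;
    auto f = [=]() -> int { return foo; };
    // In both calls F is the exact closure type, so the body of
    // operator() is visible and can be inlined either way.
    return invoke_by_value(f) + invoke_by_ref(f);
}
```

Here demo_invoke() returns 2000. Once both templates are inlined, a compiler will often produce the same code for the two variants; the by-value versus by-reference distinction matters mostly when the call is not inlined.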

What is the difference between inline and constexpr captureless lambda in a header?

If I understand correctly, constexpr implies inline, and lambdas are constexpr by default.

The first part is true, but not for this case. From [dcl.constexpr]/1:

A function or static data member declared with the constexpr or consteval specifier is implicitly an inline function or variable ([dcl.inline]).

In our case, we don't have either a function or a static data member, so it's not implicitly inline. You'd have to explicitly mark it as such.

The second part isn't quite right. From [expr.prim.lambda.closure]/4:

The function call operator or any given operator template specialization is a constexpr function if either the corresponding lambda-expression's parameter-declaration-clause is followed by constexpr or consteval, or it satisfies the requirements for a constexpr function ([dcl.constexpr]).

The call operator is constexpr by default, but the lambda itself is not. For a capture-less lambda that is basically fine: you can still use the call operator in constant expressions, as demonstrated in the example for this section:

auto ID = [](auto a) { return a; };
static_assert(ID(3) == 3); // OK

In short, if you're declaring this lambda in the header, you definitely need the inline keyword and it doesn't hurt to just slap on the constexpr keyword either.
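Put together, a header along these lines is what that advice amounts to. The header name and include guard are hypothetical, and inline variables as well as constexpr lambdas require C++17.

```cpp
// my_lambdas.hpp (hypothetical header name); requires C++17
#ifndef MY_LAMBDAS_HPP
#define MY_LAMBDAS_HPP

// `inline` makes ID one variable shared by every translation unit that
// includes this header. With `constexpr` alone, a namespace-scope variable
// has internal linkage, so each translation unit would get its own copy.
inline constexpr auto ID = [](auto a) { return a; };

// The call operator is usable in constant expressions.
static_assert(ID(3) == 3, "ID returns its argument");

#endif  // MY_LAMBDAS_HPP
```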


