std::bind vs Lambda Performance

std::bind vs lambda performance

I assume that lambda cannot be that better than bind.

That's quite a preconception.

Lambdas are tied into the compiler internals, so extra optimization opportunities may be found. Moreover, they're designed to avoid inefficiency.

However, there are probably no compiler optimization tricks happening here. The likely culprit is the argument to bind, bind(&decltype(result)::eval, &result). You are passing a pointer-to-member-function (PTMF) and an object. Unlike the lambda type, the PTMF does not capture what function actually gets called; it only contains the function signature (parameter and return types). The slow loop is using an indirect branch function call, because the compiler failed to resolve the function pointer through constant propagation.

If you rename the member eval() to operator () () and get rid of bind, then the explicit object will essentially behave like the lambda and the performance difference should disappear.
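To make that concrete, here is a minimal sketch. The question's `result` type isn't shown here, so `Accumulator`, `repeat` and the `run_with_*` helpers are hypothetical stand-ins. All three spellings do the same work; the only thing that may differ is whether the compiler can see the call target from the type alone.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the question's `result` object.
struct Accumulator {
    long total = 0;
    void eval() { ++total; }        // reached through a PTMF when bound
    void operator()() { ++total; }  // reached directly when the object is the callable
};

// Calls f() n times. F's type is all the compiler knows about the callee.
template <typename F>
void repeat(int n, F&& f) {
    for (int i = 0; i < n; ++i) f();
}

long run_with_bind(int n) {
    Accumulator acc;
    // The bind object stores the PTMF as a data member, so the target is a
    // runtime value unless constant propagation resolves it.
    repeat(n, std::bind(&Accumulator::eval, &acc));
    return acc.total;
}

long run_with_functor(int n) {
    Accumulator acc;
    // F deduces to Accumulator&; the type alone names operator().
    repeat(n, acc);
    return acc.total;
}

long run_with_lambda(int n) {
    Accumulator acc;
    // The closure type's operator() calls eval() by name, not through a pointer.
    repeat(n, [&acc] { acc.eval(); });
    return acc.total;
}

// Usage: the observable result is identical in all three cases.
void check_same_result() {
    assert(run_with_bind(3) == 3);
    assert(run_with_functor(3) == 3);
    assert(run_with_lambda(3) == 3);
}
```

Whether the bind version actually ends up slower depends on your compiler and flags, so compare the generated assembly before drawing conclusions.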

Efficiency of std::bind vs lambda

Is there any difference in the performance of these approaches?

Perhaps, perhaps not; as commenters suggest, profile to check, or look at the assembly code you get (e.g. using the Godbolt Compiler Explorer). But you're asking the wrong question, for two main reasons:

  1. You should probably not be passing lambdas, nor bind() results, around in the part of your code that's performance-critical.
  2. You should definitely avoid invoking arbitrary functions via function pointer or std::function variables in performance-critical areas of your code (except if this can be de-virtualized and inlined by the compiler).

and one minor reason:


  1. Lambdas (and std::bind()'s) are usable, and useful, without being wrapped in std::function; this wrapper has its own performance penalty, so you would only be comparing one way of using these constructs.

Bottom line recommendation: Just use lambdas. They're cleaner, easier to understand, cheaper to compile, and more flexible syntactically. So don't worry and be happy :-) . And in performance-critical code, either use lambdas without std::function, or use neither of the two.
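As a rough illustration of "lambdas without std::function", here is a sketch with two hypothetical helpers, `count_direct` and `count_erased` (neither name comes from the question). The first takes the callable as a template parameter, the second as a std::function. They compute the same thing; the difference is only in what the compiler knows about the callee.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Template parameter: the lambda's own closure type reaches the loop,
// so the call to pred can be inlined.
template <typename Pred>
int count_direct(const std::vector<int>& values, Pred pred) {
    int n = 0;
    for (int v : values) {
        if (pred(v)) ++n;
    }
    return n;
}

// std::function parameter: the callable is type-erased, so each call is
// indirect unless the optimizer manages to see through the wrapper.
int count_erased(const std::vector<int>& values,
                 const std::function<bool(int)>& pred) {
    int n = 0;
    for (int v : values) {
        if (pred(v)) ++n;
    }
    return n;
}

// Usage: the same lambda works with both; only the call mechanism differs.
void check_counts() {
    const std::vector<int> values{1, 2, 3, 4, 5, 6};
    auto is_even = [](int v) { return v % 2 == 0; };
    assert(count_direct(values, is_even) == 3);
    assert(count_erased(values, is_even) == 3);
}
```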

Bind Vs Lambda?

As you said, bind and lambdas don't quite exactly aim at the same goal.

For instance, for using and composing STL algorithms, lambdas are clear winners, IMHO.

To illustrate, I remember a really funny answer, here on Stack Overflow, where someone asked for ideas of hex magic numbers (like 0xDEADBEEF, 0xCAFEBABE, 0xDEADDEAD etc.) and was told that if he were a real C++ programmer he would simply have downloaded a list of English words and used a simple one-liner of C++ :)

#include <iterator>
#include <string>
#include <algorithm>
#include <iostream>
#include <fstream>
#include <boost/lambda/lambda.hpp>
#include <boost/lambda/bind.hpp>

int main()
{
    using namespace boost::lambda;
    std::ifstream ifs("wordsEn.txt");
    std::remove_copy_if(
        std::istream_iterator<std::string>(ifs),
        std::istream_iterator<std::string>(),
        std::ostream_iterator<std::string>(std::cout, "\n"),
        bind(&std::string::size, _1) != 8u
        ||
        bind(
            static_cast<std::string::size_type (std::string::*)(const char*, std::string::size_type) const>(
                &std::string::find_first_not_of
            ),
            _1,
            "abcdef",
            0u
        ) != std::string::npos
    );
}

This snippet, in pure C++98, opens the English words file, scans each word and prints only those of length 8 made up entirely of the letters 'a', 'b', 'c', 'd', 'e' or 'f'.

Now, turn on C++0x and lambdas:

#include <iterator>
#include <string>
#include <algorithm>
#include <iostream>
#include <fstream>

int main()
{
    std::ifstream ifs("wordsEn.txt");
    std::copy_if(
        std::istream_iterator<std::string>(ifs),
        std::istream_iterator<std::string>(),
        std::ostream_iterator<std::string>(std::cout, "\n"),
        [](const std::string& s)
        {
            return (s.size() == 8 &&
                    s.find_first_not_of("abcdef") == std::string::npos);
        }
    );
}

This is still a bit heavy to read (mainly because of the istream_iterator business), but a lot simpler than the bind version :)

Why use std::bind over lambdas in C++14?

Scott Meyers gave a talk about this. This is what I remember:

In C++14 there is nothing useful bind can do that can't also be done with lambdas.

In C++11 however there are some things that can't be done with lambdas:

  1. You can't move the variables while capturing when creating the lambdas. Variables are always captured as lvalues. For bind you can write:

    auto f1 = std::bind(f, 42, _1, std::move(v));
  2. Expressions can't be captured, only identifiers can. For bind you can write:

    auto f1 = std::bind(f, 42, _1, a + b);
  3. Overloading arguments for function objects. This was already mentioned in the question.

  4. Impossible to perfect-forward arguments

In C++14 all of these are possible.

  1. Move example:

    auto f1 = [v = std::move(v)](auto arg) { f(42, arg, std::move(v)); };
  2. Expression example:

    auto f1 = [sum = a + b](auto arg) { f(42, arg, sum); };
  3. See question

  4. Perfect forwarding: You can write

    auto f1 = [=](auto&& arg) { f(42, std::forward<decltype(arg)>(arg)); };
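For something you can actually compile, here is a small sketch of the move, expression and forwarding cases in C++14. The helpers `consume` and `describe` are made up for illustration and stand in for the `f` used above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for f: reads through a move-only argument.
int consume(int base, int arg, const std::unique_ptr<int>& p) {
    return base + arg + *p;
}

// Overloads that report the value category they received.
std::string describe(int&)  { return "lvalue"; }
std::string describe(int&&) { return "rvalue"; }

int demo_move_capture() {
    auto v = std::make_unique<int>(100);
    // Init-capture moves the unique_ptr into the closure (impossible in C++11).
    auto f1 = [v = std::move(v)](auto arg) { return consume(42, arg, v); };
    return f1(1);  // 42 + 1 + 100
}

int demo_expression_capture(int a, int b) {
    // The expression a + b is evaluated once, when the lambda is created.
    auto f1 = [sum = a + b](auto arg) { return 42 + arg + sum; };
    return f1(1);
}

std::string demo_forward_lvalue() {
    auto fwd = [](auto&& arg) { return describe(std::forward<decltype(arg)>(arg)); };
    int x = 5;
    return fwd(x);  // x is an lvalue, so describe(int&) is chosen
}

std::string demo_forward_rvalue() {
    auto fwd = [](auto&& arg) { return describe(std::forward<decltype(arg)>(arg)); };
    return fwd(5);  // 5 is an rvalue, so describe(int&&) is chosen
}

// Usage
void check_captures() {
    assert(demo_move_capture() == 143);
    assert(demo_expression_capture(2, 3) == 48);
    assert(demo_forward_lvalue() == "lvalue");
    assert(demo_forward_rvalue() == "rvalue");
}
```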

Some disadvantages of bind:

  • Bind binds by name and as a result if you have multiple functions with the same name (overloaded functions) bind doesn't know which one to use. The following example won't compile, while lambdas wouldn't have a problem with it:

    void f(int); void f(char); auto f1 = std::bind(f, _1, 42);
  • When using bind, functions are less likely to be inlined.

On the other hand, lambdas might theoretically generate more template code than bind, since each lambda has a unique type. For bind, a new instantiation is needed only when the argument types or the function differ (I guess that in practice, however, it doesn't happen very often that you bind several times with the same arguments and function).

What Jonathan Wakely mentioned in his answer is actually one more reason not to use bind. I can't see why you would want to silently ignore arguments.
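Both bind pitfalls mentioned above (overloads and silently ignored arguments) are easy to reproduce. This is only a sketch; `add`, `scale` and the wrapper functions are hypothetical names, not taken from the question.

```cpp
#include <cassert>
#include <functional>

int add(int a, int b) { return a + b; }

// Two overloads: the bare name `scale` is ambiguous as a bind argument.
int scale(int x, int k) { return x * k; }
double scale(double x, int k) { return x * k; }

int bind_ignores_extras() {
    using namespace std::placeholders;
    auto bound = std::bind(add, _1, 10);
    // The extra arguments 2 and 3 are silently discarded: this calls add(1, 10).
    return bound(1, 2, 3);
}

int bind_needs_cast(int x) {
    using namespace std::placeholders;
    // std::bind(scale, _1, 42) would not compile; a cast has to pick the overload.
    auto bound = std::bind(static_cast<int (*)(int, int)>(scale), _1, 42);
    return bound(x);
}

template <typename T>
auto lambda_picks_overload(T x) {
    // Overload resolution happens inside the body, at the call to scale.
    auto f = [](auto v) { return scale(v, 42); };
    return f(x);
}

// Usage
void check_bind_pitfalls() {
    assert(bind_ignores_extras() == 11);
    assert(bind_needs_cast(2) == 84);
    assert(lambda_picks_overload(2) == 84);      // int overload
    assert(lambda_picks_overload(0.5) == 21.0);  // double overload
}
```

A lambda taking one parameter would reject the call with three arguments at compile time, which is usually what you want.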

Speed of bound lambda (via std::function) vs operator() of functor struct

The difference between the two cases is fundamentally that with the functor, the compiler knows exactly what will be called at compile time, so the function call can be inlined. Lambdas, interestingly enough, also have a unique type. This means again, when you use a lambda, at compile time (since the compiler must know all types) the function being called is already known, so inlining can occur. On the other hand, a function pointer is typed based only on its signature. The signature must be known so that it can be called to and returned from appropriately, but other than that a function pointer can point to anything at run-time, as far as the compiler is concerned. The same is true about std::function.

When you wrap the lambda in a std::function, you erase the type of the lambda from a compiler perspective. If this sounds weird/impossible, think of it this way: since a std::function of a fixed type can wrap any callable with the same signature, the compiler has no way of knowing that some other instruction won't come along and change what the std::function is wrapping.
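A tiny sketch of that "some other instruction" scenario: one std::function variable of a fixed type holding three unrelated callables in turn. The names `add_one`, `Doubler` and `apply_all` are made up for the example.

```cpp
#include <cassert>
#include <functional>

int add_one(int x) { return x + 1; }

struct Doubler {
    int operator()(int x) const { return 2 * x; }
};

int apply_all(int x) {
    // The static type is std::function<int(int)> throughout; what it wraps
    // changes at run time, which is why the call cannot simply be inlined.
    std::function<int(int)> f = add_one;  // plain function
    int r = f(x);
    f = Doubler{};                        // functor
    r = f(r);
    f = [](int v) { return v - 3; };      // lambda
    return f(r);
}

// Usage: (5 + 1) * 2 - 3
void check_apply_all() {
    assert(apply_all(5) == 9);
}
```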

This link, http://goo.gl/60QFjH, shows what I mean (by the way, the godbolt page is very very handy, I suggest getting acquainted with it). I wrote three examples here similar to yours. The first uses std::function wrapping a lambda, the second a functor, the third a naked lambda (unwrapped), using decltype. You can look at the assembly on the right and see that both of the latter two get inlined, but not the first.

My guess is that you can use lambdas to do exactly the same thing. Instead of bind, you can just do value based capture with the lambdas of a and b. Each time you push back the lambda into the vector, modify a and b appropriately, and voila.

Stylistically though, I actually strongly feel you should use the struct. It's much clearer what's going on. The mere fact that you are seeming to want to capture a and b in one place, and test against c in another, means that this is used in your code in not just one place. In exchange for like, 2 extra lines of code, you get something more readable, easier to debug, and more extensible.
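Something along these lines is what I have in mind. Your actual test isn't reproduced here, so the half-open range check on a, b and c is purely an assumption, as are the names `InRange`, `make_ranges` and `count_hits`.

```cpp
#include <cassert>
#include <vector>

// a and b are fixed when the object is created; c arrives at call time.
struct InRange {
    int a;
    int b;
    bool operator()(int c) const { return a <= c && c < b; }
};

// Every element has the same concrete type, so no std::function is needed.
std::vector<InRange> make_ranges() {
    std::vector<InRange> ranges;
    for (int lo = 0; lo < 30; lo += 10) {
        ranges.push_back(InRange{lo, lo + 10});  // [0,10), [10,20), [20,30)
    }
    return ranges;
}

int count_hits(const std::vector<InRange>& ranges, int c) {
    int hits = 0;
    for (const auto& r : ranges) {
        if (r(c)) ++hits;  // direct call to InRange::operator()
    }
    return hits;
}

// Usage
void check_ranges() {
    const auto ranges = make_ranges();
    assert(count_hits(ranges, 15) == 1);
    assert(count_hits(ranges, 30) == 0);
}
```

Because the vector's element type is the struct itself, the calls in the loop stay visible to the optimizer, and the members a and b are easy to inspect in a debugger.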

Why use `std::bind_front` over lambdas in C++20?

bind_front binds the first X parameters, but if the callable calls for more parameters, they get tacked onto the end. This makes bind_front very readable when you're only binding the first few parameters of a function.

The obvious example would be creating a callable for a member function that is bound to a specific instance:

type *instance = ...;

//lambda
auto func = [instance](auto &&... args) -> decltype(auto) {return instance->function(std::forward<decltype(args)>(args)...);};

//bind
auto func = std::bind_front(&type::function, instance);

The bind_front version is a lot less noisy. It gets right to the point, having exactly 3 named things: bind_front, the member function to be called, and the instance on which it will be called. And that's all that our situation calls for: a marker to denote that we're creating a binding of the first parameters of a function, the function to be bound, and the parameter we want to bind. There is no extraneous syntax or other details.

By contrast, the lambda has a lot of stuff we just don't care about at this location. The auto... args bit, the std::forward stuff, etc. It's a bit harder to figure out what it's doing, and it's definitely much longer to read.

Note that bind_front doesn't allow bind's placeholders at all, so it's not really a replacement. It's more a shorthand for the most useful forms of bind.
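Here is a compilable sketch of the "extra parameters get tacked onto the end" behaviour. `Logger` and `format` are hypothetical. The feature-test macro is only there so the file still builds on a pre-C++20 toolchain, where it falls back to the equivalent lambda.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical type with a member function taking two parameters.
struct Logger {
    std::string prefix;
    std::string format(const std::string& tag, int code) const {
        return prefix + tag + ":" + std::to_string(code);
    }
};

std::string demo_bind_front() {
    Logger logger{"[app] "};
#if defined(__cpp_lib_bind_front)
    // Bind the instance and the first parameter; `code` is supplied at the call.
    auto fmt = std::bind_front(&Logger::format, &logger, std::string("io"));
#else
    // Pre-C++20 fallback: the lambda spelling of the same binding.
    auto fmt = [&logger, tag = std::string("io")](auto&&... rest) -> decltype(auto) {
        return logger.format(tag, std::forward<decltype(rest)>(rest)...);
    };
#endif
    return fmt(7);  // 7 is appended after the bound arguments
}

// Usage
void check_bind_front() {
    assert(demo_bind_front() == "[app] io:7");
}
```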

Use std::bind and store into a std::function

The placeholders _1, _2, _3... are placed in namespace std::placeholders, so you should qualify them like this:

Callback c = std::bind(handler, std::placeholders::_1);

Or

using namespace std::placeholders;
Callback c = std::bind(handler, _1);
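Put together as a complete example: the question's own definitions of `Callback` and `handler` aren't shown here, so the ones below (a `std::function<void(int)>` alias and a handler that records its argument in `last_seen`) are assumptions.

```cpp
#include <cassert>
#include <functional>

// Assumed definitions; the originals are not shown in the question.
using Callback = std::function<void(int)>;

int last_seen = 0;  // records the most recent value passed to handler

void handler(int value) { last_seen = value; }

Callback make_callback_qualified() {
    // Fully qualified placeholder.
    return std::bind(handler, std::placeholders::_1);
}

Callback make_callback_using() {
    // Same thing with a using-directive limited to this scope.
    using namespace std::placeholders;
    return std::bind(handler, _1);
}

// Usage
void check_callbacks() {
    Callback c = make_callback_qualified();
    c(42);
    assert(last_seen == 42);

    c = make_callback_using();
    c(7);
    assert(last_seen == 7);
}
```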

C++0x lambda wrappers vs. bind for passing member functions

As far as readability and style are concerned, I think std::bind looks cleaner for this purpose, actually. std::placeholders does not have anything other than _[1-29] for use with std::bind as far as I know, so I think it is fine to just use "using namespace std::placeholders;"

As for performance, I tried disassembling some test functions:

#include <functional>

void foo (int, int, int);

template <typename T>
void test_functor (const T &functor)
{
    functor (1, 2, 3);
}

template <typename T>
void test_functor_2 (const T &functor)
{
    functor (2, 3);
}

void test_lambda ()
{
    test_functor ([] (int a, int b, int c) {foo (a, b, c);});
}

void test_bind ()
{
    using namespace std::placeholders;
    test_functor (std::bind (&foo, _1, _2, _3));
}

void test_lambda (int a)
{
    test_functor_2 ([=] (int b, int c) {foo (a, b, c);});
}

void test_bind (int a)
{
    using namespace std::placeholders;
    test_functor_2 (std::bind (&foo, a, _1, _2));
}

When foo() was not defined in the same translation unit, the assembly outputs were more or less the same for both test_lambda and test_bind:

00000000004004d0 <test_lambda()>:
4004d0: ba 03 00 00 00 mov $0x3,%edx
4004d5: be 02 00 00 00 mov $0x2,%esi
4004da: bf 01 00 00 00 mov $0x1,%edi
4004df: e9 dc ff ff ff jmpq 4004c0 <foo(int, int, int)>
4004e4: 66 66 66 2e 0f 1f 84 data32 data32 nopw %cs:0x0(%rax,%rax,1)
4004eb: 00 00 00 00 00

00000000004004f0 <test_bind()>:
4004f0: ba 03 00 00 00 mov $0x3,%edx
4004f5: be 02 00 00 00 mov $0x2,%esi
4004fa: bf 01 00 00 00 mov $0x1,%edi
4004ff: e9 bc ff ff ff jmpq 4004c0 <foo(int, int, int)>
400504: 66 66 66 2e 0f 1f 84 data32 data32 nopw %cs:0x0(%rax,%rax,1)
40050b: 00 00 00 00 00

0000000000400510 <test_lambda(int)>:
400510: ba 03 00 00 00 mov $0x3,%edx
400515: be 02 00 00 00 mov $0x2,%esi
40051a: e9 a1 ff ff ff jmpq 4004c0 <foo(int, int, int)>
40051f: 90 nop

0000000000400520 <test_bind(int)>:
400520: ba 03 00 00 00 mov $0x3,%edx
400525: be 02 00 00 00 mov $0x2,%esi
40052a: e9 91 ff ff ff jmpq 4004c0 <foo(int, int, int)>
40052f: 90 nop

However, when the body of foo was included into the same translation unit, only the lambda had its contents inlined (by GCC 4.6):

00000000004008c0 <foo(int, int, int)>:
4008c0: 53 push %rbx
4008c1: ba 04 00 00 00 mov $0x4,%edx
4008c6: be 2c 0b 40 00 mov $0x400b2c,%esi
4008cb: bf 60 10 60 00 mov $0x601060,%edi
4008d0: e8 9b fe ff ff callq 400770 <std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@plt>
4008d5: 48 8b 05 84 07 20 00 mov 0x200784(%rip),%rax # 601060 <std::cout@@GLIBCXX_3.4>
4008dc: 48 8b 40 e8 mov -0x18(%rax),%rax
4008e0: 48 8b 98 50 11 60 00 mov 0x601150(%rax),%rbx
4008e7: 48 85 db test %rbx,%rbx
4008ea: 74 3c je 400928 <foo(int, int, int)+0x68>
4008ec: 80 7b 38 00 cmpb $0x0,0x38(%rbx)
4008f0: 74 1e je 400910 <foo(int, int, int)+0x50>
4008f2: 0f b6 43 43 movzbl 0x43(%rbx),%eax
4008f6: bf 60 10 60 00 mov $0x601060,%edi
4008fb: 0f be f0 movsbl %al,%esi
4008fe: e8 8d fe ff ff callq 400790 <std::basic_ostream<char, std::char_traits<char> >::put(char)@plt>
400903: 5b pop %rbx
400904: 48 89 c7 mov %rax,%rdi
400907: e9 74 fe ff ff jmpq 400780 <std::basic_ostream<char, std::char_traits<char> >::flush()@plt>
40090c: 0f 1f 40 00 nopl 0x0(%rax)
400910: 48 89 df mov %rbx,%rdi
400913: e8 08 fe ff ff callq 400720 <std::ctype<char>::_M_widen_init() const@plt>
400918: 48 8b 03 mov (%rbx),%rax
40091b: be 0a 00 00 00 mov $0xa,%esi
400920: 48 89 df mov %rbx,%rdi
400923: ff 50 30 callq *0x30(%rax)
400926: eb ce jmp 4008f6 <foo(int, int, int)+0x36>
400928: e8 e3 fd ff ff callq 400710 <std::__throw_bad_cast()@plt>
40092d: 0f 1f 00 nopl (%rax)

0000000000400930 <test_lambda()>:
400930: 53 push %rbx
400931: ba 04 00 00 00 mov $0x4,%edx
400936: be 2c 0b 40 00 mov $0x400b2c,%esi
40093b: bf 60 10 60 00 mov $0x601060,%edi
400940: e8 2b fe ff ff callq 400770 <std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@plt>
400945: 48 8b 05 14 07 20 00 mov 0x200714(%rip),%rax # 601060 <std::cout@@GLIBCXX_3.4>
40094c: 48 8b 40 e8 mov -0x18(%rax),%rax
400950: 48 8b 98 50 11 60 00 mov 0x601150(%rax),%rbx
400957: 48 85 db test %rbx,%rbx
40095a: 74 3c je 400998 <test_lambda()+0x68>
40095c: 80 7b 38 00 cmpb $0x0,0x38(%rbx)
400960: 74 1e je 400980 <test_lambda()+0x50>
400962: 0f b6 43 43 movzbl 0x43(%rbx),%eax
400966: bf 60 10 60 00 mov $0x601060,%edi
40096b: 0f be f0 movsbl %al,%esi
40096e: e8 1d fe ff ff callq 400790 <std::basic_ostream<char, std::char_traits<char> >::put(char)@plt>
400973: 5b pop %rbx
400974: 48 89 c7 mov %rax,%rdi
400977: e9 04 fe ff ff jmpq 400780 <std::basic_ostream<char, std::char_traits<char> >::flush()@plt>
40097c: 0f 1f 40 00 nopl 0x0(%rax)
400980: 48 89 df mov %rbx,%rdi
400983: e8 98 fd ff ff callq 400720 <std::ctype<char>::_M_widen_init() const@plt>
400988: 48 8b 03 mov (%rbx),%rax
40098b: be 0a 00 00 00 mov $0xa,%esi
400990: 48 89 df mov %rbx,%rdi
400993: ff 50 30 callq *0x30(%rax)
400996: eb ce jmp 400966 <test_lambda()+0x36>
400998: e8 73 fd ff ff callq 400710 <std::__throw_bad_cast()@plt>
40099d: 0f 1f 00 nopl (%rax)

00000000004009a0 <test_bind()>:
4009a0: ba 03 00 00 00 mov $0x3,%edx
4009a5: be 02 00 00 00 mov $0x2,%esi
4009aa: bf 01 00 00 00 mov $0x1,%edi
4009af: e9 0c ff ff ff jmpq 4008c0 <foo(int, int, int)>
4009b4: 66 66 66 2e 0f 1f 84 data32 data32 nopw %cs:0x0(%rax,%rax,1)
4009bb: 00 00 00 00 00

00000000004009c0 <test_lambda(int)>:
4009c0: 53 push %rbx
4009c1: ba 04 00 00 00 mov $0x4,%edx
4009c6: be 2c 0b 40 00 mov $0x400b2c,%esi
4009cb: bf 60 10 60 00 mov $0x601060,%edi
4009d0: e8 9b fd ff ff callq 400770 <std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@plt>
4009d5: 48 8b 05 84 06 20 00 mov 0x200684(%rip),%rax # 601060 <std::cout@@GLIBCXX_3.4>
4009dc: 48 8b 40 e8 mov -0x18(%rax),%rax
4009e0: 48 8b 98 50 11 60 00 mov 0x601150(%rax),%rbx
4009e7: 48 85 db test %rbx,%rbx
4009ea: 74 3c je 400a28 <test_lambda(int)+0x68>
4009ec: 80 7b 38 00 cmpb $0x0,0x38(%rbx)
4009f0: 74 1e je 400a10 <test_lambda(int)+0x50>
4009f2: 0f b6 43 43 movzbl 0x43(%rbx),%eax
4009f6: bf 60 10 60 00 mov $0x601060,%edi
4009fb: 0f be f0 movsbl %al,%esi
4009fe: e8 8d fd ff ff callq 400790 <std::basic_ostream<char, std::char_traits<char> >::put(char)@plt>
400a03: 5b pop %rbx
400a04: 48 89 c7 mov %rax,%rdi
400a07: e9 74 fd ff ff jmpq 400780 <std::basic_ostream<char, std::char_traits<char> >::flush()@plt>
400a0c: 0f 1f 40 00 nopl 0x0(%rax)
400a10: 48 89 df mov %rbx,%rdi
400a13: e8 08 fd ff ff callq 400720 <std::ctype<char>::_M_widen_init() const@plt>
400a18: 48 8b 03 mov (%rbx),%rax
400a1b: be 0a 00 00 00 mov $0xa,%esi
400a20: 48 89 df mov %rbx,%rdi
400a23: ff 50 30 callq *0x30(%rax)
400a26: eb ce jmp 4009f6 <test_lambda(int)+0x36>
400a28: e8 e3 fc ff ff callq 400710 <std::__throw_bad_cast()@plt>
400a2d: 0f 1f 00 nopl (%rax)

0000000000400a30 <test_bind(int)>:
400a30: ba 03 00 00 00 mov $0x3,%edx
400a35: be 02 00 00 00 mov $0x2,%esi
400a3a: e9 81 fe ff ff jmpq 4008c0 <foo(int, int, int)>
400a3f: 90 nop

Out of curiosity, I redid the test using GCC 4.7, and found that with 4.7, both tests were inlined in the same manner.

My conclusion is that the performance should be the same in either case, but you might want to check your compiler output if it matters.

What is the performance overhead of std::function?

You can find information in Boost's reference materials: "How much overhead does a call through boost::function incur?" and "Performance".

This doesn't settle a "yes or no" for boost::function. The performance drop may well be acceptable given the program's requirements. More often than not, parts of a program are not performance-critical. And even then it may be acceptable. This is something only you can determine.

As to the standard library version, the standard only defines an interface. It is entirely up to individual implementations to make it work. I suppose an implementation similar to Boost's function would be used.


