Why the sequence-operation algorithms predicates are passed by copy?
It's mostly for historic reasons. At '98 when the whole algo stuff made it into the standard references had all kind of problems. That got eventually resolved through core and library DRs by C++03 and beyond. Also sensible ref-wrappers and actually working bind only arrived only in TR1.
Those who tried use algos with early C++98 having functions using ref params or returns can recall all kind of trouble. Self-written algos were also prone to hit the dreaded 'reference to reference' problem.
Passing by value at least worked fine, and hardly created many problems -- and boost had ref and cref early on to help out where you needed to tweak.
why does `boost::lower_bound` take its argument by value?
This has now been fixed by this issue.
There may be historical reasons for taking the argument by value. See this answer about function objects passed by value to standard algorithms.
Is There a Reason Standard Algorithms Take Lambdas by Value?
std
assumes function objects and iterators are free to copy.
std::ref
provides a method to turn a function object into a pseudo-reference with a compatible operator()
that uses reference instead of value semantics. So nothing of large value is lost.
If you have been taught all your life to take objects by reference, reconsider. Unless there is a good reason otherwise, take objects by value. Reasoning about values is far easier; references are pointers into any state anywhere in your program.
The conventional use of references, as a pointer to a local object which is not referred to by any other active reference in the context where it is used, is not something someone reading your code nor the compiler can presume. If you reason about references this way, they don't add a ridiculous amount of complexity to your code.
But if you reason about them that way, you are going to have bugs when your assumption is violated, and they will be subtle, gross, unexpected, and horrible.
A classic example is the number of operator=
that break when this
and the argument refer to the same object. But any function that takes two references or pointers of the same type has the same issue.
But even one reference can break your code. Let's look at sort
. In pseudo-code:
void sort( Iterator start, Iterator end, Ordering order )
Now, let's make Ordering a reference:
void sort( Iterator start, Iterator end, Ordering const& order )
How about this one?
std::function< void(int, int) > alice;
std::function< void(int, int) > bob;
alice = [&]( int x, int y ) { std:swap(alice, bob); return x<y; };
bob = [&]( int x, int y ) { std:swap(alice, bob); return x>y; };
Now, call sort( begin(vector), end(vector), alice )
.
Every time <
is called, the referred-to order
object swaps meaning. Now this is pretty ridiculous, but when you took Ordering
by const&
, the optimizer had to take into account that possibility and rule it out on every invokation of your ordering code!
You wouldn't do the above (and in fact this particular implementation is UB as it would violate any reasonable requisites on std::sort
); but the compiler has to prove you didn't do something "like that" (change the code in ordering
) every time it follows order
or invokes it! Which means constantly reloading the state of order
, or inlining and proving you did nonesuch insanity.
Doing this when taking by-value is an order of magnitude harder (and basically requires something like std::ref
). The optimizer has a function object, it is local, and its state is local. Anything stored within it is local, and the compiler and optimizer know who exactly can modify it legally.
Every function you write taking a const&
that ever leaves its "local scope" (say, called a C library function) can not assume the state of the const&
remained the same after it got back. It must reload the data from wherever the pointer points to.
Now, I did say pass by value unless there is a good reason. And there are many good reasons; your type is very expensive to move or copy, for example, is a great reason. You are writing data to it. You actually want it to change as you read it each time. Etc.
But the default behavior should be pass-by-value. Only move to references if you have a good reason, because the costs are distributed and hard to pin down.
eliminate unnecessary copies when calling C++/STL algorithms
1 - I know that most STL algorithms pass their argument by value. However, compared to func, that also passes its input argument by value, the STL algorithms generate an extra copy. What's the reason for this "unnecessary" copy?
STL algorithms return the function object. This happens so that the mutation on the object will be observable. Your func
returns void so that's a copy less.
- Well, to be precise,
generate
does not return a thing (see comment by dyp)
2 - Is there a way to eliminate such "unnecessary" copies?
Well unnecessary is a bit too strong. The whole point of functors is to be lightweight objects so that a copy wouldn't matter. As for a way, the one you provide (std::ref) will do the job, alas a copy of the std::ref
will be generated (your object won't get copied though)
Another way would be to qualify the call of the algorithm
then the function object type will be a reference :
auto fobj1 = funObj<int>();
std::for_each<std::vector<int>::iterator, std::vector<int>::iterator,
funObj<int>&> // this is where the magic happens !!
(std::begin(v), std::end(v), fobj1);
3 - When calling std::for_each(std::begin(v), std::end(v), funObj()) and func(funObj()) in which scope does temporary object funObj lives, for each case respectively?
The body of std_for_each
is expanded as follows :
template<class InputIterator, class Function>
Function for_each(InputIterator first, InputIterator last, Function fn)
{ // 1
while (first!=last) {
fn (*first);
++first;
}
return fn; // or, since C++11: return move(fn);
// 2
}
your function reads
template<typename T>
void func(funObj<T> obj)
{ // 1.
obj();
// 2.
}
The comments 1
and 2
mark the lifespan in each case. Note though that if a return value optimization applies (named or unnamed), then the compiler may generate code that places the return value (the function object in for_each) in the stack frame of the caller, so the life span is longer.
4 - I've tried to use std::ref in order to force pass-by-reference and as you can see the "unnecessary" copy was eliminated. However, when I try to pass a temporary object to std::ref (i.e., std::ref(funObj())) I get a compiler error. Why such kind of statements are illegal?
std::ref
does not work with r-value references (STL code follows):
template<class _Ty>
void ref(const _Ty&&) = delete;
you need to pass an l-value
5 - The output was generated using VC++2013. As you can see there's an anomaly when calling std::for_each the destructors of the objects are being called in reversed order. Why is that so?
6 - When I run the code on Coliru that runs GCC v4.8 the anomaly with destructors is fixed however std::generate doesn't generate an extra copy. Why is that so?
Check the settings for each compilation. With optimizations ON (and in Release for VS) copy elision / elimination of extra copies / ignoring non observable behaviors, are possible.
Secondly (as far as I can see) in VS 2013 the functor in
for_each
and the generator ingenerate
are both passed by value (there's no signature accepting an r-value reference) so it would be clearly a matter of copy elision to save the extra copy.
For what matters, the STL implementation in gcc also has no signatures that accept r-value references (please notify me if one having is spotted)
template<typename _InputIterator, typename _Function>
_Function
for_each(_InputIterator __first, _InputIterator __last, _Function __f)
{
// concept requirements
__glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
__glibcxx_requires_valid_range(__first, __last);
for (; __first != __last; ++__first)
__f(*__first);
return _GLIBCXX_MOVE(__f);
}
so I may be going out on limb on this one and assume, that defining move semantics for your functor has no effect and only compiler optimizations apply to eliminate copies
Can I abuse a predicate to perform operations on the elements before remove_if removes them?
Can I abuse a predicate to perform operations on the elements before
remove_if
removes them?
Yes. There's nothing in the standard specification that requires the predicate to be a pure function. So this C++11 solution would be perfectly fine:
my_list.remove_if([f, pred](Elem const& e){
if (pred(e)) {
f(e);
return true;
}
return false;
});
Nothing even requires the predicate to ever return true
. You can even use remove_if
as a poor man's, unnecessarily confusing for_each
:
my_list.remove_if([f](Elem const& e){
f(e);
return false;
});
That's pointless and inefficient, but it's definitely standards conforming.
You can write the equivalent in C++03 as a function object. Whether or not you find that easier to read than the for loop is a matter of opinion. But it's not wrong.
Why is there no transform_if in the C++ standard library?
The standard library favours elementary algorithms.
Containers and algorithms should be independent of each other if possible.
Likewise, algorithms that can be composed of existing algorithms are only rarely included, as shorthand.
If you require a transform if, you can trivially write it. If you want it /today/, composing of ready-mades and not incur overhead, you can use a range library that has lazy ranges, such as Boost.Range, e.g.:
v | filtered(arg1 % 2) | transformed(arg1 * arg1 / 7.0)
As @hvd points out in a comment, transform_if
double result in a different type (double
, in this case). Composition order matters, and with Boost Range you could also write:
v | transformed(arg1 * arg1 / 7.0) | filtered(arg1 < 2.0)
resulting in different semantics. This drives home the point:
it makes very little sense to include
std::filter_and_transform
,std::transform_and_filter
,std::filter_transform_and_filter
etc. etc. into the standard library.
See a sample Live On Coliru
#include <boost/range/algorithm.hpp>
#include <boost/range/adaptors.hpp>
using namespace boost::adaptors;
// only for succinct predicates without lambdas
#include <boost/phoenix.hpp>
using namespace boost::phoenix::arg_names;
// for demo
#include <iostream>
int main()
{
std::vector<int> const v { 1,2,3,4,5 };
boost::copy(
v | filtered(arg1 % 2) | transformed(arg1 * arg1 / 7.0),
std::ostream_iterator<double>(std::cout, "\n"));
}
How can std::bind bind a binary operation of two different-type parameters passed in to std::sort?
Not exactly related, but your answer is wrong: "depending, strings, supposed, example, vector, In, this, am, sort, a, of, I," Note that "In" is before "this" and "sort". Your sorting function should be comparing the sizes of the two input strings.
Now, for the 'How does bind work?'. Basically, when you call std::bind
, it copies/moves all of the additional arguments provided into the returned data structure. This structure has a templated operator()
. The parameters passed to operator()
are substituted in place of the std::placeholders
objects that were provided during the call to bind. The important part here is that any parameters passed during the invocation that are not mentioned are not forwarded to the bound functor.
So in this case, because you didn't pass std::placeholders:_2
to the call to bind, the second argument is discarded. The second argument to the lambda is filled in by len
which you passed in.
There is more functionality in std::bind
, but this covers the simplest use case.
Related Topics
How Does the Friend Keyword (Class/Function) Break Encapsulation in C++
What Does Afx_Manage_State(Afxgetstaticmodulestate()) Do Exactly
Speed Accessing a Std::Vector by Iterator VS by Operator[]/Index
Reading Files Larger Than 4Gb Using C++ Stl
Fatal Error C1083: Cannot Open Include File: 'Xyz.H': No Such File or Directory
What Does the "::" Mean in "::Tolower"
Handling Ssl_Shutdown Correctly
Std::Optional - Construct Empty with {} or Std::Nullopt
Why Is a C++ Bool Var True by Default
What Is Namespace Used For, in C++
How to Handle C++ Return Type Std::Vector<Int> in Python Ctypes
C++ Copy Constructor Gets Called Instead of Initializer_List<>
How to Determine the Value of Socket Listen() Backlog Parameter
How to Open a Folder in %Appdata% with C++
Why Do We Even Need the "Delete[]" Operator
Why Does C++11 Not Support Anonymous Structs, While C11 Does
Can't Access Derived Class Method from Pointer of Type Base Class