Why the Sequence-Operation Algorithms Predicates Are Passed by Copy

Why the sequence-operation algorithms predicates are passed by copy?

It's mostly for historic reasons. At '98 when the whole algo stuff made it into the standard references had all kind of problems. That got eventually resolved through core and library DRs by C++03 and beyond. Also sensible ref-wrappers and actually working bind only arrived only in TR1.

Those who tried use algos with early C++98 having functions using ref params or returns can recall all kind of trouble. Self-written algos were also prone to hit the dreaded 'reference to reference' problem.

Passing by value at least worked fine, and hardly created many problems -- and boost had ref and cref early on to help out where you needed to tweak.

why does `boost::lower_bound` take its argument by value?

This has now been fixed by this issue.

There may be historical reasons for taking the argument by value. See this answer about function objects passed by value to standard algorithms.

Is There a Reason Standard Algorithms Take Lambdas by Value?

std assumes function objects and iterators are free to copy.

std::ref provides a method to turn a function object into a pseudo-reference with a compatible operator() that uses reference instead of value semantics. So nothing of large value is lost.

If you have been taught all your life to take objects by reference, reconsider. Unless there is a good reason otherwise, take objects by value. Reasoning about values is far easier; references are pointers into any state anywhere in your program.

The conventional use of references, as a pointer to a local object which is not referred to by any other active reference in the context where it is used, is not something someone reading your code nor the compiler can presume. If you reason about references this way, they don't add a ridiculous amount of complexity to your code.

But if you reason about them that way, you are going to have bugs when your assumption is violated, and they will be subtle, gross, unexpected, and horrible.

A classic example is the number of operator= that break when this and the argument refer to the same object. But any function that takes two references or pointers of the same type has the same issue.

But even one reference can break your code. Let's look at sort. In pseudo-code:

void sort( Iterator start, Iterator end, Ordering order )

Now, let's make Ordering a reference:

void sort( Iterator start, Iterator end, Ordering const& order )

How about this one?

std::function< void(int, int) > alice;
std::function< void(int, int) > bob;
alice = [&]( int x, int y ) { std:swap(alice, bob); return x<y; };
bob = [&]( int x, int y ) { std:swap(alice, bob); return x>y; };

Now, call sort( begin(vector), end(vector), alice ).

Every time < is called, the referred-to order object swaps meaning. Now this is pretty ridiculous, but when you took Ordering by const&, the optimizer had to take into account that possibility and rule it out on every invokation of your ordering code!

You wouldn't do the above (and in fact this particular implementation is UB as it would violate any reasonable requisites on std::sort); but the compiler has to prove you didn't do something "like that" (change the code in ordering) every time it follows order or invokes it! Which means constantly reloading the state of order, or inlining and proving you did nonesuch insanity.

Doing this when taking by-value is an order of magnitude harder (and basically requires something like std::ref). The optimizer has a function object, it is local, and its state is local. Anything stored within it is local, and the compiler and optimizer know who exactly can modify it legally.

Every function you write taking a const& that ever leaves its "local scope" (say, called a C library function) can not assume the state of the const& remained the same after it got back. It must reload the data from wherever the pointer points to.

Now, I did say pass by value unless there is a good reason. And there are many good reasons; your type is very expensive to move or copy, for example, is a great reason. You are writing data to it. You actually want it to change as you read it each time. Etc.

But the default behavior should be pass-by-value. Only move to references if you have a good reason, because the costs are distributed and hard to pin down.

eliminate unnecessary copies when calling C++/STL algorithms

1 - I know that most STL algorithms pass their argument by value. However, compared to func, that also passes its input argument by value, the STL algorithms generate an extra copy. What's the reason for this "unnecessary" copy?

STL algorithms return the function object. This happens so that the mutation on the object will be observable. Your func returns void so that's a copy less.

Well, to be precise, generate does not return a thing (see comment by dyp)

2 - Is there a way to eliminate such "unnecessary" copies?

Well unnecessary is a bit too strong. The whole point of functors is to be lightweight objects so that a copy wouldn't matter. As for a way, the one you provide (std::ref) will do the job, alas a copy of the std::ref will be generated (your object won't get copied though)

Another way would be to qualify the call of the algorithm

then the function object type will be a reference :

auto fobj1 = funObj<int>();

std::for_each<std::vector<int>::iterator, std::vector<int>::iterator, 
funObj<int>&> // this is where the magic happens !!
(std::begin(v), std::end(v), fobj1);

3 - When calling std::for_each(std::begin(v), std::end(v), funObj()) and func(funObj()) in which scope does temporary object funObj lives, for each case respectively?

The body of std_for_each is expanded as follows :

template<class InputIterator, class Function>
  Function for_each(InputIterator first, InputIterator last, Function fn)
{ // 1
  while (first!=last) {
    fn (*first);
    ++first;
  }
  return fn;      // or, since C++11: return move(fn);
// 2
}

your function reads

template<typename T>
void func(funObj<T> obj) 
{ // 1.
    obj();  
// 2.
}

The comments 1 and 2 mark the lifespan in each case. Note though that if a return value optimization applies (named or unnamed), then the compiler may generate code that places the return value (the function object in for_each) in the stack frame of the caller, so the life span is longer.

4 - I've tried to use std::ref in order to force pass-by-reference and as you can see the "unnecessary" copy was eliminated. However, when I try to pass a temporary object to std::ref (i.e., std::ref(funObj())) I get a compiler error. Why such kind of statements are illegal?

std::ref does not work with r-value references (STL code follows):

template<class _Ty>
void ref(const _Ty&&) = delete;

you need to pass an l-value

5 - The output was generated using VC++2013. As you can see there's an anomaly when calling std::for_each the destructors of the objects are being called in reversed order. Why is that so?

6 - When I run the code on Coliru that runs GCC v4.8 the anomaly with destructors is fixed however std::generate doesn't generate an extra copy. Why is that so?

Check the settings for each compilation. With optimizations ON (and in Release for VS) copy elision / elimination of extra copies / ignoring non observable behaviors, are possible.
Secondly (as far as I can see) in VS 2013 the functor in for_each and the generator in generate are both passed by value (there's no signature accepting an r-value reference) so it would be clearly a matter of copy elision to save the extra copy.

For what matters, the STL implementation in gcc also has no signatures that accept r-value references (please notify me if one having is spotted)

template<typename _InputIterator, typename _Function>
_Function
for_each(_InputIterator __first, _InputIterator __last, _Function __f)
{
  // concept requirements
  __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
  __glibcxx_requires_valid_range(__first, __last);
  for (; __first != __last; ++__first)
__f(*__first);
  return _GLIBCXX_MOVE(__f);
}

so I may be going out on limb on this one and assume, that defining move semantics for your functor has no effect and only compiler optimizations apply to eliminate copies

Can I abuse a predicate to perform operations on the elements before remove_if removes them?

Can I abuse a predicate to perform operations on the elements before remove_if removes them?

Yes. There's nothing in the standard specification that requires the predicate to be a pure function. So this C++11 solution would be perfectly fine:

my_list.remove_if([f, pred](Elem const& e){
    if (pred(e)) {
        f(e);
        return true;
    }
    return false;
});

Nothing even requires the predicate to ever return true. You can even use remove_if as a poor man's, unnecessarily confusing for_each:

my_list.remove_if([f](Elem const& e){
    f(e);
    return false;
});

That's pointless and inefficient, but it's definitely standards conforming.

You can write the equivalent in C++03 as a function object. Whether or not you find that easier to read than the for loop is a matter of opinion. But it's not wrong.

Why is there no transform_if in the C++ standard library?

The standard library favours elementary algorithms.

Containers and algorithms should be independent of each other if possible.

Likewise, algorithms that can be composed of existing algorithms are only rarely included, as shorthand.

If you require a transform if, you can trivially write it. If you want it /today/, composing of ready-mades and not incur overhead, you can use a range library that has lazy ranges, such as Boost.Range, e.g.:

v | filtered(arg1 % 2) | transformed(arg1 * arg1 / 7.0)

As @hvd points out in a comment, transform_if double result in a different type (double, in this case). Composition order matters, and with Boost Range you could also write:

 v | transformed(arg1 * arg1 / 7.0) | filtered(arg1 < 2.0)

resulting in different semantics. This drives home the point:

it makes very little sense to include std::filter_and_transform, std::transform_and_filter, std::filter_transform_and_filter etc. etc. into the standard library.

See a sample Live On Coliru

#include <boost/range/algorithm.hpp>
#include <boost/range/adaptors.hpp>

using namespace boost::adaptors;

// only for succinct predicates without lambdas
#include <boost/phoenix.hpp>
using namespace boost::phoenix::arg_names;

// for demo
#include <iostream>

int main()
{
    std::vector<int> const v { 1,2,3,4,5 };

    boost::copy(
            v | filtered(arg1 % 2) | transformed(arg1 * arg1 / 7.0),
            std::ostream_iterator<double>(std::cout, "\n"));
}

How can std::bind bind a binary operation of two different-type parameters passed in to std::sort?

Not exactly related, but your answer is wrong: "depending, strings, supposed, example, vector, In, this, am, sort, a, of, I," Note that "In" is before "this" and "sort". Your sorting function should be comparing the sizes of the two input strings.

Now, for the 'How does bind work?'. Basically, when you call std::bind, it copies/moves all of the additional arguments provided into the returned data structure. This structure has a templated operator(). The parameters passed to operator() are substituted in place of the std::placeholders objects that were provided during the call to bind. The important part here is that any parameters passed during the invocation that are not mentioned are not forwarded to the bound functor.

So in this case, because you didn't pass std::placeholders:_2 to the call to bind, the second argument is discarded. The second argument to the lambda is filled in by len which you passed in.

There is more functionality in std::bind, but this covers the simplest use case.

Why the Sequence-Operation Algorithms Predicates Are Passed by Copy