Why Is std::for_each a Non-Modifying Sequence Operation?

Why is std::for_each a non-modifying sequence operation?

See this defect report; they say:

The LWG believes that nothing in the standard prohibits function objects that modify the sequence elements. The problem is that for_each is in a section entitled "nonmutating algorithms", and the title may be confusing. A nonnormative note should clarify that.

But also note this one.

They seem to call it "non-modifying" because for_each itself does not explicitly modify the elements of the sequence.
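A minimal sketch of that distinction, using a hypothetical helper named double_all: std::for_each only advances the iterators and calls the function object, and the write happens inside the lambda, which takes each element by non-const reference.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper. for_each never writes to the range itself; the
// modification is done by the lambda, which receives each element as int&.
void double_all(std::vector<int>& values) {
    std::for_each(values.begin(), values.end(), [](int& v) { v *= 2; });
}
```

Called on a vector holding 1, 2, 3, it leaves 2, 4, 6 behind.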

C++11 lambda: difference between transform and for_each

You mean, transform vs for_each? Because you can also use lambdas as the functions for transform.

In this case, there's no real difference between transform and for_each. for_each is the more general algorithm, but I'd use transform here, because otherwise you're just re-implementing transform with for_each, which is less clear IMHO (although clarity is somewhat subjective, so your mileage may vary).

In terms of efficiency, there wouldn't be a difference in this case. In the future, if you're interested in that kind of thing, it's probably easier to just measure it.
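To make the comparison concrete, here is a sketch of one operation (squaring every element in place) written both ways; the function names are hypothetical. With transform the lambda returns the new value and the algorithm does the write, while with for_each the lambda has to do the write itself through a reference.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// transform: the lambda returns a value and transform stores it in the
// output range, which here is the input range itself.
void square_with_transform(std::vector<int>& v) {
    std::transform(v.begin(), v.end(), v.begin(), [](int x) { return x * x; });
}

// for_each: the lambda performs the assignment itself via int&.
void square_with_for_each(std::vector<int>& v) {
    std::for_each(v.begin(), v.end(), [](int& x) { x = x * x; });
}
```

Both produce the same result; the transform version states the intent ("compute new values") more directly.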

Why does std::for_each with deletion of elements not break iteration?

It's undefined behaviour, and won't work reliably. After adding a line to print keys and values inside your erasing lambda function, I see:

1=5000
2=1
3=2
4=5000
2=1          // AGAIN!!!
3=2          // AGAIN!!!
5=5000
6=3

With my Standard library's map implementation, after erasing the element with key 4, iteration returns to the node with key 2! It then revisits the node with key 3. Because your lambda happily retested such nodes (v.second > 1000) and returned without any side effects, the broken iteration wasn't affecting the output.

You might reasonably ask: "but isn't it astronomically unlikely that it'd have managed to continue iteration (even if to the wrong next node) without crashing?"

Actually, it's quite likely.

Erasing a node causes delete to be called for the memory that node occupied, but in general the library code performing the delete will just:

invoke the destructor (which has no particular reason to waste time overwriting the left-child-, right-child- and parent-pointers), then
modify its records of which memory regions are allocated vs. available.

It's unlikely to "waste" time arbitrarily modifying the heap memory being deallocated (though some implementations will in memory-usage debugging modes).

So the erased node probably sits there intact until some other heap allocation is performed.

And, when you erase an element in a map, the Standard requires that none of the container's other elements be moved in memory - iterators, pointers and references to other elements must remain valid. It can only modify nodes' left/right/parent pointers that maintain the binary tree.
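The guarantee in the previous paragraph is easy to observe. This sketch (the helper name pointer_survives_erase is hypothetical) takes a pointer to one element, erases a different element, and checks that the pointer still refers to the same, unmoved element. Unlike using the iterator to the erased element, this is well-defined behaviour.

```cpp
#include <cassert>
#include <map>

// Erasing one element of a std::map does not move the other elements,
// so pointers and references to them stay valid.
bool pointer_survives_erase() {
    std::map<int, int> m{{1, 10}, {2, 20}, {3, 30}};
    int* p = &m.at(3);   // pointer into the node holding key 3
    m.erase(2);          // erase a different element
    return p == &m.at(3) && *p == 30;
}
```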

Consequently, if you continue to use the iterator to the erased element, it is likely to see pointers to the left/right/parent nodes the erased element linked to before erasure, and operator++() will "iterate" to them using the same logic it would have employed if the erased element were still in the map.

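If we consider an example map's internal binary tree, where Nk is the node with key k: N5 is the root, its left-hand child N3 has children N1 and N4, and its right-hand child N7 has a left-hand child N6.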

The way iteration is done will likely be:

initially, start at N1; the map must directly track where this is to ensure begin() is O(1)
if on a node with no right-hand child, repeat { Nfrom = where you are, move to parent, if nullptr or right != Nfrom break } (e.g. N1->N3, N4->N3->N5, N6->N7, N7->N5->nullptr)
if on a node with right-hand child, take it then any number of left-hand links (e.g. N3->N4, N5->N7->N6)

So if, say, N4 is removed (so N3->right = nullptr) and no rebalancing occurs, then iterating from the stale N4 records Nfrom = N4 and moves to the parent N3; since N3->right != Nfrom, it stops on the already-iterated-over N3 instead of moving on up to N5.
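The iteration rules above, including the stale-N4 case, can be sketched with a hypothetical bare-bones Node type. Real std::map nodes also carry a colour bit and the key/value pair, and library implementations differ in detail; this only models the three links the rules rely on.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node: only the links used for iteration are modelled.
struct Node {
    explicit Node(int k) : key(k) {}
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// In-order successor following the rules above; nullptr plays end().
Node* successor(Node* n) {
    if (n->right) {                 // right-hand child: take it, then
        n = n->right;               // follow left-hand links to the bottom
        while (n->left) n = n->left;
        return n;
    }
    Node* from = n;                 // no right-hand child: climb until the
    n = n->parent;                  // parent's right link is not where we
    while (n && n->right == from) { // came from (or we run out of parents)
        from = n;
        n = n->parent;
    }
    return n;
}
```

Building the tree described above and calling successor repeatedly from N1 visits 1, 3, 4, 5, 6, 7. If n3.right is then set to nullptr while N4's own parent link is left intact, successor applied to N4 returns N3, which is the revisit described above.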

On the other hand, if the tree has been rebalanced after the erase, all bets are off and the invalidated iterator could repeat or skip elements or even iterate "as hoped".

This is not intended to let you reason about behaviour after an erase - it's undefined and shouldn't be relied on. Rather, I'm just showing that a sane implementation can account for your unexpected observations.
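For the well-defined alternative, a common idiom is an explicit loop that never touches an invalidated iterator: since C++11, map::erase(iterator) returns an iterator to the element after the erased one, and the loop continues from there. The sketch below (hypothetical helper name, threshold taken from the v.second > 1000 test in the question) assumes the goal is to drop every entry whose value exceeds the threshold.

```cpp
#include <cassert>
#include <map>

// Erase every entry whose value exceeds threshold without ever using an
// iterator to an erased element.
void erase_large_values(std::map<int, int>& m, int threshold) {
    for (auto it = m.begin(); it != m.end(); ) {
        if (it->second > threshold)
            it = m.erase(it);   // erase returns the next valid iterator
        else
            ++it;               // only advance when nothing was erased
    }
}
```

With C++20 the same thing can be written as std::erase_if(m, [](const auto& kv) { return kv.second > 1000; });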

Is it ok to mutate objects with std::for_each?

Read this article.

To be pedantic: for_each is a non-modifying sequence operation. The intent is not to modify the sequence. However, it is okay to modify the input sequence when using for_each.

What is the difference between std::transform and std::for_each?

std::transform is the equivalent of map in functional languages. The idea is to apply a function to each element between the two iterators and obtain a different container composed of the elements resulting from applying that function. You may want to use it, e.g., for projecting an object's data member into a new container. In the following, std::transform is used to transform a container of std::string into a container of std::size_t.

std::vector<std::string> names = {"hi", "test", "foo"};
std::vector<std::size_t> name_sizes;

std::transform(names.begin(), names.end(), std::back_inserter(name_sizes), [](const std::string& name) { return name.size();});

On the other hand, you call std::for_each solely for its side effects. In other words, std::for_each closely resembles a plain range-based for loop.

Back to the string example:

std::for_each(name_sizes.begin(), name_sizes.end(), [](std::size_t name_size) {
    std::cout << name_size << std::endl;
});

Indeed, starting from C++11 the same can be achieved with a terser notation using range-based for loops:

for (std::size_t name_size: name_sizes) {
    std::cout << name_size << std::endl;
}

Advantages of std::for_each over for loop

The nice thing with C++11 (previously called C++0x) is that this tiresome debate will be settled.

I mean, no one in their right mind who wants to iterate over a whole collection will still use this:

for(auto it = collection.begin(); it != collection.end() ; ++it)
{
   foo(*it);
}

or this:

for_each(collection.begin(), collection.end(), [](Element& e)
{
   foo(e);
});

when the range-based for loop syntax is available:

for(Element& e : collection)
{
   foo(e);
}

This kind of syntax has been available in Java and C# for some time now, and in fact there are far more foreach loops than classical for loops in all the recent Java and C# code I have seen.

Should I use std::for_each?

There is an advantage to using std::for_each instead of an old school for loop (or even the newfangled C++0x range-for loop): you can look at the first word of the statement and you know exactly what the statement does.

When you see the for_each, you know that the operation in the lambda is performed exactly once for each element in the range (assuming no exceptions are thrown). It isn't possible to break out of the loop early before every element has been processed and it isn't possible to skip elements or evaluate the body of the loop for one element multiple times.

With the for loop, you have to read the entire body of the loop to know what it does. It may have continue, break, or return statements in it that alter the control flow. It may have statements that modify the iterator or index variable(s). There is no way to know without examining the entire loop.
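One detail worth spelling out: inside a for_each lambda, return ends only the current invocation, so it behaves like continue rather than break, and the algorithm still visits every remaining element. A small sketch (hypothetical function name):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sums the even values. The early return skips the current element only;
// for_each still goes on to the next one, so every element is visited once.
int sum_of_evens(const std::vector<int>& v) {
    int sum = 0;
    std::for_each(v.begin(), v.end(), [&sum](int x) {
        if (x % 2 != 0) return;  // acts like `continue`
        sum += x;
    });
    return sum;
}
```

For the values 1 through 6 this returns 12 (2 + 4 + 6).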

Herb Sutter discussed the advantages of using algorithms and lambda expressions in a recent presentation to the Northwest C++ Users Group.

Note that you can actually use the std::copy algorithm here if you'd prefer:

std::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout, "\n"));
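A self-contained version of that line, sketched as a hypothetical print_all function that takes any std::ostream, so it can write to std::cout or, for checking the output, to a std::ostringstream:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes each element followed by "\n" using std::copy and an
// ostream_iterator instead of a hand-written loop.
void print_all(const std::vector<int>& v, std::ostream& out) {
    std::copy(v.begin(), v.end(), std::ostream_iterator<int>(out, "\n"));
}
```

Calling print_all(v, std::cout) (with <iostream> included) prints one element per line, exactly like the original one-liner.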

Can stl functions that work on each member of a container accept external parameters?

Here is an example that uses for_each to update each element in abc:

std::for_each(abc.begin(),    // Start of range
              abc.end(),      // End of range
              [=](int &value) // The operation to apply
              {
                value=(value-min)/(max-min);
              });

Changing the sequence being iterated over by for_each is often frowned upon, even though for_each guarantees both the order of traversal and how many invocations occur per element. However, to appease the naysayers, you may want to use transform instead, which has no such guarantees: neither of the order of traversal nor of the number of calls to the function, other than via a complexity guarantee (note: in C++11 the complexity requirement was changed to a guarantee of the number of calls; see 25.3.4.4):

std::transform(abc.begin(),    // Start of source range
               abc.end(),      // End of source range
               abc.begin(),    // Start of destination range
               [=](int value)  // The operation to apply
               {
                 return (value-min)/(max-min);
               });

By the way, since you're doing integer division in your formula, you have to be very careful with the truncation of the result. You should probably check for a divide by zero, as well.
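One way to address both concerns is sketched below, under two assumptions not in the original: that the result may go into a new std::vector<double>, and that min and max are computed from abc itself rather than passed in. Doing the arithmetic in double avoids the truncation, and an all-equal input (max == min) is rejected instead of dividing by zero.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Rescales values into [0, 1] using floating-point division.
std::vector<double> normalise(const std::vector<int>& abc) {
    if (abc.empty()) return {};
    const auto mm = std::minmax_element(abc.begin(), abc.end());
    const double min = *mm.first;
    const double max = *mm.second;
    if (max == min) throw std::invalid_argument("all values are equal");
    std::vector<double> out(abc.size());
    std::transform(abc.begin(), abc.end(), out.begin(),
                   [=](int value) { return (value - min) / (max - min); });
    return out;
}
```

For the input 0, 5, 10 this yields 0.0, 0.5, 1.0, whereas the integer formula would give 0, 0, 1.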

Why use std::for_each over a for loop?

It depends somewhat on the local coding conventions, but there are two
potential advantages. The first is that it states clearly that the code
iterates over all of the elements in the sequence; unless the local
coding conventions say otherwise (and they are enforced), you have to
consider that some cowboy programmer might have inserted a break. The
second is that it names the operation you are performing on each
element; this one can easily be handled by calling a function in the
loop, and of course, really trivial operations may not need a name.

There's also the advantage, at least if you aren't yet using C++11, that
you don't have to spell out the iterator types; the spelled out iterator
types create a lot of verbiage, in which the important logic can get
lost or overlooked.
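Both points can be seen side by side in pre-C++11 style (no auto, no lambdas). The names ScaleBy, scale_loop and scale_for_each are hypothetical: the loop has to spell out std::vector<int>::iterator, while the for_each version needs no iterator type and gives the per-element operation a name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A named, C++03-compatible function object: multiplies each element.
struct ScaleBy {
    explicit ScaleBy(int factor) : factor_(factor) {}
    void operator()(int& value) const { value *= factor_; }
    int factor_;
};

// Hand-written loop: the full iterator type must be written out.
void scale_loop(std::vector<int>& v, int factor) {
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); ++it)
        *it *= factor;
}

// Same effect with for_each: no iterator type, and the operation is named.
void scale_for_each(std::vector<int>& v, int factor) {
    std::for_each(v.begin(), v.end(), ScaleBy(factor));
}
```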
