Stateful Functors & Stl:Undefined Behaviour

Stateful functors & STL : Undefined behaviour

The meaning of that quote should be taken at face value. For the majority of the STL algorithms, you should not implement your predicate functor such that it has observable state (AKA "side effects"), because:

the iteration order over the container is not defined,
the algorithm is free to make copies of the functor,
the algorithm may depend on statelessness in order to not corrupt the container contents or crash.

The simplest way to enforce this upon yourself is to define operator() as const.

There are exceptions, such as for_each, for which none of the above apply. You are free to use stateful functors here. For more info, see this excellent article: http://drdobbs.com/cpp/184403769.

Behind the scenes, the authors of your STL implementation are free to write the remove_if (and other algorithms) any way they like, so long as it conforms to the requirements laid down by the standard. There's no real reason to worry too much about exactly why you're getting the behaviour you're seeing, beyond acknowledging that it's undefined. If you want to know the specifics, I would just take a look at the code for remove_if in the STL implementation that you're using.

As for your side-note; this is not "corruption", it's simply an artifact of how remove_if works (this would occur even for a valid predicate). The only requirement is that all the elements to the left of pos are valid (because they are to be retained). There are no requirements on what elements exist from pos onward (see here). (Chapter 32 of "Effective STL" by Scott Meyers has a good explanation of why remove_if (and so on) behave like this).

Can functor provided to std::generate be stateful?

Introduction

There's no problem using a stateful functor with functions such as std::generate, but one has to be careful not to run into issues where the underlying implementation makes a copy that will change the semantics of a program in a way that is different from what the developer have in mind.

25.1p10 Algorithms library - General [algorithms.general]
[ Note: Unless otherwise specified, algorithms that take function objects as arguments are permitted to copy those function obejcts freely. Programmers for whom object identity is imoprtant should consider using a wrapper class that points to a noncopied implementation object such as reference_wrapper, or some equivalent solution. -- end note ]

Order of assignment/evaluation

The standard explicitly states that exactly last - first (N) assignments and invocations of the generator will be made, but it doesn't state in what order. More can be read in the following Q&A:

C++ standard wording: Does “through all iterators in the range” imply sequentiality?

Stateful functors + `std::generate`, unsafe?

Generally no, but there are a few caveats.

Within std::generate the standard guarantees that the same instance of the functor type will be invoked for every element in the given range, but as can be hinted by the declaration of std::generate we might run into issues where we forget that the passed functor invoked inside the function will be a copy of the one passed as argument.

See the below snippet where we declare a functor to generate "unique" ids:

#include <iostream>
#include <algorithm>

template<class id_type>
struct id_generator {
  id_type operator() () {
    return ++idx;
  }

  id_type next_id () const {
    return idx + 1;
  }

  id_type idx {};
};

int main () {
  id_generator<int> gen;

  std::vector<int>  vec1 (5);
  std::vector<int>  vec2 (5);

  std::generate (vec1.begin (), vec1.end (), gen);
  std::generate (vec2.begin (), vec2.end (), gen);

  std::cout << gen.next_id () << std::endl; // will print '1'
}

After running the above we might expect gen.next_id () to yield 11, since we have used it to generate 5 ids for vec1, and 5 ids for vec2.

This is not the case since upon invoking std::generate our instance of id_generator will be copied, and it is the copy that will be used inside the function.

What would be the solution?

There are several solutions to this problem, all of which prevents a copy from being made when you pass your functor to some algorithm function related to std::generated.

Alternative #1

The recommended solution is to wrap your functor in a std::reference_wrapper with the use of std::ref from <functional>. This will effectively copy the reference_wrapper, but the referred to instance of generate_id will stay the same.

std::generate (vec1.begin (), vec1.end (), std::ref (gen));
std::generate (vec2.begin (), vec2.end (), std::ref (gen));

std::cout << gen.next_id () << std::endl; // will print '11'

Alternative #2

You could, of course, also make your fingers stronger by writing something as confusing as the below:

std::generate<decltype(vec1.begin()), id_generator<int>&>(vec1.begin(), vec1.end(), gen);

Is it still advisable to pass functors to STL instead of functions?

In support of that I seem to remember that if a function's address is ever taken, the function can't be inlined.

You're reading that incorrectly. A function can be inlined into any direct call site, and any indirect call site if the compiler can trace the function pointer. What the GCC manpage is saying that a function that is inlined into every call site will not be emitted as a separate function at all (thus reducing binary size), unless its address is taken.

Firstly, is this still true with C++14?

Yes. Of course, now you would generally write lambdas instead of hand-crafted functors.

Why is there no mechanism to do this automatically.

It's a matter of the type system. All functions with a given signature have the same type. Thus, an algorithm that is passed a function pointer gets instantiated to one concrete function, and its code just exists once in the compilation model of C++.

Of course, an optimizer can still specialize the function on one particular argument, but that's a more advanced optimization than just inlining a functor. So yes, there is a mechanism, it's just less likely to be used.

Does this mean we need to capture something only for the sake of stated at top optimization?

No. The conversion to a function pointer is possible, but unless you invoke it explicitly, it won't be done when you pass a lambda to an algorithm.

STL: Initializing a container with an unconstructed stateful comparator

The std::set<CodePtr,Sorter> object stores an instance of Sorter by value so when you construct it with a Sorter* (did you mean that to be a reference not a pointer?) it will slice the object and only keep the base part.

That means the Sorter copy constructor will run and make a copy of an uninitialized object. Undefined behaviour ensues.

That's assuming you can even create an instance of Sorter, if it's an abstract type you won't be able to (I don't know what your PURE does but I assume you're making the function pure virtual)

@Angew's comment suggest a good approach, the base from member idiom will allow you to ensure the m_sorter object is initialized first, which is part of the problem. That doesn't help the issue of slicing though, to solve that you'd need some wrapper around the sorter e.g.

typedef std::function<bool(const CodePtr&,const CodePtr&)> SorterFunc;
typedef std::set<CodePtr, SorterFunc> TSortedCode;

And then pass the wrapper to the set constructor:

inline SortedBase(SorterFunc s) : m_codeList(s) {}

If you construct the std::function from the derived type it won't be sliced. It will be copied though, but you can prevent that by using a reference wrapper:

  SortedObject5() : BaseFrommember(this), SortedBase(SorterFunc(std::ref(m_sorter))) { }

Where m_sorter is already initialized, because it is stored in the BaseFromMember base class, using the base-from-member idiom.

This:

creates the m_sorter first so you don't do anything with an uninitialized object
passes it by reference to a SorterFunc object
uses a copy of that SorterFunc (still holding a reference to m_sorter) as the comparision function for the std::set

If you don't want to use the base-from-member idiom then it's still easy to avoid the undefined behaviour of your original code, just default construct the set (instead of passing it an uninitialized object) then assign a new value to it before you start populating it:

SortedObject5() : m_sorter(this)
{
  this->m_codeList = TSortedCode(SorterFunc(boost::ref(m_sorter)));
}

No new base classes, no extra templates, no undefined behaviour.

Here's the working code in full:

class SortedBase
{
protected:
   class Sorter : public std::binary_function<CodePtr,CodePtr,bool>
   {
   protected:
     Sorter() {}
     virtual ~Sorter() {}
   public:
     virtual bool operator()(const CodePtr& left, const CodePtr& right) = 0;
   };

   typedef boost::function<bool(const CodePtr&, const CodePtr&)> SorterFunc;

   typedef std::set<CodePtr,SorterFunc> TSortedCode;

   TSortedCode m_codeList;

public:
   virtual ~SortedBase() {}
   void fetch(); // populates m_codeList
};

template<class SORT1, class SORT2, class SORT3, class SORT4, class SORT5>
class SortedObject5 : public SortedBase
{
public:
  SortedObject5() : m_sorter(*this)
  {
    this->m_codeList = TSortedCode(SorterFunc(boost::ref(m_sorter)));
  }

protected:
  typedef SortedObject5<SORT1,SORT2,SORT3,SORT4,SORT5> my_class;

  class MySorter : public Sorter
  {
  public:
    MySorter(const my_class& parent):m_parent(parent) {}
    virtual bool operator()(const CodePtr& left, const CodePtr& right);
  protected:
    const my_class& m_parent;
  };

  MySorter m_sorter;
};

Why can swapping standard library containers be problematic in C++11 (involving allocators)?

Let's start of by digging into the Standard (N3797):

23.2.1p9 General Container Requirements [container.requirements.general]

If
allocator_traits<allocator_type>::propagate_on_container_swap::value
is true, then the allocators of a and b shall also be exchanged
using an unqalified call to non-member swap. Otherwise, they shall
not be swapped, and the behavior is undefined unless a.get_allocator() == b.get_allocator().

What is the purpose of propagate_on_container_swap?

If an Allocator has a typedef named propagate_on_container_swap that refers to std::true_type the underlying Allocators of two containers being swapped, will also swap.^[1]

If propagate_on_container_swap is std::false_type only the data of the two containers will swap, but the allocators will remain in their place.

^{^[1] This means that after a.swap(b), a.get_allocator() will be that which was previously b.get_allocator(); the allocators has swapped.}

What are the implications of stateful Allocators?

The Allocator is not only responsible for allocating memory for the elements within a Standard container, they are also responsible for the deallocation of said elements.

C++03 didn't allow stateful allocators within standard containers, but C++11 mandates that support for such must be present. This means that we could define an allocator that, depending on how it's constructed, acts in a certain way.

If the allocator has propagate_on_container_swap::value equal to false the difference in state between the two allocators involved might lead to undefined behavior, since one instance of the Allocator might not be compatible with the data handled by the other.

What might be the problem with stateful allocators if they are not swapped properly?

Let's say we have a MagicAllocator which either uses malloc or operator new to allocate memory, depending on how it's constructed.

If it uses malloc to allocate memory it must use free to deallocate it, and in case of operator new, delete is required; because of this it must maintain some information saying which of the two it should use.

If we have two std::vector which both uses MagicAllocator but with different states (meaning that one uses malloc and the other operator new), and we don't swap the allocators upon a.swap(b) the allocator won't match the memory allocated for the elements in the two vectors after the swap - which in terms means that the wrong free/delete might be called upon deallocation.

C++ Delete Dupes

There are some error regarding the algorithm. Mainly, range issues.

v.erase(v.begin()+(i-1));

Your i variable starts at 0, so you don't need to remove the element before it. Just remove begin()+i.

count(v, i, last, i)

You have passed for last argument of count(...) the value of size()-1. So your iteration must be until index <= last, instead of just index < last.

Another problem, the fourth parameter isn't i, is v[i]. Your code only compiles because you is testing with a vector of integers, the same type of the index.

It can have other errors, look carefully to the algorithm and index ranges. A good option is a print on each step of the functions.

Edit: Adding my suggested algorithm

This is the code of the two core methods. Replicate the logic if you want something similar in other tasks.

template <typename T>
void removeDup(std::vector<T> & v) {
    for(int i = 0; i <= v.size(); i++)
    {
            if (count(v, i, v[i]) > 1){ // if there is more than 1
                v.erase(v.begin()+i); // erase it
                i--; // Decrement to stay at the same index.
            }
        }
}

template <typename T>
int count(const std::vector<T> & v, int first, const T& target){
    int count = 0;
    for (int index = first; index < v.size(); index++){
        if (v[index] == target){
            count++;
        }
    }
    return count;
}

How to fold STL container?

STL does have such a function: std::accumulate. However, it is in the header <numeric>, not <algorithm>.

Actually the Wikipedia page on "Fold" already listed the foldl/foldr functions on most programming languages, including C++.

Pass additional arguments to remove_if

The state telling "how many elements have been removed so far" should be defined outside of the function / the call to the algorithm. This is because a functor should not have a state which is modified when being called (this would be undefined behavior).

You should take a reference to this state (counter) in the constructor of the functor (or capture by reference in the lambda), so you can access and modify this counter. When this functor is now copied, it doesn't matter which one is called by the algorithm, since all of them now hold the reference to the same state.

Using functors (C++03):

class IsOdd {
    int deleteLimit;
    int & deletedSoFar;

public:
    IsOdd(int deleteLimit, int & deletedSoFar) :
        deleteLimit(deleteLimit), deletedSoFar(deletedSoFar)
    {}

    bool operator()(int i) const {
        if (deletedSoFar < deleteLimit && i % 2) {
            ++deletedSoFar;
            return true;
        }
        return false;
    }
};

int deletedSoFar = 0;
int deleteLimit = 10;
std::remove_if (myints.begin(), myints.end(), IsOdd(deleteLimit, deletedSoFar));

Using lambda (C++11):

int deletedSoFar = 0;
int deleteLimit = 10;
auto it = std::remove_if (myints.begin(), myints.end(), [deleteLimit,&deletedSoFar](int i){
    if (deletedSoFar < deleteLimit && i % 2) {
        ++deletedSoFar;
        return true;
    }
    return false;
});
myints.erase(it, myints.end());

Stateful Functors & Stl:Undefined Behaviour