Reference Invalidation After Applying Reverse_Iterator on a Custom Made Iterator

What could be a least bad implementation for an iterator over a proxied container?

OK, we have two similar but distinct concepts. So lets lay them out.

But first, I need to make a distinction between the named requirements of C++-pre-20, and the actual in-language concepts created for the Ranges TS and included in C++20. They're both called "concepts", but they're defined differently. As such, when I talk about concept-with-a-lowercase-c, I mean the pre-C++20 requirements. When I talk about Concept-with-a-captial-C, I mean the C++20 stuff.

Proxy Iterators

Proxy iterators are iterators where their reference is not a value_type&, but is instead some other type that behaves like a reference to value_type. In this case, *it returns a prvalue to this reference.

The InputIterator concept imposes no requirement on reference, other than that it is convertible to value_type. However, the ForwardIterator concept makes the explicit statement that "reference is a reference to T".

Therefore, a proxy iterator cannot fit the ForwardIterator concept. But it can still be an InputIterator. So you can safely pass a proxy iterator to any function that only requires InputIterators.

So, the problem with vector<bool>s iterators is not that they're proxy iterators. It's that they promise they fulfill the RandomAccessIterator concept (though the use of the appropriate tag), when they're really only InputIterators and OutputIterators.

The Ranges proposal (mostly) adopted into C++20 makes changes to the iterator Concepts which allow all iterators to be proxy iterators. So under Ranges, vector<bool>::iterator really fulfills the RandomAccessIterator Concept. Therefore, if you have code written against the Ranges concepts, then you can use proxy iterators of all kinds.

This is very useful for dealing with things like counting ranges. You can have reference and value_type be the same type, so you're just dealing with integers either way.

And of course, if you have control over the code consuming the iterator, you can make it do whatever you want, so long as you don't violate the concept your iterator is written against.

Stashing Iterators

Stashing iterators are iterators where reference is (directly or indirectly) a reference to an object stored in the iterator. Therefore, if you make a copy of an iterator, the copy will return a reference to a different object than the original, even though they refer to the same element. And when you increment the iterator, previous references are no longer valid.

Stashing iterators are usually implemented because computing the value you want to return is expensive. Maybe it would involve a memory allocation (such as path::iterator) or maybe it would involve a possibly-complex operation that should only be done once (such as regex_iterator). So you only want to do it when necessary.

One of the foundations of ForwardIterator as a concept (or Concept) is that a range of these iterators represents a range over values which exist independently of their iterators. This permits multipass operation, but it also makes doing other things useful. You can store references to items in the range, and then iterate elsewhere.

If you need an iterator to be a ForwardIterator or higher, you should never make it a stashing iterator. Of course, the C++ standard library is not always consistent with itself. But it usually calls out its inconsistencies.

path::iterator is a stashing iterator. The standard says that it is a BidirectionalIterator; however, it also gives this type an exception to the reference/pointer preservation rule. This means that you cannot pass path::iterator to any code that might rely on that preservation rule.

Now, this doesn't mean you can't pass it to anything. Any algorithm which requires only InputIterator will be able to take such an iterator, since such code cannot rely on that rule. And of course, any code which you write or which specifically states in its documentation that it doesn't rely on that rule can be used. But there's no guarantee that you can use reverse_iterator on it, even though it says that it is a BidirectionalIterator.

regex_iterators are even worse in this regard. They are said to be a ForwardIterators based on their tag, but the standard never says that they actually are ForwardIterators (unlike path::iterator). And the specification of them as having reference be an actual reference to a member object makes it impossible for them to be true ForwardIterators.

Note that I made no distinction between the pre-C++20 concept and the Ranges Concept. That's because the std::forward_iterator Concept still forbids stashing iterators. This is by design.

Usage

Now obviously, you can do whatever you want in your code. But code you don't control will be under the domain of its owners. They will be writing against the old concepts, the new Concepts, or some other c/Concept or requirement that they specify. So your iterators need to be able to be compatible with their needs.

The algorithms that the Ranges introduces uses the new Concepts, so you can always rely on them to work with proxy iterators. However, as I understand it, the Range Concepts are not back-ported into older algorithms.

Personally, I would suggest avoiding stashing iterator implementations entirely. By providing complete support for proxy iterators, most stashing iterators can be rewritten to return values rather than references to objects.

For example, if there were a path_view type, path::iterator could have returned that instead of a full-fledged path. That way, if you want to do the expensive copy operation, you can. Similarly, the regex_iterators could have returned copies of the match object. The new Concepts make it possible to work that way by supporting proxy iterators.

Now, stashing iterators handle caching in a useful way; iterators can cache their results so that repeated *it usage only does the expensive operation once. But remember the problem with stashing iterators: returning a reference to their contents. You don't need to do that just to get caching. You can cache the results in an optional<T> (which you invalidate when the iterator is in/decremented). So you can still return a value. It may involve an additional copy, but reference shouldn't be a complex type.

Of course, all of this means that auto &val = *it; isn't legal code anymore. However, auto &&val = *it; will always work. This is actually a big part of the Range TS version of iterators.

What causes the stored std::list::iterator to become invalid?

Replacing

return items.at(noOfitems-1)->rbegin().base();

with

return std::next(items.at(noOfitems-1)->end(), -1);

in lines 63 and 86 fixes the crash.

Working solution: https://wandbox.org/permlink/I68Szb0XMRKsPZqp

#include <iostream>
#include <iostream>
#include <vector>
#include <random>
#include <unordered_set>
#include <set>
#include <unordered_map>
#include <map>
#include <list>
#include <utility>
#include <chrono>
#include <sstream>
#include <memory>

class Increment;
struct Item;

struct Pairhash {
public:
template <typename T>
std::size_t operator()(const T &x) const
{
return std::hash<T>()(x) ^ std::hash<T>()(x);
}
};

struct Item {
Item() = default;
std::string name;
int counter;
Item(std::string name) : counter(0)
{
this->name = name;
}

bool operator==(const Item& p) const
{
return (this->name == p.name);
}

bool operator<(const Item& p) const
{
return (this->counter < p.counter);
}
};

class Increment {
private:
std::unordered_map<std::string, std::pair<std::list<std::string>::iterator , std::shared_ptr<Item> >, Pairhash > itemMap;
std::vector<std::shared_ptr<std::list<std::string>>> items;
public:
Increment() = default;
std::list<std::string>::iterator insertItem(std::string & name , int noOfitems)
{
if (noOfitems > items.size())
{
items.emplace_back(std::make_shared<std::list<std::string>>(std::initializer_list<std::string>{ name }));
return items.back()->begin();
}
else
{
items.at(noOfitems-1)->emplace_back(name);
return std::next(items.at(noOfitems-1)->end(), -1); // versus return items.at(noOfitems-1)->rbegin().base(); //Crashes
}
}

std::list<std::string>::iterator adjustItemPosition(std::string & name, int noOfitems, std::list<std::string>::iterator & iter)
{
if (noOfitems > items.size())
{
std::list<std::string> temp{name};
items.push_back(std::make_shared<std::list<std::string>>(temp));
}
else
{
items.at(noOfitems-1)->emplace_back(name);
}
/* // Works as expected
auto itr = std::find(items.at(noOfitems-2)->begin(), items.at(noOfitems-2)->end(), name);
if (itr != items.at(noOfitems-2)->end())
{
items.at(noOfitems-2)->erase(itr);
}
*/
items.at(noOfitems-2)->erase(iter++); //TODO Crashes
return std::next(items.at(noOfitems-1)->end(), -1); //versus return items.at(noOfitems-1)->rbegin().base(); //Crashes
}

void incrementByOne(std::string name)
{
auto it = itemMap.find(name);
if (it != itemMap.end()) //Item already in map
{
it->second.second->counter++;
it->second.first = adjustItemPosition(name, it->second.second->counter,
it->second.first);
}
else //New item to be inserted
{
std::shared_ptr<Item> temp = std::make_shared<Item>(Item(name));
temp->counter = 1;
std::list<std::string>::iterator listIter = insertItem(name, 1);
itemMap.emplace(name, std::make_pair( listIter, temp));
}
}

std::string printTop10() const
{
std::stringstream ss;
auto count(0);
for (auto it = items.rbegin(); it != items.rend(); ++it)
{
for (auto item : **it)
{
if (count == 10)
{
break;
}
ss << "We have " << itemMap.at(item).second->counter << " " << item << std::endl;
++count;
}
}
return ss.str();
}
};

int main(int argc, const char * argv[]) {
Increment incrementer;
std::vector<std::string> names{ "Bananas", "Apples", "Peaches", "Durians", "Hazelnuts", "Avocados", "Pineapples", "Cherries", "Almonds", "Olives", "Eggs", "Yoghurts", "Peas", "Blueberries" };

for (int i = 0; i < 100; ++i) {
incrementer.incrementByOne(names.at(i%10));
}
std::cout << incrementer.printTop10() << std::endl;
return 0;
}

I believe the answer as to why this is so was given in https://stackoverflow.com/a/33851951, namely that the pointer to the reverse iterator is in fact a copy of the original reference to it, which is deleted when it goes out of scope.
http://cplusplus.github.io/LWG/lwg-defects.html#2360

Is this proper behavior? std::map iterator invalidation

As quoted here http://en.cppreference.com/w/cpp/container/map/rbegin

Reverse iterator stores an iterator to the next element than the one it actually refers to

the side effect would be that if you insert something before that iterator (incliding end()) you will see that new value when you dereference that reverse iterator. I do not think that reverse iterator is invalidated in this case.

Why can I not convert a reverse iterator to a forward iterator?

You could write a helper function. One particularity of reverse_iterator is that base() gives a forward iterator that is next from the value that the reverse iterator dereferences to. This is because a reverse iterator physically points to the element after the one it logically points to. So to have the forward iterator to the same item as your reverse_iterator, you'll need to decrement the result of base() by one, or you could increment the reverse iterator first, then take the .base() of that.

Both examples are shown below:

#include <iostream>
#include <vector>
#include <iterator>

//result is undefined if passed container.rend()
template <class ReverseIterator>
typename ReverseIterator::iterator_type make_forward(ReverseIterator rit)
{
return --(rit.base()); // move result of .base() back by one.
// alternatively
// return (++rit).base() ;
// or
// return (rit+1).base().
}

int main()
{
std::vector<int> vec(1, 1);
std::vector<int>::reverse_iterator rit = vec.rbegin();
std::vector<int>::iterator fit = make_forward(rit);
std::cout << *fit << ' ' << *rit << '\n';
}

Warning: this behavior is different from that of the reverse_iterator(iterator) constructor.

vector::erase and reverse_iterator

In the question you have already quoted exactly what the standard says a reverse_iterator is:

The relationship between reverse_iterator and iterator is &*(reverse_iterator(i)) == &*(i- 1)

Remember that a reverse_iterator is just an 'adaptor' on top of the underlying iterator (reverse_iterator::current). The 'reference point', as you put it, for a reverse_iterator is that wrapped iterator, current. All operations on the reverse_iterator really occur on that underlying iterator. You can obtain that iterator using the reverse_iterator::base() function.

If you erase --vect_rit.base(), you are in effect erasing --current, so current will be invalidated.

As a side note, the expression --vect_rit.base() might not always compile. If the iterator is actually just a raw pointer (as might be the case for a vector), then vect_rit.base() returns an rvalue (a prvalue in C++11 terms), so the pre-decrement operator won't work on it since that operator needs a modifiable lvalue. See "Item 28: Understand how to use a reverse_iterator's base iterator" in "Effective STL" by Scott Meyers. (an early version of the item can be found online in "Guideline 3" of http://www.drdobbs.com/three-guidelines-for-effective-iterator/184401406).

You can use the even uglier expression, (++vect_rit).base(), to avoid that problem. Or since you're dealing with a vector and random access iterators: vect_rit.base() - 1

Either way, vect_rit is invalidated by the erase because vect_rit.current is invalidated.

However, remember that vector::erase() returns a valid iterator to the new location of the element that followed the one that was just erased. You can use that to 're-synchronize' vect_rit:

vect_rit = vector_type::reverse_iterator( vector.erase(vect_rit.base() - 1));


Related Topics



Leave a reply



Submit