Why Are Standard Iterator Ranges [Begin, End) Instead of [Begin, End]

Why are Standard iterator ranges [begin, end) instead of [begin, end]?

The best argument easily is the one made by Dijkstra himself:

You want the size of the range to be a simple difference end − begin;
including the lower bound is more "natural" when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a "one-before-the-beginning" sentinel value.

You still need to justify why you start counting at zero rather than one, but that wasn't part of your question.

The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-ones and extremely unpleasant and noisy code. For example, consider a partition [n₀, n₁)[n₁, n₂)[n₂,n₃). Another example is the standard iteration loop for (it = begin; it != end; ++it), which runs end - begin times. The corresponding code would be much less readable if both ends were inclusive – and imagine how you'd handle empty ranges.

Finally, we can also make a nice argument why counting should start at zero: With the half-open convention for ranges that we just established, if you are given a range of N elements (say to enumerate the members of an array), then 0 is the natural "beginning" so that you can write the range as [0, N), without any awkward offsets or corrections.

In a nutshell: the fact that we don't see the number 1 everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.

Is there a type in the standard for storing the begin- and end-iterators of a container?

Is there a type in the standard for storing the begin- and end-iterators of a container?

Your description matches closely with the concept of a range.

std::ranges::subrange can be created from a pair of iterators. That will be templated based on the iterator type.

If you need to hide the iterator type for runtime polymorphism, you would need ranges::any_view which unfortunately isn't in the standard implementation of the ranges. Furthermore, there is a runtime cost associated with it.

For contiguous containers, another alternative is std::span which can point to any contiguous range without the cost of type erasure. For strings in particular, yet another alternative is std::string_view.

Why the begin/end vs first/last discrepancy?

The reason is likely to be historical: this is what they were called by Stepanov and Lee, first implementers of STL, which later evolved into C++ Standard Library.

Here is Stepanov's paper on STL. Page 47 describes sort

template <class RandomAccessIterator>
void sort(RandomAccessIterator first, RandomAccessIterator last);

Page 19 describes containers' operations begin() and end().

Note that in addition to begin/first and end/last C++ Standard Library describes optional sequence operations front() and back(). The difference in naming is easy to understand here, because the operations must be available on the same container, and back() is inclusive.

Why does a std::variant compile with begin and end iterators?

Your code compiles, because begin() and end() are dependent names - they depend on the function template arguments, so their lookup is postponed until the flat template instantiation. But it is never instantiated!

If you add the following, your code won't compile anymore:

int main () {
     &flat<int, int>;
}

Why are std::begin and std::end not memory safe?

The get_data function returns an object. When used the way shown, that object will be a temporary object, which will be destructed once the full expression ends. The iterator now references a vector object which no longer exists, and can't be dereferenced or used in any useful way.

What is the difference between std::ranges::begin and std::begin?

There are a few differences.

First, ranges::begin(x) works on all ranges while std::begin(x) does not. The latter will not do ADL lookup on begin, so ranges specified like:

struct R {
    ...
};
auto begin(R const&);
auto end(R const&);

won't work, which is why you have to write something like:

using std::begin, std::end;
auto it = begin(r);

You don't have to do that two-step with ranges::begin.

Second, ranges::begin(x) is a little safer. Ranges introduces this notion of a borrowed range, which is a range whose iterators that you can hold onto safely. vector<int> for instance is not a borrowed range - since once the vector dies the data dies. ranges::begin guards against that:

auto get_data() -> std::vector<int>;

auto a = std::begin(get_data());    // ok, but now we have a dangling iterator
auto b = ranges::begin(get_data()); // ill-formed

Third, ranges::begin and ranges::end have extra type checks. ranges::begin(r) requires the result of either r.begin() or begin(r) to model input_or_output_iterator. ranges::end(r) requires ranges::begin(r) to be valid and requires either r.end() or end(r) to model sentinel_for<decltype(ranges::begin(r))>. That is - that whatever we get from begin and end is actually a range.

This means that, for instance:

struct X {
    int begin() const { return 42; }
};

X x;
auto a = std::begin(x);    // ok, a == 42
auto b = ranges::begin(x); // ill-formed, int is not an iterator

Although more annoyingly is a case where you have an iterator type that might be incrementable, dereferenceable, comparable, etc... but fail to have a default constructor. That does not meet the requirements of C++20's input_or_output_iterator so ranges::begin will fail.

Fourth, ranges::begin is a function object, while std::begin is a set of overloaded function templates:

auto f = ranges::begin; // ok
auto g = std::begin;    // error: which std::begin did you want?

Fifth, some of the ranges customization point objects have other fallback behavior besides just calling a function of that name. std::size(r) always invokes a function named size (unless r is a raw array). std::empty(r) always invokes a function named empty(unless r is a raw array, in which case it's just false, or r is an initializer_list, in which case r.size() == 0). But ranges::size could under certain circumstances perform ranges::end(r) - ranges::begin(r) (as a fallback if size(r) and r.size() don't exist) just like ranges::empty could under certain circumstances either do ranges::size(r) == 0 or ranges::begin(r) == ranges::end(r).

why do values returned by std::end() change as the container changes, but not with std::begin()?

To get the last element's iterator, it can be achieved by: std::prev(std::end(l)). Your code stored the end iterator and dereference it, it's UB.

And the doc of std::list::end:

Returns an iterator to the element following the last element of the
container, This element acts as a placeholder; attempting to access it
results in undefined behavior.

For std::begin, we get the iterator to the first element of the container, it's safe the deference it and gets the corresponding element.

#include <iostream>
#include <list>
#include <unordered_map>

int main() {
  std::list<int> l;
  std::unordered_map<int, std::list<int>::iterator> listItems;

  for (int i = 0; i < 5; i++) {
    l.push_back(i);
    listItems[i] = std::prev(std::end(l));
  }

  for (int i = 0; i < 5; i++) std::cout << *(listItems[i]) << " ";
  std::cout << std::endl;
}

Online demo

Are view iterators valid beyond the lifetime of the view?

Are view iterators valid beyond the lifetime of the view?

The property here is called a borrowed range. If a range is a borrowed range, then its iterators are still valid even if a range is destroyed. R&, if R is a range, is the most trivial kind of borrowed range - since it's not the lifetime of the reference that the iterators would be tied into. There are several other familiar borrowed ranges - like span and string_view.

Some range adaptors are conditionally borrowed (P2017). That is, they don't add any additional state on top of the range they are adapting -- so the adapted range can be borrowed if the underlying range is (or underlying ranges are). For instance, views::reverse(r) is borrowed whenever r is borrowed. But views::split(r, pat) isn't conditionally borrowed - because the pattern is stored in the adaptor itself rather than in the iterators (hypothetically, it could also be stored in the iterators, at a cost).

views::values(r) is an example of such: it is a borrowed range whenever r is borrowed. And, in your example, the underlying range is a ref_view, which is itself always borrowed (by the same principle that R& is always borrowed).

Note that here:

auto begin() { return std::ranges::begin(values()); }
auto end() { return std::ranges::end(values()); }

Passing an rvalue range into ranges::begin is only valid if the range is borrowed. That's [range.access.begin]/2.1:

If E is an rvalue and enable_borrowed_range<remove_cv_t<T>> is false, ranges::begin(E) is ill-formed.

So because your original code compiled, you can be sure that it is valid.

How to interlace an iterator with itself from the end?

This solution works for all iterators that implement DoubleEndedIterator:

Note that this solution is guaranteed to return all items, regardless of whether the iterator contains an even or odd number of them.

fn selfinterlace<Iter>(mut iter: Iter) -> impl Iterator<Item = Iter::Item>
where
    Iter: DoubleEndedIterator,
{
    let mut from_front = false;

    std::iter::from_fn(move || {
        from_front = !from_front;
        if from_front {
            iter.next()
        } else {
            iter.next_back()
        }
    })
}

fn main() {
    let range = (0..=8).into_iter();
    let iter = selfinterlace(range);
    println!("{:?}", iter.collect::<Vec<_>>());
}

[0, 8, 1, 7, 2, 6, 3, 5, 4]

The idea is that you store whether the next item should be from the front or the back, and then flip that in every iteration.

from_fn can take FnMut, meaning, it can take closures that store internal state. The internal state in this closure consists of the variables iter and from_front, which get moved into the closure through the move || keyword.

Why in range for loop do begin/end need to be copyable?

Is there a reason the begin/end are taken like that, or is it an oversight? Wouldn't it be better to either move them:

In general no, because such specification could potentially prohibit copy elision.

Why in range for loop do begin/end need to be copyable?

The initialisation that you quote doesn't generally necessitate the iterators to be copyable, except in an unusual case where begin / end return an lvalue reference. I'm not sure if anyone cares about such case, given that would violate concept requirements.

Why Are Standard Iterator Ranges [Begin, End) Instead of [Begin, End]