Why Use Iterators Instead of Array Indices

Why use iterators instead of array indices?

The first form is efficient only if vector.size() is a fast operation. This is true for vectors, but not for lists, for example. Also, what are you planning to do within the body of the loop? If you plan on accessing the elements as in

T elem = some_vector[i];

then you're making the assumption that the container has operator[](std::size_t) defined. Again, this is true for vector but not for other containers.

The use of iterators bring you closer to container independence. You're not making assumptions about random-access ability or fast size() operation, only that the container has iterator capabilities.

You could enhance your code further by using standard algorithms. Depending on what it is you're trying to achieve, you may elect to use std::for_each(), std::transform() and so on. By using a standard algorithm rather than an explicit loop you're avoiding re-inventing the wheel. Your code is likely to be more efficient (given the right algorithm is chosen), correct and reusable.

Use of iterators over array indices

I presume you are talking about when using a vector, right?

The main advantage is that iterator code works for all stl containers, while the array indexing operator [] is only available for vectors and deques. This means you are free to change the underlying container if you need to without having to recode every loop. It also means you can put your iteration code in a template and it will work for any container, not just for deques and vectors (and arrays of course).

Iterators.. why use them?

Note that the usually implementation of vector won't use an "int" as the type of the index/size. So your code will at the very least provoke compiler warnings.

Genericity

Iterators increase the genericity of your code.

For example:

typedef std::vector<int> Container ;

void doSomething(Container & p_aC)
{
for(Container::iterator it = p_aC.begin(), itEnd = p_aC.end(); it != itEnd; ++it)
{
int & i = *it ; // i is now a reference to the value iterated
// do something with "i"
}
}

Now, let's imagine you change the vector into a list (because in your case, the list is now better). You only need to change the typedef declaration, and recompile the code.

Should you have used index-based code instead, it would have needed to be re-written.

Access

The iterator should be viewed like a kind of super pointer.
It "points" to the value (or, in case of maps, to the pair of key/value).

But it has methods to move to the next item in the container. Or the previous. Some containers offer even random access (the vector and the deque).

Algorithms

Most STL algorithms work on iterators or on ranges of iterators (again, because of genericity). You won't be able to use an index, here.

Iterator Loop vs index loop

The special thing about iterators is that they provide the glue between algorithms and containers. For generic code, the recommendation would be to use a combination of STL algorithms (e.g. find, sort, remove, copy) etc. that carries out the computation that you have in mind on your data structure (vector, list, map etc.), and to supply that algorithm with iterators into your container.

Your particular example could be written as a combination of the for_each algorithm and the vector container (see option 3) below), but it's only one out of four distinct ways to iterate over a std::vector:

1) index-based iteration

for (std::size_t i = 0; i != v.size(); ++i) {
// access element as v[i]

// any code including continue, break, return
}

Advantages: familiar to anyone familiar with C-style code, can loop using different strides (e.g. i += 2).

Disadvantages: only for sequential random access containers (vector, array, deque), doesn't work for list, forward_list or the associative containers. Also the loop control is a little verbose (init, check, increment). People need to be aware of the 0-based indexing in C++.

2) iterator-based iteration

for (auto it = v.begin(); it != v.end(); ++it) {
// if the current index is needed:
auto i = std::distance(v.begin(), it);

// access element as *it

// any code including continue, break, return
}

Advantages: more generic, works for all containers (even the new unordered associative containers, can also use different strides (e.g. std::advance(it, 2));

Disadvantages: need extra work to get the index of the current element (could be O(N) for list or forward_list). Again, the loop control is a little verbose (init, check, increment).

3) STL for_each algorithm + lambda

std::for_each(v.begin(), v.end(), [](T const& elem) {
// if the current index is needed:
auto i = &elem - &v[0];

// cannot continue, break or return out of the loop
});

Advantages: same as 2) plus small reduction in loop control (no check and increment), this can greatly reduce your bug rate (wrong init, check or increment, off-by-one errors).

Disadvantages: same as explicit iterator-loop plus restricted possibilities for flow control in the loop (cannot use continue, break or return) and no option for different strides (unless you use an iterator adapter that overloads operator++).

4) range-for loop

for (auto& elem: v) {
// if the current index is needed:
auto i = &elem - &v[0];

// any code including continue, break, return
}

Advantages: very compact loop control, direct access to the current element.

Disadvantages: extra statement to get the index. Cannot use different strides.

What to use?

For your particular example of iterating over std::vector: if you really need the index (e.g. access the previous or next element, printing/logging the index inside the loop etc.) or you need a stride different than 1, then I would go for the explicitly indexed-loop, otherwise I'd go for the range-for loop.

For generic algorithms on generic containers I'd go for the explicit iterator loop unless the code contained no flow control inside the loop and needed stride 1, in which case I'd go for the STL for_each + a lambda.

Why is != used with iterators instead of ?

All iterators are equality comparable. Only random access iterators are relationally comparable. Input iterators, forward iterators, and bidirectional iterators are not relationally comparable.

Thus, the comparison using != is more generic and flexible than the comparison using <.


There are different categories of iterators because not all ranges of elements have the same access properties. For example,

  • if you have an iterators into an array (a contiguous sequence of elements), it's trivial to relationally compare them; you just have to compare the indices of the pointed to elements (or the pointers to them, since the iterators likely just contain pointers to the elements);

  • if you have iterators into a linked list and you want to test whether one iterator is "less than" another iterator, you have to walk the nodes of the linked list from the one iterator until either you reach the other iterator or you reach the end of the list.

The rule is that all operations on an iterator should have constant time complexity (or, at a minimum, sublinear time complexity). You can always perform an equality comparison in constant time since you just have to compare whether the iterators point to the same object. So, all iterators are equality comparable.


Further, you aren't allowed to increment an iterator past the end of the range into which it points. So, if you end up in a scenario where it != foo.end() does not do the same thing as it < foo.end(), you already have undefined behavior because you've iterated past the end of the range.

The same is true for pointers into an array: you aren't allowed to increment a pointer beyond one-past-the-end of the array; a program that does so exhibits undefined behavior. (The same is obviously not true for indices, since indices are just integers.)

Some Standard Library implementations (like the Visual C++ Standard Library implementation) have helpful debug code that will raise an assertion when you do something illegal with an iterator like this.

Why use string::iterator rather than index?

The index can only be used for containers that support random access - direct access to a given position.

The iterator offers a unified way to access any collection/data structure. The flexibility when refactoring your code is immense.

Is there any advantage of using a range for loop rather than an iterator?

Iterators predate range-based for loops, so they used to be the only of these two alternatives available. Nowadays, range-based for has mostly replaced iterators when dealing with a simple loop.

However, iterators can be used in various other contexts as well. For example, you can do std::sort(v.begin(), std::next(v.begin(), 5)) to sort the first five elements of a vector while leaving the rest of it alone.

Going back to iterating over a whole container:

  • If you can accomplish what you want with a range-based for, then it leads to more legible code, so they are preferable by default.

  • If you need iterators for some reason, such as using an algorithm that requires them, or because you need to jump ahead or back while iterating, then use those instead.

Also: In the later case, you can/should still use auto when declaring the iterator:

for(auto it = just_a_vec.begin(); it < just_a_vec.end(); it++) {
}

Edit: as asked: here's a simple, if a bit contrived, example where an iterator-based loop can still be useful:

// adds all values in the vector, but skips over twos values when encountering a 0
// e.g.: {1,2,0,4,5,2} => 5
int my_weird_accum(const std::vector<int>& data) {
int result = 0;

for(auto it = data.begin(); it != data.end(); ++it) {
auto v = *it;
result += v;

if(v == 0) {
// skip over the next two
assert(std::distance(it, data.end()) > 2);
std::advance(it, 2);
}
}
return 0;
}

why we recommend iterators instead of lists (low-level explanation)

But why does iterators takes a tiny constant size and why is looping over the iterator re-uses the same memory location again an again?

Let's say you are reading lines from a file. If you were to create a list from all the lines in the file:

lines = myfile.readlines()
for line in lines:
...

...this loads the entire file into memory. If the file is sufficiently large, you will consume allt he available memory and your program will crash.

On the other hand, if you use the iterator:

for line in myfile:
...

Then Python only needs to read in enough data to find the next EOL character. This uses substantially less memory as long as you are working with a line-oriented file (if the file has no EOL characters, then of course there is no advantage in this example).

The same reasoning applies, for example, to xrange() vs range() (where the latter returns a list, which will consumes a large number of resources if the range is large, while the former only needs to maintain a counter).



Related Topics



Leave a reply



Submit