What Are Iterator, Iterable, and Iteration

What are iterator, iterable, and iteration?

Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.

In Python, iterable and iterator have specific meanings.

An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.
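As a minimal sketch of the second case, the class below defines only __getitem__ and can still be looped over. Squares is a hypothetical name used for illustration, not something from the standard library.

```python
class Squares:
    """Hypothetical sequence-style iterable: defines only __getitem__."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # iter() falls back to calling this with 0, 1, 2, ... until IndexError.
        if index < 0 or index >= self.count:
            raise IndexError(index)
        return index * index


squares = Squares(4)
squares_list = list(squares)  # iteration works even without __iter__
print(squares_list)
```

Because there is no __iter__ here, iter() builds a generic sequence iterator that simply counts indexes upward from zero.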

An iterator is an object with a next (Python 2) or __next__ (Python 3) method.

Whenever you use a for loop, or map, or a list comprehension, etc. in Python, the next method is called automatically to get each item from the iterator, thus going through the process of iteration.
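Roughly speaking, a for loop can be spelled out by hand as below. This is a sketch of the idea in Python 3 syntax, not the interpreter's actual implementation, and manual_for is a made-up helper name.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, written out step by step."""
    collected = []
    iterator = iter(iterable)      # a for loop first asks for an iterator
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # exhaustion ends the loop
        collected.append(item)
    return collected


letters = manual_for("abc")
print(letters)
```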

A good place to start learning would be the iterators section of the tutorial and the iterator types section of the standard types page. After you understand the basics, try the iterators section of the Functional Programming HOWTO.

What is the difference between iterator and iterable and how to use them?

An Iterable is a simple representation of a series of elements that can be iterated over. It does not have any iteration state such as a "current element". Instead, it has one method that produces an Iterator.

An Iterator is the object with iteration state. In Java, it lets you check whether it has more elements using hasNext() and move to the next element (if any) using next().

Typically, an Iterable should be able to produce any number of valid Iterators.

Iterable and iterator

The csv.reader object is its own iterator. This is a common practice for iterables which are single-pass (i.e. can only be run through once). We can confirm this by inspection.

>>> data
<_csv.reader object at 0x7fe5d4a057b0>
>>> iter(data)
<_csv.reader object at 0x7fe5d4a057b0> # Note: Same as above
>>> id(data)
140625091516336
>>> id(iter(data))
140625091516336 # Note: Same as above
>>> data is iter(data)
True

Compare this to something like a list, which is an iterable but is not itself an iterator.

>>> lst = [1, 2, 3]
>>> lst
[1, 2, 3]
>>> iter(lst)
<list_iterator object at 0x7fe5d59747f0> # Note: NOT the same as before
>>> lst is iter(lst)
False

This allows us to iterate over a list several times by calling iter(lst) multiple times, since each call gives us a fresh iterator. But your csv.reader object is single-pass, so we only have the one iterator to it.
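The difference can be seen side by side in a small sketch. It assumes an in-memory io.StringIO object standing in for the open file, and the sample rows are made up.

```python
import csv
import io

# In-memory stand-in for an open CSV file (sample data is invented).
source = io.StringIO("a,b\n1,2\n")
reader = csv.reader(source)

first_pass = list(reader)   # consumes the reader
second_pass = list(reader)  # nothing left: the reader is its own iterator

rows = [["a", "b"], ["1", "2"]]
list_first = list(rows)     # each list() call gets a fresh iterator
list_second = list(rows)

print(first_pass, second_pass)
print(list_first == list_second)
```

To go over the CSV data twice, one option is to store the rows in a list on the first pass and iterate over that list afterwards.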

In Python, every iterator is an iterable, but not every iterable is an iterator. From the glossary:

Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted.

Iterator vs Iterable?

An iterator is an iterable, but an iterable is not necessarily an iterator.

An iterable is anything that has an __iter__ method defined - e.g. lists and tuples, as well as iterators.

Iterators are the subset of iterables that hand out their values one at a time through next(), so the values do not all need to be stored in memory at once. In Python 3, functions like map, filter and iter return iterators, as do generator functions (functions that use yield).
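A quick way to see this, assuming Python 3 (where map returns an iterator rather than a list), is to check a few objects against collections.abc.Iterator. The generator function three_letters is a made-up example.

```python
from collections.abc import Iterator


def three_letters():
    """Generator function: calling it returns an iterator."""
    yield "a"
    yield "b"
    yield "c"


gen = three_letters()
doubled = map(lambda x: x * 2, [1, 2, 3])

gen_is_iterator = isinstance(gen, Iterator)
map_is_iterator = isinstance(doubled, Iterator)
list_is_iterator = isinstance([1, 2, 3], Iterator)

first = next(gen)  # values are produced on demand
rest = list(gen)   # the remaining values; gen is exhausted afterwards
print(gen_is_iterator, map_is_iterator, list_is_iterator)
```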

In your example, map returns an iterator, which is also an iterable, which is why both functions work with it. However, if we take a list for instance:

>>> lst = [1, 2, 3]
>>> list(lst)
[1, 2, 3]
>>> next(lst)
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    next(lst)
TypeError: 'list' object is not an iterator

we can see that next complains, because the list, an iterable, is not an iterator.
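To use next with a list, first ask the list for an iterator with iter(). A short sketch:

```python
lst = [1, 2, 3]
it = iter(lst)     # ask the list for an iterator
first = next(it)   # next() works on the iterator
second = next(it)
print(first, second)
```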

Is an iterator also an iterable?

An iterable needs to implement an __iter__ method or a __getitem__ method:

An object can be iterated over with for if it implements __iter__() or __getitem__().

An iterator needs a __iter__ method (that returns self) and a __next__ method (I'm not 100% sure about the __next__).
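For illustration, here is a minimal hand-written iterator with both methods, in Python 3 syntax. Countdown is a hypothetical class invented for this sketch.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # Iterators return themselves, which also makes them iterable.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


counter = Countdown(3)
same_object = iter(counter) is counter
values = list(counter)
print(same_object, values)
```

Since __iter__ returns self, a Countdown can be used directly in a for loop, but only once: after the first pass it stays exhausted.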

Is it true that an iterator always has an __iter__ method?

Yes!

This is also documented in the Data model:

object.__iter__(self)

This method is called when an iterator is required for a container. This method should return a new iterator object that can iterate over all the objects in the container. For mappings, it should iterate over the keys of the container.

Iterator objects also need to implement this method; they are required to return themselves. For more information on iterator objects, see Iterator Types.

(Emphasis mine)

As to your second question:

Is an iterator also an iterable?

Yes, because it has a __iter__ method.

Additional notes

Besides the formal implementation it's easy to check if something is iterable by just checking if iter() can be called on it:

def is_iterable(something):
    try:
        iter(something)
    except TypeError:
        return False
    else:
        return True

Likewise it's possible to check if something is an iterator by checking if iter() called on something returns itself:

def is_iterator(something):
    try:
        return iter(something) is something  # it needs to return itself to be an iterator
    except TypeError:
        return False

But don't use them in development code; these are just for "visualization". Mostly you just iterate over something using for ... in ..., or, if you need an iterator, you use iterator = iter(...) and then process the iterator by calling next(iterator) until it raises StopIteration.

Why is Java's Iterator not an Iterable?

Because an iterator generally points to a single instance in a collection. Iterable implies that one may obtain an iterator from an object to traverse over its elements - and there's no need to iterate over a single instance, which is what an iterator represents.

Confusion about iterators and iterables in Python

The documentation is creating some confusion here, by re-using the term 'iterator'.

There are three components to the iterator protocol:

  1. Iterables; things you can potentially iterate over and get their elements, one by one.

  2. Iterators; things that do the iteration. Every time you want to step through all items of an iterable, you need one of these to keep track of where you are in the process. These are not re-usable; once you reach the end, that's it. For most iterables, you can create multiple independent iterators, each tracking position independently.

  3. Consumers of iterators; those things that want to do something with the items.

A for loop is an example of the latter, so #3. A for loop uses the iter() function to produce an iterator (#2 above) for whatever you want to loop over, so that "whatever" must be an iterable (#1 above).

range() is an example of #1; it is an iterable object. You can iterate over it multiple times, independently:

>>> r = range(5)
>>> r_iter_1 = iter(r)
>>> next(r_iter_1)
0
>>> next(r_iter_1)
1
>>> r_iter_2 = iter(r)
>>> next(r_iter_2)
0
>>> next(r_iter_1)
2

Here r_iter_1 and r_iter_2 are two separate iterators, and each time you ask for a next item they do so based on their own internal bookkeeping.

list() is an example of both an iterable (#1) and an iteration consumer (#3). If you pass another iterable (#1) to the list() call, a list object is produced containing all elements from that iterable. But list objects themselves are also iterables.

zip(), in Python 3, takes in multiple iterables (#1), and is itself an iterator (#2). zip() stores a new iterator (#2) for each of the iterables you gave it. Each time you ask zip() for the next element, zip() builds a new tuple with the next elements from each of the contained iterables:

>>> lst1, lst2 = ['foo', 'bar'], [42, 81]
>>> zipit = zip(lst1, lst2)
>>> next(zipit)
('foo', 42)
>>> next(zipit)
('bar', 81)

So in the end, list(zip(lst1, lst2)) uses both lst1 and lst2 as iterables (#1), and zip() consumes those (#3) when it itself is being consumed by the outer list() call.
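Because a zip object is an iterator (#2), it can be consumed only once, while the lists it was built from can be zipped again. A small sketch, assuming Python 3:

```python
lst1, lst2 = ["foo", "bar"], [42, 81]
zipit = zip(lst1, lst2)

is_own_iterator = iter(zipit) is zipit  # iterators return themselves
pairs = list(zipit)                     # list() consumes zipit
leftover = list(zipit)                  # zipit is now exhausted
fresh = list(zip(lst1, lst2))           # the lists can be zipped again
print(pairs, leftover)
```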

What exactly does iterable mean in Python? Why isn't my object which implements `__getitem__()` an iterable?

I think the point of confusion here is that, although implementing __getitem__ does allow you to iterate over an object, it isn't part of the interface defined by Iterable.

The abstract base classes allow a form of virtual subclassing, where classes that implement the specified methods (in the case of Iterable, only __iter__) are considered by isinstance and issubclass to be subclasses of the ABCs even if they don't explicitly inherit from them. It doesn't check whether the method implementation actually works, though, just whether or not it's provided.
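The two hypothetical classes below (names invented for this sketch) show the effect: only the one that provides __iter__ is recognized by isinstance, even though both can be iterated over.

```python
from collections.abc import Iterable


class WithIter:
    """Hypothetical class: provides __iter__, so it matches Iterable."""

    def __iter__(self):
        return iter([1, 2])


class WithGetitem:
    """Hypothetical class: iterable through __getitem__ only."""

    def __getitem__(self, index):
        if index >= 2:
            raise IndexError(index)
        return index + 1


iter_check = isinstance(WithIter(), Iterable)        # __iter__ is present
getitem_check = isinstance(WithGetitem(), Iterable)  # __iter__ is absent
getitem_items = list(WithGetitem())                  # iteration still works
print(iter_check, getitem_check, getitem_items)
```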

For more information, see PEP-3119, which introduced ABCs.


using isinstance(e, collections.Iterable) is the most pythonic way to check if an object is iterable

I disagree; I would use duck-typing and just attempt to iterate over the object. If the object isn't iterable a TypeError will be raised, which you can catch in your function if you want to deal with non-iterable inputs, or allow to percolate up to the caller if not. This completely side-steps how the object has decided to implement iteration, and just finds out whether or not it does at the most appropriate time.
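As a rough sketch of that approach, the hypothetical function total below simply tries to get an iterator and falls back when that fails. Treating a non-iterable argument as a single value is an assumption made for the example; a real function might prefer to let the TypeError propagate.

```python
def total(values):
    """Sum values, treating a non-iterable argument as a single item."""
    try:
        iterator = iter(values)  # raises TypeError if values is not iterable
    except TypeError:
        return values            # assumed fallback for this sketch
    # Summing happens outside the try block so that a TypeError raised
    # by the items themselves is not mistaken for "not iterable".
    return sum(iterator)


print(total([1, 2, 3]), total(5))
```

This works the same whether the argument implements __iter__ or only __getitem__, since iter() accepts both.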


To add a little more, I think the docs you've quoted are slightly misleading. To quote the iter docs, which perhaps clears this up:

object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).

This makes it clear that, although both protocols make the object iterable, only one is the actual "iteration protocol", and it is this that isinstance(thing, Iterable) tests for. Therefore we could conclude that one way to check for "things you can iterate over" in the most general case would be:

isinstance(thing, (Iterable, Sequence))

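although note that Sequence, unlike Iterable, is not detected from its methods alone: a class that implements __getitem__ and __len__ must still inherit from Sequence or be registered with Sequence.register() to pass this check.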


