How to Build a Basic Iterator

How to build a basic iterator?

Iterator objects in python conform to the iterator protocol, which basically means they provide two methods: __iter__() and __next__().

  • The __iter__ returns the iterator object and is implicitly called
    at the start of loops.

  • The __next__() method returns the next value and is implicitly called at each loop increment. This method raises a StopIteration exception when there are no more value to return, which is implicitly captured by looping constructs to stop iterating.

Here's a simple example of a counter:

class Counter:
def __init__(self, low, high):
self.current = low - 1
self.high = high

def __iter__(self):
return self

def __next__(self): # Python 2: def next(self)
self.current += 1
if self.current < self.high:
return self.current
raise StopIteration


for c in Counter(3, 9):
print(c)

This will print:

3
4
5
6
7
8

This is easier to write using a generator, as covered in a previous answer:

def counter(low, high):
current = low
while current < high:
yield current
current += 1

for c in counter(3, 9):
print(c)

The printed output will be the same. Under the hood, the generator object supports the iterator protocol and does something roughly similar to the class Counter.

David Mertz's article, Iterators and Simple Generators, is a pretty good introduction.

Can I make an iterator with a simple function? (No generator or Symbol.iterator)

The problem is that your iterateThis function returns an iterator but the for/of construct expects a iterable.

Okay, wait, whats the difference?

From MDN's page on iteration protocols:

In order to be iterable, an object must implement the @@iterator
method, meaning that the object (or one of the objects up its
prototype chain) must have a property with a @@iterator key which is
available via constant Symbol.iterator:

On the other hand:

An object is an iterator when it implements a next() method with the
following semantics: Ommited due to length, TL;DR: The next method
returns an object of the form: {value: T, done: boolean}

They are related in that calling the @@iterator method of an iterable returns an iterator.

The for/of loop always expects an iterable, so if you want to use for/of, you have to use @@iterator/Symbol.iterator. There's just no way around it as far as I know. But your snippet can be easily modified to use it by just creating an object that returns your iterator when it's Symbol.iterator method is called:

function iterateThis(arr){    let i = 0;    return {        next: function() {            return i < arr.length ?                {value: arr[i++], done: false} :                {done: true};        }     };}
function makeIterableFromIterator(iterator) { return { [Symbol.iterator]: function() { return iterator; } }}
const iterator = iterateThis([1, 2, 3, 4, 5]);const iterable = makeIterableFromIterator(iterator);
for (item of iterable) { console.log(item);}

Iterable and iterator

The csv.reader object is its own iterator. This is a common practice for iterables which are single-pass (i.e. can only be run through once). We can confirm this by inspection.

>>> data
<_csv.reader object at 0x7fe5d4a057b0>
>>> iter(data)
<_csv.reader object at 0x7fe5d4a057b0> # Note: Same as above
>>> id(data)
140625091516336
>>> id(iter(data))
140625091516336 # Note: Same as above
>>> data is iter(data)
True

Compare this to something like a list, which is an iterable but is not itself an iterator.

>>> lst = [1, 2, 3]
>>> lst
[1, 2, 3]
>>> iter(lst)
<list_iterator object at 0x7fe5d59747f0> # Note: NOT the same as before
>>> lst is iter(lst)
False

This allows us to iterate over a list several times by calling iter(lst) multiple times, since each call gives us a fresh iterator. But your csv.reader object is single-pass, so we only have the one iterator to it.

In Python, every iterator is an iterable, but not every iterable is an iterator. From the glossary

Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted.

Do Python iterators formally require an __iter__ method?

That's the tutorial. It glosses over things. If you check the data model documentation, you'll see an explicit requirement that iterators support __iter__:

The iterator objects themselves are required to support the following two methods, which together form the iterator protocol:

iterator.__iter__()

Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements. This method corresponds to the tp_iter slot of the type structure for Python objects in the Python/C API.

iterator.__next__()

...

Python easily could have been designed to make iterators non-iterable, the way Java did, but that would have been counterproductive. Iterating over iterators is extremely common and standard in Python, and it was always intended to be.

Having to stick some kind of iterwrapper around iterators in every for loop would be like having to stick some kind of addablewrapper around your integers every time you wanted to add two integers.

Iterable class in python3

Your __next__ method uses yield, which makes it a generator function. Generator functions return a new iterator when called.

But the __next__ method is part of the iterator interface. It should not itself be an iterator. __next__ should return the next value, not something that returns all values(*).

Because you wanted to create an iterable, you can just make __iter__ the generator here:

class Test:
def __init__(self, ids):
self.ids = ids
def __iter__(self):
for id in self.ids:
yield id

Note that a generator function should not use raise StopIteration, just returning from the function does that for you.

The above class is an iterable. Iterables only have an __iter__ method, and no __next__ method. Iterables produce an iterator when __iter__ is called:

Iterable -> (call __iter__) -> Iterator

In the above example, because Test.__iter__ is a generator function, it creates a new object each time we call it:

>>> test = Test([1,2,3])
>>> test.__iter__() # create an iterator
<generator object Test.__iter__ at 0x111e85660>
>>> test.__iter__()
<generator object Test.__iter__ at 0x111e85740>

A generator object is a specific kind of iterator, one created by calling a generator function, or by using a generator expression. Note that the hex values in the representations differ, two different objects were created for the two calls. This is by design! Iterables produce iterators, and can create more at will. This lets you loop over them independently:

>>> test_it1 = test.__iter__()
>>> test_it1.__next__()
1
>>> test_it2 = test.__iter__()
>>> test_it2.__next__()
1
>>> test_it1.__next__()
2

Note that I called __next__() on the object returned by test.__iter__(), the iterator, not on test itself, which doesn't have that method because it is only an iterable, not an iterator.

Iterators also have an __iter__ method, which always must return self, because they are their own iterators. It is the __next__ method that makes them an iterator, and the job of __next__ is to be called repeatedly, until it raises StopIteration. Until StopIteration is raised, each call should return the next value. Once an iterator is done (has raised StopIteration), it is meant to then always raise StopIteration. Iterators can only be used once, unless they are infinite (never raise StopIteration and just keep producing values each time __next__ is called).

So this is an iterator:

class IteratorTest:
def __init__(self, ids):
self.ids = ids
self.nextpos = 0
def __iter__(self):
return self
def __next__(self):
if self.ids is None or self.nextpos >= len(self.ids):
# we are done
self.ids = None
raise StopIteration
value = self.ids[self.nextpos]
self.nextpos += 1
return value

This has to do a bit more work; it has to keep track of what the next value to produce would be, and if we have raised StopIteration yet. Other answerers here have used what appear to be simpler ways, but those actually involve letting something else do all the hard work. When you use iter(self.ids) or (i for i in ids) you are creating a different iterator to delegate __next__ calls to. That's cheating a bit, hiding the state of the iterator inside ready-made standard library objects.

You don't usually see anything calling __iter__ or __next__ in Python code, because those two methods are just the hooks that you can implement in your Python classes; if you were to implement an iterator in the C API then the hook names are slightly different. Instead, you either use the iter() and next() functions, or just use the object in syntax or a function call that accepts an iterable.

The for loop is such syntax. When you use a for loop, Python uses the (moral equivalent) of calling __iter__() on the object, then __next__() on the resulting iterator object to get each value. You can see this if you disassemble the Python bytecode:

>>> from dis import dis
>>> dis("for t in test: pass")
1 0 LOAD_NAME 0 (test)
2 GET_ITER
>> 4 FOR_ITER 4 (to 10)
6 STORE_NAME 1 (t)
8 JUMP_ABSOLUTE 4
>> 10 LOAD_CONST 0 (None)
12 RETURN_VALUE

The GET_ITER opcode at position 2 calls test.__iter__(), and FOR_ITER uses __next__ on the resulting iterator to keep looping (executing STORE_NAME to set t to the next value, then jumping back to position 4), until StopIteration is raised. Once that happens, it'll jump to position 10 to end the loop.

If you want to play more with the difference between iterators and iterables, take a look at the Python standard types and see what happens when you use iter() and next() on them. Like lists or tuples:

>>> foo = (42, 81, 17, 111)
>>> next(foo) # foo is a tuple, not an iterator
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not an iterator
>>> t_it = iter(foo) # so use iter() to create one from the tuple
>>> t_it # here is an iterator object for our foo tuple
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) # it returns itself
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) is t_it # really, it returns itself, not a new object
True
>>> next(t_it) # we can get values from it, one by one
42
>>> next(t_it) # another one
81
>>> next(t_it) # yet another one
17
>>> next(t_it) # this is getting boring..
111
>>> next(t_it) # and now we are done
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> next(t_it) # an *stay* done
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> foo # but foo itself is still there
(42, 81, 17, 111)

You could make Test, the iterable, return a custom iterator class instance too (and not cop out by having generator function create the iterator for us):

class Test:
def __init__(self, ids):
self.ids = ids
def __iter__(self):
return TestIterator(self)

class TestIterator:
def __init__(self, test):
self.test = test
def __iter__(self):
return self
def __next__(self):
if self.test is None or self.nextpos >= len(self.test.ids):
# we are done
self.test = None
raise StopIteration
value = self.test.ids[self.nextpos]
self.nextpos += 1
return value

That's a lot like the original IteratorTest class above, but TestIterator keeps a reference to the Test instance. That's really how tuple_iterator works too.

A brief, final note on naming conventions here: I am sticking with using self for the first argument to methods, so the bound instance. Using different names for that argument only serves to make it harder to talk about your code with other, experienced Python developers. Don't use me, however cute or short it may seem.


(*) Unless your goal was to create an iterator of iterators, of course (which is basically what the itertools.groupby() iterator does, it is an iterator producing (object, group_iterator) tuples, but I digress).



Related Topics



Leave a reply



Submit