How to build a basic iterator?
Iterator objects in Python conform to the iterator protocol, which basically means they provide two methods: __iter__() and __next__().
The __iter__() method returns the iterator object and is implicitly called at the start of loops. The __next__() method returns the next value and is implicitly called at each loop increment. This method raises a StopIteration exception when there are no more values to return, which is implicitly caught by looping constructs to stop iterating.
Here's a simple example of a counter:
class Counter:
    def __init__(self, low, high):
        self.current = low - 1
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):  # Python 2: def next(self)
        self.current += 1
        if self.current < self.high:
            return self.current
        raise StopIteration

for c in Counter(3, 9):
    print(c)
This will print:
3
4
5
6
7
8
This is easier to write using a generator, as covered in a previous answer:
def counter(low, high):
    current = low
    while current < high:
        yield current
        current += 1

for c in counter(3, 9):
    print(c)
The printed output will be the same. Under the hood, the generator object supports the iterator protocol and does something roughly similar to the class Counter.
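To see that a generator object really does follow the iterator protocol, it can be driven by hand. This is only a small sketch that repeats the counter generator from above; the variable names are made up for illustration.

```python
def counter(low, high):
    current = low
    while current < high:
        yield current
        current += 1

gen = counter(3, 5)

# A generator object is its own iterator: iter() hands back the same object.
same_object = iter(gen) is gen

# Each next() call resumes the function body up to the next yield.
first = next(gen)
second = next(gen)

# When the function body finishes, the generator raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(same_object, first, second, exhausted)  # True 3 4 True
```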
David Mertz's article, Iterators and Simple Generators, is a pretty good introduction.
Can I make an iterator with a simple function? (No generator or Symbol.iterator)
The problem is that your iterateThis function returns an iterator, but the for/of construct expects an iterable.
Okay, wait, what's the difference?
From MDN's page on iteration protocols:
In order to be iterable, an object must implement the @@iterator method, meaning that the object (or one of the objects up its prototype chain) must have a property with a @@iterator key which is available via constant Symbol.iterator:
On the other hand:
An object is an iterator when it implements a next() method with the following semantics: [Omitted due to length. TL;DR: the next method returns an object of the form {value: T, done: boolean}.]
They are related in that calling the @@iterator method of an iterable returns an iterator.
The for/of loop always expects an iterable, so if you want to use for/of, you have to use @@iterator/Symbol.iterator. There's just no way around it as far as I know. But your snippet can be easily modified to use it by just creating an object that returns your iterator when its Symbol.iterator method is called:
function iterateThis(arr) {
  let i = 0;
  return {
    next: function() {
      return i < arr.length ? {value: arr[i++], done: false} : {done: true};
    }
  };
}

function makeIterableFromIterator(iterator) {
  return {
    [Symbol.iterator]: function() {
      return iterator;
    }
  };
}

const iterator = iterateThis([1, 2, 3, 4, 5]);
const iterable = makeIterableFromIterator(iterator);

// Declare the loop variable so it does not leak as an implicit global.
for (const item of iterable) {
  console.log(item);
}
Iterable and iterator
The csv.reader object is its own iterator. This is a common practice for iterables which are single-pass (i.e. can only be run through once). We can confirm this by inspection.
>>> data
<_csv.reader object at 0x7fe5d4a057b0>
>>> iter(data)
<_csv.reader object at 0x7fe5d4a057b0> # Note: Same as above
>>> id(data)
140625091516336
>>> id(iter(data))
140625091516336 # Note: Same as above
>>> data is iter(data)
True
Compare this to something like a list, which is an iterable but is not itself an iterator.
>>> lst = [1, 2, 3]
>>> lst
[1, 2, 3]
>>> iter(lst)
<list_iterator object at 0x7fe5d59747f0> # Note: NOT the same as before
>>> lst is iter(lst)
False
This allows us to iterate over a list several times by calling iter(lst) multiple times, since each call gives us a fresh iterator. But your csv.reader object is single-pass, so we only have the one iterator to it.
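The practical consequence can be sketched as follows. The CSV text here is hypothetical in-memory data standing in for a real file; only csv.reader and io.StringIO from the standard library are used.

```python
import csv
import io

# Hypothetical in-memory CSV data standing in for a real file.
source = io.StringIO("a,b\n1,2\n3,4\n")
data = csv.reader(source)

first_pass = list(data)   # consumes the reader
second_pass = list(data)  # nothing left: the reader is already exhausted

lst = [1, 2, 3]
list_first = list(lst)    # each pass gets a fresh list_iterator
list_second = list(lst)

print(first_pass)                 # [['a', 'b'], ['1', '2'], ['3', '4']]
print(second_pass)                # []
print(list_first == list_second)  # True
```

If the rows are needed more than once, storing them in a list first (as first_pass does here) is one straightforward option.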
In Python, every iterator is an iterable, but not every iterable is an iterator. From the glossary:
Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted.
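One way to check this relationship is with the abstract base classes in collections.abc. This is just an illustrative sketch; the variable names are arbitrary.

```python
from collections.abc import Iterable, Iterator

lst = [1, 2, 3]
it = iter(lst)

# A list is iterable, but it is not an iterator (it has no __next__).
list_checks = (isinstance(lst, Iterable), isinstance(lst, Iterator))

# The list iterator is both an iterator and an iterable.
iterator_checks = (isinstance(it, Iterable), isinstance(it, Iterator))

print(list_checks)      # (True, False)
print(iterator_checks)  # (True, True)
```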
Do Python iterators formally require an __iter__ method?
That's the tutorial. It glosses over things. If you check the data model documentation, you'll see an explicit requirement that iterators support __iter__:
The iterator objects themselves are required to support the following two methods, which together form the iterator protocol:
iterator.__iter__()
Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements. This method corresponds to the tp_iter slot of the type structure for Python objects in the Python/C API.
iterator.__next__()
...
Python easily could have been designed to make iterators non-iterable, the way Java did, but that would have been counterproductive. Iterating over iterators is extremely common and standard in Python, and it was always intended to be.
Having to stick some kind of iterwrapper around iterators in every for loop would be like having to stick some kind of addablewrapper around your integers every time you wanted to add two integers.
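A typical case of iterating over an iterator directly is pulling off the first item by hand and then looping over the rest. The sample data below is hypothetical; the point is only that the for loop accepts the iterator as-is because the iterator is itself iterable.

```python
lines = ["name,score", "ann,3", "bob,5"]  # hypothetical sample data

rows = iter(lines)
header = next(rows)  # consume the first item by hand

body = []
for line in rows:    # the for loop continues where next() left off
    body.append(line)

print(header)  # name,score
print(body)    # ['ann,3', 'bob,5']
```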
Iterable class in python3
Your __next__ method uses yield, which makes it a generator function. Generator functions return a new iterator when called.
But the __next__ method is part of the iterator interface. It should not itself be an iterator. __next__ should return the next value, not something that returns all values(*).
Because you wanted to create an iterable, you can just make __iter__ the generator here:
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        for id in self.ids:
            yield id
Note that a generator function should not use raise StopIteration; just returning from the function does that for you.
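The difference can be sketched with two hypothetical generator functions, good and bad. This assumes Python 3.7 or later, where PEP 479 is always in effect and a StopIteration raised inside a generator body is converted into a RuntimeError.

```python
def good(ids):
    for id_ in ids:
        yield id_
    return  # a plain return (or falling off the end) finishes the generator

def bad(ids):
    for id_ in ids:
        yield id_
    raise StopIteration  # converted to RuntimeError on Python 3.7+

good_result = list(good([1, 2]))

try:
    list(bad([1, 2]))
    outcome = "no error"
except RuntimeError:
    outcome = "RuntimeError"

print(good_result, outcome)
```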
The above class is an iterable. Iterables only have an __iter__ method, and no __next__ method. Iterables produce an iterator when __iter__ is called:
Iterable -> (call __iter__) -> Iterator
In the above example, because Test.__iter__ is a generator function, it creates a new object each time we call it:
>>> test = Test([1,2,3])
>>> test.__iter__() # create an iterator
<generator object Test.__iter__ at 0x111e85660>
>>> test.__iter__()
<generator object Test.__iter__ at 0x111e85740>
A generator object is a specific kind of iterator, one created by calling a generator function, or by using a generator expression. Note that the hex values in the representations differ; two different objects were created for the two calls. This is by design! Iterables produce iterators, and can create more at will. This lets you loop over them independently:
>>> test_it1 = test.__iter__()
>>> test_it1.__next__()
1
>>> test_it2 = test.__iter__()
>>> test_it2.__next__()
1
>>> test_it1.__next__()
2
Note that I called __next__() on the object returned by test.__iter__(), the iterator, not on test itself, which doesn't have that method because it is only an iterable, not an iterator.
Iterators also have an __iter__ method, which always must return self, because they are their own iterators. It is the __next__ method that makes them an iterator, and the job of __next__ is to be called repeatedly, until it raises StopIteration. Until StopIteration is raised, each call should return the next value. Once an iterator is done (has raised StopIteration), it is meant to then always raise StopIteration. Iterators can only be used once, unless they are infinite (never raise StopIteration and just keep producing values each time __next__ is called).
So this is an iterator:
class IteratorTest:
    def __init__(self, ids):
        self.ids = ids
        self.nextpos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.ids is None or self.nextpos >= len(self.ids):
            # we are done
            self.ids = None
            raise StopIteration
        value = self.ids[self.nextpos]
        self.nextpos += 1
        return value
This has to do a bit more work; it has to keep track of what the next value to produce would be, and whether we have raised StopIteration yet. Other answerers here have used what appear to be simpler ways, but those actually involve letting something else do all the hard work. When you use iter(self.ids) or (i for i in ids) you are creating a different iterator to delegate __next__ calls to. That's cheating a bit, hiding the state of the iterator inside ready-made standard library objects.
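For comparison, the delegating approach mentioned above might look roughly like this. DelegatingTest is a hypothetical name used only for this sketch; the position tracking and the StopIteration handling all live inside the built-in list iterator.

```python
class DelegatingTest:
    """Hypothetical variant of Test that delegates to a built-in iterator."""

    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        # The list_iterator returned here tracks the position and
        # raises StopIteration for us.
        return iter(self.ids)

test = DelegatingTest([1, 2, 3])
first_pass = list(test)
second_pass = list(test)  # each __iter__ call gives a fresh iterator

print(first_pass)   # [1, 2, 3]
print(second_pass)  # [1, 2, 3]
```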
You don't usually see anything calling __iter__ or __next__ in Python code, because those two methods are just the hooks that you can implement in your Python classes; if you were to implement an iterator in the C API then the hook names are slightly different. Instead, you either use the iter() and next() functions, or just use the object in syntax or a function call that accepts an iterable.
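As a small illustration of the built-in functions, next() also accepts an optional default that is returned instead of raising StopIteration, which the bare __next__ hook cannot do. The values below are arbitrary.

```python
ids = [1, 2]
it = iter(ids)  # preferred over calling ids.__iter__() directly

first = next(it)
second = next(it)
fallback = next(it, "done")  # default returned instead of raising StopIteration

print(first, second, fallback)  # 1 2 done
```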
The for loop is such syntax. When you use a for loop, Python uses the moral equivalent of calling __iter__() on the object, then __next__() on the resulting iterator object to get each value. You can see this if you disassemble the Python bytecode:
>>> from dis import dis
>>> dis("for t in test: pass")
  1           0 LOAD_NAME                0 (test)
              2 GET_ITER
        >>    4 FOR_ITER                 4 (to 10)
              6 STORE_NAME               1 (t)
              8 JUMP_ABSOLUTE            4
        >>   10 LOAD_CONST               0 (None)
             12 RETURN_VALUE
The GET_ITER opcode at position 2 calls test.__iter__(), and FOR_ITER uses __next__ on the resulting iterator to keep looping (executing STORE_NAME to set t to the next value, then jumping back to position 4), until StopIteration is raised. Once that happens, it'll jump to position 10 to end the loop.
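Written out in plain Python, the loop machinery described above is roughly equivalent to the following. manual_for is a hypothetical helper for illustration; the real loop is implemented in the interpreter, not in Python code.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using explicit protocol calls."""
    collected = []
    iterator = iter(iterable)      # what GET_ITER does
    while True:
        try:
            item = next(iterator)  # what FOR_ITER does on each pass
        except StopIteration:
            break                  # end of the loop
        collected.append(item)     # stands in for the loop body
    return collected

print(manual_for((42, 81, 17)))  # [42, 81, 17]
```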
If you want to play more with the difference between iterators and iterables, take a look at the Python standard types and see what happens when you use iter() and next() on them, such as lists or tuples:
>>> foo = (42, 81, 17, 111)
>>> next(foo) # foo is a tuple, not an iterator
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not an iterator
>>> t_it = iter(foo) # so use iter() to create one from the tuple
>>> t_it # here is an iterator object for our foo tuple
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) # it returns itself
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) is t_it # really, it returns itself, not a new object
True
>>> next(t_it) # we can get values from it, one by one
42
>>> next(t_it) # another one
81
>>> next(t_it) # yet another one
17
>>> next(t_it) # this is getting boring..
111
>>> next(t_it) # and now we are done
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> next(t_it) # and *stay* done
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> foo # but foo itself is still there
(42, 81, 17, 111)
You could make Test, the iterable, return a custom iterator class instance too (and not cop out by having a generator function create the iterator for us):
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        return TestIterator(self)

class TestIterator:
    def __init__(self, test):
        self.test = test
        self.nextpos = 0  # start at the first id; __next__ relies on this

    def __iter__(self):
        return self

    def __next__(self):
        if self.test is None or self.nextpos >= len(self.test.ids):
            # we are done
            self.test = None
            raise StopIteration
        value = self.test.ids[self.nextpos]
        self.nextpos += 1
        return value
That's a lot like the original IteratorTest class above, but TestIterator keeps a reference to the Test instance. That's really how tuple_iterator works too.
A brief, final note on naming conventions here: I am sticking with using self for the first argument to methods, that is, the bound instance. Using different names for that argument only serves to make it harder to talk about your code with other, experienced Python developers. Don't use me, however cute or short it may seem.
(*) Unless your goal was to create an iterator of iterators, of course (which is basically what the itertools.groupby() iterator does; it is an iterator producing (object, group_iterator) tuples, but I digress).
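For the curious, the groupby() behaviour mentioned in the footnote can be sketched like this. The word list and the key function are made up for illustration; the input is already sorted by the key, which groupby() expects in order to form contiguous groups.

```python
from itertools import groupby

words = ["apple", "avocado", "banana", "blueberry", "cherry"]  # hypothetical data

result = {}
for letter, group_iterator in groupby(words, key=lambda w: w[0]):
    # group_iterator is itself an iterator; consume it before advancing
    # the outer loop.
    result[letter] = list(group_iterator)

print(result)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```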