What is the difference between an Iterator and a Generator?
Generators are iterators, but not all iterators are generators.
An iterator is typically something that has a next method to get the next element from a stream. A generator is an iterator that is tied to a function.
For example, a generator in Python:
def genCountingNumbers():
    n = 0
    while True:
        yield n
        n = n + 1
This has the advantage that you don't need to store an infinite sequence of numbers in memory to iterate over them: each value is produced only when it is requested.
You'd use this as you would any iterator:
for i in genCountingNumbers():
    print(i)
    if i > 20: break  # Avoid infinite loop
You could also iterate over a list:
for i in ['a', 'b', 'c']:
    print(i)
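A Python 3 sketch of the same idea (the function is renamed gen_counting_numbers here only to follow PEP 8): itertools.islice takes a finite prefix of the infinite stream without a manual break, and the last lines show that a list is iterable but is not itself an iterator.

```python
from itertools import islice


def gen_counting_numbers():
    """Yield 0, 1, 2, ... forever; only the current n is held in memory."""
    n = 0
    while True:
        yield n
        n += 1


# islice lazily takes the first five values of the infinite stream.
first_five = list(islice(gen_counting_numbers(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# A list is iterable, but iter() is needed to get an iterator from it.
letters = ['a', 'b', 'c']
print(hasattr(letters, '__next__'))        # False
print(hasattr(iter(letters), '__next__'))  # True
```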
Explain: Every generator is an iterator, but not vice versa
I know what an iterator is, what a generator is, what the iteration protocol is, and how to create both.
What's an iterator?
Per the glossary, an iterator is "an object representing a stream of data". It has an __iter__() method that returns itself, and a next() method (__next__() in Python 3). The next method is responsible for returning a value, advancing the iterator, and raising StopIteration when done.
What is a generator?
A generator is a regular Python function containing yield. When called, it returns a generator-iterator (one of the many kinds of iterator).
Examples of how to create generators and iterators
Generator example:
>>> def f(x): # "f" is a generator
        yield x
        yield x**2
        yield x**3
>>> g = f(10) # calling "f" returns a generator-iterator
>>> type(f) # "f" is a regular python function with "yield"
<type 'function'>
>>> type(g)
<type 'generator'>
>>> next(g) # next() gets a value from the generator-iterator
10
>>> next(g)
100
>>> next(g)
1000
>>> next(g) # iterators signal that they are done with an Exception
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
next(g)
StopIteration
>>> dir(g) # generator-iterators have next() and __iter__()
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'gi_code', 'gi_frame', 'gi_running', 'next', 'send', 'throw']
Iterator using a class:
>>> class Powers: # "Powers" is a class
        def __init__(self, base):
            self.base = base
            self.exp = 0
        def __iter__(self):
            return self
        def next(self):
            self.exp += 1
            if self.exp > 3:
                raise StopIteration
            return self.base ** self.exp
>>> g = Powers(10) # calling "Powers" returns an iterator
>>> type(Powers) # "Powers" is a regular python class
<type 'classobj'>
>>> type(g) # "g" is an iterator instance with next() and __iter__()
<type 'instance'>
>>> next(g) # next() gets a value from the iterator
10
>>> next(g)
100
>>> next(g)
1000
>>> next(g) # iterators signal that they are done with an Exception
Traceback (most recent call last):
File "<pyshell#34>", line 1, in <module>
next(g)
StopIteration
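The class above uses the Python 2 spelling next(); in Python 3 the built-in next() calls __next__ instead. A minimal Python 3 sketch of the same iterator, with a hypothetical max_exp parameter added so the stopping point is not hard-coded:

```python
class Powers:
    """Iterator over base**1, base**2, ..., base**max_exp (Python 3 protocol)."""

    def __init__(self, base, max_exp=3):
        self.base = base
        self.exp = 0
        self.max_exp = max_exp  # hypothetical parameter, not in the original

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Python 3 looks for __next__, not next.
        self.exp += 1
        if self.exp > self.max_exp:
            raise StopIteration
        return self.base ** self.exp


p = Powers(10)
print(list(p))  # [10, 100, 1000]
print(list(p))  # []: the iterator is already exhausted
```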
Iterator from a sequence example:
>>> s = 'cat'
>>> it = iter(s) # creates an "iterator" from a sequence
>>> type(s) # "s" is a string which is "iterable"
<type 'str'>
>>> type(it) # An "iterator" with next() and __iter__()
<type 'iterator'>
>>> next(it)
'c'
>>> next(it)
'a'
>>> next(it)
't'
>>> next(it)
Traceback (most recent call last):
File "<pyshell#43>", line 1, in <module>
next(it)
StopIteration
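A for loop relies on exactly this protocol. The helper below (manual_for is a hypothetical name) is a simplified model of what a for loop does, not how CPython actually compiles it:

```python
def manual_for(iterable, body):
    """Roughly what `for x in iterable: body(x)` does."""
    it = iter(iterable)        # ask the iterable for an iterator
    while True:
        try:
            x = next(it)       # pull the next value
        except StopIteration:  # the iterator signals that it is done
            break
        body(x)


seen = []
manual_for('cat', seen.append)
print(seen)  # ['c', 'a', 't']
```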
Comparison and conclusion
An iterator is an object representing a stream of data. It has an __iter__() method and a next() method.
There are several ways to make an iterator:
1) Call a generator (a regular Python function that uses yield).
2) Instantiate a class that has an __iter__() method and a next() method.
From this, you can see that a generator is just one of many ways to make an iterator; others include the itertools module and calling iter() with a callable and a sentinel value.
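A short sketch of two of those other routes: iter() with a callable and a sentinel (here io.StringIO.readline, which returns '' at end of input), and an itertools iterator. Neither produces a generator object.

```python
import io
import itertools
import types

# iter(callable, sentinel) keeps calling the callable until it returns the sentinel.
stream = io.StringIO("line 1\nline 2\n")
lines = list(iter(stream.readline, ""))
print(lines)  # ['line 1\n', 'line 2\n']

# itertools.count builds an infinite iterator that is not a generator object.
counter = itertools.count(5)
print(next(counter), next(counter))              # 5 6
print(isinstance(counter, types.GeneratorType))  # False
```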
python iterators, generators and in between
Something equivalent to Itertest2 could be written using a separate iterator class.
class Itertest3:
    def __init__(self):
        self.data = list(range(100))
    def __iter__(self):
        # Each call hands out a fresh iterator object with its own position
        return Itertest3Iterator(self.data)
class Itertest3Iterator:
    def __init__(self, data):
        self.state = enumerate(data)  # produces (index, item) pairs
    def __iter__(self):
        return self
    def __next__(self):
        print("in __iter__()")
        i, dp = next(self.state)  # Let StopIteration exception propagate
        print("idx:", i)
        return dp
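For comparison, the same independence can be had by writing __iter__ itself as a generator function, since every call then creates a fresh generator. A minimal sketch; DataHolder is a hypothetical name and is not claimed to be the Itertest2 from the question:

```python
class DataHolder:
    """Hypothetical container whose __iter__ is a generator function."""

    def __init__(self, n):
        self.data = list(range(n))

    def __iter__(self):
        # Each call to __iter__ returns a brand-new generator with its own state.
        for item in self.data:
            yield item


d = DataHolder(3)
pairs = [(a, b) for a in d for b in d]
print(len(pairs))  # 9: the inner loop restarts independently each time
```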
Compare this to Itertest1, where the instance of Itertest1 itself carried the state of the iteration around in it. Each call to Itertest1.__iter__ returned the same object (the instance of Itertest1), so they couldn't iterate over the data independently.
Notice I put print("in __iter__()") in __next__, not __iter__. As you observed, nothing in a generator function actually executes until the first call to __next__. Calling the generator function only creates a generator; it does not start executing the code in it.
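A small sketch that makes the lazy start visible by recording events in a list (gen and events are illustrative names):

```python
events = []


def gen():
    events.append("body started")
    yield 1
    events.append("after first yield")
    yield 2


g = gen()
print(events)   # []: calling gen() ran none of its body
print(next(g))  # 1
print(events)   # ['body started']
print(next(g))  # 2
print(events)   # ['body started', 'after first yield']
```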
Is there a way to distinguish this iterator from this generator?
g supports send, as all generators do, while i doesn't. (Sending to g isn't useful, but you can do it.)
>>> g.send(None)
1
>>> i.send(None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'list_iterator' object has no attribute 'send'
You can also throw exceptions into g or close it, which you can't do with i.
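A sketch of throw() and close() on a hypothetical echo() generator. The try/finally shows that close() raises GeneratorExit at the paused yield so cleanup code runs, and the last lines confirm that a list iterator has neither method:

```python
def echo():
    """Yield "ready"; report ValueErrors thrown in, then keep going."""
    try:
        while True:
            try:
                yield "ready"
            except ValueError as exc:
                # throw() raises the exception at the paused yield.
                yield f"caught {exc}"
    finally:
        # close() raises GeneratorExit at the paused yield, so this runs.
        print("cleaning up")


g = echo()
print(next(g))                      # ready
print(g.throw(ValueError("boom")))  # caught boom
g.close()                           # prints: cleaning up

i = iter([1, 2, 3])
print(hasattr(i, "throw"), hasattr(i, "close"))  # False False
```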
i can be pickled, while g can't.
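A Python 3 sketch of the pickling difference: a list iterator round-trips through pickle with its position intact, while pickling a generator raises TypeError (the exact message varies between Python versions).

```python
import pickle

i = iter([1, 2, 3])
next(i)                             # move past the first element
i2 = pickle.loads(pickle.dumps(i))  # a list iterator can be pickled
print(next(i2))                     # 2: the position was preserved

g = (x for x in [1, 2, 3])
try:
    pickle.dumps(g)
    generator_picklable = True
except TypeError as exc:
    generator_picklable = False
    print("TypeError:", exc)
```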
Aside from that, you can do all sorts of explicit checks and introspection to distinguish them: checking the types, examining the str output, looking for attributes that only exist on one or the other (like g.gi_frame), etc.
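One such check, sketched with the standard inspect module: inspect.getgeneratorstate() reports where a generator is paused, and gi_frame stays alive while the generator is suspended and becomes None once it finishes. countdown is an illustrative name.

```python
import inspect


def countdown(n):
    while n > 0:
        yield n
        n -= 1


g = countdown(2)
print(inspect.getgeneratorstate(g))  # GEN_CREATED
next(g)
print(inspect.getgeneratorstate(g))  # GEN_SUSPENDED
print(g.gi_frame is not None)        # True: the paused frame is kept alive
rest = list(g)                       # exhaust it
print(inspect.getgeneratorstate(g))  # GEN_CLOSED
print(g.gi_frame)                    # None

i = iter([1, 2])
print(inspect.isgenerator(g), inspect.isgenerator(i))  # True False
```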
Most of this is implementation details or incidental, not something you should think of as "the difference between generators and iterators". Generators are a kind of iterator.
Python 3: Iterator that is NOT a generator?
"Every generator is an iterator, but not vice versa."
I don't think that's necessarily true. The Python glossary gives 3 entries that start with "generator".
Generator
This is any function with a yield statement. A generator function is not an iterator, but it does return an iterator when you call it.
def getSquares(n):
    for i in range(n):
        yield i**2
Generator iterator
This is the thing that gets returned by a generator function.
Generator expression
This is just a comprehension inside parentheses.
(i**2 for i in range(10))
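Because a generator expression produces a single-pass generator-iterator, consuming it twice gives nothing the second time. A small sketch, redefining getSquares from above so the block stands alone:

```python
squares = (i**2 for i in range(10))
print(type(squares).__name__)  # generator
print(sum(squares))            # 285
print(sum(squares))            # 0: the generator is already exhausted


def getSquares(n):
    for i in range(n):
        yield i**2


# A generator function and a generator expression produce the same type.
print(type(getSquares(10)) is type(squares))  # True
```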
Generator expressions and the return values of generator functions both give <class 'generator'> when you call type on them. However, if you define your own class with a __next__ method, its instances will, of course, have that class as their type.
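To make that concrete, a sketch with a hypothetical hand-written CountUp class: collections.abc.Iterator accepts both it and a generator, but only the generator passes the collections.abc.Generator check, because CountUp has no send, throw, or close methods.

```python
from collections.abc import Generator, Iterator


class CountUp:
    """A hand-written iterator: an Iterator, but not a generator."""

    def __init__(self, stop):
        self.n = 0
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.n >= self.stop:
            raise StopIteration
        self.n += 1
        return self.n


c = CountUp(3)
g = (x for x in range(3))
print(type(c).__name__, type(g).__name__)                  # CountUp generator
print(isinstance(c, Iterator), isinstance(g, Iterator))    # True True
print(isinstance(c, Generator), isinstance(g, Generator))  # False True
```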
What's the difference between Mypy iterators and generators?
Whenever you're not sure what exactly some builtin type is, I recommend checking Typeshed, the repository of type hints for the Python standard library (and some select 3rd party modules). Mypy bakes in a version of typeshed with each release.
For example, here are the definitions of what exactly an Iterator and a Generator are:
@runtime
class Iterator(Iterable[_T_co], Protocol[_T_co]):
    @abstractmethod
    def __next__(self) -> _T_co: ...
    def __iter__(self) -> Iterator[_T_co]: ...
class Generator(Iterator[_T_co], Generic[_T_co, _T_contra, _V_co]):
    @abstractmethod
    def __next__(self) -> _T_co: ...
    @abstractmethod
    def send(self, value: _T_contra) -> _T_co: ...
    @abstractmethod
    def throw(self, typ: Type[BaseException], val: Optional[BaseException] = ...,
              tb: Optional[TracebackType] = ...) -> _T_co: ...
    @abstractmethod
    def close(self) -> None: ...
    @abstractmethod
    def __iter__(self) -> Generator[_T_co, _T_contra, _V_co]: ...
    @property
    def gi_code(self) -> CodeType: ...
    @property
    def gi_frame(self) -> FrameType: ...
    @property
    def gi_running(self) -> bool: ...
    @property
    def gi_yieldfrom(self) -> Optional[Generator]: ...
Notice that:
- Iterators only have two methods, __next__ and __iter__, but generators have many more.
- Generators are a subtype of Iterators: every single Generator is also an Iterator, but not vice versa.
But what does this mean on a high-level?
Well, in short, with iterators, the flow of information is one-way only. When you have an iterator, all you can really do is call the __next__ method to get the very next value to be yielded.
In contrast, the flow of information with generators is bidirectional: you can send information back into the generator via the send method.
That's what the other two type parameters are for, actually: when you write Generator[A, B, C], you're stating that the values you yield are of type A, the values you send into the generator are of type B, and the value you return from the generator is of type C.
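A minimal sketch of all three type parameters in use; running_total is a hypothetical example. The send type is Optional[int] rather than int because a plain next() sends None, which this sketch treats as the signal to stop and return:

```python
from typing import Generator, Optional


def running_total() -> Generator[int, Optional[int], str]:
    """Yield the running total; accept numbers via send(); return a summary."""
    total = 0
    while True:
        value = yield total  # yields an int (A); receives Optional[int] (B)
        if value is None:    # plain next() sends None: treat it as "stop"
            break
        total += value
    return f"final total: {total}"  # return value is a str (C)


gen = running_total()
print(next(gen))     # 0 (runs up to the first yield)
print(gen.send(5))   # 5
print(gen.send(10))  # 15
try:
    next(gen)        # sends None, so the generator returns
except StopIteration as stop:
    result = stop.value  # the return value travels on StopIteration
print(result)        # final total: 15
```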
Here's some additional useful reading material:
- python generator "send" function purpose?
- Difference between Python's Generators and Iterators
- Return in generator together with yield in Python 3.3
- https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/
So, when should you use Iterator vs Generator?
Well, in general, you should bias towards using the type that helps the caller understand how you expect the return value to be used.
For example, take your fib
example. All you do there is yield values: the flow of information is one-way, and the code is not really set up to accept information from the caller.
So, it would be the most understandable to use Iterator instead of Generator in that case: Iterator best reflects the one-way nature of your fib implementation.
(And if you wrote a generator where the flow of data is meant to be bidirectional, you'd of course need to use Generator instead of Iterator.)
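The fib from the question isn't reproduced here, so the following is a hypothetical stand-in showing the one-way Iterator[int] annotation on a generator function that never expects sent values:

```python
from itertools import islice
from typing import Iterator


def fib() -> Iterator[int]:
    """One-way stream of Fibonacci numbers; callers only ever call next()."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


print(list(islice(fib(), 8)))  # [0, 1, 1, 2, 3, 5, 8, 13]
```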