Difference Between Python's Generators and Iterators

What is the difference between an Iterator and a Generator?

Generators are iterators, but not all iterators are generators.

An iterator is typically something that has a next method to get the next element from a stream. A generator is an iterator that is tied to a function.

For example, a generator in Python:

def genCountingNumbers():
    n = 0
    while True:
        yield n
        n = n + 1

This has the advantage that you never have to hold the whole (infinite) sequence of numbers in memory in order to iterate over it.

You'd use this as you would any iterator:

for i in genCountingNumbers():
    print(i)
    if i > 20: break  # Avoid infinite loop

You could also iterate over a list:

for i in ['a', 'b', 'c']:
    print(i)
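
Both loops work the same way underneath: for asks the object for an iterator with iter() and then calls next() on it until StopIteration is raised. A rough sketch of that, in Python 3 syntax (the helper name manual_for is made up for illustration):

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next() directly."""
    it = iter(iterable)          # ask the iterable for an iterator
    items = []
    while True:
        try:
            item = next(it)      # fetch the next element
        except StopIteration:    # the iterator signals that it is exhausted
            break
        items.append(item)
    return items

print(manual_for(['a', 'b', 'c']))   # ['a', 'b', 'c']
```

The same helper accepts a generator, since a generator is just another iterator.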

Explain: Every generator is an iterator, but not vice versa

I know what an iterator is, what a generator is, what the iteration protocol is, and how to create both.

What's an iterator?

Per the glossary, an iterator is "an object representing a stream of data". It has an __iter__() method that returns the iterator itself, and a next() method (spelled __next__() in Python 3). The next method is responsible for returning a value, advancing the iterator, and raising StopIteration when the data is exhausted.

What is a generator?

A generator is a regular Python function containing yield. When called it returns a generator-iterator (one of the many kinds of iterator).

Examples of how to create generators and iterators

Generator example:

>>> def f(x):           # "f" is a generator
        yield x
        yield x**2
        yield x**3

>>> g = f(10) # calling "f" returns a generator-iterator
>>> type(f) # "f" is a regular python function with "yield"
<type 'function'>
>>> type(g)
<type 'generator'>
>>> next(g) # next() gets a value from the generator-iterator
10
>>> next(g)
100
>>> next(g)
1000
>>> next(g) # iterators signal that they are done with an Exception

Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    next(g)
StopIteration
>>> dir(g) # generator-iterators have next() and __iter__()
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'gi_code', 'gi_frame', 'gi_running', 'next', 'send', 'throw']

Iterator using a class:

>>> class Powers:       # "Powers" is a class
        def __init__(self, base):
            self.base = base
            self.exp = 0
        def __iter__(self):
            return self
        def next(self):
            self.exp += 1
            if self.exp > 3:
                raise StopIteration
            return self.base ** self.exp

>>> g = Powers(10) # calling "Powers" returns an iterator
>>> type(Powers) # "Powers" is a regular python class
<type 'classobj'>
>>> type(g) # "g" is an iterator instance with next() and __iter__()
<type 'instance'>
>>> next(g) # next() gets a value from the iterator
10
>>> next(g)
100
>>> next(g)
1000
>>> next(g) # iterators signal that they are done with an Exception

Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    next(g)
StopIteration
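
The session above is Python 2. In Python 3 the method is spelled __next__() and there are no old-style classes, so a rough equivalent would look like this (same logic, only the method name changes):

```python
class Powers:
    """Iterator producing base**1, base**2, base**3 (Python 3 spelling)."""

    def __init__(self, base):
        self.base = base
        self.exp = 0

    def __iter__(self):
        return self              # an iterator returns itself from __iter__()

    def __next__(self):          # this method was called next() in Python 2
        self.exp += 1
        if self.exp > 3:
            raise StopIteration
        return self.base ** self.exp

print(list(Powers(10)))          # [10, 100, 1000]
```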

Iterator from a sequence example:

>>> s = 'cat'               
>>> it = iter(s) # creates an "iterator" from a sequence
>>> type(s) # "s" is a string which is "iterable"
<type 'str'>
>>> type(it) # An "iterator" with next() and __iter__()
<type 'iterator'>
>>> next(it)
'c'
>>> next(it)
'a'
>>> next(it)
't'
>>> next(it)

Traceback (most recent call last):
  File "<pyshell#43>", line 1, in <module>
    next(it)
StopIteration

Comparison and conclusion

An iterator is an object representing a stream of data. It has an __iter__() method and a next() method.

There are several ways to make an iterator:

1) Call a generator (a regular python function that uses yield)
2) Instantiate a class that has an __iter__() method and a next() method.

From this, you can see that a generator is just one of many ways to make an iterator (there are other ways as well: itertools, iter() on a regular function and a sentinel, etc).
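
As an illustration of those "other ways", here is a small Python 3 sketch using itertools.count() and the two-argument form of iter(). The read_value function and its backing list are hypothetical stand-ins for any zero-argument callable, such as a function that reads the next record from a file:

```python
import itertools

# itertools.count() returns an iterator that is not a generator
counter = itertools.count(start=1)
print(next(counter), next(counter))       # 1 2

# iter(callable, sentinel): keep calling the function until it returns the sentinel
source = iter([3, 1, 4, 0, 5])

def read_value():
    """Hypothetical zero-argument callable that returns one value per call."""
    return next(source)

until_zero = iter(read_value, 0)          # stops as soon as read_value() returns 0
print(list(until_zero))                   # [3, 1, 4]
```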

Python iterators, generators and in between

Something equivalent to Itertest2 could be written using a separate iterator class.

class Itertest3:
    def __init__(self):
        self.data = list(range(100))

    def __iter__(self):
        # Hand out a fresh iterator object on every call
        return Itertest3Iterator(self.data)

class Itertest3Iterator:
    def __init__(self, data):
        # The iteration state lives here, not on the Itertest3 instance
        self.state = enumerate(data)

    def __iter__(self):
        return self

    def __next__(self):
        print("in __iter__()")
        i, dp = next(self.state)  # Let StopIteration exception propagate
        print("idx:", i)
        return dp

Compare this to Itertest1, where the instance of Itertest1 itself carried the state of the iteration around in it. Each call to Itertest1.__iter__ returned the same object (the instance of Itertest1), so they couldn't iterate over the data independently.
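
A small sketch of why that matters. Itertest1 is not reproduced here, so SelfIterating below is a hypothetical stand-in for the same pattern (a container that returns itself from __iter__), compared with a plain list, which hands out a fresh iterator on every call:

```python
class SelfIterating:
    """Hypothetical container that is its own iterator (the Itertest1 pattern)."""

    def __init__(self, data):
        self.data = list(data)
        self.pos = 0

    def __iter__(self):
        return self                      # the same object every time

    def __next__(self):
        if self.pos >= len(self.data):
            raise StopIteration
        value = self.data[self.pos]
        self.pos += 1
        return value


shared = SelfIterating("abc")
a, b = iter(shared), iter(shared)
print(a is b)                            # True: one shared position
print(next(a), next(b))                  # a b  (b continues where a left off)

separate = [1, 2, 3]                     # a list returns a new iterator per call
c, d = iter(separate), iter(separate)
print(c is d)                            # False
print(next(c), next(d))                  # 1 1
```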

Notice I put print("in __iter__()") in __next__, not __iter__. As you observed, nothing in a generator function actually executes until the first call to __next__. Calling the generator function only creates a generator; it does not start executing the code in its body.
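
The delayed start is easy to observe directly. In this sketch, the function noisy and the list log are made-up names used only to record when the body runs:

```python
log = []

def noisy():
    """Generator that records when its body actually runs."""
    log.append("body started")
    yield 1
    log.append("resumed after first yield")
    yield 2

gen = noisy()                 # creates the generator object; no body code runs yet
print(log)                    # []
print(next(gen))              # 1
print(log)                    # ['body started']
```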

Is there a way to distinguish this iterator from this generator?

g supports send, as all generators do, while i doesn't. (sending to g isn't useful, but you can do it.)

>>> g.send(None)
1
>>> i.send(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list_iterator' object has no attribute 'send'

You can also throw exceptions into g or close it, which you can't do with i.

i can be pickled, while g can't.

Aside from that, you can do all sorts of explicit checks and introspection to distinguish them: checking the types, examining the str output, looking for attributes that only exist on one or the other (like g.gi_frame), and so on.
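
A few of those checks, sketched in Python 3. The definitions of g and i here are assumptions, chosen to match the types seen in the session above (a generator and a list_iterator):

```python
import inspect
import pickle
import types

g = (n for n in [1, 2, 3])       # assumed stand-in for g: a generator
i = iter([1, 2, 3])              # assumed stand-in for i: a list_iterator

print(isinstance(g, types.GeneratorType), isinstance(i, types.GeneratorType))  # True False
print(inspect.isgenerator(g), inspect.isgenerator(i))                          # True False
print(hasattr(g, "gi_frame"), hasattr(i, "gi_frame"))                          # True False

restored = pickle.loads(pickle.dumps(i))   # list iterators can be pickled
print(list(restored))                      # [1, 2, 3]

try:
    pickle.dumps(g)                        # generators cannot
except TypeError as exc:
    print("pickling g failed:", exc)
```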

Most of this is implementation details or incidental, not something you should think of as "the difference between generators and iterators". Generators are a kind of iterator.

Python 3: Iterator that is NOT a generator?

"Every generator is an iterator, but not vice versa."

I don't think that's necessarily true. The Python glossary gives 3 entries that start with "generator".

Generator

This is any function with a yield statement. A generator function is not an iterator, but it does return an iterator when you call it.

def getSquares(n):
    for i in range(n):
        yield i**2

Generator iterator

This is the thing that gets returned by a generator function.

Generator expression

This is just a comprehension inside parentheses.

(i**2 for i in range(10))

Generator expressions and the return values of generator functions both give <class 'generator'> when you call type on them. However, if you define your own class with a __next__ method, its instances will, of course, have that class as their type.
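
To make that concrete, here is a short sketch comparing the three. Countdown is a hypothetical hand-written iterator, and the abstract base classes from collections.abc are used for the isinstance checks:

```python
from collections.abc import Generator, Iterator

def get_squares(n):
    for i in range(n):
        yield i ** 2

class Countdown:
    """Hypothetical hand-written iterator; it is not a generator."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

from_function = get_squares(3)
from_expression = (i ** 2 for i in range(3))
handwritten = Countdown(3)

print(type(from_function).__name__, type(from_expression).__name__, type(handwritten).__name__)
# generator generator Countdown
print(isinstance(handwritten, Iterator), isinstance(handwritten, Generator))      # True False
print(isinstance(from_function, Iterator), isinstance(from_function, Generator))  # True True
```

So a Countdown instance is an iterator that is not a generator.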

What's the difference between Mypy iterators and generators?

Whenever you're not sure what exactly some builtin type is, I recommend checking typeshed, the repository of type hints for the Python standard library (and some select third-party modules). Mypy bakes in a version of typeshed with each release.

For example, here are the definitions of what exactly an Iterator and a Generator are:

@runtime
class Iterator(Iterable[_T_co], Protocol[_T_co]):
    @abstractmethod
    def __next__(self) -> _T_co: ...
    def __iter__(self) -> Iterator[_T_co]: ...

class Generator(Iterator[_T_co], Generic[_T_co, _T_contra, _V_co]):
    @abstractmethod
    def __next__(self) -> _T_co: ...

    @abstractmethod
    def send(self, value: _T_contra) -> _T_co: ...

    @abstractmethod
    def throw(self, typ: Type[BaseException], val: Optional[BaseException] = ...,
              tb: Optional[TracebackType] = ...) -> _T_co: ...

    @abstractmethod
    def close(self) -> None: ...

    @abstractmethod
    def __iter__(self) -> Generator[_T_co, _T_contra, _V_co]: ...

    @property
    def gi_code(self) -> CodeType: ...
    @property
    def gi_frame(self) -> FrameType: ...
    @property
    def gi_running(self) -> bool: ...
    @property
    def gi_yieldfrom(self) -> Optional[Generator]: ...

Notice that:

  1. Iterators only have two methods: __next__ and __iter__ but generators have many more.
  2. Generators are a subtype of Iterators -- every single Generator is also an Iterator, but not vice-versa.

But what does this mean on a high-level?

Well, in short, with iterators, the flow of information is one-way only. When you have an iterator, all you can really do is call the __next__ method to get the very next value to be yielded.

In contrast, the flow of information with generators is bidirectional: you can send information back into the generator via the send method.

That's what the other two type parameters are for, actually -- when you do Generator[A, B, C], you're stating that the values you yield are of type A, the values you send into the generator are of type B, and the value that you return from the generator is of type C.
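
Here is a minimal sketch that uses all three type parameters. The running_average function and its "a negative number ends the stream" convention are made up for illustration:

```python
from typing import Generator

def running_average() -> Generator[float, float, int]:
    """Yields the current average (float), receives numbers through send() (float),
    and returns how many numbers were averaged (int)."""
    total, count, average = 0.0, 0, 0.0
    while True:
        value = yield average        # yield type A: float; send type B: float
        if value < 0:                # hypothetical convention: negative input ends the stream
            return count             # return type C: int
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)                            # prime the generator: run up to the first yield
print(avg.send(10.0))                # 10.0
print(avg.send(20.0))                # 15.0
try:
    avg.send(-1.0)
except StopIteration as stop:
    print(stop.value)                # 2 (the return value travels on StopIteration)
```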

Here's some additional useful reading material:

  1. python generator "send" function purpose?
  2. Difference between Python's Generators and Iterators
  3. Return in generator together with yield in Python 3.3
  4. https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/

So, when should you use Iterator vs Generator?

Well, in general, you should bias towards using the type that helps the caller understand how you expect the return value to be used.

For example, take your fib example. All you do there is yield values: the flow of information is one-way, and the code is not really set up to accept information from the caller.

So, it would be the most understandable to use Iterator instead of Generator in that case: Iterator best reflects the one-way nature of your fib implementation.

(And if you wrote a generator where the flow of data is meant to be bidirectional, you'd of course need to use Generator instead of Iterator.)
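
The fib code from the question is not reproduced here, so the following is only a guess at its shape: a generator function that yields values and never reads anything sent to it, annotated with Iterator[int] to advertise that one-way use.

```python
from itertools import islice
from typing import Iterator

def fib() -> Iterator[int]:
    """Hypothetical fib generator: it only yields, so Iterator[int] describes it well."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

print(list(islice(fib(), 8)))    # [0, 1, 1, 2, 3, 5, 8, 13]
```

The object returned at runtime is still a generator. The annotation only tells callers (and mypy) that they are expected to iterate over it, not to call send() on it.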


