How Does a Python for Loop with Iterable Work

What are iterator, iterable, and iteration?

Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.

In Python, iterable and iterator have specific meanings.

An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.

An iterator is an object with a next (Python 2) or __next__ (Python 3) method.

Whenever you use a for loop, or map, or a list comprehension, etc. in Python, the next method is called automatically to get each item from the iterator, thus going through the process of iteration.
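As a small sketch of these definitions, the snippet below drives iteration by hand with the built-ins iter() and next(). The class name Squares is made up for illustration; it deliberately has no __iter__ and relies only on the __getitem__ fallback described above.

```python
numbers = [10, 20, 30]          # a list is an iterable
it = iter(numbers)              # iter() asks the iterable for an iterator

first = next(it)                # each next() call pulls one item
second = next(it)
third = next(it)

try:
    next(it)                    # nothing left to pull
    exhausted = False
except StopIteration:
    exhausted = True


class Squares:
    """Hypothetical iterable that defines only __getitem__."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)   # signals the end of the sequence
        return index * index


squares = list(Squares())       # list() iterates via the __getitem__ fallback
print(first, second, third, exhausted, squares)
```

A for loop does the same thing, except that it catches the StopIteration for you.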

A good place to start learning would be the iterators section of the tutorial and the iterator types section of the standard types page. After you understand the basics, try the iterators section of the Functional Programming HOWTO.

Python for loop and iterator behavior

Your suspicion is correct: the iterator has been consumed.

More precisely, your iterator is a generator, which is an object that can be iterated through only once.

type((i for i in range(5))) # says it's type generator 

def another_generator():
    yield 1 # the yield expression makes this a generator function; calling it returns a generator

type(another_generator()) # also a generator
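To make the "only once" part concrete, here is a minimal sketch that walks the same generator twice and compares it with a list. The variable names are arbitrary.

```python
gen = (i for i in range(5))

first_pass = list(gen)     # consumes every item the generator can produce
second_pass = list(gen)    # the generator is already exhausted

# A list, by contrast, hands out a fresh iterator on every iter() call,
# so it can be walked as many times as you like.
data = [0, 1, 2, 3, 4]
list_first = list(data)
list_second = list(data)

print(first_pass, second_pass)
print(list_first, list_second)
```

If you need to go over the values again, either rebuild the generator or store the items in a list first.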

The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; all of the items are not generated at once. In fact, you can have an infinite generator:

def my_gen():
    while True:
        yield 1 # again: yield makes this a generator function

for _ in my_gen(): print(_) # hit Ctrl+C to stop this infinite loop!
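If you would rather not reach for Ctrl+C, the sketch below shows the on-demand behavior in a bounded way. counting_gen and the produced list are made-up names; itertools.islice is the standard library tool for taking a fixed number of items from any iterator.

```python
from itertools import islice

produced = []                       # records when each item is actually generated


def counting_gen():
    n = 0
    while True:                     # infinite, like my_gen above
        produced.append(n)          # side effect so we can observe the laziness
        yield n
        n += 1


gen = counting_gen()
before = list(produced)             # nothing has run yet: creating a generator runs no code
first_three = list(islice(gen, 3))  # request exactly three items
after = list(produced)              # only the requested items were generated

print(before, first_three, after)
```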

Some other corrections to help improve your understanding:

  • The generator is not a pointer, and does not behave like a pointer as you might be familiar with in other languages.
  • One of the differences from other languages: as said above, each result of the generator is generated on the fly. The next result is not produced until it is requested.
  • The keyword combination for in accepts an iterable object as its second argument.
  • The iterable object can be a generator, as in your example case, but it can also be any other iterable object, such as a list, or dict, or a str object (string), or a user-defined type that provides the required functionality.
  • The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done; it is the name of a built-in function, and reusing it shadows that function). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
  • If the call to __iter__ is successful, the function next() is applied to the resulting iterator over and over again, in a loop, and the first variable supplied to for in is assigned the result of each next() call. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
  • The for loop ends when next() raises the StopIteration exception (which usually happens when the iterator does not have another object to yield when next() is called).

You can "manually" implement a for loop in python this way (probably not perfect, but close enough):

try:
    temp = iterable.__iter__()
except AttributeError:
    raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
else:
    while True:
        try:
            _ = temp.__next__()
        except StopIteration:
            break
        except AttributeError:
            raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
        # this is the "body" of the for loop
        continue

There is pretty much no difference between the above and your example code.
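To see the two failure paths of the real for statement, here is a rough sketch. BadIter is a made-up class whose __iter__ returns something that is not an iterator; the exact wording of the error messages is a CPython detail and may differ between versions.

```python
class BadIter:
    """Made-up class whose __iter__ returns a non-iterator."""

    def __iter__(self):
        return 42


messages = []
for candidate in (5, BadIter()):
    try:
        for _ in candidate:      # the real for statement, not the manual version
            pass
    except TypeError as exc:
        messages.append(str(exc))

for message in messages:
    print(message)
```

Both cases end in a TypeError, which is what the manual version above tries to imitate.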

Actually, the more interesting part of a for loop is not the for, but the in. Using in by itself produces a different effect than for in, but it is very useful to understand what in does with its arguments, since for in implements very similar behavior.

  • When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for in). Using in by itself on a container, you can do things like this:

    1 in [1, 2, 3] # True
    'He' in 'Hello' # True
    3 in range(10) # True
    'eH' in 'Hello'[::-1] # True
  • If the iterable object is NOT a container (i.e. it doesn't have a __contains__ method), in next tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on [1]. A generator is just one type of iterator.

  • If the call to __iter__ is successful, the in keyword applies the function next() to the resulting iterator over and over again. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator.) Actually, to be more precise: it calls the iterator object's __next__ method.
  • If the object doesn't have an __iter__ method to return an iterator, in then falls back on the old-style iteration protocol using the object's __getitem__ method [2].
  • If all of the above attempts fail, you'll get a TypeError exception.
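The fallback chain above can be sketched with three throwaway classes. All three names are hypothetical and exist only to isolate one step each.

```python
class OnlyIter:
    """No __contains__, so `in` falls back to __iter__."""

    def __iter__(self):
        return iter([1, 2, 3])


class OnlyGetitem:
    """Neither __contains__ nor __iter__: the old-style __getitem__ protocol."""

    def __getitem__(self, index):
        return ["a", "b", "c"][index]   # IndexError past the end stops the search


class Neither:
    """None of the three methods, so `in` has nothing to work with."""


found_via_iter = 2 in OnlyIter()
missing_via_iter = 9 in OnlyIter()
found_via_getitem = "b" in OnlyGetitem()
missing_via_getitem = "z" in OnlyGetitem()

try:
    1 in Neither()
    raised = False
except TypeError:
    raised = True

print(found_via_iter, missing_via_iter, found_via_getitem, missing_via_getitem, raised)
```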

If you wish to create your own object type to iterate over (i.e., one you can use for in, or just in, on), it's useful to know about the yield keyword, which is used in generators (as mentioned above).

class MyIterable():
    def __iter__(self):
        yield 1

m = MyIterable()
for _ in m: print(_) # 1
1 in m # True

The presence of yield turns a function or method into a generator instead of a regular function/method. You don't need the __next__ method if you use a generator (it brings __next__ along with it automatically).
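For comparison, here is roughly what you would have to write without yield. Countdown and countdown_gen are illustrative names, not anything from the question; the class spells out the iterator protocol by hand, while the generator function gets the same behavior for free.

```python
class Countdown:
    """Hand-written iterator: needs both __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                 # an iterator returns itself from __iter__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration     # tells the for loop to stop
        value = self.current
        self.current -= 1
        return value


def countdown_gen(start):
    """Same behavior as Countdown, written as a generator function."""
    while start > 0:
        yield start
        start -= 1


from_class = list(Countdown(3))
from_gen = list(countdown_gen(3))
gen_has_protocol = hasattr(countdown_gen(1), "__next__") and hasattr(countdown_gen(1), "__iter__")

print(from_class, from_gen, gen_has_protocol)
```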

If you wish to create your own container object type (i.e., one you can use in on by itself, but NOT for in), you just need the __contains__ method.

class MyUselessContainer():
    def __contains__(self, obj):
        return True

m = MyUselessContainer()
1 in m # True
'Foo' in m # True
TypeError in m # True
None in m # True
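A quick sketch of the "but NOT for in" part: an object with only __contains__ answers membership tests, yet a for loop over it fails. OnlyContains is a made-up class for this demonstration.

```python
class OnlyContains:
    """Supports `x in obj` but provides no way to iterate."""

    def __contains__(self, obj):
        return obj == 42


box = OnlyContains()
membership = 42 in box          # handled entirely by __contains__

try:
    for _ in box:               # needs __iter__ or __getitem__, which are missing
        pass
    loopable = True
except TypeError:
    loopable = False

print(membership, loopable)
```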

1 Note that, to be an iterator, an object must implement the iterator protocol. This only means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually next (no underscores) in Python 2.

2 See this answer for the different ways to create iterable classes.

How does the Python for loop actually work?

Yes, that's a good approximation of how the for loop construct is implemented. It certainly matches the for loop statement documentation:

The expression list is evaluated once; it should yield an iterable object. An iterator is created for the result of the expression_list. The suite is then executed once for each item provided by the iterator, in the order returned by the iterator. Each item in turn is assigned to the target list using the standard rules for assignments (see Assignment statements), and then the suite is executed. When the items are exhausted (which is immediately when the sequence is empty or an iterator raises a StopIteration exception), the suite in the else clause, if present, is executed, and the loop terminates.

You only missed the "assigned to the target list using the standard rules for assignments" part; you'd have to use i = next(iter_list) and print(i) rather than print the result of the next() call directly.
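The "standard rules for assignments" wording matters because the target can be anything an assignment statement accepts, including nested unpacking. The sketch below (with made-up data) runs a real for loop with an else clause next to a hand-written equivalent.

```python
pairs = [("a", (1, 2)), ("b", (3, 4))]

seen = []
finished = False
for name, (x, y) in pairs:          # nested unpacking target
    seen.append((name, x + y))
else:
    finished = True                 # runs once the items are exhausted

# Hand-written equivalent: the assignment is an ordinary unpacking assignment.
iterator = iter(pairs)
manual = []
while True:
    try:
        name, (x, y) = next(iterator)
    except StopIteration:
        break
    manual.append((name, x + y))

print(seen, finished, manual)
```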

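Python source code is compiled to bytecode, which the interpreter loop then executes. You can look at the bytecode for a for loop by using the dis module. The listing here appears to come from CPython 3.6 or 3.7; SETUP_LOOP was removed in Python 3.8, so newer versions print a somewhat different set of opcodes: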

>>> import dis
>>> dis.dis('for i in mylist: pass')
  1           0 SETUP_LOOP              12 (to 14)
              2 LOAD_NAME                0 (mylist)
              4 GET_ITER
        >>    6 FOR_ITER                 4 (to 12)
              8 STORE_NAME               1 (i)
             10 JUMP_ABSOLUTE            6
        >>   12 POP_BLOCK
        >>   14 LOAD_CONST               0 (None)
             16 RETURN_VALUE

The various opcodes named are documented in the same dis module, and their implementation can be found in the CPython evaluation loop (look for the TARGET(<opcode>) switch targets); the above opcodes break down to:

  • SETUP_LOOP 12 marks the start of the suite, a block of statements, so the interpreter knows where to jump to in case of a break, and what cleanup needs to be done in case of an exception or return statement; the end of the block is located 12 bytes of bytecode after the end of this instruction (so offset 14, just past POP_BLOCK here).
  • LOAD_NAME 0 (mylist) loads the mylist variable value, putting it on the top of the stack (TOS in opcode descriptions).
  • GET_ITER calls iter() on the object on the TOS, then replaces the TOS with the result.
  • FOR_ITER 4 calls next() on the TOS iterator. If that gives a result, then that's pushed to the TOS. If there is a StopIteration exception, then the iterator is removed from TOS, and 4 bytes of bytecode are skipped to the POP_BLOCK opcode.
  • STORE_NAME 1 takes the TOS and puts it in the named variable, here that's i.
  • JUMP_ABSOLUTE 6 marks the end of the loop body; it tells the interpreter to go back up to bytecode offset 6, to the FOR_ITER instruction above. If we did something interesting in the loop, then that would happen after STORE_NAME, before the JUMP_ABSOLUTE.
  • POP_BLOCK removes the block bookkeeping set up by SETUP_LOOP and removes the iterator from the stack.

The >> markers are jump targets, there as visual cues to make it easier to spot those when reading the opcode line that jumps to them.
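Since the exact listing depends on the interpreter version, a more portable way to check this yourself is to collect the opcode names programmatically. This is only a sketch: it assumes a CPython version whose compiler still emits GET_ITER and FOR_ITER for a plain for loop, which has been the case for a long time but is an implementation detail rather than a guarantee.

```python
import dis

# Compile the same one-line loop without running it (mylist need not exist).
code = compile("for i in mylist: pass", "<demo>", "exec")

# dis.get_instructions yields one Instruction object per bytecode instruction.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

has_get_iter = "GET_ITER" in opnames   # the iter() step
has_for_iter = "FOR_ITER" in opnames   # the repeated next() step

print(opnames)
print(has_get_iter, has_for_iter)
```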

Use more than 1 iterable in a python for loop

for files, scripts in TEMPLATE_FILE.items():
    print(files, scripts)

is the construction you're looking for.

(Python 2 also has iteritems, which was removed in Python 3; items works in both and is fine for small dictionaries.)

Of course you can do:

for files in TEMPLATE_FILE:
    scripts = TEMPLATE_FILE[files]

but that's less efficient, because you hash the key on every iteration, whereas items() hands you the values without any lookup. Reserve hashing for random-access cases.

Note that you can iterate through sorted keys like this (frequent question):

for files, scripts in sorted(TEMPLATE_FILE.items()):
    print(files, scripts)
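Here is a self-contained sketch of the three loops. The contents of TEMPLATE_FILE are invented for the example, since the question's actual dictionary isn't shown.

```python
# Made-up contents; the real TEMPLATE_FILE from the question is not shown.
TEMPLATE_FILE = {"b.tmpl": "run_b.py", "a.tmpl": "run_a.py"}

via_items = []
for files, scripts in TEMPLATE_FILE.items():          # key and value together
    via_items.append((files, scripts))

via_lookup = []
for files in TEMPLATE_FILE:                            # keys only, then a lookup
    via_lookup.append((files, TEMPLATE_FILE[files]))

via_sorted = []
for files, scripts in sorted(TEMPLATE_FILE.items()):   # pairs sorted by key
    via_sorted.append((files, scripts))

print(via_items)
print(via_sorted)
```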

What is the behavior of a loop when the iterable on which we are looping is modified at each iteration

for i in iterable:
    # some code with i

is (with sufficient precision in this context) equivalent to

iterator = iter(iterable)
while True:
    try:
        i = next(iterator)
    except StopIteration:
        break
    # some code with i

You can see

  • i is reassigned in each iteration
  • the iterator is created exactly once
  • mutations to the iterable may or may not lead to unexpected behavior, depending on how iterable.__iter__ is implemented. __iter__ is the method responsible for creating the iterator.

In the case of lists, the iterator keeps track of an integer index, i.e. which element to pull next from the list. When you remove an item during iteration, the indexes of all subsequent elements decrease by 1, but the iterator is not informed of this.

>>> l = ['a', 'b', 'c', 'd']
>>> li = iter(l) # iterator pulls index 0 next
>>> next(li)
'a' # iterator pulled index 0, will pull index 1 next
>>> l.remove('a') # 'b' is now at index 0
>>> l
['b', 'c', 'd']
>>> next(li)
'c'
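The same effect inside an actual for loop, as a sketch: removing the current item on every pass makes the loop skip every other element. Iterating over a copy (here made with list()) is one common way around it; the variable names are arbitrary.

```python
letters = ['a', 'b', 'c', 'd']
visited = []
for item in letters:
    visited.append(item)
    letters.remove(item)        # shifts the remaining items left by one

# Looping over a copy leaves the iterator's index in step with what it reads.
letters_copy_loop = ['a', 'b', 'c', 'd']
visited_safe = []
for item in list(letters_copy_loop):
    visited_safe.append(item)
    letters_copy_loop.remove(item)

print(visited, letters)
print(visited_safe, letters_copy_loop)
```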

Modifying size of iterable during for loop - how is looping determined?

The loop runs until the list's iterator reports that it has no more elements. After two cycles, the iterator has advanced past two positions and the list has lost two elements, so the iterator's index has reached the end of the list, and the loop terminates.

Your code is equivalent to this:

y = [1, 2, 3, 4]
i = iter(y)
while True:
    try:
        x = next(i)
    except StopIteration:
        break
    print(y)
    print(y.pop(0))

The list iterator holds the index of the element to be read next. In the third cycle, the list is [3, 4] and next(i) would need to read y[2], which does not exist, so next raises StopIteration, which ends the loop.

EDIT As to your other questions:

How do iter and StopIteration, and __getitem__(i) and IndexError factor in?

The first two are as described above: it is what defines a for loop. Or, if you will, it is the contract of iter: it will yield stuff till it stops with StopIteration.

The latter two, I don't think participate at all, since the list iterator is implemented in C; for example, the check for whether the iterator is exhausted directly compares the current index with PyList_GET_SIZE, which directly looks at ->ob_size field; it doesn't pass through Python any more. Obviously, you could make a list iterator that would be fully in pure Python, and you'd likely be either using len to perform the check, or catching IndexError and again letting the underlying C code perform the check against ->ob_size.
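A rough sketch of such a pure-Python list iterator, using the IndexError variant. PyListIterator is a hypothetical name and this is not how CPython implements it; it only mimics the index-based behavior closely enough to reproduce the loop from the question.

```python
class PyListIterator:
    """Hypothetical pure-Python stand-in for the C list iterator."""

    def __init__(self, lst):
        self.lst = lst
        self.index = 0              # position of the element to read next

    def __iter__(self):
        return self

    def __next__(self):
        try:
            value = self.lst[self.index]
        except IndexError:          # index ran past the current end of the list
            raise StopIteration from None
        self.index += 1
        return value


y = [1, 2, 3, 4]
seen = []
for x in PyListIterator(y):
    seen.append(x)
    y.pop(0)                        # shrink the list while iterating over it

print(seen, y)
```

The loop body runs twice and then stops, just like with the built-in list iterator.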

What about iterators that aren't lists?

You can define any object to be iterable. When you call iter(obj), it is the same as calling obj.__iter__(). This is expected to return an iterator, which knows what to do with i.__next__() (which is what next(i) translates to). I believe dicts iterate by keeping an index into their internal list of keys, though I haven't checked. You can make an iterator that will do anything you want, if you code it. For example:

class AlwaysEmpty:
    def __iter__(self):
        return self
    def __next__(self):
        raise StopIteration

for x in AlwaysEmpty():
    print("there was something")

will, predictably, print nothing.

And most importantly, is this / where is this in the docs?

Iterator Types

modifying iterator in for loop in python

The for loop is walking through the iterable range(100).

Modifying the current value does not affect what appears next in the iterable (and indeed, you could have any iterable; the next value might not be a number!).

Option 1, use a while loop:

i = 0
while i < 100:
    i += 4

Option 2, use the built-in step size argument of range:

for i in range(0, 100, 10):
    pass

This example may make it clearer why your method doesn't make much sense:

for i in [1, 2, 3, 4, 5, 'cat', 'fish']:
    i = i + i
    print(i)

This is entirely valid python code (string addition is defined); modifying the iterable would require something unintuitive.
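To see this in action, the sketch below rebinds i inside a for loop and shows that the next value still comes from the iterator, then repeats the two options with the visited values collected. The step sizes are arbitrary.

```python
visited = []
for i in range(5):
    visited.append(i)
    i += 10                  # rebinds the name only; range's iterator is unaffected

# Option 1: a while loop, where you control the counter yourself.
jumps = []
i = 0
while i < 100:
    jumps.append(i)
    i += 40

# Option 2: let range do the stepping.
stepped = list(range(0, 100, 40))

print(visited, jumps, stepped)
```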

See here for more information on how iterables work, and how to modify them dynamically

Why is the iterator in for loop a string and not an integer?

b is a list of strings, correct? This is b:

['5', '3', '1']

So when you start iterating through b, each value of i will be a string because strings are all there is in b.
When you say int(i) in your if statement, you are not changing i to an integer. You are only getting the integer value of i.

If you want b to be a list of integers instead of a list of strings, use this instead:

b = [int(num) for num in a.split(",")]

Output:

<class 'str'>
<class 'list'>
[5, 3, 1]
5
<class 'int'>

You can see that i is now an integer.
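A compact sketch of the difference, assuming the input string a looks like "5,3,1" (the question's actual input isn't shown here, so that value is a guess based on the list above).

```python
a = "5,3,1"                                    # assumed input string

b_strings = a.split(",")                       # split always produces strings
i = b_strings[0]
as_int = int(i)                                # int(i) returns a new object
i_still_str = isinstance(i, str)               # i itself was not changed

b_ints = [int(num) for num in a.split(",")]    # convert every item up front

print(b_strings, b_ints, i_still_str)
```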


