What are iterator, iterable, and iteration?
Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.
In Python, iterable and iterator have specific meanings.
An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.
An iterator is an object with a next (Python 2) or __next__ (Python 3) method.
Whenever you use a for loop, or map, or a list comprehension, etc. in Python, the next method is called automatically to get each item from the iterator, thus going through the process of iteration.
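The definitions above can be tried by hand. This is a minimal sketch for Python 3; the list and the variable names are made up for illustration.

```python
numbers = [10, 20, 30]      # a list is an iterable
iterator = iter(numbers)    # asks the iterable for an iterator (numbers.__iter__())

first = next(iterator)      # each next() call runs iterator.__next__()
second = next(iterator)
third = next(iterator)

try:
    next(iterator)          # nothing left, so StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```

A for loop performs the same calls for you and treats StopIteration as the signal to stop.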
A good place to start learning would be the iterators section of the tutorial and the iterator types section of the standard types page. After you understand the basics, try the iterators section of the Functional Programming HOWTO.
Python for loop and iterator behavior
Your suspicion is correct: the iterator has been consumed.
In actuality, your iterator is a generator, which is an object that can be iterated through only once.
type((i for i in range(5))) # says it's type generator
def another_generator():
    yield 1 # the yield expression makes it a generator, not a function
type(another_generator()) # also a generator
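To see the "only once" behavior directly, here is a small sketch (the names are illustrative) that walks the same generator twice:

```python
squares = (n * n for n in range(4))  # a generator expression

first_pass = list(squares)   # consumes every item
second_pass = list(squares)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4, 9]
print(second_pass)  # []
```

If the values are needed more than once, store them in a list or create a fresh generator for each pass.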
The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; all of the items are not generated at once. In fact, you can have an infinite generator:
def my_gen():
    while True:
        yield 1 # again: yield means it is a generator, not a function
for _ in my_gen(): print(_) # hit Ctrl+C to stop this infinite loop!
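Because items are produced only on request, you can take a finite slice of an infinite generator without hanging. A minimal sketch using itertools.islice from the standard library; count_up is a made-up name for illustration.

```python
from itertools import islice

def count_up():
    """Yield 0, 1, 2, ... forever; nothing is computed until requested."""
    n = 0
    while True:
        yield n
        n += 1

# islice stops asking after five items, so the infinite loop never runs away.
first_five = list(islice(count_up(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```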
Some other corrections to help improve your understanding:
- The generator is not a pointer, and does not behave like a pointer as you might be familiar with in other languages.
- One of the differences from other languages: as said above, each result of the generator is generated on the fly. The next result is not produced until it is requested.
- The keyword combination for in accepts an iterable object as its second argument.
- The iterable object can be a generator, as in your example case, but it can also be any other iterable object, such as a list, or dict, or a str object (string), or a user-defined type that provides the required functionality.
- The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done - it is the name of a built-in function, and reusing it shadows that function). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
- If the call to __iter__ is successful, the function next() is applied to the resulting iterator over and over again, in a loop, and the first variable supplied to for in is assigned the result of the next() function. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
- The for loop ends when next() raises the StopIteration exception (which usually happens when the iterator does not have another object to yield when next() is called).
You can "manually" implement a for loop in Python this way (probably not perfect, but close enough):
try:
    temp = iterable.__iter__()
except AttributeError:
    raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
else:
    while True:
        try:
            _ = temp.__next__()
        except StopIteration:
            break
        except AttributeError:
            raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
        # this is the "body" of the for loop
        continue
There is pretty much no difference between the above and your example code.
Actually, the more interesting part of a for loop is not the for, but the in. Using in by itself produces a different effect than for in, but it is very useful to understand what in does with its arguments, since for in implements very similar behavior.
- When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for in). Using in by itself on a container, you can do things like this:
    1 in [1, 2, 3] # True
    'He' in 'Hello' # True
    3 in range(10) # True
    'eH' in 'Hello'[::-1] # True
- If the iterable object is NOT a container (i.e. it doesn't have a __contains__ method), in next tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on [1]. A generator is just one type of iterator.
- If the call to __iter__ is successful, the in keyword applies the function next() to the iterator over and over again. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator.) Actually, to be more precise: it calls the iterator object's __next__ method.
- If the object doesn't have an __iter__ method to return an iterator, in then falls back on the old-style iteration protocol using the object's __getitem__ method [2].
- If all of the above attempts fail, you'll get a TypeError exception.
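The fallback chain described above can be observed with three small classes. This is only a sketch; the class names are hypothetical and none of them defines __contains__, so in has to fall back.

```python
class OnlyIter:
    """Has __iter__ but no __contains__."""
    def __iter__(self):
        return iter([1, 2, 3])

class OnlyGetitem:
    """Old-style sequence protocol: integer indexes from 0 until IndexError."""
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10   # yields 0, 10, 20

class Neither:
    """No __contains__, no __iter__, no __getitem__."""

found_via_iter = 2 in OnlyIter()          # walks the iterator
found_via_getitem = 20 in OnlyGetitem()   # walks indexes 0, 1, 2
not_found = 99 in OnlyGetitem()           # IndexError ends the search

try:
    1 in Neither()
    raised_type_error = False
except TypeError:
    raised_type_error = True

print(found_via_iter, found_via_getitem, not_found, raised_type_error)
```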
If you wish to create your own object type to iterate over (i.e., you can use for in, or just in, on it), it's useful to know about the yield keyword, which is used in generators (as mentioned above).
class MyIterable():
    def __iter__(self):
        yield 1
m = MyIterable()
for _ in m: print(_) # 1
1 in m # True
The presence of yield turns a function or method into a generator instead of a regular function/method. You don't need the __next__ method if you use a generator (it brings __next__ along with it automatically).
If you wish to create your own container object type (i.e., you can use in on it by itself, but NOT for in), you just need the __contains__ method.
class MyUselessContainer():
    def __contains__(self, obj):
        return True
m = MyUselessContainer()
1 in m # True
'Foo' in m # True
TypeError in m # True
None in m # True
1 Note that, to be an iterator, an object must implement the iterator protocol. This only means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually next (no underscores) in Python 2.
2 See this answer for the different ways to create iterable classes.
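For comparison with the generator-based MyIterable, here is a sketch of the iterator protocol from note 1 written out by hand for Python 3. Countdown is a hypothetical class, not something from the answer.

```python
class Countdown:
    """Iterator that implements both __iter__ and __next__ explicitly."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration   # signals the end of iteration
        value = self.current
        self.current -= 1
        return value

result = list(Countdown(3))
print(result)  # [3, 2, 1]
```

A generator function gives the same behavior with less code, which is why the answer recommends yield.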
How does the Python for loop actually work?
Yes, that's a good approximation of how the for loop construct is implemented. It certainly matches the for loop statement documentation:
The expression list is evaluated once; it should yield an iterable object. An iterator is created for the result of the expression_list. The suite is then executed once for each item provided by the iterator, in the order returned by the iterator. Each item in turn is assigned to the target list using the standard rules for assignments (see Assignment statements), and then the suite is executed. When the items are exhausted (which is immediately when the sequence is empty or an iterator raises a StopIteration exception), the suite in the else clause, if present, is executed, and the loop terminates.
You only missed the "assigned to the target list using the standard rules for assignments" part; you'd have to use i = next(iter_list) and print(i) rather than print the result of the next() call directly.
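The documentation quote can be sketched as a hand-written loop that includes the target-list assignment and the else clause. The data and names below are made up, and a break inside the body (which would skip the else clause) is not modeled.

```python
pairs = [("a", 1), ("b", 2)]   # hypothetical data
log = []

iterator = iter(pairs)          # the expression list is evaluated once
while True:
    try:
        # Assignment to the target list: same unpacking rules as `key, value = ...`
        key, value = next(iterator)
    except StopIteration:
        log.append("else clause")   # runs once the items are exhausted
        break
    log.append((key, value))        # the suite (loop body)

print(log)
```

This mirrors what `for key, value in pairs: ... else: ...` does.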
Python source code is compiled to bytecode, which the interpreter loop then executes. You can look at the bytecode for a for loop by using the dis module:
>>> import dis
>>> dis.dis('for i in mylist: pass')
  1           0 SETUP_LOOP              12 (to 14)
              2 LOAD_NAME                0 (mylist)
              4 GET_ITER
        >>    6 FOR_ITER                 4 (to 12)
              8 STORE_NAME               1 (i)
             10 JUMP_ABSOLUTE            6
        >>   12 POP_BLOCK
        >>   14 LOAD_CONST               0 (None)
             16 RETURN_VALUE
The various opcodes named are documented in the same dis module, and their implementation can be found in the CPython evaluation loop (look for the TARGET(<opcode>) switch targets); the above opcodes break down to:
- SETUP_LOOP 12 marks the start of the suite, a block of statements, so the interpreter knows where to jump to in case of a break, and what cleanup needs to be done in case of an exception or return statement; the clean-up opcode is located 12 bytes of bytecode after this opcode (so POP_BLOCK here).
- LOAD_NAME 0 (mylist) loads the mylist variable value, putting it on the top of the stack (TOS in opcode descriptions).
- GET_ITER calls iter() on the object on the TOS, then replaces the TOS with the result.
- FOR_ITER 4 calls next() on the TOS iterator. If that gives a result, then that's pushed to the TOS. If there is a StopIteration exception, then the iterator is removed from TOS, and 4 bytes of bytecode are skipped to the POP_BLOCK opcode.
- STORE_NAME 1 takes the TOS and puts it in the named variable, here that's i.
- JUMP_ABSOLUTE 6 marks the end of the loop body; it tells the interpreter to go back up to bytecode offset 6, to the FOR_ITER instruction above. If we did something interesting in the loop, then that would happen after STORE_NAME, before the JUMP_ABSOLUTE.
- POP_BLOCK removes the block bookkeeping set up by SETUP_LOOP and removes the iterator from the stack.
The >> markers are jump targets, there as visual cues to make it easier to spot those when reading the opcode line that jumps to them.
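The listing above comes from an older CPython; newer releases emit a different sequence (SETUP_LOOP, for instance, was removed in Python 3.8). To check what your own interpreter produces, a small sketch using dis.get_instructions can list the opcode names. The exact output depends on the Python version, so none is shown here.

```python
import dis

# Compile the same one-line loop; mylist is never evaluated, only compiled.
code = compile("for i in mylist: pass", "<demo>", "exec")

# Collect the opcode names in order.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(opnames)

# The two opcodes that implement iter() and next() for the loop:
has_get_iter = "GET_ITER" in opnames
has_for_iter = "FOR_ITER" in opnames
print(has_get_iter, has_for_iter)
```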
Use more than 1 iterable in a python for loop
for files, scripts in TEMPLATE_FILE.items():
    print(files, scripts)
is the construction you're looking for.
(In Python 2 there's an iteritems method, which was removed in Python 3, so for small dictionaries items is OK and portable.)
Of course you can do:
for files in TEMPLATE_FILE:
    scripts = TEMPLATE_FILE[files]
but that's not as efficient, because you're hashing the key at each iteration, whereas items gives you the values without that. Reserve hashing for random access cases.
Note that you can iterate through sorted keys like this (frequent question):
for files, scripts in sorted(TEMPLATE_FILE.items()):
    print(files, scripts)
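A runnable sketch of both loops follows. The contents of TEMPLATE_FILE are not given in the question, so the dictionary below is a made-up placeholder.

```python
# Hypothetical contents; the real TEMPLATE_FILE is not shown in the question.
TEMPLATE_FILE = {"b.tmpl": "b.py", "a.tmpl": "a.py"}

# items() yields (key, value) tuples, which the loop unpacks into two names.
in_insertion_order = []
for files, scripts in TEMPLATE_FILE.items():
    in_insertion_order.append((files, scripts))

# sorted() orders the (key, value) tuples, i.e. by key first.
in_key_order = []
for files, scripts in sorted(TEMPLATE_FILE.items()):
    in_key_order.append((files, scripts))

print(in_insertion_order)
print(in_key_order)
```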
What is the behavior of a loop when the iterable on which we are looping is modified at each iteration
for i in iterable:
    # some code with i
is (with sufficient precision in this context) equivalent to
iterator = iter(iterable)
while True:
    try:
        i = next(iterator)
    except StopIteration:
        break
    # some code with i
You can see
- i is reassigned in each iteration
- the iterator is created exactly once
- mutations to the iterable may or may not lead to unexpected behavior, depending on how iterable.__iter__ is implemented. __iter__ is the method responsible for creating the iterator.
In the case of lists the iterator keeps track of an integer index, i.e. which element to pull next from the list. When you remove an item during iteration, the indices of the subsequent elements shift by -1, but the iterator is not informed of this.
>>> l = ['a', 'b', 'c', 'd']
>>> li = iter(l) # iterator pulls index 0 next
>>> next(li)
'a' # iterator pulled index 0, will pull index 1 next
>>> l.remove('a') # 'b' is now at index 0
>>> l
['b', 'c', 'd']
>>> next(li)
'c'
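The same skipping shows up in a full for loop. The sketch below (names are illustrative) first reproduces the problem and then shows one common workaround, iterating over a copy of the list:

```python
# Removing from the list being iterated: every other element is skipped.
letters = ['a', 'b', 'c', 'd']
visited = []
for letter in letters:
    visited.append(letter)
    letters.remove(letter)
print(visited, letters)    # ['a', 'c'] ['b', 'd']

# Iterating over a copy: the iterator's index refers to the untouched copy.
letters_copy_case = ['a', 'b', 'c', 'd']
visited_all = []
for letter in list(letters_copy_case):
    visited_all.append(letter)
    letters_copy_case.remove(letter)
print(visited_all, letters_copy_case)    # ['a', 'b', 'c', 'd'] []
```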
Modifying size of iterable during for loop - how is looping determined?
The loop executes till the iterator says it has no more elements. After two cycles, the iterator has advanced past two positions, and the list has lost two elements, which means the iterator is at the end of the list, and the loop terminates.
Your code is equivalent to this:
y = [1, 2, 3, 4]
i = iter(y)
while True:
    try:
        x = next(i)
    except StopIteration:
        break
    print(y)
    print(y.pop(0))
The list iterator holds the index that is up to be read next. In the third cycle, the list is [3, 4], and next(i) would need to read y[2], which is not possible, so next raises StopIteration, which ends the loop.
EDIT As to your other questions:
How do iter and StopIteration, and __getitem__(i) and IndexError factor in?
The first two are as described above: it is what defines a for loop. Or, if you will, it is the contract of iter: it will yield stuff till it stops with StopIteration.
The latter two, I don't think participate at all, since the list iterator is implemented in C; for example, the check for whether the iterator is exhausted directly compares the current index with PyList_GET_SIZE, which directly looks at the ->ob_size field; it doesn't pass through Python any more. Obviously, you could make a list iterator that would be fully in pure Python, and you'd likely be either using len to perform the check, or catching IndexError and again letting the underlying C code perform the check against ->ob_size.
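As a rough illustration of the pure-Python list iterator mentioned above, here is a sketch that uses len for the exhaustion check. PyListIterator is a hypothetical stand-in, not how CPython implements it, and it ignores details of the real C iterator.

```python
class PyListIterator:
    """Hypothetical pure-Python imitation of a list iterator."""
    def __init__(self, data):
        self.data = data
        self.index = 0          # the index that is up to be read next

    def __iter__(self):
        return self

    def __next__(self):
        # len() is checked on every call, so it sees the list's current size.
        if self.index >= len(self.data):
            raise StopIteration
        value = self.data[self.index]
        self.index += 1
        return value

y = [1, 2, 3, 4]
seen = []
for x in PyListIterator(y):
    seen.append(x)
    y.pop(0)                    # shrink the list while iterating

print(seen, y)  # [1, 3] [3, 4]
```

The loop stops after two cycles for the reason given earlier: the next index to read is 2, but only two elements remain.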
What about iterators that aren't lists?
You can define any object to be iterable. When you call iter(obj), it is the same as calling obj.__iter__(). This is expected to return an iterator, which knows what to do with i.__next__() (which is what next(i) translates to). I believe dicts iterate (I think, haven't checked) by having an index into the list of its keys. You can make an iterator that will do anything you want, if you code it. For example:
class AlwaysEmpty:
    def __iter__(self):
        return self
    def __next__(self):
        raise StopIteration
for x in AlwaysEmpty():
    print("there was something")
will, predictably, print nothing.
And most importantly, is this / where is this in the docs?
Iterator Types
modifying iterator in for loop in python
The for loop is walking through the iterable range(100).
Modifying the current value does not affect what appears next in the iterable (and indeed, you could have any iterable; the next value might not be a number!).
Option 1, use a while loop:
i = 0
while i < 100:
    i += 4
Option 2, use the built-in step size argument of range:
for i in range(0, 100, 10):
    pass
This example may make it clearer why your method doesn't make much sense:
for i in [1, 2, 3, 4, 5, 'cat', 'fish']:
    i = i + i
    print(i)
This is entirely valid Python code (string addition is defined); modifying the iterable would require something unintuitive.
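A short sketch of the point above: rebinding the loop variable inside the body has no effect on what the range iterator hands out next, while the step argument of range does change the sequence. The numbers are chosen only for illustration.

```python
visited = []
for i in range(5):
    visited.append(i)
    i += 10        # rebinds the name i only; the range iterator is untouched

print(visited)     # [0, 1, 2, 3, 4]

# To skip values, put the step in range itself.
stepped = list(range(0, 100, 25))
print(stepped)     # [0, 25, 50, 75]
```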
See here for more information on how iterables work, and how to modify them dynamically
Why is the iterator in for loop a string and not an integer?
b is a list of strings, correct? This is b:
['5', '3', '1']
So when you start iterating through b, each value of i will be a string because strings are all there is in b.
When you say int(i) in your if statement, you are not changing i to an integer. You are only getting the integer value of i.
If you want b to be a list of integers instead of a list of strings, use this instead:
b = [int(num) for num in a.split(",")]
Output:
<class 'str'>
<class 'list'>
[5, 3, 1]
5
<class 'int'>
You can see that i is now an integer.
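Both behaviors can be checked side by side. The sketch assumes the input string a is "5,3,1", which is inferred from the list shown above rather than quoted from the question.

```python
a = "5,3,1"                      # assumed input, inferred from the list above

b_strings = a.split(",")         # ['5', '3', '1']
i = b_strings[0]
n = int(i)                       # returns a new int; i itself is still a str
print(type(i), type(n))

b_ints = [int(num) for num in a.split(",")]   # convert once, up front
print(b_ints, type(b_ints[0]))
```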