Can Iterators Be Reset in Python

Can iterators be reset in Python?

I see many answers suggesting itertools.tee, but that's ignoring one crucial warning in the docs for it:

This itertool may require significant
auxiliary storage (depending on how
much temporary data needs to be
stored). In general, if one iterator
uses most or all of the data before
another iterator starts, it is faster
to use list() instead of tee().

Basically, tee is designed for those situations where two (or more) clones of one iterator, while "getting out of sync" with each other, don't do so by much -- rather, they stay in the same "vicinity" (a few items behind or ahead of each other). Not suitable for the OP's problem of "redo from the start".

L = list(DictReader(...)) on the other hand is perfectly suitable, as long as the list of dicts can fit comfortably in memory. A new "iterator from the start" (very lightweight and low-overhead) can be made at any time with iter(L), and used in part or in whole without affecting new or existing ones; other access patterns are also easily available.
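
As a minimal sketch of the list approach (the in-memory CSV text and the column names are made up for illustration, standing in for the real file):

```python
import csv
import io

# Hypothetical CSV content standing in for the real file.
csv_text = "name,qty\napple,3\npear,5\n"

# Read everything once and keep the rows in memory.
rows = list(csv.DictReader(io.StringIO(csv_text)))

first_pass = [row["name"] for row in iter(rows)]

# A partially consumed iterator does not affect any other iterator.
partial = iter(rows)
next(partial)

second_pass = [row["name"] for row in iter(rows)]  # starts at the top again
```

Each iter(rows) call is independent, so partial sitting one row in has no effect on second_pass.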

As several answers rightly remarked, in the specific case of csv you can also .seek(0) the underlying file object (a rather special case). I'm not sure that's documented and guaranteed, though it does currently work; it would probably be worth considering only for truly huge csv files, in which the list I recommend as the general approach would have too large a memory footprint.
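
A rough sketch of the seek(0) variant, using io.StringIO as a stand-in for a real file (the sample data is invented). It builds a fresh DictReader after rewinding, on the assumption that reusing the old reader would hand back the header line as if it were a data row:

```python
import csv
import io

# Stand-in for an open CSV file; a real file would normally be opened with newline="".
f = io.StringIO("name,qty\napple,3\npear,5\n")

first = list(csv.DictReader(f))

f.seek(0)                         # rewind the underlying file object
second = list(csv.DictReader(f))  # new reader, so the header is parsed again
```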

Resetting an iterator, which is a map object?

Iterating an iterator/generator consumes the values from it (infinite generators being an exception), meaning that they will no longer be available on future iterations (as you've seen). For a typical iterator/generator in Python, the only true way to "restart" it is to re-initialize it.

>>> sol = map(pow, [1, 2, 3], [4, 5, 6])      
>>> list(sol)
[1, 32, 729]
>>> next(sol)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> sol = map(pow, [1, 2, 3], [4, 5, 6])
>>> next(sol)
1
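
If re-initializing by hand gets repetitive, one possible pattern is to wrap the construction in a small iterable whose __iter__ builds a fresh iterator each time. The class name Restartable is made up for this sketch; it is not something from the standard library:

```python
class Restartable:
    """Hypothetical helper: every iter() call re-creates the underlying iterator."""

    def __init__(self, factory):
        self._factory = factory  # zero-argument callable returning a new iterator

    def __iter__(self):
        return self._factory()


sol = Restartable(lambda: map(pow, [1, 2, 3], [4, 5, 6]))

first = list(sol)   # builds one map object and exhausts it
second = list(sol)  # builds another, so the values are all there again
```

This recomputes the values on every pass instead of storing them, which trades CPU time for memory.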

There are ways to make the iterator's values reusable though, such as using itertools.tee (as mentioned by one of the answers to the question linked by @JanChristophTerasa), or converting the iterator into a list, which will persist its data.

itertools.tee

>>> from itertools import tee
>>> sol = map(pow, [1, 2, 3], [4, 5, 6])
>>> a, b = tee(sol, 2)
>>> list(a)
[1, 32, 729]
>>> list(b)
[1, 32, 729]
>>> list(a)
[]

With tee though, both a and b will still be iterators, so you'll have the same problem with them.

Another common way to handle this is with list()

>>> sol = list(map(pow, [1, 2, 3], [4, 5, 6]))
>>> sol
[1, 32, 729]
>>> sol
[1, 32, 729]

Now, sol is a list of values instead of an iterator, which means you can iterate it as many times as you want - the values will remain there. This does mean you can't use next with it (in the sense of next(sol)), but you can get an iterator back from your new list with iter(sol) if you need an iterator specifically.

Edit

I saw itertools.cycle mentioned in the comments, which is also a valid option so I thought I might add some info on it here as well.

itertools.cycle is one of those infinite generators I mentioned at the start. It is still an iterator, but in a way that you'll never run out of values.

>>> from itertools import cycle
>>> sol = map(pow, [1, 2, 3], [4, 5, 6])
>>> infinite = cycle(sol)
>>> for _ in range(5):
...     print(next(infinite))
...
1
32
729
1
32
>>>

A few notes on this: after pulling N values from infinite, it stays positioned just past the last value it returned. Iterating it again later will resume from that position, not from the start.

Also, and this is very important, do not iterate an infinite generator in an unbounded fashion, like list(infinite) or for x in infinite: without a break, or you're gonna have a bad time: the loop never ends, and list() keeps growing until memory runs out.
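
One way to keep things bounded is itertools.islice, which takes a fixed number of items and then stops. A small sketch, reusing the same map example:

```python
from itertools import cycle, islice

infinite = cycle(map(pow, [1, 2, 3], [4, 5, 6]))

# Take exactly seven values; islice stops on its own.
first_seven = list(islice(infinite, 7))

# The cycle keeps its position, so the next value continues from there.
following = next(infinite)
```

Here first_seven ends partway through the third lap, and following picks up right after it rather than restarting at 1.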

In Python, is it a bad practice to reset an iterator when __iter__ is called?

iter is expected to have no side effects. By violating this assumption, your code breaks all sorts of things. For example, the standard test for whether a thing is iterable:

try:
    iter(thing)
except TypeError:
    do_whatever()

will reset your file. Similarly, the itertools consume recipe:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

will produce an incorrect file position instead of advancing n records after consume(your_file, n). Skipping the first few records with next before a loop will also fail:

f = MySpecialFile(whatever)
next(f)  # Skip a header, or try, anyway.
for record in f:
    # We get the header anyway.
    uhoh()
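
To make the header problem concrete, here is a small sketch with two invented classes: ResettingRecords rewinds itself inside __iter__ (the pattern being warned against), while Records is a plain iterable that returns a new, independent iterator each time. Neither name comes from the question; they only illustrate the difference.

```python
class ResettingRecords:
    """Hypothetical iterator whose __iter__ has the side effect of rewinding."""

    def __init__(self, records):
        self._records = records
        self._pos = 0

    def __iter__(self):
        self._pos = 0  # the side effect in question
        return self

    def __next__(self):
        if self._pos >= len(self._records):
            raise StopIteration
        value = self._records[self._pos]
        self._pos += 1
        return value


class Records:
    """Hypothetical iterable: each iter() call yields a fresh iterator."""

    def __init__(self, records):
        self._records = records

    def __iter__(self):
        return iter(self._records)


data = ["header", "row1", "row2"]

bad = ResettingRecords(data)
next(bad)                 # try to skip the header
bad_seen = list(bad)      # list() calls iter(bad), which rewinds

good = iter(Records(data))
next(good)                # skip the header
good_seen = list(good)    # iter() on an iterator returns itself, no rewind
```

With the second design, "start over" is spelled iter(records) on the container, and existing iterators keep their place.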

Resetting generator object in Python

Another option is to use the itertools.tee() function to create a second version of your generator:

import itertools
y = FunctionWithYield()
y, y_backup = itertools.tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)

This could be beneficial from a memory usage point of view if the original iteration might not process all the items, since tee only buffers the items that one copy has consumed and the other has not.
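
A small sketch of that early-exit case, with a made-up countdown generator standing in for FunctionWithYield:

```python
import itertools

def countdown(n):
    """Hypothetical generator used in place of FunctionWithYield."""
    while n > 0:
        yield n
        n -= 1

y, y_backup = itertools.tee(countdown(5))

seen = []
for x in y:
    seen.append(x)
    if x == 3:
        break  # stop early; only the items seen so far have been buffered

replayed = list(y_backup)  # replays the buffered items, then pulls the rest
```

At the point of the break, only three items are held for y_backup, rather than the whole sequence that list() would have stored up front.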

How to reset a loop that iterates over a set?

One way to do this is with an iterator. Create one by calling iter() on your set, and call next() on it in each pass of the loop. When the condition is met, create a new iterator from the set and repeat the process:

s = {1, 2, 3, 4, 5}
s_ = iter(s)
flag = False  # placeholder for whatever condition triggers a reset
# Just a counter to avoid an endless loop
cont = 0
while cont < 10:
    try:
        i = next(s_)
    except StopIteration:
        break
    # Some condition
    if flag:
        # Reset the iterator when the condition is met
        s_ = iter(s)
        continue
    cont += 1

How to reset and shuffle a next iterator?

Generator endlessly shuffling and yielding:

import random

def endless_shuffling(iterable):
    values = list(iterable)
    while True:
        random.shuffle(values)
        yield from values

Instead of your iter(all_angles), use endless_shuffling(all_angles) (and remove your own other shuffling).

One way to then get your list:

from itertools import islice

random_angles = endless_shuffling(range(-180, 180))
n_list = list(islice(random_angles, 1000))

If you give it an empty iterable and ask it for a value, it'll "hang", so either don't do that or guard against that case (e.g., with an extra if values: or with while values:).
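
A sketch of the guarded variant, using while values: so that an empty input produces an empty generator instead of spinning forever (the name endless_shuffling_guarded is invented here):

```python
import random
from itertools import islice

def endless_shuffling_guarded(iterable):
    values = list(iterable)
    while values:  # an empty list ends the generator immediately
        random.shuffle(values)
        yield from values

empty_result = list(islice(endless_shuffling_guarded([]), 5))
six_letters = list(islice(endless_shuffling_guarded("abc"), 6))
```

Each consecutive block of three in six_letters is one full shuffle of the input, so every value appears once per block.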

I also tried a faster way to iterate than sending every value through a generator, but the shuffling dominates so it doesn't make a big difference:

with shuffling:
448.3 ms endless_shuffling1
426.7 ms endless_shuffling2

without shuffling:
26.4 ms endless_shuffling1
5.1 ms endless_shuffling2

Full code:

from random import shuffle
from itertools import chain, islice
from timeit import default_timer as time

def endless_shuffling1(iterable):
    values = list(iterable)
    while True:
        shuffle(values)
        yield from values

def endless_shuffling2(iterable):
    values = list(iterable)
    return chain.from_iterable(iter(
        lambda: shuffle(values) or values,
        []
    ))

funcs = endless_shuffling1, endless_shuffling2

for f in funcs:
    print(*islice(f('abc'), 21))

for i in range(6):
    for f in funcs:
        t0 = time()
        next(islice(f(range(-180,180)), 999999, 1000000))
        print('%5.1f ms ' % ((time() - t0) * 1e3), f.__name__)
    print()
    if i == 2:
        print('without shuffling:\n')
        def shuffle(x):
            pass

Reset Factory Iterator in Factoryboy

The field declaration stays available through the class:

CountryFactory.code2.reset()

You can also access the declaration objects of a factory through the class' _meta attribute:

CountryFactory._meta.declarations['code2'].reset()

Calling setter to reset iterator

Rather than explicitly resetting self._node_wave, just define a private generator that takes care of cycling over the values.

def _wave_generator(self):
    while True:
        yield from sorted(WAVELENGTH, key=lambda k: random.random())

def __init__(self):
    # Call the method through self; a bare _wave_generator() raises NameError.
    self._node_wave = self._wave_generator()

@property
def node_wave(self):
    return next(self._node_wave)

Because of how _wave_generator is defined, next(self._node_wave) will never raise StopIteration. When one sorted list is exhausted, another one is automatically created.
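
Put together as a runnable sketch: the class name Node and the WAVELENGTH values below are assumptions made only so the example runs; the question's real class and constants may differ.

```python
import random

WAVELENGTH = (450, 550, 650)  # hypothetical values for illustration


class Node:  # hypothetical class name
    def __init__(self):
        self._node_wave = self._wave_generator()

    def _wave_generator(self):
        # Yield every wavelength in random order, then start over with a new order.
        while True:
            yield from sorted(WAVELENGTH, key=lambda k: random.random())

    @property
    def node_wave(self):
        return next(self._node_wave)


node = Node()
first_round = [node.node_wave for _ in range(len(WAVELENGTH))]
extra = node.node_wave  # comes from the next shuffled round, no StopIteration
```

The first len(WAVELENGTH) reads cover every value exactly once, and the read after that starts a new random ordering.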


