Can iterators be reset in Python?
I see many answers suggesting itertools.tee, but that's ignoring one crucial warning in the docs for it:
This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
Basically, tee is designed for those situations where two (or more) clones of one iterator, while "getting out of sync" with each other, don't do so by much; rather, they stay in the same "vicinity" (a few items behind or ahead of each other). It is not suitable for the OP's problem of "redo from the start".
L = list(DictReader(...)), on the other hand, is perfectly suitable, as long as the list of dicts can fit comfortably in memory. A new "iterator from the start" (very lightweight and low-overhead) can be made at any time with iter(L), and used in part or in whole without affecting new or existing ones; other access patterns are also easily available.
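A minimal sketch of the list approach, assuming a small in-memory CSV (the io.StringIO data and column names are made up for illustration and stand in for the OP's file):

```python
import csv
import io

# Hypothetical in-memory CSV standing in for the OP's file.
data = io.StringIO("name,qty\napple,3\npear,5\n")

# Read everything once; the list keeps every row in memory.
L = list(csv.DictReader(data))

# Each iter(L) is a fresh, independent iterator over the same rows.
it = iter(L)
first_row = next(it)  # advancing one iterator does not affect the others

first_pass = [row["name"] for row in iter(L)]
second_pass = [row["name"] for row in iter(L)]

print(first_pass)   # ['apple', 'pear']
print(second_pass)  # ['apple', 'pear']
```

The cost is that all rows live in memory at once, which is the trade-off described above.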
As several answers rightly remarked, in the specific case of csv you can also .seek(0) the underlying file object (a rather special case). I'm not sure that's documented and guaranteed, though it does currently work; it would probably be worth considering only for truly huge csv files, in which the list I recommend as the general approach would have too large a memory footprint.
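A rough sketch of the rewind approach, using csv.reader over an in-memory io.StringIO (a made-up stand-in for a real file). As noted above, this relies on behavior that may not be guaranteed, so treat it as an illustration rather than a recommendation:

```python
import csv
import io

# Hypothetical in-memory file; a real file from open(path, newline="")
# is expected to behave the same way.
f = io.StringIO("name,qty\napple,3\npear,5\n")
reader = csv.reader(f)

first_pass = list(reader)   # consumes the reader
f.seek(0)                   # rewind the underlying file object
second_pass = list(reader)  # the same reader yields the rows again

print(first_pass == second_pass)  # True
```

With DictReader the field names are read only once, so after rewinding the header line would come back as an ordinary row; creating a new DictReader after seek(0) is likely the simpler option there.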
Resetting an iterator, which is a map object?
Iterating an iterator/generator consumes the values from it (infinite generators being an exception), meaning that they will no longer be available on future iterations (as you've seen). For a typical iterator/generator in Python, the only true way to "restart" it is to re-initialize it.
>>> sol = map(pow, [1, 2, 3], [4, 5, 6])
>>> list(sol)
[1, 32, 729]
>>> next(sol)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> sol = map(pow, [1, 2, 3], [4, 5, 6])
>>> next(sol)
1
There are ways that you can work with the iterator to make it reusable though, such as with itertools.tee (as mentioned by one of the answers to the question linked by @JanChristophTerasa), or by converting the iterator into a list, which will persist its data.
itertools.tee
>>> from itertools import tee
>>> sol = map(pow, [1, 2, 3], [4, 5, 6])
>>> a, b = tee(sol, 2)
>>> list(a)
[1, 32, 729]
>>> list(b)
[1, 32, 729]
>>> list(a)
[]
With tee, though, both a and b will still be iterators, so you'll have the same problem with them.
Another common way to handle this is with list():
>>> sol = list(map(pow, [1, 2, 3], [4, 5, 6]))
>>> sol
[1, 32, 729]
>>> sol
[1, 32, 729]
Now, sol is a list of values instead of an iterator, which means you can iterate it as many times as you want - the values will remain there. This does mean you can't use next with it (in the sense of next(sol)), but you can get an iterator back from your new list with iter(sol) if you need an iterator specifically.
Edit
I saw itertools.cycle mentioned in the comments, which is also a valid option, so I thought I might add some info on it here as well.
itertools.cycle is one of those infinite generators I mentioned at the start. It is still an iterator, but one that will never run out of values.
>>> from itertools import cycle
>>> sol = map(pow, [1, 2, 3], [4, 5, 6])
>>> infinite = cycle(sol)
>>> for _ in range(5):
...     print(next(infinite))
...
1
32
729
1
32
>>>
A few notes on this: after iterating infinite N times, it will be positioned after whatever value was last pulled from it. Iterating it again later will resume from that position, not from the start.
Also, and this is very important, do not iterate an infinite generator in an unbounded fashion, like list(infinite) or for x in infinite:, or you're gonna have a bad time.
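One way to keep iteration over a cycle bounded is itertools.islice, which stops after a fixed number of items. A small sketch reusing the map example from above:

```python
from itertools import cycle, islice

sol = map(pow, [1, 2, 3], [4, 5, 6])
infinite = cycle(sol)

# Take exactly five values; islice stops on its own.
first_five = list(islice(infinite, 5))

# The cycle resumes where it left off rather than restarting.
next_two = list(islice(infinite, 2))

print(first_five)  # [1, 32, 729, 1, 32]
print(next_two)    # [729, 1]
```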
In Python, is it a bad practice to reset an iterator when __iter__ is called?
iter is expected to have no side effects. By violating this assumption, your code breaks all sorts of things. For example, the standard test for whether a thing is iterable:
try:
    iter(thing)
except TypeError:
    do_whatever()
will reset your file. Similarly, the itertools consume recipe:
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
will produce an incorrect file position instead of advancing n records after consume(your_file, n). Skipping the first few records with next before a loop will also fail:
f = MySpecialFile(whatever)
next(f)  # Skip a header, or try, anyway.
for record in f:
    # We get the header anyway.
    uhoh()
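To make the header-skipping failure concrete, here is a small sketch. ResettingRecords is a hypothetical stand-in for MySpecialFile (not the OP's actual class) whose __iter__ rewinds, compared against an ordinary list iterator:

```python
class ResettingRecords:
    """Hypothetical stand-in for MySpecialFile: __iter__ rewinds (the anti-pattern)."""

    def __init__(self, records):
        self._records = list(records)
        self._pos = 0

    def __iter__(self):
        self._pos = 0  # side effect: every iter() call rewinds
        return self

    def __next__(self):
        if self._pos >= len(self._records):
            raise StopIteration
        record = self._records[self._pos]
        self._pos += 1
        return record


f = ResettingRecords(["header", "row1", "row2"])
next(f)                   # tries to skip the header
seen_resetting = list(f)  # list() calls iter(f), which rewinds

g = iter(["header", "row1", "row2"])  # a well-behaved iterator
next(g)
seen_plain = list(g)

print(seen_resetting)  # ['header', 'row1', 'row2']
print(seen_plain)      # ['row1', 'row2']
```

A for loop calls iter() on its target in the same way list() does, so the loop in the snippet above sees the header again for the same reason.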
Resetting generator object in Python
Another option is to use the itertools.tee() function to create a second version of your generator:
import itertools

y = FunctionWithYield()
y, y_backup = itertools.tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)
This could be beneficial from a memory-usage point of view if the original iteration might not process all the items, because tee only buffers the items that one copy has consumed and the other has not.
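A small sketch of that memory argument, assuming a made-up generator function_with_yield (a stand-in for the question's FunctionWithYield) that records what it has produced:

```python
import itertools

produced = []  # records which items the source has actually generated


def function_with_yield():
    # Hypothetical stand-in for the question's FunctionWithYield
    for i in range(10):
        produced.append(i)
        yield i


y, y_backup = itertools.tee(function_with_yield())

# Consume only part of the first copy.
first_three = list(itertools.islice(y, 3))
produced_so_far = list(produced)  # only what y has pulled so far

# The backup still starts from the beginning.
replay = list(y_backup)

print(first_three)      # [0, 1, 2]
print(produced_so_far)  # [0, 1, 2]
print(replay)           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

If y were run to the end before y_backup is touched, every item would have to be buffered, which is the case where a plain list is the better choice.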
How to reset a loop that iterates over a set?
One way to do so would be by using iterators. You could define an iterator by simply calling iter() on your set, and call next() on it on each iteration. When the condition is met, you can simply create the iterator object again from the set and repeat the process:
s = {1, 2, 3, 4, 5}
s_ = iter(s)
# Just a counter to avoid an endless loop
cont = 0
while cont < 10:
    try:
        i = next(s_)
    except StopIteration:
        break
    # Some condition (flag is a placeholder for your own test)
    if flag:
        # Reset the iterator when the condition is met
        s_ = iter(s)
        continue
    cont += 1
How to reset and shuffle a next iterator?
Generator endlessly shuffling and yielding:
import random

def endless_shuffling(iterable):
    values = list(iterable)
    while True:
        random.shuffle(values)
        yield from values
Instead of your iter(all_angles), use endless_shuffling(all_angles) (and remove your own other shuffling).
One way to then get your list:
from itertools import islice

random_angles = endless_shuffling(range(-180, 180))
n_list = list(islice(random_angles, 1000))
If you give it an empty iterable and ask it for a value, it'll "hang", so either don't do that or guard against that case (e.g., with an extra if values: or with while values:).
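A sketch of the guarded variant mentioned above, using while values: so that an empty input simply ends the generator instead of looping forever:

```python
import random
from itertools import islice


def endless_shuffling(iterable):
    values = list(iterable)
    # "while values" instead of "while True": an empty input ends immediately
    while values:
        random.shuffle(values)
        yield from values


empty_result = list(islice(endless_shuffling([]), 5))
sample = list(islice(endless_shuffling("abc"), 6))

print(empty_result)  # []
print(sample)        # two shuffled rounds of 'a', 'b', 'c'
```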
I also tried a faster way to iterate than sending every value through a generator, but the shuffling dominates so it doesn't make a big difference:
with shuffling:
448.3 ms endless_shuffling1
426.7 ms endless_shuffling2
without shuffling:
26.4 ms endless_shuffling1
5.1 ms endless_shuffling2
Full code:
from random import shuffle
from itertools import chain, islice
from timeit import default_timer as time

def endless_shuffling1(iterable):
    values = list(iterable)
    while True:
        shuffle(values)
        yield from values

def endless_shuffling2(iterable):
    values = list(iterable)
    return chain.from_iterable(iter(
        lambda: shuffle(values) or values,
        []
    ))

funcs = endless_shuffling1, endless_shuffling2

for f in funcs:
    print(*islice(f('abc'), 21))

for i in range(6):
    for f in funcs:
        t0 = time()
        next(islice(f(range(-180,180)), 999999, 1000000))
        print('%5.1f ms ' % ((time() - t0) * 1e3), f.__name__)
    print()
    if i == 2:
        print('without shuffling:\n')
        def shuffle(x):
            pass
Reset Factory Iterator in Factoryboy
The field declaration stays available through the class:
CountryFactory.code2.reset()
You can also access the declaration objects of a factory through the class's _meta attribute:
CountryFactory._meta.declarations['code2'].reset()
Calling setter to reset iterator
Rather than explicitly resetting self._node_wave, just define a private generator that takes care of cycling over the values.
def _wave_generator(self):
    while True:
        yield from sorted(WAVELENGTH, key=lambda k: random.random())

def __init__(self):
    self._node_wave = self._wave_generator()

@property
def node_wave(self):
    return next(self._node_wave)
Because of how _wave_generator is defined, next(self._node_wave) will never raise StopIteration. When one shuffled list is exhausted, another one is automatically created.
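For a self-contained picture of how these pieces fit together, here is a sketch that places the methods in a class. The class name Node and the WAVELENGTH values are assumptions made for illustration; neither appears in the answer:

```python
import random

# Hypothetical values; the real WAVELENGTH comes from the asker's code.
WAVELENGTH = [400, 500, 600]


class Node:  # hypothetical class name
    def __init__(self):
        self._node_wave = self._wave_generator()

    def _wave_generator(self):
        while True:
            # Each round yields every value once, in a freshly shuffled order.
            yield from sorted(WAVELENGTH, key=lambda k: random.random())

    @property
    def node_wave(self):
        return next(self._node_wave)


node = Node()
first_round = [node.node_wave for _ in range(3)]
second_round = [node.node_wave for _ in range(3)]

print(sorted(first_round))   # [400, 500, 600]
print(sorted(second_round))  # [400, 500, 600]
```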