Iterate an iterator by chunks (of n) in Python?
The grouper() recipe from the Recipes section of the itertools documentation comes close to what you want:
from itertools import izip_longest  # Python 2; use zip_longest on Python 3

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
It will fill up the last chunk with a fill value, though.
A less general solution that only works on sequences but does handle the last chunk as desired is
[my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size)]
Finally, a solution that works on general iterators and behaves as desired is
import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk
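For example, the islice-based version above simply makes the last chunk shorter instead of padding it:

```python
import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:  # islice returned nothing: the iterator is exhausted
            return
        yield chunk

print(list(grouper(3, 'ABCDEFG')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```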
How to iterate over a list in chunks
Modified from the Recipes section of Python's itertools docs:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
Example
grouper('ABCDEFG', 3, 'x') # --> 'ABC' 'DEF' 'Gxx'
Note: on Python 2 use izip_longest instead of zip_longest.
how to split an iterable in constant-size chunks
This is probably more efficient (faster), though it only works on sequences that support len() and slicing:
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

for x in batch(range(0, 10), 3):
    print(x)
Example using list
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # list of data

for x in batch(data, 3):
    print(x)

# Output
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10]
It avoids materializing every chunk up front; chunks are produced lazily, one slice at a time.
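Because batch relies on len() and slicing, it works on sequences but not on plain generators, which raise a TypeError; a quick sketch:

```python
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

# Works on any sequence, e.g. a string:
print(list(batch("ABCDEFG", 3)))  # ['ABC', 'DEF', 'G']

# Fails on a generator, which has no len():
try:
    list(batch((x for x in range(5)), 2))
except TypeError as e:
    print("TypeError:", e)
```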
Chunking a generator
Each time you call g()
you restart the generator from the beginning. You need to assign the result to a variable so it will keep using the same generator.
And as mentioned in a comment, the islice
object is always truthy. To tell if you reached the end, check whether the for c in chunk:
loop did anything.
from itertools import islice

def g():
    for x in range(11):
        print("generating: ", x)
        yield x

size = 2
gen = g()
while True:
    chunk = islice(gen, size)
    print("at chunk")
    empty = True
    for c in chunk:
        print(c)
        empty = False
    if empty:
        break
idiomatic way to take groups of n items from a list in Python?
From http://docs.python.org/library/itertools.html:
from itertools import izip_longest  # Python 2

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

>>> i = grouper(3, range(100))
>>> i.next()
(0, 1, 2)
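On Python 3 the same recipe uses zip_longest and the next() builtin:

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

i = grouper(3, range(100))
print(next(i))  # (0, 1, 2)
```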
How to iterate a file in chunks?
So, you actually want the functionality provided by itertools.groupby. This will work if your first column is sorted:
>>> import io
>>> from itertools import groupby
>>> from operator import itemgetter
>>> with io.StringIO(s) as f:
...     for k, g in groupby(f, itemgetter(0)):
...         print(list(g))
...
['1 foo bar\n', '1 lorem ipsum gypsum\n', '1 baba loo too\n']
['2 hello goodbye seeya\n']
['3 kobe magic wilt\n', '3 foo sneaks bar\n', '3 more stuff\n', '3 last line in file']
>>>
If you want to clean up that output a bit, you can map str.strip onto your group:
>>> with io.StringIO(s) as f:
...     for k, g in groupby(f, itemgetter(0)):
...         print(list(map(str.strip, g)))
...
['1 foo bar', '1 lorem ipsum gypsum', '1 baba loo too']
['2 hello goodbye seeya']
['3 kobe magic wilt', '3 foo sneaks bar', '3 more stuff', '3 last line in file']
If you wanted to implement this from scratch, an inflexible and naive generator could look something like this:
>>> def groupby_first_column(f):
...     line = next(f)
...     k = line[0]
...     group = [line]
...     for line in f:
...         if line[0] == k:
...             group.append(line)
...         else:
...             yield group
...             group = [line]
...             k = line[0]
...     yield group
...
>>> with io.StringIO(s) as f:
...     for group in groupby_first_column(f):
...         print(list(group))
...
['1 foo bar\n', '1 lorem ipsum gypsum\n', '1 baba loo too\n']
['2 hello goodbye seeya\n']
['3 kobe magic wilt\n', '3 foo sneaks bar\n', '3 more stuff\n', '3 last line in file']
>>>
Warning: the above generator only works if the key is exactly the first character of each line. It was not meant to be very useful, only to illustrate the idea. If you wanted to roll your own, you would have to be much more thorough.
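One way to be more thorough (a sketch, still assuming the input is sorted by key; the helper name and sample data are illustrative) is to key on the whole first whitespace-separated field rather than on a single character:

```python
import io

def groupby_first_field(f):
    # Group consecutive lines that share the same first
    # whitespace-separated field (handles multi-character keys).
    group = []
    key = None
    for line in f:
        k = line.split(None, 1)[0]  # whole first field, not just line[0]
        if group and k != key:
            yield key, group
            group = []
        key = k
        group.append(line)
    if group:
        yield key, group

s = "10 foo bar\n10 baz qux\n2 hello\n"  # hypothetical sample input
for k, g in groupby_first_field(io.StringIO(s)):
    print(k, [line.strip() for line in g])
```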
Python consume an iterator pair-wise
First of all, don't use the variable name iter
, because that's already the name of a builtin function.
To answer your question, simply use itertools.izip
(Python 2) or zip
(Python 3) on the iterator.
Your code may look as simple as
for next_1, next_2 in zip(iterator, iterator):
    ...  # stuff
Edit: whoops, my original answer was the correct one all along; don't mind the itertools recipe.
Edit 2: Consider itertools.izip_longest if you deal with iterators that could yield an uneven number of items:
>>> from itertools import izip_longest
>>> iterator = (x for x in (1, 2, 3))
>>>
>>> for next_1, next_2 in izip_longest(iterator, iterator):
...     next_1, next_2
...
(1, 2)
(3, None)
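On Python 3 the equivalent uses itertools.zip_longest:

```python
from itertools import zip_longest

iterator = (x for x in (1, 2, 3))
pairs = list(zip_longest(iterator, iterator))
print(pairs)  # [(1, 2), (3, None)]
```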
Split a generator into chunks without pre-walking it
One way would be to peek at the first element, if any, and then create and return the actual generator.
def head(iterable, max=10):
    first = next(iterable)  # raise exception when depleted
    def head_inner():
        yield first  # yield the extracted first element
        for cnt, el in enumerate(iterable):
            yield el
            if cnt + 1 >= max:  # cnt + 1 to include first
                break
    return head_inner()
Just use this in your chunk
generator and catch the StopIteration
exception like you did with your custom exception.
Update: Here's another version, using itertools.islice
to replace most of the head
function, and a for
loop. This simple for
loop in fact does exactly the same thing as that unwieldy while-try-next-except-break
construct in the original code, so the result is much more readable.
from itertools import islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:  # stops when iterator is depleted
        def chunk():  # construct generator for next chunk
            yield first  # yield element from for loop
            for more in islice(iterator, size - 1):
                yield more  # yield more elements from the iterator
        yield chunk()  # in outer generator, yield next chunk
And we can get even shorter than that, using itertools.chain
to replace the inner generator:
from itertools import chain, islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))
split a generator/iterable every n items in python (splitEvery)
from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))
Some tests:
>>> list(split_every(5, range(9)))
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]
>>> list(split_every(3, (x**2 for x in range(20))))
[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81, 100, 121], [144, 169, 196], [225, 256, 289], [324, 361]]
>>> [''.join(s) for s in split_every(6, 'Hello world')]
['Hello ', 'world']
>>> list(split_every(100, []))
[]