Iterate an Iterator by Chunks (Of N) in Python

Iterate an iterator by chunks (of n) in Python?

The grouper() recipe from the recipes section of the itertools documentation comes close to what you want:

from itertools import izip_longest  # Python 2; on Python 3 use zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

It will fill up the last chunk with a fill value, though.
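For instance, with the Python 3 spelling (zip_longest), the padding shows up in the last group:

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    # Pass n references to the *same* iterator, so each output
    # tuple consumes n consecutive items.
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

print(list(grouper(3, 'ABCDEFG', fillvalue='x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```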

A less general solution that works only on sequences, but does handle the last chunk as desired, is

[my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size)]
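For example, with a hypothetical my_list of ten items and chunk_size = 3, the final chunk simply comes out shorter instead of being padded:

```python
my_list = list(range(10))
chunk_size = 3

# Slice the sequence at every chunk_size-th index.
chunks = [my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size)]
print(chunks)
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```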

Finally, a solution that works on general iterators and behaves as desired is

import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk
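A quick check that the last chunk comes out short rather than padded:

```python
import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        # islice takes up to n items; an empty tuple means the iterator is done.
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

print(list(grouper(3, range(10))))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```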

How to iterate over a list in chunks

Modified from the Recipes section of Python's itertools docs:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Example

grouper('ABCDEFG', 3, 'x')  # --> 'ABC' 'DEF' 'Gxx'

Note: on Python 2 use izip_longest instead of zip_longest.

how to split an iterable in constant-size chunks

This is probably faster, but it only works on sequences, since it relies on len() and slicing:

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

for x in batch(range(0, 10), 3):
    print(x)

Example using list

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # list of data 

for x in batch(data, 3):
print(x)

# Output

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10]

Because it yields one slice at a time, it never builds the full list of chunks in memory.

Chunking a generator

Each time you call g() you restart the generator from the beginning. You need to assign the result to a variable so it will keep using the same generator.

And as mentioned in a comment, the islice object is always truthy. To tell if you reached the end, check whether the for c in chunk: loop did anything.

from itertools import islice

def g():
    for x in range(11):
        print("generating: ", x)
        yield x

size = 2
gen = g()
while True:
    chunk = islice(gen, size)

    print("at chunk")
    empty = True
    for c in chunk:
        print(c)
        empty = False

    if empty:
        break

idiomatic way to take groups of n items from a list in Python?

From http://docs.python.org/library/itertools.html:

from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

>>> i = grouper(3, range(100))
>>> i.next()
(0, 1, 2)

How to iterate a file in chunks?

So, you actually want the functionality provided by itertools.groupby. This will work if your first-column is sorted:

>>> import io
>>> from itertools import groupby
>>> from operator import itemgetter
>>> with io.StringIO(s) as f:
...     for k, g in groupby(f, itemgetter(0)):
...         print(list(g))
...
['1 foo bar\n', '1 lorem ipsum gypsum\n', '1 baba loo too\n']
['2 hello goodbye seeya\n']
['3 kobe magic wilt\n', '3 foo sneaks bar\n', '3 more stuff\n', '3 last line in file']
>>>

If you want to clean up that output a bit, you can map str.strip onto your group:

>>> with io.StringIO(s) as f:
...     for k, g in groupby(f, itemgetter(0)):
...         print(list(map(str.strip, g)))
...
['1 foo bar', '1 lorem ipsum gypsum', '1 baba loo too']
['2 hello goodbye seeya']
['3 kobe magic wilt', '3 foo sneaks bar', '3 more stuff', '3 last line in file']

If you wanted to implement this from scratch, an inflexible and naive generator could look something like this:

>>> def groupby_first_column(f):
...     line = next(f)
...     k = line[0]
...     group = [line]
...     for line in f:
...         if line[0] == k:
...             group.append(line)
...         else:
...             yield group
...             group = [line]
...             k = line[0]
...     yield group
...
>>> with io.StringIO(s) as f:
...     for group in groupby_first_column(f):
...         print(list(group))
...
['1 foo bar\n', '1 lorem ipsum gypsum\n', '1 baba loo too\n']
['2 hello goodbye seeya\n']
['3 kobe magic wilt\n', '3 foo sneaks bar\n', '3 more stuff\n', '3 last line in file']
>>>

Warning: the above generator only works if each line's key sits in exactly the first position and is only 1 character long. It was not meant to be very useful, only to illustrate the idea; if you wanted to roll your own, you would have to be much more thorough.

Python consume an iterator pair-wise

First of all, don't use the variable name iter, because that's already the name of a builtin function.

To answer your question, simply use itertools.izip (Python 2) or zip (Python 3) on the iterator.

Your code may look as simple as

for next_1, next_2 in zip(iterator, iterator):
    # stuff
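This works because both arguments to zip are the same iterator object, so producing each tuple advances it twice. A small self-contained sketch:

```python
iterator = iter([1, 2, 3, 4, 5, 6])

# zip pulls one item per argument per tuple; since both arguments
# are the same iterator, consecutive items are paired up.
for next_1, next_2 in zip(iterator, iterator):
    print(next_1, next_2)
# 1 2
# 3 4
# 5 6
```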

edit: whoops, my original answer was the correct one all along, don't mind the itertools recipe.

edit 2: Consider itertools.izip_longest if you deal with iterators that could yield an uneven amount of objects:

>>> from itertools import izip_longest
>>> iterator = (x for x in (1,2,3))
>>>
>>> for next_1, next_2 in izip_longest(iterator, iterator):
...     next_1, next_2
...
(1, 2)
(3, None)

Split a generator into chunks without pre-walking it

One way would be to peek at the first element, if any, and then create and return the actual generator.

def head(iterable, max=10):
    first = next(iterable)  # raises StopIteration when depleted
    def head_inner():
        yield first  # yield the extracted first element
        for cnt, el in enumerate(iterable):
            if cnt + 2 > max:  # cnt + 2 counts first plus el
                break
            yield el
    return head_inner()

Just use this in your chunk generator and catch the StopIteration exception like you did with your custom exception.


Update: Here's another version, using itertools.islice to replace most of the head function, and a for loop. This simple for loop in fact does exactly the same thing as that unwieldy while-try-next-except-break construct in the original code, so the result is much more readable.

from itertools import islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:   # stops when iterator is depleted
        def chunk():         # construct generator for next chunk
            yield first      # yield element from for loop
            for more in islice(iterator, size - 1):
                yield more   # yield more elements from the iterator
        yield chunk()        # in outer generator, yield next chunk

And we can get even shorter than that, using itertools.chain to replace the inner generator:

from itertools import chain, islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))
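Note that each yielded chunk is itself a lazy iterator and must be consumed before moving on to the next one. A quick self-contained sketch of the chain-based version:

```python
from itertools import chain, islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        # Each chunk is the element just pulled plus up to size - 1 more.
        yield chain([first], islice(iterator, size - 1))

for chunk in chunks(range(8), size=3):
    print(list(chunk))  # consuming each chunk advances the shared iterator
# [0, 1, 2]
# [3, 4, 5]
# [6, 7]
```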

split a generator/iterable every n items in python (splitEvery)

from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

Some tests:

>>> list(split_every(5, range(9)))
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]

>>> list(split_every(3, (x**2 for x in range(20))))
[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81, 100, 121], [144, 169, 196], [225, 256, 289], [324, 361]]

>>> [''.join(s) for s in split_every(6, 'Hello world')]
['Hello ', 'world']

>>> list(split_every(100, []))
[]

