Split a generator into chunks without pre-walking it
One way would be to peek at the first element, if any, and then create and return the actual generator.
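```python
def head(iterable, size=10):
    iterator = iter(iterable)  # also accept lists and other non-iterator iterables
    first = next(iterator)  # raises StopIteration when depleted
    def head_inner():
        yield first  # yield the extracted first element
        if size < 2:
            return
        # start=2 so that cnt counts the elements yielded so far, including first
        for cnt, el in enumerate(iterator, start=2):
            yield el
            if cnt >= size:  # stop after size elements in total
                break
    return head_inner()
```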
Just use this in your `chunk` generator and catch the `StopIteration` exception, as you did with your custom exception.
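A minimal sketch of that usage, assuming a chunk generator shaped like the one in the question (the name `chunks_via_head` is hypothetical). `head` is restated here with `islice` so the block runs on its own. Catching `StopIteration` explicitly matters on Python 3.7+: under PEP 479, a `StopIteration` that escapes a generator body is turned into a `RuntimeError`.

```python
from itertools import islice

def head(iterable, size=10):
    iterator = iter(iterable)
    first = next(iterator)  # raises StopIteration when depleted
    def head_inner():
        yield first
        yield from islice(iterator, size - 1)  # up to size-1 further items
    return head_inner()

def chunks_via_head(iterable, size=10):
    """Hypothetical chunk generator built on head()."""
    iterator = iter(iterable)
    while True:
        try:
            part = head(iterator, size)
        except StopIteration:  # nothing left: end this generator normally
            return
        yield part

print([list(part) for part in chunks_via_head(range(7), 3)])
# [[0, 1, 2], [3, 4, 5], [6]]
```

Each `part` shares the underlying iterator, so it has to be consumed before the next one is requested.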
Update: Here's another version, using `itertools.islice` to replace most of the `head` function, and a `for` loop. This simple `for` loop does exactly the same thing as the unwieldy `while`-`try`-`next`-`except`-`break` construct in the original code, so the result is much more readable.
```python
from itertools import islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:  # stops when iterator is depleted
        def chunk():  # construct generator for next chunk
            yield first  # yield element from for loop
            for more in islice(iterator, size - 1):
                yield more  # yield more elements from the iterator
        yield chunk()  # in outer generator, yield next chunk (consume it before asking for the next)
```
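A short usage sketch for the inner-generator version (restated so the block runs alone). Each chunk has to be consumed before the outer generator advances. There is also a late-binding effect: the inner `chunk()` reads the variable `first` only when it starts running, so chunk generators collected up front all see the last value.

```python
from itertools import islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        def chunk():
            yield first
            for more in islice(iterator, size - 1):
                yield more
        yield chunk()

# Consuming each chunk before requesting the next one works as intended.
in_order = [list(c) for c in chunks(range(7), 3)]
print(in_order)  # [[0, 1, 2], [3, 4, 5], [6]]

# Collecting the chunk generators first does not: by the time they run, the
# outer loop has exhausted the iterator and `first` holds its final value.
collected = list(chunks(range(7), 3))
late = [list(c) for c in collected]
print(late)  # [[6], [6], [6], [6], [6], [6], [6]]
```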
And we can get even shorter than that, using `itertools.chain` to replace the inner generator:
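```python
from itertools import chain, islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        # chain the already-read first element with up to size-1 more items
        yield chain([first], islice(iterator, size - 1))
```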
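A usage sketch for the `chain` version (restated here). Because the list `[first]` is built when each chunk is yielded, this version does not have the late-binding problem of the inner `chunk()` generator. All chunks still share one iterator, though, so they must be consumed in order:

```python
from itertools import chain, islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))

in_order = [list(c) for c in chunks(range(7), 3)]
print(in_order)  # [[0, 1, 2], [3, 4, 5], [6]]

# Materializing the outer generator first: the outer loop consumes every
# element as a `first`, so each lazy islice later finds nothing left.
eager = [list(c) for c in list(chunks(range(7), 3))]
print(eager)  # [[0], [1], [2], [3], [4], [5], [6]]
```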
Chunking a generator
Each call to `g()` creates a new generator that starts over from the beginning. Assign the result to a variable once, so that the loop keeps using the same generator.
Also, as mentioned in a comment, an `islice` object is always truthy, even when it will yield nothing. To tell whether you have reached the end, check whether the `for c in chunk:` loop did anything.
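Both pitfalls can be checked in isolation with a smaller, silent variant of `g()` (a sketch for illustration only):

```python
from itertools import islice

def g():
    for x in range(3):
        yield x

# Two calls, two independent generators: both start at 0.
restarted = [next(g()), next(g())]
print(restarted)  # [0, 0]

# One generator object, reused: it continues where it stopped.
gen = g()
continued = [next(gen), next(gen)]
print(continued)  # [0, 1]

# islice defines neither __bool__ nor __len__, so it is truthy even when empty.
nothing = islice([], 2)
print(bool(nothing), list(nothing))  # True []
```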
```python
from itertools import islice

def g():
    for x in range(11):
        print("generating: ", x)
        yield x

size = 2
gen = g()  # create the generator once and keep reusing it
while True:
    chunk = islice(gen, size)
    print("at chunk")
    empty = True
    for c in chunk:
        print(c)
        empty = False
    if empty:  # this islice produced nothing, so gen is exhausted
        break
```
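If materializing each chunk as a list is acceptable, a sketch that avoids the `empty` flag uses the two-argument form of `iter()`, which keeps calling a function until it returns a sentinel value (here the empty list). On Python 3.12+, `itertools.batched(gen, size)` does the same job and yields tuples.

```python
from itertools import islice

def g():
    for x in range(11):
        yield x

size = 2
gen = g()
chunks_seen = []
# iter(callable, sentinel) calls the lambda repeatedly and stops as soon as it
# returns [], i.e. once islice finds nothing left in gen.
for chunk in iter(lambda: list(islice(gen, size)), []):
    print(chunk)
    chunks_seen.append(chunk)
```

Each chunk is a real list of at most `size` items, so unlike the lazy `islice` it can be reused or tested for emptiness, at the cost of holding one chunk in memory.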
how to split an iterable in constant-size chunks
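This is probably more efficient (faster) than building each chunk element by element, but it only works on sequences that support `len()` and slicing (lists, tuples, strings, `range`), not on arbitrary iterators or generators: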
```python
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

for x in batch(range(0, 10), 3):
    print(x)
```
Example using a list:

```python
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # list of data

for x in batch(data, 3):
    print(x)
```

Output:

```
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10]
```
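For a `range`, each chunk is itself a lazy `range` object, so no lists are built. For a list, however, each slice does create a new (short) list.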
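A sketch of a variant that keeps the slicing approach for sequences but falls back to `islice` for anything without `len()` and slicing, such as generators. The name `batch_any` is hypothetical, and the `collections.abc.Sequence` check is an assumption about which inputs count as sliceable (it covers lists, tuples, strings and `range`, but not, for example, NumPy arrays).

```python
from collections.abc import Sequence
from itertools import islice

def batch_any(iterable, n=1):
    """Hypothetical helper: like batch(), but also accepts plain iterators."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if isinstance(iterable, Sequence):
        # Sliceable sequences: one slice per chunk, as in batch().
        for start in range(0, len(iterable), n):
            yield iterable[start:start + n]
    else:
        # Generators, sets, files, ...: pull n items at a time into a list.
        iterator = iter(iterable)
        while chunk := list(islice(iterator, n)):
            yield chunk

print(list(batch_any(range(10), 3)))
print(list(batch_any((x * x for x in range(5)), 2)))  # [[0, 1], [4, 9], [16]]
```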
Split list into chunks with repeats between chunks
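Something like this with a list comprehension, where `M` is the chunk size and `m` is the number of elements shared by consecutive chunks:

```python
[l[i*(M-m):i*(M-m)+M] for i in range(math.ceil((len(l)-m)/(M-m)))]
```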
Example:

```python
import math

l = list(range(15))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

m, M = 2, 5
[l[i*(M-m):i*(M-m)+M] for i in range(math.ceil((len(l)-m)/(M-m)))]
# [[0, 1, 2, 3, 4],
#  [3, 4, 5, 6, 7],
#  [6, 7, 8, 9, 10],
#  [9, 10, 11, 12, 13],
#  [12, 13, 14]]

m, M = 3, 5
[l[i*(M-m):i*(M-m)+M] for i in range(math.ceil((len(l)-m)/(M-m)))]
# [[0, 1, 2, 3, 4],
#  [2, 3, 4, 5, 6],
#  [4, 5, 6, 7, 8],
#  [6, 7, 8, 9, 10],
#  [8, 9, 10, 11, 12],
#  [10, 11, 12, 13, 14]]

l = range(5)
m, M = 2, 3
[l[i*(M-m):i*(M-m)+M] for i in range(math.ceil((len(l)-m)/(M-m)))]
# [range(0, 3), range(1, 4), range(2, 5)]
```
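The comprehension needs `len()` and slicing. A sketch of an iterator-based variant for generators, under the assumption that the same chunk boundaries are wanted (the name `overlapping_chunks` is hypothetical): it keeps the last `m` elements of each chunk and appends the next `M-m` items from the input.

```python
from itertools import islice

def overlapping_chunks(iterable, M, m):
    """Hypothetical helper: chunks of size M, consecutive chunks sharing m elements."""
    if not 0 <= m < M:
        raise ValueError("expected 0 <= m < M")
    iterator = iter(iterable)
    chunk = list(islice(iterator, M))
    if len(chunk) <= m:  # like the comprehension: too short to form a chunk
        return
    while True:
        yield chunk
        if len(chunk) < M:  # the input ran out inside this chunk
            return
        new = list(islice(iterator, M - m))
        if not new:  # nothing beyond the shared overlap
            return
        chunk = chunk[M - m:] + new  # keep the last m elements, add the new ones

print(list(overlapping_chunks(iter(range(15)), 5, 2)))
# [[0, 1, 2, 3, 4], [3, 4, 5, 6, 7], [6, 7, 8, 9, 10], [9, 10, 11, 12, 13], [12, 13, 14]]
```

Chunks come back as lists rather than slices of the input type, so `range(5)` gives lists instead of `range` objects.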
Explanation:

Chunk `i` starts at index `i*(M-m)` and ends `M` positions later, at index `i*(M-m) + M` (exclusive, as in slicing).

```
chunk index   starts                  ends
-------------------------------------------------------
0             0                       M
1             M-m                     M-m+M = 2M-m
2             2M-m-m = 2(M-m)         2(M-m)+M = 3M-2m
...
```
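Now the problem is to determine how many chunks there are. At each step the start index increases by `M-m`. The first `m` elements of every chunk after the first repeat the end of the previous chunk, so a chunk only adds something new if its start index `i*(M-m)` is less than `len(l) - m`; that is why `m` is subtracted from the length. The number of chunks is therefore the number of integers `i >= 0` with `i*(M-m) < len(l) - m`, which is `(len(l) - m)/(M - m)` rounded up: the ceiling adds the last, incomplete chunk when the division is not exact. This assumes `M > m`; otherwise the step `M-m` is zero or negative.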
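A small sketch that checks the closed-form count against a direct walk over the start indices 0, `M-m`, `2(M-m)`, ... (both function names are hypothetical):

```python
import math

def chunk_count(n, M, m):
    """Closed form from the explanation: number of chunks for n elements (0 if n <= m)."""
    return max(0, math.ceil((n - m) / (M - m)))

def chunk_count_by_walking(n, M, m):
    """Count start indices while each chunk still adds an element past the overlap."""
    count, start = 0, 0
    while start < n - m:
        count += 1
        start += M - m
    return count

for n, M, m in [(15, 5, 2), (15, 5, 3), (5, 3, 2)]:
    print(n, M, m, chunk_count(n, M, m), chunk_count_by_walking(n, M, m))
# 15 5 2 5 5
# 15 5 3 6 6
# 5 3 2 3 3
```

The three rows correspond to the three examples above (5 chunks, 6 chunks and 3 ranges).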
iterating over an iterable using generator function
Try initializing the tuple inside the `for` loop:
```python
def bunch_together(iterable, n):
    for k in range(0, len(iterable), n):
        tup = tuple()  # fresh tuple for every group
        for i in range(k, k + n):
            tup += (iterable[i] if i < len(iterable) else None,)  # condition to check overflow
        yield tup

for x in bunch_together(range(10), 3):
    print(x)
```

Output:

```
(0, 1, 2)
(3, 4, 5)
(6, 7, 8)
(9, None, None)
```
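For comparison, a sketch of the same padded grouping with `itertools.zip_longest`, in the style of the `grouper` recipe from the `itertools` documentation. Unlike the version above, it also accepts iterables without `len()` or indexing; the `fillvalue` parameter is an addition for illustration.

```python
from itertools import zip_longest

def bunch_together(iterable, n, fillvalue=None):
    # n references to the *same* iterator, so each output tuple takes
    # n consecutive items; the last one is padded with fillvalue.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

for x in bunch_together(range(10), 3):
    print(x)
# (0, 1, 2)
# (3, 4, 5)
# (6, 7, 8)
# (9, None, None)
```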
Split dataframe into relatively even chunks according to length
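You can floor-divide a sequence of row positions, `np.arange(len(test))`, by the chunk size `n` and pass the result to `groupby`. Rows with the same quotient form one group, so the dataframe is split into chunks of `n` rows, with the remainder in the last chunk: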
```python
import numpy as np

n = 400
# `test` is the DataFrame from the question
for g, df in test.groupby(np.arange(len(test)) // n):
    print(df.shape)
# (400, 2)
# (400, 2)
# (311, 2)
```
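A sketch of an equivalent without `groupby`, slicing by position with `iloc`. The 1111-row, two-column `test` frame below is a hypothetical stand-in chosen to reproduce the shapes printed above; it assumes pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame: 1111 rows, 2 columns.
test = pd.DataFrame({"a": np.arange(1111), "b": np.arange(1111) * 2})
n = 400

# Positional slices of n rows each; the last slice holds the remainder.
chunks = [test.iloc[start:start + n] for start in range(0, len(test), n)]
print([chunk.shape for chunk in chunks])  # [(400, 2), (400, 2), (311, 2)]
```

Both approaches keep the original index labels in each chunk; the slicing version just avoids building the helper array.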
How to randomly split a generator into two generators by given ratio?
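This solution doesn't store values. It sets up two generators over the same source and gives both the same random-number stream by starting them from one shared `Random` state. Both use the same cutoff percentage: one yields only the values whose random number falls below it, the other only those at or above it, so every value ends up in exactly one of them. Note that `generator` must be a generator function that produces the same sequence each time it is called, because it is called once per output generator: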
```python
from random import Random

def percentage_generators(generator, percentage):
    def generator_1(state):
        twister = Random()
        twister.setstate(state)
        for value in generator():
            if twister.random() < percentage:
                yield value

    def generator_2(state):
        twister = Random()
        twister.setstate(state)
        for value in generator():
            if twister.random() >= percentage:
                yield value

    state = Random().getstate()
    return [generator_1(state), generator_2(state)]

if __name__ == "__main__":
    def test_generator():
        for n in range(20):
            yield n

    generator1, generator2 = percentage_generators(test_generator, 0.7)

    for number in generator1:
        print(1, number)

    print()

    for number in generator2:
        print(2, number)
```
Output (the split is random, so it differs from run to run):

```
% python3 test.py
1 0
1 1
1 2
1 3
1 6
1 7
1 8
1 10
1 11
1 12
1 13
1 14
1 15
1 17

2 4
2 5
2 9
2 16
2 18
2 19
%
```
The code can probably be shortened by generating the two generator wrappers in a loop, e.g. looping over `operator.lt` and `operator.ge`, or some such.
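A sketch of that suggestion: one inner generator parameterized by a comparison function, created once per operator. It assumes the same contract as the original (a re-callable generator function and a shared starting state).

```python
import operator
from random import Random

def percentage_generators(generator, percentage):
    state = Random().getstate()  # one shared starting state for both streams

    def filtered(keep):
        twister = Random()
        twister.setstate(state)  # replay the same random stream
        for value in generator():
            if keep(twister.random(), percentage):
                yield value

    # operator.lt keeps draws below the cutoff, operator.ge keeps the rest
    return [filtered(op) for op in (operator.lt, operator.ge)]

def test_generator():
    yield from range(20)

generator1, generator2 = percentage_generators(test_generator, 0.7)
first, second = list(generator1), list(generator2)
print(first)
print(second)
```

The exact split is random, but together the two lists always contain each value exactly once.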