Split Dataframe into Relatively Even Chunks According to Length

You can floor-divide a range running up to the number of rows in the dataframe by the chunk size, and use the result as a groupby key to split the dataframe into equally sized chunks:

import numpy as np

n = 400  # rows per chunk
for g, df in test.groupby(np.arange(len(test)) // n):
    print(df.shape)
# (400, 2)
# (400, 2)
# (311, 2)

Pandas - Slice large dataframe into chunks

You can use a list comprehension to split your dataframe into smaller dataframes held in a list.

n = 200000  #chunk row size
list_df = [df[i:i+n] for i in range(0,df.shape[0],n)]

Or use numpy's array_split; note that its second argument is the number of chunks to produce, not the rows per chunk:

list_df = np.array_split(df, n)
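If you want array_split to respect a maximum chunk size instead of a chunk count, one sketch (assuming the df and the chunk row size n defined above; math.ceil and the chunks name are my own) is:

import math
import numpy as np

# Derive the number of chunks from the desired maximum rows per chunk
chunks = np.array_split(df, math.ceil(len(df) / n))   # each chunk has at most n rows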

You can access the chunks with:

list_df[0]
list_df[1]
etc...

Then you can assemble them back into a single dataframe using pd.concat, as sketched below.
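A minimal sketch of the reassembly, assuming list_df holds the chunks built above; ignore_index=True is optional and simply rebuilds a clean RangeIndex:

import pandas as pd

df_restored = pd.concat(list_df, ignore_index=True)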

By AcctName

list_df = []

for n, g in df.groupby('AcctName'):
    list_df.append(g)

Split a pandas dataframe every 5 rows

Use floor division on the index to create your groups, then use DataFrame.groupby to create the separate dataframes:

grps = df.groupby(df.index // 5)

for _, dfg in grps:
    print(dfg)

  COLUMN_Y
0   value1
1   value2
2   value3
3   value4
4   value5

  COLUMN_Y
5   value6
6   value7
7   value8
8   value9
9  value10

   COLUMN_Y
10  value11
11  value12
12  value13
13  value14
14  value15

   COLUMN_Y
15  value16
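Note that df.index // 5 assumes the default RangeIndex; if the frame has been filtered or reordered, keying on row position is safer. A small sketch (the 16-row frame and the shuffle are purely illustrative):

import numpy as np
import pandas as pd

df = pd.DataFrame({'COLUMN_Y': [f'value{i + 1}' for i in range(16)]})
df = df.sample(frac=1, random_state=0)        # scrambled index to make the point

# Key on row position, not index labels, so every chunk still has 5 rows
for _, dfg in df.groupby(np.arange(len(df)) // 5):
    print(dfg)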

How can I evenly split up a pandas.DataFrame into n-groups?

Use np.array_split to break it up into a list of "evenly" sized DataFrames. You can also shuffle before splitting by sampling the full DataFrame (the commented-out line below).

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(24).reshape(-1,2), columns=['A', 'B'])
N = 5

np.array_split(df, N)
#np.array_split(df.sample(frac=1), N) # Shuffle and split


[   A  B
 0  0  1
 1  2  3
 2  4  5,
     A   B
 3   6   7
 4   8   9
 5  10  11,
     A   B
 6  12  13
 7  14  15,
     A   B
 8  16  17
 9  18  19,
      A   B
 10  20  21
 11  22  23]
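One caveat if you do shuffle: df.sample(frac=1) keeps the original index labels on every chunk. A sketch that also renumbers the rows (same df and N as above; random_state=42 is only for reproducibility):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(-1, 2), columns=['A', 'B'])
N = 5

# Shuffle, reset the index, then split into N roughly even pieces
parts = np.array_split(df.sample(frac=1, random_state=42).reset_index(drop=True), N)
print([len(p) for p in parts])   # [3, 3, 2, 2, 2]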

Split one dataframe to multiple with maximum n rows for each in Python

One way using pandas.DataFrame.groupby:

n = 10
[d for _, d in df.groupby(df.index//n)]

Output:

[          a         b         c
 0  0.897134 -0.356157 -0.396212
 1 -2.357861  2.066570 -0.512687
 2 -0.080665  0.719328  0.604294
 3 -0.639392 -0.912989 -1.029892
 4 -0.550007 -0.633733 -0.748733
 5 -0.712962 -1.612912 -0.248270
 6 -0.571474  1.310807 -0.271137
 7 -0.228068  0.675771  0.433016
 8  0.005606 -0.154633  0.985484
 9  0.691329 -0.837302 -0.607225,
            a         b         c
 10 -0.011909 -0.304162  0.422001
 11  0.127570  0.956831  1.837523
 12 -1.074771  0.379723 -1.889117
 13 -1.449475 -0.799574 -0.878192
 14 -1.029757  0.551023  2.519929
 15 -1.001400  0.838614 -1.006977
 16  0.677216 -0.403859  0.451338
 17  0.221596 -0.323259  0.324158
 18 -0.241935 -2.251687 -0.088494
 19 -0.995426  0.665569 -2.228848,
            a         b         c
 20  1.714709 -0.353391  0.671539
 21  0.155050  1.136433 -0.005721
 22 -0.502412 -0.610901  1.520165
 23 -0.853906  0.648321  1.124464
 24  1.149151 -0.187300 -0.412946
 25  0.329229 -1.690569 -2.746895]

Split dataframe to sub dataframes and fill content according to the relevant dataframe?

You can create a dict from the groups of x produced by .groupby('id'), as follows:

x_df_dict = {a: b for a, b in df.groupby('id')['x']}

Then, you can access the sub-dataframes (more accurately sub-Series) of x by id, as follows:

print(x_df_dict[1])

0    A
1    B
2    C
3    D
4    E
Name: x, dtype: object

print(x_df_dict[2])

5    A
6    D
7    E
8    F
9    Z
Name: x, dtype: object
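If you need the full sub-dataframes rather than just the x column, drop the column selection; a sketch (the frame below is reconstructed from the output above, and sub_df_dict is a name of my choosing):

import pandas as pd

# Reconstructed from the output above: ten rows across two ids
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'x':  ['A', 'B', 'C', 'D', 'E', 'A', 'D', 'E', 'F', 'Z'],
})

# Dict of id -> full sub-dataframe (every column kept, not just x)
sub_df_dict = {key: group for key, group in df.groupby('id')}
print(sub_df_dict[2])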


