Split Data.Frame into Groups by Column Name

Split pandas dataframe based on groupby

gb = df.groupby('ZZ')
# one sub-DataFrame per distinct value in column 'ZZ'
[gb.get_group(x) for x in gb.groups]
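
As a small sketch of the same idea on made-up data (only the column name 'ZZ' comes from the snippet above; the values and the name `parts` are hypothetical): iterating over the GroupBy object yields (key, sub-frame) pairs directly, so a dict keyed by group value can be handier than a bare list.

```python
import pandas as pd

# Hypothetical data; only the column name 'ZZ' comes from the snippet above.
df = pd.DataFrame({"ZZ": ["a", "b", "a", "c"], "val": [1, 2, 3, 4]})

# Iterating a GroupBy yields (key, sub-DataFrame) pairs.
parts = {key: group for key, group in df.groupby("ZZ")}

print(sorted(parts))   # the distinct group keys
print(parts["a"])      # rows where ZZ == 'a', original index kept
```

Each sub-frame keeps the row labels of the original frame, which makes it easy to trace rows back to their source.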

Split a dataframe into smaller dataframes by the values in a column

Assume this is your dataframe:

   Name  price
0   aal      1
1   aal      2
2   aal      3
3   aal      4
4   aal      5
5   aal      6
6   bll      7
7   bll      8
8   bll      9
9   bll      8
10  dll      7
11  dll     56
12  dll      4
13  dll      3
14  dll      3
15  dll      5

Then do the following:

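# use separate loop names so the original df is not overwritten
for name, group in df.groupby('Name'):
    group.to_csv(f"Price_{name}.csv", sep=";")

This saves each sub-dataframe to its own CSV file (Price_aal.csv, Price_bll.csv, Price_dll.csv).
To see what the loop iterates over: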

for name, group in df.groupby('Name'):
    print(group)

which prints:

  Name  price
0  aal      1
1  aal      2
2  aal      3
3  aal      4
4  aal      5
5  aal      6
  Name  price
6  bll      7
7  bll      8
8  bll      9
9  bll      8
   Name  price
10  dll      7
11  dll     56
12  dll      4
13  dll      3
14  dll      3
15  dll      5

If you need to reset the index of every sub-dataframe, do this:

for name, group in df.groupby('Name'):
    gf = group.reset_index()  # the old index is kept as an 'index' column
    print(gf)

which gives:

   index Name  price
0      0  aal      1
1      1  aal      2
2      2  aal      3
3      3  aal      4
4      4  aal      5
5      5  aal      6
   index Name  price
0      6  bll      7
1      7  bll      8
2      8  bll      9
3      9  bll      8
   index Name  price
0     10  dll      7
1     11  dll     56
2     12  dll      4
3     13  dll      3
4     14  dll      3
5     15  dll      5
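
The output above keeps the old row labels as an extra 'index' column. If that column is not wanted, reset_index(drop=True) discards it. A minimal sketch on a shortened, made-up version of the frame, collecting the results in a dict (the name `frames` is just illustrative):

```python
import pandas as pd

# Shortened, hypothetical version of the example frame.
df = pd.DataFrame({"Name": ["aal", "aal", "bll", "dll", "dll"],
                   "price": [1, 2, 7, 3, 5]})

# drop=True discards the old index instead of adding it as an 'index' column.
frames = {name: group.reset_index(drop=True)
          for name, group in df.groupby("Name")}

print(frames["dll"])   # index restarts at 0, columns are only Name and price
```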

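How to split data into groups by two column conditions in pandas

You can use a mask and groupby to split the dataframe. The mask marks the rows where both conditions hold. (~mask).cumsum() increases by one at every non-matching row, so each consecutive run of matching rows shares one key and becomes its own group: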

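cond = {'flag_0': 3, 'flag_1': 1}
# True where every column listed in cond equals its target value
mask = df[list(cond)].eq(cond).all(axis=1)

# one DataFrame per consecutive run of matching rows
groups = [g for _, g in df[mask].groupby((~mask).cumsum())]

Output: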

[    flag_0  flag_1  dd
 8        3       1   8
 9        3       1   1
 10       3       1   1
 11       3       1   1,
     flag_0  flag_1  dd
 14       3       1   7]

groups[0]

    flag_0  flag_1  dd
8        3       1   8
9        3       1   1
10       3       1   1
11       3       1   1
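
To make the run-numbering step concrete, here is a sketch on a small made-up frame (the data and the name `run_id` are hypothetical, and the conditions are passed as a pandas Series so they align with the columns by label). It shows the intermediate key that groupby receives.

```python
import pandas as pd

# Hypothetical data: rows 1-2 and row 4 satisfy both conditions.
df = pd.DataFrame({"flag_0": [0, 3, 3, 3, 3],
                   "flag_1": [1, 1, 1, 0, 1],
                   "dd":     [5, 8, 1, 2, 7]})

cond = pd.Series({"flag_0": 3, "flag_1": 1})
mask = df[cond.index].eq(cond).all(axis=1)

# The counter increases at every non-matching row, so consecutive
# matching rows share one value.
run_id = (~mask).cumsum()

groups = [g for _, g in df[mask].groupby(run_id)]

print(run_id.tolist())   # [1, 1, 1, 2, 2]
print(len(groups))       # 2
```

Row 3 fails the flag_1 condition, which bumps the counter and separates rows 1-2 from row 4.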

How do I split a data frame into groups of a fixed size?

This will give you a list of DataFrames:

lst = [df.iloc[i:i + group_size] for i in range(0, len(df) - group_size + 1, group_size)]

It only uses positional slicing with iloc, so it should be pretty fast. The adjusted stop index in range() discards the last frame if it has fewer than group_size rows. You can also write that step out explicitly:

lst = [df.iloc[i:i + group_size] for i in range(0, len(df), group_size)]
# drop a short trailing chunk; the `lst and` guard covers an empty df
if lst and len(lst[-1]) < group_size:
    lst.pop()
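
Wrapped in a function, the same approach might look like the sketch below. The function name `split_fixed` and the `keep_remainder` flag are illustrative additions, not part of the original answer.

```python
import pandas as pd

def split_fixed(df, group_size, keep_remainder=False):
    """Split df into consecutive chunks of group_size rows."""
    chunks = [df.iloc[i:i + group_size]
              for i in range(0, len(df), group_size)]
    # Drop a short trailing chunk unless the caller asked to keep it.
    if chunks and not keep_remainder and len(chunks[-1]) < group_size:
        chunks.pop()
    return chunks

df = pd.DataFrame({"x": range(7)})       # 7 rows, hypothetical data
full = split_fixed(df, 3)                # two chunks of 3 rows
with_tail = split_fixed(df, 3, keep_remainder=True)  # plus a 1-row chunk

print([len(c) for c in full])
print([len(c) for c in with_tail])
```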

How to split dataframe into multiple dataframes based on column-name?

Use wide_to_long to reshape the original DataFrame into long format first, then aggregate with mean:

cols = ['total_tracks']
df1 = (pd.wide_to_long(df,
                       stubnames=['t_dur', 't_dance'],
                       i=cols,
                       j='tmp')
         .reset_index()
         .drop(columns='tmp')   # positional axis (drop('tmp', 1)) fails in pandas 2.x
         .groupby(cols, as_index=False)
         .mean())

print(df1)
   total_tracks          t_dur   t_dance
0             4  293071.000000  0.563667
1             8  157071.666667  0.886333
2            12  213577.666667  0.663000
3            17  216151.000000  0.766333
4            59  146673.333333  0.283667

Details, showing the long-format frame that wide_to_long produces before the groupby:

cols = ['total_tracks']
print(pd.wide_to_long(df, 
                     stubnames=['t_dur','t_dance'], 
                     i=cols,
                     j='tmp'))

                     t_dur  t_dance
total_tracks tmp                   
4            0    292720.0    0.549
12           0    213760.0    0.871
59           0    157124.0    0.289
8            0    127896.0    0.886
17           0    210320.0    0.724
4            1    293760.0    0.623
12           1    181000.0    0.702
59           1    130446.0    0.328
8            1    176351.0    0.947
17           1    226253.0    0.791
4            2    292733.0    0.519
12           2    245973.0    0.416
59           2    152450.0    0.234
8            2    166968.0    0.826
17           2    211880.0    0.784
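
The input frame is not shown in the answer, so here is a self-contained sketch on made-up data. It assumes the wide columns are named as stub plus a numeric suffix (t_dur0, t_dur1, t_dance0, t_dance1), which is the layout wide_to_long matches with its default separator; those column names and all values are hypothetical.

```python
import pandas as pd

# Hypothetical wide frame: one row per total_tracks value,
# repeated measurements spread across numbered columns.
df = pd.DataFrame({"total_tracks": [4, 8],
                   "t_dur0": [100.0, 200.0],
                   "t_dur1": [300.0, 400.0],
                   "t_dance0": [0.5, 0.1],
                   "t_dance1": [0.7, 0.3]})

cols = ["total_tracks"]
long_df = pd.wide_to_long(df, stubnames=["t_dur", "t_dance"], i=cols, j="tmp")

# Average the repeated measurements per total_tracks value.
df1 = (long_df.reset_index()
              .drop(columns="tmp")
              .groupby(cols, as_index=False)
              .mean())

print(long_df)
print(df1)
```

Note that the column passed as i has to identify each row uniquely, which is why the made-up frame has one row per total_tracks value.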

How to split up a column of a dataframe into new columns in R?

With tidyverse, we could start a new group every time "c" appears in the x column, then pivot the data wide. Duplicate column names are generally discouraged, so I created sequential column names (c_1, c_2, c_3) instead.

library(tidyverse)

results <- df %>% 
  group_by(idx = cumsum(x == "c")) %>% 
  filter(x != "c") %>% 
  mutate(rn = row_number()) %>% 
  pivot_wider(names_from = idx, values_from = x, names_prefix = "c_") %>% 
  select(-rn)

Output

  c_1   c_2   c_3  
  <chr> <chr> <chr>
1 a     b     d    
2 a     b     d    
3 a     b     d    
4 a     b     d

However, if you really want duplicate names, then we could add on set_names:

purrr::set_names(results, "c")

  c     c     c    
  <chr> <chr> <chr>
1 a     b     d    
2 a     b     d    
3 a     b     d    
4 a     b     d

Or in base R, we could create the grouping with cumsum, split the data frame by those groups, and bind the pieces back together with cbind. Finally, we remove the first row, which contains the "c" header values.

names(df) <- "c"
do.call(cbind, split(df, cumsum(df$c == "c")))[-1,]

#  c c c
#2 a b d
#3 a b d
#4 a b d
#5 a b d