Split a Column by Group

Split pandas dataframe based on groupby

gb = df.groupby('ZZ')
[gb.get_group(x) for x in gb.groups]
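If the group keys should stay attached to the pieces, iterating over the groupby object directly gives (key, sub-frame) pairs. A minimal sketch, assuming a small made-up frame; only the column name 'ZZ' comes from the snippet above, and the names `pieces` and `frames` are illustrative:

```python
import pandas as pd

# Hypothetical sample data; only the column name 'ZZ' comes from the answer above.
df = pd.DataFrame({"ZZ": ["a", "b", "a", "c"], "val": [1, 2, 3, 4]})

# Iterating a GroupBy yields (key, sub-frame) pairs, so no get_group() call is needed.
pieces = {key: group for key, group in df.groupby("ZZ")}

# The same split as a plain list, in sorted key order.
frames = [group for _, group in df.groupby("ZZ")]
```

pieces["a"] then holds the rows where ZZ == "a", with their original index labels.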

How to group by a common value and split into columns based on it in pandas?

Use DataFrame.set_index with DataFrame.unstack and DataFrame.sort_index, then flatten the resulting MultiIndex columns:

df = df.set_index(['Labels','Pattern','Status']).unstack().sort_index(level=1, axis=1)
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()

print(df)
   Labels Pattern  Count_Checked  \
0   Apple   Green              3   
1   Apple     Red             79   
2  Grapes  Violet             20   
3  Orange     Bad             28   
4  Orange    Good             13   

                                   Some_Link_Checked  \
0                               https://example.com/   
1                            http://www.example.com/   
2          http://www.example.com/boundary/boat.aspx   
3  https://www.example.org/?afternoon=approval&ba...   
4                        http://example.com/blow/bag   

                           Some_Link2_Checked  Count_Not_Checked  \
0                https://example.com/aswq.php                306   
1        http://www.example.com/beef/approval                221   
2  http://basketball.example.com/bone/bedroom                290   
3          https://www.example.com/#apparatus                281   
4                    https://www.example.com/                297   

                             Some_Link_Not_Checked  \
0                   http://www.example.com/www.php   
1                        http://angle.example.com/   
2                  http://www.example.com/sssl.php   
3  https://example.net/babies/badge?amount=balance   
4              https://www.example.com/?baby=brake   

                              Some_Link2_Not_Checked  
0  http://www.example.com/believe/bike.php?amount...  
1                    https://www.example.com/qqa.php  
2                   https://www.example.org/box/back  
3                    https://www.example.com/asd.php  
4             http://www.example.com/?beef=acoustics
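The reshaping is easier to follow on a reduced frame. A minimal sketch, assuming a made-up miniature of the data with a single value column ('Count'); the values and the name `wide` are illustrative:

```python
import pandas as pd

# Hypothetical miniature of the question's data: one row per (Labels, Pattern, Status).
df = pd.DataFrame({
    "Labels": ["Apple", "Apple", "Grapes", "Grapes"],
    "Pattern": ["Green", "Green", "Violet", "Violet"],
    "Status": ["Checked", "Not_Checked", "Checked", "Not_Checked"],
    "Count": [3, 306, 20, 290],
})

# Move Status from the row index into the columns -> MultiIndex columns (Count, Status).
wide = df.set_index(["Labels", "Pattern", "Status"]).unstack()

# Join the two column levels into single names such as 'Count_Checked'.
wide.columns = [f"{value}_{status}" for value, status in wide.columns]
wide = wide.reset_index()
```

Note that unstack() raises a ValueError if a (Labels, Pattern, Status) combination occurs more than once; in that case an aggregating reshape such as pivot_table would be needed instead.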

Split a data frame by a grouping and remove that group if the value in another column is invariant for a particular string

We can also filter within each group, keeping only the groups that contain at least one colour other than 'blue':

library(dplyr)
df %>%
    group_by(group)  %>%
    filter(sum(colour != 'blue') > 0)
# A tibble: 4 x 3
# Groups:   group [2]
#  ID    group  colour
#  <chr> <chr>  <chr> 
#1 ID3   Group2 green 
#2 ID4   Group3 green 
#3 ID5   Group3 blue  
#4 ID6   Group3 blue

How to split a pandas dataframe into many columns after groupby

I agree with @PhillipCloud. I assume this is an intermediate step toward the solution of your problem, and it may be easier to go straight to the thing you really want to solve without the intermediate step.

But if this is what you really want, you can do it using:

>>> df.groupby('time').apply(
        lambda g: pd.Series(g['data'].values)
    ).rename(columns=lambda x: 'data%s' % x)

      data0  data1
time              
1         2    2.1
2         3    3.1
3         4    4.1
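An alternative that avoids apply() is to number the rows within each group and pivot on that number. This is a swapped-in technique, not the method from the answer above; a minimal sketch, assuming made-up data shaped like the output shown (the names `pos` and `wide` are illustrative):

```python
import pandas as pd

# Hypothetical long-format data shaped like the output shown above.
df = pd.DataFrame({"time": [1, 1, 2, 2, 3, 3],
                   "data": [2.0, 2.1, 3.0, 3.1, 4.0, 4.1]})

# Number the rows within each 'time' group: 0, 1, ...
pos = df.groupby("time").cumcount()

# Use (time, position) as the index and pivot the position into columns.
wide = df.set_index(["time", pos])["data"].unstack().add_prefix("data")
```

Groups of unequal size are handled too: the shorter groups get NaN in the columns they do not fill.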

Do Group by on one column and then split the group on categorical column's 2 specific values and then finally get the First and last records

This should work:

g = df.groupby('Id')['Treatment'].transform(lambda x: (x.eq('Inactive').shift().fillna(0))).cumsum()

ndf = df.loc[g.groupby([df['Id'],g]).transform('count').ne(1)].groupby(['Id',g],as_index=False).nth([0,-1])

ndf.assign(PercentDrop = ndf.groupby(['Id',g])['Score'].pct_change())

Output:

    Id  Score   Dx    EncDate Treatment ProviderName  PercentDrop
0   21     22  F11 2015-02-28    Active     Doe, Kim          NaN
2   21      9  F11 2015-04-30  Inactive     Doe, Kim    -0.307692
4   29     25  F72 2015-06-30    Active     Lee, Mei          NaN
8   29      8  F72 2015-10-31  Inactive     Lee, Mei    -0.272727
9   29     28  F72 2015-11-30    Active     Lee, Mei          NaN
12  29      8  F72 2016-02-29  Inactive     Lee, Mei    -0.466667
13  67     26  F72 2016-03-31    Active   Shah, Neha          NaN
16  67     10  F72 2016-06-30  Inactive   Shah, Neha    -0.375000
17  67     24  F72 2016-07-31    Active   Shah, Neha          NaN
19  67      7  F72 2016-09-30  Inactive   Shah, Neha    -0.533333

Pandas: Group by distinct values of each cell and split column into multiple columns

You're almost there.

cols = ['Department', 'Age', 'Salary']
parts = [df.groupby([col, 'Status']).Count.sum() for col in cols]
df2 = pd.concat(parts).unstack(fill_value=0)

I used pd.concat() instead of repeated append() because, as you pointed out, append() is slow: every call copies the data accumulated so far.

Splitting on Status is easy: just add it to groupby() and then unstack() it at the end to turn it into column labels rather than row labels.
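For reference, the snippet can be run end to end on a toy frame. A minimal sketch; the column names follow the answer, while the values are made up for illustration:

```python
import pandas as pd

# Hypothetical data; the column names follow the answer above, the values are made up.
df = pd.DataFrame({
    "Department": ["HR", "HR", "IT"],
    "Age": ["20-30", "30-40", "20-30"],
    "Salary": ["Low", "High", "High"],
    "Status": ["Active", "Inactive", "Active"],
    "Count": [5, 2, 7],
})

cols = ["Department", "Age", "Salary"]
# One Series per column, each indexed by (value, Status).
parts = [df.groupby([col, "Status"])["Count"].sum() for col in cols]
# Stack the Series vertically, then move Status into the columns.
df2 = pd.concat(parts).unstack(fill_value=0)
```

One caveat, stated as an assumption about the data: if two of the grouped columns share a value, the concatenated index contains duplicates and unstack() fails. Passing keys=cols to pd.concat() should keep the columns apart by adding an outer index level.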

how to split data in groups by two column conditions pandas

You can use a mask and groupby to split the dataframe:

cond = {'flag_0': 3, 'flag_1': 1}
mask = df[list(cond)].eq(cond).all(1)

groups = [g for k,g in df[mask].groupby((~mask).cumsum())]

output:

[    flag_0  flag_1  dd
 8        3       1   8
 9        3       1   1
 10       3       1   1
 11       3       1   1,
     flag_0  flag_1  dd
 14       3       1   7]

groups[0]

    flag_0  flag_1  dd
8        3       1   8
9        3       1   1
10       3       1   1
11       3       1   1
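The trick is that (~mask).cumsum() only increases on rows that fail the condition, so each run of consecutive matching rows shares one id. A minimal sketch on made-up data (the values and the name `run_id` are illustrative; wrapping the dict in a Series is a choice made here to make the column alignment explicit):

```python
import pandas as pd

# Hypothetical data: rows 1-2 and row 4 satisfy flag_0 == 3 and flag_1 == 1.
df = pd.DataFrame({"flag_0": [0, 3, 3, 1, 3],
                   "flag_1": [1, 1, 1, 1, 1],
                   "dd":     [5, 8, 1, 2, 7]})

cond = {"flag_0": 3, "flag_1": 1}
# True where every listed column equals its target value.
mask = df[list(cond)].eq(pd.Series(cond)).all(axis=1)

# The counter steps up on each non-matching row, so consecutive matches share an id.
run_id = (~mask).cumsum()
groups = [g for _, g in df[mask].groupby(run_id)]
```

Here groups holds two frames: one with index labels 1 and 2, and one with index label 4.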

How to ensure the setNames() attributes the correct names to my group_split() datasets when splitting by multiple groups?

Instead of splitting by the two columns, split by the factor column that was created (type_factor); group_split then returns the pieces in the order of the factor's levels. Naming the pieces with unique on type_factor can go wrong, because unique returns values in order of first appearance in the data, which need not match the level order. levels is the safer choice. If the factor may contain unused levels, it may be more appropriate to apply droplevels as well.

the_test <- example %>%
  group_split(type_factor) %>% 
  setNames(levels(example$type_factor))

group_split returns an unnamed list. To avoid the pain of assigning names incorrectly, use split from base R, which does return a named list. The pieces can then come back in any order, since each name stays paired with its data.

# 1 - return in a different order based on alphabetic order
split(example, example[c("even_or_odd", "prime_or_not")], drop = TRUE)

# 2 - return order based on the levels of the factor column
split(example, example$type_factor)
# 3 - With dplyr pipe
example %>% 
       split(.$type_factor)
# 4 - or using magrittr exposition operator
library(magrittr)
example %$%
      split(x = ., f = type_factor)