Split a Column by Group

Split pandas dataframe based on groupby

gb = df.groupby('ZZ')
[gb.get_group(x) for x in gb.groups]
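
If the pieces need to be looked up by group value later, a dictionary can be more convenient than a list. The following is a minimal sketch; the frame and its value column are made up for illustration, and only the 'ZZ' column name comes from the snippet above.

```python
import pandas as pd

# Hypothetical data; only the 'ZZ' column name is taken from the answer above.
df = pd.DataFrame({"ZZ": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})

# Iterating over a GroupBy object yields (key, sub-frame) pairs,
# so a dict comprehension keeps each piece addressable by its key.
parts = {key: group for key, group in df.groupby("ZZ")}

print(parts["a"])
```

Each value in parts is an ordinary DataFrame holding the rows of one group, with the original index labels preserved.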

How to group by a common value and split into columns based on it in pandas?

Use DataFrame.set_index with DataFrame.unstack and DataFrame.sort_index, then flatten the resulting MultiIndex columns:

df = df.set_index(['Labels','Pattern','Status']).unstack().sort_index(level=1, axis=1)
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()


print(df)
Labels Pattern Count_Checked \
0 Apple Green 3
1 Apple Red 79
2 Grapes Violet 20
3 Orange Bad 28
4 Orange Good 13

Some_Link_Checked \
0 https://example.com/
1 http://www.example.com/
2 http://www.example.com/boundary/boat.aspx
3 https://www.example.org/?afternoon=approval&ba...
4 http://example.com/blow/bag

Some_Link2_Checked Count_Not_Checked \
0 https://example.com/aswq.php 306
1 http://www.example.com/beef/approval 221
2 http://basketball.example.com/bone/bedroom 290
3 https://www.example.com/#apparatus 281
4 https://www.example.com/ 297

Some_Link_Not_Checked \
0 http://www.example.com/www.php
1 http://angle.example.com/
2 http://www.example.com/sssl.php
3 https://example.net/babies/badge?amount=balance
4 https://www.example.com/?baby=brake

Some_Link2_Not_Checked
0 http://www.example.com/believe/bike.php?amount...
1 https://www.example.com/qqa.php
2 https://www.example.org/box/back
3 https://www.example.com/asd.php
4 http://www.example.com/?beef=acoustics
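
The same reshaping can be tried on a small frame. This is only a sketch: the rows below are invented, with a single Count value column, and it assumes every (Labels, Pattern, Status) combination occurs at most once, since unstack raises an error on duplicate index entries.

```python
import pandas as pd

# Invented long-format data: one row per (Labels, Pattern, Status).
df = pd.DataFrame({
    "Labels": ["Apple", "Apple", "Grapes", "Grapes"],
    "Pattern": ["Green", "Green", "Violet", "Violet"],
    "Status": ["Checked", "Not_Checked", "Checked", "Not_Checked"],
    "Count": [3, 306, 20, 290],
})

# Move Status from the row index into the columns, one column per status.
wide = (
    df.set_index(["Labels", "Pattern", "Status"])
      .unstack()
      .sort_index(level=1, axis=1)
)

# Flatten the two-level column labels, e.g. ('Count', 'Checked') -> 'Count_Checked'.
wide.columns = [f"{col}_{status}" for col, status in wide.columns]
wide = wide.reset_index()

print(wide)
```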

Split a data frame by a grouping and remove that group if the value in another column is invariant for a particular string

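With dplyr, we can also group by group and keep only the groups that contain at least one colour other than 'blue':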

library(dplyr)
df %>%
group_by(group) %>%
filter(sum(colour != 'blue') > 0)
# A tibble: 4 x 3
# Groups: group [2]
# ID group colour
# <chr> <chr> <chr>
#1 ID3 Group2 green
#2 ID4 Group3 green
#3 ID5 Group3 blue
#4 ID6 Group3 blue
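
For readers working in pandas rather than R, a roughly equivalent filter might look like the sketch below. The input rows are an assumption reconstructed from the output above (ID1 and ID2 are guessed to form an all-blue Group1); GroupBy.filter keeps every row of each group for which the function returns True.

```python
import pandas as pd

# Assumed input: rows ID3..ID6 match the R output; ID1 and ID2 are guesses.
df = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID3", "ID4", "ID5", "ID6"],
    "group": ["Group1", "Group1", "Group2", "Group3", "Group3", "Group3"],
    "colour": ["blue", "blue", "green", "green", "blue", "blue"],
})

# Keep a group only if at least one of its colours differs from 'blue'.
kept = df.groupby("group").filter(lambda g: g["colour"].ne("blue").any())

print(kept)
```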

How to split a pandas dataframe into many columns after groupby

I agree with @PhillipCloud. I assume that this is probably an intermediate step toward the solution of your problem, and it may be easier to go straight to the thing you really want to solve without the intermediate step.

But if this is what you really want, you can do it using:

>>> df.groupby('time').apply(
...     lambda g: pd.Series(g['data'].values)
... ).rename(columns=lambda x: 'data%s' % x)

data0 data1
time
1 2 2.1
2 3 3.1
3 4 4.1
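
An alternative that avoids apply is to number the rows within each group and pivot on that number. This is a sketch of a different technique from the answer above, not the author's method; the input frame is reconstructed from the output shown, and the helper column name col is made up.

```python
import pandas as pd

# Input reconstructed from the output above: two 'data' values per 'time'.
df = pd.DataFrame({
    "time": [1, 1, 2, 2, 3, 3],
    "data": [2, 2.1, 3, 3.1, 4, 4.1],
})

# cumcount gives each row its position within its group: 0, 1, 0, 1, ...
pos = df.groupby("time").cumcount()

# Build the target column name per row, then pivot to one column per position.
wide = (
    df.assign(col="data" + pos.astype(str))
      .pivot(index="time", columns="col", values="data")
)

print(wide)
```

Groups of unequal length also work here; the missing positions are filled with NaN.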

Do Group by on one column and then split the group on categorical column's 2 specific values and the finally get the First and last records

This should work:

g = df.groupby('Id')['Treatment'].transform(lambda x: (x.eq('Inactive').shift().fillna(0))).cumsum()

ndf = df.loc[g.groupby([df['Id'],g]).transform('count').ne(1)].groupby(['Id',g],as_index=False).nth([0,-1])

ndf.assign(PercentDrop = ndf.groupby(['Id',g])['Score'].pct_change())

Output:

    Id  Score   Dx    EncDate Treatment ProviderName  PercentDrop
0 21 22 F11 2015-02-28 Active Doe, Kim NaN
2 21 9 F11 2015-04-30 Inactive Doe, Kim -0.307692
4 29 25 F72 2015-06-30 Active Lee, Mei NaN
8 29 8 F72 2015-10-31 Inactive Lee, Mei -0.272727
9 29 28 F72 2015-11-30 Active Lee, Mei NaN
12 29 8 F72 2016-02-29 Inactive Lee, Mei -0.466667
13 67 26 F72 2016-03-31 Active Shah, Neha NaN
16 67 10 F72 2016-06-30 Inactive Shah, Neha -0.375000
17 67 24 F72 2016-07-31 Active Shah, Neha NaN
19 67 7 F72 2016-09-30 Inactive Shah, Neha -0.533333
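
The key step above is the grouper g, which starts a new block after every 'Inactive' row. The sketch below isolates that idea on invented data; it uses GroupBy.shift with fill_value in place of the transform and fillna combination, which is a substitution for illustration and not the original code.

```python
import pandas as pd

# Invented data: Id 1 has two Active -> Inactive episodes, Id 2 has one.
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2, 2],
    "Treatment": ["Active", "Inactive", "Active", "Inactive", "Active", "Inactive"],
})

# True on each row that directly follows an 'Inactive' row within the same Id.
after_inactive = (
    df["Treatment"].eq("Inactive")
      .groupby(df["Id"])
      .shift(fill_value=False)
)

# A running sum of those flags gives a block number that increases
# each time a new episode begins.
block = after_inactive.astype(int).cumsum()

print(df.assign(block=block))
```

The block number alone is not unique across Ids, which is why the answer groups by both Id and g.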

Pandas: Group by distinct values of each cell and split column into multiple columns

You're almost there.

cols = ['Department', 'Age', 'Salary']
parts = [df.groupby([col, 'Status']).Count.sum() for col in cols]
df2 = pd.concat(parts).unstack(fill_value=0)

I used pd.concat() instead of repeated append() because, as you pointed out, append() is slow: each call copies the data. (DataFrame.append was also deprecated in pandas 1.4 and removed in 2.0.)

Splitting on Status is easy: just add it to groupby() and then unstack() it at the end to turn it into column labels rather than row labels.
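
To see what this produces, here is a small run on invented data. The column names follow the snippet above, but the rows are assumptions, and the sketch relies on the values of Department, Age and Salary not overlapping, because unstack needs each (value, Status) pair to be unique.

```python
import pandas as pd

# Invented rows; all three category columns hold strings with distinct values.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT"],
    "Age": ["20-30", "30-40", "20-30", "30-40"],
    "Salary": ["Low", "High", "High", "Low"],
    "Status": ["Active", "Left", "Active", "Active"],
    "Count": [5, 2, 3, 4],
})

cols = ["Department", "Age", "Salary"]

# One summed Series per category column, each indexed by (value, Status).
parts = [df.groupby([col, "Status"])["Count"].sum() for col in cols]

# Stack the pieces, then turn Status into columns; missing pairs become 0.
df2 = pd.concat(parts).unstack(fill_value=0)

print(df2)
```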

how to split data in groups by two column conditions pandas

You can use a mask and groupby to split the dataframe:

cond = {'flag_0': 3, 'flag_1': 1}
mask = df[list(cond)].eq(cond).all(1)

groups = [g for k,g in df[mask].groupby((~mask).cumsum())]

output:

[    flag_0  flag_1  dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1,
flag_0 flag_1 dd
14 3 1 7]
groups[0]

flag_0 flag_1 dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1
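
The idea is that (~mask).cumsum() stays constant along each unbroken run of matching rows and increases whenever a non-matching row interrupts it. A small self-contained sketch on invented data follows; it compares against pd.Series(cond) so the column alignment is explicit, and restricts the key to the matching rows before grouping.

```python
import pandas as pd

# Invented data: rows 1-2 and row 4 satisfy both conditions.
df = pd.DataFrame({
    "flag_0": [1, 3, 3, 0, 3],
    "flag_1": [1, 1, 1, 1, 1],
    "dd": [5, 8, 1, 2, 7],
})

cond = {"flag_0": 3, "flag_1": 1}

# True where every listed column equals its required value.
mask = df[list(cond)].eq(pd.Series(cond)).all(axis=1)

# The counter of non-matching rows is constant within each run of matches.
run_id = (~mask).cumsum()[mask]

# One DataFrame per consecutive run of matching rows.
groups = [g for _, g in df[mask].groupby(run_id)]

print(len(groups))
print(groups[0])
```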

How to ensure the setNames() attributes the correct names to my group_split() datasets when splitting by multiple groups?

Instead of splitting by the two columns, split by the factor column that was created (type_factor), so the pieces come back in the order of its levels. Naming the pieces with unique on type_factor can go wrong, because unique returns values in order of first appearance, which may differ from the level order. levels is the safer choice, and it may be worth calling droplevels first so that unused levels do not produce extra names.

the_test <- example %>%
group_split(type_factor) %>%
setNames(levels(example$type_factor))

group_split returns an unnamed list. To avoid the risk of assigning names incorrectly, use split from base R, which returns a named list. The pieces can then come back in any order, because each name stays attached to its data.

# 1 - return in a different order based on alphabetic order
split(example, example[c("even_or_odd", "prime_or_not")], drop = TRUE)

# 2 - return order based on the levels of the factor column
split(example, example$type_factor)
# 3 - With dplyr pipe
example %>%
split(.$type_factor)
# 4 - or using magrittr exposition operator
library(magrittr)
example %$%
split(x = ., f = type_factor)

