Split pandas dataframe based on groupby
gb = df.groupby('ZZ')
[gb.get_group(x) for x in gb.groups]
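As a minimal sketch of the same idea, the pieces can also be kept in a dict keyed by group value, which makes each sub-frame easy to look up. The data and the `val` column here are made up for illustration; only the column name 'ZZ' comes from the snippet above.

```python
import pandas as pd

# Hypothetical data; only the column name 'ZZ' is taken from the answer.
df = pd.DataFrame({"ZZ": ["a", "b", "a", "c"], "val": [1, 2, 3, 4]})

# Iterating over a GroupBy yields (key, sub-frame) pairs, so a dict
# comprehension keeps every piece addressable by its group key.
pieces = {key: sub for key, sub in df.groupby("ZZ")}

print(sorted(pieces))
print(pieces["a"])
```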
How to group by a common value and split into columns based on it in pandas?
Use DataFrame.set_index with DataFrame.unstack and DataFrame.sort_index, then flatten the MultiIndex:
df = df.set_index(['Labels','Pattern','Status']).unstack().sort_index(level=1, axis=1)
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()
print (df)
Labels Pattern Count_Checked \
0 Apple Green 3
1 Apple Red 79
2 Grapes Violet 20
3 Orange Bad 28
4 Orange Good 13
Some_Link_Checked \
0 https://example.com/
1 http://www.example.com/
2 http://www.example.com/boundary/boat.aspx
3 https://www.example.org/?afternoon=approval&ba...
4 http://example.com/blow/bag
Some_Link2_Checked Count_Not_Checked \
0 https://example.com/aswq.php 306
1 http://www.example.com/beef/approval 221
2 http://basketball.example.com/bone/bedroom 290
3 https://www.example.com/#apparatus 281
4 https://www.example.com/ 297
Some_Link_Not_Checked \
0 http://www.example.com/www.php
1 http://angle.example.com/
2 http://www.example.com/sssl.php
3 https://example.net/babies/badge?amount=balance
4 https://www.example.com/?baby=brake
Some_Link2_Not_Checked
0 http://www.example.com/believe/bike.php?amount...
1 https://www.example.com/qqa.php
2 https://www.example.org/box/back
3 https://www.example.com/asd.php
4 http://www.example.com/?beef=acoustics
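The wide output above is hard to read, so here is a small self-contained sketch of the same set_index / unstack / flatten steps. The toy frame is an assumption with a single value column, loosely modeled on the data in the question.

```python
import pandas as pd

# Toy input (assumed): one row per Labels/Pattern/Status combination.
df = pd.DataFrame({
    "Labels": ["Apple", "Apple", "Orange", "Orange"],
    "Pattern": ["Green", "Green", "Good", "Good"],
    "Status": ["Checked", "Not_Checked", "Checked", "Not_Checked"],
    "Count": [3, 306, 13, 297],
})

# Move Status from the row index into the columns.
wide = df.set_index(["Labels", "Pattern", "Status"]).unstack()
# Order the columns by the Status level.
wide = wide.sort_index(level=1, axis=1)
# Flatten ('Count', 'Checked') into 'Count_Checked'.
wide.columns = wide.columns.map(lambda x: f"{x[0]}_{x[1]}")
wide = wide.reset_index()

print(wide)
```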
Split a data frame by a grouping and remove that group if the value in another column is invariant for a particular string
We can also do
library(dplyr)
df %>%
group_by(group) %>%
filter(sum(colour != 'blue') > 0)
# A tibble: 4 x 3
# Groups: group [2]
# ID group colour
# <chr> <chr> <chr>
#1 ID3 Group2 green
#2 ID4 Group3 green
#3 ID5 Group3 blue
#4 ID6 Group3 blue
How to split a pandas dataframe into many columns after groupby
I agree with @PhillipCloud. I assume that this is probably some intermediate step toward the solution of your problem, but maybe it's easier to just go straight to the thing you really want to solve without the intermediate step.
But if this is what you really want, you can do it using:
>>> df.groupby('time').apply(
lambda g: pd.Series(g['data'].values)
).rename(columns=lambda x: 'data%s' % x)
data0 data1
time
1 2 2.1
2 3 3.1
3 4 4.1
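A possible alternative, not from the answer above, is to number the rows inside each group with cumcount() and then pivot on that counter. This sketch assumes the same 'time' and 'data' column names; the helper column `pos` is hypothetical. Groups of unequal size would simply get NaN in the missing positions.

```python
import pandas as pd

# Sample values mirror the output shown above.
df = pd.DataFrame({
    "time": [1, 1, 2, 2, 3, 3],
    "data": [2.0, 2.1, 3.0, 3.1, 4.0, 4.1],
})

# Position of each row within its 'time' group: 0, 1, 0, 1, ...
pos = df.groupby("time").cumcount()

# Pivot so each position becomes its own column, then name the columns.
wide = (
    df.assign(pos=pos)
      .pivot(index="time", columns="pos", values="data")
      .add_prefix("data")
)
wide.columns.name = None  # drop the leftover 'pos' label

print(wide)
```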
Do Group by on one column and then split the group on categorical column's 2 specific values and then finally get the First and last records
This should work:
g = df.groupby('Id')['Treatment'].transform(lambda x: (x.eq('Inactive').shift().fillna(0))).cumsum()
ndf = df.loc[g.groupby([df['Id'],g]).transform('count').ne(1)].groupby(['Id',g],as_index=False).nth([0,-1])
ndf.assign(PercentDrop = ndf.groupby(['Id',g])['Score'].pct_change())
Output:
Id Score Dx EncDate Treatment ProviderName PercentDrop
0 21 22 F11 2015-02-28 Active Doe, Kim NaN
2 21 9 F11 2015-04-30 Inactive Doe, Kim -0.307692
4 29 25 F72 2015-06-30 Active Lee, Mei NaN
8 29 8 F72 2015-10-31 Inactive Lee, Mei -0.272727
9 29 28 F72 2015-11-30 Active Lee, Mei NaN
12 29 8 F72 2016-02-29 Inactive Lee, Mei -0.466667
13 67 26 F72 2016-03-31 Active Shah, Neha NaN
16 67 10 F72 2016-06-30 Inactive Shah, Neha -0.375000
17 67 24 F72 2016-07-31 Active Shah, Neha NaN
19 67 7 F72 2016-09-30 Inactive Shah, Neha -0.533333
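The key trick in the one-liner is the shifted 'Inactive' flag: a new episode starts on the row after each 'Inactive'. The sketch below spells that step out on a reduced, assumed data set and computes the first-to-last change per episode directly. The names `episode`, `first_score`, `last_score` and `change` are hypothetical, and this is a simplified variant rather than the exact logic of the answer.

```python
import pandas as pd

# Reduced toy data (assumed), using the column names from the answer.
df = pd.DataFrame({
    "Id": [21, 21, 21, 29, 29, 29, 29],
    "Treatment": ["Active", "Active", "Inactive",
                  "Active", "Inactive", "Active", "Inactive"],
    "Score": [22, 13, 9, 25, 8, 28, 8],
})

# True on rows that close an episode.
ended = df["Treatment"].eq("Inactive")
# Within each Id, flag the row that follows a closing row.
prev_ended = ended.groupby(df["Id"]).shift(fill_value=False)
# A running count of those flags labels the episodes per Id.
episode = prev_ended.astype(int).groupby(df["Id"]).cumsum().rename("episode")

# First and last score of each (Id, episode) block, and the relative change.
summary = df.groupby(["Id", episode]).agg(
    first_score=("Score", "first"),
    last_score=("Score", "last"),
)
summary["change"] = summary["last_score"] / summary["first_score"] - 1

print(summary)
```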
Pandas: Group by distinct values of each cell and split column into multiple columns
You're almost there.
cols = ['Department', 'Age', 'Salary']
parts = [df.groupby([col, 'Status']).Count.sum() for col in cols]
df2 = pd.concat(parts).unstack(fill_value=0)
I used pd.concat() instead of repeated append() because, as you pointed out, append() is not very good (it's slow).
Splitting on Status is easy: just add it to groupby() and then unstack() it at the end to turn it into column labels rather than row labels.
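To make the result easier to picture, here is a small runnable sketch of the concat-then-unstack pattern. The data is invented for illustration and uses only two of the grouping columns. It assumes the values in the different grouping columns do not collide, since unstack() needs unique row labels.

```python
import pandas as pd

# Invented sample data; column names follow the answer.
df = pd.DataFrame({
    "Department": ["HR", "HR", "IT"],
    "Age": ["20-30", "30-40", "20-30"],
    "Status": ["Active", "Left", "Active"],
    "Count": [5, 2, 7],
})

cols = ["Department", "Age"]
# One Series per grouping column, each indexed by (value, Status).
parts = [df.groupby([col, "Status"])["Count"].sum() for col in cols]
# Stack the Series on top of each other, then turn Status into columns.
df2 = pd.concat(parts).unstack(fill_value=0)

print(df2)
```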
how to split data in groups by two column conditions pandas
You can use a mask and groupby to split the dataframe:
cond = {'flag_0': 3, 'flag_1': 1}
mask = df[list(cond)].eq(cond).all(axis=1)
groups = [g for k,g in df[mask].groupby((~mask).cumsum())]
output:
[ flag_0 flag_1 dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1,
flag_0 flag_1 dd
14 3 1 7]
groups[0]
flag_0 flag_1 dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1
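Why this works: every row that fails the condition bumps the running count of (~mask), so each unbroken run of matching rows shares one block number. A minimal sketch on made-up data, with the mask written as explicit comparisons and a hypothetical `block_id` helper:

```python
import pandas as pd

# Made-up data using the column names from the question.
df = pd.DataFrame({
    "flag_0": [3, 3, 0, 3, 3, 3],
    "flag_1": [1, 1, 1, 1, 0, 1],
    "dd":     [8, 1, 5, 7, 2, 4],
})

# Rows that satisfy both conditions: T T F T F T
mask = (df["flag_0"] == 3) & (df["flag_1"] == 1)

# Each non-matching row increments the counter: 0 0 1 1 2 2
block_id = (~mask).cumsum()

# Keep only matching rows and split them by block number.
groups = [g for _, g in df[mask].groupby(block_id[mask])]

for g in groups:
    print(g, end="\n\n")
```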
How to ensure the setNames() attributes the correct names to my group_split() datasets when splitting by multiple groups?
Instead of splitting by the two columns, use the factor column that was created, which ensures that it splits by the order of the levels created in type_factor. In addition, using unique on type_factor can cause problems if the order of the values in 'type_factor' differs from the level order, i.e. unique returns the first non-duplicated values in their order of occurrence. Instead, levels is better. In fact, it may be more appropriate to apply droplevels as well, in case of unused levels.
the_test <- example %>%
group_split(type_factor) %>%
setNames(levels(example$type_factor))
group_split returns an unnamed list. If we want to avoid the pain of renaming incorrectly, use split from base R, which does return a named list. Thus, it can return in any order as long as the key/value pairs are correct:
# 1 - return in a different order based on alphabetic order
split(example, example[c("even_or_odd", "prime_or_not")], drop = TRUE)
# 2 - return order based on the levels of the factor column
split(example, example$type_factor)
# 3 - With dplyr pipe
example %>%
split(.$type_factor)
# 4 - or using magrittr exposition operator
library(magrittr)
example %$%
split(x = ., f = type_factor)