Split pandas dataframe based on groupby
gb = df.groupby('ZZ')
[gb.get_group(x) for x in gb.groups]
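If the groups need to be looked up by key later, a dictionary can be more convenient than a list. The following is a minimal sketch; the sample data and the `val` column are made up for illustration, and only the `ZZ` column name comes from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; 'ZZ' is the grouping column used above.
df = pd.DataFrame({"ZZ": ["a", "b", "a", "c", "b"], "val": [1, 2, 3, 4, 5]})

# Map each group key to its own sub-DataFrame.
parts = {key: group for key, group in df.groupby("ZZ")}

print(list(parts))   # the group keys
print(parts["a"])    # rows where ZZ == "a", original index preserved
```

Each value keeps the original row index, so the pieces can still be traced back to the source frame.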
Split dataframe into smaller dataframe by column Names
Assume this is your dataframe:
Name price
0 aal 1
1 aal 2
2 aal 3
3 aal 4
4 aal 5
5 aal 6
6 bll 7
7 bll 8
8 bll 9
9 bll 8
10 dll 7
11 dll 56
12 dll 4
13 dll 3
14 dll 3
15 dll 5
Then do the following:
for name, group in df.groupby('Name'):
    # Use a separate loop variable so the original df is not overwritten.
    group.to_csv("Price_{}.csv".format(name), sep=";")
That'll save all sub-dataframes as csv.
To view what the code does:
for name, group in df.groupby('Name'):
    print(group)
returns:
Name price
0 aal 1
1 aal 2
2 aal 3
3 aal 4
4 aal 5
5 aal 6
Name price
6 bll 7
7 bll 8
8 bll 9
9 bll 8
Name price
10 dll 7
11 dll 56
12 dll 4
13 dll 3
14 dll 3
15 dll 5
If you need to reset the index in every df, do this:
for name, group in df.groupby('Name'):
    gf = group.reset_index()
    print(gf)
which gives:
index Name price
0 0 aal 1
1 1 aal 2
2 2 aal 3
3 3 aal 4
4 4 aal 5
5 5 aal 6
index Name price
0 6 bll 7
1 7 bll 8
2 8 bll 9
3 9 bll 8
index Name price
0 10 dll 7
1 11 dll 56
2 12 dll 4
3 13 dll 3
4 14 dll 3
5 15 dll 5
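The output above keeps the old index as an extra `index` column. If that column is not wanted, `reset_index(drop=True)` discards it. A short sketch, using a shortened copy of the example data and a hypothetical `frames` dictionary to hold the results:

```python
import pandas as pd

# Shortened version of the example frame above.
df = pd.DataFrame({
    "Name": ["aal", "aal", "bll", "bll", "bll"],
    "price": [1, 2, 7, 8, 9],
})

# drop=True throws the old index away instead of adding it as a column.
frames = {
    name: group.reset_index(drop=True)
    for name, group in df.groupby("Name")
}

print(frames["bll"])
```

Here every sub-dataframe starts again at index 0 and has only the `Name` and `price` columns.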
How to split data in groups by two column conditions in pandas
You can use a mask and groupby to split the dataframe:
cond = {'flag_0': 3, 'flag_1': 1}
mask = df[list(cond)].eq(cond).all(axis=1)
groups = [g for k,g in df[mask].groupby((~mask).cumsum())]
output:
[ flag_0 flag_1 dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1,
flag_0 flag_1 dd
14 3 1 7]
groups[0]
flag_0 flag_1 dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1
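To see why this works, it helps to run it on a small frame. The sketch below uses made-up data (the original input is not shown above) and wraps the condition in a `pd.Series` so that the comparison aligns on column names. Each non-matching row increments the cumulative sum, so every consecutive run of matching rows shares one group id.

```python
import pandas as pd

# Hypothetical input; only the column names follow the example above.
df = pd.DataFrame({
    "flag_0": [0, 3, 3, 0, 3, 3, 3, 1],
    "flag_1": [1, 1, 1, 0, 1, 1, 1, 1],
    "dd": range(8),
})

cond = {"flag_0": 3, "flag_1": 1}

# True where every listed column equals its required value.
mask = df[list(cond)].eq(pd.Series(cond)).all(axis=1)

# (~mask).cumsum() increases at each non-matching row, so it stays
# constant within one consecutive run of matching rows.
run_id = (~mask).cumsum()

groups = [g for _, g in df[mask].groupby(run_id)]

for g in groups:
    print(g, end="\n\n")
```

With this input, rows 1 to 2 form the first group and rows 4 to 6 the second.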
How do I split a data frame into groups of a fixed size?
This will give you a list of DataFrames:
lst = [df.iloc[i:i+group_size] for i in range(0,len(df)-group_size+1,group_size)]
It uses only built-in positional indexing, so it should be fast. The adjusted stop index discards the last frame if it is smaller than group_size. The same logic can be written more explicitly as:
lst = [df.iloc[i:i+group_size] for i in range(0, len(df), group_size)]
# Drop the trailing chunk if it is incomplete (guard against an empty list).
if lst and len(lst[-1]) < group_size:
    lst.pop()
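The same idea can be wrapped in a small helper. This is only a sketch: the function name `split_fixed` and the `keep_remainder` flag are invented here for illustration and are not part of pandas.

```python
import pandas as pd

def split_fixed(df, group_size, keep_remainder=False):
    """Split df into consecutive chunks of group_size rows.

    Hypothetical helper: the trailing short chunk is dropped
    unless keep_remainder is True.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    chunks = [df.iloc[i:i + group_size] for i in range(0, len(df), group_size)]
    if not keep_remainder and chunks and len(chunks[-1]) < group_size:
        chunks.pop()
    return chunks

df = pd.DataFrame({"x": range(10)})

full = split_fixed(df, 3)
with_tail = split_fixed(df, 3, keep_remainder=True)

print([len(c) for c in full])       # [3, 3, 3]
print([len(c) for c in with_tail])  # [3, 3, 3, 1]
```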
How to split dataframe into multiple dataframes based on column-name?
Use wide_to_long to reshape the original DataFrame first, then aggregate with mean:
cols = ['total_tracks']
df1 = (pd.wide_to_long(df,
                       stubnames=['t_dur', 't_dance'],
                       i=cols,
                       j='tmp')
         .reset_index()
         # Positional axis arguments to drop() were removed in pandas 2.0.
         .drop(columns='tmp')
         .groupby(cols, as_index=False)
         .mean())
print (df1)
total_tracks t_dur t_dance
0 4 293071.000000 0.563667
1 8 157071.666667 0.886333
2 12 213577.666667 0.663000
3 17 216151.000000 0.766333
4 59 146673.333333 0.283667
Details:
cols = ['total_tracks']
print(pd.wide_to_long(df,
                      stubnames=['t_dur', 't_dance'],
                      i=cols,
                      j='tmp'))
t_dur t_dance
total_tracks tmp
4 0 292720.0 0.549
12 0 213760.0 0.871
59 0 157124.0 0.289
8 0 127896.0 0.886
17 0 210320.0 0.724
4 1 293760.0 0.623
12 1 181000.0 0.702
59 1 130446.0 0.328
8 1 176351.0 0.947
17 1 226253.0 0.791
4 2 292733.0 0.519
12 2 245973.0 0.416
59 2 152450.0 0.234
8 2 166968.0 0.826
17 2 211880.0 0.784
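The input frame is not shown above, so here is a self-contained sketch with made-up values. It assumes the wide columns are named as stub plus a numeric suffix (`t_dur0`, `t_dur1`, `t_dance0`, `t_dance1`), which is the layout `wide_to_long` expects by default.

```python
import pandas as pd

# Hypothetical wide input: one row per total_tracks value,
# with numbered t_dur* and t_dance* columns.
df = pd.DataFrame({
    "total_tracks": [4, 8],
    "t_dur0": [100.0, 200.0],
    "t_dur1": [300.0, 400.0],
    "t_dance0": [0.2, 0.6],
    "t_dance1": [0.4, 0.8],
})

# Long form: the numeric suffix moves into the 'tmp' index level.
long_df = pd.wide_to_long(
    df, stubnames=["t_dur", "t_dance"], i="total_tracks", j="tmp"
)

# Average each stub per total_tracks value.
means = long_df.groupby(level="total_tracks").mean()

print(long_df)
print(means)
```

Grouping on the index level avoids the `reset_index` and `drop` steps; the result carries `total_tracks` as its index rather than as a column.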
How to split up a column of a dataframe into new columns in R?
With tidyverse, we could create a new group every time c appears in the x column, then pivot the data wide. Duplicate column names are generally discouraged, so I created sequentially numbered column names with a c_ prefix.
library(tidyverse)
results <- df %>%
  group_by(idx = cumsum(x == "c")) %>%
  filter(x != "c") %>%
  mutate(rn = row_number()) %>%
  pivot_wider(names_from = idx, values_from = x, names_prefix = "c_") %>%
  select(-rn)
Output
c_1 c_2 c_3
<chr> <chr> <chr>
1 a b d
2 a b d
3 a b d
4 a b d
However, if you really want duplicate names, then we could add on set_names:
purrr::set_names(results, "c")
c c c
<chr> <chr> <chr>
1 a b d
2 a b d
3 a b d
4 a b d
Or in base R, we could create the grouping with cumsum, then split those groups, then bind them back together with cbind. Then, we remove the first row, which contains the c characters.
names(df) <- "c"
do.call(cbind, split(df, cumsum(df$c == "c")))[-1,]
# c c c
#2 a b d
#3 a b d
#4 a b d
#5 a b d