Split Data.Frame into Groups by Column Name

Split pandas dataframe based on groupby

gb = df.groupby('ZZ')    
[gb.get_group(x) for x in gb.groups]
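
As a minimal, self-contained sketch of the same idea: the frame below is made-up data, and only the column name 'ZZ' comes from the snippet above (the names sample and parts are illustrative). Iterating over the groupby object yields (key, sub-frame) pairs directly, so a dict keyed by group value can be handier than a list:

```python
import pandas as pd

# Hypothetical data; only the column name 'ZZ' is taken from the snippet above.
sample = pd.DataFrame({"ZZ": ["a", "b", "a", "c", "b"], "val": [1, 2, 3, 4, 5]})

# Iterating a GroupBy yields (group key, sub-DataFrame) pairs.
parts = {key: group for key, group in sample.groupby("ZZ")}

print(sorted(parts))               # the distinct values of 'ZZ'
print(parts["a"]["val"].tolist())  # 'val' entries of the rows where ZZ == 'a'
```

Each sub-frame keeps the row labels it had in the original frame.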

Split dataframe into smaller dataframe by column Names

Assume this is your dataframe:

 Name  price
0 aal 1
1 aal 2
2 aal 3
3 aal 4
4 aal 5
5 aal 6
6 bll 7
7 bll 8
8 bll 9
9 bll 8
10 dll 7
11 dll 56
12 dll 4
13 dll 3
14 dll 3
15 dll 5

Then do the following:

for name, group in df.groupby('Name'):
    group.to_csv("Price_{}.csv".format(name), sep=";")

That saves each sub-dataframe to its own CSV file (Price_aal.csv, Price_bll.csv and Price_dll.csv).
To see what the loop iterates over:

for name, group in df.groupby('Name'):
    print(group)

prints:

Name  price
0 aal 1
1 aal 2
2 aal 3
3 aal 4
4 aal 5
5 aal 6
Name price
6 bll 7
7 bll 8
8 bll 9
9 bll 8
Name price
10 dll 7
11 dll 56
12 dll 4
13 dll 3
14 dll 3
15 dll 5

If you need to reset the index in every sub-dataframe, do this:

for name, group in df.groupby('Name'):
    gf = group.reset_index()
    print(gf)

which gives:

index Name  price
0 0 aal 1
1 1 aal 2
2 2 aal 3
3 3 aal 4
4 4 aal 5
5 5 aal 6
index Name price
0 6 bll 7
1 7 bll 8
2 8 bll 9
3 9 bll 8
index Name price
0 10 dll 7
1 11 dll 56
2 12 dll 4
3 13 dll 3
4 14 dll 3
5 15 dll 5
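
If the extra "index" column is not wanted, one option is reset_index(drop=True), which discards the old labels instead of keeping them as a column. A small sketch, assuming a shortened, made-up version of the table above (the names prices and clean_groups are illustrative):

```python
import pandas as pd

# Hypothetical, shortened version of the example table.
prices = pd.DataFrame({
    "Name": ["aal", "aal", "bll", "bll", "bll"],
    "price": [1, 2, 7, 8, 9],
})

clean_groups = {}
for name, group in prices.groupby("Name"):
    # drop=True throws the old index away rather than adding an 'index' column.
    clean_groups[name] = group.reset_index(drop=True)

print(clean_groups["bll"])
```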

How to split data into groups by two column conditions in pandas

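You can use a mask and groupby to split the dataframe. The mask marks the rows where both conditions hold; the cumulative sum of the inverted mask only increases on non-matching rows, so every consecutive run of matching rows shares one group key:

cond = {'flag_0': 3, 'flag_1': 1}
mask = df[list(cond)].eq(cond).all(axis=1)

groups = [g for k,g in df[mask].groupby((~mask).cumsum())]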

output:

[    flag_0  flag_1  dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1,
flag_0 flag_1 dd
14 3 1 7]

The first group, groups[0]:

flag_0 flag_1 dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1
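
To make the mask and cumulative-sum trick easier to follow, here is a small runnable sketch. The frame is made-up data that reuses the column names flag_0, flag_1 and dd; the names frame and run_id are illustrative, and the condition is passed as a Series here so that it aligns with the columns explicitly.

```python
import pandas as pd

# Hypothetical data: rows 1-2 and rows 5-6 satisfy both conditions.
frame = pd.DataFrame({
    "flag_0": [0, 3, 3, 1, 3, 3, 3],
    "flag_1": [1, 1, 1, 1, 0, 1, 1],
    "dd":     [5, 8, 1, 2, 4, 7, 9],
})

cond = pd.Series({"flag_0": 3, "flag_1": 1})

# True where every listed column equals its required value.
mask = frame[list(cond.index)].eq(cond).all(axis=1)

# The counter only increases on non-matching rows, so each consecutive
# run of matching rows ends up with one shared id.
run_id = (~mask).cumsum()[mask]

groups = [g for _, g in frame[mask].groupby(run_id)]

for g in groups:
    print(g, end="\n\n")
```

Rows that match but are separated by at least one non-matching row land in different groups, which is what distinguishes this from a plain filter.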

How do I split a data frame into groups of a fixed size?

This will give you a list of DataFrames:

lst = [df.iloc[i:i+group_size] for i in range(0,len(df)-group_size+1,group_size)]

It only uses built-in positional indexing, so it should be pretty fast. The adjusted stop index discards the last frame if it has fewer than group_size rows. You can also spell that out in two steps:

lst = [df.iloc[i:i+group_size] for i in range(0,len(df),group_size)]
if lst and len(lst[-1]) < group_size:
    lst.pop()
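
The same logic can be wrapped in a small helper. This is only a sketch: the function name split_fixed and its keep_remainder flag are hypothetical, added here to show both behaviours (dropping or keeping a short final chunk).

```python
import pandas as pd

def split_fixed(frame, group_size, keep_remainder=False):
    """Split frame into consecutive chunks of group_size rows."""
    chunks = [frame.iloc[i:i + group_size]
              for i in range(0, len(frame), group_size)]
    # Drop a short trailing chunk unless the caller asked to keep it.
    if chunks and not keep_remainder and len(chunks[-1]) < group_size:
        chunks.pop()
    return chunks

numbers = pd.DataFrame({"n": range(7)})  # made-up 7-row frame

print([len(c) for c in split_fixed(numbers, 3)])
print([len(c) for c in split_fixed(numbers, 3, keep_remainder=True)])
```

Checking that chunks is non-empty before looking at chunks[-1] keeps the helper safe for an empty frame.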

How to split dataframe into multiple dataframes based on column-name?

Use wide_to_long to reshape the original DataFrame first, then aggregate with mean:

cols = ['total_tracks']
df1 = (pd.wide_to_long(df,
                       stubnames=['t_dur','t_dance'],
                       i=cols,
                       j='tmp')
         .reset_index()
         .drop(columns='tmp')
         .groupby(cols, as_index=False)
         .mean())

print(df1)
total_tracks t_dur t_dance
0 4 293071.000000 0.563667
1 8 157071.666667 0.886333
2 12 213577.666667 0.663000
3 17 216151.000000 0.766333
4 59 146673.333333 0.283667

Details:

cols = ['total_tracks']
print(pd.wide_to_long(df,
                      stubnames=['t_dur','t_dance'],
                      i=cols,
                      j='tmp'))

t_dur t_dance
total_tracks tmp
4 0 292720.0 0.549
12 0 213760.0 0.871
59 0 157124.0 0.289
8 0 127896.0 0.886
17 0 210320.0 0.724
4 1 293760.0 0.623
12 1 181000.0 0.702
59 1 130446.0 0.328
8 1 176351.0 0.947
17 1 226253.0 0.791
4 2 292733.0 0.519
12 2 245973.0 0.416
59 2 152450.0 0.234
8 2 166968.0 0.826
17 2 211880.0 0.784
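
Since the original wide frame is not shown, here is a small sketch of the reshape on made-up data. It assumes the wide columns are named stub plus a numeric suffix (t_dur0, t_dur1, ...), which is what wide_to_long expects with its default separator; the values and the names tracks, long_form and means are illustrative.

```python
import pandas as pd

# Hypothetical wide frame: one row per album, one column per track attribute.
tracks = pd.DataFrame({
    "total_tracks": [4, 8],
    "t_dur0": [100.0, 200.0],
    "t_dur1": [300.0, 400.0],
    "t_dance0": [0.5, 0.1],
    "t_dance1": [0.7, 0.3],
})

# Stack the numbered columns: the numeric suffix becomes the 'tmp' index level.
long_form = pd.wide_to_long(tracks,
                            stubnames=["t_dur", "t_dance"],
                            i="total_tracks",
                            j="tmp")

# Average over the suffix level, leaving one row per total_tracks value.
means = long_form.groupby(level="total_tracks").mean().reset_index()

print(long_form)
print(means)
```

Note that the column passed as i has to identify each row uniquely, otherwise wide_to_long raises an error.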

How to split up a column of a dataframe into new columns in R?

With tidyverse, we could start a new group every time "c" appears in the x column, then pivot the data wide. Duplicate column names are generally discouraged, so I created sequentially numbered names (c_1, c_2, c_3) instead.

library(tidyverse)

results <- df %>%
  group_by(idx = cumsum(x == "c")) %>%
  filter(x != "c") %>%
  mutate(rn = row_number()) %>%
  pivot_wider(names_from = idx, values_from = x, names_prefix = "c_") %>%
  select(-rn)

Output

  c_1   c_2   c_3  
<chr> <chr> <chr>
1 a b d
2 a b d
3 a b d
4 a b d

However, if you really want duplicate names, then we could add on set_names:

purrr::set_names(results, "c")

c c c
<chr> <chr> <chr>
1 a b d
2 a b d
3 a b d
4 a b d

Or in base R, we could create the grouping with cumsum, then split those groups, then bind back together with cbind. Then, we remove the first row that contains the c characters.

names(df) <- "c"
do.call(cbind, split(df, cumsum(df$c == "c")))[-1,]

# c c c
#2 a b d
#3 a b d
#4 a b d
#5 a b d

