Pandas Groupby, Then Sort Within Groups

pandas groupby, then sort within groups

What you want to do is actually again a groupby (on the result of the first groupby): sort and take the first three elements per group.

Starting from the result of the first groupby:

In [60]: df_agg = df.groupby(['job', 'source']).agg({'count': 'sum'})

We group by the first level of the index:

In [63]: g = df_agg['count'].groupby('job', group_keys=False)

Then we want to sort each group and take the first three elements:

In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))

However, there is a shortcut function for this, nlargest:

In [65]: g.nlargest(3)
Out[65]:
job source
market A 5
D 4
B 3
sales E 7
C 6
B 4
dtype: int64

So in one go, this looks like:

df_agg['count'].groupby('job', group_keys=False).nlargest(3)
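As a self-contained sketch, with the input frame reconstructed from the output above (the question's original data isn't shown, so treat the values as illustrative):

import pandas as pd

# Input reconstructed from Out[65]; values are illustrative
df = pd.DataFrame({
    'job':    ['market'] * 4 + ['sales'] * 5,
    'source': list('ABCD') + list('ABCDE'),
    'count':  [5, 3, 2, 4, 2, 4, 6, 1, 7],
})

df_agg = df.groupby(['job', 'source']).agg({'count': 'sum'})

# Top three sources per job; group_keys=False keeps the original
# (job, source) index instead of prepending the job level again
res = df_agg['count'].groupby('job', group_keys=False).nlargest(3)
print(res)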

pandas: sorting observations within groupby groups

Because once you apply a function after a groupby, the results are combined back into a normal ungrouped data frame. Using groupby and a groupby method like sort should be thought of as a split-apply-combine operation.

The groupby splits the original data frame and the method is applied to each group, but then the results are combined again implicitly.

In that other question, they could have reversed the operation (sorted first) and avoided the second groupby. They could do:

df.sort_values(['job', 'count'], ascending=False).groupby('job').head(3)
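With the same illustrative frame as in the sketch above, a runnable version might be:

import pandas as pd

df = pd.DataFrame({
    'job':    ['market'] * 4 + ['sales'] * 5,
    'source': list('ABCD') + list('ABCDE'),
    'count':  [5, 3, 2, 4, 2, 4, 6, 1, 7],
})

# Sort once, then take the first three rows of each group: one groupby
# instead of two. ascending=[True, False] keeps jobs alphabetical while
# sorting counts from high to low.
top3 = (df.sort_values(['job', 'count'], ascending=[True, False])
          .groupby('job')
          .head(3))
print(top3)

Passing a single ascending=False (as in the answer) also reverses the job order; the per-job top three rows come out the same either way.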

grouping, counting by time and then sorting within group using pandas

  1. Group by two columns, then count and re-group by date
grouper = df.groupby([pd.Grouper(key='TimeCreated', freq='1D'), 'Institution_Name'])

grouper = grouper.count().groupby('TimeCreated', group_keys=False)

  2. Sort the elements (the counts) within each date group
grouper_count_desc = grouper.apply(lambda x: x.sort_values(by='EventID', ascending=False))
In[65]: grouper_count_desc
Out[65]:
EventID
TimeCreated Institution_Name
2021-03-22 H2 7
H1 1
2021-03-23 H2 2
H8 2
H1 1
H10 1
H3 1
H4 1
H5 1
H6 1
H7 1
H9 1

  3. Sort the date groups themselves; with a stable sort, the order of elements within each group doesn't change
grouper_date_asc = grouper_count_desc.sort_values(by='TimeCreated', ascending=True)
In[70]: grouper_date_desc = grouper_count_desc.sort_values(by='TimeCreated', ascending=False) # to show result, I used descending
In[71]: grouper_date_desc
Out[71]:
EventID
TimeCreated Institution_Name
2021-03-23 H2 2
H8 2
H1 1
H10 1
H3 1
H4 1
H5 1
H6 1
H7 1
H9 1
2021-03-22 H2 7
H1 1


  4. Reset the index and show the result
print(grouper_date_asc.reset_index())
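End to end, a runnable sketch; the event log below is reconstructed from the counts shown above, so treat it as illustrative:

import pandas as pd

# Event log reconstructed from the counts above: one row per event
rows = (
    [('2021-03-22', 'H2')] * 7 + [('2021-03-22', 'H1')]
    + [('2021-03-23', 'H2')] * 2 + [('2021-03-23', 'H8')] * 2
    + [('2021-03-23', h) for h in
       ['H1', 'H3', 'H4', 'H5', 'H6', 'H7', 'H9', 'H10']]
)
df = pd.DataFrame(rows, columns=['TimeCreated', 'Institution_Name'])
df['TimeCreated'] = pd.to_datetime(df['TimeCreated'])
df['EventID'] = range(len(df))  # any non-null column works for count()

# Steps 1-4, chained
counts = df.groupby([pd.Grouper(key='TimeCreated', freq='1D'),
                     'Institution_Name']).count()
result = (counts.groupby('TimeCreated', group_keys=False)
                .apply(lambda x: x.sort_values(by='EventID', ascending=False))
                # kind='stable' guarantees the within-day order survives
                .sort_values(by='TimeCreated', ascending=True, kind='stable')
                .reset_index())
print(result)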

Pandas groupby sort within groups retaining multiple aggregates and visualize it with facet

You should reset the index in the last step:

df_2 = df_1.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(('sales', 'sum'), ascending=False)).reset_index()

Then you can plot with seaborn.FacetGrid:

g = sns.FacetGrid(df_2, col = 'store')
g.map(sns.barplot, 'product', 'sales')

plt.show()

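The question's df_1 isn't shown here. As a minimal sketch, assuming it came from a multi-aggregate groupby (which is what produces column labels like ('sales', 'sum')), with the column MultiIndex flattened before plotting so seaborn can address columns by plain names:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical raw data; store/product/sales values are made up
raw = pd.DataFrame({
    'store':   ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'] * 2,
    'product': ['p1', 'p2', 'p3'] * 4,
    'sales':   [10, 30, 20, 5, 25, 15, 12, 28, 18, 7, 23, 17],
})

# Multi-aggregate groupby -> MultiIndex columns such as ('sales', 'sum')
df_1 = raw.groupby(['store', 'product']).agg({'sales': ['sum', 'mean']})

df_2 = df_1.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(('sales', 'sum'), ascending=False)).reset_index()

# Flatten the column MultiIndex so seaborn can reference columns by name
df_2.columns = ['store', 'product', 'sales_sum', 'sales_mean']

g = sns.FacetGrid(df_2, col='store')
g.map(sns.barplot, 'product', 'sales_sum')
plt.show()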

pandas sort within group then aggregation

Use DataFrame.sort_values before the groupby; if you need to apply the same function to several columns, you can pass a list of column names:

df = (toy_data.sort_values(['session_id', 'log_time'])
              .groupby('session_id')[['query', 'log_time', 'cate_feat_0', 'num_feat_0']]
              .agg(list))


print (df)
query log_time cate_feat_0 num_feat_0
session_id
1 [hi, pandas] [4, 6] [apple, apple] [1, 3]
2 [groupby, dude] [1, 5] [banana, banana] [4, 2]
3 [sort, agg] [2, 3] [apple, banana] [5, 6]
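To make this reproducible, toy_data can be reconstructed (in shuffled row order) from the output above:

import pandas as pd

# toy_data reconstructed from the printed result; row order shuffled to
# show that sort_values restores the per-session log_time order
toy_data = pd.DataFrame({
    'session_id':  [2, 1, 3, 2, 1, 3],
    'query':       ['groupby', 'hi', 'sort', 'dude', 'pandas', 'agg'],
    'log_time':    [1, 4, 2, 5, 6, 3],
    'cate_feat_0': ['banana', 'apple', 'apple', 'banana', 'apple', 'banana'],
    'num_feat_0':  [4, 1, 5, 2, 3, 6],
})

df = (toy_data.sort_values(['session_id', 'log_time'])
              .groupby('session_id')[['query', 'log_time',
                                      'cate_feat_0', 'num_feat_0']]
              .agg(list))
print(df)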

Pandas groupby sort within groups retaining multiple aggregates

I've sorted it. Instead of indexing the grouped table and doing the subsequent groupby and sort_values as above, I needed to apply the sort_values to the un-indexed DataFrame, specifying the column to sort on explicitly:

g = dfg.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(('rating', 'mean'), ascending=False).head(2))

Giving me the desired result.
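The question's frame isn't shown; as a minimal runnable sketch, with made-up genre/title/rating data standing in for it (dfg just needs MultiIndex columns like ('rating', 'mean')):

import numpy as np
import pandas as pd

# Hypothetical input data
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    'genre':  rng.choice(['action', 'drama'], size=20),
    'title':  rng.choice(list('ABCDE'), size=20),
    'rating': rng.integers(1, 6, size=20),
})
dfg = raw.groupby(['genre', 'title']).agg({'rating': ['mean', 'count']})

# Top two titles per genre by mean rating
g = dfg.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(('rating', 'mean'), ascending=False).head(2))
print(g)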

pandas groupby with many categories and sort them by value

You can use:

out = (df.groupby(['country', 'goodCode'], as_index=False).sum()
.sort_values(['country', 'Totalvalue'], ascending=[True, False]))

For example:

>>> df
country goodCode Totalvalue
0 CN 27000400 1700000000
1 KZ 15000000 700000000
2 AN 45000000 200000000
3 CN 65000000 100000000
4 CA 15000000 50000000
5 AE 27000400 25000000
6 KZ 37000400 20000000

>>> df.sort_values(['country', 'Totalvalue'], ascending=[True, False])
country goodCode Totalvalue
5 AE 27000400 25000000
2 AN 45000000 200000000
4 CA 15000000 50000000
0 CN 27000400 1700000000
3 CN 65000000 100000000
1 KZ 15000000 700000000
6 KZ 37000400 20000000
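Putting it together with that frame, the grouped-and-sorted result can be reproduced like this:

import pandas as pd

df = pd.DataFrame({
    'country':    ['CN', 'KZ', 'AN', 'CN', 'CA', 'AE', 'KZ'],
    'goodCode':   [27000400, 15000000, 45000000, 65000000,
                   15000000, 27000400, 37000400],
    'Totalvalue': [1700000000, 700000000, 200000000, 100000000,
                   50000000, 25000000, 20000000],
})

# as_index=False keeps country/goodCode as columns, so sort_values can
# reference them directly after the sum
out = (df.groupby(['country', 'goodCode'], as_index=False).sum()
         .sort_values(['country', 'Totalvalue'], ascending=[True, False]))
print(out)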

Pandas groupby and then sort based on groups

I don't believe this is the nicest answer, but I found a way to do it.

I grouped the total list first and saved the total count per concept_label as a variable, which I then merged with the existing dataframe. This way I can sort on that column first and on the actual count second.

# adding count column to existing table
df_grouped = (df.groupby(['concept_label'])['concept_label']
                .agg(['count'])
                .sort_values(by=['count']))
df_grouped.rename(columns={'count': 'concept_count'}, inplace=True)
df_count = pd.merge(df, df_grouped, left_on='concept_label', right_on='concept_label')

# sorting
df_sentiment = (df_count
    .groupby(['concept_label', 'source_uri', 'concept_count'])['sentiment_article']
    .agg(['mean', 'count'])
    .sort_values(by=['concept_count', 'count'], ascending=False))
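As a self-contained sketch with made-up data (the concept_label, source_uri, and sentiment_article values below are invented for illustration):

import pandas as pd

# Hypothetical articles table matching the column names used above
df = pd.DataFrame({
    'concept_label':     ['cat', 'cat', 'cat', 'dog', 'dog', 'fish'],
    'source_uri':        ['a', 'a', 'b', 'a', 'b', 'a'],
    'sentiment_article': [0.5, 0.1, -0.2, 0.9, 0.3, -0.5],
})

# total mentions per concept, merged back as 'concept_count'
df_grouped = (df.groupby('concept_label')['concept_label']
                .agg(['count'])
                .rename(columns={'count': 'concept_count'})
                .reset_index())
df_count = df.merge(df_grouped, on='concept_label')

# mean sentiment per (concept, source), sorted by concept popularity
# first and per-source row count second
df_sentiment = (df_count
    .groupby(['concept_label', 'source_uri', 'concept_count'])['sentiment_article']
    .agg(['mean', 'count'])
    .sort_values(by=['concept_count', 'count'], ascending=False))
print(df_sentiment)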

