Pandas Groupby, Then Sort Within Groups

pandas groupby, then sort within groups

What you want to do is actually again a groupby (on the result of the first groupby): sort and take the first three elements per group.

Starting from the result of the first groupby:

In [60]: df_agg = df.groupby(['job', 'source']).agg({'count': 'sum'})

We group by the first level of the index:

In [63]: g = df_agg['count'].groupby('job', group_keys=False)

Then we want to sort each group and take the first three elements:

In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))

However, there is a shortcut function for this, nlargest:

In [65]: g.nlargest(3)
Out[65]:
job source
market A 5
D 4
B 3
sales E 7
C 6
B 4
dtype: int64

So in one go, this looks like:

df_agg['count'].groupby('job', group_keys=False).nlargest(3)
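As a self-contained sketch, with the input frame reconstructed from the output above (the question's original data isn't shown, so treat the values as illustrative):

import pandas as pd

# Input reconstructed from Out[65]; values are illustrative
df = pd.DataFrame({
    'job':    ['market'] * 4 + ['sales'] * 5,
    'source': list('ABCD') + list('ABCDE'),
    'count':  [5, 3, 2, 4, 2, 4, 6, 1, 7],
})

df_agg = df.groupby(['job', 'source']).agg({'count': 'sum'})

# Top three sources per job; group_keys=False keeps the original
# (job, source) index instead of prepending the job level again
res = df_agg['count'].groupby('job', group_keys=False).nlargest(3)
print(res)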

pandas: sorting observations within groupby groups

Because once you apply a function after a groupby, the results are combined back into a normal ungrouped data frame. Using groupby and a groupby method like sort should be thought of as a split-apply-combine operation.

The groupby splits the original data frame and the method is applied to each group, but then the results are combined again implicitly.

In that other question, they could have reversed the operation (sorted first) and avoided the second groupby. They could do:

df.sort_values(['job', 'count'], ascending=False).groupby('job').head(3)
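With the same illustrative frame as in the sketch above, a runnable version might be:

import pandas as pd

df = pd.DataFrame({
    'job':    ['market'] * 4 + ['sales'] * 5,
    'source': list('ABCD') + list('ABCDE'),
    'count':  [5, 3, 2, 4, 2, 4, 6, 1, 7],
})

# Sort once, then take the first three rows of each group: one groupby
# instead of two. ascending=[True, False] keeps jobs alphabetical while
# sorting counts from high to low.
top3 = (df.sort_values(['job', 'count'], ascending=[True, False])
          .groupby('job')
          .head(3))
print(top3)

Passing a single ascending=False (as in the answer) also reverses the job order; the per-job top three rows come out the same either way.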

grouping, counting by time and then sorting within group using pandas

  1. Group by two columns, then count and re-group by date
grouper = df.groupby([pd.Grouper(key='TimeCreated', freq='1D'), 'Institution_Name'])

grouper = grouper.count().groupby('TimeCreated', group_keys=False)

  2. Sort the elements (the counts) within each date group
grouper_count_desc = grouper.apply(lambda x: x.sort_values(by='EventID', ascending=False))
In[65]: grouper_count_desc
Out[65]:
EventID
TimeCreated Institution_Name
2021-03-22 H2 7
H1 1
2021-03-23 H2 2
H8 2
H1 1
H10 1
H3 1
H4 1
H5 1
H6 1
H7 1
H9 1

  3. Sort the date groups themselves; with a stable sort, the order of elements within each group doesn't change
grouper_date_asc = grouper_count_desc.sort_values(by='TimeCreated', ascending=True)
In[70]: grouper_date_desc = grouper_count_desc.sort_values(by='TimeCreated', ascending=False) # to show result, I used descending
In[71]: grouper_date_desc
Out[71]:
EventID
TimeCreated Institution_Name
2021-03-23 H2 2
H8 2
H1 1
H10 1
H3 1
H4 1
H5 1
H6 1
H7 1
H9 1
2021-03-22 H2 7
H1 1


  4. Reset the index and show the result
print(grouper_date_asc.reset_index())
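End to end, a runnable sketch; the event log below is reconstructed from the counts shown above, so treat it as illustrative:

import pandas as pd

# Event log reconstructed from the counts above: one row per event
rows = (
    [('2021-03-22', 'H2')] * 7 + [('2021-03-22', 'H1')]
    + [('2021-03-23', 'H2')] * 2 + [('2021-03-23', 'H8')] * 2
    + [('2021-03-23', h) for h in
       ['H1', 'H3', 'H4', 'H5', 'H6', 'H7', 'H9', 'H10']]
)
df = pd.DataFrame(rows, columns=['TimeCreated', 'Institution_Name'])
df['TimeCreated'] = pd.to_datetime(df['TimeCreated'])
df['EventID'] = range(len(df))  # any non-null column works for count()

# Steps 1-4, chained
counts = df.groupby([pd.Grouper(key='TimeCreated', freq='1D'),
                     'Institution_Name']).count()
result = (counts.groupby('TimeCreated', group_keys=False)
                .apply(lambda x: x.sort_values(by='EventID', ascending=False))
                # kind='stable' guarantees the within-day order survives
                .sort_values(by='TimeCreated', ascending=True, kind='stable')
                .reset_index())
print(result)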

Pandas groupby sort within groups retaining multiple aggregates and visualize it with facet

You should reset the index in the last step:

df_2 = df_1.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(('sales', 'sum'), ascending=False)).reset_index()

Then you can plot with seaborn.FacetGrid:

g = sns.FacetGrid(df_2, col = 'store')
g.map(sns.barplot, 'product', 'sales')

plt.show()

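The question's df_1 isn't shown here. As a minimal sketch, assuming it came from a multi-aggregate groupby (which is what produces column labels like ('sales', 'sum')), with the column MultiIndex flattened before plotting so seaborn can address columns by plain names:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical raw data; store/product/sales values are made up
raw = pd.DataFrame({
    'store':   ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'] * 2,
    'product': ['p1', 'p2', 'p3'] * 4,
    'sales':   [10, 30, 20, 5, 25, 15, 12, 28, 18, 7, 23, 17],
})

# Multi-aggregate groupby -> MultiIndex columns such as ('sales', 'sum')
df_1 = raw.groupby(['store', 'product']).agg({'sales': ['sum', 'mean']})

df_2 = df_1.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(('sales', 'sum'), ascending=False)).reset_index()

# Flatten the column MultiIndex so seaborn can reference columns by name
df_2.columns = ['store', 'product', 'sales_sum', 'sales_mean']

g = sns.FacetGrid(df_2, col='store')
g.map(sns.barplot, 'product', 'sales_sum')
plt.show()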

pandas sort within group then aggregation

Use DataFrame.sort_values before the groupby; if you need to apply the same function to several columns, you can pass a list of column names:

df = (toy_data.sort_values(['session_id', 'log_time'])
              .groupby('session_id')[['query', 'log_time', 'cate_feat_0', 'num_feat_0']]
              .agg(list))


print (df)
query log_time cate_feat_0 num_feat_0
session_id
1 [hi, pandas] [4, 6] [apple, apple] [1, 3]
2 [groupby, dude] [1, 5] [banana, banana] [4, 2]
3 [sort, agg] [2, 3] [apple, banana] [5, 6]
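To make this reproducible, toy_data can be reconstructed (in shuffled row order) from the output above:

import pandas as pd

# toy_data reconstructed from the printed result; row order shuffled to
# show that sort_values restores the per-session log_time order
toy_data = pd.DataFrame({
    'session_id':  [2, 1, 3, 2, 1, 3],
    'query':       ['groupby', 'hi', 'sort', 'dude', 'pandas', 'agg'],
    'log_time':    [1, 4, 2, 5, 6, 3],
    'cate_feat_0': ['banana', 'apple', 'apple', 'banana', 'apple', 'banana'],
    'num_feat_0':  [4, 1, 5, 2, 3, 6],
})

df = (toy_data.sort_values(['session_id', 'log_time'])
              .groupby('session_id')[['query', 'log_time',
                                      'cate_feat_0', 'num_feat_0']]
              .agg(list))
print(df)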

Pandas groupby sort within groups retaining multiple aggregates

I've sorted it. Instead of indexing the grouped table and doing the subsequent groupby and sort_values as above, I needed to apply the sort_values to the un-indexed DataFrame, specifying the column to sort on explicitly:

g = dfg.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(('rating', 'mean'), ascending=False).head(2))

Giving me the desired result.
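The question's frame isn't shown; as a minimal runnable sketch, with made-up genre/title/rating data standing in for it (dfg just needs MultiIndex columns like ('rating', 'mean')):

import numpy as np
import pandas as pd

# Hypothetical input data
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    'genre':  rng.choice(['action', 'drama'], size=20),
    'title':  rng.choice(list('ABCDE'), size=20),
    'rating': rng.integers(1, 6, size=20),
})
dfg = raw.groupby(['genre', 'title']).agg({'rating': ['mean', 'count']})

# Top two titles per genre by mean rating
g = dfg.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(('rating', 'mean'), ascending=False).head(2))
print(g)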

pandas groupby with many categories and sort them by value

You can use:

out = (df.groupby(['country', 'goodCode'], as_index=False).sum()
.sort_values(['country', 'Totalvalue'], ascending=[True, False]))

For example:

>>> df
country goodCode Totalvalue
0 CN 27000400 1700000000
1 KZ 15000000 700000000
2 AN 45000000 200000000
3 CN 65000000 100000000
4 CA 15000000 50000000
5 AE 27000400 25000000
6 KZ 37000400 20000000

>>> df.sort_values(['country', 'Totalvalue'], ascending=[True, False])
country goodCode Totalvalue
5 AE 27000400 25000000
2 AN 45000000 200000000
4 CA 15000000 50000000
0 CN 27000400 1700000000
3 CN 65000000 100000000
1 KZ 15000000 700000000
6 KZ 37000400 20000000
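Putting it together with that frame, the grouped-and-sorted result can be reproduced like this:

import pandas as pd

df = pd.DataFrame({
    'country':    ['CN', 'KZ', 'AN', 'CN', 'CA', 'AE', 'KZ'],
    'goodCode':   [27000400, 15000000, 45000000, 65000000,
                   15000000, 27000400, 37000400],
    'Totalvalue': [1700000000, 700000000, 200000000, 100000000,
                   50000000, 25000000, 20000000],
})

# as_index=False keeps country/goodCode as columns, so sort_values can
# reference them directly after the sum
out = (df.groupby(['country', 'goodCode'], as_index=False).sum()
         .sort_values(['country', 'Totalvalue'], ascending=[True, False]))
print(out)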

Pandas groupby and then sort based on groups

I don't believe this is the nicest answer, but I found a way to do it.

I grouped the total list first and saved the total count per concept_label as a variable, which I then merged with the existing dataframe. This way I can sort on that column first and on the actual count second.

# adding count column to existing table
df_grouped = (df.groupby(['concept_label'])['concept_label']
                .agg(['count'])
                .sort_values(by=['count']))
df_grouped.rename(columns={'count': 'concept_count'}, inplace=True)
df_count = pd.merge(df, df_grouped, left_on='concept_label', right_on='concept_label')

# sorting
df_sentiment = (df_count
    .groupby(['concept_label', 'source_uri', 'concept_count'])['sentiment_article']
    .agg(['mean', 'count'])
    .sort_values(by=['concept_count', 'count'], ascending=False))
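As a self-contained sketch with made-up data (the concept_label, source_uri, and sentiment_article values below are invented for illustration):

import pandas as pd

# Hypothetical articles table matching the column names used above
df = pd.DataFrame({
    'concept_label':     ['cat', 'cat', 'cat', 'dog', 'dog', 'fish'],
    'source_uri':        ['a', 'a', 'b', 'a', 'b', 'a'],
    'sentiment_article': [0.5, 0.1, -0.2, 0.9, 0.3, -0.5],
})

# total mentions per concept, merged back as 'concept_count'
df_grouped = (df.groupby('concept_label')['concept_label']
                .agg(['count'])
                .rename(columns={'count': 'concept_count'})
                .reset_index())
df_count = df.merge(df_grouped, on='concept_label')

# mean sentiment per (concept, source), sorted by concept popularity
# first and per-source row count second
df_sentiment = (df_count
    .groupby(['concept_label', 'source_uri', 'concept_count'])['sentiment_article']
    .agg(['mean', 'count'])
    .sort_values(by=['concept_count', 'count'], ascending=False))
print(df_sentiment)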

