pandas groupby, then sort within groups
What you want to do is actually again a groupby (on the result of the first groupby): sort and take the first three elements per group.
Starting from the result of the first groupby:
In [60]: df_agg = df.groupby(['job','source']).agg({'count': 'sum'})
We group by the first level of the index:
In [63]: g = df_agg['count'].groupby('job', group_keys=False)
Then we want to sort each group and take the first three elements:
In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))
However, there is a shortcut function to do exactly this, nlargest:
In [65]: g.nlargest(3)
Out[65]:
job source
market A 5
D 4
B 3
sales E 7
C 6
B 4
dtype: int64
So in one go, this looks like:
df_agg['count'].groupby('job', group_keys=False).nlargest(3)
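Put together with a small sample frame (values assumed from the printed output above), the whole pipeline might look like:

```python
import pandas as pd

# Sample data shaped like the question's (job, source, count) columns
df = pd.DataFrame({
    'job': ['market', 'market', 'market', 'market',
            'sales', 'sales', 'sales', 'sales'],
    'source': ['A', 'B', 'C', 'D', 'B', 'C', 'D', 'E'],
    'count': [5, 3, 2, 4, 4, 6, 1, 7],
})

# Aggregate, then take the three largest counts per job
df_agg = df.groupby(['job', 'source']).agg({'count': 'sum'})
res = df_agg['count'].groupby('job', group_keys=False).nlargest(3)
print(res)
```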
pandas: sorting observations within groupby groups
Because once you apply a function after a groupby, the results are combined back into a normal, ungrouped data frame. Using groupby with a method like sort should be thought of as a split-apply-combine operation: the groupby splits the original data frame, the method is applied to each group, and then the results are implicitly combined again.
In that other question, they could have reversed the operation (sorted first) and then not have needed two groupbys. They could do:
df.sort_values(['job','count'], ascending=False).groupby('job').head(3)
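A runnable sketch of this sort-first approach, with the same assumed sample data (note the long-removed df.sort is replaced by sort_values):

```python
import pandas as pd

# Sample data shaped like the question's (job, source, count) columns
df = pd.DataFrame({
    'job': ['market'] * 4 + ['sales'] * 4,
    'source': ['A', 'B', 'C', 'D', 'B', 'C', 'D', 'E'],
    'count': [5, 3, 2, 4, 4, 6, 1, 7],
})

# Sort first, then take the top three rows of each job group
top3 = df.sort_values(['job', 'count'], ascending=False).groupby('job').head(3)
print(top3)
```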
grouping, counting by time and then sorting within group using pandas
- Group by 2 columns
grouper = df.groupby([pd.Grouper(key='TimeCreated', freq='1D'), 'Institution_Name'])
grouper = grouper.count().groupby('TimeCreated', group_keys=False)
- Sort the elements (the counts) within each date group
grouper_count_desc = grouper.apply(lambda x: x.sort_values(by='EventID', ascending=False))
In[65]: grouper_count_desc
Out[65]:
EventID
TimeCreated Institution_Name
2021-03-22 H2 7
H1 1
2021-03-23 H2 2
H8 2
H1 1
H10 1
H3 1
H4 1
H5 1
H6 1
H7 1
H9 1
- Sort the date groups themselves. With a stable sort (kind='mergesort'), the order of elements within each group doesn't change
grouper_date_asc = grouper_count_desc.sort_values(by='TimeCreated', ascending=True, kind='mergesort')
In[70]: grouper_date_desc = grouper_count_desc.sort_values(by='TimeCreated', ascending=False, kind='mergesort') # to show the result, I used descending
In[71]: grouper_date_desc
Out[71]:
EventID
TimeCreated Institution_Name
2021-03-23 H2 2
H8 2
H1 1
H10 1
H3 1
H4 1
H5 1
H6 1
H7 1
H9 1
2021-03-22 H2 7
H1 1
- Reset index and show result
print(grouper_date_asc.reset_index())
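A self-contained sketch of the three steps above, using a small made-up event log with the answer's column names:

```python
import pandas as pd

# Minimal event log (column names from the answer; values are made up)
df = pd.DataFrame({
    'TimeCreated': pd.to_datetime(
        ['2021-03-22 09:00', '2021-03-22 10:00', '2021-03-22 11:00',
         '2021-03-23 09:00', '2021-03-23 10:00']),
    'Institution_Name': ['H2', 'H2', 'H1', 'H8', 'H8'],
    'EventID': [101, 102, 103, 104, 105],
})

# 1. Group by day and institution, count events
counts = df.groupby([pd.Grouper(key='TimeCreated', freq='1D'),
                     'Institution_Name']).count()

# 2. Within each day, sort institutions by event count, descending
by_day = counts.groupby('TimeCreated', group_keys=False)
desc = by_day.apply(lambda x: x.sort_values(by='EventID', ascending=False))

# 3. Sort the day groups; a stable sort keeps the within-day order intact
result = desc.sort_values(by='TimeCreated', ascending=True, kind='mergesort')
print(result.reset_index())
```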
Pandas groupby sort within groups retaining multiple aggregates and visualize it with facet
You should reset the index in the last step:
df_2 = df_1.groupby(level=0, group_keys=False).apply(
lambda x: x.sort_values(('sales', 'sum'), ascending=False)).reset_index()
Then you can plot with seaborn.FacetGrid:
g = sns.FacetGrid(df_2, col = 'store')
g.map(sns.barplot, 'product', 'sales')
plt.show()
pandas sort within group then aggregation
Use DataFrame.sort_values before groupby; if you need to apply the same function to several columns, you can pass a list of column names:
df = (toy_data.sort_values(['session_id','log_time'])
.groupby('session_id')[['query','log_time','cate_feat_0', 'num_feat_0']]
.agg(list))
print (df)
query log_time cate_feat_0 num_feat_0
session_id
1 [hi, pandas] [4, 6] [apple, apple] [1, 3]
2 [groupby, dude] [1, 5] [banana, banana] [4, 2]
3 [sort, agg] [2, 3] [apple, banana] [5, 6]
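A runnable sketch, with toy_data reconstructed from the printed output (the exact original values are assumed):

```python
import pandas as pd

# toy_data reconstructed from the printed result above (values assumed)
toy_data = pd.DataFrame({
    'session_id': [1, 2, 3, 3, 1, 2],
    'query': ['hi', 'groupby', 'sort', 'agg', 'pandas', 'dude'],
    'log_time': [4, 1, 2, 3, 6, 5],
    'cate_feat_0': ['apple', 'banana', 'apple', 'banana', 'apple', 'banana'],
    'num_feat_0': [1, 4, 5, 6, 3, 2],
})

# Sort first so each list comes out in log_time order, then collect into lists
df = (toy_data.sort_values(['session_id', 'log_time'])
      .groupby('session_id')[['query', 'log_time', 'cate_feat_0', 'num_feat_0']]
      .agg(list))
print(df)
```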
Pandas groupby sort within groups retaining multiple aggregates
I've solved it. Instead of indexing the grouped table and doing the subsequent groupby and sort_values as above, I needed to apply the sort_values to the un-indexed DataFrame, specifying the column to sort on explicitly:
g = dfg.groupby(level=0, group_keys=False).apply(
lambda x: x.sort_values(('rating', 'mean'), ascending=False).head(2))
Giving me the desired result:
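A minimal sketch of this, with a hypothetical dfg built from made-up store/product ratings (the ('rating', 'mean') column layout is taken from the answer):

```python
import pandas as pd

# Hypothetical aggregated frame with MultiIndex columns, as in the answer
df = pd.DataFrame({
    'store': ['A', 'A', 'A', 'B', 'B', 'B'],
    'product': ['x', 'y', 'z', 'x', 'y', 'z'],
    'rating': [3.0, 5.0, 4.0, 2.0, 4.5, 1.0],
})
dfg = df.groupby(['store', 'product']).agg({'rating': ['mean', 'count']})

# For each store (index level 0), sort by mean rating and keep the top two
g = dfg.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(('rating', 'mean'), ascending=False).head(2))
print(g)
```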
pandas groupby with many categories and sort them by value
You can use:
out = (df.groupby(['country', 'goodCode'], as_index=False).sum()
.sort_values(['country', 'Totalvalue'], ascending=[True, False]))
For example:
>>> df
country goodCode Totalvalue
0 CN 27000400 1700000000
1 KZ 15000000 700000000
2 AN 45000000 200000000
3 CN 65000000 100000000
4 CA 15000000 50000000
5 AE 27000400 25000000
6 KZ 37000400 20000000
>>> df.sort_values(['country', 'Totalvalue'], ascending=[True, False])
country goodCode Totalvalue
5 AE 27000400 25000000
2 AN 45000000 200000000
4 CA 15000000 50000000
0 CN 27000400 1700000000
3 CN 65000000 100000000
1 KZ 15000000 700000000
6 KZ 37000400 20000000
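The same steps as a self-contained sketch, using the example frame shown above:

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    'country': ['CN', 'KZ', 'AN', 'CN', 'CA', 'AE', 'KZ'],
    'goodCode': [27000400, 15000000, 45000000, 65000000,
                 15000000, 27000400, 37000400],
    'Totalvalue': [1700000000, 700000000, 200000000, 100000000,
                   50000000, 25000000, 20000000],
})

# Sum per (country, goodCode), then sort countries A-Z and values high-to-low
out = (df.groupby(['country', 'goodCode'], as_index=False).sum()
       .sort_values(['country', 'Totalvalue'], ascending=[True, False]))
print(out)
```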
Pandas groupby and then sort based on groups
I don't believe this is the nicest answer, but I found a way to do it.
I first computed the total count per concept_label and merged it back into the existing dataframe. This way I can sort primarily on that column and secondarily on the actual count.
#adding count column to existing table
df_grouped = df.groupby(['concept_label'])['concept_label'].agg(['count']).sort_values(by=['count'])
df_grouped.rename(columns={'count':'concept_count'}, inplace=True)
df_count = pd.merge(df, df_grouped, left_on='concept_label', right_on='concept_label')
#sorting
df_sentiment = df_count.groupby(['concept_label','source_uri','concept_count'])['sentiment_article'].agg(['mean', 'count']).sort_values(by=['concept_count','count'], ascending=False)
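A runnable sketch of the same approach on a small made-up frame (column names are taken from the answer; the values are invented):

```python
import pandas as pd

# Minimal sample data (column names from the answer; values made up)
df = pd.DataFrame({
    'concept_label': ['cat', 'cat', 'cat', 'dog', 'dog'],
    'source_uri': ['u1', 'u1', 'u2', 'u1', 'u2'],
    'sentiment_article': [0.5, 0.7, 0.1, -0.2, 0.3],
})

# Total count per concept_label, merged back as a column
df_grouped = (df.groupby('concept_label')['concept_label']
              .agg(['count']).reset_index()
              .rename(columns={'count': 'concept_count'}))
df_count = pd.merge(df, df_grouped, on='concept_label')

# Sort primarily on the total concept count, then on the per-group count
df_sentiment = (df_count
                .groupby(['concept_label', 'source_uri', 'concept_count'])
                ['sentiment_article']
                .agg(['mean', 'count'])
                .sort_values(by=['concept_count', 'count'], ascending=False))
print(df_sentiment)
```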