Pandas rank by multiple columns
Another way would be to type-cast both the columns of interest to str
and combine them by concatenating them. Convert these back to numerical values so that they could be differentiated based on their magnitude.
In method=dense
, ranks of duplicated values would remain unchanged. (Here: 6)
Since you want to rank these in their descending order, specifying ascending=False
in Series.rank()
would let you achieve the desired result.
col1 = df["SaleCount"].astype(str)
col2 = df["TotalRevenue"].astype(str)
df['Rank'] = (col1+col2).astype(int).rank(method='dense', ascending=False).astype(int)
df.sort_values('Rank')
Pandas ranking order based on two columns
Your code gives me expected result.
I can get something similar to your current result if I change columns order ["Bucket","Count"]
to ["Count","Bucket"]
- so maybe you use wrong data.
Minimal working code:
import pandas as pd
data = {
'B': [5,5,4,4,3,2,2,1],
'C': [60,3,10,2,35,2,2,3],
}
df = pd.DataFrame(data)
df['CB'] = df[['C', 'B']].apply(tuple, axis=1).rank(method='dense', ascending=False).astype(int)
df['BC'] = df[['B', 'C']].apply(tuple, axis=1).rank(method='dense', ascending=False).astype(int)
print(df)
Result:
B C CB BC
0 5 60 1 1
1 5 3 4 2
2 4 10 3 3
3 4 2 6 4
4 3 35 2 5
5 2 2 7 6
6 2 2 7 6
7 1 3 5 7
Pandas - Group by and rank within group based on multiple columns
method1:
df.sort_values(['asset_id', 'method_rank', 'conf_score'], ascending=[True, True, False], inplace=True)
df['overall_rank'] = 1
df['overall_rank'] = df.groupby(['asset_id'])['overall_rank'].cumsum()
df
asset_id method_id method_rank conf_score overall_rank
2 10 p4 2 0.8 1
1 10 p3 2 0.6 2
0 10 p2 5 0.8 3
5 20 p2 1 0.5 1
3 20 p3 2 0.9 2
4 20 p1 5 0.7 3
method2:
define a function to sort every group:
def handle_group(group):
group.sort_values(['method_rank', 'conf_score'], ascending=[True, False], inplace=True)
group['overall_rank'] = np.arange(1, len(group)+1)
return group
df.groupby('asset_id', as_index=False).apply(handle_group)
performance test:
def run1(df):
df = df.sort_values(['asset_id', 'method_rank', 'conf_score'], ascending=[True, True, False])
df['overall_rank'] = 1
df['overall_rank'] = df.groupby(['asset_id'])['overall_rank'].cumsum()
return df
def handle_group(group):
group.sort_values(['method_rank', 'conf_score'], ascending=[True, False], inplace=True)
group['overall_rank'] = np.arange(1, len(group)+1)
return group
def run2(df):
df = df.groupby('asset_id', as_index=False).apply(handle_group)
return df
dfn = pd.concat([df]*10000, ignore_index=True)
%%timeit
df1 = run1(dfn)
# 8.61 ms ± 317 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df2 = run2(dfn).droplevel(0)
# 31.6 ms ± 404 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Pandas rank based on several columns
sort_values('event_id')
prior to grouping then pass method='first'
to rank
Also note that if occurred_at
isn't already datetime
, make it datetime
.
# unnecessary if already datetime, but doesn't hurt to do it anyway
df.occurred_at = pd.to_datetime(df.occurred_at)
df['rank'] = df.sort_values('event_id') \
.groupby('user_id').occurred_at \
.rank(method='first')
df
Reference for complete verifiable code
from StringIO import StringIO
import pandas as pd
text = """event_id occurred_at user_id
19148 2015-10-01 1
19693 2015-10-05 2
20589 2015-10-12 1
20996 2015-10-15 1
20998 2015-10-15 1
23301 2015-10-23 2
23630 2015-10-26 1
25172 2015-11-03 1
31699 2015-12-11 1
32186 2015-12-14 2
43426 2016-01-13 1
68300 2016-04-04 2
71926 2016-04-19 1"""
df = pd.read_csv(StringIO(text), delim_whitespace=True)
df.occurred_at = pd.to_datetime(df.occurred_at)
df['rank'] = df.sort_values('event_id').groupby('user_id').occurred_at.rank(method='first')
df
Rank DataFrame based on multiple columns
Here is a one-line approach using sort_values
:
In [135]: df['rank'] = df.sort_values(['col_a', 'col_b'])['col_b'].index + 1
In [136]: df
Out[136]:
col_a col_b rank
0 0 5 2
1 0 2 1
2 0 8 3
3 1 3 4
4 1 7 6
5 1 4 5
The logic behind this snippet: Basically, the DataFrame.sort_values
function accepts multiple column names and returns a sorted copy of the dataframe based on the order of passed column names. The default sorting order is ascending
which is what we want. If you wanted another order you could pass the order as an iterable of booleans to the ascending
keyword argument. At the end the new indices of the column_b
is what we want (plus one).
Is there a way to rank a value within multiple rows and columns in Pandas Dataframe?
Try this with join
and apply
:
lst = sorted(df.values.flatten().tolist())[::-1]
print(df.join(df.apply(lambda x: [lst.index(i) + 1 for i in x]), rsuffix='_rank'))
Output:
Column1 Column2 Column3 Column1_rank Column2_rank Column3_rank
Row1 60 20 10 3 8 9
Row2 40 30 80 6 7 1
Row3 70 50 50 2 4 4
How can I rank based on condition in Pandas
Use:
#convert columns to numeric
df[['Ratio','Value']]=df[['Ratio','Value']].apply(lambda x: x.str.strip('%')).astype(float)
Remove row with CPI
by condition - test rows if no only CPI
per Cluster
:
m = df['Group'].eq('CPI')
m1 = ~df['Cluster'].isin(df.loc[m, 'Cluster']) | m
df['RankRatio'] = df[m1].groupby('Cluster')['Ratio'].rank(method='first', ascending=True)
df['RankValue'] = df[m1].groupby('Cluster')['Value'].rank(method='first', ascending=False)
print (df)
Cluster Variable Group Ratio Value RankRatio RankValue
0 1 GDP_M3 GDP 20.0 70.0 1.0 2.0
1 1 HPI_M6 HPI 40.0 80.0 3.0 1.0
2 1 GDP_lg2 GDP 35.0 50.0 2.0 3.0
3 2 CPI_M9 CPI 10.0 50.0 NaN NaN
4 2 HPI_lg6 HPI 15.0 65.0 1.0 1.0
5 3 CPI_lg12 CPI 15.0 90.0 1.0 2.0
6 3 CPI_lg1 CPI 20.0 95.0 2.0 1.0
How it working:
For mask2
are filter all Cluster
values if match mask1
and filtered original column Cluster
, then invert mask by ~
. Last chain both conditions by |
for bitwise OR
for all rows without CPI
if exist with another values per Cluster
:
print (df.assign(mask1 = m, mask2 = ~df['Cluster'].isin(df.loc[m, 'Cluster']), both = m1))
Cluster Variable Group Ratio Value mask1 mask2 both
0 1 GDP_M3 GDP 20.0 70.0 False True True
1 1 HPI_M6 HPI 40.0 80.0 False True True
2 1 GDP_lg2 GDP 35.0 50.0 False True True
3 2 CPI_M9 CPI 10.0 50.0 True False True
4 2 HPI_lg6 HPI 15.0 65.0 False False False
5 3 CPI_lg12 CPI 15.0 90.0 True False True
6 3 CPI_lg1 CPI 20.0 95.0 True False True
EDIT:
df[['Ratio','Value']]=df[['Ratio','Value']].apply(lambda x: x.str.strip('%')).astype(float)
m = df['Group'].isin(['CPI','HPI'])
m2 = df.groupby('Cluster')['Group'].transform('nunique').ne(1)
m1 = (~df['Cluster'].isin(df.loc[~m, 'Cluster']) | m) & m2
df['RankRatio'] = df[~m1].groupby('Cluster')['Ratio'].rank(method='first', ascending=True)
df['RankValue'] = df[~m1].groupby('Cluster')['Value'].rank(method='first', ascending=False)
print (df)
Cluster Variable Group Ratio Value RankRatio RankValue
0 1 GDP_M3 GDP 20.0 70.0 1.0 1.0
1 1 HPI_M6 HPI 40.0 80.0 NaN NaN
2 1 GDP_lg2 GDP 35.0 50.0 2.0 2.0
3 2 CPI_M9 CPI 10.0 50.0 NaN NaN
4 2 HPI_lg6 HPI 15.0 65.0 NaN NaN
5 3 CPI_lg12 CPI 15.0 90.0 1.0 2.0
6 3 CPI_lg1 CPI 20.0 95.0 2.0 1.0
print (df.assign(mask1 = m, mask2 = ~df['Cluster'].isin(df.loc[~m, 'Cluster']), m2=m2, all = ~m1))
Cluster Variable Group Ratio Value RankRatio RankValue mask1 mask2 \
0 1 GDP_M3 GDP 20.0 70.0 1.0 1.0 False False
1 1 HPI_M6 HPI 40.0 80.0 NaN NaN True False
2 1 GDP_lg2 GDP 35.0 50.0 2.0 2.0 False False
3 2 CPI_M9 CPI 10.0 50.0 NaN NaN True True
4 2 HPI_lg6 HPI 15.0 65.0 NaN NaN True True
5 3 CPI_lg12 CPI 15.0 90.0 1.0 2.0 True True
6 3 CPI_lg1 CPI 20.0 95.0 2.0 1.0 True True
m2 all
0 True True
1 True False
2 True True
3 True False
4 True False
5 False True
6 False True
Pandas Ranking for String Columns
In your case do
df['new'] = df.groupby(['item_number','location_id'])['Date'].rank(ascending=False)
0 5.0
1 4.0
2 3.0
3 2.0
4 1.0
5 5.0
6 4.0
7 3.0
8 2.0
9 1.0
Name: Date, dtype: float64
How to rank rows in pandas with multiple condition?
USE GROUPBY
:
df = df.sort_values(['S','L','C','CTR','IM'], ascending=[True,True,True,False,False])
df['Rank'] = df.groupby(['S','L','C']).cumcount() + 1
OR:
df = df.convert_dtypes()
def rank_count(x):
x = x.sort_values(['CTR','IM'], ascending=False).reset_index(drop=True)
x['Rank'] = x.index + 1
return x
result = df.groupby(['S','L','C'], sort=False).apply(rank_count).reset_index(drop=True)
Related Topics
How to Open Different Urls At the Same Time by Using Python Selenium
How to Check Whether All Elements of Array Are in Between Two Values
Hiding Axis Text in Matplotlib Plots
Regex to Match Digits and At Most One Space Between Them
Python: Requests.Exceptions.Connectionerror. Max Retries Exceeded With Url
How to Copy/Repeat an Array N Times into a New Array
Retrieve Top N in Each Group of a Dataframe in Pyspark
Python Multiprocessing Pool Hangs At Join
Windowserror: [Error 126] the Specified Module Could Not Be Found
A Better Way Than Looping and Calling Functions That Loop and Call Another Functions
How to Move to One Folder Back in Python
How to Share Single Sqlite Connection in Multi-Threaded Python Application
Opencv - Saving Images to a Particular Folder of Choice
Optimal Way to Store Data from Pandas to Snowflake
Sqlalchemy - Select for Update Example