Pandas Rank by Multiple Columns

Pandas rank by multiple columns

Another way is to cast both columns of interest to str and concatenate them, then convert the result back to a number so that rows can be ordered by its magnitude. This only orders rows correctly when TotalRevenue always has the same number of digits; otherwise, for example, SaleCount 9 with TotalRevenue 150 (key 9150) outranks SaleCount 11 with TotalRevenue 20 (key 1120).

With method='dense', duplicated values get the same rank (here: 6), and the next distinct value gets the next rank, with no gap.

Since you want the ranks in descending order, pass ascending=False to Series.rank().

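# Combined key: the SaleCount digits followed by the TotalRevenue digits
col1 = df["SaleCount"].astype(str)
col2 = df["TotalRevenue"].astype(str)
# Dense rank, largest key first; equal keys share a rank
df['Rank'] = (col1 + col2).astype(int).rank(method='dense', ascending=False).astype(int)
df.sort_values('Rank')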
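The concatenated key only orders rows correctly when TotalRevenue always has the same number of digits. The sketch below uses hypothetical data to compare it with a swapped-in technique: groupby(...).ngroup(ascending=False), which numbers the sorted (SaleCount, TotalRevenue) pairs from the largest down, giving a dense descending rank without building a combined key.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the answer above.
df = pd.DataFrame({
    "SaleCount":    [10, 10, 9, 11, 9],
    "TotalRevenue": [100, 95, 150, 20, 150],
})

# String concatenation key, as in the answer: "9" + "150" -> 9150 beats "11" + "20" -> 1120.
concat_key = (df["SaleCount"].astype(str) + df["TotalRevenue"].astype(str)).astype(int)
df["RankConcat"] = concat_key.rank(method="dense", ascending=False).astype(int)

# groupby sorts the (SaleCount, TotalRevenue) keys lexicographically; ngroup(ascending=False)
# numbers the groups from the largest key (0) to the smallest, so +1 is a dense rank.
df["Rank"] = df.groupby(["SaleCount", "TotalRevenue"]).ngroup(ascending=False) + 1

print(df.sort_values("Rank"))
```

Here the row with 11 sales gets Rank 1, while the concatenated key puts it third behind the 9-sale rows.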


Pandas ranking order based on two columns

Your code gives me the expected result.

I can get something similar to your current result if I change the column order from ["Bucket","Count"] to ["Count","Bucket"], so perhaps you are using the wrong data.

Minimal working code:

import pandas as pd

data = {
    'B': [5,5,4,4,3,2,2,1], 
    'C': [60,3,10,2,35,2,2,3],
}

df = pd.DataFrame(data)

df['CB'] = df[['C', 'B']].apply(tuple, axis=1).rank(method='dense', ascending=False).astype(int)
df['BC'] = df[['B', 'C']].apply(tuple, axis=1).rank(method='dense', ascending=False).astype(int)

print(df)

Result:

   B   C  CB  BC
0  5  60   1   1
1  5   3   4   2
2  4  10   3   3
3  4   2   6   4
4  3  35   2   5
5  2   2   7   6
6  2   2   7   6
7  1   3   5   7
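apply(tuple, axis=1) builds one Python tuple per row, which can be slow on large frames. A sketch of the same dense, descending ranking done with vectorized operations: sort by the columns in priority order and start a new rank wherever the column combination changes. The helper name dense_rank_desc is hypothetical; on the data above it should reproduce the BC and CB columns.

```python
import pandas as pd

data = {
    'B': [5, 5, 4, 4, 3, 2, 2, 1],
    'C': [60, 3, 10, 2, 35, 2, 2, 3],
}
df = pd.DataFrame(data)

def dense_rank_desc(frame, cols):
    """Hypothetical helper: dense rank of the `cols` combination, largest first."""
    # Sort by the columns in priority order, largest values first.
    ordered = frame.sort_values(cols, ascending=False)
    # After sorting, equal combinations are adjacent; flag the first row of each run.
    is_new = ~ordered.duplicated(subset=cols)
    # Cumulative count of "new combination" flags: 1, 2, 3, ... with ties sharing a number.
    ranks = is_new.cumsum()
    # Put the ranks back in the original row order (aligned on the index).
    return ranks.reindex(frame.index)

df['BC'] = dense_rank_desc(df, ['B', 'C'])
df['CB'] = dense_rank_desc(df, ['C', 'B'])
print(df)
```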

Pandas - Group by and rank within group based on multiple columns

Method 1:

df.sort_values(['asset_id', 'method_rank', 'conf_score'], ascending=[True, True, False], inplace=True)
df['overall_rank'] = 1
df['overall_rank'] = df.groupby(['asset_id'])['overall_rank'].cumsum()

   asset_id method_id  method_rank  conf_score  overall_rank
2        10        p4            2         0.8             1
1        10        p3            2         0.6             2
0        10        p2            5         0.8             3
5        20        p2            1         0.5             1
3        20        p3            2         0.9             2
4        20        p1            5         0.7             3
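Method 1 assumes df already exists. A self-contained sketch, with the input rebuilt from the output table above (rows in original index order), and with groupby(...).cumcount() + 1 swapped in for the column-of-ones cumsum; both number the rows 1, 2, 3, ... within each asset_id in sorted order.

```python
import pandas as pd

# Input rebuilt from the output table above, in original index order.
df = pd.DataFrame({
    'asset_id':    [10, 10, 10, 20, 20, 20],
    'method_id':   ['p2', 'p3', 'p4', 'p3', 'p1', 'p2'],
    'method_rank': [5, 2, 2, 2, 5, 1],
    'conf_score':  [0.8, 0.6, 0.8, 0.9, 0.7, 0.5],
})

# Same ordering as method 1: lower method_rank first, then higher conf_score first.
ordered = df.sort_values(['asset_id', 'method_rank', 'conf_score'],
                         ascending=[True, True, False])

# cumcount() numbers rows 0, 1, 2, ... within each asset_id in the current (sorted) order.
ordered['overall_rank'] = ordered.groupby('asset_id').cumcount() + 1
print(ordered)
```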

Method 2:

Define a function that sorts each group and numbers its rows:

import numpy as np

def handle_group(group):
    # Sort a copy of the group instead of modifying the group in place
    group = group.sort_values(['method_rank', 'conf_score'], ascending=[True, False])
    group['overall_rank'] = np.arange(1, len(group) + 1)
    return group

df.groupby('asset_id', as_index=False).apply(handle_group)

Performance test:

def run1(df):
    df = df.sort_values(['asset_id', 'method_rank', 'conf_score'], ascending=[True, True, False])
    df['overall_rank'] = 1
    df['overall_rank'] = df.groupby(['asset_id'])['overall_rank'].cumsum()    
    return df

def handle_group(group):
    group = group.sort_values(['method_rank', 'conf_score'], ascending=[True, False])
    group['overall_rank'] = np.arange(1, len(group) + 1)
    return group

def run2(df):
    df = df.groupby('asset_id', as_index=False).apply(handle_group)
    return df

dfn = pd.concat([df]*10000, ignore_index=True)

%%timeit
df1 = run1(dfn)
# 8.61 ms ± 317 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit
df2 = run2(dfn).droplevel(0)
# 31.6 ms ± 404 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Pandas rank based on several columns

Sort by event_id before grouping, then pass method='first' to rank, so that rows sharing the same occurred_at are numbered in event_id order.

Also, if occurred_at isn't already a datetime column, convert it first.

# unnecessary if already datetime, but doesn't hurt to do it anyway
df.occurred_at = pd.to_datetime(df.occurred_at) 

df['rank'] = df.sort_values('event_id') \
                 .groupby('user_id').occurred_at \
                 .rank(method='first')
df


Complete, verifiable code for reference:

from io import StringIO
import pandas as pd

text = """event_id  occurred_at  user_id
   19148   2015-10-01        1
   19693   2015-10-05        2
   20589   2015-10-12        1
   20996   2015-10-15        1
   20998   2015-10-15        1
   23301   2015-10-23        2
   23630   2015-10-26        1
   25172   2015-11-03        1
   31699   2015-12-11        1
   32186   2015-12-14        2
   43426   2016-01-13        1
   68300   2016-04-04        2
   71926   2016-04-19        1"""

df = pd.read_csv(StringIO(text), sep=r'\s+')

df.occurred_at = pd.to_datetime(df.occurred_at) 

df['rank'] = df.sort_values('event_id').groupby('user_id').occurred_at.rank(method='first')

df
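A self-contained variant of the code above, with the same 13 rows built directly instead of parsed from text. It also checks the result against a swapped-in second approach: sort by user_id, occurred_at and event_id, then number the rows within each user with cumcount(). Both break the 2015-10-15 tie for user 1 by event_id.

```python
import pandas as pd

df = pd.DataFrame({
    'event_id': [19148, 19693, 20589, 20996, 20998, 23301, 23630,
                 25172, 31699, 32186, 43426, 68300, 71926],
    'occurred_at': pd.to_datetime([
        '2015-10-01', '2015-10-05', '2015-10-12', '2015-10-15', '2015-10-15',
        '2015-10-23', '2015-10-26', '2015-11-03', '2015-12-11', '2015-12-14',
        '2016-01-13', '2016-04-04', '2016-04-19']),
    'user_id': [1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1],
})

# Approach from the answer: rank by date within each user; method='first'
# breaks date ties by row order, which is event_id order after the sort.
df['rank'] = (df.sort_values('event_id')
                .groupby('user_id')['occurred_at']
                .rank(method='first'))

# Alternative sketch: sort by every tie-breaking key, then number rows per user.
ordered = df.sort_values(['user_id', 'occurred_at', 'event_id'])
df['rank_cumcount'] = ordered.groupby('user_id').cumcount() + 1

print(df)
```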

Rank DataFrame based on multiple columns

Here is a one-line approach using sort_values:

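In [135]: df['rank'] = pd.Series(range(1, len(df) + 1), index=df.sort_values(['col_a', 'col_b']).index)

In [136]: df
Out[136]: 
   col_a  col_b  rank
0      0      5     2
1      0      2     1
2      0      8     3
3      1      3     4
4      1      7     6
5      1      4     5

The logic behind this snippet: DataFrame.sort_values accepts multiple column names and returns a copy of the DataFrame sorted by those columns, in the order they are passed. The default sort order is ascending, which is what we want here; for a different order, pass an iterable of booleans to the ascending keyword argument. The index of the sorted copy lists the original row labels in sorted order, so numbering those labels 1..n and assigning the result back (pandas aligns a Series on the index) gives each row its rank. Assigning .index + 1 directly would place the sorted labels positionally instead, which matches the rank only when the sorted order happens to be its own inverse, as it is in this small example.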
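A sketch of why the sorted positions have to be aligned back by index rather than assigned positionally. The three-row frame is hypothetical, chosen so that its sorted order (rows 1, 2, 0) is not its own inverse.

```python
import pandas as pd

# Hypothetical frame: sorted by (col_a, col_b) the rows come out as 1, 2, 0.
df = pd.DataFrame({'col_a': [1, 0, 0], 'col_b': [5, 1, 7]})

order = df.sort_values(['col_a', 'col_b']).index   # row labels in sorted order: 1, 2, 0

# Pitfall: assigning the sorted labels positionally gives the argsort, not the rank.
df['wrong'] = order + 1                            # 2, 3, 1

# Rank = position in sorted order, aligned back to each row's label.
df['rank'] = pd.Series(range(1, len(df) + 1), index=order)   # 3, 1, 2
print(df)
```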

Is there a way to rank a value within multiple rows and columns in Pandas Dataframe?

Try this with join and apply:

lst = sorted(df.values.flatten().tolist())[::-1]
print(df.join(df.apply(lambda x: [lst.index(i) + 1 for i in x]), rsuffix='_rank'))

Output:

      Column1  Column2  Column3  Column1_rank  Column2_rank  Column3_rank
Row1       60       20       10             3             8             9
Row2       40       30       80             6             7             1
Row3       70       50       50             2             4             4

How can I rank based on condition in Pandas

Use:

#convert columns to numeric
df[['Ratio','Value']]=df[['Ratio','Value']].apply(lambda x: x.str.strip('%')).astype(float)

Exclude CPI rows from the ranking, except in clusters that contain only CPI rows:

m = df['Group'].eq('CPI')
m1 = ~df['Cluster'].isin(df.loc[~m, 'Cluster']) | ~m
df['RankRatio'] = df[m1].groupby('Cluster')['Ratio'].rank(method='first', ascending=True)
df['RankValue'] = df[m1].groupby('Cluster')['Value'].rank(method='first', ascending=False)

print (df)
   Cluster  Variable Group  Ratio  Value  RankRatio  RankValue
0        1    GDP_M3   GDP   20.0   70.0        1.0        2.0
1        1    HPI_M6   HPI   40.0   80.0        3.0        1.0
2        1   GDP_lg2   GDP   35.0   50.0        2.0        3.0
3        2    CPI_M9   CPI   10.0   50.0        NaN        NaN
4        2   HPI_lg6   HPI   15.0   65.0        1.0        1.0
5        3  CPI_lg12   CPI   15.0   90.0        1.0        2.0
6        3   CPI_lg1   CPI   20.0   95.0        2.0        1.0

How it works:

mask1 (~m) marks every row that is not CPI. For mask2, take the Cluster values of those non-CPI rows, test the original Cluster column against them with isin, and invert the result with ~, so mask2 is True only in clusters made up entirely of CPI rows. Finally, chain both conditions with | (bitwise OR): all non-CPI rows are ranked, and CPI rows are ranked only when their cluster contains nothing else. Rows excluded by m1 get NaN, because the assignment aligns on the index.

print (df.assign(mask1 = ~m, mask2 = ~df['Cluster'].isin(df.loc[~m, 'Cluster']), both = m1))
   Cluster  Variable Group  Ratio  Value  mask1  mask2   both
0        1    GDP_M3   GDP   20.0   70.0   True  False   True
1        1    HPI_M6   HPI   40.0   80.0   True  False   True
2        1   GDP_lg2   GDP   35.0   50.0   True  False   True
3        2    CPI_M9   CPI   10.0   50.0  False  False  False
4        2   HPI_lg6   HPI   15.0   65.0   True  False   True
5        3  CPI_lg12   CPI   15.0   90.0  False   True   True
6        3   CPI_lg1   CPI   20.0   95.0  False   True   True

EDIT:

df[['Ratio','Value']]=df[['Ratio','Value']].apply(lambda x: x.str.strip('%')).astype(float)

m = df['Group'].isin(['CPI','HPI'])
m2 = df.groupby('Cluster')['Group'].transform('nunique').ne(1)
m1 = (~df['Cluster'].isin(df.loc[~m, 'Cluster']) | m) & m2
df['RankRatio'] = df[~m1].groupby('Cluster')['Ratio'].rank(method='first', ascending=True)
df['RankValue'] = df[~m1].groupby('Cluster')['Value'].rank(method='first', ascending=False)
print (df)
   Cluster  Variable Group  Ratio  Value  RankRatio  RankValue
0        1    GDP_M3   GDP   20.0   70.0        1.0        1.0
1        1    HPI_M6   HPI   40.0   80.0        NaN        NaN
2        1   GDP_lg2   GDP   35.0   50.0        2.0        2.0
3        2    CPI_M9   CPI   10.0   50.0        NaN        NaN
4        2   HPI_lg6   HPI   15.0   65.0        NaN        NaN
5        3  CPI_lg12   CPI   15.0   90.0        1.0        2.0
6        3   CPI_lg1   CPI   20.0   95.0        2.0        1.0

print (df.assign(mask1 = m, mask2 = ~df['Cluster'].isin(df.loc[~m, 'Cluster']), m2=m2, all = ~m1))
   Cluster  Variable Group  Ratio  Value  RankRatio  RankValue  mask1  mask2  \
0        1    GDP_M3   GDP   20.0   70.0        1.0        1.0  False  False   
1        1    HPI_M6   HPI   40.0   80.0        NaN        NaN   True  False   
2        1   GDP_lg2   GDP   35.0   50.0        2.0        2.0  False  False   
3        2    CPI_M9   CPI   10.0   50.0        NaN        NaN   True   True   
4        2   HPI_lg6   HPI   15.0   65.0        NaN        NaN   True   True   
5        3  CPI_lg12   CPI   15.0   90.0        1.0        2.0   True   True   
6        3   CPI_lg1   CPI   20.0   95.0        2.0        1.0   True   True   

      m2    all  
0   True   True  
1   True  False  
2   True   True  
3   True  False  
4   True  False  
5  False   True  
6  False   True
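A sketch of the first rule (ignore CPI rows unless the whole cluster is CPI) that swaps in groupby(...).transform('all') to find CPI-only clusters instead of isin. The input frame is rebuilt from the tables above with Ratio and Value already numeric, and the variable names is_cpi, cpi_only_cluster and keep are hypothetical. It should reproduce the RankRatio and RankValue columns of the first printed result.

```python
import pandas as pd

# Input rebuilt from the tables above; Ratio and Value are already numeric here.
df = pd.DataFrame({
    'Cluster':  [1, 1, 1, 2, 2, 3, 3],
    'Variable': ['GDP_M3', 'HPI_M6', 'GDP_lg2', 'CPI_M9', 'HPI_lg6', 'CPI_lg12', 'CPI_lg1'],
    'Group':    ['GDP', 'HPI', 'GDP', 'CPI', 'HPI', 'CPI', 'CPI'],
    'Ratio':    [20.0, 40.0, 35.0, 10.0, 15.0, 15.0, 20.0],
    'Value':    [70.0, 80.0, 50.0, 50.0, 65.0, 90.0, 95.0],
})

is_cpi = df['Group'].eq('CPI')
# True on every row of a cluster whose rows are all CPI.
cpi_only_cluster = is_cpi.groupby(df['Cluster']).transform('all')
# Keep non-CPI rows, plus CPI rows that belong to CPI-only clusters.
keep = ~is_cpi | cpi_only_cluster

grouped = df[keep].groupby('Cluster')
# Rows left out by `keep` get NaN, because assignment aligns on the index.
df['RankRatio'] = grouped['Ratio'].rank(method='first', ascending=True)
df['RankValue'] = grouped['Value'].rank(method='first', ascending=False)
print(df)
```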

Pandas Ranking for String Columns

In your case, do:

df['new'] = df.groupby(['item_number','location_id'])['Date'].rank(ascending=False)

The groupby rank expression returns:

0    5.0
1    4.0
2    3.0
3    2.0
4    1.0
5    5.0
6    4.0
7    3.0
8    2.0
9    1.0
Name: Date, dtype: float64
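Ranking date strings compares them as text, which matches chronological order only for formats such as YYYY-MM-DD. A sketch with hypothetical MM/DD/YYYY data that converts the column with pd.to_datetime before ranking; the format string is an assumption about how the dates are stored.

```python
import pandas as pd

# Hypothetical data: dates stored as strings in MM/DD/YYYY form.
df = pd.DataFrame({
    'item_number': [1, 1, 1, 2, 2],
    'location_id': ['A', 'A', 'A', 'B', 'B'],
    'Date': ['12/01/2020', '02/15/2021', '06/30/2020', '01/10/2021', '11/05/2020'],
})

# As text, '12/01/2020' sorts after '02/15/2021', so convert to datetime first.
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# The most recent date within each (item_number, location_id) gets rank 1.
df['new'] = df.groupby(['item_number', 'location_id'])['Date'].rank(ascending=False)
print(df)
```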

How to rank rows in pandas with multiple condition?

Use groupby:

df = df.sort_values(['S','L','C','CTR','IM'], ascending=[True,True,True,False,False])
df['Rank'] = df.groupby(['S','L','C']).cumcount() + 1

Or:

df = df.convert_dtypes()
def rank_count(x):
    x = x.sort_values(['CTR','IM'], ascending=False).reset_index(drop=True)
    x['Rank'] = x.index + 1
    return x
result = df.groupby(['S','L','C'], sort=False).apply(rank_count).reset_index(drop=True)
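A sketch of the groupby approach on hypothetical data with the same column names. Unlike the apply version, which ends with reset_index(drop=True), it keeps the original row labels, so sort_index() puts the rows back in input order.

```python
import pandas as pd

# Hypothetical data using the column names from the answer above.
df = pd.DataFrame({
    'S':   ['s1', 's1', 's1', 's2', 's2'],
    'L':   ['l1', 'l1', 'l1', 'l1', 'l1'],
    'C':   ['c1', 'c1', 'c1', 'c1', 'c1'],
    'CTR': [0.10, 0.30, 0.30, 0.05, 0.20],
    'IM':  [500, 100, 900, 50, 70],
})

ordered = df.sort_values(['S', 'L', 'C', 'CTR', 'IM'],
                         ascending=[True, True, True, False, False])
# Rank within each (S, L, C) group: highest CTR first, higher IM breaks CTR ties.
ordered['Rank'] = ordered.groupby(['S', 'L', 'C']).cumcount() + 1

# The original row labels survive the sort, so sort_index() restores input order.
result = ordered.sort_index()
print(result)
```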
