Adding a Ranking Column to a Dataframe

Adding a ranking column to a dataframe

You can compute a ranking from an ordering as follows:

dat$rank <- NA
dat$rank[order.scores] <- 1:nrow(dat)
dat
# fname score rank
# 1 Joe 500 5
# 2 Bob 490 3
# 3 Bill 500 4
# 4 Tom 750 8
# 5 Sue 550 7
# 6 Sam 500 6
# 7 Jane 210 1
# 8 Ruby 320 2

DataFrame - Add a new ranking column

I think this should be like this:

import pandas as pd

Original data:

df = pd.DataFrame({
'fruit': ['Apple', 'Apple', 'Apple', 'Pear', 'Pear', 'Pear', 'Pear', 'Peach', 'Peach'],
'percentage': [23, 99, 50, 45, 87, 67, 70, 93, 75]
})

Output

Sample Image

Create new 'rank' column based on grouping the df dataframe on fruit and rank the value of percentage within the group.

df['rank'] = df.groupby('fruit')['percentage'].rank()

Output:

Sample Image

Pandas rank by column value

Here's one way to do it in Pandas-way

You could groupby on Auction_ID and take rank() on Bid_Price with ascending=False

In [68]: df['Auction_Rank'] = df.groupby('Auction_ID')['Bid_Price'].rank(ascending=False)

In [69]: df
Out[69]:
Auction_ID Bid_Price Auction_Rank
0 123 9 1
1 123 7 2
2 123 6 3
3 123 2 4
4 124 3 1
5 124 2 2
6 124 1 3
7 125 1 1

More efficient way to rank columns in a dataframe

You can use the Numba JIT to compute this more efficiently and in parallel. The idea is to compute the rank of each column in parallel. Here is the resulting code:

# Equivalent of df.rank(ascending=False, method='min')
@nb.njit('int32[:,:](int32[:,:])', parallel=True)
def fastRanks(df):
n, m = df.shape
res = np.empty((n, m), dtype=np.int32)

for col in nb.prange(m):
dfCol = -df[:, col]
order = np.argsort(dfCol)

# Compute the ranks with the min method
if n > 0:
prevVal = dfCol[order[0]]
prevRank = 1
res[order[0], col] = 1

for row in range(1, n):
curVal = dfCol[order[row]]
if curVal == prevVal:
res[order[row], col] = prevRank
else:
res[order[row], col] = row + 1
prevVal = curVal
prevRank = row + 1

return res

df = pd.DataFrame(np.random.randint(0, 500, size=(500, 1000)), columns=list(range(0, 1000)))
ranking = pd.DataFrame(range(0, 500), columns=['Lineup'])
ranking = pd.concat([ranking, pd.DataFrame(fastRanks(df[range(0, 1000)].to_numpy()))], axis=1)

On my 6-core machine, the computation of the ranks is about 7 times faster. The overall computation is bounded by the slow pd.concat.

You can further improve the speed of the overall computation by building directly the output of fastRanks with the "Lineup" column. The names of the dataframe columns have to be set manually from the Numpy array produced by the Numba function. Note that this optimization require all the columns to be of the same type, which is the case in your example.

Note that the ranks are of types int32 in this solution for sake of performance (since float64 are not needed here).

How to make a rank column in R

Using rank and relocate:

library(dplyr)

df1 %>%
mutate(across(M1:M2, ~ rank(-.x), .names = "{.col}_rank"),
M3_rank = rank(M3)) %>%
relocate(order(colnames(.)))

M1 M1_rank M2 M2_rank M3 M3_rank
1 400 1 500 1 420 4
2 300 2 200 2 330 3
3 200 3 10 4 230 2
4 50 4 100 3 51 1

If you have duplicate values in your vector, then you have to choose a method for ties. By default, you get the average rank, but you can choose "first".

Another possibility, which is I think what you want to do, is to convert to factor and then to numeric, so that you get a only entire values (not the average).

df1 <- data.frame(M1 = c(400,300, 50, 300))
df1 %>%
mutate(M1_rankAverage = rank(-M1),
M1_rankFirst = rank(-M1, ties.method = "first"),
M1_unique = as.numeric(as.factor(rank(-M1))))

M1 M1_rankAverage M1_rankFirst M1_unique
1 400 1.0 1 1
2 300 2.5 2 2
3 50 4.0 4 3
4 300 2.5 3 2

Add a ranking ordered column to a Pandas Dataframe

You can try:

df['rank'] = df.index + 1

print df
# conference IF2013 AR2013 rank
#0 HOTMOBILE 16.333333 31.5 1
#1 FOGA 13.772727 60.0 2
#2 IEA/AIE 10.433735 28.2 3
#3 IEEE Real-Time and Embedded Technology and App... 10.250000 29.0 4
#4 Symposium on Computational Geometry 9.880342 35.0 5
#5 WISA 9.693878 43.6 6
#6 ICMT 8.750000 22.0 7
#7 Haskell 8.703704 39.0 8

Or use rank with parameter ascending=False:

df['rank'] = df['conference'].rank(ascending=False)
print df
# conference IF2013 AR2013 rank
#0 HOTMOBILE 16.333333 31.5 1
#1 FOGA 13.772727 60.0 2
#2 IEA/AIE 10.433735 28.2 3
#3 IEEE Real-Time and Embedded Technology and App... 10.250000 29.0 4
#4 Symposium on Computational Geometry 9.880342 35.0 5
#5 WISA 9.693878 43.6 6
#6 ICMT 8.750000 22.0 7
#7 Haskell 8.703704 39.0 8

Add a rank column to a data frame

There is a rank function to help you with that:

transform(df, 
year.rank = ave(count, year,
FUN = function(x) rank(-x, ties.method = "first")))
item year count year.rank
1 a 2010 1 3
2 b 2010 4 2
3 c 2010 6 1
4 a 2011 3 2
5 b 2011 8 1
6 c 2011 3 3
7 a 2012 5 3
8 b 2012 7 2
9 c 2012 9 1

Add a Rank column to MultiIndex Dataframe

You can use groupby():

df['RANK'] = df.groupby(['latitude','longitude'])['FFDI'].rank(ascending=False)

Update per comment, you can try:

df['RANK'] = (df.sort_values(['FFDI','Time'], ascending=[False,True])
.groupby(['latitude','longitude']).cumcount() + 1
)

You can also try to pass method='first' to rank on the original answer, given the Time is sorted.

Adding multiple ranking columns in a dataframe in R

We can loop across the columns 'home' to 'work', apply the rank, while creating new column by adding prefix in .names, and probably select to keep the order

library(dplyr)
df1 <- df %>%
mutate(across(home:work, ~ rank(-.), .names = "rank_{.col}"))

Or may do this in a loop where it is more flexible in placing the column at a particular position by specifying either .after or .before. Note that we used compound assignment operator (%<>% from magrittr) to do the assignment in place

library(magrittr)
library(stringr)
for(nm in names(df)[4:6]) df %<>%
mutate(!!str_c("rank_", nm) := rank(-.data[[nm]]), .after = all_of(nm))

-output

df
id city uf home rank_home money rank_money work rank_work
1 34 LA RJ 10 1 2 6 2 6
2 33 BA TY 7 2 3 5 65 1
3 32 NY BN 4 4 5 4 4 5
4 12 SP SD 3 5 9 2 7 4
5 14 FR DE 1 6 8 3 9 2
6 17 BL DE 5 3 10 1 8 3

NOTE: If the column have ties, then the default method use is "average". So, ties.method can also be an argument in the rank where there are ties.

data

df <- structure(list(id = c(34L, 33L, 32L, 12L, 14L, 17L), city = c("LA", 
"BA", "NY", "SP", "FR", "BL"), uf = c("RJ", "TY", "BN", "SD",
"DE", "DE"), home = c(10L, 7L, 4L, 3L, 1L, 5L), money = c(2L,
3L, 5L, 9L, 8L, 10L), work = c(2L, 65L, 4L, 7L, 9L, 8L)),
class = "data.frame", row.names = c(NA,
-6L))

How to add a ranking column for this dataset?

Does this work:

library(dplyr)
df %>% group_by(country) %>% mutate(rank = rank(desc(profit)))
# A tibble: 12 x 4
# Groups: country [4]
comp_name country profit rank
<chr> <chr> <dbl> <dbl>
1 A US 100 3
2 B UK 125 2
3 C France 150 1
4 D Germany 165 1
5 E US 150 1
6 F UK 110 3
7 G France 110 2
8 H Germany 125 2
9 J US 130 2
10 K UK 250 1
11 L France 95 3
12 M Germany 100 3


Related Topics



Leave a reply



Submit