Add a "Rank" Column to a Data Frame

Adding a ranking column to a dataframe

You can compute a ranking from an ordering as follows:

dat$rank <- NA
dat$rank[order.scores] <- 1:nrow(dat)
dat
# fname score rank
# 1 Joe 500 5
# 2 Bob 490 3
# 3 Bill 500 4
# 4 Tom 750 8
# 5 Sue 550 7
# 6 Sam 500 6
# 7 Jane 210 1
# 8 Ruby 320 2

DataFrame - Add a new ranking column

I think this should be like this:

import pandas as pd

Original data:

df = pd.DataFrame({
'fruit': ['Apple', 'Apple', 'Apple', 'Pear', 'Pear', 'Pear', 'Pear', 'Peach', 'Peach'],
'percentage': [23, 99, 50, 45, 87, 67, 70, 93, 75]
})

Output

Sample Image

Create new 'rank' column based on grouping the df dataframe on fruit and rank the value of percentage within the group.

df['rank'] = df.groupby('fruit')['percentage'].rank()

Output:

Sample Image

Pandas rank by column value

Here's one way to do it in Pandas-way

You could groupby on Auction_ID and take rank() on Bid_Price with ascending=False

In [68]: df['Auction_Rank'] = df.groupby('Auction_ID')['Bid_Price'].rank(ascending=False)

In [69]: df
Out[69]:
Auction_ID Bid_Price Auction_Rank
0 123 9 1
1 123 7 2
2 123 6 3
3 123 2 4
4 124 3 1
5 124 2 2
6 124 1 3
7 125 1 1

How to make a rank column in R

Using rank and relocate:

library(dplyr)

df1 %>%
mutate(across(M1:M2, ~ rank(-.x), .names = "{.col}_rank"),
M3_rank = rank(M3)) %>%
relocate(order(colnames(.)))

M1 M1_rank M2 M2_rank M3 M3_rank
1 400 1 500 1 420 4
2 300 2 200 2 330 3
3 200 3 10 4 230 2
4 50 4 100 3 51 1

If you have duplicate values in your vector, then you have to choose a method for ties. By default, you get the average rank, but you can choose "first".

Another possibility, which is I think what you want to do, is to convert to factor and then to numeric, so that you get a only entire values (not the average).

df1 <- data.frame(M1 = c(400,300, 50, 300))
df1 %>%
mutate(M1_rankAverage = rank(-M1),
M1_rankFirst = rank(-M1, ties.method = "first"),
M1_unique = as.numeric(as.factor(rank(-M1))))

M1 M1_rankAverage M1_rankFirst M1_unique
1 400 1.0 1 1
2 300 2.5 2 2
3 50 4.0 4 3
4 300 2.5 3 2

Add a rank column to a data frame

There is a rank function to help you with that:

transform(df, 
year.rank = ave(count, year,
FUN = function(x) rank(-x, ties.method = "first")))
item year count year.rank
1 a 2010 1 3
2 b 2010 4 2
3 c 2010 6 1
4 a 2011 3 2
5 b 2011 8 1
6 c 2011 3 3
7 a 2012 5 3
8 b 2012 7 2
9 c 2012 9 1

In pandas, how to create a new column with a rank according to the mean values of another column

Use groupby + transform for mean and then rank:

df['Average'] = df.groupby('Country')['value'].transform('mean')
df['Rank'] = df['Average'].rank(method='dense', ascending=False)
print (df)
Country value Average Rank
0 UK 42 42.000000 1.0
1 US 9 9.500000 4.0
2 US 10 9.500000 4.0
3 France 15 15.500000 3.0
4 France 16 15.500000 3.0
5 Germany 17 18.333333 2.0
6 Germany 18 18.333333 2.0
7 Germany 20 18.333333 2.0

Similar solution:

a = df.groupby('Country')['value'].transform('mean')
b = a.rank(method='dense', ascending=False)

df = df.assign(Average=a, Rank=b)
print (df)
Country value Average Rank
0 UK 42 42.000000 1.0
1 US 9 9.500000 4.0
2 US 10 9.500000 4.0
3 France 15 15.500000 3.0
4 France 16 15.500000 3.0
5 Germany 17 18.333333 2.0
6 Germany 18 18.333333 2.0
7 Germany 20 18.333333 2.0

How to add a ranking column for this dataset?

Does this work:

library(dplyr)
df %>% group_by(country) %>% mutate(rank = rank(desc(profit)))
# A tibble: 12 x 4
# Groups: country [4]
comp_name country profit rank
<chr> <chr> <dbl> <dbl>
1 A US 100 3
2 B UK 125 2
3 C France 150 1
4 D Germany 165 1
5 E US 150 1
6 F UK 110 3
7 G France 110 2
8 H Germany 125 2
9 J US 130 2
10 K UK 250 1
11 L France 95 3
12 M Germany 100 3

How do I create a 'Rank' column in pandas?

df['Rank'] = df.sort_values(by=['col1', 'col2'], ascending=False) \
.reset_index() \
.sort_values('index') \
.index + 1

This code goes through these steps:

  1. Sorts the data frame into a new temporary one
  2. Resets the index, keeping the old one as a new column called 'index'. Make sure you don't have any starting columns named 'index'. If you do have one, the new column is called level_0. If you have both index and level_0, the code will throw an exception.
  3. Sorts the temporary dataframe according to the old index (so it's in the original order again)
  4. Uses the new index (which is in the rank order we want) as the new 'Rank' column. Plus 1 because your question seems to want ranks that start at 1, not 0

Warning while adding rank column to a pandas dataframe

Because after you filter your dataframe and assign to another variable name even if you use loc you are still using a shallow copy. In order to not get the warning you should copy the dataframe properly like this.

min_100k_volume_df = ohlcv_m30_groupby_df[filter_condition].copy()

ranking dataframe using two columns in R

You can use data.table::frank or dplyr::min_rank:

data.table::frank

dt$Rank <- frank(dt, B, A, ties.method = "min")
dt
A B Rank
1 1 1 1
2 2 1 2
3 2 1 2
4 4 4 5
5 5 3 4

dplyr::min_rank

mutate(dt, Rank = min_rank(paste(B,A)))
A B Rank
1 1 1 1
2 2 1 2
3 2 1 2
4 4 4 5
5 5 3 4

Data

dt <- data.frame(A = c(1,2,2,4,5), B = c(1,1,1,4,3))


Related Topics



Leave a reply



Submit