Adding a ranking column to a dataframe
You can compute a ranking from an ordering as follows:
dat$rank <- NA
dat$rank[order.scores] <- 1:nrow(dat)
dat
# fname score rank
# 1 Joe 500 5
# 2 Bob 490 3
# 3 Bill 500 4
# 4 Tom 750 8
# 5 Sue 550 7
# 6 Sam 500 6
# 7 Jane 210 1
# 8 Ruby 320 2
DataFrame - Add a new ranking column
I think this should be like this:
import pandas as pd
Original data:
df = pd.DataFrame({
'fruit': ['Apple', 'Apple', 'Apple', 'Pear', 'Pear', 'Pear', 'Pear', 'Peach', 'Peach'],
'percentage': [23, 99, 50, 45, 87, 67, 70, 93, 75]
})
Output
Create new 'rank' column based on grouping the df
dataframe on fruit
and rank
the value of percentage
within the group.
df['rank'] = df.groupby('fruit')['percentage'].rank()
Output:
Pandas rank by column value
Here's one way to do it in Pandas-way
You could groupby
on Auction_ID
and take rank()
on Bid_Price
with ascending=False
In [68]: df['Auction_Rank'] = df.groupby('Auction_ID')['Bid_Price'].rank(ascending=False)
In [69]: df
Out[69]:
Auction_ID Bid_Price Auction_Rank
0 123 9 1
1 123 7 2
2 123 6 3
3 123 2 4
4 124 3 1
5 124 2 2
6 124 1 3
7 125 1 1
How to make a rank column in R
Using rank
and relocate
:
library(dplyr)
df1 %>%
mutate(across(M1:M2, ~ rank(-.x), .names = "{.col}_rank"),
M3_rank = rank(M3)) %>%
relocate(order(colnames(.)))
M1 M1_rank M2 M2_rank M3 M3_rank
1 400 1 500 1 420 4
2 300 2 200 2 330 3
3 200 3 10 4 230 2
4 50 4 100 3 51 1
If you have duplicate values in your vector, then you have to choose a method for ties. By default, you get the average rank, but you can choose "first".
Another possibility, which is I think what you want to do, is to convert to factor and then to numeric, so that you get a only entire values (not the average).
df1 <- data.frame(M1 = c(400,300, 50, 300))
df1 %>%
mutate(M1_rankAverage = rank(-M1),
M1_rankFirst = rank(-M1, ties.method = "first"),
M1_unique = as.numeric(as.factor(rank(-M1))))
M1 M1_rankAverage M1_rankFirst M1_unique
1 400 1.0 1 1
2 300 2.5 2 2
3 50 4.0 4 3
4 300 2.5 3 2
Add a rank column to a data frame
There is a rank
function to help you with that:
transform(df,
year.rank = ave(count, year,
FUN = function(x) rank(-x, ties.method = "first")))
item year count year.rank
1 a 2010 1 3
2 b 2010 4 2
3 c 2010 6 1
4 a 2011 3 2
5 b 2011 8 1
6 c 2011 3 3
7 a 2012 5 3
8 b 2012 7 2
9 c 2012 9 1
In pandas, how to create a new column with a rank according to the mean values of another column
Use groupby
+ transform
for mean
and then rank
:
df['Average'] = df.groupby('Country')['value'].transform('mean')
df['Rank'] = df['Average'].rank(method='dense', ascending=False)
print (df)
Country value Average Rank
0 UK 42 42.000000 1.0
1 US 9 9.500000 4.0
2 US 10 9.500000 4.0
3 France 15 15.500000 3.0
4 France 16 15.500000 3.0
5 Germany 17 18.333333 2.0
6 Germany 18 18.333333 2.0
7 Germany 20 18.333333 2.0
Similar solution:
a = df.groupby('Country')['value'].transform('mean')
b = a.rank(method='dense', ascending=False)
df = df.assign(Average=a, Rank=b)
print (df)
Country value Average Rank
0 UK 42 42.000000 1.0
1 US 9 9.500000 4.0
2 US 10 9.500000 4.0
3 France 15 15.500000 3.0
4 France 16 15.500000 3.0
5 Germany 17 18.333333 2.0
6 Germany 18 18.333333 2.0
7 Germany 20 18.333333 2.0
How to add a ranking column for this dataset?
Does this work:
library(dplyr)
df %>% group_by(country) %>% mutate(rank = rank(desc(profit)))
# A tibble: 12 x 4
# Groups: country [4]
comp_name country profit rank
<chr> <chr> <dbl> <dbl>
1 A US 100 3
2 B UK 125 2
3 C France 150 1
4 D Germany 165 1
5 E US 150 1
6 F UK 110 3
7 G France 110 2
8 H Germany 125 2
9 J US 130 2
10 K UK 250 1
11 L France 95 3
12 M Germany 100 3
How do I create a 'Rank' column in pandas?
df['Rank'] = df.sort_values(by=['col1', 'col2'], ascending=False) \
.reset_index() \
.sort_values('index') \
.index + 1
This code goes through these steps:
- Sorts the data frame into a new temporary one
- Resets the index, keeping the old one as a new column called 'index'. Make sure you don't have any starting columns named 'index'. If you do have one, the new column is called
level_0
. If you have bothindex
andlevel_0
, the code will throw an exception. - Sorts the temporary dataframe according to the old index (so it's in the original order again)
- Uses the new index (which is in the rank order we want) as the new 'Rank' column. Plus 1 because your question seems to want ranks that start at 1, not 0
Warning while adding rank column to a pandas dataframe
Because after you filter your dataframe and assign to another variable name even if you use loc you are still using a shallow copy. In order to not get the warning you should copy the dataframe properly like this.
min_100k_volume_df = ohlcv_m30_groupby_df[filter_condition].copy()
ranking dataframe using two columns in R
You can use data.table::frank
or dplyr::min_rank
:
data.table::frank
dt$Rank <- frank(dt, B, A, ties.method = "min")
dt
A B Rank
1 1 1 1
2 2 1 2
3 2 1 2
4 4 4 5
5 5 3 4
dplyr::min_rank
mutate(dt, Rank = min_rank(paste(B,A)))
A B Rank
1 1 1 1
2 2 1 2
3 2 1 2
4 4 4 5
5 5 3 4
Data
dt <- data.frame(A = c(1,2,2,4,5), B = c(1,1,1,4,3))
Related Topics
How to Sort Letters in a String
Is There a More Elegant Way to Convert Two-Digit Years to Four-Digit Years with Lubridate
Load Multiple Packages at Once
Add New Row to Dataframe, at Specific Row-Index, Not Appended
Using R to List All Files with a Specified Extension
Determine the Data Types of a Data Frame's Columns
How to Wait for a Keypress in R
Options for Caching/Memoization/Hashing in R
How to Run R on a Server Without X11, and Avoid Broken Dependencies
Examples of the Perils of Globals in R and Stata
Remove Backslashes from Character String
How to Create a Marimekko/Mosaic Plot in Ggplot2
Split Up '...' Arguments and Distribute to Multiple Functions
Handling Java.Lang.Outofmemoryerror When Writing to Excel from R