How to emulate SQLs rank functions in R?
The data.table
package, especially starting with version 1.8.1, offers much of the functionality of partition in SQL terms. rank(x, ties.method = "min")
in R is similar to Oracle RANK()
, and there's a way using factors (described below) to mimic the DENSE_RANK()
function. A way to mimic ROW_NUMBER
should be obvious by the end.
Here's an example: Load the latest version of data.table
from R-Forge:
install.packages("data.table",
repos= c("http://R-Forge.R-project.org", getOption("repos")))
library(data.table)
Create some example data:
set.seed(10)
DT<-data.table(ID=seq_len(4*3),group=rep(1:4,each=3),value=rnorm(4*3),
info=c(sample(c("a","b"),4*2,replace=TRUE),
sample(c("c","d"),4,replace=TRUE)),key="ID")
> DT
ID group value info
1: 1 1 0.01874617 a
2: 2 1 -0.18425254 b
3: 3 1 -1.37133055 b
4: 4 2 -0.59916772 a
5: 5 2 0.29454513 b
6: 6 2 0.38979430 a
7: 7 3 -1.20807618 b
8: 8 3 -0.36367602 a
9: 9 3 -1.62667268 c
10: 10 4 -0.25647839 d
11: 11 4 1.10177950 c
12: 12 4 0.75578151 d
Rank each ID
by decreasing value
within group
(note the -
in front of value
to denote decreasing order):
> DT[,valRank:=rank(-value),by="group"]
ID group value info valRank
1: 1 1 0.01874617 a 1
2: 2 1 -0.18425254 b 2
3: 3 1 -1.37133055 b 3
4: 4 2 -0.59916772 a 3
5: 5 2 0.29454513 b 2
6: 6 2 0.38979430 a 1
7: 7 3 -1.20807618 b 2
8: 8 3 -0.36367602 a 1
9: 9 3 -1.62667268 c 3
10: 10 4 -0.25647839 d 3
11: 11 4 1.10177950 c 1
12: 12 4 0.75578151 d 2
For DENSE_RANK()
with ties in the value being ranked, you could convert the value to a factor and then return the underlying integer values. For example, ranking each ID
based on info
within group
(compare infoRank
with infoRankDense
):
DT[,infoRank:=rank(info,ties.method="min"),by="group"]
DT[,infoRankDense:=as.integer(factor(info)),by="group"]
R> DT
ID group value info valRank infoRank infoRankDense
1: 1 1 0.01874617 a 1 1 1
2: 2 1 -0.18425254 b 2 2 2
3: 3 1 -1.37133055 b 3 2 2
4: 4 2 -0.59916772 a 3 1 1
5: 5 2 0.29454513 b 2 3 2
6: 6 2 0.38979430 a 1 1 1
7: 7 3 -1.20807618 b 2 2 2
8: 8 3 -0.36367602 a 1 1 1
9: 9 3 -1.62667268 c 3 3 3
10: 10 4 -0.25647839 d 3 2 2
11: 11 4 1.10177950 c 1 1 1
12: 12 4 0.75578151 d 2 2 2
p.s. Hi Matthew Dowle.
LEAD and LAG
For imitating LEAD and LAG, start with the answer provided here. I would create a rank variable based on the order of IDs within groups. This wouldn't be necessary with the fake data as above, but if the IDs are not in sequential order within groups, then this would make life a bit more difficult. So here's some new fake data with non-sequential IDs:
set.seed(10)
DT<-data.table(ID=sample(seq_len(4*3)),group=rep(1:4,each=3),value=rnorm(4*3),
info=c(sample(c("a","b"),4*2,replace=TRUE),
sample(c("c","d"),4,replace=TRUE)),key="ID")
DT[,idRank:=rank(ID),by="group"]
setkey(DT,group, idRank)
> DT
ID group value info idRank
1: 4 1 -0.36367602 b 1
2: 5 1 -1.62667268 b 2
3: 7 1 -1.20807618 b 3
4: 1 2 1.10177950 a 1
5: 2 2 0.75578151 a 2
6: 12 2 -0.25647839 b 3
7: 3 3 0.74139013 c 1
8: 6 3 0.98744470 b 2
9: 9 3 -0.23823356 a 3
10: 8 4 -0.19515038 c 1
11: 10 4 0.08934727 c 2
12: 11 4 -0.95494386 c 3
Then to get the values of the previous 1 record, use the group
and idRank
variables and subtract 1
from the idRank
and use the multi = 'last'
argument. To get the value from the record two entries above, subtract 2
.
DT[,prev:=DT[J(group,idRank-1), value, mult='last']]
DT[,prev2:=DT[J(group,idRank-2), value, mult='last']]
ID group value info idRank prev prev2
1: 4 1 -0.36367602 b 1 NA NA
2: 5 1 -1.62667268 b 2 -0.36367602 NA
3: 7 1 -1.20807618 b 3 -1.62667268 -0.3636760
4: 1 2 1.10177950 a 1 NA NA
5: 2 2 0.75578151 a 2 1.10177950 NA
6: 12 2 -0.25647839 b 3 0.75578151 1.1017795
7: 3 3 0.74139013 c 1 NA NA
8: 6 3 0.98744470 b 2 0.74139013 NA
9: 9 3 -0.23823356 a 3 0.98744470 0.7413901
10: 8 4 -0.19515038 c 1 NA NA
11: 10 4 0.08934727 c 2 -0.19515038 NA
12: 11 4 -0.95494386 c 3 0.08934727 -0.1951504
For LEAD, add the appropriate offset to the idRank
variable and switch to multi = 'first'
:
DT[,nex:=DT[J(group,idRank+1), value, mult='first']]
DT[,nex2:=DT[J(group,idRank+2), value, mult='first']]
ID group value info idRank prev prev2 nex nex2
1: 4 1 -0.36367602 b 1 NA NA -1.62667268 -1.2080762
2: 5 1 -1.62667268 b 2 -0.36367602 NA -1.20807618 NA
3: 7 1 -1.20807618 b 3 -1.62667268 -0.3636760 NA NA
4: 1 2 1.10177950 a 1 NA NA 0.75578151 -0.2564784
5: 2 2 0.75578151 a 2 1.10177950 NA -0.25647839 NA
6: 12 2 -0.25647839 b 3 0.75578151 1.1017795 NA NA
7: 3 3 0.74139013 c 1 NA NA 0.98744470 -0.2382336
8: 6 3 0.98744470 b 2 0.74139013 NA -0.23823356 NA
9: 9 3 -0.23823356 a 3 0.98744470 0.7413901 NA NA
10: 8 4 -0.19515038 c 1 NA NA 0.08934727 -0.9549439
11: 10 4 0.08934727 c 2 -0.19515038 NA -0.95494386 NA
12: 11 4 -0.95494386 c 3 0.08934727 -0.1951504 NA NA
How to get relative rankings of numeric elements in a list or vector in R?
I think you are looking for dplyr::dense_rank()
:
# Example 1
dplyr::dense_rank(c(1, 1, 1, 3, 1, 4, 1))
#> [1] 1 1 1 2 1 3 1
# Example 2
dplyr::dense_rank(c(4, 1, 1, 1, 3, 5, 1))
#> [1] 3 1 1 1 2 4 1
# Example in code
dplyr::dense_rank(c(1, 1, 1, 2, 1, 3, 1))
#> [1] 1 1 1 2 1 3 1
How to use Spotfire Dense Rank function in R and How to Implement it R?
Please Try
dense_rank
function fromdplyr
package
MainData$MeasureRank <- dense_rank(MainData$Number_Measure)
Rank doesn't start at 1 in R
You need to use dense_rank.
test <- data.frame(column1 = c(5,5,5,6,6,7,7,7,8))
test$rank <- dplyr::dense_rank(test$column1)
Working of window ranking function
test %>% rename(input = column1) %>%
mutate(row_num_output = row_number(input),
rank_output = min_rank(input),
dense_rank_output = dense_rank(input))
Output to give a better understanding for your input
ranking dataframe using two columns in R
You can use data.table::frank
or dplyr::min_rank
:
data.table::frank
dt$Rank <- frank(dt, B, A, ties.method = "min")
dt
A B Rank
1 1 1 1
2 2 1 2
3 2 1 2
4 4 4 5
5 5 3 4
dplyr::min_rank
mutate(dt, Rank = min_rank(paste(B,A)))
A B Rank
1 1 1 1
2 2 1 2
3 2 1 2
4 4 4 5
5 5 3 4
Data
dt <- data.frame(A = c(1,2,2,4,5), B = c(1,1,1,4,3))
Calculate rank based on several columns, with a precedence rule
frank
and frankv
in data.table
"accepts vectors, lists, data.frames or data.tables as input", which can be useful here.
First, frankv
. It has a cols
argument where columns to be ranked can be specified in a character vector - convenient if there are many column names which need to be generated programmatically. It also has a neat order
argument.
library(data.table)
setDT(df)
df[ , Var34 := Var3 + Var4]
cols = c("Var1", "Var2", "Var34")
df[ , r := frankv(.SD, cols, order = -1L, ties.method = "dense")]
df[ , Var34 := NULL]
# Var1 Var2 Var3 Var4 r
# 1: 0 0 0 0 12
# 2: 1 0 0 0 6
# 3: 0 1 0 0 9
# 4: 1 1 0 0 3
# 5: 0 0 1 0 11
# 6: 1 0 1 0 5
# 7: 0 1 1 0 8
# 8: 1 1 1 0 2
# 9: 0 0 0 1 11
# 10: 1 0 0 1 5
# 11: 0 1 0 1 8
# 12: 1 1 0 1 2
# 13: 0 0 1 1 10
# 14: 1 0 1 1 4
# 15: 0 1 1 1 7
# 16: 1 1 1 1 1
frank
is handy for interactive use:
df[ , r := frank(.SD, -Var1, -Var2, -Var34, ties.method = "dense")]
Related answers: How to emulate SQLs rank functions in R?; Rank based on several variables
How to get consecutive rank for multiple variables
Well, an easy way would be to convert to factor
and then integer
df[] <- lapply(df, function(x) as.integer(factor(x)))
df
# var x y z
#G1 1 1 2 1
#G2 2 1 3 2
#G3 3 2 1 3
#G4 4 3 1 2
#G5 5 3 2 4
Rank function gives incorrect Ranking
Perhaps you just want the ranking by dcount
. If so:
select distinct a.pattern, convert(numeric(18,0),
coalesce(b.DCount, 0) as DCount,
dense_rank() over (order by b.[DCount] desc) as [Rank]
from a cross join b;
Simulate rank in mysql (without rank) with two conditions
SQL DEMO
SELECT player, wins, diff, dense_rank, rank, dense_val, prev_wins, prev_diff
FROM
(
SELECT player,
wins,
diff,
@dense_rank := IF(wins = @prev_wins and diff = @prev_diff, @dense_rank, @dense_rank + @dense_val ) AS dense_rank,
@dense_val := IF(wins = @prev_wins and diff = @prev_diff, @dense_val + 1 , 1) as dense_val,
@rank := @rank + 1 as rank,
@prev_wins := wins as prev_wins,
@prev_diff := diff as prev_diff
FROM tmpPoradi,(SELECT @dense_rank := 0, @dense_val := 1, @rank := 0, @prev_wins := 0, @prev_diff := 0) r
ORDER BY wins DESC,diff DESC
) rt
ORDER BY rank
OUTPUT
Related Topics
MySQL::Error: Specified Key Was Too Long; Max Key Length Is 1000 Bytes
Sqlite - How to Join Tables from Different Databases
Simulate Create Database If Not Exists For Postgresql
Where Value in Column Containing Comma Delimited Values
Sql: Between VS ≪= and ≫=
Is There Something Wrong With Joins That Don't Use the Join Keyword in SQL or MySQL
MySQL Not Using Indexes With Where in Clause
What Is Sysname Data Type in SQL Server
Safely Rename Tables Using Serial Primary Key Columns
Pivot Rows to Columns Without Aggregate
What's the Difference Between Varchar and Char
How to Turn Identity_Insert on and Off Using SQL Server 2008
How to Return the Column Names of a Table