Rank variable by group (dplyr)

The following ranks Sepal.Length within each Species, breaking ties by order of appearance, and then keeps the three lowest-ranked rows per group.

library(dplyr)

by_species <- iris %>%
  arrange(Species, Sepal.Length) %>%
  group_by(Species) %>%
  mutate(rank = rank(Sepal.Length, ties.method = "first"))

by_species %>% filter(rank <= 3)
## Source: local data frame [9 x 6]
## Groups: Species [3]
##
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
##          (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
## 1          4.3         3.0          1.1         0.1     setosa     1
## 2          4.4         2.9          1.4         0.2     setosa     2
## 3          4.4         3.0          1.3         0.2     setosa     3
## 4          4.9         2.4          3.3         1.0 versicolor     1
## 5          5.0         2.0          3.5         1.0 versicolor     2
## 6          5.0         2.3          3.3         1.0 versicolor     3
## 7          4.9         2.5          4.5         1.7  virginica     1
## 8          5.6         2.8          4.9         2.0  virginica     2
## 9          5.7         2.5          5.0         2.0  virginica     3

by_species %>% slice(1:3)
## Source: local data frame [9 x 6]
## Groups: Species [3]
##
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
##          (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
## 1          4.3         3.0          1.1         0.1     setosa     1
## 2          4.4         2.9          1.4         0.2     setosa     2
## 3          4.4         3.0          1.3         0.2     setosa     3
## 4          4.9         2.4          3.3         1.0 versicolor     1
## 5          5.0         2.0          3.5         1.0 versicolor     2
## 6          5.0         2.3          3.3         1.0 versicolor     3
## 7          4.9         2.5          4.5         1.7  virginica     1
## 8          5.6         2.8          4.9         2.0  virginica     2
## 9          5.7         2.5          5.0         2.0  virginica     3
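
If only the top rows are needed and the rank column itself is not, a possible alternative is slice_min(), which is available in dplyr 1.0.0 or later. This is a minimal sketch of that alternative rather than the approach used above; the name top3 is made up for illustration, and with_ties = FALSE is set so that exactly three rows per group come back even when values are tied.

```r
library(dplyr)

# Keep the three smallest Sepal.Length rows within each Species.
# with_ties = FALSE caps each group at exactly n rows.
top3 <- iris %>%
  group_by(Species) %>%
  slice_min(Sepal.Length, n = 3, with_ties = FALSE) %>%
  ungroup()

nrow(top3)  # 3 species x 3 rows each
```

Unlike the rank() version, this does not leave a rank column behind, so it suits cases where the ranking is only a means to the filter.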

Rank subgroup by group (dplyr)

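Group by var1, then use match() against unique(var2) so that each distinct var2 value is numbered in order of first appearance within its group: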

library(dplyr)

my_df %>%
  group_by(var1) %>%
  mutate(group_rank = match(var2, unique(var2))) %>%
  ungroup()

Output:

# A tibble: 20 x 3
   var1  var2          group_rank
   <chr> <chr>              <int>
 1 A     long_string_x          1
 2 A     long_string_x          1
 3 A     long_string_x          1
 4 A     long_string_x          1
 5 A     long_string_y          2
 6 A     long_string_y          2
 7 A     long_string_y          2
 8 A     long_string_y          2
 9 B     long_string_x          1
10 B     long_string_x          1
11 B     long_string_x          1
12 B     long_string_x          1
13 B     long_string_y          2
14 B     long_string_y          2
15 B     long_string_y          2
16 B     long_string_y          2
17 B     long_string_z          3
18 B     long_string_z          3
19 B     long_string_z          3
20 B     long_string_z          3
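
Note that match(var2, unique(var2)) numbers values by first appearance, which coincides with alphabetical order in the output above only because the data happen to be sorted. The sketch below uses a small made-up my_df (the values are hypothetical) to contrast it with dense_rank(), which numbers by sort order instead.

```r
library(dplyr)

# Hypothetical data: var2 values are deliberately not in alphabetical order
my_df <- data.frame(
  var1 = c("A", "A", "A", "B", "B"),
  var2 = c("y", "y", "x", "z", "x")
)

ranked <- my_df %>%
  group_by(var1) %>%
  mutate(
    by_appearance = match(var2, unique(var2)),  # first-seen value gets 1
    by_sort_order = dense_rank(var2)            # alphabetically smallest gets 1
  ) %>%
  ungroup()

ranked$by_appearance  # 1 1 2 1 2
ranked$by_sort_order  # 2 2 1 2 1
```

Which of the two is appropriate depends on whether the subgroup number should follow the row order or the value order.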

Apply a rank across groups

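One option is to take each group's Value from its most recent YEAR, then rank the groups on that value with dense_rank():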

library(dplyr)

data %>%
  group_by(Grp) %>%
  mutate(Rank = Value[which.max(YEAR)]) %>%
  ungroup() %>%
  mutate(Rank = dense_rank(-Rank))

#   YEAR Grp Value Rank
# 1 2020   A    25    3
# 2 2019   A    24    3
# 3 2020   B    35    2
# 4 2019   B    34    2
# 5 2020   C    45    1
# 6 2019   C    44    1
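
To make the two steps easier to follow, here is a self-contained sketch that keeps the intermediate value in its own column instead of overwriting Rank. The input data frame is reconstructed from the printed result above, which is an assumption, and the column name latest is made up for illustration.

```r
library(dplyr)

# Input reconstructed from the printed result above (an assumption)
data <- data.frame(
  YEAR  = c(2020, 2019, 2020, 2019, 2020, 2019),
  Grp   = c("A", "A", "B", "B", "C", "C"),
  Value = c(25, 24, 35, 34, 45, 44)
)

ranked <- data %>%
  group_by(Grp) %>%
  mutate(latest = Value[which.max(YEAR)]) %>%  # Value from the most recent YEAR
  ungroup() %>%
  mutate(Rank = dense_rank(desc(latest)))      # highest latest value gets rank 1

ranked$latest  # 25 25 35 35 45 45
ranked$Rank    # 3 3 2 2 1 1
```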

Add a grouping variable based on ranked data

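We can use cumsum() to create the index: every rank of 1 starts a new event, so cumsum(rank == 1) counts which event each row belongs to.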

library(dplyr)

df %>%
  mutate(event = c("Hurdles", "Long Jump")[cumsum(rank == 1)])
#      name rank     event
# 1   Sally    1   Hurdles
# 2    Dave    2   Hurdles
# 3   Aaron    1 Long Jump
# 4    Jane    2 Long Jump
# 5 Michael    3 Long Jump

Or in base R (just in case):

df$event <- c("Hurdles", "Long Jump")[cumsum(df$rank == 1)]
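
The indexing trick can be broken into its intermediate steps. The following is a sketch in base R with the input reconstructed from the printed result above (an assumption); block_id and events are names introduced here for illustration. It also assumes there are at least as many event names as there are rows with rank 1, since a missing name would produce NA.

```r
# Input reconstructed from the printed result above (an assumption)
df <- data.frame(
  name = c("Sally", "Dave", "Aaron", "Jane", "Michael"),
  rank = c(1, 2, 1, 2, 3)
)

# Each rank of 1 starts a new block, so the running count labels the blocks
block_id <- cumsum(df$rank == 1)     # 1 1 2 2 2

# One event name per block, in the order the blocks appear
events <- c("Hurdles", "Long Jump")
df$event <- events[block_id]
```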

Create a ranking variable with dplyr?

It sounds like you're looking for dense_rank() from dplyr, applied in descending order rather than the default ascending order.

Try this:

df %>% mutate(rank = dense_rank(desc(score)))
#   name score rank
# 1    A    10    1
# 2    B    10    1
# 3    C     9    2
# 4    D     8    3
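
For comparison, dplyr also provides min_rank() and row_number(), which treat the tie between A and B differently. The sketch below rebuilds the data frame from the output above (an assumption about the original input) and puts the three side by side; the column names dense, minimum and ordinal are made up for illustration.

```r
library(dplyr)

# Input reconstructed from the printed result above (an assumption)
df <- data.frame(
  name  = c("A", "B", "C", "D"),
  score = c(10, 10, 9, 8)
)

ranked <- df %>%
  mutate(
    dense   = dense_rank(desc(score)),  # ties share a rank, no gaps:   1 1 2 3
    minimum = min_rank(desc(score)),    # ties share a rank, with gaps: 1 1 3 4
    ordinal = row_number(desc(score))   # ties broken by position:      1 2 3 4
  )
```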

R data frame rank by groups (group by rank) with package dplyr

I had a similar issue. My solution was to sort by group and by the ranked variable(s), then call row_number() after group_by().

# Sample dataset (no seed is set, so the values differ on each run)
df <- data.frame(group = rep(c("GROUP 1", "GROUP 2"), 10),
                 value = as.integer(rnorm(20, mean = 1000, sd = 500)))
library(dplyr)
print.data.frame(df[1:10, ])
     group value
1  GROUP 1  1273
2  GROUP 2  1261
3  GROUP 1  1189
4  GROUP 2  1390
5  GROUP 1  1942
6  GROUP 2  1111
7  GROUP 1   530
8  GROUP 2   893
9  GROUP 1   997
10 GROUP 2   237

# Sort by group and descending value, then number the rows within each group
sorted <- df %>%
  arrange(group, desc(value)) %>%
  group_by(group) %>%
  mutate(rank = row_number())
print.data.frame(sorted)

     group value rank
1  GROUP 1  1942    1
2  GROUP 1  1368    2
3  GROUP 1  1273    3
4  GROUP 1  1249    4
5  GROUP 1  1189    5
6  GROUP 1   997    6
7  GROUP 1   562    7
8  GROUP 1   535    8
9  GROUP 1   530    9
10 GROUP 1     1   10
11 GROUP 2  1472    1
12 GROUP 2  1390    2
13 GROUP 2  1281    3
14 GROUP 2  1261    4
15 GROUP 2  1111    5
16 GROUP 2   893    6
17 GROUP 2   774    7
18 GROUP 2   669    8
19 GROUP 2   631    9
20 GROUP 2   237   10
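
If the original row order should be preserved, row_number() can also be given the variable to rank by, which avoids the arrange() step. This is a sketch on a small made-up dataset with fixed values (hypothetical, chosen so the result is reproducible), not the data from the answer above.

```r
library(dplyr)

# Small hypothetical dataset with fixed values
df <- data.frame(
  group = c("GROUP 1", "GROUP 2", "GROUP 1", "GROUP 2"),
  value = c(10L, 5L, 30L, 7L)
)

ranked <- df %>%
  group_by(group) %>%
  mutate(rank = row_number(desc(value))) %>%  # rank 1 = largest value in the group
  ungroup()

ranked$rank  # 2 2 1 1
```

The rows stay in their input order, and each one carries its within-group rank.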

Ranking with dplyr between groups

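Compute each group's total, then ungroup and apply dense_rank() to the totals so that every row in a group shares that group's rank: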

d %>%
  group_by(group2) %>%
  mutate(total_value = sum(value)) %>%
  arrange(-total_value) %>%
  ungroup() %>%
  mutate(rank = dense_rank(-total_value))
# A tibble: 4 x 5
#   group1 group2 value total_value  rank
#   <fct>  <fct>  <dbl>       <dbl> <int>
# 1 B      f          2           6     1
# 2 B      f          4           6     1
# 3 A      e          1           4     2
# 4 A      e          3           4     2
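
A different way to get the same group-level rank is to summarise to one row per group, rank there, and join the result back. The sketch below assumes an input d rebuilt from the printed result above (the original row order is a guess), and group_ranks is a name introduced for illustration.

```r
library(dplyr)

# Input reconstructed from the printed result above (row order is an assumption)
d <- data.frame(
  group1 = factor(c("A", "B", "A", "B")),
  group2 = factor(c("e", "f", "e", "f")),
  value  = c(1, 2, 3, 4)
)

# One row per group2 with its total and the rank of that total
group_ranks <- d %>%
  group_by(group2) %>%
  summarise(total_value = sum(value)) %>%
  mutate(rank = dense_rank(desc(total_value)))

# Attach the group-level rank back onto the original rows
ranked <- left_join(d, group_ranks, by = "group2")

ranked$rank  # 2 1 2 1
```

This keeps the ranking logic on a small summary table, which can be easier to inspect when there are many rows per group.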

How to rank within groups in R?

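You can do this fairly cleanly with dplyr. Calling order() twice turns a sort order into ranks, here by order_values and then order_dates, both descending: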

library(dplyr)

df %>%
  group_by(customer_name) %>%
  mutate(my_ranks = order(order(order_values, order_dates, decreasing = TRUE)))

Source: local data frame [5 x 4]
Groups: customer_name

  customer_name order_dates order_values my_ranks
1          John  2010-11-01           15        3
2           Bob  2008-03-25           12        1
3          Alex  2009-11-15            5        1
4          John  2012-08-06           15        2
5          John  2015-05-07           20        1
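
The double order() call is easier to see on a plain vector. This is a minimal base R sketch with made-up values; sort_positions and ranks are names introduced for illustration.

```r
# Hypothetical order values for one customer
order_values <- c(15, 12, 20)

# order() returns the positions that would sort the vector (largest first here)
sort_positions <- order(order_values, decreasing = TRUE)  # 3 1 2

# Applying order() again turns those positions into each element's rank
ranks <- order(sort_positions)                            # 2 3 1
```

So 20 gets rank 1, 15 gets rank 2 and 12 gets rank 3, each reported at the element's original position.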

R: Get ranking of factor levels by group

Use dplyr::dense_rank(), or as.numeric(factor(Days, ordered = TRUE)) in base R:

df %>%
  group_by(Number) %>%
  mutate(Ranking = dense_rank(Days),
         Ranking2 = as.numeric(factor(Days, ordered = TRUE)))

Output:

# A tibble: 15 × 4
# Groups: Number [3]
   Number  Days Ranking Ranking2
    <dbl> <dbl>   <int>    <dbl>
 1      1     5       1        1
 2      1     5       1        1
 3      1    10       2        2
 4      1    10       2        2
 5      1    15       3        3
 6      2     3       1        1
 7      2     3       1        1
 8      2     3       1        1
 9      2     5       2        2
10      2     5       2        2
11      3    11       1        1
12      3    11       1        1
13      3    13       2        2
14      3    13       2        2
15      3    13       2        2
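
The factor approach works because a factor built from numbers stores integer codes that follow the sorted unique values. As a small sketch on a made-up vector for a single group, the codes match dense_rank() whether or not ordered = TRUE is set; the variable names are introduced here for illustration.

```r
library(dplyr)

# Hypothetical Days values for one group
Days <- c(5, 5, 10, 10, 15)

codes_plain   <- as.numeric(factor(Days))                  # 1 1 2 2 3
codes_ordered <- as.numeric(factor(Days, ordered = TRUE))  # 1 1 2 2 3
codes_dense   <- dense_rank(Days)                          # 1 1 2 2 3
```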

