## Replacing tied ranks by their average

We can just use `rank` from base R. The default for `ties.method` is `"average"`:

```r
x$freq.Freq <- rank(-x$freq.Freq)
x$freq.Freq
#[1] 1.0 2.5 2.5 4.0 6.0 6.0 6.0 8.0 9.0
```

## How to get ranks with no gaps when there are ties among values?

I can think of a quick function to do this. It's not optimal because of the for loop, but it works :)

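```r
x <- c(1, 1, 2, 3, 4, 5, 8, 8)

foo <- function(x) {
  su <- sort(unique(x))
  # Write into a separate vector so an assigned rank can never be
  # mistaken for a value that is still waiting to be matched
  # (e.g. x = c(0, 1) would break if x were overwritten in place).
  r <- integer(length(x))
  for (i in seq_along(su)) r[x == su[i]] <- i
  r
}

foo(x)
# [1] 1 1 2 3 4 5 6 6
```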
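For comparison, the same gap-free ("dense") ranking can be sketched in Python without an explicit loop over the distinct values. This is only an illustration: `dense_rank` is a hypothetical helper name, and it assumes the values are hashable and mutually comparable.

```python
def dense_rank(values):
    """Return 1-based ranks with no gaps; tied values share a rank."""
    # Map each distinct value to its 1-based position among the sorted distinct values.
    position = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    # Look every original value up in that mapping; the input is left untouched.
    return [position[v] for v in values]


print(dense_rank([1, 1, 2, 3, 4, 5, 8, 8]))  # [1, 1, 2, 3, 4, 5, 6, 6]
```

Because the ranks are written to a new list, values such as `0` or `0.5` cannot be confused with ranks that were already assigned.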

## rank and order in R

```r
set.seed(1)
x <- sample(1:50, 30)
x
# [1] 14 19 28 43 10 41 42 29 27 3 9 7 44 15 48 18 25 33 13 34 47 39 49 4 30 46 1 40 20 8
rank(x)
# [1] 9 12 16 25 7 23 24 17 15 2 6 4 26 10 29 11 14 19 8 20 28 21 30 3 18 27 1 22 13 5
order(x)
# [1] 27 10 24 12 30 11 5 19 1 14 16 2 29 17 9 3 8 25 18 20 22 28 6 7 4 13 26 21 15 23
```

`rank` returns a vector with the "rank" of each value: the number in the first position is the 9th lowest. `order` returns the indices that would put the initial vector `x` in order.

The 27th value of `x` is the lowest, so `27` is the first element of `order(x)`, and if you look at `rank(x)`, the 27th element is `1`.

```r
x[order(x)]
# [1] 1 3 4 7 8 9 10 13 14 15 18 19 20 25 27 28 29 30 33 34 39 40 41 42 43 44 46 47 48 49
```
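The relationship between the two functions can also be sketched in Python. The helpers `order` and `rank` below are illustrative stand-ins written for this example (not library functions); they return 1-based results to match R and assume the input has no ties.

```python
def order(x):
    """1-based indices that would put x in ascending order (like R's order)."""
    return [i + 1 for i in sorted(range(len(x)), key=x.__getitem__)]


def rank(x):
    """1-based rank of each element, assuming no ties."""
    ranks = [0] * len(x)
    # The element sitting at 1-based index `index` is the `position`-th smallest.
    for position, index in enumerate(order(x), start=1):
        ranks[index - 1] = position
    return ranks


x = [14, 19, 28, 43, 10]
print(order(x))  # [5, 1, 2, 3, 4]
print(rank(x))   # [2, 3, 4, 5, 1]
```

Here the 5th value is the lowest, so `5` comes first in `order(x)` and the 5th element of `rank(x)` is `1`, mirroring the R example.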

## Efficient method to calculate the rank vector of a list in Python

Using scipy, the function you are looking for is `scipy.stats.rankdata`:

```
In [13]: import scipy.stats as ss

In [19]: ss.rankdata([3, 1, 4, 15, 92])
Out[19]: array([ 2., 1., 3., 4., 5.])

In [20]: ss.rankdata([1, 2, 3, 3, 3, 4, 5])
Out[20]: array([ 1., 2., 4., 4., 4., 6., 7.])
```

The ranks start at 1, rather than 0 (as in your example), but then again, that's the way `R`'s `rank` function works as well.

Here is a pure-Python equivalent of `scipy`'s `rankdata` function:

```python
def rank_simple(vector):
    # Indices that would sort the vector (0-based)
    return sorted(range(len(vector)), key=vector.__getitem__)

def rankdata(a):
    n = len(a)
    ivec = rank_simple(a)
    svec = [a[rank] for rank in ivec]
    sumranks = 0
    dupcount = 0
    newarray = [0] * n
    for i in range(n):  # range, not xrange, on Python 3
        sumranks += i
        dupcount += 1
        # End of a run of equal values: assign the average rank to the whole run
        if i == n - 1 or svec[i] != svec[i + 1]:
            averank = sumranks / float(dupcount) + 1
            for j in range(i - dupcount + 1, i + 1):
                newarray[ivec[j]] = averank
            sumranks = 0
            dupcount = 0
    return newarray

print(rankdata([3, 1, 4, 15, 92]))
# [2.0, 1.0, 3.0, 4.0, 5.0]

print(rankdata([1, 2, 3, 3, 3, 4, 5]))
# [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]
```

## create a mean rank for a rank-frequency data.frame by R

Sure, just group by frequency:

```r
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

dt <- data.frame(frequency = c(64, 58, 54, 32, 29, 29, 25, 17, 17, 15, 12, 12, 10))

dt %>%
  arrange(desc(frequency)) %>%
  mutate(rank = row_number()) %>%
  group_by(frequency) %>%
  mutate(mean_rank = mean(rank)) %>%
  ungroup()
#> # A tibble: 13 × 3
#>    frequency  rank mean_rank
#>        <dbl> <int>     <dbl>
#>  1        64     1       1
#>  2        58     2       2
#>  3        54     3       3
#>  4        32     4       4
#>  5        29     5       5.5
#>  6        29     6       5.5
#>  7        25     7       7
#>  8        17     8       8.5
#>  9        17     9       8.5
#> 10        15    10      10
#> 11        12    11      11.5
#> 12        12    12      11.5
#> 13        10    13      13
```

## R: Rank-function with two variables and ties.method random

Since `order(order(x))` gives the same result as `rank(x)` when there are no ties (see "Why does order(order(x)) equal rank(x) in R?"), you could just do `order(order(y, z, runif(length(y))))` to get the rank values. The random third key breaks any remaining ties on `y` and `z` at random.
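The same idea, sorting on both keys plus a random tie-breaker and then reading off each row's position, can be sketched in Python. `rank_random_ties` is a hypothetical helper written for this illustration, and the `y` and `z` values are the ones that appear in the example output further down.

```python
import random


def rank_random_ties(y, z, rng=random):
    """Rank rows by (y, z); rows tied on both keys get a random relative order."""
    n = len(y)
    # One random number per row, playing the role of runif(length(y)).
    noise = [rng.random() for _ in range(n)]
    # Row indices in sorted order: the equivalent of the inner order(...).
    idx = sorted(range(n), key=lambda i: (y[i], z[i], noise[i]))
    # Invert the permutation: the equivalent of the outer order(...).
    ranks = [0] * n
    for position, i in enumerate(idx, start=1):
        ranks[i] = position
    return ranks


y = [1, 4, 5, 5, 2, 8, 8, 1, 3, 3]
z = [0.2, 0.8, 0.5, 0.4, 0.2, 0.1, 0.1, 0.7, 0.3, 0.3]
print(rank_random_ties(y, z))
```

Rows that are tied on both keys (here the two `(8, 0.1)` rows and the two `(3, 0.3)` rows) swap ranks from run to run, while all other ranks are fixed.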

Here's a more involved approach that allows you to use methods from `ties.method`. It requires `dplyr`:

```r
library(dplyr)

rank2 <- function(df, key1, key2, ties.method) {
  # Tie-handling functions, looked up by name via get(ties.method)
  average <- function(x) mean(x)
  random <- function(x) sample(x, length(x))
  df$r <- order(order(df[[key1]], df[[key2]]))
  # Note: group_by_() is deprecated in current dplyr;
  # group_by(df, across(all_of(c(key1, key2)))) is the modern equivalent.
  group_by_(df, key1, key2) %>% mutate(rr = get(ties.method)(r))
}

rank2(df, "y", "z", "average")
# Source: local data frame [10 x 5]
# Groups: y, z [8]
#        x     y     z     r    rr
#    <dbl> <dbl> <dbl> <int> <dbl>
# 1      1     1   0.2     1   1.0
# 2      2     4   0.8     6   6.0
# 3      3     5   0.5     8   8.0
# 4      4     5   0.4     7   7.0
# 5      5     2   0.2     3   3.0
# 6      6     8   0.1     9   9.5
# 7      7     8   0.1    10   9.5
# 8      8     1   0.7     2   2.0
# 9      9     3   0.3     4   4.5
# 10    10     3   0.3     5   4.5
```

## Create ranking for vector of double

One way to do so would be using a `multimap`.

1. Place the items in a multimap mapping your objects to `size_t`s (the initial values are unimportant). You can do this with one line (use the ctor that takes iterators).
2. Loop (either plainly or using whatever from `algorithm`) and assign 0, 1, ... as the values.
3. Loop over the distinct keys. For each distinct key, call `equal_range` for the key, and set its values to the average (again, you can use stuff from `algorithm` for this).

The overall complexity should be *Theta(n log(n))*, where *n* is the length of the vector.
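The steps above can be sketched in Python, with a sorted list of `(value, original index)` pairs standing in for the C++ `multimap` and `itertools.groupby` standing in for `equal_range`. This is only a rough illustration of the idea, not the C++ code itself; `average_ranks` is a hypothetical name, and the ranks are 0-based as in the steps.

```python
from itertools import groupby


def average_ranks(values):
    """Average 0-based ranks; tied values share the mean of their positions."""
    # Step 1: sorted (value, original index) pairs play the role of the multimap.
    pairs = sorted((v, i) for i, v in enumerate(values))
    ranks = [0.0] * len(values)
    position = 0  # Step 2: running position 0, 1, ... in sorted order.
    # Step 3: each run of equal keys gets the mean of the positions it covers.
    for _, run in groupby(pairs, key=lambda p: p[0]):
        run = list(run)
        mean_pos = position + (len(run) - 1) / 2
        for _, original_index in run:
            ranks[original_index] = mean_pos
        position += len(run)
    return ranks


print(average_ranks([3.0, 1.0, 3.0, 2.0]))  # [2.5, 0.0, 2.5, 1.0]
```

The sort dominates the cost, so this sketch is also O(n log n); add 1 to each result if 1-based ranks are wanted.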

## replace subset of vector values with subset average

This is my attempt. I first calculate the average rank, then split the subjects of the same rank into rows.

```r
library(tidyverse)
options(stringsAsFactors = FALSE)

subj <- c("A", "B", "C,D,E", "C,D,E", "C,D,E", "F", "G,H", "G,H", "I")
rank <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
df <- data.frame(rank, subj)

df %>%
  group_by(subj) %>%
  summarise(rank = mean(rank)) %>%
  rowwise() %>%
  do(tibble(subj = unlist(strsplit(.$subj, ",")), rank = .$rank)) %>%
  ungroup()
```

Output:

```
# A tibble: 9 × 2
   subj  rank
* <chr> <dbl>
1     A   1.0
2     B   2.0
3     C   4.0
4     D   4.0
5     E   4.0
6     F   6.0
7     G   7.5
8     H   7.5
9     I   9.0
```

Another approach:

```r
m <- aggregate(rank~subj, data=df, mean)
m <- apply(m, 1, function(x) data.frame(subj = unlist(strsplit(x[1], ",")), rank = x[2]))
m <- do.call(rbind, m)
rownames(m) <- NULL
m
```

Output:

```
  subj rank
1    A  1.0
2    B  2.0
3    C  4.0
4    D  4.0
5    E  4.0
6    F  6.0
7    G  7.5
8    H  7.5
9    I  9.0
```

