Rank Vector with Some Equal Values

Rank vector with some equal values

Convert to factor and back to numeric

as.numeric(as.factor(rank(-x)))
#[1] 6 1 5 3 3 2 4

how to rank values in a vector and give them corresponding values?

That's more clear. Hence :

> vect = c(41,42,5,6,3,12,10,15,2,3,4,13,2,33,4,1,1)
> cbind(vect,as.numeric(factor(vect)))
[1,] 41 12
[2,] 42 13
[3,] 5 5
[4,] 6 6
[5,] 3 3
[6,] 12 8
[7,] 10 7
[8,] 15 10
[9,] 2 2
[10,] 3 3
[11,] 4 4
[12,] 13 9
[13,] 2 2
[14,] 33 11
[15,] 4 4
[16,] 1 1
[17,] 1 1

No sort needed. And as said, see also ?factor

and if you want to preserve the order, then:

> cbind(vect,as.numeric(factor(vect,levels=unique(vect))))
vect
[1,] 41 1
[2,] 42 2
[3,] 5 3
[4,] 6 4
[5,] 3 5
[6,] 12 6
[7,] 10 7
[8,] 15 8
[9,] 2 9
[10,] 3 5
[11,] 4 10
[12,] 13 11
[13,] 2 9
[14,] 33 12
[15,] 4 10
[16,] 1 13
[17,] 1 13

Looking for an FP ranking implementation which handles ties (i.e. equal values)

This works well for me:

// scala
val vs = Vector(1, 1, 3, 3, 3, 5, 6)
val rank = vs.distinct.zipWithIndex.toMap
val result = vs.map(i => (rank(i), i))

The same in Java 8 using Javaslang:

// java(slang)
Vector<Integer> vs = Vector(1, 1, 3, 3, 3, 5, 6);
Function<Integer, Integer> rank = vs.distinct().zipWithIndex().toMap(t -> t);
Vector<Tuple2<Integer, Integer>> result = vs.map(i -> Tuple(rank.apply(i), i));

The output of both variants is

Vector((0, 1), (0, 1), (1, 3), (1, 3), (1, 3), (2, 5), (3, 6))

*) Disclosure: I'm the creator of Javaslang

Create ranking for vector of double

One way to do so would be using a multimap.

  • Place the items in a multimap mapping your objects to size_ts (the intial values are unimportant). You can do this with one line (use the ctor that takes iterators).

  • Loop (either plainly or using whatever from algorithm) and assign 0, 1, ... as the values.

  • Loop over the distinct keys. For each distinct key, call equal_range for the key, and set its values to the average (again, you can use stuff from algorithm for this).

The overall complexity should be Theta(n log(n)), where n is the length of the vector.

How to get ranks with no gaps when there are ties among values?

I can think of a quick function to do this. It's not optimal with a for loop but it works:)

x=c(1,1,2,3,4,5,8,8)

foo <- function(x){
su=sort(unique(x))
for (i in 1:length(su)) x[x==su[i]] = i
return(x)
}

foo(x)

[1] 1 1 2 3 4 5 6 6

Efficient method to calculate the rank vector of a list in Python

Using scipy, the function you are looking for is scipy.stats.rankdata:

In [13]: import scipy.stats as ss
In [19]: ss.rankdata([3, 1, 4, 15, 92])
Out[19]: array([ 2., 1., 3., 4., 5.])

In [20]: ss.rankdata([1, 2, 3, 3, 3, 4, 5])
Out[20]: array([ 1., 2., 4., 4., 4., 6., 7.])

The ranks start at 1, rather than 0 (as in your example), but then again, that's the way R's rank function works as well.

Here is a pure-python equivalent of scipy's rankdata function:

def rank_simple(vector):
return sorted(range(len(vector)), key=vector.__getitem__)

def rankdata(a):
n = len(a)
ivec=rank_simple(a)
svec=[a[rank] for rank in ivec]
sumranks = 0
dupcount = 0
newarray = [0]*n
for i in xrange(n):
sumranks += i
dupcount += 1
if i==n-1 or svec[i] != svec[i+1]:
averank = sumranks / float(dupcount) + 1
for j in xrange(i-dupcount+1,i+1):
newarray[ivec[j]] = averank
sumranks = 0
dupcount = 0
return newarray

print(rankdata([3, 1, 4, 15, 92]))
# [2.0, 1.0, 3.0, 4.0, 5.0]
print(rankdata([1, 2, 3, 3, 3, 4, 5]))
# [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]


Related Topics



Leave a reply



Submit