Rank vector with some equal values
Convert to factor and back to numeric
as.numeric(as.factor(rank(-x)))
#[1] 6 1 5 3 3 2 4
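For comparison outside R, here is a minimal Python sketch of the same kind of descending dense ranking: the largest value gets rank 1, ties share a rank, and there are no gaps. The function name dense_rank_desc and the sample data are hypothetical; the original x is not shown, so the data below was chosen only because it reproduces the output above.

```python
def dense_rank_desc(values):
    """Dense ranks in descending order: largest value -> 1, ties share a rank."""
    # Distinct values, largest first; their 1-based position is the rank.
    ordered = sorted(set(values), reverse=True)
    rank_of = {v: i for i, v in enumerate(ordered, start=1)}
    return [rank_of[v] for v in values]


# Hypothetical data (not the original x).
x = [1, 9, 2, 5, 5, 7, 3]
print(dense_rank_desc(x))  # [6, 1, 5, 3, 3, 2, 4]
```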
How to rank values in a vector and give them corresponding values?
That's clearer. Hence:
> vect = c(41,42,5,6,3,12,10,15,2,3,4,13,2,33,4,1,1)
> cbind(vect,as.numeric(factor(vect)))
[1,] 41 12
[2,] 42 13
[3,] 5 5
[4,] 6 6
[5,] 3 3
[6,] 12 8
[7,] 10 7
[8,] 15 10
[9,] 2 2
[10,] 3 3
[11,] 4 4
[12,] 13 9
[13,] 2 2
[14,] 33 11
[15,] 4 4
[16,] 1 1
[17,] 1 1
No sort needed. And as said, see also ?factor
And if you want to preserve the order of first appearance, then:
> cbind(vect,as.numeric(factor(vect,levels=unique(vect))))
vect
[1,] 41 1
[2,] 42 2
[3,] 5 3
[4,] 6 4
[5,] 3 5
[6,] 12 6
[7,] 10 7
[8,] 15 8
[9,] 2 9
[10,] 3 5
[11,] 4 10
[12,] 13 11
[13,] 2 9
[14,] 33 12
[15,] 4 10
[16,] 1 13
[17,] 1 13
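The first-appearance numbering above can also be sketched in Python with a plain dictionary. This is an illustration only; the function name rank_by_first_appearance is made up for the example, and the input is the same vect as above.

```python
def rank_by_first_appearance(values):
    """Number each distinct value 1, 2, ... in order of first appearance."""
    codes = {}
    # setdefault stores a new code only the first time a value is seen;
    # on later occurrences it returns the code already stored.
    return [codes.setdefault(v, len(codes) + 1) for v in values]


vect = [41, 42, 5, 6, 3, 12, 10, 15, 2, 3, 4, 13, 2, 33, 4, 1, 1]
print(rank_by_first_appearance(vect))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 10, 11, 9, 12, 10, 13, 13]
```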
Looking for an FP ranking implementation which handles ties (i.e. equal values)
This works well for me:
// scala
val vs = Vector(1, 1, 3, 3, 3, 5, 6)
val rank = vs.distinct.zipWithIndex.toMap
val result = vs.map(i => (rank(i), i))
The same in Java 8 using Javaslang:
// java(slang)
Vector<Integer> vs = Vector(1, 1, 3, 3, 3, 5, 6);
Function<Integer, Integer> rank = vs.distinct().zipWithIndex().toMap(t -> t);
Vector<Tuple2<Integer, Integer>> result = vs.map(i -> Tuple(rank.apply(i), i));
The output of both variants is
Vector((0, 1), (0, 1), (1, 3), (1, 3), (1, 3), (2, 5), (3, 6))
*) Disclosure: I'm the creator of Javaslang
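One caveat worth spelling out: as far as I can tell, distinct keeps elements in order of first appearance, so the snippets above give value-ordered ranks only because vs is already sorted. A minimal Python sketch of the same idea that sorts the distinct values first, so it should also work on unsorted input (rank_pairs is a hypothetical name):

```python
def rank_pairs(values):
    """Return (rank, value) pairs with 0-based dense ranks ordered by value."""
    # Sort the distinct values first so that the rank follows value order,
    # not order of first appearance.
    rank = {v: i for i, v in enumerate(sorted(set(values)))}
    return [(rank[v], v) for v in values]


print(rank_pairs([1, 1, 3, 3, 3, 5, 6]))
# [(0, 1), (0, 1), (1, 3), (1, 3), (1, 3), (2, 5), (3, 6)]
print(rank_pairs([5, 1, 3, 1]))
# [(2, 5), (0, 1), (1, 3), (0, 1)]
```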
Create ranking for vector of double
One way to do so would be to use a multimap.
1. Place the items in a multimap mapping your objects to size_t values (the initial values are unimportant). You can do this in one line (use the constructor that takes iterators).
2. Loop (either plainly or using something from algorithm) and assign 0, 1, ... as the values.
3. Loop over the distinct keys. For each distinct key, call equal_range for the key, and set its values to their average (again, you can use something from algorithm for this).
The overall complexity should be Theta(n log(n)), where n is the length of the vector.
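The steps above can be sketched in Python, with a sort standing in for the ordered multimap and itertools.groupby standing in for equal_range. This is a rough translation under those assumptions, not the C++ code the answer has in mind; average_ranks is a hypothetical name, and the ranks are 0-based because the description assigns 0, 1, ... as values.

```python
from itertools import groupby


def average_ranks(values):
    """0-based ranks; tied values receive the mean of their sorted positions."""
    # Indices of the values in sorted order (the O(n log n) step).
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    pos = 0
    # Each run of equal keys plays the role of one equal_range call.
    for _, group in groupby(order, key=values.__getitem__):
        idxs = list(group)
        mean = pos + (len(idxs) - 1) / 2  # mean of pos .. pos + len(idxs) - 1
        for i in idxs:
            ranks[i] = mean
        pos += len(idxs)
    return ranks


print(average_ranks([1, 2, 3, 3, 3, 4, 5]))
# [0.0, 1.0, 3.0, 3.0, 3.0, 5.0, 6.0]
```

Adding 1 to every entry gives the 1-based convention used by R's rank.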
How to get ranks with no gaps when there are ties among values?
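I can think of a quick function to do this. It's not optimal with a for loop, but it works :)
x <- c(1,1,2,3,4,5,8,8)
foo <- function(x){
  su <- sort(unique(x))
  # Write ranks to a separate vector: overwriting x in place can re-match
  # an already assigned rank as a value (e.g. x = c(0, 1) would give 2 2).
  r <- integer(length(x))
  for (i in seq_along(su)) r[x == su[i]] <- i
  r
}
foo(x)
[1] 1 1 2 3 4 5 6 6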
Efficient method to calculate the rank vector of a list in Python
Using scipy, the function you are looking for is scipy.stats.rankdata:
In [13]: import scipy.stats as ss
In [19]: ss.rankdata([3, 1, 4, 15, 92])
Out[19]: array([ 2., 1., 3., 4., 5.])
In [20]: ss.rankdata([1, 2, 3, 3, 3, 4, 5])
Out[20]: array([ 1., 2., 4., 4., 4., 6., 7.])
The ranks start at 1, rather than 0 (as in your example), but then again, that's the way R's rank function works as well.
Here is a pure-Python equivalent of scipy's rankdata function:
def rank_simple(vector):
    # Indices that would sort the vector (an argsort).
    return sorted(range(len(vector)), key=vector.__getitem__)

def rankdata(a):
    n = len(a)
    ivec = rank_simple(a)
    svec = [a[rank] for rank in ivec]
    sumranks = 0
    dupcount = 0
    newarray = [0] * n
    for i in range(n):
        sumranks += i
        dupcount += 1
        # At the end of a run of equal values, assign the run's average rank.
        if i == n - 1 or svec[i] != svec[i + 1]:
            averank = sumranks / float(dupcount) + 1
            for j in range(i - dupcount + 1, i + 1):
                newarray[ivec[j]] = averank
            sumranks = 0
            dupcount = 0
    return newarray

print(rankdata([3, 1, 4, 15, 92]))
# [2.0, 1.0, 3.0, 4.0, 5.0]
print(rankdata([1, 2, 3, 3, 3, 4, 5]))
# [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]