Fastest Way to Find *The Index* of the Second (Third...) Highest/Lowest Value in Vector or Column

Fastest way to find second (third...) highest/lowest value in vector or column

Rfast has a function called nth_element that does exactly what you ask.

Further the methods discussed above that are based on partial sort, don't support finding the k smallest values

Update (28/FEB/21) package kit offers a faster implementation (topn) see https://stackoverflow.com/a/66367996/4729755, https://stackoverflow.com/a/53146559/4729755

Disclaimer: An issue appears to occur when dealing with integers which can by bypassed by using as.numeric (e.g. Rfast::nth(as.numeric(1:10), 2)), and will be addressed in the next update of Rfast.

Rfast::nth(x, 5, descending = T)

Will return the 5th largest element of x, while

Rfast::nth(x, 5, descending = F)

Will return the 5th smallest element of x

Benchmarks below against most popular answers.

For 10 thousand numbers:

N = 10000
x = rnorm(N)

maxN <- function(x, N=2){
    len <- length(x)
    if(N>len){
        warning('N greater than length(x).  Setting N=length(x)')
        N <- length(x)
    }
    sort(x,partial=len-N+1)[len-N+1]
}

microbenchmark::microbenchmark(
Rfast = Rfast::nth(x,5,descending = T),
maxn = maxN(x,5),
order = x[order(x, decreasing = T)[5]])

Unit: microseconds
  expr      min       lq      mean   median        uq       max neval
 Rfast  160.364  179.607  202.8024  194.575  210.1830   351.517   100
  maxN  396.419  423.360  559.2707  446.452  487.0775  4949.452   100
 order 1288.466 1343.417 1746.7627 1433.221 1500.7865 13768.148   100

For 1 million numbers:

N = 1e6
x = rnorm(N)

microbenchmark::microbenchmark(
Rfast = Rfast::nth(x,5,descending = T),
maxN = maxN(x,5),
order = x[order(x, decreasing = T)[5]]) 

Unit: milliseconds
  expr      min        lq      mean   median        uq       max neval
 Rfast  89.7722  93.63674  114.9893 104.6325  120.5767  204.8839   100
  maxN 150.2822 207.03922  235.3037 241.7604  259.7476  336.7051   100
 order 930.8924 968.54785 1005.5487 991.7995 1031.0290 1164.9129   100

Fastest way to find the index of the second (third...) highest/lowest value in vector or column

library Rfast has implemented the nth element function with return index option.

UPDATE (28/FEB/21) package kit offers a faster implementation (topn) as shown in the simulations below.

x <- runif(1e+6)

n <- 2

which_nth_highest_richie <- function(x, n)
{
  for(i in seq_len(n - 1L)) x[x == max(x)] <- -Inf
  which(x == max(x))
}

which_nth_highest_joris <- function(x, n)
{
  ux <- unique(x)
  nux <- length(ux)
  which(x == sort(ux, partial = nux - n + 1)[nux - n + 1])
} 

microbenchmark::microbenchmark(
        topn = kit::topn(x, n,decreasing = T)[n],
        Rfast = Rfast::nth(x,n,descending = T,index.return = T),
        order = order(x, decreasing = TRUE)[n],
        richie = which_nth_highest_richie(x,n),
        joris = which_nth_highest_joris(x,n))

Unit: milliseconds
          expr    min     lq      mean     median       uq      max   neval 
          topn  3.741101 3.7917  4.517201  4.060752  5.108901 7.403901 100
         Rfast  15.8121 16.7586  20.64204  17.73010  20.7083  47.6832  100
         order 110.5416 113.4774 120.45807 116.84005 121.2291 164.5618 100
        richie  22.7846 24.1552  39.35303  27.10075  42.0132  179.289  100
         joris 131.7838 140.4611 158.20704 156.61610 165.1735 243.9258 100

Topn is the clear winner in finding the index of the 2nd biggest value in 1 million numbers.

Futher, simulations where run to estimate running times of finding the nth biggest number for varying n.
Variable x was repopulated for each n but it's size was always 1 million numbers.

Running time of finding the index of the nth biggest element between 1 million numbers.

As shown topn is the best option for finding the nth biggest element and it's index, given that n is not too big. In the plot we can observe that topn becomes slower than Rfast's nth for bigger n.
It is worthy to note that topn has not been implemented for n > 1000 and will throw an error in such cases.

Fastest way to find the index of the second (third...) highest/lowest value in vector or column

library Rfast has implemented the nth element function with return index option.

UPDATE (28/FEB/21) package kit offers a faster implementation (topn) as shown in the simulations below.

x <- runif(1e+6)

n <- 2

which_nth_highest_richie <- function(x, n)
{
  for(i in seq_len(n - 1L)) x[x == max(x)] <- -Inf
  which(x == max(x))
}

which_nth_highest_joris <- function(x, n)
{
  ux <- unique(x)
  nux <- length(ux)
  which(x == sort(ux, partial = nux - n + 1)[nux - n + 1])
} 

microbenchmark::microbenchmark(
        topn = kit::topn(x, n,decreasing = T)[n],
        Rfast = Rfast::nth(x,n,descending = T,index.return = T),
        order = order(x, decreasing = TRUE)[n],
        richie = which_nth_highest_richie(x,n),
        joris = which_nth_highest_joris(x,n))

Unit: milliseconds
          expr    min     lq      mean     median       uq      max   neval 
          topn  3.741101 3.7917  4.517201  4.060752  5.108901 7.403901 100
         Rfast  15.8121 16.7586  20.64204  17.73010  20.7083  47.6832  100
         order 110.5416 113.4774 120.45807 116.84005 121.2291 164.5618 100
        richie  22.7846 24.1552  39.35303  27.10075  42.0132  179.289  100
         joris 131.7838 140.4611 158.20704 156.61610 165.1735 243.9258 100

Topn is the clear winner in finding the index of the 2nd biggest value in 1 million numbers.

Futher, simulations where run to estimate running times of finding the nth biggest number for varying n.
Variable x was repopulated for each n but it's size was always 1 million numbers.

Running time of finding the index of the nth biggest element between 1 million numbers.

How to get the second smallest/largest element in a list

You can use sort, and then an index, to find the n-th smallest element:

sort(test)[n]

For the second smallest element, use n=2:

sort(test)[2]

apply which.max to second, third, etc. highest value

Try using order

> order(x, decreasing =TRUE)
[1] 6 5 2 1 3 4

Find minimum value in an array that is larger than another column in R

You can use the following solution.

Here c(...) refers to all variables in each row of your data set and I chose only those that starts with attack
Then I chose only those values that are greater than the corresponding value of hosp in each row and since you were looking for the first one that is greater than the value of hosp I used first function to extract that
..2 also refers to the value of the second variable hosp in each row

library(dplyr)
library(purrr)

data %>%
  mutate(afterh = pmap_dbl(., ~ {x <- c(...)[3:5]; 
  first(sort(x[x > ..2]))}))

   id hosp attack1 attack2 attack3 out afterh
1 100    3       1       4       5   7      4
2 105    5       6       7      10  12      6
3 108    2       3       9      NA  11      3
4 200    6       4      10      NA  12     10
5 205    2       1      NA      NA   9     NA

As an alternative as mentioned by dear Mr. @Greg in a very large data set, we can use min function in place of first(sort)) combination to ensure a faster evaluation time of the following solution. In case there is no value greater than hosp like in the last row min function would return Inf so I made sure that it would return the value 0 instead you can change it with the value you prefer:

data %>%
  mutate(afterh = pmap_dbl(., ~ {x <- c(...)[3:5];
  out <- min(x[x > ..2], na.rm = TRUE);
  if(!is.finite(out)) 0 else out}))

   id hosp attack1 attack2 attack3 out afterh
1 100    3       1       4       5   7      4
2 105    5       6       7      10  12      6
3 108    2       3       9      NA  11      3
4 200    6       4      10      NA  12     10
5 205    2       1      NA      NA   9      0

index from one vector to another by closest values

You can use findInterval, which constructs a sequence of intervals given by breakpoints in b and returns the interval indices in which the elements of a are located (see also ?findInterval for additional arguments, such as behavior at interval boundaries).

a = 1:20
b = seq(from = 1, to = 20, by = 5)

findInterval(a, b)
#>  [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4

Fastest Way to Find The Index of the Second (Third...) Highest/Lowest Value in Vector or Column

Fastest way to find second (third...) highest/lowest value in vector or column

Fastest way to find the index of the second (third...) highest/lowest value in vector or column

Fastest way to find the index of the second (third...) highest/lowest value in vector or column

How to get the second smallest/largest element in a list

apply which.max to second, third, etc. highest value

Find minimum value in an array that is larger than another column in R

index from one vector to another by closest values

Related Topics

Leave a reply

Fastest way to find second (third...) highest/lowest value in vector or column

Fastest way to find *the index* of the second (third...) highest/lowest value in vector or column

Fastest way to find *the index* of the second (third...) highest/lowest value in vector or column

How to get the second smallest/largest element in a list

apply which.max to second, third, etc. highest value

Find minimum value in an array that is larger than another column in R

index from one vector to another by closest values

Related Topics

Leave a reply

Fastest way to find the index of the second (third...) highest/lowest value in vector or column

Fastest way to find the index of the second (third...) highest/lowest value in vector or column