How to return 5 topmost values from vector in R?
> a <- c(1:100)
> tail(sort(a),5)
[1] 96 97 98 99 100
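As a minimal sketch on unsorted input (the vector below is hypothetical, chosen only for illustration), the same idea can return the values either smallest-first or largest-first:

```r
# Hypothetical unsorted input, so the effect of sorting is visible
a <- c(12, 7, 99, 3, 45, 68, 21, 99, 5)

# Five largest values, largest first
top5_desc <- head(sort(a, decreasing = TRUE), 5)

# The same five values in ascending order, as tail(sort(a), 5) returns them
top5_asc <- tail(sort(a), 5)

print(top5_desc)
print(top5_asc)
```

Both head() and tail() return the whole vector when it has fewer than 5 elements, so no NA padding appears.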
R: returning the 5 rows with the highest values
We can use rank
mysample$Rank <- rank(-mysample$kWh)
head(mysample[order(mysample$Rank),],5)
If we don't need to create the extra column, we can use order directly
(the three alternatives below were mentioned by @Jaap):
#order descending and get the first 5 rows
head(mysample[order(-mysample$kWh),],5)
#order ascending and get the last 5 rows
tail(mysample[order(mysample$kWh),],5)
#or just use a sequence as the row index to get the first 5 rows
mysample[order(-mysample$kWh),][1:5,]
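To try the order-based approach end to end, here is a small self-contained sketch. The data frame is hypothetical and stands in for mysample; only the kWh column name is taken from the answer.

```r
# Hypothetical data standing in for `mysample`
mysample <- data.frame(
  id  = 1:8,
  kWh = c(3.2, 9.1, 4.7, 8.8, 1.5, 7.3, 6.0, 9.9)
)

# Sort rows by kWh, largest first, and keep the first 5 rows
top5 <- head(mysample[order(-mysample$kWh), ], 5)

# Same rows selected with an explicit row index; the trailing comma
# makes 1:5 a row index rather than a column index
top5_idx <- mysample[order(-mysample$kWh), ][1:5, ]

print(top5)
```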
Fastest way to find second (third...) highest/lowest value in vector or column
Rfast has a function called nth_element that does exactly what you ask.
Furthermore, the methods discussed above that are based on a partial sort don't support finding the k smallest values.
Update (28 Feb 2021): the kit package offers a faster implementation (topn); see https://stackoverflow.com/a/66367996/4729755 and https://stackoverflow.com/a/53146559/4729755
Disclaimer: An issue appears to occur when dealing with integers, which can be bypassed by using as.numeric (e.g. Rfast::nth(as.numeric(1:10), 2)) and will be addressed in the next update of Rfast.
Rfast::nth(x, 5, descending = T)
Will return the 5th largest element of x, while
Rfast::nth(x, 5, descending = F)
Will return the 5th smallest element of x
Benchmarks below against most popular answers.
For 10 thousand numbers:
N = 10000
x = rnorm(N)
maxN <- function(x, N=2){
  len <- length(x)
  if(N>len){
    warning('N greater than length(x). Setting N=length(x)')
    N <- length(x)
  }
  sort(x,partial=len-N+1)[len-N+1]
}
microbenchmark::microbenchmark(
Rfast = Rfast::nth(x,5,descending = T),
maxN = maxN(x,5),
order = x[order(x, decreasing = T)[5]])
Unit: microseconds
expr       min       lq      mean   median        uq       max neval
Rfast  160.364  179.607  202.8024  194.575  210.1830   351.517   100
maxN   396.419  423.360  559.2707  446.452  487.0775  4949.452   100
order 1288.466 1343.417 1746.7627 1433.221 1500.7865 13768.148   100
For 1 million numbers:
N = 1e6
x = rnorm(N)
microbenchmark::microbenchmark(
Rfast = Rfast::nth(x,5,descending = T),
maxN = maxN(x,5),
order = x[order(x, decreasing = T)[5]])
Unit: milliseconds
expr       min        lq      mean   median        uq       max neval
Rfast  89.7722  93.63674  114.9893 104.6325  120.5767  204.8839   100
maxN  150.2822 207.03922  235.3037 241.7604  259.7476  336.7051   100
order 930.8924 968.54785 1005.5487 991.7995 1031.0290 1164.9129   100
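The maxN helper in the benchmark only returns the N-th largest value. As a rough base R sketch of the mirror-image case, the same partial sort can be pointed at the other end of the vector. The name minN is hypothetical and not part of the benchmarked answers, and no timing claim is made for it.

```r
# Hypothetical helper: N-th smallest value via a partial sort
minN <- function(x, N = 2) {
  len <- length(x)
  if (N > len) {
    warning("N greater than length(x). Setting N=length(x)")
    N <- len
  }
  # After a partial sort at position N, element N is the value
  # that a full sort would place there
  sort(x, partial = N)[N]
}

set.seed(1)
x <- rnorm(1000)
fifth_smallest <- minN(x, 5)

# Cross-check against a full sort
stopifnot(fifth_smallest == sort(x)[5])
print(fifth_smallest)
```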
Make a table showing the 10 largest values of a variable in R?
This should do it...
data <- data[with(data,order(-Score)),]
data <- data[1:10,]
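One caveat worth illustrating: indexing with 1:10 pads the result with NA rows when the data frame has fewer than 10 rows. A minimal sketch with hypothetical data (only the Score column name comes from the answer) that uses head() instead:

```r
# Hypothetical data with fewer than 10 rows
data <- data.frame(Name = c("a", "b", "c", "d"), Score = c(70, 95, 82, 60))

# Order by Score, highest first
data <- data[order(-data$Score), ]

# head() returns at most 10 rows; data[1:10, ] would add NA rows here
top10 <- head(data, 10)
print(top10)
```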
How can I get top n values with its index in R?
We can use sort with index.return=TRUE to return the values together with their indices in a list. Then we can subset the list based on the first 3 unique elements in 'x'.
lst <- sort(df1$distance, index.return=TRUE, decreasing=TRUE)
lapply(lst, `[`, lst$x %in% head(unique(lst$x),3))
#$x
#[1] 5 5 4 4 3
#$ix
#[1] 6 7 2 5 4
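The question's df1 is not shown, so the sketch below uses a hypothetical distance column to make the steps reproducible. It keeps every entry whose value is among the 3 largest distinct values, so ties are all retained.

```r
# Hypothetical data standing in for `df1`
df1 <- data.frame(distance = c(1, 4, 2, 3, 4, 5, 5))

# Sorted values ($x) together with their original positions ($ix)
lst <- sort(df1$distance, index.return = TRUE, decreasing = TRUE)

# Keep every entry whose value is among the 3 largest distinct values
keep <- lst$x %in% head(unique(lst$x), 3)
top <- lapply(lst, `[`, keep)

print(top$x)   # values
print(top$ix)  # positions in df1$distance
```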
find the index of top n elements in a vector in order [R]
apply() is perfect for row-wise operations on matrices. You could do
t(apply(v1, 1, function(x) order(-x)[1:5]))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 9 3 7 2 6
# [2,] 3 8 4 1 5
This runs the order() function on each row of the matrix v1, then takes the first five values for each row. The result is transposed because apply() returns each row's result as a column.
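Since v1 is not shown in the answer, here is a small sketch with a hypothetical 2 x 6 matrix that takes the top 3 positions per row instead of 5:

```r
# Hypothetical 2 x 6 matrix standing in for `v1`
v1 <- matrix(c(10, 40, 30, 20, 60, 50,
               15,  5, 35, 25, 45, 55),
             nrow = 2, byrow = TRUE)

# For each row, column positions of the 3 largest values, largest first
top_idx <- t(apply(v1, 1, function(x) order(-x)[1:3]))
print(top_idx)
```

Each row of top_idx lists column positions in v1, so v1[1, top_idx[1, ]] recovers the three largest values of the first row.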
Find the n most common values in a vector
I'm sure this is a duplicate, but the answer is simple:
sort(table(variable),decreasing=TRUE)[1:3]
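A short sketch on a hypothetical character vector shows what this returns. It uses head() rather than [1:3], on the assumption that the input may have fewer than 3 distinct values, in which case [1:3] would produce NA entries.

```r
# Hypothetical input vector
variable <- c("a", "b", "a", "c", "b", "a", "d", "c", "a", "b")

# Frequency table sorted from most to least common
counts <- sort(table(variable), decreasing = TRUE)

# Keep at most the 3 most common values
top3 <- head(counts, 3)
print(top3)
```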
Getting the last n elements of a vector. Is there a better way than using the length() function?
See ?tail and ?head for some convenient functions:
> x <- 1:10
> tail(x,5)
[1] 6 7 8 9 10
For argument's sake, everything but the last five elements would be:
> head(x,n=-5)
[1] 1 2 3 4 5
As @Martin Morgan says in the comments, there are two other possibilities that are faster than the tail solution, in case you have to carry this out a million times on a vector of 100 million values. For readability, I'd go with tail.
test                                        elapsed relative
tail(x, 5)                                    38.70 5.724852
x[length(x) - (4:0)]                           6.76 1.000000
x[seq.int(to = length(x), length.out = 5)]     7.53 1.113905
Benchmarking code:
require(rbenchmark)
x <- 1:1e8
do.call(
  benchmark,
  c(list(
    expression(tail(x,5)),
    expression(x[seq.int(to=length(x), length.out=5)]),
    expression(x[length(x)-(4:0)])
  ), replications=1e6)
)
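As a quick sanity check that the three variants agree, here is a sketch on a small vector, with n as an illustrative parameter in place of the hard-coded 5:

```r
x <- 1:10
n <- 5

last_tail <- tail(x, n)
last_len  <- x[length(x) - ((n - 1):0)]
last_seq  <- x[seq.int(to = length(x), length.out = n)]

# All three select the same elements
stopifnot(identical(last_tail, last_len), identical(last_tail, last_seq))
print(last_tail)
```

Unlike tail(), the two index-based forms assume n is not larger than length(x).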
function to return top 5 indexes of highest values of a vector
Create an index array and partially sort that:
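#include <algorithm>  // std::partial_sort
#include <numeric>    // std::iota
#include <vector>

// Assumes vMetric.size() >= 5; otherwise indices.begin() + 5 is out of range.
std::vector<size_t> indices(vMetric.size());
std::iota(indices.begin(), indices.end(), 0);
std::partial_sort(indices.begin(), indices.begin() + 5, indices.end(),
                  [&](size_t A, size_t B) {
                      return vMetric[A] > vMetric[B];
                  });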
The first 5 elements of indices contain your answer, and the original vector is not mutated.