Find second highest value on a raster stack in R
You'll probably want to use calc()
, adapting the code below to your precise situation. Just to show that it works as advertised, I've separately plotted layers formed by taking the highest, second highest, third, and fourth highest values found in each cell of the 4-layer RasterStack
object.
zz <- range(cellStats(rs, range))
par(mfcol=c(2,2))
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[1]]), main="1st",zlim=zz)
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[2]]), main="2nd",zlim=zz)
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[3]]), main="3rd",zlim=zz)
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[4]]), main="4th",zlim=zz)
Or, more compactly and efficiently, just construct a new raster stack holding the reordered values and then plot its layers:
zz <- range(cellStats(rs, range))
rs_ord <- calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)])
par(mfcol=c(2,2))
plot(rs_ord[[1]], main="1st", zlim=zz)
plot(rs_ord[[2]], main="2nd", zlim=zz)
plot(rs_ord[[3]], main="3rd", zlim=zz)
plot(rs_ord[[4]], main="4th", zlim=zz)
Find layer names for the second highest values in a raster stack in R
Here is an approach modifying the which.max.na
function to report the second highest index. Notice that I added sum(!is.na(x)) == 1
to let the function report 0
when there is only one non-NA value.
which.second.max.na <- function(x, ...)
ifelse(length(x) == sum(is.na(x)) | sum(!is.na(x)) == 1, 0,
which.max(`[<-`(x, which.max(x), NA)))
m2 <- calc(rs, which.second.max.na)
We can print the first few values to see if which.max.na
and which.second.max.na
are working.
head(values(m1))
[1] 2 1 4 2 1 2
head(values(m2))
[1] 4 3 2 1 2 3
head(values(rs))
layer.1 layer.2 layer.3 layer.4
[1,] 0.2875775 0.7999890 0.03872603 0.784575267
[2,] 0.7883051 0.5328235 0.76235894 0.009429905
[3,] 0.4089769 0.6886130 0.40136573 0.779065883
[4,] 0.8830174 1.1544738 0.31502973 0.729390652
[5,] 0.9404673 0.6829024 0.20257334 0.630131853
[6,] 0.0455565 1.0903502 0.68024654 0.480910830
It seems like for the first six values from the RasterStack, both functions work as expected.
Finally, be careful that if there are ties in the RasterStack, these two functions could be problematic.
How to find second highest value and corresponding layer name in a raster stack in R
Depending on the dimensions of your rasters, you may be able to use the following, which I'll demonstrate with dummy data in RasterStack s
:
library(raster)
s <- stack(replicate(12, raster(matrix(runif(100000), 1000))))
# coerce s to a data.frame
d <- s[]
# return the second-highest value
sort(d, decreasing=TRUE)[2]
# identify the column containing the second-highest value
col(d)[order(d, decreasing=TRUE)[2]]
If the raster's dimensions are too large to use the above approach, you can instead identify the highest two values of each layer in turn, and then work out which layer has the second-highest value:
# return a matrix whose columns contain the top two values per layer
top_two <- sapply(seq_len(nlayers(s)), function(i) {
sort(s[[i]][], decreasing=TRUE)[1:2]
})
# return the second-highest value
sort(top_two, decreasing=TRUE)[2]
# identify the column containing the second-highest value
col(top_two)[order(top_two, decreasing=TRUE)[2]]
select highest value in Raster stack and show layer name as legend
You need which.max(s)
instead of whiches.max(s)
. Then, to plot, the easiest thing is probably to melt
the raster into a data frame and assign the labels to the value
column and use ggplot
to draw the result with geom_tile
or geom_raster
max_raster <- which.max(s)
df <- reshape2::melt(as.matrix(max_raster))
df$value <- factor(var_name[df$value], var_name)
library(ggplot2)
ggplot(df, aes(Var2, Var1, fill = value)) +
geom_tile() +
scale_y_reverse() +
coord_equal() +
scale_fill_brewer(palette = 'Set1') +
theme_light()
Locate % of times that the second highest value appears for each column in R data frame
I would do it like this:
# compute reverse rank
df.rank <- ncol(df) - t(apply(df, 1, rank)) + 1
A <- colMeans(df.rank == 1)
B <- colMeans(df.rank == 2)
C <- mean(apply(df.rank[, 1:2], 1, prod)==2)
First I compute reverse rank which is analogous to using decreasing=T
with sort()
or order()
. A and B is then rather straightforward. Please note that your original approach omits zeros for columns where no (second) maximum value appears which may cause problems in later usage.
For C, I take only first two columns of the rank matrix and compute their product for every row. If there are the two largest values in the first two columns the product has to be 2.
Also, if ties might appear in your data set you should consider selecting the appropriate ties.method
argument for rank
.
Fastest way to find second (third...) highest/lowest value in vector or column
Rfast has a function called nth_element that does exactly what you ask.
Further the methods discussed above that are based on partial sort, don't support finding the k smallest values
Update (28/FEB/21) package kit offers a faster implementation (topn) see https://stackoverflow.com/a/66367996/4729755, https://stackoverflow.com/a/53146559/4729755
Disclaimer: An issue appears to occur when dealing with integers which can by bypassed by using as.numeric (e.g. Rfast::nth(as.numeric(1:10), 2)), and will be addressed in the next update of Rfast.
Rfast::nth(x, 5, descending = T)
Will return the 5th largest element of x, while
Rfast::nth(x, 5, descending = F)
Will return the 5th smallest element of x
Benchmarks below against most popular answers.
For 10 thousand numbers:
N = 10000
x = rnorm(N)
maxN <- function(x, N=2){
len <- length(x)
if(N>len){
warning('N greater than length(x). Setting N=length(x)')
N <- length(x)
}
sort(x,partial=len-N+1)[len-N+1]
}
microbenchmark::microbenchmark(
Rfast = Rfast::nth(x,5,descending = T),
maxn = maxN(x,5),
order = x[order(x, decreasing = T)[5]])
Unit: microseconds
expr min lq mean median uq max neval
Rfast 160.364 179.607 202.8024 194.575 210.1830 351.517 100
maxN 396.419 423.360 559.2707 446.452 487.0775 4949.452 100
order 1288.466 1343.417 1746.7627 1433.221 1500.7865 13768.148 100
For 1 million numbers:
N = 1e6
x = rnorm(N)
microbenchmark::microbenchmark(
Rfast = Rfast::nth(x,5,descending = T),
maxN = maxN(x,5),
order = x[order(x, decreasing = T)[5]])
Unit: milliseconds
expr min lq mean median uq max neval
Rfast 89.7722 93.63674 114.9893 104.6325 120.5767 204.8839 100
maxN 150.2822 207.03922 235.3037 241.7604 259.7476 336.7051 100
order 930.8924 968.54785 1005.5487 991.7995 1031.0290 1164.9129 100
Find second largest column name
We can do this in one apply
call by finding out 2 maximum value in each row and returning their column name.
temp[c("MaxOrders", "secondMaxOrders", "MaxColName", "secondMaxColumnName")] <-
t(apply(temp, 1, function(x) {
inds <- order(x, decreasing = TRUE)[1:2]
c(x[inds], names(temp)[inds])
}))
temp
# c1 c2 c3 c4 c5 MaxOrders secondMaxOrders MaxColName secondMaxColumnName
#1 1 1 4 1 2 4 2 c3 c5
#2 2 2 5 6 2 6 5 c4 c3
#3 3 3 1 5 2 5 3 c4 c1
#4 4 1 2 4 2 4 4 c1 c4
OR if you want to completely remove the maximum value and consider only the remaining ones for second max
t(apply(temp, 1, function(x) {
inds <- match(unique(sort(x, decreasing=TRUE))[1:2], x)
c(x[inds], names(temp)[inds])
}))
# [,1] [,2] [,3] [,4]
#[1,] "4" "2" "c3" "c5"
#[2,] "6" "5" "c4" "c3"
#[3,] "5" "3" "c4" "c1"
#[4,] "4" "2" "c1" "c3"
How can I find the pixel-wise maximum of multiple rasters?
If you first put the rasters in a stack, you can then simply apply min()
or max()
to the stack to get the summary RasterLayer
you're after
## Example rasters and stack
r1 <- raster(matrix(1:4,ncol=4))
r2 <- -2*r1
r3 <- 2*r1
rr <- list(r1,r2,r3)
s <- stack(rr)
## Extract the pixel-wise min and max values
min(s)
max(s)
(To apply some other, more complicated function that returns a scalar for each pixel in the stack, you may want to use calc()
, as demonstrated (for example) here.)
R : How to find the location of the max value in a raster?
Let's try with a toy raster as follows:
library(raster)
r = raster(nrow=10, ncol=10)
r[] = runif(100,0,10)
Then the position (index) of the maximum is found using
idx = which.max(r)
And from thhe index position to the coordinates of the cell
pos = xyFromCell(r,idx)
Let me know if it works
Related Topics
Error in Xj[I]: Invalid Subscript Type 'List'
Same Seed, Different Os, Different Random Numbers in R
Quantiles by Factor Levels in R
How to Use Stat_Function by Group
Calculate a 2D Spline Curve in R
Margins Between Plots in Grid.Arrange
How to Find All Possible Subsets of a Set Iteratively in R
R Bookdown - Custom Title Page
Error: C Stack Usage Is Too Close to The Limit in R
How Does R Represent Na Internally
How to Format Kable Table When Knit from .Rmd to Word (With Bookdown)
Combining Two Vectors Element-By-Element
Specific Spaces Between Bars in a Barplot - Ggplot2 - R
Large Matrices in Rcpparmadillo via The Arma_64Bit_Word Define
Converting a Long-Formated Dataframe to Wide Format Tidyverse