Find Second Highest Value on a Raster Stack in R

Find second highest value on a raster stack in R

You'll probably want to use calc(), adapting the code below to your precise situation. Just to show that it works as advertised, I've separately plotted layers formed by taking the highest, second highest, third, and fourth highest values found in each cell of the 4-layer RasterStack object.

zz <- range(cellStats(rs, range))

par(mfcol=c(2,2))
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[1]]), main="1st",zlim=zz)
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[2]]), main="2nd",zlim=zz)
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[3]]), main="3rd",zlim=zz)
plot(calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)[4]]), main="4th",zlim=zz)

Or, more compactly and efficiently, just construct a new raster stack holding the reordered values and then plot its layers:

zz <- range(cellStats(rs, range))
rs_ord <- calc(rs, fun=function(X,na.rm) X[order(X,decreasing=T)])

par(mfcol=c(2,2))
plot(rs_ord[[1]], main="1st", zlim=zz)
plot(rs_ord[[2]], main="2nd", zlim=zz)
plot(rs_ord[[3]], main="3rd", zlim=zz)
plot(rs_ord[[4]], main="4th", zlim=zz)

Sample Image

Find layer names for the second highest values in a raster stack in R

Here is an approach modifying the which.max.na function to report the second highest index. Notice that I added sum(!is.na(x)) == 1 to let the function report 0 when there is only one non-NA value.

which.second.max.na <- function(x, ...) 
ifelse(length(x) == sum(is.na(x)) | sum(!is.na(x)) == 1, 0,
which.max(`[<-`(x, which.max(x), NA)))

m2 <- calc(rs, which.second.max.na)

We can print the first few values to see if which.max.na and which.second.max.na are working.

head(values(m1))
[1] 2 1 4 2 1 2

head(values(m2))
[1] 4 3 2 1 2 3

head(values(rs))
layer.1 layer.2 layer.3 layer.4
[1,] 0.2875775 0.7999890 0.03872603 0.784575267
[2,] 0.7883051 0.5328235 0.76235894 0.009429905
[3,] 0.4089769 0.6886130 0.40136573 0.779065883
[4,] 0.8830174 1.1544738 0.31502973 0.729390652
[5,] 0.9404673 0.6829024 0.20257334 0.630131853
[6,] 0.0455565 1.0903502 0.68024654 0.480910830

It seems like for the first six values from the RasterStack, both functions work as expected.

Finally, be careful that if there are ties in the RasterStack, these two functions could be problematic.

How to find second highest value and corresponding layer name in a raster stack in R

Depending on the dimensions of your rasters, you may be able to use the following, which I'll demonstrate with dummy data in RasterStack s:

library(raster)
s <- stack(replicate(12, raster(matrix(runif(100000), 1000))))

# coerce s to a data.frame
d <- s[]
# return the second-highest value
sort(d, decreasing=TRUE)[2]
# identify the column containing the second-highest value
col(d)[order(d, decreasing=TRUE)[2]]

If the raster's dimensions are too large to use the above approach, you can instead identify the highest two values of each layer in turn, and then work out which layer has the second-highest value:

# return a matrix whose columns contain the top two values per layer
top_two <- sapply(seq_len(nlayers(s)), function(i) {
sort(s[[i]][], decreasing=TRUE)[1:2]
})
# return the second-highest value
sort(top_two, decreasing=TRUE)[2]
# identify the column containing the second-highest value
col(top_two)[order(top_two, decreasing=TRUE)[2]]

select highest value in Raster stack and show layer name as legend

You need which.max(s) instead of whiches.max(s). Then, to plot, the easiest thing is probably to melt the raster into a data frame and assign the labels to the value column and use ggplot to draw the result with geom_tile or geom_raster

max_raster <- which.max(s)
df <- reshape2::melt(as.matrix(max_raster))
df$value <- factor(var_name[df$value], var_name)

library(ggplot2)

ggplot(df, aes(Var2, Var1, fill = value)) +
geom_tile() +
scale_y_reverse() +
coord_equal() +
scale_fill_brewer(palette = 'Set1') +
theme_light()

Sample Image

Locate % of times that the second highest value appears for each column in R data frame

I would do it like this:

# compute reverse rank
df.rank <- ncol(df) - t(apply(df, 1, rank)) + 1

A <- colMeans(df.rank == 1)
B <- colMeans(df.rank == 2)
C <- mean(apply(df.rank[, 1:2], 1, prod)==2)

First I compute reverse rank which is analogous to using decreasing=T with sort() or order(). A and B is then rather straightforward. Please note that your original approach omits zeros for columns where no (second) maximum value appears which may cause problems in later usage.

For C, I take only first two columns of the rank matrix and compute their product for every row. If there are the two largest values in the first two columns the product has to be 2.

Also, if ties might appear in your data set you should consider selecting the appropriate ties.method argument for rank.

Fastest way to find second (third...) highest/lowest value in vector or column

Rfast has a function called nth_element that does exactly what you ask.

Further the methods discussed above that are based on partial sort, don't support finding the k smallest values

Update (28/FEB/21) package kit offers a faster implementation (topn) see https://stackoverflow.com/a/66367996/4729755, https://stackoverflow.com/a/53146559/4729755

Disclaimer: An issue appears to occur when dealing with integers which can by bypassed by using as.numeric (e.g. Rfast::nth(as.numeric(1:10), 2)), and will be addressed in the next update of Rfast.

Rfast::nth(x, 5, descending = T)

Will return the 5th largest element of x, while

Rfast::nth(x, 5, descending = F)

Will return the 5th smallest element of x

Benchmarks below against most popular answers.

For 10 thousand numbers:

N = 10000
x = rnorm(N)

maxN <- function(x, N=2){
len <- length(x)
if(N>len){
warning('N greater than length(x). Setting N=length(x)')
N <- length(x)
}
sort(x,partial=len-N+1)[len-N+1]
}

microbenchmark::microbenchmark(
Rfast = Rfast::nth(x,5,descending = T),
maxn = maxN(x,5),
order = x[order(x, decreasing = T)[5]])

Unit: microseconds
expr min lq mean median uq max neval
Rfast 160.364 179.607 202.8024 194.575 210.1830 351.517 100
maxN 396.419 423.360 559.2707 446.452 487.0775 4949.452 100
order 1288.466 1343.417 1746.7627 1433.221 1500.7865 13768.148 100

For 1 million numbers:

N = 1e6
x = rnorm(N)

microbenchmark::microbenchmark(
Rfast = Rfast::nth(x,5,descending = T),
maxN = maxN(x,5),
order = x[order(x, decreasing = T)[5]])

Unit: milliseconds
expr min lq mean median uq max neval
Rfast 89.7722 93.63674 114.9893 104.6325 120.5767 204.8839 100
maxN 150.2822 207.03922 235.3037 241.7604 259.7476 336.7051 100
order 930.8924 968.54785 1005.5487 991.7995 1031.0290 1164.9129 100

Find second largest column name

We can do this in one apply call by finding out 2 maximum value in each row and returning their column name.

temp[c("MaxOrders", "secondMaxOrders", "MaxColName", "secondMaxColumnName")] <-
t(apply(temp, 1, function(x) {
inds <- order(x, decreasing = TRUE)[1:2]
c(x[inds], names(temp)[inds])
}))

temp
# c1 c2 c3 c4 c5 MaxOrders secondMaxOrders MaxColName secondMaxColumnName
#1 1 1 4 1 2 4 2 c3 c5
#2 2 2 5 6 2 6 5 c4 c3
#3 3 3 1 5 2 5 3 c4 c1
#4 4 1 2 4 2 4 4 c1 c4

OR if you want to completely remove the maximum value and consider only the remaining ones for second max

t(apply(temp, 1, function(x) {
inds <- match(unique(sort(x, decreasing=TRUE))[1:2], x)
c(x[inds], names(temp)[inds])
}))

# [,1] [,2] [,3] [,4]
#[1,] "4" "2" "c3" "c5"
#[2,] "6" "5" "c4" "c3"
#[3,] "5" "3" "c4" "c1"
#[4,] "4" "2" "c1" "c3"

How can I find the pixel-wise maximum of multiple rasters?

If you first put the rasters in a stack, you can then simply apply min() or max() to the stack to get the summary RasterLayer you're after

## Example rasters and stack
r1 <- raster(matrix(1:4,ncol=4))
r2 <- -2*r1
r3 <- 2*r1
rr <- list(r1,r2,r3)
s <- stack(rr)

## Extract the pixel-wise min and max values
min(s)
max(s)

(To apply some other, more complicated function that returns a scalar for each pixel in the stack, you may want to use calc(), as demonstrated (for example) here.)

R : How to find the location of the max value in a raster?

Let's try with a toy raster as follows:

library(raster)
r = raster(nrow=10, ncol=10)
r[] = runif(100,0,10)

Then the position (index) of the maximum is found using

idx = which.max(r)

And from thhe index position to the coordinates of the cell

pos = xyFromCell(r,idx)

Let me know if it works



Related Topics



Leave a reply



Submit