R: Find Nearest Index

Fastest way to find nearest value in vector

library(data.table)

a=data.table(Value=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))

a[,merge:=Value]

b=data.table(Value=c(4,6,10,16))

b[,merge:=Value]

setkeyv(a,c('merge'))

setkeyv(b,c('merge'))

Merge_a_b=a[b,roll='nearest']

In data.table, a keyed join accepts roll='nearest', which matches each row of b to the row of a whose key is nearest. The resulting data table has one row per row of b (the table inside the brackets). As with any keyed join, both tables need a common key.
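If only the row positions are needed, the same join can return row numbers instead of a merged table. This is a minimal sketch, assuming a data.table version that supports on= joins; the names nearest_row and nearest_value are illustrative and not from the original answer.

```r
library(data.table)

a <- data.table(Value = as.numeric(1:15))
b <- data.table(Value = c(4, 6, 10, 16))

# which = TRUE returns, for each row of b, the matching row number in a
# instead of the joined table; roll = "nearest" picks the closest key.
nearest_row <- a[b, on = "Value", roll = "nearest", which = TRUE]
nearest_value <- a$Value[nearest_row]

print(nearest_row)
print(nearest_value)
```

The last query value, 16, lies beyond the largest key in a, so with roll = "nearest" it should roll to the final row (value 15).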

R: find nearest index

You can wrap your code in sapply. I think this runs at about the same speed as a for loop, so it isn't technically vectorized:

sapply(b,function(x)which.min(abs(x - A)))
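A self-contained sketch of the same idea, with made-up values for A and b (the original data is not shown) and vapply in place of sapply so that the return type is checked:

```r
A <- c(1.2, 3.4, 7.8, 10.1)   # hypothetical reference vector
b <- c(3, 8, 0)               # hypothetical query values

# For each element of b, the position in A of the closest value
nearest_idx <- vapply(b, function(x) which.min(abs(x - A)), integer(1L))

nearest_idx        # 2 3 1
A[nearest_idx]     # 3.4 7.8 1.2
```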

closest value and data frame index of all data frame elements of a list

Here is a dplyr approach. We can generate the list.index and line.number.in.df for each dataframe and then bind_rows them together. Next, slice the rows where C2 contains the closest value for each number in that vector.

library(dplyr)

test <- list(structure(list(C1 = c(0.2, 0.4, 0.5), C2 = c(2, 3.5, 3.7
), C3 = c(0.3, 4, 5)), class = "data.frame", row.names = c(NA,
-3L)), structure(list(C1 = c(0.1, 0.3, 0.6), C2 = c(3.9, 4.3,
8), C3 = c(3, 5.2, 10)), class = "data.frame", row.names = c(NA,
-3L)), structure(list(C1 = c(0.4, 0.55, 0.8), C2 = c(8.9, 10.3,
14), C3 = c(7, 8.4, 11)), class = "data.frame", row.names = c(NA,
-3L)))

vector <- c(3, 14.4, 7, 0)

test %>%
  lapply(tibble::rowid_to_column, "line.number.in.df") %>%
  bind_rows(.id = "list.index") %>%
  slice(vapply(vector, \(x) which.min(abs(x - C2)), integer(1L)))

Output is

  list.index line.number.in.df  C1   C2   C3
1          1                 2 0.4  3.5  4.0
2          3                 3 0.8 14.0 11.0
3          2                 3 0.6  8.0 10.0
4          1                 1 0.2  2.0  0.3
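For comparison, here is a rough base R equivalent that needs no packages. It is only a sketch: the names targets, stacked, hits and result are illustrative, and list.index comes out as an integer here rather than the character column that bind_rows(.id = ...) produces.

```r
test <- list(
  data.frame(C1 = c(0.2, 0.4, 0.5), C2 = c(2, 3.5, 3.7), C3 = c(0.3, 4, 5)),
  data.frame(C1 = c(0.1, 0.3, 0.6), C2 = c(3.9, 4.3, 8), C3 = c(3, 5.2, 10)),
  data.frame(C1 = c(0.4, 0.55, 0.8), C2 = c(8.9, 10.3, 14), C3 = c(7, 8.4, 11))
)
targets <- c(3, 14.4, 7, 0)   # called `vector` in the original

# Stack the data frames, recording the list element and row each line came from
stacked <- do.call(rbind, lapply(seq_along(test), function(i) {
  cbind(list.index = i,
        line.number.in.df = seq_len(nrow(test[[i]])),
        test[[i]])
}))

# For each target, the row of `stacked` whose C2 is closest
hits <- vapply(targets, function(x) which.min(abs(x - stacked$C2)), integer(1L))
result <- stacked[hits, ]
print(result)
```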

return index from a vector of the value closest to a given element

one way:

# as mnel points out in his answer, the difference is that
# using `which` here gives all indices that match
which(abs(x-0.4) == min(abs(x-0.4)))

where x is your vector.

Alternately,

# this one returns only the first index, but is SLOW
sort(abs(x-0.4), index.return=TRUE)$ix[1]
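To make the difference concrete, here is a small sketch with a hypothetical vector in which two elements are equally close to the target. Whole numbers are used on purpose: with fractional values the == comparison can miss a tie because of floating-point rounding.

```r
x <- c(1, 3, 5, 9)    # hypothetical vector; 3 and 5 are equally close to 4
target <- 4

d <- abs(x - target)
all_ties   <- which(d == min(d))   # every index at the minimum distance
first_only <- which.min(d)         # only the first such index

all_ties     # 2 3
first_only   # 2
```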

Nearest index of a logical vector in R

We can get the indices of all TRUE values and then use findInterval to pick, for each value in b, the nearest TRUE index at or before it.

inds <- which(df$a)
df$c <- inds[findInterval(df$b, inds)]
df

#       a  b  c
#1  FALSE NA NA
#2   TRUE NA NA
#3  FALSE  3  2
#4  FALSE NA NA
#5  FALSE NA NA
#6   TRUE NA NA
#7  FALSE NA NA
#8  FALSE  8  6
#9   TRUE NA NA
#10  TRUE NA NA
#11 FALSE NA NA
#12 FALSE 12 10
#13 FALSE NA NA
#14 FALSE NA NA
#15 FALSE NA NA
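The example can be reproduced with the sketch below. The data frame is reconstructed from the printed output, since the original definition of df is not shown. The guard for a zero position is an addition: findInterval returns 0 when a value of b is smaller than every TRUE index, and indexing with 0 would silently drop that element.

```r
# df rebuilt from the printed output above (an assumption)
df <- data.frame(a = seq_len(15) %in% c(2, 6, 9, 10), b = NA_real_)
df$b[c(3, 8, 12)] <- c(3, 8, 12)

inds <- which(df$a)                 # 2 6 9 10
pos  <- findInterval(df$b, inds)    # position in inds of the last TRUE index <= b
pos[which(pos == 0)] <- NA          # 0 means there is no TRUE at or before b
df$c <- inds[pos]

df$c[c(3, 8, 12)]                   # 2 6 10
```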

Closest subsequent index for a specified value

Find the location of each value (numeric or character)

int = c(1, 1, 0, 5, 2, 0, 0, 2)
value = 0
idx = which(int == value)
## [1] 3 6 7

Expand the index so that each position holds the index of the next occurrence of the value of interest, leaving NA after the last occurrence in int.

nearest = rep(NA, length(int))
nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
## [1] 3 3 3 6 6 6 7 NA

Use simple arithmetic to find the difference between the index of the current value and the index of the nearest value

abs(seq_along(int) - nearest)
## [1] 2 1 0 2 1 0 0 NA

Written as a function

f <- function(x, value) {
    idx = which(x == value)
    nearest = rep(NA, length(x))
    if (length(idx)) # non-NA values only if `value` in `x`
        nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
    abs(seq_along(x) - nearest)
}

With int as above (the character vector char is not shown), we have

> f(int, 0)
[1] 2 1 0 2 1 0 0 NA
> f(int, 1)
[1] 0 0 NA NA NA NA NA NA
> f(int, 2)
[1] 4 3 2 1 0 2 1 0
> f(char, "A")
[1] 0 2 1 0 0
> f(char, "B")
[1] 1 0 NA NA NA
> f(char, "C")
[1] 2 1 0 NA NA

The solution doesn't involve recursion or R-level loops, so it should be fast even for long vectors.
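The results above can be checked with the following sketch. The vector char is never defined in the text, so the value used here is an assumption, inferred from the printed results for "A", "B" and "C".

```r
f <- function(x, value) {
  idx <- which(x == value)
  nearest <- rep(NA, length(x))
  if (length(idx))  # non-NA values only if `value` occurs in `x`
    nearest[1:max(idx)] <- rep(idx, diff(c(0, idx)))
  abs(seq_along(x) - nearest)
}

int  <- c(1, 1, 0, 5, 2, 0, 0, 2)
char <- c("A", "B", "C", "A", "A")   # assumed: inferred from the printed results

stopifnot(
  isTRUE(all.equal(f(int, 0),    c(2, 1, 0, 2, 1, 0, 0, NA))),
  isTRUE(all.equal(f(int, 2),    c(4, 3, 2, 1, 0, 2, 1, 0))),
  isTRUE(all.equal(f(char, "A"), c(0, 2, 1, 0, 0))),
  isTRUE(all.equal(f(char, "B"), c(1, 0, NA, NA, NA))),
  isTRUE(all.equal(f(char, "C"), c(2, 1, 0, NA, NA)))
)
```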

Quickest way to find closest elements in an array in R

R is vectorized, so skip the for loop. This saves time in scripting and computation. Simply replace the for loop with an apply function. Since we're returning a 1D vector, we use sapply.

YmatchIndex <- sapply(Xtimes, function(x){which.min(abs(Ytimes - x))})


Proof that apply is faster:

library(microbenchmark)
library(ggplot2)

# set up data
Xtimes <- c(1,5,8,10,15,19,23,34,45,51,55,57,78,120)
Ytimes <- seq(0,120,length.out = 1000)
YmatchIndex <- integer(length(Xtimes)) # preallocate so the for loop can assign into it

# time it
mbm <- microbenchmark(
  for_loop = for (i in seq_along(Xtimes)) {
    YmatchIndex[i] = which.min(abs(Ytimes - Xtimes[i]))
  },
  apply = sapply(Xtimes, function(x){which.min(abs(Ytimes - x))}),
  times = 100
)

# plot
autoplot(mbm)

[Image: benchmark plot produced by autoplot(mbm)]

See ?sapply and ?apply for more.
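When the lookup vector is sorted, as Ytimes is here, a fully vectorized alternative is possible with findInterval. The sketch below is not part of the original answer: nearest_sorted is a hypothetical helper that assumes its second argument is sorted in ascending order without duplicates, and it resolves ties toward the lower index, as which.min does.

```r
Xtimes <- c(1, 5, 8, 10, 15, 19, 23, 34, 45, 51, 55, 57, 78, 120)
Ytimes <- seq(0, 120, length.out = 1000)   # must be sorted ascending

nearest_sorted <- function(x, sorted) {
  lo <- findInterval(x, sorted)             # last position with sorted <= x (0 if none)
  lo_c <- pmax(lo, 1L)                      # clamp both neighbours to valid positions
  hi_c <- pmin(lo + 1L, length(sorted))
  # keep the lower neighbour on ties, like which.min
  ifelse(abs(x - sorted[lo_c]) <= abs(sorted[hi_c] - x), lo_c, hi_c)
}

via_findInterval <- nearest_sorted(Xtimes, Ytimes)
via_sapply <- sapply(Xtimes, function(x) which.min(abs(Ytimes - x)))
stopifnot(all(via_findInterval == via_sapply))
```

Only the two neighbours of each query value are compared, so the work per query does not grow with the length of Ytimes beyond the binary search inside findInterval.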

In R: find the closest value within group_by excluding self comparisons

I answered this using an approach from a question I asked years ago, "Count values less than x and find nearest values to x by multiple groups":

temp1 <- data %>%
  group_by(river) %>%
  # n_ds: number of points in the same river with a smaller dist
  mutate(n_ds = match(dist, sort(dist)) - 1) %>%
  # closest_uid: id of the nearest other point (sort(n)[2] skips the zero self-distance)
  mutate(closest_uid = apply(sapply(dist, function(i) abs(i - dist)), 2, function(n) id[which(n == sort(n)[2])])) %>%
  data.frame()

tempdist <- temp1 %>% select(dist, id) %>% rename(rivDist = dist)

temp2 <- temp1 %>% left_join(tempdist, by = c('closest_uid' = 'id')) %>%
  mutate(mindist = abs(dist - rivDist))
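The same idea can be sketched in base R. Everything here is illustrative: the rivers data frame is made up because the original data is not shown, and nearest_other is a hypothetical helper. Setting the diagonal of the distance matrix to Inf is what excludes self comparisons.

```r
rivers <- data.frame(            # hypothetical data; the original `data` is not shown
  river = c("A", "A", "A", "B", "B"),
  dist  = c(1, 4, 6, 10, 13),
  id    = 1:5
)

nearest_other <- function(dist, id) {
  d <- abs(outer(dist, dist, "-"))   # pairwise distances within one group
  diag(d) <- Inf                     # exclude self comparisons
  j <- apply(d, 1, which.min)        # column of the nearest other point, per row
  data.frame(closest_uid = id[j], mindist = d[cbind(seq_along(dist), j)])
}

# Apply the helper to each river separately and stack the results
result <- do.call(rbind, lapply(split(rivers, rivers$river), function(g) {
  cbind(g, nearest_other(g$dist, g$id))
}))
print(result)
```

A group with a single point has no other point to match, so this sketch would pair it with itself at distance Inf; such groups need separate handling.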

How to find the closest value and return the value of the other column?

Or following what you tried already:

dfdf$b[which.min(abs(index - dfdf$a))]
# [1] 300

As a side note (not sure what your outcome should be if there are two matches):

dfdf<-data.frame(a= c(80,90,105,105,120),
b= c(500,400,300,200,100))
index= 105

dfdf$b[which.min(abs(index - dfdf$a))]
# [1] 300

dfdf[findInterval(index, dfdf$a),"b"]
# [1] 200

One more fun example:

dfdf<-data.frame(a= c(80,90,100,105,120),
b= c(500,400,300,200,100))
index= 95

dfdf$b[which.min(abs(index - dfdf$a))]
# [1] 400

dfdf[findInterval(index, dfdf$a),"b"]
# [1] 400
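The two approaches answer slightly different questions, which the sketch below illustrates with index values chosen for the purpose (they are not from the original). which.min picks the closest value in either direction, while findInterval, which assumes a is sorted ascending, picks the last value that does not exceed index.

```r
dfdf <- data.frame(a = c(80, 90, 100, 105, 120),
                   b = c(500, 400, 300, 200, 100))

index <- 99
by_distance <- dfdf$b[which.min(abs(index - dfdf$a))]   # 300: 100 is closest to 99
by_interval <- dfdf[findInterval(index, dfdf$a), "b"]   # 400: 90 is the last value <= 99

index <- 70
below_range <- dfdf[findInterval(index, dfdf$a), "b"]   # numeric(0): no value of a is <= 70
```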

