R: Find Nearest Index

Fastest way to find nearest value in vector

library(data.table)

a=data.table(Value=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))

a[,merge:=Value]

b=data.table(Value=c(4,6,10,16))

b[,merge:=Value]

setkeyv(a,c('merge'))

setkeyv(b,c('merge'))

Merge_a_b=a[b,roll='nearest']

When you join two data tables in data.table, the roll='nearest' option performs a rolling join: each row of b is matched to the row of a whose key value is nearest. The resulting data table has one row per row of b (the table inside the brackets). As with any keyed join, both tables need a common key.

R: find nearest index

You can just put your code in a sapply. I think this has the same speed as a for loop so isn't technically vectorized though:

sapply(b,function(x)which.min(abs(x - A)))
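If A is already sorted, the per-element search can be avoided altogether. The following is only a sketch under that assumption: nearest_sorted is a hypothetical helper name, and the values of A and b are borrowed from the data.table example earlier on this page. It places each value between the midpoints of neighbouring elements with a single findInterval call.

```r
# Hypothetical helper: index of the element of `table` nearest to each `x`.
# Assumes `table` is sorted in increasing order and contains no NAs.
nearest_sorted <- function(x, table) {
  # Midpoints between neighbours mark where the nearest element changes.
  mids <- (head(table, -1) + tail(table, -1)) / 2
  # findInterval() counts the midpoints <= x; adding 1 gives the index.
  findInterval(x, mids) + 1L
}

A <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
b <- c(4, 6, 10, 16)

vectorised <- nearest_sorted(b, A)
looped <- sapply(b, function(x) which.min(abs(x - A)))
identical(vectorised, looped)
```

One difference to be aware of: when a value sits exactly on a midpoint, this sketch picks the larger neighbour, whereas which.min picks the first (smaller) one.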

closest value and data frame index of all data frame elements of a list

Here is a dplyr approach. We can generate the list.index and line.number.in.df for each dataframe and then bind_rows them together. Next, slice the rows where C2 contains the closest value for each number in that vector.

library(dplyr)

test <- list(structure(list(C1 = c(0.2, 0.4, 0.5), C2 = c(2, 3.5, 3.7
), C3 = c(0.3, 4, 5)), class = "data.frame", row.names = c(NA, 
-3L)), structure(list(C1 = c(0.1, 0.3, 0.6), C2 = c(3.9, 4.3, 
8), C3 = c(3, 5.2, 10)), class = "data.frame", row.names = c(NA, 
-3L)), structure(list(C1 = c(0.4, 0.55, 0.8), C2 = c(8.9, 10.3, 
14), C3 = c(7, 8.4, 11)), class = "data.frame", row.names = c(NA, 
-3L)))

vector <- c(3, 14.4, 7, 0)

test %>% 
  lapply(tibble::rowid_to_column, "line.number.in.df") %>% 
  bind_rows(.id = "list.index") %>% 
  slice(vapply(vector, \(x) which.min(abs(x - C2)), integer(1L)))

Output is

  list.index line.number.in.df  C1   C2   C3
1          1                 2 0.4  3.5  4.0
2          3                 3 0.8 14.0 11.0
3          2                 3 0.6  8.0 10.0
4          1                 1 0.2  2.0  0.3
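For comparison, the same bookkeeping can be sketched in base R. This is not part of the original answer; the data frames are trimmed to the C2 column for brevity, and the variable names (n_rows, all_c2, hit, result) are made up for the illustration.

```r
# Same data as above, trimmed to the C2 column for brevity.
test <- list(
  data.frame(C2 = c(2, 3.5, 3.7)),
  data.frame(C2 = c(3.9, 4.3, 8)),
  data.frame(C2 = c(8.9, 10.3, 14))
)
vector <- c(3, 14.4, 7, 0)

n_rows <- vapply(test, nrow, integer(1L))          # rows per data frame
all_c2 <- unlist(lapply(test, `[[`, "C2"))         # every C2 value, stacked
list_index <- rep(seq_along(test), times = n_rows) # which data frame each value came from
line_number <- sequence(n_rows)                    # row number within that data frame

# Position of the closest stacked C2 value for each element of `vector`.
hit <- vapply(vector, function(x) which.min(abs(x - all_c2)), integer(1L))

result <- data.frame(
  list.index = list_index[hit],
  line.number.in.df = line_number[hit],
  C2 = all_c2[hit]
)
result
```

Here list.index comes out as an integer, whereas bind_rows(.id = ...) stores it as a character column.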

return index from a vector of the value closest to a given element

one way:

# as mnel points out in his answer, the difference is that
# using `which` here gives all indices that tie for the minimum
which(abs(x-0.4) == min(abs(x-0.4)))

where x is your vector.

Alternately,

# this one returns the first index, but is SLOW
sort(abs(x-0.4), index.return=TRUE)$ix[1]
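To make the tie behaviour concrete, here is a small sketch with made-up integer-valued data (not from the question), chosen so that two elements are exactly equally close to the target. The tolerance in the last line is an arbitrary assumption.

```r
x <- c(1, 3, 5, 9)   # illustrative data, not from the question
target <- 4
d <- abs(x - target) # distances 3 1 1 5: elements 2 and 3 tie

all_ties <- which(d == min(d))  # every index at the minimum distance
first_only <- which.min(d)      # only the first such index

# With non-integer doubles, exact equality can miss ties because of rounding,
# so a tolerance may be safer (1e-8 is an arbitrary choice).
near_ties <- which(d - min(d) < 1e-8)
```

Which variant is appropriate depends on whether you need every tied index or just one.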

Nearest index of a logical vector in R

We can get the index of all TRUE values and then use findInterval to get, for each value in b, the closest TRUE index at or before it.

inds <- which(df$a)
df$c <- inds[findInterval(df$b, inds)]
df

#       a  b  c
#1  FALSE NA NA
#2   TRUE NA NA
#3  FALSE  3  2
#4  FALSE NA NA
#5  FALSE NA NA
#6   TRUE NA NA
#7  FALSE NA NA
#8  FALSE  8  6
#9   TRUE NA NA
#10  TRUE NA NA
#11 FALSE NA NA
#12 FALSE 12 10
#13 FALSE NA NA
#14 FALSE NA NA
#15 FALSE NA NA
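Note that findInterval looks backwards only: for b = 8 it returns 6 even though the TRUE at row 9 is closer. If the nearest TRUE in either direction is wanted, something like the following sketch could be used. The data frame is reconstructed from the printed output above, which is an assumption about the original df, and the names preceding and either are made up.

```r
# Data reconstructed from the printed output (an assumption about the original df).
df <- data.frame(a = rep(FALSE, 15), b = NA_integer_)
df$a[c(2, 6, 9, 10)] <- TRUE
df$b[c(3, 8, 12)] <- c(3L, 8L, 12L)

inds <- which(df$a)

# Last TRUE at or before b, as in the answer.
preceding <- inds[findInterval(df$b, inds)]

# Nearest TRUE in either direction; ties go to the earlier index.
either <- vapply(df$b, function(p) {
  if (is.na(p)) NA_integer_ else inds[which.min(abs(inds - p))]
}, integer(1L))

cbind(b = df$b, preceding, either)[!is.na(df$b), ]
```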

Closest subsequent index for a specified value

Find the location of each value (numeric or character)

int = c(1, 1, 0, 5, 2, 0, 0, 2)
value = 0
idx = which(int == value)
## [1] 3 6 7

Expand the index so that each position holds the index of the nearest subsequent value of interest, with NA for positions after the last occurrence of value in int.

nearest = rep(NA, length(int))
nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
## [1]  3  3  3  6  6  6  7 NA

Use simple arithmetic to find the difference between the index of the current value and the index of the nearest value

abs(seq_along(int) - nearest)
## [1]  2  1  0  2  1  0  0 NA

Written as a function

f <- function(x, value) {
    idx = which(x == value)
    nearest = rep(NA, length(x))
    if (length(idx)) # non-NA values only if `value` in `x`
        nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
    abs(seq_along(x) - nearest)
}

With char = c("A", "B", "C", "A", "A") (reconstructed from the outputs below), we have

> f(int, 0)
[1]  2  1  0  2  1  0  0 NA
> f(int, 1)
[1]  0  0 NA NA NA NA NA NA
> f(int, 2)
[1] 4 3 2 1 0 2 1 0
> f(char, "A")
[1] 0 2 1 0 0
> f(char, "B")
[1]  1  0 NA NA NA
> f(char, "C")
[1]  2  1  0 NA NA

The solution doesn't involve recursion or R-level loops, so it should be fast even for long vectors.
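If the closest preceding occurrence is needed instead, one possible variation (not part of the original answer) is to run the same function on the reversed vector and reverse the result. f is copied from above so the block runs on its own; f_prev is a hypothetical name.

```r
# f() as defined above: distance to the closest subsequent occurrence of `value`.
f <- function(x, value) {
  idx = which(x == value)
  nearest = rep(NA, length(x))
  if (length(idx)) # non-NA values only if `value` in `x`
    nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
  abs(seq_along(x) - nearest)
}

# Hypothetical mirror image: distance to the closest preceding occurrence.
f_prev <- function(x, value) rev(f(rev(x), value))

int <- c(1, 1, 0, 5, 2, 0, 0, 2)
f_prev(int, 0)
## [1] NA NA  0  1  2  0  0  1
```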

Quickest way to find closest elements in an array in R

R is vectorized, so skip the for loop. This saves time in scripting and computation. Simply replace the for loop with an apply function. Since we're returning a 1D vector, we use sapply.

YmatchIndex <- sapply(Xtimes, function(x){which.min(abs(Ytimes - x))})

A benchmark comparing the two (timings will vary with machine and data size):

library(microbenchmark)
library(ggplot2)

# set up data
Xtimes <- c(1,5,8,10,15,19,23,34,45,51,55,57,78,120)
Ytimes <- seq(0,120,length.out = 1000)
YmatchIndex <- integer(length(Xtimes))  # preallocate the result for the for loop

# time it
mbm <- microbenchmark(
  for_loop = for (i in seq_along(Xtimes)) {
    YmatchIndex[i] = which.min(abs(Ytimes - Xtimes[i]))
  },
  apply    = sapply(Xtimes, function(x){which.min(abs(Ytimes - x))}),
  times = 100
)

# plot
autoplot(mbm)

See ?apply for more.
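Before reading much into any timings, it may be worth confirming that the variants return the same indices. The sketch below does that with the data from the benchmark; the vapply version is an addition that is not in the answer, included because it fixes the return type to one integer per element.

```r
Xtimes <- c(1, 5, 8, 10, 15, 19, 23, 34, 45, 51, 55, 57, 78, 120)
Ytimes <- seq(0, 120, length.out = 1000)

# for loop with a preallocated result vector
loop_result <- integer(length(Xtimes))
for (i in seq_along(Xtimes)) {
  loop_result[i] <- which.min(abs(Ytimes - Xtimes[i]))
}

# sapply, as in the answer
sapply_result <- sapply(Xtimes, function(x) which.min(abs(Ytimes - x)))

# vapply, which declares that each call returns a single integer
vapply_result <- vapply(Xtimes, function(x) which.min(abs(Ytimes - x)), integer(1L))

identical(loop_result, sapply_result) && identical(loop_result, vapply_result)
```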

In R: find the closest value within group_by excluding self comparisons

I answered this using a question I asked years ago, "Count values less than x and find nearest values to x by multiple groups":

temp1 <- data%>%
  group_by(river) %>%
  mutate(n_ds = match(dist,sort(dist))-1) %>%
  mutate(closest_uid=apply(sapply(dist, function(i)abs(i-dist)), 2, function(n) id[which(n==sort(n)[2])])) %>%
  data.frame()

tempdist <- temp1 %>% select(dist, id) %>% rename(rivDist = dist)

temp2 <- temp1 %>% left_join(tempdist, by = c('closest_uid' = 'id')) %>%
  mutate(mindist = abs(dist - rivDist))
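The "second smallest distance" trick above relies on each point's distance to itself being the only zero, which may not hold when two rows share the same dist. A base R sketch of an alternative is to set the diagonal of the distance matrix to Inf so a row can never match itself. The data, the helper name nearest_other and the column values are all made up for illustration.

```r
# Illustrative data, not from the question.
df <- data.frame(
  river = c("A", "A", "A", "B", "B"),
  id = c("a1", "a2", "a3", "b1", "b2"),
  dist = c(1, 4, 9, 2, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each element of d, the index of the nearest OTHER element.
nearest_other <- function(d) {
  if (length(d) < 2L) return(rep(NA_integer_, length(d)))  # nothing else to compare with
  m <- abs(outer(d, d, "-"))  # pairwise absolute differences
  diag(m) <- Inf              # exclude self comparisons
  apply(m, 1, which.min)      # ties go to the first candidate
}

# Apply per river and put the results back in the original row order.
df$closest_uid <- unsplit(
  lapply(split(df, df$river), function(g) g$id[nearest_other(g$dist)]),
  df$river
)
df
```

The pairwise matrix grows quadratically with group size, so this is only reasonable for modest groups.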

How to find the closest value and return the value of the other column?

Or following what you tried already:

dfdf$b[which.min(abs(index - dfdf$a))]
# [1] 300

As a side note (not sure what your outcome should be if there are two matches):

dfdf<-data.frame(a= c(80,90,105,105,120),
                 b= c(500,400,300,200,100))
index= 105

dfdf$b[which.min(abs(index - dfdf$a))]
# [1] 300

dfdf[findInterval(index, dfdf$a),"b"]
# [1] 200

One more fun example:

dfdf<-data.frame(a= c(80,90,100,105,120),
                 b= c(500,400,300,200,100))
index= 95

dfdf$b[which.min(abs(index - dfdf$a))]
# [1] 400

dfdf[findInterval(index, dfdf$a),"b"]
# [1] 400
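The two approaches also behave differently at the lower edge. findInterval assumes a is sorted in non-decreasing order and returns 0 when the lookup value is smaller than every element, so the row lookup comes back empty. A small sketch, using a hypothetical index of 70:

```r
dfdf <- data.frame(a = c(80, 90, 100, 105, 120),
                   b = c(500, 400, 300, 200, 100))
index <- 70  # hypothetical value below every entry of a

pos <- findInterval(index, dfdf$a)                      # 0: no element of a is <= 70
by_interval <- dfdf[pos, "b"]                           # zero-length result
by_distance <- dfdf$b[which.min(abs(index - dfdf$a))]   # 500, the closest row
```

So which.min always returns some row, while the findInterval version needs a guard for values below the first element.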