Fastest way to find nearest value in vector
library(data.table)
a=data.table(Value=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
a[,merge:=Value]
b=data.table(Value=c(4,6,10,16))
b[,merge:=Value]
setkeyv(a,c('merge'))
setkeyv(b,c('merge'))
Merge_a_b=a[b,roll='nearest']
When you join two keyed data.tables with a[b, roll='nearest'], each row of b is matched to the row of a whose key is nearest to it. The result has one row per row of b (the table inside the brackets). As with any join, both tables need a common key.
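To make the lookup concrete, here is a minimal base R sketch of what a nearest match does for each key in b. It assumes the reference vector is sorted ascending (as a keyed column is). nearest_value is a hypothetical helper, and breaking ties toward the lower neighbour is a choice made here, not a statement about how data.table resolves ties.

```r
# Hypothetical helper: nearest element of a sorted vector `ref` for each x.
nearest_value <- function(x, ref) {
  lo <- findInterval(x, ref)             # index of the last ref <= x (0 if none)
  lo_idx <- pmax(lo, 1L)                 # clamp to the first element
  hi_idx <- pmin(lo + 1L, length(ref))   # clamp to the last element
  # keep whichever neighbour is closer; ties go to the lower neighbour
  pick <- ifelse(abs(x - ref[lo_idx]) <= abs(ref[hi_idx] - x), lo_idx, hi_idx)
  ref[pick]
}

ref <- 1:15                # same values as a$Value above
x <- c(4, 6, 10, 16)       # same values as b$Value above
nearest_value(x, ref)      # 4 6 10 15
```

Because findInterval uses a binary search, this avoids comparing every query against every reference value.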
R: find nearest index
You can just put your code in a sapply. I think this has the same speed as a for loop so isn't technically vectorized though:
sapply(b,function(x)which.min(abs(x - A)))
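A small worked example of the one-liner above. The values of A and b are made up for illustration, and vapply is used instead of sapply only so the return type is fixed.

```r
A <- c(1.2, 3.4, 5.6, 7.8)   # hypothetical reference vector
b <- c(3, 8, 0)              # hypothetical query values

# for each element of b, the index of the closest element of A
idx <- vapply(b, function(x) which.min(abs(x - A)), integer(1))
idx      # 2 4 1
A[idx]   # 3.4 7.8 1.2
```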
closest value and data frame index of all data frame elements of a list
Here is a dplyr approach. We can generate the list.index and line.number.in.df for each dataframe and then bind_rows them together. Next, slice the rows where C2 contains the closest value for each number in that vector.
library(dplyr)
test <- list(structure(list(C1 = c(0.2, 0.4, 0.5), C2 = c(2, 3.5, 3.7
), C3 = c(0.3, 4, 5)), class = "data.frame", row.names = c(NA,
-3L)), structure(list(C1 = c(0.1, 0.3, 0.6), C2 = c(3.9, 4.3,
8), C3 = c(3, 5.2, 10)), class = "data.frame", row.names = c(NA,
-3L)), structure(list(C1 = c(0.4, 0.55, 0.8), C2 = c(8.9, 10.3,
14), C3 = c(7, 8.4, 11)), class = "data.frame", row.names = c(NA,
-3L)))
vector <- c(3, 14.4, 7, 0)
test %>%
  lapply(tibble::rowid_to_column, "line.number.in.df") %>%
  bind_rows(.id = "list.index") %>%
  slice(vapply(vector, \(x) which.min(abs(x - C2)), integer(1L)))
Output is
  list.index line.number.in.df  C1   C2   C3
1          1                 2 0.4  3.5  4.0
2          3                 3 0.8 14.0 11.0
3          2                 3 0.6  8.0 10.0
4          1                 1 0.2  2.0  0.3
return index from a vector of the value closest to a given element
One way:
# as mnel points out in his answer, the difference is that
# using `which` here gives all indices that tie for the minimum
which(abs(x-0.4) == min(abs(x-0.4)))
where x is your vector.
Alternatively,
# this one returns only the first index, but is SLOW
sort(abs(x-0.4), index.return=TRUE)$ix[1]
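To illustrate the difference between returning all tied indices and returning only the first, here is a short sketch. The data and target are hypothetical and chosen as whole numbers so the tie is exact. With fractional doubles, an exact == comparison can miss ties because of rounding. which.min is base R's direct way to get the first index without sorting.

```r
x <- c(1, 3, 5, 9)   # hypothetical data; 3 and 5 are equally close to 4
target <- 4
d <- abs(x - target)

all_ties <- which(d == min(d))   # every index at the minimum distance
first_only <- which.min(d)       # only the first such index

all_ties     # 2 3
first_only   # 2
```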
Nearest index of a logical vector in R
We can get the indices of all TRUE values and then use findInterval to get, for each value in b, the closest TRUE index at or before it.
inds <- which(df$a)
df$c <- inds[findInterval(df$b, inds)]
df
# a b c
#1 FALSE NA NA
#2 TRUE NA NA
#3 FALSE 3 2
#4 FALSE NA NA
#5 FALSE NA NA
#6 TRUE NA NA
#7 FALSE NA NA
#8 FALSE 8 6
#9 TRUE NA NA
#10 TRUE NA NA
#11 FALSE NA NA
#12 FALSE 12 10
#13 FALSE NA NA
#14 FALSE NA NA
#15 FALSE NA NA
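The data frame itself is not shown above, so the sketch below rebuilds it from the printed output (an inference, not the original definition). It also contrasts findInterval, which looks backwards only, with a hypothetical which.min variant that looks in both directions.

```r
# df reconstructed from the printed output above
df <- data.frame(
  a = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
        FALSE, FALSE, FALSE, FALSE, FALSE),
  b = c(NA, NA, 3, NA, NA, NA, NA, 8, NA, NA, NA, 12, NA, NA, NA)
)

inds <- which(df$a)              # positions of TRUE: 2 6 9 10
vals <- df$b[!is.na(df$b)]       # 3 8 12

# last TRUE position at or before each value
prev_true <- inds[findInterval(vals, inds)]

# nearest TRUE position in either direction (illustrative alternative)
near_true <- vapply(vals, function(v) inds[which.min(abs(inds - v))], integer(1))

prev_true   # 2 6 10
near_true   # 2 9 10
```

Note that findInterval returns 0 for a value smaller than the first TRUE position, and indexing with 0 drops that element, so such values need separate handling.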
Closest subsequent index for a specified value
Find the location of each value (numeric or character)
int = c(1, 1, 0, 5, 2, 0, 0, 2)
value = 0
idx = which(int == value)
## [1] 3 6 7
Expand the index to indicate the nearest value of interest, using an NA after the last value in int.
nearest = rep(NA, length(int))
nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
nearest
## [1] 3 3 3 6 6 6 7 NA
Use simple arithmetic to find the difference between the index of the current value and the index of the nearest value
abs(seq_along(int) - nearest)
## [1] 2 1 0 2 1 0 0 NA
Written as a function
f <- function(x, value) {
    idx = which(x == value)
    nearest = rep(NA, length(x))
    if (length(idx))     # non-NA values only if `value` in `x`
        nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
    abs(seq_along(x) - nearest)
}
We have
> f(int, 0)
[1] 2 1 0 2 1 0 0 NA
> f(int, 1)
[1] 0 0 NA NA NA NA NA NA
> f(int, 2)
[1] 4 3 2 1 0 2 1 0
> f(char, "A")
[1] 0 2 1 0 0
> f(char, "B")
[1] 1 0 NA NA NA
> f(char, "C")
[1] 2 1 0 NA NA
The solution doesn't involve recursion or R-level loops, so it should be fast even for long vectors.
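The definition of char is not shown above. The vector below is inferred from the printed results (an assumption, not the original data), and running the function on it reproduces them.

```r
f <- function(x, value) {
    idx = which(x == value)
    nearest = rep(NA, length(x))
    if (length(idx))     # non-NA values only if `value` in `x`
        nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
    abs(seq_along(x) - nearest)
}

# inferred from the outputs above; the original definition is not shown
char <- c("A", "B", "C", "A", "A")

f(char, "A")   # 0 2 1 0 0
f(char, "B")   # 1 0 NA NA NA
f(char, "C")   # 2 1 0 NA NA
```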
Quickest way to find closest elements in an array in R
R favors vectorized and functional idioms, so you can skip the explicit for loop. Replacing the loop with an apply-family function makes the code shorter, although sapply still loops internally, so any speed gain is usually modest. Since we're returning a 1D vector, we use sapply.
YmatchIndex <- sapply(Xtimes, function(x){which.min(abs(Ytimes - x))})
A benchmark to compare the two:
library(microbenchmark)
library(ggplot2)
# set up data
Xtimes <- c(1,5,8,10,15,19,23,34,45,51,55,57,78,120)
Ytimes <- seq(0,120,length.out = 1000)
# preallocate the result so the for loop has a vector to fill
YmatchIndex <- integer(length(Xtimes))
# time it
mbm <- microbenchmark(
  for_loop = for (i in seq_along(Xtimes)) {
    YmatchIndex[i] = which.min(abs(Ytimes - Xtimes[i]))
  },
  apply = sapply(Xtimes, function(x){which.min(abs(Ytimes - x))}),
  times = 100
)
# plot
autoplot(mbm)
See ?apply for more.
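For a version with no R-level loop at all, one option is to build the full matrix of absolute differences with outer and take the position of the smallest entry in each row. This is a sketch rather than a benchmarked recommendation: it trades memory (a length(Xtimes) by length(Ytimes) matrix) for vectorization, and the variable names loop_idx and vec_idx are introduced here for illustration.

```r
Xtimes <- c(1,5,8,10,15,19,23,34,45,51,55,57,78,120)
Ytimes <- seq(0,120,length.out = 1000)

# sapply version from above, for reference
loop_idx <- sapply(Xtimes, function(x) which.min(abs(Ytimes - x)))

# vectorized version: one row per Xtimes value, one column per Ytimes value
d <- abs(outer(Xtimes, Ytimes, "-"))
# max.col on the negated distances gives the column of the minimum;
# ties.method = "first" matches which.min's tie rule
vec_idx <- max.col(-d, ties.method = "first")

all(loop_idx == vec_idx)   # both approaches agree
```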
In R: find the closest value within group_by excluding self comparisons
I answered this using a question I asked years ago, "Count values less than x and find nearest values to x by multiple groups":
temp1 <- data %>%
  group_by(river) %>%
  mutate(n_ds = match(dist, sort(dist)) - 1) %>%
  # for each row, take the id at the second-smallest distance (the smallest is the row itself)
  mutate(closest_uid = apply(sapply(dist, function(i) abs(i - dist)), 2, function(n) id[which(n == sort(n)[2])])) %>%
  data.frame()
tempdist <- temp1 %>% select(dist, id) %>% rename(rivDist = dist)
temp2 <- temp1 %>% left_join(tempdist, by = c('closest_uid' = 'id')) %>%
  mutate(mindist = abs(dist - rivDist))
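The same idea, excluding self-comparisons within each group, can be sketched in base R by setting the diagonal of the within-group distance matrix to Inf. The data frame below is hypothetical (the original data is not shown), and closest_other is a helper name introduced here. Groups with a single member are not handled in this sketch.

```r
# hypothetical data with the same column names as above
data <- data.frame(
  id    = 1:6,
  river = c("a", "a", "a", "b", "b", "b"),
  dist  = c(1, 4, 5, 10, 12, 20)
)

# id of the closest other row within one group
closest_other <- function(dist, id) {
  d <- abs(outer(dist, dist, "-"))
  diag(d) <- Inf                 # a row can never match itself
  id[apply(d, 1, which.min)]     # first minimum per row
}

data$closest_uid <- NA_integer_
for (rows in split(seq_len(nrow(data)), data$river)) {
  data$closest_uid[rows] <- closest_other(data$dist[rows], data$id[rows])
}
data$closest_uid   # 2 3 2 5 4 5
```

Unlike the sort(n)[2] approach, which.min always returns exactly one id per row even when two neighbours are equally close.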
How to find the closest value and return the value of the other column?
Or following what you tried already:
dfdf$b[which.min(abs(index - dfdf$a))]
# [1] 300
As a side note (not sure what your outcome should be if there are two matches):
dfdf<-data.frame(a= c(80,90,105,105,120),
b= c(500,400,300,200,100))
index= 105
dfdf$b[which.min(abs(index - dfdf$a))]
# [1] 300
dfdf[findInterval(index, dfdf$a),"b"]
# [1] 200
One more fun example:
dfdf<-data.frame(a= c(80,90,100,105,120),
b= c(500,400,300,200,100))
index= 95
dfdf$b[which.min(abs(index - dfdf$a))]
# [1] 400
dfdf[findInterval(index, dfdf$a),"b"]
# [1] 400
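The two approaches agree in the last example only because 95 is exactly halfway between 90 and 100 and the tie happens to fall on the left. A further illustrative case (the index value 99 is chosen here, it is not from the original) shows them diverging: which.min picks the nearest value on either side, while findInterval always picks the last value at or below index and needs a sorted in ascending order.

```r
dfdf <- data.frame(a = c(80, 90, 100, 105, 120),
                   b = c(500, 400, 300, 200, 100))
index <- 99

nearest_b <- dfdf$b[which.min(abs(index - dfdf$a))]   # nearest a is 100
below_b <- dfdf[findInterval(index, dfdf$a), "b"]     # last a <= 99 is 90

nearest_b   # 300
below_b     # 400
```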