Return index from a vector of the value closest to a given element
One way:
# As mnel points out in his answer, the difference is that
# using `which` here gives all indices that match (not only the first)
which(abs(x-0.4) == min(abs(x-0.4)))
where x is your vector.
Alternatively,
# this one returns only the first index, but is SLOW because it sorts the whole vector
sort(abs(x - 0.4), index.return = TRUE)$ix[1]
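If only the first closest index is needed, base R's which.min avoids the full sort. The following is a minimal sketch, assuming a made-up vector x and target (both are illustrative, not taken from the question). Whole numbers are used on purpose: with values such as 0.4, exact `==` comparison of differences can miss ties because of floating-point rounding.

```r
# Hypothetical example data; not from the original question
x <- c(1, 5, 3, 9, 5)
target <- 4

d <- abs(x - target)               # distance of every element from the target: 3 1 1 5 1
first_closest <- which.min(d)      # first index attaining the minimum
all_closest <- which(d == min(d))  # every index attaining the minimum

print(first_closest)  # 2
print(all_closest)    # 2 3 5
```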
Closest subsequent index for a specified value
Find the location of each value (numeric or character)
int = c(1, 1, 0, 5, 2, 0, 0, 2)
value = 0
idx = which(int == value)
## [1] 3 6 7
Expand the index to indicate the nearest value of interest, using an NA after the last value in int.
nearest = rep(NA, length(int))
nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
nearest
## [1] 3 3 3 6 6 6 7 NA
Use simple arithmetic to find the difference between the index of the current value and the index of the nearest value
abs(seq_along(int) - nearest)
## [1] 2 1 0 2 1 0 0 NA
Written as a function
f <- function(x, value) {
    idx = which(x == value)
    nearest = rep(NA, length(x))
    if (length(idx)) # non-NA values only if `value` in `x`
        nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
    abs(seq_along(x) - nearest)
}
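With int as above and a character vector char (its definition is not shown here; the outputs below are consistent with char = c("A", "B", "C", "A", "A")), we have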
> f(int, 0)
[1] 2 1 0 2 1 0 0 NA
> f(int, 1)
[1] 0 0 NA NA NA NA NA NA
> f(int, 2)
[1] 4 3 2 1 0 2 1 0
> f(char, "A")
[1] 0 2 1 0 0
> f(char, "B")
[1] 1 0 NA NA NA
> f(char, "C")
[1] 2 1 0 NA NA
The solution doesn't involve recursion or R-level loops, so it should be fast even for long vectors.
Index from one vector to another by closest values
You can use findInterval, which constructs a sequence of intervals given by breakpoints in b and returns the interval indices in which the elements of a are located (see also ?findInterval for additional arguments, such as behavior at interval boundaries).
a = 1:20
b = seq(from = 1, to = 20, by = 5)
findInterval(a, b)
#> [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
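Note that findInterval returns the interval an element falls into, which is the index of the largest breakpoint not exceeding it, not necessarily the closest breakpoint. One possible way to turn that into a nearest-value index is to compare each element with the breakpoint on either side. This is only a sketch: the helper name nearest_index is made up for illustration, it assumes b is sorted in increasing order, and ties go to the lower index.

```r
# Hypothetical helper: index in sorted `b` of the value closest to each element of `a`
nearest_index <- function(a, b) {
  lo <- findInterval(a, b)            # index of largest b <= a (0 if a is below all of b)
  lo_c <- pmax(lo, 1L)                # clamp to a valid lower candidate
  hi_c <- pmin(lo + 1L, length(b))    # clamp to a valid upper candidate
  # keep the lower candidate unless the upper one is strictly closer
  ifelse(abs(a - b[lo_c]) <= abs(b[hi_c] - a), lo_c, hi_c)
}

b <- seq(from = 1, to = 20, by = 5)   # 1 6 11 16
res <- nearest_index(c(0, 2, 4, 9, 25), b)
print(res)  # 1 1 2 3 4
```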
Find index of value in Vector, nearest to the input
Computationally, if the vector is not sorted, you can't expect anything better than O(n), which may or may not meet your expectation. If it doesn't, you should change the data structure. If it does, you could use std::min_element, this way:
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <vector>

int main()
{
    int refElem = 42;
    std::vector<int> v{1, 5, 36, 50};

    // Compare elements by their absolute distance to refElem
    auto i = std::min_element(std::begin(v), std::end(v), [=] (int x, int y)
    {
        return std::abs(x - refElem) < std::abs(y - refElem);
    });

    std::cout << std::distance(std::begin(v), i); // Prints 2
}
If the vector is sorted, on the other hand, you can use std::lower_bound() and std::upper_bound(), which have logarithmic complexity.
If you think complexity is an issue because of performance, do some measurements before deciding to change the data structure. Since vectors store their elements in a contiguous region of memory, a linear search resulting in a high cache hit rate will often outperform a computationally more efficient algorithm on a data structure which allocates its elements here and there in memory, resulting in frequent cache misses.
Choose closest x elements by index in a list/vector
num_closest_by_indices <- function(v, idx, num) {
  # Distance of every position from idx
  i <- abs(seq_along(v) - idx)
  i[idx] <- +Inf # sentinel, so the element at idx itself is never returned
  # Try the base case first, where idx is not within (num/2) of the edge.
  # If there are not enough elements in the base case, incrementally add more
  for (cutoff_idx in seq(floor(num/2), num)) {
    if (sum(i <= cutoff_idx) >= num) {
      # This will add two extra indices every iteration. Strictly, if we have an even length, we should add the leftmost one first and `continue`, to break ties towards the left.
      return(v[i <= cutoff_idx])
    }
  }
}
Here's an illustration of this algorithm: we rank the indices in order of desirability, then pick the lowest num legal ones:
> seq_along(v)
1 2 3 4 5 6 7 8 9
> seq_along(v) - idx
-2 -1 0 1 2 3 4 5 6
> i <- abs(seq_along(v) - idx)
2 1 0 1 2 3 4 5 6
> i[idx] <- +Inf # sentinel to prevent us returning the element itself
2 1 Inf 1 2 3 4 5 6
Now we can just find the num elements with the smallest values (break ties arbitrarily, unless you have a preference (left)).
Our first guess is all indices with distance <= (num/2); this might not be enough if idx is within (num/2) of the start/end.
> i <= 2
TRUE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
> v[i <= 2]
1 2 4 5
So, adapting @dash2's code to handle the corner cases where some indices are illegal (nonpositive, or > length(x)), i.e. ! %in% 1:L. Then min(elems) would be the number of illegal indices which we cannot pick, hence we must pick abs(min(elems)) more.
Notes:
- in the end it is simpler and faster to handle this with three piecewise cases. Aww.
- it actually seems to simplify things if we pick (num+1) indices, then remove idx before returning the answer, using result[-idx] to remove it.
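As an alternative to widening a cutoff, the ranking idea from the illustration can be written directly with order(). This is a different technique from the loop above, offered only as a sketch: the function name closest_by_index is made up, and it relies on order() leaving ties in their original order, so ties break towards the left.

```r
# Hypothetical helper: the `num` elements of `v` whose positions are closest to `idx`,
# excluding the element at `idx` itself
closest_by_index <- function(v, idx, num) {
  d <- abs(seq_along(v) - idx)   # positional distance from idx
  d[idx] <- Inf                  # sentinel: never pick idx itself
  n_pick <- min(num, length(v) - 1)
  keep <- sort(order(d)[seq_len(n_pick)])  # closest positions, restored to vector order
  v[keep]
}

v <- 1:9
middle <- closest_by_index(v, idx = 3, num = 4)
edge <- closest_by_index(v, idx = 1, num = 4)
print(middle)  # 1 2 4 5
print(edge)    # 2 3 4 5
```

Because the ranking already handles positions near the start or end, no separate edge case is needed in this version.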
Find position of closest value to another value given a condition in R
Assuming your points always continue to decrease in value after the first decrease, and value is between the point of the first decrease and the last point, you could do this:
closest <- function(value, vec, next_is){
  lead_fun <- function(x) c(tail(x, -1), NA)
  meets_cond <- get(next_is)(lead_fun(vec), vec)
  which.min(abs(vec[meets_cond] - value)) + which.max(meets_cond) - 1
}
closest(6.2, vec = vector, next_is = '<')
# [1] 13
Check which elements in the vector meet your condition, find the index of the closest element in that vector, then add back the number of elements before the first which meets your condition.
Edit:
Another version of the function which accepts an arbitrary logical vector which is TRUE for indices meeting a condition:
closest <- function(value, vec, cond_vec){
  which.min(abs(vec[cond_vec] - value)) + which.max(cond_vec) - 1
}
Note that this assumes the values matching your condition are all in one contiguous region (not e.g. the first matches, then the third, then the sixth, etc.)
If your condition is that the point comes after the max value:
closest(6.2, vec = vector, cond_vec = seq_along(vector) > which.max(vector))
# [1] 13
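To make the index arithmetic concrete, here is a small self-contained run of the cond_vec version. The data are invented for illustration (the vector from the original question is not shown here), chosen so that the unconditional closest value sits on the rising part and is skipped by the condition.

```r
# Same function as above, repeated so the example runs on its own
closest <- function(value, vec, cond_vec) {
  which.min(abs(vec[cond_vec] - value)) + which.max(cond_vec) - 1
}

# Hypothetical data: rises to a maximum at position 5, then decreases
vec <- c(1, 3, 6, 8, 10, 9, 7, 5, 4, 2)

unconditional <- which.min(abs(vec - 6.2))   # position of 6, on the rising part
after_max <- closest(6.2, vec, cond_vec = seq_along(vec) > which.max(vec))

print(unconditional)  # 3
print(after_max)      # 7 (the value 7, the closest one after the maximum)
```

Here which.max(cond_vec) is 6, the first position meeting the condition, and the closest value within the subset is its second element, giving 2 + 6 - 1 = 7.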
Closest value and data frame index of all data frame elements of a list
Here is a dplyr approach. We can generate the list.index and line.number.in.df for each data frame and then bind_rows them together. Next, slice the rows where C2 contains the closest value for each number in that vector.
library(dplyr)
test <- list(structure(list(C1 = c(0.2, 0.4, 0.5), C2 = c(2, 3.5, 3.7
), C3 = c(0.3, 4, 5)), class = "data.frame", row.names = c(NA,
-3L)), structure(list(C1 = c(0.1, 0.3, 0.6), C2 = c(3.9, 4.3,
8), C3 = c(3, 5.2, 10)), class = "data.frame", row.names = c(NA,
-3L)), structure(list(C1 = c(0.4, 0.55, 0.8), C2 = c(8.9, 10.3,
14), C3 = c(7, 8.4, 11)), class = "data.frame", row.names = c(NA,
-3L)))
vector <- c(3, 14.4, 7, 0)
test %>%
  lapply(tibble::rowid_to_column, "line.number.in.df") %>%
  bind_rows(.id = "list.index") %>%
  slice(vapply(vector, \(x) which.min(abs(x - C2)), integer(1L)))
Output is
  list.index line.number.in.df  C1   C2   C3
1          1                 2 0.4  3.5  4.0
2          3                 3 0.8 14.0 11.0
3          2                 3 0.6  8.0 10.0
4          1                 1 0.2  2.0  0.3
In R: find the closest value within group_by excluding self comparisons
Answered it using a question I asked years ago: "Count values less than x and find nearest values to x by multiple groups".
temp1 <- data %>%
  group_by(river) %>%
  mutate(n_ds = match(dist, sort(dist)) - 1) %>%
  mutate(closest_uid = apply(sapply(dist, function(i) abs(i - dist)), 2,
                             function(n) id[which(n == sort(n)[2])])) %>%
  data.frame()
tempdist <- temp1 %>% select(dist, id) %>% rename(rivDist = dist)
temp2 <- temp1 %>% left_join(tempdist, by = c('closest_uid' = 'id')) %>%
  mutate(mindist = abs(dist - rivDist))
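The core step, finding each element's nearest neighbour while excluding itself, can also be sketched in base R for a single group. This is not the code above but a simplified illustration: the helper name nearest_other and the sample distances are made up, it assumes at least two elements, and on ties it returns only the first match.

```r
# Hypothetical helper: for each element of `d`, the index of the nearest *other* element
nearest_other <- function(d) {
  m <- abs(outer(d, d, "-"))   # pairwise absolute differences
  diag(m) <- Inf               # exclude self comparisons
  apply(m, 1, which.min)       # nearest other element per row
}

dist <- c(1, 4, 5, 10)         # made-up distances within one group
nn <- nearest_other(dist)
print(nn)                      # 2 3 2 3
print(abs(dist - dist[nn]))    # 3 1 1 5
```

Within a grouped pipeline, a helper like this could be applied per group, with id[nn] giving the identifier of the nearest neighbour.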
Match values to nearest, larger value in another list in R
You can take the minimum of the subset of the viable numbers that are equal to or greater than x:
picker <- function(x, viable_numbers) {
  min(viable_numbers[viable_numbers >= x])
}
picker(x = 1, viable_numbers = viable_numbers)
[1] 5
picker(x = 5, viable_numbers = viable_numbers)
[1] 5
picker(x = 6, viable_numbers = viable_numbers)
[1] 10
picker(x = 20, viable_numbers = viable_numbers)
[1] Inf
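When no viable number is large enough, min() of an empty vector returns Inf and emits a warning, as in the last call. If NA is preferred in that case, one option is to check for the empty case first. A minimal sketch follows; the name picker_safe and the values in viable_numbers are assumptions for illustration, since the original vector is not shown here.

```r
# Hypothetical variant of picker: returns NA instead of Inf when nothing is >= x
picker_safe <- function(x, viable_numbers) {
  candidates <- viable_numbers[viable_numbers >= x]
  if (length(candidates) == 0L) NA_real_ else min(candidates)
}

viable_numbers <- c(5, 10, 15)   # assumed values, for illustration only

# Apply to several inputs at once
res <- vapply(c(1, 5, 6, 20), picker_safe, numeric(1),
              viable_numbers = viable_numbers)
print(res)  # 5 5 10 NA
```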