Finding Which Element of a Vector Is Between Two Values in R

You are looking for &, not &&:

x = c( .2, .4, 2.1, 5.3, 6.7, 10.5)
y = c( 1, 7)
x = x[ x >= y[1] & x <= y[2]]
x
# [1] 2.1 5.3 6.7

To explain, here is the relevant text from ?'&':

& and && indicate logical AND and | and || indicate logical OR. 
The shorter form performs elementwise comparisons in much the same way as arithmetic operators.
The longer form evaluates left to right examining only the first element of each vector.
Evaluation proceeds only until the result is determined.

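So when you used &&, it looked only at the first element of your x, got FALSE, and stopped there. (Since R 4.3.0, using && with an operand of length greater than one is an error rather than a silent truncation.)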
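As a minimal sketch of the difference, reusing the x from above (the names lo, hi, in_range, positions and values are only for illustration), the element-wise & yields one logical per element. That logical vector can be passed to which() to get positions, or used directly as an index:

```r
x <- c(0.2, 0.4, 2.1, 5.3, 6.7, 10.5)
lo <- 1   # illustrative lower bound
hi <- 7   # illustrative upper bound

# Element-wise AND: one TRUE/FALSE per element of x
in_range <- x >= lo & x <= hi

positions <- which(in_range)  # indices of the elements inside [lo, hi]
values <- x[in_range]         # the elements themselves
```

Here positions holds the indices 3, 4 and 5, and values holds 2.1, 5.3 and 6.7.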

How to find common elements from multiple vectors?

There might be a cleverer way to go about this, but

intersect(intersect(a,b),c)

will do the job.

More cleverly, and more conveniently if you have many vectors:

Reduce(intersect, list(a,b,c))
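A small sketch with made-up vectors may help show that both forms agree. The values below are hypothetical, and the third vector is called d rather than c only to avoid masking the c() function:

```r
# Hypothetical input vectors
a <- c(1, 3, 5, 7, 9)
b <- c(3, 6, 8, 9, 10)
d <- c(2, 3, 4, 5, 7, 9)

# Nested calls: fine for three vectors, awkward for many
nested <- intersect(intersect(a, b), d)

# Reduce folds intersect over the list, two vectors at a time
reduced <- Reduce(intersect, list(a, b, d))
```

Both nested and reduced contain 3 and 9, the only values present in all three vectors.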

Find the difference between all values of two vectors

sapply(a, "-", b)
#       [,1] [,2] [,3] [,4] [,5]
#  [1,]    0    1    2    3    4
#  [2,]   -1    0    1    2    3
#  [3,]   -2   -1    0    1    2
#  [4,]   -3   -2   -1    0    1
#  [5,]   -4   -3   -2   -1    0
#  [6,]   -5   -4   -3   -2   -1
#  [7,]   -6   -5   -4   -3   -2
#  [8,]   -7   -6   -5   -4   -3
#  [9,]   -8   -7   -6   -5   -4
# [10,]   -9   -8   -7   -6   -5

Explanation

This takes advantage of the fact that in R a scalar minus a vector is an element-wise subtraction between the scalar and each element of the vector, so we can apply the minus operator - to each value in a against the whole vector b. Each column of the result holds one element of a minus every element of b.
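To make the layout of the result concrete, here is a short sketch with small made-up vectors (the values are hypothetical). It also compares the result against outer(), which is one alternative way to build the same matrix:

```r
# Hypothetical inputs
a <- c(10, 20, 30)
b <- c(1, 2)

# One column per element of a, one row per element of b:
# entry [i, j] is a[j] - b[i]
diffs <- sapply(a, "-", b)

# outer(a, b, "-") has entry [i, j] = a[i] - b[j], so transpose it
diffs_outer <- t(outer(a, b, "-"))
```

With these inputs diffs is a 2 x 3 matrix whose first column is 9, 8 (that is, 10 - 1 and 10 - 2), and diffs_outer holds the same values.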

Finding elements that do not overlap between two vectors

Yes, there is a way:

setdiff(list.a, list.b)
# [1] "Mary" "Jack" "Michelle"
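Note that setdiff() is not symmetric: it returns the elements of its first argument that are missing from the second. If "do not overlap" is meant in both directions, one possible sketch is to combine the two one-sided differences. The name vectors below are invented for illustration:

```r
# Hypothetical name vectors
list_a <- c("Ann", "Bob", "Cara")
list_b <- c("Bob", "Dan")

only_in_a <- setdiff(list_a, list_b)  # in list_a but not in list_b
only_in_b <- setdiff(list_b, list_a)  # in list_b but not in list_a

# Elements that appear in exactly one of the two vectors
non_overlapping <- union(only_in_a, only_in_b)
```

Here only_in_a is "Ann" "Cara", only_in_b is "Dan", and non_overlapping contains all three.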

Check if column value is in between (range) of two other column values

We can loop over each x$number using sapply, check whether it lies within the range of any pair of y$number1 and y$number2, and assign the value accordingly.

x$found <- ifelse(sapply(x$number, function(p)
  any(y$number1 <= p & y$number2 >= p)), "YES", NA)
x

#   id number found
# 1  1   5225   YES
# 2  2   2222  <NA>
# 3  3   3121   YES
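For a self-contained version, the sketch below rebuilds x from the output shown above and pairs it with a made-up y, since the original y is not shown here. The ranges in y are assumptions chosen so that the first and third numbers fall inside a range:

```r
# x as it appears in the printed output above
x <- data.frame(id = 1:3, number = c(5225, 2222, 3121))

# Hypothetical ranges; the real y is not shown in this answer
y <- data.frame(id = c(1, 3),
                number1 = c(5000, 3000),
                number2 = c(6000, 3200))

# TRUE where the number falls inside at least one [number1, number2] range
in_any_range <- sapply(x$number, function(p) {
  any(y$number1 <= p & y$number2 >= p)
})

x$found <- ifelse(in_any_range, "YES", NA)
```

With these assumed ranges, x$found is "YES", NA, "YES", matching the output above.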

Using the same logic but with replace:

x$found <- replace(x$found,
  sapply(x$number, function(p) any(y$number1 <= p & y$number2 >= p)), "YES")

If we also want to compare the id value, we could do

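x$found <- ifelse(sapply(seq_along(x$number), function(i) {
  # which rows of y contain x$number[i] in their range
  inds <- y$number1 <= x$number[i] & y$number2 >= x$number[i]
  # scalar test, so use && and compare against the first matching row's id
  any(inds) && (x$id[i] == y$id[which.max(inds)])
}), "YES", NA)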

x$found
#[1] "YES" NA "YES"

apply() to find closest value in 2 vectors

With sapply, one option is

sapply(v1, function(x) which.min(abs(v2 - x)))
#[1] 4 7 3 9 2 10 4 9 5 1

Or with outer

max.col(-abs(outer(v1, v2, `-`)), 'first')
#[1] 4 7 3 9 2 10 4 9 5 1

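Or using findInterval. Note that this locates each v2 value within the sorted v1 rather than computing a nearest index, so its result is not guaranteed to match the two approaches above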

i1 <- order(v1)
findInterval(v2, v1[i1])[i1]
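Since v1 and v2 are not shown here, the following sketch uses small made-up vectors to check that the sapply and outer approaches agree. For each element of v1 it finds the index of the closest element of v2:

```r
# Hypothetical inputs
v1 <- c(2.2, 7.9, 5.1)
v2 <- c(1, 5, 8, 3)

# For each v1 value, index of the v2 value with the smallest absolute distance
idx_sapply <- sapply(v1, function(x) which.min(abs(v2 - x)))

# Same idea in one step: rows are v1, columns are v2; the largest negated
# distance in each row marks the closest v2 value
idx_outer <- max.col(-abs(outer(v1, v2, `-`)), "first")

closest <- v2[idx_sapply]  # the closest values themselves
```

Both index vectors are 4, 3, 2 here, so closest is 3, 8, 5.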

R: compare the next two values in a vector with each other (without looping if possible)

Based on the edited version of the question, it's now clear that you need some sort of a looping function, because your decisions on previous indices affect your decisions on subsequent indices. The most efficient way I can think to do this would be to populate a logical vector indicating whether each index should be kept in the vector. Afterward you can use the logical vector to get both the remaining values and the indices that were removed.

x <- c(10,  7,  7, 10,  7, 10,  7, 10, 10,  7, 10, 10,  7,  7, 10, 10, 7, 10,  7,  7, 10,  7, 10)
keep <- rep(TRUE, length(x))
even <- TRUE
for (pos in 2:length(x)) {
  # && is the scalar form appropriate inside if()
  if (even && x[pos] == x[pos-1]) {
    keep[pos-1] <- FALSE
  } else {
    even <- !even
  }
}
x[keep]
# [1] 10 7 7 10 7 10 7 10 10 7 10 7 7 10 10 7 10 7 7 10 7 10
which(!keep)
# [1] 11

As with any looping function, Rcpp can be used to get a speedup:

library(Rcpp)
cppFunction(
"LogicalVector getBin(NumericVector x) {
  const int n = x.size();
  LogicalVector keep(n, true);
  bool even = true;
  for (int pos=1; pos < n; ++pos) {
    if (even && x[pos] == x[pos-1]) {
      keep[pos-1] = false;
    } else {
      even = !even;
    }
  }
  return keep;
}")

Benchmarking of the pure-R and Rcpp approaches:

# Slightly larger dataset
set.seed(144)
x <- sample(1:10, 1000, replace = TRUE)

# Functions to compare
pureR <- function(x) {
  keep <- rep(TRUE, length(x))
  even <- TRUE
  for (pos in 2:length(x)) {
    if (even && x[pos] == x[pos-1]) {
      keep[pos-1] <- FALSE
    } else {
      even <- !even
    }
  }
  list(x[keep], which(!keep))
}
with.Rcpp <- function(x) {
  keep <- getBin(x)
  list(x[keep], which(!keep))
}
all.equal(pureR(x), with.Rcpp(x))
# [1] TRUE
library(microbenchmark)
microbenchmark(pureR(x), with.Rcpp(x))
# Unit: microseconds
#          expr     min       lq       mean   median       uq       max neval
#      pureR(x) 855.318 1066.177 1806.67855 1140.656 1442.869 35379.369   100
#  with.Rcpp(x)  30.137   62.304   86.80656   78.132   94.771   348.598   100

With a vector of length 1000 we see a speedup of more than 10x from using Rcpp (median of roughly 1141 versus 78 microseconds). In absolute terms both versions are fast at this size, so the speedup only becomes relevant for much larger vectors.

How to tell what is in one vector and not another?

You can use the setdiff() (set difference) function:

> setdiff(x, y)
[1] 1
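One thing to be aware of is that setdiff() treats its inputs as sets, so repeated values are collapsed. If duplicates should be preserved, a possible alternative is to index with %in%. The vectors below are made up for illustration:

```r
# Hypothetical inputs; note the repeated 1 in x
x <- c(1, 1, 2, 3)
y <- c(2, 3, 4)

as_set <- setdiff(x, y)     # unique values of x that are not in y
keep_dups <- x[!x %in% y]   # every element of x that is not in y
```

Here as_set is a single 1, while keep_dups keeps both occurrences of 1.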

