R Compare Multiple Values with Vector and Return Vector

R compare multiple values with vector and return vector

Just try:

 A %in% Targets

The %in% operator tells you, for each element of its left-hand argument, whether it equals any element of the right-hand argument, which is exactly what you are looking for.
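
As a minimal sketch of what that returns, assuming made-up values for A and Targets (the question's actual data is not reproduced here):

```r
# Hypothetical inputs, invented for illustration only.
A <- c(1, 5, 9, 12)
Targets <- c(5, 12, 20)

# One logical per element of A: TRUE where that element occurs in Targets.
A %in% Targets
# [1] FALSE  TRUE FALSE  TRUE

# The logical vector can be used to subset A or converted to positions.
A[A %in% Targets]
# [1]  5 12
which(A %in% Targets)
# [1] 2 4
```

The result always has the same length as the left-hand argument, so it can be stored as a new column or used directly as an index.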

R: compare the next two values in a vector with each other (without looping if possible)

Based on the edited version of the question, it's now clear that you need some sort of a looping function, because your decisions on previous indices affect your decisions on subsequent indices. The most efficient way I can think to do this would be to populate a logical vector indicating whether each index should be kept in the vector. Afterward you can use the logical vector to get both the remaining values and the indices that were removed.

x <- c(10,  7,  7, 10,  7, 10,  7, 10, 10,  7, 10, 10,  7,  7, 10, 10, 7, 10,  7,  7, 10,  7, 10)
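keep <- rep(TRUE, length(x))
even <- TRUE
# seq_along(x)[-1] is empty for vectors shorter than 2, unlike 2:length(x)
for (pos in seq_along(x)[-1]) {
  # && is the scalar "and" intended for if() conditions
  if (even && x[pos] == x[pos - 1]) {
    keep[pos - 1] <- FALSE
  } else {
    even <- !even
  }
}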
x[keep]
# [1] 10 7 7 10 7 10 7 10 10 7 10 7 7 10 10 7 10 7 7 10 7 10
which(!keep)
# [1] 11

As with any looping function, Rcpp can be used to get a speedup:

library(Rcpp)
cppFunction(
"LogicalVector getBin(NumericVector x) {
  const int n = x.size();
  LogicalVector keep(n, true);
  bool even = true;
  for (int pos = 1; pos < n; ++pos) {
    if (even && x[pos] == x[pos-1]) {
      keep[pos-1] = false;
    } else {
      even = !even;
    }
  }
  return keep;
}")

Benchmarking of the pure-R and Rcpp approaches:

# Slightly larger dataset
set.seed(144)
x <- sample(1:10, 1000, replace = TRUE)

# Functions to compare
pureR <- function(x) {
  keep <- rep(TRUE, length(x))
  even <- TRUE
  for (pos in 2:length(x)) {
    if (even & x[pos] == x[pos-1]) {
      keep[pos-1] <- FALSE
    } else {
      even <- !even
    }
  }
  list(x[keep], which(!keep))
}
with.Rcpp <- function(x) {
  keep <- getBin(x)
  list(x[keep], which(!keep))
}
all.equal(pureR(x), with.Rcpp(x))
# [1] TRUE
library(microbenchmark)
microbenchmark(pureR(x), with.Rcpp(x))
# Unit: microseconds
#          expr     min       lq       mean   median       uq       max neval
#      pureR(x) 855.318 1066.177 1806.67855 1140.656 1442.869 35379.369   100
#  with.Rcpp(x)  30.137   62.304   86.80656   78.132   94.771   348.598   100

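With a vector of length 1000, Rcpp gives a speedup of more than 10x (a median of about 78 microseconds versus about 1141). In absolute terms both versions finish in roughly a millisecond or less here, so the speedup only becomes relevant for much larger vectors.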

Compare a list against multiple vectors, break 'loop', and populate new column

You've got a few things to unlearn from your SAS days, but first here's a solution:

non_fall2_flag$abuse <- apply(non_fall2_flag[diag_codes], 1,
  function(x) if ('99559' %in% x) {"other abuse"} else
    if ('99550' %in% x) {"unspec."} else {""})

There are two things to unlearn. The first is that R does not have an implicit row-oriented looping mechanism like the one you are familiar with from data steps. The second is that ifelse is designed to return a vector, so you should not use <- inside its consequent and alternative expressions. Instead you provide two vectors and ifelse does the choosing; any assignment belongs outside the ifelse call. If you had been testing a single column rather than several columns at once, you could have used ifelse.
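
For the single-column case, a sketch of that ifelse pattern might look like the following. The data frame and the column name dx1 are invented for illustration; only the codes and labels come from the example above.

```r
# Hypothetical single-column data; 'dx1' is an invented column name.
df <- data.frame(dx1 = c("99559", "99550", "4019"), stringsAsFactors = FALSE)

# ifelse() takes a logical vector plus two value vectors and returns a vector.
# The assignment happens once, outside the call.
df$abuse <- ifelse(df$dx1 == "99559", "other abuse",
                   ifelse(df$dx1 == "99550", "unspec.", ""))

df$abuse
# [1] "other abuse" "unspec."     ""
```

Nesting a second ifelse in the "no" slot plays the role of an else-if chain, evaluated for every row at once.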

My code used %in% to apply the membership test to an entire row at a time. When apply is called with a second argument of 1, each row in turn is passed to the function given as the third argument. Another approach to processing several columns at once might have been mapply, but then you would have needed to extract the columns separately, and that would have made the code a lot bulkier.

I modified your data sample so that at least two of the lines would match your test, and this then succeeded:

non_fall2_flag$broad <- apply(non_fall2_flag[, diag_codes], 1,
  function(x)
    if (any('9251' == substr(x, 1, 4))) {1} else
      if (any('95901' == substr(x, 1, 5))) {1} else {0})
non_fall2_flag

Note that the any function will collapse a set of logical tests down to a single value, whereas your code would have only tested the first value of the vector returned by substr.
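
To see why any matters, here is a small sketch with an invented row of codes. In R 4.2.0 and later, passing a condition of length greater than one to if is an error; older versions used only the first element and issued a warning.

```r
# Hypothetical row of diagnosis codes, invented for illustration.
codes <- c("4019", "92511", "25000")

# The comparison is vectorised: one logical per code.
hits <- substr(codes, 1, 4) == "9251"
hits
# [1] FALSE  TRUE FALSE

# any() reduces it to the single TRUE/FALSE that if() needs.
any(hits)
# [1] TRUE

# Looking only at the first element would miss the match in position 2.
hits[1]
# [1] FALSE
```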

Comparing multiple vectors

Start by putting all of your vectors in a list, which will make them easier to work with. I imagine you then just want to know if each element of each vector appears in any of the other vectors. You can do that with a simple leave-one-out comparison of each vector to all the other vectors in the list:

x <- list(a, b, c)
lapply(seq_along(x), function(n) x[[n]] %in% unlist(x[-n]))
# [[1]]
# [1] FALSE FALSE FALSE
#
# [[2]]
# [1] FALSE FALSE FALSE
#
# [[3]]
# [1] FALSE FALSE FALSE

In the above structure, each vector is compared against all other values in all other vectors (combined). So the first list element is a three-element vector indicating whether each element of a is found anywhere in b or c, and so forth.
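
The output above is all FALSE because the original vectors share no values. A sketch with made-up vectors that do overlap shows the pattern more clearly (the values, and the names added to the list, are assumptions for illustration):

```r
# Hypothetical vectors, chosen so that some values overlap.
x <- list(a = c(1, 2, 3), b = c(3, 4, 5), c = c(6, 7, 1))

# Leave-one-out: is each element of x[[n]] found in any of the other vectors?
res <- lapply(seq_along(x), function(n) x[[n]] %in% unlist(x[-n]))
names(res) <- names(x)

res$a  # 1 occurs in c, 3 occurs in b
# [1]  TRUE FALSE  TRUE
res$b  # only 3 occurs elsewhere (in a)
# [1]  TRUE FALSE FALSE
res$c  # only 1 occurs elsewhere (in a)
# [1] FALSE FALSE  TRUE
```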

If you need to do every pairwise comparison of vectors, you can do:

apply(combn(seq_along(x), 2), 2, function(n) x[[n[1]]] %in% x[[n[2]]])
#       [,1]  [,2]  [,3]
# [1,] FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE
# [3,] FALSE FALSE FALSE

In this structure, each column relates to a comparison of the vectors given by combn(seq_along(x), 2):

     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    2    3    3

So the first column indicates whether each element of a is found in b, the second column indicates whether each element of a is found in c, etc.
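
One possible refinement, sketched here with invented vectors, is to label the columns with the pair they describe so the matrix can be read without consulting the combn output. Note that each comparison is one-directional ("a in b", not "b in a"), and that apply only returns a matrix when all vectors have the same length; otherwise it returns a list.

```r
# Hypothetical vectors of equal length, invented for illustration.
x <- list(a = c(1, 2, 3), b = c(3, 4, 5), c = c(6, 7, 1))

# Each column of idx is one pair of list positions.
idx <- combn(seq_along(x), 2)

# Test the first vector of each pair against the second.
pairs <- apply(idx, 2, function(n) x[[n[1]]] %in% x[[n[2]]])

# Label each column with the comparison it holds.
colnames(pairs) <- paste(names(x)[idx[1, ]], "in", names(x)[idx[2, ]])

pairs[, "a in b"]
# [1] FALSE FALSE  TRUE
pairs[, "a in c"]
# [1]  TRUE FALSE FALSE
```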

Comparing one value across multiple rows in one data frame with values across multiple rows in a second data frame

Here's an answer with dplyr:

library(dplyr)

df1 <- tribble(
  ~CHR, ~POS,
  1, 2000,
  1, 3000,
  2, 1500,
  3, 3000
)

df2 <- tribble(
  ~CHR, ~POS_START, ~POS_END,
  1, 1500, 2500,
  1, 3200, 4000,
  2, 1200, 1600,
  2, 2000, 2200,
  3, 5000, 5500,
  4, 1000, 1200
)

df1 %>%
  left_join(df2, by = 'CHR') %>%
  mutate(IN_RANGE = POS >= POS_START & POS <= POS_END) %>%
  group_by(CHR, POS) %>%
  summarize(IN_RANGE = sum(IN_RANGE) > 0)
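
For comparison, the same check can be sketched in base R without a join. This is an alternative written for illustration, not part of the original answer. It rebuilds the same example data as plain data frames and tests each position against the intervals on its chromosome. One difference worth noting: a chromosome with no rows in df2 yields FALSE here, whereas the join-based version would produce NA for it.

```r
# Same example data as above, as plain data frames.
df1 <- data.frame(CHR = c(1, 1, 2, 3), POS = c(2000, 3000, 1500, 3000))
df2 <- data.frame(CHR       = c(1, 1, 2, 2, 3, 4),
                  POS_START = c(1500, 3200, 1200, 2000, 5000, 1000),
                  POS_END   = c(2500, 4000, 1600, 2200, 5500, 1200))

# For each (CHR, POS) pair in df1, check whether any df2 interval on the
# same chromosome contains the position.
df1$IN_RANGE <- mapply(function(chr, pos) {
  any(df2$CHR == chr & df2$POS_START <= pos & pos <= df2$POS_END)
}, df1$CHR, df1$POS)

df1$IN_RANGE
# [1]  TRUE FALSE  TRUE FALSE
```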

Using identical() in R with multiple vectors

I would just pick one, say A, and do all pair-wise comparisons with it.

all(sapply(list(B, C, D, E), FUN = identical, A))
# [1] FALSE

Remove the all() to see which one(s) are not identical:

sapply(list(B, C, D, E), FUN = identical, A)
# [1] TRUE TRUE TRUE FALSE

identical ought to be transitive, so if A is identical to C and to D, then C should be identical to D.
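
A self-contained sketch of the same idea, with vectors invented for illustration (the list is named here so the odd one out can be reported by name):

```r
# Hypothetical vectors; E differs from the rest in its last element.
A <- c(1, 2, 3)
B <- c(1, 2, 3)
C <- c(1, 2, 3)
D <- c(1, 2, 3)
E <- c(1, 2, 4)

# Compare every other vector against A; names carry through to the result.
same_as_A <- sapply(list(B = B, C = C, D = D, E = E), identical, A)

all(same_as_A)
# [1] FALSE
names(same_as_A)[!same_as_A]
# [1] "E"

# identical() is strict about type: integer and double vectors differ.
identical(c(1L, 2L, 3L), c(1, 2, 3))
# [1] FALSE
```

The last line is a common surprise: values that print the same are not identical if their storage types differ.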

(Thanks to @docendo discimus for simplified syntax.)

How to find common elements from multiple vectors?

There might be a cleverer way to go about this, but

intersect(intersect(a,b),c)

will do the job.

EDIT: More cleverly, and more conveniently if you have a lot of arguments:

Reduce(intersect, list(a,b,c))
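
A quick sketch with invented vectors, showing that the Reduce form gives the same result as the nested calls:

```r
# Hypothetical vectors, invented for illustration.
vecs <- list(a = c(1, 3, 5, 7, 9),
             b = c(3, 4, 5, 6, 7),
             c = c(5, 7, 11))

# Reduce applies intersect pairwise, left to right:
# intersect(intersect(a, b), c)
common <- Reduce(intersect, vecs)
common
# [1] 5 7

# Same result as writing the nesting out by hand.
identical(common, intersect(intersect(vecs$a, vecs$b), vecs$c))
# [1] TRUE
```

Keeping the vectors in a list means adding a fourth or fifth vector requires no change to the Reduce call.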

