Find Elements Not in Smaller Character Vector List But in Big List

Look at help("%in%") - there's an example all the way at the bottom of that page that addresses this situation.

A <- c("A", "B", "C", "D")
B <- c("A", "B", "C")
(new <- A[which(!A %in% B)])

# [1] "D"

EDIT:

As Tyler points out, I should take my own advice and read the support documents. which() is unnecessary when using %in% for this example. So,

(new <- A[!A %in% B])

# [1] "D"
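
One thing to keep in mind when choosing between the %in% form and setdiff(): logical indexing keeps repeated values and their original order, while setdiff() returns each value once. A small sketch, using made-up vectors (big and small are hypothetical names, not from the question):

```r
# Hypothetical data: the larger vector contains a repeated value.
big   <- c("A", "B", "D", "C", "D", "E")
small <- c("A", "B", "C")

# Logical indexing keeps every occurrence, in the original order.
kept_all <- big[!big %in% small]

# setdiff() treats its inputs as sets, so each missing value appears once.
kept_unique <- setdiff(big, small)

print(kept_all)
print(kept_unique)
```

Which of the two you want depends on whether duplicates in the big vector are meaningful.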

selecting elements bigger than a particular number in a list

You can use Filter(), which keeps the elements for which the function returns TRUE:

Listsubset <- Filter(function(x) x$n > 10, BigList)

Or an alternative with sapply():

Listsubset <- BigList[sapply(BigList, `[[`, 'n') > 10]
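
For a self-contained illustration, here is a minimal sketch with a made-up BigList (the real structure is not shown in the question, so the fields n and label are assumptions). It uses vapply() instead of sapply(), which is a swapped-in variant that errors if some element's n is not a single number:

```r
# Hypothetical stand-in for BigList: each element is a list with a numeric field `n`.
BigList <- list(
  a = list(n = 5,  label = "small"),
  b = list(n = 12, label = "medium"),
  c = list(n = 40, label = "large")
)

# Extract `n` from every element; vapply() checks that each result is one number.
n_values <- vapply(BigList, function(x) x$n, numeric(1))

# Keep only the elements whose `n` exceeds 10.
Listsubset <- BigList[n_values > 10]

# Filter() should select the same elements.
via_filter <- Filter(function(x) x$n > 10, BigList)

print(names(Listsubset))
```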

Finding elements that do not overlap between two vectors

Yes, there is a way:

setdiff(list.a, list.b)
# [1] "Mary"     "Jack"     "Michelle"
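
Note that setdiff() is one-directional: it returns what is in the first argument but not in the second. If "do not overlap" is meant in both directions, one possible sketch is to combine the two differences. The vectors below are hypothetical, since the contents of list.a and list.b are not shown here:

```r
# Hypothetical vectors standing in for list.a and list.b.
first  <- c("Ann", "Bob", "Cara")
second <- c("Bob", "Cara", "Dev")

# Elements that appear only on one side.
only_first  <- setdiff(first, second)
only_second <- setdiff(second, first)

# Everything that is not shared by both vectors.
non_overlap <- union(only_first, only_second)

print(non_overlap)
```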

In R, find elements of a vector in a list using vectorization

We can flatten the list and use match(); in the benchmark below this is the fastest approach by a clear margin.

v1 <- c(1, 200, 4000)
L1 <- list(1:4, 1:4*100, 1:4*1000)

sequence(lengths(L1))[match(v1, unlist(L1))]
# [1] 1 2 4
sequence(lengths(L1))[which(unlist(L1) %in% v1)]
# [1] 1 2 4
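
To see why this works, it may help to split the one-liner into steps. The sketch below uses the same v1 and L1; the intermediate names (flat, within_pos, hit, element_id) are just illustrative. Two caveats as I understand them: match() reports only the first occurrence of each value, and the lookup is against the whole flattened list, so v1[i] is not tied to L1[[i]] the way it is in the mapply versions.

```r
v1 <- c(1, 200, 4000)
L1 <- list(1:4, 1:4 * 100, 1:4 * 1000)

# All list elements laid end to end in one vector.
flat <- unlist(L1)

# For every entry of `flat`, its position inside the list element it came from:
# 1 2 3 4 1 2 3 4 1 2 3 4
within_pos <- sequence(lengths(L1))

# Where each value of v1 first appears in the flattened vector.
hit <- match(v1, flat)

# Translate flat positions back to within-element positions.
result <- within_pos[hit]

# Optionally, recover which list element each hit came from.
element_id <- rep(seq_along(L1), times = lengths(L1))[hit]

print(result)
print(element_id)
```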

library(microbenchmark)
library(tidyverse)

microbenchmark(
  akrun_sapply = {sapply(L1, function(x) which(x %in% v1))},
  akrun_Vectorize = {Vectorize(function(x) which(x %in% v1))(L1)},
  akrun_mapply = {mapply(function(x, y) which(x %in% y), L1, v1)},
  akrun_mapply_match = {mapply(match, v1, L1)},
  akrun_map2 = {purrr::map2_int(L1, v1, ~ .x %in% .y %>% which)},
  CPak = {setNames(rep(1:length(L1), times=lengths(L1)), unlist(L1))[as.character(v1)]},
  zacdav = {sequence(lengths(L1))[match(v1, unlist(L1))]},
  zacdav_which = {sequence(lengths(L1))[which(unlist(L1) %in% v1)]},
  times = 10000
)

Unit: microseconds
               expr     min       lq      mean   median       uq        max neval
       akrun_sapply  18.187  22.7555  27.17026  24.6140  27.8845   2428.194 10000
    akrun_Vectorize  60.119  76.1510  88.82623  83.4445  89.9680   2717.420 10000
       akrun_mapply  19.006  24.2100  29.78381  26.2120  29.9255   2911.252 10000
 akrun_mapply_match  14.136  18.4380  35.45528  20.0275  23.6560 127960.324 10000
         akrun_map2 217.209 264.7350 303.64609 277.5545 298.0455   9204.243 10000
               CPak  15.741  19.7525  27.31918  24.7150  29.0340    235.245 10000
             zacdav   6.649   9.3210  11.30229  10.4240  11.5540   2399.686 10000
       zacdav_which   7.364  10.2395  12.22632  11.2985  12.4515   2492.789 10000

Using R, How to use a character vector to search for matches in a very large character vector

grep and family only allow a single pattern= in their call, but one can use Vectorize to help with this:

out <- Vectorize(grepl, vectorize.args = "pattern")(Cities, Locations)
rownames(out) <- Locations
out
#                New York San Francisco Austin
# San Antonio/TX    FALSE         FALSE  FALSE
# Austin/TX         FALSE         FALSE   TRUE
# Boston/MA         FALSE         FALSE  FALSE

(I added rownames(.) purely to identify columns/rows from the source data.)

With this, if you want to know which index points where, then you can do

apply(out, 1, function(z) which(z)[1])
# San Antonio/TX      Austin/TX      Boston/MA 
#             NA              3             NA 
apply(out, 2, function(z) which(z)[1])
#      New York San Francisco        Austin 
#            NA            NA             2

The first gives, for each location, the index within Cities that matches it. The second gives, for each city, the index within Locations that matches it. Both assume at most a 1-to-1 matching; if there are ever more matches, which(z)[1] will hide the second and subsequent ones, which is likely not a good thing.
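
If more than one match per row or column is possible, one way to keep all of them is which(..., arr.ind = TRUE), which returns every TRUE cell as a row/column pair. A rough sketch follows; Cities and Locations are reconstructed from the printed output above, and the extra location "Austin/MN" is a made-up entry added only to force a second match:

```r
# Reconstructed inputs; "Austin/MN" is a hypothetical extra entry.
Cities    <- c("New York", "San Francisco", "Austin")
Locations <- c("San Antonio/TX", "Austin/TX", "Boston/MA", "Austin/MN")

# Same logical matrix as before: rows are Locations, columns are Cities.
out <- Vectorize(grepl, vectorize.args = "pattern")(Cities, Locations)
rownames(out) <- Locations

# One row per TRUE cell, with columns "row" and "col".
hits <- which(out, arr.ind = TRUE)

# Turn the index pairs into readable location/city pairs.
pairs <- data.frame(
  location = rownames(out)[hits[, "row"]],
  city     = colnames(out)[hits[, "col"]],
  stringsAsFactors = FALSE
)

print(pairs)
```

This gives one row per match, so nothing is silently dropped when a city matches several locations.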

How to tell what is in one vector and not another?

You can use the setdiff() (set difference) function:

> setdiff(x, y)
[1] 1

See which vector in a list is contained within a vector from another list (finding people's name matches)

Since you are dealing with lists, it is easier to collapse each element into a single string so that regular expressions can be applied. Just sort the words within each name first, so that both lists use the same (ascending) order. Then you can easily match them:

lst <- sapply(first_last_names_list, function(x) paste0(sort(x), collapse = " "))
lst1 <- gsub("\\s|$", ".*", lst)
lst2 <- sapply(full_names_list, function(x) paste(sort(x), collapse = " "))
(lst3 <- Vectorize(grep)(lst1, list(lst2), value = TRUE, ignore.case = TRUE))
               boy.*boy.*             bob.*orengo.*        kalonzo.*musyoka.*         anami.*lisamula.* 
           "boy boy juma"        "bob james orengo" "kalonzo musyoka stephen" "anami lisamula silverse"

Now if you want to link first_last_names_list and full_names_list then:

setNames(full_names_list[ match(lst3,lst2)],sapply(first_last_names_list[grep(paste0(names(lst3),collapse = "|"),lst1)],paste,collapse=" "))
$`boy boy`
[1] "boy"  "juma" "boy" 

$`bob orengo`
[1] "james"  "bob"    "orengo"

$`kalonzo musyoka`
[1] "stephen" "kalonzo" "musyoka"

$`anami lisamula`
[1] "lisamula" "silverse" "anami"

Here the names come from first_last_names_list and the elements from full_names_list. It would be easier to work with character vectors rather than lists.

Combine a list of similar length vectors with NAs to one vector

Here's a vectorised version of your code :

dat <- do.call(cbind, x)
#Logical matrix
mat <- !is.na(dat)
#Number of non-NA's in each row
rs <- rowSums(mat)
#First non-NA value
val <- dat[cbind(1:nrow(dat), max.col(mat, ties.method = 'first'))]
#More than 1 non-NA value
val[rs > 1] <- 'conflict'
#Only NA value
val[rs == 0] <- 'none'
val

#[1] "A"        "A"        "A"        "A"        "conflict" "B"       
#[7] "B"        "B"        "B"        "none"

EDIT - Updated to include suggestion from @Henrik to avoid nested ifelse which should make the solution faster.
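
For anyone who wants to try this without the original data, here is a minimal run on a made-up x (two equal-length character vectors with NAs; the values are hypothetical). The steps are the same as above, with seq_len() used in place of 1:nrow(dat):

```r
# Hypothetical input: a list of equal-length vectors, mostly NA.
x <- list(
  c("A", "A", NA,  NA),
  c(NA,  "B", "B", NA)
)

# One column per vector.
dat <- do.call(cbind, x)

# TRUE where a value is present.
mat <- !is.na(dat)

# Number of non-NA values in each row.
rs <- rowSums(mat)

# First non-NA value in each row (NA if the whole row is NA).
val <- dat[cbind(seq_len(nrow(dat)), max.col(mat, ties.method = "first"))]

# Rows with more than one value are conflicts; all-NA rows get "none".
val[rs > 1] <- "conflict"
val[rs == 0] <- "none"

print(val)
```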

conditional removing from a vector without if statement in R

Use setdiff

setdiff(a1, "out")
#[1] "bagh" "bir" 

setdiff(a2, "out")
#[1] "bagh" "bir"

%in% works as well, as long as we index with the negated logical vector directly instead of going through which():

a1[!a1 %in% "out"]
a2[!a2 %in% "out"]
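
The reason which() causes trouble is that when nothing matches it returns integer(0), and negating that is still integer(0), so the subset comes back empty instead of unchanged. A small sketch, assuming a1 contains "out" and a2 does not (the exact contents and order of a1 and a2 are guesses based on the output above):

```r
# Assumed inputs: a1 has the unwanted value, a2 does not.
a1 <- c("bagh", "out", "bir")
a2 <- c("bagh", "bir")

# Negative indexing via which() works when there is a match ...
with_which_a1 <- a1[-which(a1 == "out")]

# ... but returns an empty vector when there is none,
# because -integer(0) selects nothing.
with_which_a2 <- a2[-which(a2 == "out")]

# Logical indexing leaves a2 untouched, as intended.
with_in_a2 <- a2[!a2 %in% "out"]

print(with_which_a2)
print(with_in_a2)
```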
