Find Matches of a Vector of Strings in Another Vector of Strings

Find matches of a vector of strings in another vector of strings

You can paste your "keywords" together, separated by the pipe character (|), which acts as a regular-expression "or":

> articles[grepl(paste(keywords, collapse="|"), articles$text),]
id text
1 1 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
2 2 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
4 4 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
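
For a version that can be run on its own, here is a minimal sketch. The `articles` data frame and `keywords` vector below are hypothetical stand-ins, since the originals are not shown in the answer.

```r
# Hypothetical data: the original `articles` and `keywords` are not shown.
articles <- data.frame(
  id = 1:4,
  text = c("Lorem ipsum dolor sit amet",
           "tempor incididunt ut labore",
           "quis nostrud exercitation",
           "Duis aute irure dolor"),
  stringsAsFactors = FALSE
)
keywords <- c("dolor", "labore")

# Build one alternation pattern: "dolor|labore"
pattern <- paste(keywords, collapse = "|")

# grepl() returns one logical per row; use it to subset the data frame
hits <- grepl(pattern, articles$text)
matched <- articles[hits, ]
matched$id
```

With this made-up data, rows 1, 2 and 4 are kept because each contains at least one keyword.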

R: Finding multiple string matches in a vector of strings

We can use &

i1 <- grepl(toMatch[1], files.list) & grepl(toMatch[2], files.list)

If there are multiple elements in 'toMatch', loop through them with lapply and Reduce to a single logical vector with &

i1 <- Reduce(`&`, lapply(toMatch, grepl, x = files.list))
files.list[i1]
#[1] "Fasted DWeib NoCmaxW.xlsx"
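
The same idea as a self-contained sketch. The contents of `files.list` and `toMatch` here are assumed for illustration (only one file name appears in the original output).

```r
# Hypothetical inputs: the original `files.list` and `toMatch` are not shown.
files.list <- c("Fasted DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx",
                "Fed DWeib NoCmaxW.xlsx")
toMatch <- c("Fasted", "DWeib")

# One logical vector per pattern, each as long as files.list
per_pattern <- lapply(toMatch, grepl, x = files.list)

# Element-wise AND across all patterns: TRUE only where every pattern matches
i1 <- Reduce(`&`, per_pattern)
files.list[i1]
```

Only the first file name contains both "Fasted" and "DWeib", so it is the only one returned. Swapping `&` for `|` in Reduce would give "any of the patterns" instead of "all of them".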

It is also possible to collapse the elements with \\b.*\\b, i.e. to match the first word of 'toMatch', followed by a word boundary (\\b), then any characters (.*), and another word boundary (\\b) before the second word of 'toMatch'. This works for the example here. It may be better to add a word boundary at the start and end of the pattern as well (not needed for this example).

pat1 <- paste(toMatch, collapse= "\\b.*\\b")
grep(pat1, files.list, value = TRUE)
#[1] "Fasted DWeib NoCmaxW.xlsx"

However, this only finds matches where the words appear in the same order as in 'toMatch'. If the words may also appear in reverse order and you want to match those as well, create a second pattern in reverse order and join the two with |

pat2 <- paste(rev(toMatch), collapse="\\b.*\\b")
pat <- paste(pat1, pat2, sep="|")
grep(pat, files.list, value = TRUE)
#[1] "Fasted DWeib NoCmaxW.xlsx"
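
To see what the reversed pattern buys you, here is a small sketch with two hypothetical file names that contain the same words in opposite order.

```r
# Hypothetical inputs chosen to show the effect of word order.
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "DWeib Fasted NoCmaxW.xlsx")
toMatch <- c("Fasted", "DWeib")

pat1 <- paste(toMatch, collapse = "\\b.*\\b")       # Fasted ... DWeib
pat2 <- paste(rev(toMatch), collapse = "\\b.*\\b")  # DWeib ... Fasted
pat  <- paste(pat1, pat2, sep = "|")                # either order

forward_only <- grep(pat1, files.list, value = TRUE)  # first file name only
either_order <- grep(pat, files.list, value = TRUE)   # both file names
```

Note that reversing covers every ordering only when 'toMatch' has two elements. With three or more words, the Reduce approach with & is probably the simpler way to ignore order.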

R: How can I count occurrences of a vector of strings in another vector of strings

In base R you can use a nested sapply, which works like a nested for loop: for each string in big_df$x it checks every keyword in dumb_df$x.

big_df$z <- colSums(sapply(big_df$x, function(x) sapply(dumb_df$x, grepl, x)))
big_df

# x y z
#1 happy birth day to you 1 4
#2 sorry bad day, man 2 3
#3 happy old man 3 2

Using str_count from stringr avoids the nested sapply: combine the keywords into a single pattern with paste0.

big_df$z <- stringr::str_count(big_df$x, paste0(dumb_df$x, collapse = '|'))
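
The two approaches can give different numbers when a keyword occurs more than once in a string: the grepl version counts distinct keywords, while a count over a combined pattern counts every match. The sketch below illustrates this with hypothetical data, and uses base R's gregexpr and regmatches as a stand-in for stringr::str_count so it runs without extra packages.

```r
# Hypothetical data: the original `big_df` and `dumb_df` are not shown.
big_df  <- data.frame(x = c("happy happy day", "sorry bad day", "old man"),
                      stringsAsFactors = FALSE)
dumb_df <- data.frame(x = c("happy", "day", "bad"), stringsAsFactors = FALSE)

# Distinct keywords present in each string (the nested-sapply idea)
distinct <- vapply(big_df$x,
                   function(s) sum(vapply(dumb_df$x, grepl, logical(1), x = s)),
                   integer(1), USE.NAMES = FALSE)

# Total matches in each string (base R stand-in for stringr::str_count)
pattern <- paste0(dumb_df$x, collapse = "|")
total <- lengths(regmatches(big_df$x, gregexpr(pattern, big_df$x)))

distinct  # 2 2 0
total     # 3 2 0
```

The first string contains "happy" twice, so it scores 2 by the first method and 3 by the second. Which one is wanted depends on what "occurrences" means for the task.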

Order one vector of strings by partial match of the other

Here is one way using grep and sapply:

samples$mouse[sapply(samples$groups, function(x) { grep(x, samples$mouse) })]

The grep base R function is not vectorized with regard to the first parameter, so we can't feed in the entire groups vector. Instead, we can use sapply to find the indices of matches in the mouse vector of paths.
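
A minimal sketch of the reordering, using two hypothetical vectors in place of the `samples` object (which is not shown). It assumes each group matches exactly one path, and uses vapply rather than sapply so that a group with zero or several matches raises an error instead of silently returning a list.

```r
# Hypothetical data: each group is assumed to match exactly one path.
groups <- c("B", "C", "A")
mouse  <- c("/data/mouse_A.txt", "/data/mouse_B.txt", "/data/mouse_C.txt")

# For each group, find the index of the path that contains it
idx <- vapply(groups, function(g) grep(g, mouse, fixed = TRUE),
              integer(1), USE.NAMES = FALSE)

# Reorder the paths to follow the order of `groups`
ordered <- mouse[idx]
ordered
```

Here `idx` is 2, 3, 1, so the paths come back in the order B, C, A.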

How to detect multiple strings in another vector of strings

You just need:

all_words %in% words

From help("%in%"):

%in% is a more intuitive interface as a binary operator, which returns
a logical vector indicating if there is a match or not for its left
operand.

Basically, for each element of the left-hand vector, it checks whether there is a match in the right-hand vector.
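
A short sketch with made-up vectors. Keep in mind that %in% tests whole elements for equality, so it does not find substrings.

```r
# Hypothetical vectors for illustration.
all_words <- c("apple", "pear", "kiwi", "fig")
words     <- c("kiwi", "apple")

# One logical per element of the left operand
present <- all_words %in% words
present             # TRUE FALSE TRUE FALSE
all_words[present]  # "apple" "kiwi"

# Whole-element comparison only: "app" is not an element of all_words
partial <- "app" %in% all_words
partial             # FALSE
```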

How to find the match position of a text string for each element in a vector in R?

This is exactly what regexpr returns (along with some other helpful attributes; see ?regexpr):

regexpr("ber", month.name)
# [1] -1 -1 -1 -1 -1 -1 -1 -1 7 5 6 6
#attr(,"match.length")
# [1] -1 -1 -1 -1 -1 -1 -1 -1 3 3 3 3
#attr(,"index.type")
#[1] "chars"
#attr(,"useBytes")
#[1] TRUE
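
If the -1 values are awkward to work with, one possible follow-up (a sketch, not part of the original answer) is to drop the attributes and turn "no match" into NA. regmatches can then pull out the matched text itself.

```r
pos <- regexpr("ber", month.name)

# Plain integer positions, with NA where there was no match
start <- as.integer(pos)
start[start == -1L] <- NA
start

# The matched substrings, one per element that matched
found <- regmatches(month.name, pos)
found
```

For month.name this leaves NA for January through August and positions 7, 5, 6, 6 for the last four months.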

Detect any pattern from a character vector within another character vector in R

Another option is to collapse the pattern into a single string with |

library(stringr)
str_detect(lorem, str_c(c('non', 'sit'), collapse = "|"))
#[1] TRUE FALSE FALSE TRUE
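
One caveat with collapsing into a single pattern: the keywords are interpreted as regular expressions, so a keyword such as "a.b" also matches "axb". If the keywords are meant literally, a base R alternative (swapped in here for str_detect, with hypothetical data) is to match each keyword with fixed = TRUE and combine the results with |.

```r
# Hypothetical data; `lorem` in the original answer is not shown.
lorem    <- c("version a.b released", "axb is unrelated",
              "dolor sit amet", "nothing here")
keywords <- c("a.b", "sit")

# Collapsed regex: "." is a wildcard, so "axb" matches too
as_regex <- grepl(paste(keywords, collapse = "|"), lorem)
as_regex    # TRUE TRUE TRUE FALSE

# Literal matching: one fixed-string test per keyword, OR-ed together
as_literal <- Reduce(`|`, lapply(keywords, grepl, x = lorem, fixed = TRUE))
as_literal  # TRUE FALSE TRUE FALSE
```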

Can I extract strings that contain any of a vector of strings in R

You can take the outer of x and y with a vectorized grepl, then use which to give you the matching indices for x and y.

ind <- which(outer(x, y, Vectorize(grepl)), arr.ind = TRUE)
x[ind[,1]]
# [1] "bub" "gre" "pop"
y[ind[,2]]
# [1] "bubble bath" "green frogs" "pop goes the weasel"

as a data.frame:

as.data.frame(Map('[', data.frame(x, y), as.data.frame(ind)))
# x y
# 1 bub bubble bath
# 2 gre green frogs
# 3 pop pop goes the weasel
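
A self-contained sketch of the same approach. The vectors below are assumed from the output shown above, plus one extra pattern ("zzz") that matches nothing, to show that unmatched patterns simply drop out.

```r
# Hypothetical inputs, reconstructed from the output above plus one non-match.
x <- c("bub", "gre", "pop", "zzz")
y <- c("bubble bath", "green frogs", "pop goes the weasel")

# m[i, j] is TRUE when pattern x[i] is found in string y[j]
m <- outer(x, y, Vectorize(grepl))

# Row/column indices of the TRUE cells
ind <- which(m, arr.ind = TRUE)

# Pair each matching pattern with the string it matched
pairs <- data.frame(x = x[ind[, 1]], y = y[ind[, 2]],
                    stringsAsFactors = FALSE)
pairs
```

Because this builds a length(x) by length(y) matrix and calls grepl once per cell, it is best suited to fairly small inputs.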
