Find matches of a vector of strings in another vector of strings
You can try pasting your "keywords" together, separated by the pipe character (|), which works like an "or", like this:
> articles[grepl(paste(keywords, collapse="|"), articles$text),]
id text
1 1 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
2 2 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
4 4 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
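A minimal, self-contained sketch of the same idea. The `articles` data frame and `keywords` vector below are made-up stand-ins, since the original data is not shown:

```r
# Hypothetical data, invented for illustration only
articles <- data.frame(
  id   = 1:4,
  text = c("Lorem ipsum dolor sit amet",
           "tempor incididunt ut labore",
           "quis nostrud exercitation",
           "Duis aute irure dolor"),
  stringsAsFactors = FALSE
)
keywords <- c("ipsum", "labore", "irure")

# Build one alternation pattern: "ipsum|labore|irure"
pattern <- paste(keywords, collapse = "|")

# Keep the rows whose text matches at least one keyword
hits <- articles[grepl(pattern, articles$text), ]
hits$id
```

Note that the keywords are treated as regular expressions here, so a keyword containing characters such as "." or "(" would need to be escaped first.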
R: Finding multiple string matches in a vector of strings
We can combine the two grepl results with &:
i1 <- grepl(toMatch[1], files.list) & grepl(toMatch[2], files.list)
If there are multiple elements in 'toMatch', loop through them with lapply and Reduce the results to a single logical vector with &:
i1 <- Reduce(`&`, lapply(toMatch, grepl, x = files.list))
files.list[i1]
#[1] "Fasted DWeib NoCmaxW.xlsx"
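To see the intermediate steps, here is a small sketch of the lapply/Reduce approach. The contents of `files.list` and `toMatch` are assumptions made up for this example:

```r
# Hypothetical inputs, invented for illustration only
files.list <- c("Fasted DWeib NoCmaxW.xlsx",
                "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib.xlsx")
toMatch <- c("Fasted", "DWeib")

# One logical vector per pattern (each has length(files.list) elements)
per_pattern <- lapply(toMatch, grepl, x = files.list)

# Element-wise AND across all patterns
all_match <- Reduce(`&`, per_pattern)

files.list[all_match]
```

Because each pattern is tested separately, this does not depend on the order in which the words appear in the file name.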
It is also possible to collapse the elements with .*, i.e. to match the first word of 'toMatch' followed by a word boundary (\\b), then any characters (.*), and another word boundary (\\b) before the second word of 'toMatch'. This works for the example here. It may be better to add a word boundary at the start and end of the pattern as well (not needed for this example):
pat1 <- paste(toMatch, collapse= "\\b.*\\b")
grep(pat1, files.list, value = TRUE)
#[1] "Fasted DWeib NoCmaxW.xlsx"
However, this only finds matches where the words appear in the same order as in 'toMatch'. If the words may also appear in reverse order and you want to match those as well, build a second pattern in the reverse order and join the two with |:
pat2 <- paste(rev(toMatch), collapse="\\b.*\\b")
pat <- paste(pat1, pat2, sep="|")
grep(pat, files.list, value = TRUE)
#[1] "Fasted DWeib NoCmaxW.xlsx"
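The effect of the reversed pattern only shows up when a file name has the words the other way round. A small sketch, using made-up file names as an assumption:

```r
# Hypothetical inputs, invented for illustration only
toMatch <- c("Fasted", "DWeib")
files.list <- c("Fasted DWeib NoCmaxW.xlsx",
                "DWeib Fasted NoCmaxW.xlsx",
                "Fasted only.xlsx")

pat1 <- paste(toMatch, collapse = "\\b.*\\b")       # words in the given order
pat2 <- paste(rev(toMatch), collapse = "\\b.*\\b")  # words in reverse order
pat  <- paste(pat1, pat2, sep = "|")                # either order

forward_only <- grep(pat1, files.list, value = TRUE)
either_order <- grep(pat, files.list, value = TRUE)
```

With more than two words, two patterns no longer cover every possible ordering, so testing each word separately and combining with & is likely the simpler route.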
R: How can I count occurrences of a vector of strings in another vector of strings
In base R you can use a nested sapply, which is similar to a nested for loop iterating over each keyword in dumb_df$x for each string in big_df$x.
big_df$z <- colSums(sapply(big_df$x, function(x) sapply(dumb_df$x, grepl, x)))
big_df
# x y z
#1 happy birth day to you 1 4
#2 sorry bad day, man 2 3
#3 happy old man 3 2
Using str_count from stringr avoids the sapply calls: you can combine the patterns into one string using paste0.
big_df$z <- stringr::str_count(big_df$x, paste0(dumb_df$x, collapse = '|'))
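The two approaches answer slightly different questions: grepl records each keyword at most once per string, whereas counting matches of the combined pattern counts every occurrence. The sketch below shows both in base R, with made-up data and a `keywords` vector standing in for dumb_df$x:

```r
# Hypothetical data, invented for illustration only
big_df <- data.frame(x = c("happy birth day to you",
                           "sorry bad day, man",
                           "happy happy man"),
                     stringsAsFactors = FALSE)
keywords <- c("happy", "day", "man")

# (a) number of distinct keywords found in each string
hit_matrix <- sapply(big_df$x, function(s) sapply(keywords, grepl, x = s))
distinct_hits <- unname(colSums(hit_matrix))

# (b) total number of matches of any keyword in each string
pattern <- paste(keywords, collapse = "|")
total_hits <- lengths(regmatches(big_df$x, gregexpr(pattern, big_df$x)))
```

The third string contains "happy" twice, so it scores 2 under (a) and 3 under (b).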
Order one vector of strings by partial match of the other
Here is one way using grep and sapply:
samples$mouse[sapply(samples$groups, function(x) { grep(x, samples$mouse) })]
The base R function grep is not vectorized with regard to its first parameter (the pattern), so we can't feed in the entire groups vector. Instead, we can use sapply to find the indices of the matches in the mouse vector of paths.
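A self-contained sketch of this reordering. The `samples` data frame and its paths are hypothetical, and the example assumes each group matches exactly one path (otherwise sapply would return a list rather than an index vector):

```r
# Hypothetical data, invented for illustration only
samples <- data.frame(
  groups = c("g1", "g2", "g3"),
  mouse  = c("/data/run_g3_a.csv", "/data/run_g1_a.csv", "/data/run_g2_a.csv"),
  stringsAsFactors = FALSE
)

# For each group, find the position of the path that contains it
idx <- sapply(samples$groups, function(g) grep(g, samples$mouse, fixed = TRUE))

# Reorder the paths so they line up with the groups column
ordered <- samples$mouse[idx]
```

Here `fixed = TRUE` is an addition for the sketch: it treats each group name as a literal string instead of a regular expression.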
How to detect multiple strings in another vector of strings
You just need:
all_words %in% words
From help("%in%"):
%in% is a more intuitive interface as a binary operator, which returns
a logical vector indicating if there is a match or not for its left
operand.
Basically, for each element of the left-hand vector (all_words), it checks whether there is a match in the right-hand vector (words).
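A short sketch with made-up vectors (the contents of `all_words` and `words` are assumptions) showing that the result follows the left operand, and that %in% compares whole elements rather than substrings:

```r
# Hypothetical vectors, invented for illustration only
all_words <- c("apple", "banana", "cherry", "date")
words     <- c("cherry", "apple")

# One TRUE/FALSE per element of all_words
found <- all_words %in% words

# Exact matching only: a substring of an element does not count
partial <- "app" %in% words
```

For substring or pattern matching, grepl is the tool to reach for instead.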
How to find the match position of a text string for each element in a vector in R?
This is exactly what regexpr returns (along with some other helpful attributes):
regexpr("ber", month.name)
# [1] -1 -1 -1 -1 -1 -1 -1 -1 7 5 6 6
#attr(,"match.length")
# [1] -1 -1 -1 -1 -1 -1 -1 -1 3 3 3 3
#attr(,"index.type")
#[1] "chars"
#attr(,"useBytes")
#[1] TRUE
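If the -1 values and the attributes get in the way, one possible way to tidy the result is sketched below; the variable names are illustrative only:

```r
pos <- regexpr("ber", month.name)

# Plain integer positions, with "no match" (-1) turned into NA
start <- as.integer(pos)
start[start == -1L] <- NA

# The matched text itself, for the elements that did match
matched <- regmatches(month.name, pos)

# The elements of month.name that contain the pattern
with_match <- month.name[!is.na(start)]
```

Note that regmatches returns only the matches, so `matched` is shorter than `month.name`.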
Detect any pattern from a character vector within another character vector in R
Another option is to collapse the patterns into a single string with |:
library(stringr)
str_detect(lorem, str_c(c('non', 'sit'), collapse = "|"))
#[1] TRUE FALSE FALSE TRUE
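For readers without stringr, a rough base R equivalent is sketched here. The `lorem` vector is made up for the example, since the original is not shown:

```r
# Hypothetical data, invented for illustration only
lorem <- c("Lorem ipsum dolor sit amet",
           "consectetur adipisicing",
           "eiusmod tempor",
           "non proident")
patterns <- c("non", "sit")

# Does any pattern occur in each element?
any_hit <- grepl(paste(patterns, collapse = "|"), lorem)

# Optional: one column per pattern, to see which pattern matched where
hit_matrix <- sapply(patterns, grepl, x = lorem)
```

Taking `rowSums(hit_matrix) > 0` gives the same answer as `any_hit`, at the cost of one grepl call per pattern.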
Can I extract strings that contain any of a vector of strings in R
You can take the outer of x and y with a vectorized grepl, then use which to get the matching indices for x and y.
ind <- which(outer(x, y, Vectorize(grepl)), arr.ind = TRUE)
x[ind[,1]]
# [1] "bub" "gre" "pop"
y[ind[,2]]
# [1] "bubble bath" "green frogs" "pop goes the weasel"
As a data.frame:
as.data.frame(Map('[', data.frame(x, y), as.data.frame(ind)))
# x y
# 1 bub bubble bath
# 2 gre green frogs
# 3 pop pop goes the weasel
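A runnable sketch of the outer approach. The `x` and `y` vectors are assumptions based on the output shown, with an extra non-matching pattern added to show that it is dropped:

```r
# Hypothetical inputs, invented for illustration only
x <- c("bub", "gre", "pop", "zzz")
y <- c("bubble bath", "green frogs", "pop goes the weasel")

# Logical matrix: rows are patterns in x, columns are strings in y
m <- outer(x, y, Vectorize(grepl))

# Row/column positions of every TRUE cell
ind <- which(m, arr.ind = TRUE)

# Pair each matching pattern with the string it was found in
pairs <- data.frame(x = x[ind[, 1]], y = y[ind[, 2]],
                    stringsAsFactors = FALSE)
```

This tests every pattern against every string, so the matrix has length(x) * length(y) cells; that is fine for short vectors but may get slow for large ones.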