grep in R using a character vector with multiple patterns with same order as vector
Try lapply:
unlist(lapply(to_match, grep, vectorA, value = TRUE))
## [1] "RuL_KZB8" "RuL_KZA9"
or
unlist(sapply(to_match, grep, vectorA, value = TRUE))
## KZB8 KZA9
## "RuL_KZB8" "RuL_KZA9"
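The original vectorA is not shown, so the sketch below assumes hypothetical contents for vectorA and to_match. It illustrates why looping over the patterns keeps the order of to_match, while a single alternation returns matches in the order of vectorA:

```r
# Hypothetical inputs: the real vectorA is not shown in the question
vectorA  <- c("RuL_KZA9", "RuL_XYZ1", "RuL_KZB8")
to_match <- c("KZB8", "KZA9")

# grep() runs once per pattern, so results follow the order of to_match
res <- unlist(lapply(to_match, grep, vectorA, value = TRUE))
res
# [1] "RuL_KZB8" "RuL_KZA9"

# One alternation returns the matches in the order of vectorA instead
alt <- grep(paste(to_match, collapse = "|"), vectorA, value = TRUE)
alt
# [1] "RuL_KZA9" "RuL_KZB8"
```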
R: grep exactly using a character vector with multiple patterns
You have to create every variant of the pattern: the ID on its own, the ID at the beginning, the ID at the end, and the ID in the middle of the semicolon-separated string.
search_string <- paste0(c('^',';',';','^'),
rep(t1, each=4),
c('$',';','$',';'), collapse='|')
candidates <- grep(search_string, t2)
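Here is a runnable version, assuming hypothetical values for t1 (the IDs) and t2 (semicolon-separated ID lists), since the originals are not shown. It also prints the generated alternation so the four variants per ID are visible. IDs containing regex metacharacters (such as `.`) would need escaping first.

```r
# Hypothetical data: t1 holds the IDs, t2 holds semicolon-separated ID lists
t1 <- c("A1", "B2")
t2 <- c("A1", "A10", "X;A1", "A1;X", "X;A1;Y", "B2;C3", "A12;B22")

# Per ID: alone (^ID$), middle (;ID;), end (;ID$), beginning (^ID;)
search_string <- paste0(c('^', ';', ';', '^'),
                        rep(t1, each = 4),
                        c('$', ';', '$', ';'), collapse = '|')
search_string
# [1] "^A1$|;A1;|;A1$|^A1;|^B2$|;B2;|;B2$|^B2;"

# "A10" and "A12;B22" are not matched, because only exact IDs count
candidates <- grep(search_string, t2)
candidates
# [1] 1 3 4 5 6
```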
R: grep multiple strings at once
Since these are exact matches, use match(), where phrases is a character vector of the phrases you want to match:
match(phrases, df[, 1])
This also works provided no phrase is a substring of another phrase:
grep(paste(phrases, collapse = "|"), df[, 1])
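A small sketch with a hypothetical df shows how the two approaches differ. match() returns one position per phrase, in the order of phrases, with NA for a phrase that is absent. grep() on an alternation returns the matching row numbers in row order. Note that grep() with a pattern vector of length > 1 would use only the first element, so the phrases are joined with "|":

```r
# Hypothetical data frame whose first column holds the phrases to search
df <- data.frame(x = c("apple pie", "banana split", "cherry tart"),
                 stringsAsFactors = FALSE)
phrases <- c("cherry tart", "apple pie")

# Position of each phrase in df[, 1], in the order of phrases
m <- match(phrases, df[, 1])
m
# [1] 3 1

# Rows containing any of the phrases, in row order
g <- grep(paste(phrases, collapse = "|"), df[, 1])
g
# [1] 1 3
```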
R's grepl() to find multiple strings exists
Text <- c("instance", "percentage", "n",
"instance percentage", "percentage instance")
grepl("instance|percentage", Text)
# TRUE TRUE FALSE TRUE TRUE
grepl("instance.*percentage|percentage.*instance", Text)
# FALSE FALSE FALSE TRUE TRUE
The latter one works by looking for:
('instance')(any character sequence)('percentage')
OR
('percentage')(any character sequence)('instance')
Naturally, if you need to find any combination of more than two words, this gets complicated quickly, because every ordering has to be spelled out. In that case, the solution mentioned in the comments (one grepl() call per word, combined with &) is easier to implement and read.
Another alternative that might be relevant when matching many words is positive look-ahead, which can be thought of as a "non-consuming" match. For this, you have to enable Perl-compatible regular expressions with perl = TRUE.
# create a vector of word combinations
set.seed(1)
words <- c("instance", "percentage", "element",
"character", "n", "o", "p")
Text2 <- replicate(10, paste(sample(words, 5), collapse=" "))
# grepl with multiple positive look-ahead
longperl <- grepl("(?=.*instance)(?=.*percentage)(?=.*element)(?=.*character)",
Text2, perl=TRUE)
# this is equivalent to the solution proposed in the comments
longstrd <- grepl("instance", Text2) &
grepl("percentage", Text2) &
grepl("element", Text2) &
grepl("character", Text2)
# they produce identical results
identical(longperl, longstrd)
Furthermore, if the patterns are stored in a vector, both expressions can be condensed considerably:
pat <- c("instance", "percentage", "element", "character")
longperl <- grepl(paste0("(?=.*", pat, ")", collapse=""), Text2, perl=TRUE)
longstrd <- rowSums(sapply(pat, grepl, Text2) - 1L) == 0L
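A sketch of a third way to express the "all patterns present" test uses Reduce() with `&`. It is shown on a small hypothetical vector x and avoids the matrix arithmetic. It also works when the input has length one, where sapply() returns a plain vector and rowSums() fails:

```r
pat <- c("instance", "percentage", "element", "character")
# Hypothetical input strings
x <- c("instance percentage element character n",
       "instance percentage n o p")

# One logical vector per pattern, combined element-wise with `&`
all_present <- Reduce(`&`, lapply(pat, grepl, x = x))
all_present
# [1]  TRUE FALSE

# Agrees with the look-ahead version
identical(all_present,
          grepl(paste0("(?=.*", pat, ")", collapse = ""), x, perl = TRUE))
# [1] TRUE
```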
As requested in the comments, if you want to match whole words only (i.e. not substrings), you can specify word boundaries with \\b. For example:
tx <- c("cent element", "percentage element", "element cent", "element centimetre")
grepl("(?=.*\\bcent\\b)(?=.*element)", tx, perl=TRUE)
# TRUE FALSE TRUE FALSE
grepl("element", tx) & grepl("\\bcent\\b", tx)
# TRUE FALSE TRUE FALSE
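The word-boundary idea can be combined with building the look-aheads from a vector, as was done earlier with pat. The sketch below uses a hypothetical vector req of required whole words and wraps each of them in \\b:

```r
tx  <- c("cent element", "percentage element", "element cent", "element centimetre")
req <- c("cent", "element")  # hypothetical vector of required whole words

# Build "(?=.*\\bcent\\b)(?=.*\\belement\\b)" from the vector
lookahead <- paste0("(?=.*\\b", req, "\\b)", collapse = "")
whole <- grepl(lookahead, tx, perl = TRUE)
whole
# [1]  TRUE FALSE  TRUE FALSE
```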
R search a character vector using a pattern vector
You could form a regex alternation, and then grep for that:
vec <- c("Fast.file1", "Fast.file2", "Med.file3", "Medium.file4", "Slow.file5")
checkAgainst <- c("Fast", "Medium", "Med")
regex <- paste(checkAgainst, collapse="|")
Fast_files <- vec[grep(regex, vec)]
Fast_files
[1] "Fast.file1" "Fast.file2" "Med.file3" "Medium.file4"
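Because the alternation is not anchored, it also matches a pattern that appears anywhere in the name. If the patterns are meant to be the prefix before the dot (an assumption about the file-name layout), one option is to anchor the group at the start and require the literal dot. The extra element "Slow.Medium" below is hypothetical and is rejected by the anchored pattern:

```r
vec <- c("Fast.file1", "Fast.file2", "Med.file3", "Medium.file4", "Slow.file5",
         "Slow.Medium")  # last element is a hypothetical extra case
checkAgainst <- c("Fast", "Medium", "Med")

# ^( ... )\\. : one of the prefixes at the start, followed by a literal dot
anchored <- paste0("^(", paste(checkAgainst, collapse = "|"), ")\\.")
picked <- vec[grepl(anchored, vec)]
picked
# [1] "Fast.file1" "Fast.file2" "Med.file3" "Medium.file4"
```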
Matching multiple patterns
Yes, you can. The | in a grep pattern has the same meaning as "or". So you can test for your pattern by using "001|100|000" as your pattern. At the same time, grep is vectorised, so all of this can be done in one step:
x <- c("1100", "0010", "1001", "1111")
pattern <- "001|100|000"
grep(pattern, x)
[1] 1 2 3
This returns the indices of the elements of x that contain a match (in this case, the first three).
Sometimes it is more convenient to have a logical vector that tells you which of the elements in your vector were matched. Then you can use grepl:
grepl(pattern, x)
[1] TRUE TRUE TRUE FALSE
See ?regex for help about regular expressions in R.
Edit:
To avoid writing the pattern by hand, we can build it with paste:
myValues <- c("001", "100", "000")
pattern <- paste(myValues, collapse = "|")
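Put together with the x from above, the pasted pattern gives the same results as the hand-written one:

```r
x <- c("1100", "0010", "1001", "1111")
myValues <- c("001", "100", "000")

# Join the values with "|" to get "001|100|000"
pattern <- paste(myValues, collapse = "|")
grep(pattern, x)
# [1] 1 2 3
grepl(pattern, x)
# [1]  TRUE  TRUE  TRUE FALSE
```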
find names that match either of two patterns
You can use:
pattern <- c("id", "group")
grep(paste0(pattern, collapse = '|'), names(a), value = TRUE)
#[1] "c-id" "g_igroups"
With grepl you get a logical vector instead:
grepl(paste0(pattern, collapse = '|'), names(a))
#[1] TRUE TRUE FALSE
A stringr solution:
stringr::str_subset(names(a), paste0(pattern, collapse = '|'))
#[1] "c-id" "g_igroups"
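The question's data frame a is not shown. The sketch below builds a hypothetical one with the same first two column names and uses the logical vector from grepl to keep only the matching columns:

```r
# Hypothetical data frame; check.names = FALSE keeps the name "c-id" as-is
a <- data.frame(`c-id` = 1, g_igroups = 2, value = 3, check.names = FALSE)
pattern <- c("id", "group")

keep <- grepl(paste0(pattern, collapse = "|"), names(a))
keep
# [1]  TRUE  TRUE FALSE

# Subset the columns; drop = FALSE keeps a data frame even for one column
a_sub <- a[, keep, drop = FALSE]
names(a_sub)
# [1] "c-id"      "g_igroups"
```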
R grep: Match one string against multiple patterns
What about applying the regexpr function over a vector of keywords?
keywords <- c("dog", "cat", "bird")
strings <- c("Do you have a dog?", "My cat ate by bird.", "Let's get icecream!")
sapply(keywords, regexpr, strings, ignore.case=TRUE)
dog cat bird
[1,] 15 -1 -1
[2,] -1 4 15
[3,] -1 -1 -1
sapply(keywords, regexpr, strings[1], ignore.case=TRUE)
dog cat bird
15 -1 -1
The values returned are the position of the first character of the match, with -1 meaning no match.
If the position of the match is irrelevant, use grepl instead:
sapply(keywords, grepl, strings, ignore.case=TRUE)
dog cat bird
[1,] TRUE FALSE FALSE
[2,] FALSE TRUE TRUE
[3,] FALSE FALSE FALSE
Update: This runs relatively quickly on my system, even with a large number of keywords:
# Available on most *nix systems
words <- scan("/usr/share/dict/words", what="")
length(words)
[1] 234936
system.time(matches <- sapply(words, grepl, strings, ignore.case=TRUE))
user system elapsed
7.495 0.155 7.596
dim(matches)
[1] 3 234936
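To turn the logical matrix into something more readable, one option is to list the keywords found in each string and flag strings with at least one hit. The sketch below uses the small keywords and strings example from above. The names hits and found are just illustrative:

```r
keywords <- c("dog", "cat", "bird")
strings  <- c("Do you have a dog?", "My cat ate by bird.", "Let's get icecream!")

# Logical matrix: one row per string, one column per keyword
hits <- sapply(keywords, grepl, strings, ignore.case = TRUE)

# Keywords found in each string (character(0) where nothing matched)
found <- lapply(seq_along(strings), function(i) keywords[hits[i, ]])

# TRUE if a string contains at least one keyword
any_hit <- rowSums(hits) > 0
any_hit
# [1]  TRUE  TRUE FALSE
```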
R: grep multiple patterns and switch result?
First, construct a pattern string of the city names:
> pattern = "(Houston)|(San Antonio)|(San Diego)|(Phoenix)"
Then use stringr::str_extract_all or stringr::str_extract to extract the city names from each string, and add the result as a new column of the data frame.
> stringr::str_extract("Houston is a good city.","(Houston)|(San Antonio)|(San Diego)|(Phoenix)")
[1] "Houston"
Then merge the result with your earlier data frames to obtain the final data frame.
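If you prefer to avoid the stringr dependency, a base R sketch of the "extract and add a column" step uses regexpr() and regmatches(). The data frame cities_df and its text column are hypothetical. Because regmatches() returns only the successful matches, the new column is first filled with NA and then assigned where a match exists:

```r
# Hypothetical data frame with free-text descriptions
cities_df <- data.frame(text = c("Houston is a good city.",
                                 "I moved to San Diego last year.",
                                 "Nothing to see here."),
                        stringsAsFactors = FALSE)
pattern <- "(Houston)|(San Antonio)|(San Diego)|(Phoenix)"

# First match per string; -1 means no match
m <- regexpr(pattern, cities_df$text)
has_match <- as.vector(m) != -1L

cities_df$city <- NA_character_
cities_df$city[has_match] <- regmatches(cities_df$text, m)
cities_df$city
# [1] "Houston"   "San Diego" NA
```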