Grep Using a Character Vector With Multiple Patterns

grep in R using a character vector with multiple patterns with same order as vector

Try lapply:

unlist(lapply(to_match, grep, vectorA, value = TRUE))
## [1] "RuL_KZB8" "RuL_KZA9"

or

unlist(sapply(to_match, grep, vectorA, value = TRUE))
## KZB8 KZA9
## "RuL_KZB8" "RuL_KZA9"
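
The snippets above assume that to_match and vectorA already exist. As a minimal self-contained sketch, with a made-up vectorA (its contents are an assumption for illustration only), the following shows why one grep call per pattern keeps the order of the patterns, while a single combined pattern keeps the order of the searched vector:

```r
# Hypothetical input: vectorA holds the matching entries in the
# opposite order of to_match, plus one entry that matches nothing.
vectorA  <- c("RuL_KZA9", "RuL_KZB8", "RuL_QQQ1")
to_match <- c("KZB8", "KZA9")

# One grep call per pattern: results follow the order of to_match.
by_pattern <- unlist(lapply(to_match, grep, vectorA, value = TRUE))

# One combined pattern: results follow the order of vectorA.
by_vector <- grep(paste(to_match, collapse = "|"), vectorA, value = TRUE)

print(by_pattern)  # "RuL_KZB8" "RuL_KZA9"
print(by_vector)   # "RuL_KZA9" "RuL_KZB8"
```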

R: grep exactly using a character vector with multiple patterns

You have to build a pattern for every position the ID can occupy in a ';'-separated string: the ID alone, the ID at the beginning, the ID at the end, and the ID in the middle.

search_string <- paste0(c('^',';',';','^'), 
rep(t1, each=4),
c('$',';','$',';'), collapse='|')
candidates <- grep(search_string, t2)
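
To see what the generated pattern does, here is a small sketch with made-up values for t1 and t2 (both are assumptions, as is the restriction to plain alphanumeric IDs without regex metacharacters). It also shows a more compact pattern that groups the anchors, which should select the same elements:

```r
# Hypothetical IDs and ';'-separated candidate strings.
t1 <- c("A1", "B2")
t2 <- c("A1;C3", "A10;C3", "C3;B2", "XB2", "B2")

# Pattern as built above: four anchored variants per ID.
search_string <- paste0(c("^", ";", ";", "^"),
                        rep(t1, each = 4),
                        c("$", ";", "$", ";"), collapse = "|")
candidates <- grep(search_string, t2)

# Compact variant: (start or ';') + any ID + (';' or end).
compact <- paste0("(^|;)(", paste(t1, collapse = "|"), ")(;|$)")
candidates_compact <- grep(compact, t2)

print(candidates)          # 1 3 5
print(candidates_compact)  # 1 3 5
```

"A10;C3" and "XB2" are rejected because the ID is only part of a longer field there.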

R: grep multiple strings at once

Since these are exact matches, use match, where phrases is a character vector of the phrases you want to match:

match(phrases, df[, 1])

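This also works, provided no phrase is a substring of another phrase. Note that grep uses only the first element of a pattern vector (with a warning), so the phrases have to be collapsed into a single alternation first:

grep(paste(phrases, collapse = "|"), df[, 1])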
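
The difference between exact matching and regex matching is easiest to see on a small example. The data frame and phrases below are made up for illustration; the sketch assumes the phrases are collapsed into one alternation before being passed to grep:

```r
# Hypothetical data: one text column, some entries contain a phrase
# only as a substring.
df <- data.frame(text = c("red apple", "green pear", "apple", "pear"),
                 stringsAsFactors = FALSE)
phrases <- c("pear", "apple", "plum")

# match: exact equality, one result per phrase, in the order of phrases.
exact_pos <- match(phrases, df[, 1])

# grep on a collapsed pattern: substring matches, in the order of df.
regex_pos <- grep(paste(phrases, collapse = "|"), df[, 1])

print(exact_pos)  # 4 3 NA
print(regex_pos)  # 1 2 3 4
```

match returns NA for "plum" because no row equals it, whereas grep also reports rows 1 and 2, which merely contain a phrase.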

Using R's grepl() to check whether multiple strings exist

Text <- c("instance", "percentage", "n", 
"instance percentage", "percentage instance")

grepl("instance|percentage", Text)
# TRUE TRUE FALSE TRUE TRUE

grepl("instance.*percentage|percentage.*instance", Text)
# FALSE FALSE FALSE TRUE TRUE

The latter one works by looking for:

('instance')(any character sequence)('percentage')  
OR
('percentage')(any character sequence)('instance')

Naturally, if you need to find any combination of more than two words, this gets complicated quickly. In that case, the solution mentioned in the comments (combining separate grepl() calls with &, shown below) is easier to implement and read.

Another alternative that might be relevant when matching many words is to use positive look-ahead (which can be thought of as a 'non-consuming' match). For this you have to enable Perl-compatible regular expressions with perl=TRUE.

# create a vector of word combinations
set.seed(1)
words <- c("instance", "percentage", "element",
"character", "n", "o", "p")
Text2 <- replicate(10, paste(sample(words, 5), collapse=" "))

# grepl with multiple positive look-ahead
longperl <- grepl("(?=.*instance)(?=.*percentage)(?=.*element)(?=.*character)",
Text2, perl=TRUE)

# this is equivalent to the solution proposed in the comments
longstrd <- grepl("instance", Text2) &
grepl("percentage", Text2) &
grepl("element", Text2) &
grepl("character", Text2)

# they produce identical results
identical(longperl, longstrd)

Furthermore, if you have the patterns stored in a vector, you can condense the expressions significantly:

pat <- c("instance", "percentage", "element", "character")

longperl <- grepl(paste0("(?=.*", pat, ")", collapse=""), Text2, perl=TRUE)
longstrd <- rowSums(sapply(pat, grepl, Text2) - 1L) == 0L
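
As one more way to write the "all patterns must match" test, the sketch below folds the individual grepl results together with Reduce. The helper name grepl_all and the sample text are assumptions made for this illustration. Unlike the rowSums version, it does not rely on sapply returning a matrix, so it should also work when the text vector has a single element:

```r
# Hypothetical helper: TRUE where every pattern matches the element.
grepl_all <- function(patterns, x, ...) {
  Reduce(`&`, lapply(patterns, grepl, x, ...))
}

pat  <- c("instance", "percentage", "element", "character")
text <- c("instance percentage element character",
          "instance percentage n",
          "character element percentage instance")

via_reduce    <- grepl_all(pat, text)
via_lookahead <- grepl(paste0("(?=.*", pat, ")", collapse = ""),
                       text, perl = TRUE)

print(via_reduce)                            # TRUE FALSE TRUE
print(identical(via_reduce, via_lookahead))  # TRUE
```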

As asked for in the comments, if you want to match on exact words, i.e. not match on substrings, you can specify word boundaries using \\b. For example:

tx <- c("cent element", "percentage element", "element cent", "element centimetre")

grepl("(?=.*\\bcent\\b)(?=.*element)", tx, perl=TRUE)
# TRUE FALSE TRUE FALSE
grepl("element", tx) & grepl("\\bcent\\b", tx)
# TRUE FALSE TRUE FALSE
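
When the words are stored in a vector, the word-boundary pattern can be assembled with paste0 in the same way as before. This is a minimal sketch; the vector name whole_words is an assumption, and here both words are wrapped in \\b, which does not change the result for this input:

```r
tx <- c("cent element", "percentage element",
        "element cent", "element centimetre")
whole_words <- c("cent", "element")  # hypothetical name

# Builds "(?=.*\\bcent\\b)(?=.*\\belement\\b)"
lookahead <- paste0("(?=.*\\b", whole_words, "\\b)", collapse = "")
whole_hits <- grepl(lookahead, tx, perl = TRUE)

print(whole_hits)  # TRUE FALSE TRUE FALSE
```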

R search a character vector using a pattern vector

You could form a regex alternation, and then grep for that:

vec <- c("Fast.file1", "Fast.file2", "Med.file3", "Medium.file4", "Slow.file5")
checkAgainst <- c("Fast", "Medium", "Med")
regex <- paste(checkAgainst, collapse="|")
Fast_files <- vec[grep(regex, vec)]
Fast_files

[1] "Fast.file1" "Fast.file2" "Med.file3" "Medium.file4"
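
Note that a plain alternation matches anywhere in the string. If the intent is to match only the part before the first dot, the pattern can be anchored. The sketch below assumes that is the goal and adds a made-up element ("NotFast.file6") to show the difference:

```r
vec <- c("Fast.file1", "Fast.file2", "Med.file3",
         "Medium.file4", "Slow.file5", "NotFast.file6")
checkAgainst <- c("Fast", "Medium", "Med")

# Unanchored: also picks up "NotFast.file6".
loose <- grep(paste(checkAgainst, collapse = "|"), vec, value = TRUE)

# Anchored: the prefix must start the string and be followed by a dot.
strict_regex <- paste0("^(", paste(checkAgainst, collapse = "|"), ")\\.")
strict <- grep(strict_regex, vec, value = TRUE)

print(loose)
print(strict)
```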

Matching multiple patterns

Yes, you can. The | in a grep pattern means "or", so you can test for your patterns by using "001|100|000" as the pattern. At the same time, grep is vectorised, so all of this can be done in one step:

x <- c("1100", "0010", "1001", "1111")
pattern <- "001|100|000"

grep(pattern, x)
[1] 1 2 3

This returns the indices of the elements of your vector that matched the pattern (in this case the first three).

Sometimes it is more convenient to have a logical vector that tells you which of the elements in your vector were matched. Then you can use grepl:

grepl(pattern, x)
[1] TRUE TRUE TRUE FALSE
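
The two forms carry the same information, which a short sketch can confirm: which() turns the logical vector into the indices, and either form can be used to subset the original vector.

```r
x <- c("1100", "0010", "1001", "1111")
pattern <- "001|100|000"

idx <- grep(pattern, x)    # integer positions
hit <- grepl(pattern, x)   # logical mask

print(identical(idx, which(hit)))      # TRUE
print(x[hit])                          # "1100" "0010" "1001"
print(grep(pattern, x, value = TRUE))  # same elements, returned directly
```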

See ?regex for help about regular expressions in R.


Edit:
To avoid creating the pattern manually, we can use paste:

myValues <- c("001", "100", "000")
pattern <- paste(myValues, collapse = "|")
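
Collapsing with "|" treats every value as a regular expression, so values containing characters such as "." or "+" may match more (or less) than intended. One possible workaround, sketched here with made-up values, is to test each value literally with fixed = TRUE and combine the results with a logical OR:

```r
# Hypothetical values that contain regex metacharacters.
myValues <- c("1.5", "a+b")
x <- c("1.5 kg", "105 kg", "a+b=c", "aab")

# Regex alternation: "." matches any character, "a+" means one or more "a".
as_regex <- grepl(paste(myValues, collapse = "|"), x)

# Literal matching: one fixed-string test per value, combined with OR.
as_literal <- Reduce(`|`, lapply(myValues, grepl, x, fixed = TRUE))

print(as_regex)    # TRUE TRUE FALSE TRUE
print(as_literal)  # TRUE FALSE TRUE FALSE
```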

Find names that match either of two patterns

You can use:

pattern <- c("id", "group")
grep(paste0(pattern, collapse = '|'), names(a), value = TRUE)
#[1] "c-id" "g_igroups"

With grepl you get a logical vector instead:

grepl(paste0(pattern, collapse = '|'), names(a))
#[1] TRUE TRUE FALSE

A stringr solution:

stringr::str_subset(names(a), paste0(pattern, collapse = '|'))
#[1] "c-id" "g_igroups"
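
The object a is not defined in the snippets above. The sketch below assumes it is a data frame whose first two column names are the ones shown in the output; the third column name and all values are made up. It also shows how the logical vector can be used to keep only the matching columns:

```r
# Hypothetical data frame; check.names = FALSE keeps the name "c-id" as is.
a <- data.frame(`c-id` = 1:2, g_igroups = c("x", "y"), value = c(0.1, 0.2),
                check.names = FALSE)
pattern <- c("id", "group")

keep <- grepl(paste0(pattern, collapse = "|"), names(a))
selected <- a[keep]   # data frame with only the matching columns

print(keep)             # TRUE TRUE FALSE
print(names(selected))  # "c-id" "g_igroups"
```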

R grep: Match one string against multiple patterns

What about applying the regexpr function over a vector of keywords?

keywords <- c("dog", "cat", "bird")

strings <- c("Do you have a dog?", "My cat ate by bird.", "Let's get icecream!")

sapply(keywords, regexpr, strings, ignore.case=TRUE)

     dog cat bird
[1,]  15  -1   -1
[2,]  -1   4   15
[3,]  -1  -1   -1

sapply(keywords, regexpr, strings[1], ignore.case=TRUE)

 dog  cat bird
  15   -1   -1

Values returned are the position of the first character in the match, with -1 meaning no match.

If the position of the match is irrelevant, use grepl instead:

sapply(keywords, grepl, strings, ignore.case=TRUE)

       dog   cat  bird
[1,]  TRUE FALSE FALSE
[2,] FALSE  TRUE  TRUE
[3,] FALSE FALSE FALSE
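
The logical matrix has one row per string and one column per keyword, so it can be summarised in either direction. A minimal sketch, where the variable names hits, any_hit and matched are assumptions for this illustration:

```r
keywords <- c("dog", "cat", "bird")
strings  <- c("Do you have a dog?", "My cat ate by bird.", "Let's get icecream!")

hits <- sapply(keywords, grepl, strings, ignore.case = TRUE)

# Does each string contain at least one keyword?
any_hit <- rowSums(hits) > 0

# Which keywords were found in each string? (list, one entry per string)
matched <- lapply(seq_len(nrow(hits)), function(i) keywords[hits[i, ]])

print(any_hit)  # TRUE TRUE FALSE
print(matched)
```

lapply is used rather than apply so that the result is always a list, even if every string happens to match the same number of keywords.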

Update: This runs relatively quickly on my system, even with a large number of keywords:

# Available on most *nix systems
words <- scan("/usr/share/dict/words", what="")
length(words)
[1] 234936

system.time(matches <- sapply(words, grepl, strings, ignore.case=TRUE))

   user  system elapsed
  7.495   0.155   7.596

dim(matches)
[1] 3 234936

R: grep multiple patterns and switch result?

First, construct a pattern string of city names:

> pattern = "(Houston)|(San Antonio)|(San Diego)|(Phoenix)"

Then use stringr::str_extract_all or stringr::str_extract to extract the city names from each string and add them as a new column to the data frame.

> stringr::str_extract("Houston is a good city.","(Houston)|(San Antonio)|(San Diego)|(Phoenix)")
[1] "Houston"

Then merge with the earlier data frames to obtain the desired data frame.
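
The whole extract-then-merge idea can be sketched end to end. Everything below is an assumption made for illustration: the two data frames, their column names, and the reading that the goal is to map each matched city to its state. To keep the sketch free of package dependencies, base R's regexpr and regmatches are used in place of stringr::str_extract:

```r
# Hypothetical lookup table and input data.
cities <- data.frame(city  = c("Houston", "San Antonio", "San Diego", "Phoenix"),
                     state = c("Texas", "Texas", "California", "Arizona"),
                     stringsAsFactors = FALSE)
posts <- data.frame(text = c("Houston is a good city.",
                             "I moved to San Diego last year.",
                             "No city mentioned here."),
                    stringsAsFactors = FALSE)

# Build the alternation from the lookup table and extract the first match.
pattern <- paste(cities$city, collapse = "|")
m <- regexpr(pattern, posts$text)
found <- as.integer(m) != -1L

posts$city <- NA_character_
posts$city[found] <- regmatches(posts$text, m)  # only matched rows are returned

# Left join: rows without a city keep NA in the state column.
result <- merge(posts, cities, by = "city", all.x = TRUE)
print(result)
```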


