R: using grepl() to find whether multiple strings exist
Text <- c("instance", "percentage", "n",
"instance percentage", "percentage instance")
grepl("instance|percentage", Text)
# TRUE TRUE FALSE TRUE TRUE
grepl("instance.*percentage|percentage.*instance", Text)
# FALSE FALSE FALSE TRUE TRUE
The second pattern works by looking for either:
('instance')(any character sequence)('percentage')
OR
('percentage')(any character sequence)('instance')
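Naturally, if you need to match more than two words in any order, this gets complicated quickly, because every ordering has to be spelled out (n words need n! alternatives). In that case the solution mentioned in the comments, combining separate grepl() calls with &, is easier to write and read.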
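To see how quickly the ordered approach grows, the sketch below generates the alternation for any number of words by listing every ordering. perms() and order_pattern() are hypothetical helpers written for illustration, not functions from base R or any package; for two words they reproduce the pattern used above, and for four words they already produce 24 alternatives.

```r
# Hypothetical helper: every ordering (permutation) of a character vector
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    # put v[i] first, followed by each ordering of the remaining words
    for (rest in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], rest)
  }
  out
}

# Hypothetical helper: join words within an ordering with ".*", orderings with "|"
order_pattern <- function(words) {
  paste(vapply(perms(words), paste, character(1), collapse = ".*"), collapse = "|")
}

Text <- c("instance", "percentage", "n",
          "instance percentage", "percentage instance")
pat2 <- order_pattern(c("instance", "percentage"))
pat2
# "instance.*percentage|percentage.*instance"
grepl(pat2, Text)
# FALSE FALSE FALSE TRUE TRUE
length(perms(c("instance", "percentage", "element", "character")))
# 24
```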
Another alternative, useful when matching many words, is positive look-ahead (which can be thought of as a 'non-consuming' match). For this you have to enable Perl-compatible regular expressions with perl = TRUE.
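As a quick check on the two-word example from the start, the look-ahead version gives the same result as the ordered alternation without spelling out both orders. Each (?=.*word) only asserts that the word appears somewhere after the current position and consumes nothing, so all look-aheads are tested from the same starting point and word order does not matter.

```r
Text <- c("instance", "percentage", "n",
          "instance percentage", "percentage instance")
# ordered alternation: both orders have to be written out
ordered <- grepl("instance.*percentage|percentage.*instance", Text)
# look-aheads: each one only asserts that its word exists (needs perl = TRUE)
lookahead <- grepl("(?=.*instance)(?=.*percentage)", Text, perl = TRUE)
lookahead
# FALSE FALSE FALSE TRUE TRUE
identical(ordered, lookahead)
# TRUE
```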
# create a vector of word combinations
set.seed(1)
words <- c("instance", "percentage", "element",
"character", "n", "o", "p")
Text2 <- replicate(10, paste(sample(words, 5), collapse=" "))
# grepl with multiple positive look-ahead
longperl <- grepl("(?=.*instance)(?=.*percentage)(?=.*element)(?=.*character)",
Text2, perl=TRUE)
# this is equivalent to the solution proposed in the comments
longstrd <- grepl("instance", Text2) &
grepl("percentage", Text2) &
grepl("element", Text2) &
grepl("character", Text2)
# they produce identical results
identical(longperl, longstrd)
Furthermore, if you have the patterns stored in a vector, you can condense both expressions significantly:
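pat <- c("instance", "percentage", "element", "character")
# build "(?=.*instance)(?=.*percentage)(?=.*element)(?=.*character)" from the vector
longperl <- grepl(paste0("(?=.*", pat, ")", collapse=""), Text2, perl=TRUE)
# matrix with one column per pattern; TRUE - 1L is 0 and FALSE - 1L is -1,
# so a row sums to 0 only when every pattern matched
longstrd <- rowSums(sapply(pat, grepl, Text2) - 1L) == 0L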
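Another way to combine one grepl() call per pattern, offered as a sketch rather than the author's code, is to fold the results together with Reduce() and &. Because each pattern here is a plain word, fixed = TRUE can be used to skip regex interpretation. Unlike the sapply()/rowSums() version, this also works when the text vector has length one (sapply() then returns a plain vector, and rowSums() needs a matrix).

```r
set.seed(1)
words <- c("instance", "percentage", "element",
           "character", "n", "o", "p")
Text2 <- replicate(10, paste(sample(words, 5), collapse = " "))
pat <- c("instance", "percentage", "element", "character")

# one logical vector per pattern, combined element-wise with `&`
all_found <- Reduce(`&`, lapply(pat, grepl, x = Text2, fixed = TRUE))

# same check with the look-ahead regex
longperl <- grepl(paste0("(?=.*", pat, ")", collapse = ""), Text2, perl = TRUE)
identical(all_found, longperl)
# TRUE

# also works for a single string
Reduce(`&`, lapply(pat, grepl, x = Text2[1], fixed = TRUE))
```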
As asked for in the comments, if you want to match exact words only, i.e. not substrings, you can specify word boundaries using \\b. E.g.:
tx <- c("cent element", "percentage element", "element cent", "element centimetre")
grepl("(?=.*\\bcent\\b)(?=.*element)", tx, perl=TRUE)
# TRUE FALSE TRUE FALSE
grepl("element", tx) & grepl("\\bcent\\b", tx)
# TRUE FALSE TRUE FALSE
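If the words are stored in a vector, the word-boundary version can be built the same way as the paste0() look-ahead above, wrapping each word in \\b...\\b. Here pat is an example vector chosen for illustration; both words get boundaries, which gives the same result on tx.

```r
tx <- c("cent element", "percentage element", "element cent", "element centimetre")
pat <- c("cent", "element")
# pattern: (?=.*\bcent\b)(?=.*\belement\b)
whole_words <- grepl(paste0("(?=.*\\b", pat, "\\b)", collapse = ""), tx, perl = TRUE)
whole_words
# TRUE FALSE TRUE FALSE
```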
Function to search for multiple patterns using grep
I think you can use a recursive function:
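search() {
    if [ $# -gt 0 ]; then
        # filter stdin by the first pattern, then recurse on the remaining patterns
        local pat=$1
        shift
        grep "$pat" | search "$@"
    else
        # no patterns left: pass the remaining lines through unchanged
        cat
    fi
}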
In your script you would call this function and pass the search patterns as arguments. Say $1 is the file and the remaining arguments are the patterns; then you would do:
file=$1
shift
cat "$file" | search "$@"
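For comparison, the same idea of narrowing the lines one pattern at a time can be sketched in R with Reduce(), where each step keeps only the lines that match the next pattern. search_lines() is a hypothetical helper, and the temporary file with made-up lines stands in for $file.

```r
# Hypothetical helper: keep lines matching every pattern, one pattern per step
search_lines <- function(lines, patterns) {
  # with no patterns, Reduce() returns `lines` unchanged (like `cat`)
  Reduce(function(kept, p) grep(p, kept, value = TRUE), patterns, lines)
}

tmp <- tempfile()
writeLines(c("alpha beta", "beta gamma", "alpha gamma beta"), tmp)
result <- search_lines(readLines(tmp), c("alpha", "beta"))
result
# "alpha beta" "alpha gamma beta"
unlink(tmp)
```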
Match two strings in one line with grep
You can use
grep 'string1' filename | grep 'string2'
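Or, in a single grep (alternation with \| in a basic regular expression is a GNU grep extension):
grep 'string1.*string2\|string2.*string1' filename
The portable form uses an extended regular expression:
grep -E 'string1.*string2|string2.*string1' filename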
Grep multiple patterns and print pattern and its previous lines
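You may try this awk command:
awk '/^(in|out)put/{print p; print} /^(cell|function|Type)/; {p = $0}' file
For each line starting with input or output it prints the previous line (kept in p) and then the line itself; lines starting with cell, function or Type are printed unchanged, and every line is saved in p before moving on. Output: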
cell A
function (A1A2)
Type combinational
(CO)
output
(A1)
input
(A2)
input
cell X
function ((A1+A2)B)
Type combinational
(Z)
output
(A1)
input
(A2)
input
(B)
input
grep multiple patterns with an OR condition on grouped data
I used dplyr and grep() to get the desired result. Below is the code:
library(dplyr)
pattern <- c("Beach", "sand", "warm")
df <- data.frame(group_id= c(1, 1, 1, 1, 2, 1, 2, 3, 4),
words = c("beach", "sand", "trip", "warm","travel", "water","beach","sand", "trees"),
ID = c("vacation", "vacation", "vacation", "vacation", "meeting","vacation","meeting","onduty", "hiking"))
x <- df %>%
group_by(group_id) %>%
summarise(words = paste(words, collapse = " "))
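# grep() returns the row positions in x whose words contain the pattern as a whole word;
# simplify = FALSE keeps a named list even when every pattern matches the same number of groups
y <- sapply(pattern, function(d) grep(paste0("\\b", d, "\\b"), x$words, ignore.case = TRUE), simplify = FALSE)
y <- setNames(unlist(y, use.names = FALSE), rep(names(y), lengths(y)))
# translate row positions of x into group_id values (they only coincide when group_id is 1, 2, 3, ...)
y <- data.frame(Match_pattern = names(y), group_id = x$group_id[y], row.names = NULL)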
y <- y %>%
group_by(group_id) %>%
summarise(Match_pattern = paste(Match_pattern, collapse = ", "))
out <- merge(df, y, by = "group_id", all.x = TRUE)
out$N <- ifelse(is.na(out$Match_pattern), 0, 1)
> out
group_id words ID Match_pattern N
1 1 sand vacation Beach, sand, warm 1
2 1 trip vacation Beach, sand, warm 1
3 1 warm vacation Beach, sand, warm 1
4 1 beach vacation Beach, sand, warm 1
5 1 water vacation Beach, sand, warm 1
6 2 beach meeting Beach 1
7 2 travel meeting Beach 1
8 3 sand onduty sand 1
9 4 trees hiking <NA> 0
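If you'd rather avoid dplyr and the merge() step (which reorders the rows, as the output shows), here is a base-R sketch of the same whole-word, case-insensitive matching. It is an alternative written for illustration, not the code above; collapsed and matched are just illustrative names.

```r
pattern <- c("Beach", "sand", "warm")
df <- data.frame(group_id = c(1, 1, 1, 1, 2, 1, 2, 3, 4),
                 words = c("beach", "sand", "trip", "warm", "travel", "water", "beach", "sand", "trees"),
                 ID = c("vacation", "vacation", "vacation", "vacation", "meeting", "vacation", "meeting", "onduty", "hiking"),
                 stringsAsFactors = FALSE)

# collapse the words of each group into one string, named by group_id
collapsed <- sapply(split(df$words, df$group_id), paste, collapse = " ")

# for each group, list the patterns found as whole words (case-insensitive)
matched <- vapply(collapsed, function(s) {
  hit <- vapply(pattern, function(p) grepl(paste0("\\b", p, "\\b"), s, ignore.case = TRUE), logical(1))
  if (any(hit)) paste(pattern[hit], collapse = ", ") else NA_character_
}, character(1))

# look each row's group up by name, keeping the original row order
df$Match_pattern <- unname(matched[as.character(df$group_id)])
df$N <- as.integer(!is.na(df$Match_pattern))
df
```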
Use grepl to search either of multiple substrings in a text
You could paste the genres together with an "or" (|) separator and run that through grepl() as a single regular expression.
x <- c("Action", "Adventure", "Animation", ...)
grepl(paste(x, collapse = "|"), my_text)
Here's an example.
x <- c("Action", "Adventure", "Animation")
my_text <- c("This one has Animation.", "This has none.", "Here is Adventure.")
grepl(paste(x, collapse = "|"), my_text)
# [1] TRUE FALSE TRUE
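Two variations on the pasted pattern, sketched with a genres vector taken from the example: wrapping the alternation in \\b(...)\\b so that a genre only matches as a whole word, and, if genre names might contain regex metacharacters, testing each one literally with fixed = TRUE and combining the results with |. The fourth string in my_text is made up to show the difference.

```r
genres <- c("Action", "Adventure", "Animation")
my_text <- c("This one has Animation.", "This has none.", "Here is Adventure.", "Animations galore")

# plain alternation: also matches "Animation" inside "Animations"
grepl(paste(genres, collapse = "|"), my_text)
# TRUE FALSE TRUE TRUE

# whole words only
grepl(paste0("\\b(", paste(genres, collapse = "|"), ")\\b"), my_text)
# TRUE FALSE TRUE FALSE

# literal (non-regex) matching, one grepl() per genre, combined with `|`
Reduce(`|`, lapply(genres, grepl, x = my_text, fixed = TRUE))
# TRUE FALSE TRUE TRUE
```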