Match a string against multiple regex patterns
If you have just a few regexes, and they are all known at compile time, then this can be enough:
private static final Pattern
rx1 = Pattern.compile("..."),
rx2 = Pattern.compile("..."),
...;
return rx1.matcher(s).matches() || rx2.matcher(s).matches() || ...;
If there are more of them, or they are loaded at runtime, then use a list of patterns:
final List<Pattern> rxs = new ArrayList<>();
for (Pattern rx : rxs) if (rx.matcher(input).matches()) return true;
return false;
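For comparison, the same list-based check can be sketched in Python, the language used by a later answer on this page. The helper name matches_any and the sample patterns are hypothetical; re.fullmatch is used because, like Java's Matcher.matches(), it requires the whole input to match.

```python
import re

# Hypothetical sample patterns; in practice these could be loaded at runtime.
PATTERNS = [re.compile(p) for p in (r"\d+", r"[a-z]+-\d+")]


def matches_any(text, patterns=PATTERNS):
    """Return True if at least one pattern matches the whole of `text`."""
    # fullmatch mirrors Java's Matcher.matches(): the entire string must match.
    return any(rx.fullmatch(text) for rx in patterns)
```

Compiling the patterns once and reusing them matters more as the list grows, which is the same reason the Java version keeps a List<Pattern> rather than a list of strings.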
How to Match a String Against Multiple Regex Patterns in Java
You probably meant the regex to be
[A-Z][a-z]*|(?<=_)[a-z-]*
The first alternative matches a word that starts with an uppercase letter followed by lowercase letters; the second matches a lowercase word preceded by an underscore.
The part of your posted regex (?<=_)[A-Za-z-]*
matches both lowercase and uppercase letters after an underscore, i.e. it does not stop when it meets an uppercase letter, which should in fact be the start of another word.
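A quick way to see the difference is to run both variants over one input. This is only a sketch in Python: the sample string is made up, and the "posted" variant assumes the first alternative was the same as in the corrected regex, since only its second part is quoted above.

```python
import re

# Corrected regex from the answer above.
corrected = re.compile(r"[A-Z][a-z]*|(?<=_)[a-z-]*")
# Assumed shape of the posted regex: only the part after "|" is known.
posted = re.compile(r"[A-Z][a-z]*|(?<=_)[A-Za-z-]*")

sample = "HelloWorld_fooBar"  # hypothetical input

corrected_words = corrected.findall(sample)  # splits "foo" and "Bar"
posted_words = posted.findall(sample)        # keeps "fooBar" as one word
```

Note that the second alternative can match an empty string right after a trailing underscore, so inputs ending in "_" may produce an empty entry.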
How to match a string against multiple regex?
For the regular expressions given in the question, you can use the following regular expression, which uses a character class:
[admut]-
- [admut] will match any of a, d, m, u, t.
- ^ can be omitted because re.match matches only at the beginning of the string.
- Removed -* because it's pointless; a single - is enough to check that - appears after a/d/m/u/t.
And instead of using an array, you can use a dictionary; there is no need to remember indexes:
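import re

def countbycat(tempfilter):
    # One counter per category letter, all starting at 0.
    count = dict.fromkeys('admut', 0)
    pattern = re.compile("[admut]-")
    for each in tempfilter:
        # re.match anchors at the start, so each[0] is the category letter.
        if pattern.match(each):
            count[each[0]] += 1
    return count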
Instead of dict.fromkeys, you can use collections.Counter.
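A minimal sketch of the collections.Counter variant might look like this. The function name count_by_cat and the sample items in the usage note are assumptions made for illustration.

```python
import re
from collections import Counter

PATTERN = re.compile(r"[admut]-")


def count_by_cat(items):
    """Count items per leading category letter (a, d, m, u or t)."""
    # re.match anchors at the start, so item[0] is the matched category letter.
    return Counter(item[0] for item in items if PATTERN.match(item))
```

One difference from dict.fromkeys: categories that never occur are not stored as keys, although looking them up (for example count_by_cat(["a-1"])["u"]) still yields 0.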
Match strings with multiple regex patterns in javascript
Using this line in the loop
if(!match.index || !regexObj.lastIndex) break;
will stop the loop as soon as either condition in the if clause is true.
If either match.index or regexObj.lastIndex is zero, the condition is true and the loop stops. This happens, for example, when there is a match on the first character, because its index is 0.
You can also switch the order of the patterns, putting the most specific one first. The first character of the email is also matched by [a-z], so with [a-z] first the email is never matched as a whole.
Remember to omit the anchors ^ and $ from the email pattern, or else the email will only match if it is the entire string.
let str = "I am abinas patra and my email is abinas@gmail.com";
let patterns = [
  // The dot is written as \\. because the pattern is a string literal.
  "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}",
  "[a-z]"
];
let regexObj = new RegExp(patterns.join("|"), "gmi");
let match, indicesArr = [];
while ((match = regexObj.exec(str))) {
  let obj = {
    start: match.index,
    end: regexObj.lastIndex
  };
  indicesArr.push(obj);
}
console.log(indicesArr);
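The effect of the pattern order can be checked with a small Python sketch of the same idea. The helper name spans is hypothetical, and re.finditer stands in for the exec loop.

```python
import re

text = "I am abinas patra and my email is abinas@gmail.com"

EMAIL = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}"
LETTER = r"[a-z]"


def spans(patterns, s):
    """Return (start, end) pairs for every match of the joined alternation."""
    rx = re.compile("|".join(patterns), re.IGNORECASE)
    return [(m.start(), m.end()) for m in rx.finditer(s)]


# Most specific pattern first: the email is matched as one span.
specific_first = spans([EMAIL, LETTER], text)
# Generic pattern first: every letter of the email is matched on its own.
generic_first = spans([LETTER, EMAIL], text)
```

Alternation is tried left to right at each position, which is why the more specific pattern has to come first.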
Java how to check multiple regex patterns against an input?
To collect the matched string in the result you may need to create a group in your regexp if you are matching less than the entire string:
List<Pattern> patterns = new ArrayList<>();
patterns.add(Pattern.compile("(TST\\w+)"));
...
Optional<String> result = Optional.empty();
for (Pattern pattern : patterns) {
    // input is the string being checked
    Matcher matcher = pattern.matcher(input);
    if (matcher.matches()) {
        result = Optional.of(matcher.group(1));
        break;
    }
}
Or, if you are familiar with streams:
Optional<String> result = patterns.stream()
    .map(p -> p.matcher(input)).filter(Matcher::matches)
    .map(m -> m.group(1)).findFirst();
The alternative is to use find (as in @Raffaele's answer), which implicitly creates a group for the matched substring.
Another alternative you may want to consider is to put all your matches into a single pattern.
Pattern pattern = Pattern.compile("(TST\\w+|TWT\\w+|...)");
Then you can match and group in a single operation. However, this might make it harder to change the patterns over time.
Group 1 is the first matched group (i.e. the match inside the first set of parentheses). Group 0 is the entire match. So if you want the entire match (I wasn't sure from your question) then you could perhaps use group 0.
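The group numbering works the same way in Python's re module, so the distinction can be sketched there. The patterns below are hypothetical, modelled on the TST/TWT example, and first_match is an illustrative helper rather than part of the original answer; it uses search, which behaves like Java's Matcher.find().

```python
import re

# Hypothetical patterns modelled on the TST/TWT example above.
PATTERNS = [re.compile(r"(TST\w+)"), re.compile(r"(TWT\w+)")]


def first_match(text, patterns=PATTERNS):
    """Return group 1 of the first pattern found anywhere in `text`, else None."""
    for rx in patterns:
        m = rx.search(text)  # like Matcher.find(): the match may start anywhere
        if m:
            return m.group(1)
    return None


# Group 0 is the entire match; group 1 is only the parenthesised part.
m = re.search(r"id=(TST\w+)", "id=TST42")
whole, captured = m.group(0), m.group(1)
```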
R grep: Match one string against multiple patterns
What about applying the regexpr function over a vector of keywords?
keywords <- c("dog", "cat", "bird")
strings <- c("Do you have a dog?", "My cat ate by bird.", "Let's get icecream!")
sapply(keywords, regexpr, strings, ignore.case=TRUE)
     dog cat bird
[1,]  15  -1   -1
[2,]  -1   4   15
[3,]  -1  -1   -1
sapply(keywords, regexpr, strings[1], ignore.case=TRUE)
 dog  cat bird
  15   -1   -1
Values returned are the position of the first character in the match, with -1 meaning no match.
If the position of the match is irrelevant, use grepl instead:
sapply(keywords, grepl, strings, ignore.case=TRUE)
       dog   cat  bird
[1,]  TRUE FALSE FALSE
[2,] FALSE  TRUE  TRUE
[3,] FALSE FALSE FALSE
Update: This runs relatively quickly on my system, even with a large number of keywords:
# Available on most *nix systems
words <- scan("/usr/share/dict/words", what="")
length(words)
[1] 234936
system.time(matches <- sapply(words, grepl, strings, ignore.case=TRUE))
   user  system elapsed
  7.495   0.155   7.596
dim(matches)
[1] 3 234936
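For readers who do not use R, the grepl table above can be approximated in Python. This is a sketch under assumptions: match_table is a hypothetical helper, and the keywords are treated as regular expressions (not escaped), as grepl does by default.

```python
import re

keywords = ["dog", "cat", "bird"]
strings = ["Do you have a dog?", "My cat ate by bird.", "Let's get icecream!"]


def match_table(keywords, strings):
    """Return one dict per string, mapping each keyword to True/False."""
    # Keywords are compiled as regexes, case-insensitively, like grepl with ignore.case=TRUE.
    compiled = {k: re.compile(k, re.IGNORECASE) for k in keywords}
    return [{k: bool(rx.search(s)) for k, rx in compiled.items()} for s in strings]


table = match_table(keywords, strings)
```

Each row of table corresponds to one input string, in the same order as the rows of the R matrix.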
RegEx match multiple string conditions in AND operator
You can use a single positive lookahead to make sure that key1 is present with at least one occurrence of d followed by a digit 1-4.
Then you can use another lookahead to assert one of key1, key2 or key3 with the allowed digits.
Note that you can shorten an alternation such as (a1|a2) to a character class a[12].
^(?=.*key1=[a-z0-9,]*d[1-4])(?=.*(?:key1=a[12]|key2=b[123]|key3=c[123])).+
The pattern matches:
- ^ Start of string
- (?= Positive lookahead
  - .*key1=[a-z0-9,]*d[1-4] Match key1= having a value of d1, d2, d3 or d4, optionally matching the allowed characters [a-z0-9,]* that precede it
- ) Close lookahead
- (?=.* Positive lookahead, assert that what is to the right is
  - (?: Non-capturing group to list the alternatives
    - key1=a[12] Match key1=a1 or key1=a2
    - | Or
    - key2=b[123] Match key2 with the allowed values
    - | Or
    - key3=c[123] Match key3 with the allowed values
  - ) Close non-capturing group
- ) Close positive lookahead
- .+ Match 1 or more characters
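The two lookaheads can be exercised with a short Python sketch. The input strings are invented for illustration, since the exact format of the data is not shown here, and satisfies is a hypothetical helper name.

```python
import re

PATTERN = re.compile(
    r"^(?=.*key1=[a-z0-9,]*d[1-4])"
    r"(?=.*(?:key1=a[12]|key2=b[123]|key3=c[123])).+"
)


def satisfies(line):
    """Return True if `line` passes both lookahead conditions."""
    return PATTERN.match(line) is not None


# Hypothetical inputs: each is paired with the reason it passes or fails.
checks = {
    "key1=a1,d2": satisfies("key1=a1,d2"),                # both conditions hold
    "key1=d3 key2=b2": satisfies("key1=d3 key2=b2"),      # d3, then key2=b2
    "key1=a1": satisfies("key1=a1"),                      # no d1-d4 in key1
    "key1=x,d1 key3=c9": satisfies("key1=x,d1 key3=c9"),  # no allowed a/b/c value
}
```

Because both lookaheads start at the beginning of the string, they act as independent AND conditions; the final .+ only consumes the line once both have succeeded.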
Efficiently querying one string against multiple regexes
Martin Sulzmann has done quite a bit of work in this field.
He has a HackageDB project, explained briefly here, which uses partial derivatives and seems to be tailor-made for this.
The language used is Haskell and thus will be very hard to translate to a non functional language if that is the desire (I would think translation to many other FP languages would still be quite hard).
The code is not based on converting to a series of automata and then combining them, instead it is based on symbolic manipulation of the regexes themselves.
Also the code is very much experimental and Martin is no longer a professor but is in 'gainful employment'(1) so may be uninterested/unable to supply any help or input.
(1) This is a joke: I like professors; the less the smart ones try to work, the more chance I have of getting paid!
How to match pattern against multiple strings and store in data.frame in R
Using outer and a Vectorized grepl.
r <- sapply(dat2[-1], \(x) +outer(dat1$pattern, x, Vectorize(grepl)))
cbind(dat1[rep(seq_len(nrow(dat1)), each=nrow(dat2)), ], id2=dat2$id2, r)
#     id1        pattern  id2 description description2
# 1     1          apple 1174           1            0
# 1.1   1          apple 1231           0            0
# 2     1      applejack 1174           0            0
# 2.1   1      applejack 1231           0            0
# 3     2 bananas, sweet 1174           0            0
# 3.1   2 bananas, sweet 1231           0            1