Match a String Against Multiple Patterns

Match a string against multiple regex patterns

If you have just a few regexes, and they are all known at compile time, then this can be enough:

private static final Pattern
rx1 = Pattern.compile("..."),
rx2 = Pattern.compile("..."),
...;

return rx1.matcher(s).matches() || rx2.matcher(s).matches() || ...;

If there are more of them, or they are loaded at runtime, then use a list of patterns:

final List<Pattern> rxs = new ArrayList<>();


for (Pattern rx : rxs) if (rx.matcher(input).matches()) return true;
return false;
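
For comparison, the same check can be sketched in Python. This is only an illustration, assuming the patterns arrive as plain strings at runtime; compile_all, matches_any and the sample patterns are hypothetical names, not part of the answer above. re.fullmatch plays the role of Java's matches(), which requires the whole string to match.

```python
import re

def compile_all(sources):
    """Compile pattern strings once so they can be reused for many inputs."""
    return [re.compile(src) for src in sources]

def matches_any(patterns, text):
    """Return True if the whole of text matches at least one pattern."""
    return any(rx.fullmatch(text) for rx in patterns)

# Hypothetical patterns, e.g. read from a config file at startup.
PATTERNS = compile_all([r"\d{4}-\d{2}-\d{2}", r"[A-Z]{3}\d+"])

print(matches_any(PATTERNS, "2024-01-31"))
print(matches_any(PATTERNS, "ABC123x"))
```

Compiling once and reusing the compiled objects mirrors the static final Pattern fields in the Java version.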

How to Match a String Against Multiple Regex Patterns in Java

You probably meant the regex to be

[A-Z][a-z]*|(?<=_)[a-z-]*

The first part matches a word that starts with an uppercase letter followed by lowercase letters; the second part matches a lowercase word (hyphens allowed) preceded by an underscore.

The part of your posted regex, (?<=_)[A-Za-z-]*, matches both lowercase and uppercase letters after an underscore, i.e. it does not stop matching when it meets an uppercase letter, which should in fact be the start of another word.
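
The difference between the two regexes can be seen in a small Python sketch. The input string is hypothetical (the original question's data is not shown here), and words is an illustrative helper that drops the empty matches the second alternative can produce.

```python
import re

POSTED = re.compile(r"[A-Z][a-z]*|(?<=_)[A-Za-z-]*")
FIXED = re.compile(r"[A-Z][a-z]*|(?<=_)[a-z-]*")

def words(rx, text):
    """Return the non-empty matches of rx in text, left to right."""
    return [m for m in rx.findall(text) if m]

sample = "Foo_barBaz"  # hypothetical input

print(words(POSTED, sample))  # the word after "_" swallows "Baz"
print(words(FIXED, sample))   # "Baz" starts a new word
```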

How to match a string against multiple regex?

For the regular expressions given in the question, you can use the following regular expression with a character class:

[admut]-
  • [admut] will match any of a, d, m, u, t
  • ^ can be omitted because re.match matches only at the beginning of the string.
  • -* was removed because it is pointless; a single - is enough to check that - appears after a/d/m/u/t.

And instead of an array, you can use a dictionary, so there is no need to remember indexes:

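import re

def countbycat(tempfilter):
    # One counter per category letter, all starting at zero.
    count = dict.fromkeys('admut', 0)
    pattern = re.compile("[admut]-")
    for each in tempfilter:
        # re.match anchors at the start, so each[0] is the category letter.
        if pattern.match(each):
            count[each[0]] += 1
    return count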

Instead of dict.fromkeys, you can use collections.Counter.
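
A minimal sketch of the collections.Counter variant, assuming the same input shape as above (an iterable of strings); count_by_category is an illustrative name.

```python
import re
from collections import Counter

def count_by_category(items):
    """Count items by their first letter when they start with [admut]-."""
    pattern = re.compile(r"[admut]-")
    return Counter(item[0] for item in items if pattern.match(item))

counts = count_by_category(["a-1", "d-x", "a-2", "x-1", "ma-"])
print(counts["a"], counts["d"], counts["u"])
```

Unlike dict.fromkeys, a Counter stores only the categories that actually occurred, but looking up a missing key still yields 0 rather than raising KeyError.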

Match strings with multiple regex patterns in javascript

Using the line if(!match.index || !regexObj.lastIndex) break; in the loop will stop the loop as soon as either condition in the if clause is true.

If either match.index or regexObj.lastIndex is zero, the condition is true and the loop stops. This happens, for example, when there is a match at the first character, because its index is 0.

You can also switch the order of the patterns, putting the most specific one first. Otherwise the first character of the email is matched by [a-z], so the email as a whole is never matched.

Note that you should omit the anchors ^ and $ from the email pattern, or else the email will match only if it is the entire string.

let str = "I am abinas patra and my email is abinas@gmail.com";
let patterns = [
  // Most specific pattern first; "\\." is needed so the regex sees an escaped dot.
  "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}",
  "[a-z]"
];
let regexObj = new RegExp(patterns.join("|"), "gmi");
let match, indicesArr = [];
// exec() with the g flag advances lastIndex after every match.
while ((match = regexObj.exec(str))) {
  let obj = {
    start: match.index,
    end: regexObj.lastIndex
  };
  indicesArr.push(obj);
}
console.log(indicesArr);
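
The effect of pattern order can be checked with a short Python sketch that uses the same two patterns and the same sentence. match_spans is a hypothetical helper; re.finditer stands in for the exec loop.

```python
import re

EMAIL = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}"
LETTER = r"[a-z]"

def match_spans(patterns, text):
    """Join patterns with | and return (start, end) for every match."""
    combined = re.compile("|".join(patterns), re.IGNORECASE)
    return [m.span() for m in combined.finditer(text)]

text = "I am abinas patra and my email is abinas@gmail.com"
specific_first = match_spans([EMAIL, LETTER], text)
generic_first = match_spans([LETTER, EMAIL], text)

print(specific_first[-1])  # span of the whole email address
print(generic_first[-1])   # span of a single trailing letter
```

With the generic pattern first, every letter of the address is consumed one at a time, so the email alternative never gets a chance to match.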

Java how to check multiple regex patterns against an input?

To collect the matched string in the result you may need to create a group in your regexp if you are matching less than the entire string:

List<Pattern> patterns = new ArrayList<>();
patterns.add(Pattern.compile("(TST\\w+)"));
...

// input is the string being checked
Optional<String> result = Optional.empty();
for (Pattern pattern : patterns) {
    Matcher matcher = pattern.matcher(input);
    if (matcher.matches()) {
        result = Optional.of(matcher.group(1));
        break;
    }
}

Or, if you are familiar with streams:

Optional<String> result = patterns.stream()
    .map(p -> p.matcher(input)).filter(Matcher::matches)
    .map(m -> m.group(1)).findFirst();

An alternative is to use find (as in @Raffaele's answer); the matched substring is then available as group 0, so no explicit group is needed.

Another alternative you may want to consider is to put all your matches into a single pattern.

Pattern pattern = Pattern.compile("(TST\\w+|TWT\\w+|...)");

Then you can match and group in a single operation. However, this might make it harder to change the patterns over time.

Group 1 is the first matched group (i.e. the match inside the first set of parentheses). Group 0 is the entire match. So if you want the entire match (I wasn't sure from your question) then you could perhaps use group 0.
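
The group 0 versus group 1 distinction works the same way in Python's re module, which makes for a compact sketch. The function names and sample inputs are hypothetical; fullmatch corresponds roughly to Java's matches() and search to find().

```python
import re

PATTERNS = [re.compile(r"(TST\w+)"), re.compile(r"(TWT\w+)")]

def first_full_match(patterns, text):
    """Whole text must match a pattern; return its first capture group."""
    for rx in patterns:
        m = rx.fullmatch(text)
        if m:
            return m.group(1)
    return None

def first_found(patterns, text):
    """Match anywhere in text; group 0 is the matched substring."""
    for rx in patterns:
        m = rx.search(text)
        if m:
            return m.group(0)
    return None

print(first_full_match(PATTERNS, "TWT42"))
print(first_found(PATTERNS, "id=TST123;"))
```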

R grep: Match one string against multiple patterns

What about applying the regexpr function over a vector of keywords?

keywords <- c("dog", "cat", "bird")

strings <- c("Do you have a dog?", "My cat ate my bird.", "Let's get icecream!")

sapply(keywords, regexpr, strings, ignore.case=TRUE)

     dog cat bird
[1,]  15  -1   -1
[2,]  -1   4   15
[3,]  -1  -1   -1

sapply(keywords, regexpr, strings[1], ignore.case=TRUE)

 dog  cat bird
  15   -1   -1

Values returned are the position of the first character in the match, with -1 meaning no match.

If the position of the match is irrelevant, use grepl instead:

sapply(keywords, grepl, strings, ignore.case=TRUE)

       dog   cat  bird
[1,]  TRUE FALSE FALSE
[2,] FALSE  TRUE  TRUE
[3,] FALSE FALSE FALSE
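
For readers outside R, the same two tables can be sketched in Python. This is an illustration only: match_matrix and match_positions are hypothetical helpers, and the keywords are treated as literal text via re.escape, whereas grepl and regexpr interpret them as regular expressions.

```python
import re

keywords = ["dog", "cat", "bird"]
strings = ["Do you have a dog?", "My cat ate my bird.", "Let's get icecream!"]

def compile_keywords(keywords):
    """Compile each keyword as a case-insensitive literal pattern."""
    return [re.compile(re.escape(k), re.IGNORECASE) for k in keywords]

def match_matrix(keywords, strings):
    """One row per string, one column per keyword, like grepl."""
    compiled = compile_keywords(keywords)
    return [[rx.search(s) is not None for rx in compiled] for s in strings]

def match_positions(keywords, strings):
    """1-based start of the first match, or -1, mirroring regexpr."""
    compiled = compile_keywords(keywords)
    rows = []
    for s in strings:
        found = [rx.search(s) for rx in compiled]
        rows.append([m.start() + 1 if m else -1 for m in found])
    return rows

print(match_matrix(keywords, strings))
print(match_positions(keywords, strings))
```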

Update: This runs relatively quickly on my system, even with a large number of keywords:

# Available on most *nix systems
words <- scan("/usr/share/dict/words", what="")
length(words)
[1] 234936

system.time(matches <- sapply(words, grepl, strings, ignore.case=TRUE))

   user  system elapsed
  7.495   0.155   7.596

dim(matches)
[1] 3 234936

RegEx match multiple string conditions in AND operator

You can use a single positive lookahead to make sure that key1 is present with at least one occurrence of d followed by a digit 1-4.

Then you can use another lookahead to assert one of key1, key2 or key3 with the allowed digits.

Note that you can shorten an alternation such as (a1|a2) to a character class: a[12].

^(?=.*key1=[a-z0-9,]*d[1-4])(?=.*(?:key1=a[12]|key2=b[123]|key3=c[123])).+


The pattern matches:

  • ^ Start of string
  • (?= Positive lookahead
    • .*key1=[a-z0-9,]*d[1-4] Match key1= having a value of d1 d2 d3 d4 by optionally matching the allowed characters that precede it [a-z0-9,]*
  • ) Close lookahead
  • (?=.* Positive lookahead, assert what is at the right is
    • (?: Non capture group to list the alternatives
      • key1=a[12] Match key1=a1 or key1=a2
      • | Or
      • key2=b[123] Match key2 with the allowed values
      • | Or
      • key3=c[123] Match key3 with the allowed values
    • ) Close non capture group
  • ) Close positive lookahead
  • .+ Match 1 or more characters
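
The breakdown above can be exercised with a short Python sketch. The sample inputs are hypothetical, and in particular the ; separator between keys is an assumption, since the question's data format is not shown here; satisfies is an illustrative name.

```python
import re

RULE = re.compile(
    r"^(?=.*key1=[a-z0-9,]*d[1-4])"
    r"(?=.*(?:key1=a[12]|key2=b[123]|key3=c[123])).+"
)

def satisfies(text):
    """True when both lookaheads succeed at the start of text."""
    return RULE.match(text) is not None

# Hypothetical inputs; both conditions must hold for a match.
print(satisfies("key1=a1,d2;key2=b1"))   # d2 present and key1=a1
print(satisfies("key1=a1;key2=b1"))      # no d1-d4 in key1
print(satisfies("key1=d5,d1;key3=c4"))   # d1 present, but no allowed value
```

Each lookahead is checked from the same starting position without consuming input, which is what gives the two conditions their AND semantics.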

Efficiently querying one string against multiple regexes

Martin Sulzmann has done quite a bit of work in this field.
He has a HackageDB project, explained briefly here, which uses partial derivatives and seems to be tailor-made for this.

The language used is Haskell, so it will be very hard to translate to a non-functional language if that is the desire (I would think translation to many other FP languages would still be quite hard).

The code is not based on converting to a series of automata and then combining them, instead it is based on symbolic manipulation of the regexes themselves.
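
To give a feel for what symbolic manipulation of a regex looks like, here is a minimal Python sketch. It uses classic Brzozowski derivatives, a simpler relative of the partial derivatives mentioned above, and is not taken from the Haskell project; all class and function names are made up for illustration. Every regex in the list is advanced by one character at a time, so the input is scanned only once.

```python
from dataclasses import dataclass

class Re:
    """Base class for regex syntax-tree nodes."""

@dataclass(frozen=True)
class Empty(Re):
    """Matches nothing."""

@dataclass(frozen=True)
class Eps(Re):
    """Matches only the empty string."""

@dataclass(frozen=True)
class Chr(Re):
    c: str

@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re

@dataclass(frozen=True)
class Seq(Re):
    left: Re
    right: Re

@dataclass(frozen=True)
class Star(Re):
    inner: Re

def nullable(r):
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    return nullable(r.left) and nullable(r.right)  # Seq

def deriv(r, ch):
    """Regex for what may follow once ch has been consumed by r."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, ch), deriv(r.right, ch))
    if isinstance(r, Star):
        return Seq(deriv(r.inner, ch), r)
    head = Seq(deriv(r.left, ch), r.right)  # Seq
    return Alt(head, deriv(r.right, ch)) if nullable(r.left) else head

def match_all(regexes, text):
    """Advance every regex per character; report which accept text."""
    states = list(regexes)
    for ch in text:
        states = [deriv(r, ch) for r in states]
    return [nullable(r) for r in states]

AB_STAR = Seq(Chr("a"), Star(Chr("b")))  # ab*
A_OR_B = Alt(Chr("a"), Chr("b"))         # a|b
print(match_all([AB_STAR, A_OR_B], "abbb"))
```

The sketch performs no simplification of the derived terms, so they grow with the input; a practical implementation would normalise them after each step.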

Also the code is very much experimental and Martin is no longer a professor but is in 'gainful employment'(1) so may be uninterested/unable to supply any help or input.


  1. this is a joke - I like professors, the less the smart ones try to work the more chance I have of getting paid!

How to match pattern against multiple strings and store in data.frame in R

Using outer and Vectorized grepl.

r <- sapply(dat2[-1], \(x) +outer(dat1$pattern, x, Vectorize(grepl)))
cbind(dat1[rep(seq_len(nrow(dat1)), each=nrow(dat2)), ], id2=dat2$id2, r)
#     id1        pattern  id2 description description2
# 1     1          apple 1174           1            0
# 1.1   1          apple 1231           0            0
# 2     1      applejack 1174           0            0
# 2.1   1      applejack 1231           0            0
# 3     2 bananas, sweet 1174           0            0
# 3.1   2 bananas, sweet 1231           0            1

