Java Regex: Repeating Capturing Groups

Java regex: Repeating capturing groups

That's right. You can't have a "variable" number of capturing groups in a Java regular expression. Your Pattern has two groups:

\((.+?)\)(?:,\((.+?)\))*
  |___|        |___|
 group 1      group 2

Each group will contain the content of the last match for that group. I.e., abc,12 will get overridden by 30,asdf,2.

Related question:

  • Regular expression with variable number of groups?

The solution is to use one expression (something like \((.+?)\)) and use matcher.find to iterate over the matches.
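As a minimal sketch of that approach, assuming an input such as (abc,12),(30,asdf,2) (the class and method names are illustrative and not taken from the original question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllGroups {
    // One pattern for a single repeat unit: a parenthesized chunk, captured lazily.
    private static final Pattern UNIT = Pattern.compile("\\((.+?)\\)");

    // Collects group 1 of every successive match found in the input.
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = UNIT.matcher(input);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // Assumed sample input, chosen to match the values mentioned above.
        for (String value : extract("(abc,12),(30,asdf,2)")) {
            System.out.println(value);
        }
    }
}
```

Each call to find() resumes where the previous match ended, so the single capturing group is read once per match instead of being overwritten.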

Java regex repeating capture groups

Basically, your regex's main problem is that it matches only at the end of the string, and with [A-z] you match many more chars than just letters. Your grouping also seems off.

If you load your regex at regex101, you will see it matches

  • \$\{
  • ( - start of a capturing group

    • (?: - start of a non-capturing group

      • (?:[A-z]+ - start of a non-capturing group, and it matches 1+ chars between A and z (your first mistake)

        • (?:\.[A-z0-9()\[\]\"]+)* - 0 or more repetitions of a . and then 1+ letters, digits, (, ), [, ], ", \, ^, _, and a backtick
      • )+ - repeat the non-capturing group 1 or more times
      • | - or
      • (?:\"[\w/?.&=_\-]*\")+ - 1 or more occurrences of ", 0 or more word, /, ?, ., &, =, _, - chars and then a "
      • )+ - repeat the group pattern 1+ times
    • ) - end of the capturing group
  • }+ - 1+ } chars
  • $ - end of string.

To match any occurrence of your pattern inside a string, you need to use

\$\{(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*)}

See the regex demo, and read the Group 1 value after each match is found. Details:

  • \$\{ - a ${ substring
  • (\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*) - Capturing group 1:

    • \"[^\"]*\" - ", 0+ chars other than " and then a "
    • | - or
    • \w+(?:\(\))? - 1+ word chars and an optional () substring
    • (?:\.\w+(?:\(\))?)* - 0 or more repetitions of . and then 1+ word chars and an optional () substring
  • } - a } char.

See the Java demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "${test.one}${test.two}\n${test.one}${test.two()}\n${test.one}${\"hello\"}";
Pattern pattern = Pattern.compile("\\$\\{(\"[^\"]*\"|\\w+(?:\\(\\))?(?:\\.\\w+(?:\\(\\))?)*)}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    // Group 1 holds the text between ${ and }
    System.out.println(matcher.group(1));
}

Output:

test.one
test.two
test.one
test.two()
test.one
"hello"

java regex - capture repeated groups

Java does not allow you to access the individual matches of a repeated capturing group. For more information look at this question: Regular Expression - Capturing all repeating groups

The code provided by Tim Pietzcker can help you as well. If you rework it a bit and add a special case for the first number you can use something like this:

String target = "31,5,46,7,86";

// Validates the whole list and captures the first number.
Pattern compileFirst = Pattern.compile("(?<number>[0-9]+)(?:,[0-9]+)*");
// Matches each following number together with its leading comma.
Pattern compileFollowing = Pattern.compile(",(?<number>[0-9]+)");

Matcher matcherFirst = compileFirst.matcher(target);
Matcher matcherFollowing = compileFollowing.matcher(target);

System.out.println("matches: " + matcherFirst.matches());
System.out.println("first: " + matcherFirst.group("number"));

int start = 0;
// find(int) resets the matcher and searches from the given index.
while (matcherFollowing.find(start)) {
    String group = matcherFollowing.group("number");

    System.out.println("following: " + start + " - " + group);
    start = matcherFollowing.end();
}

This outputs:

matches: true
first: 31
following: 0 - 5
following: 4 - 46
following: 7 - 7
following: 9 - 86

How to capture multiple repeated groups?

With one group in the pattern, you can only get one exact result in that group. If your capture group gets repeated by the pattern (you used the + quantifier on the surrounding non-capturing group), only the last value that matches it gets stored.

To get every value, use your language's function for finding all matches of a pattern. For that to work, remove the anchors and the quantifier on the non-capturing group (the non-capturing group itself can then be dropped as well).

Alternatively, expand your regex and let the pattern contain one capturing group per group you want to get in the result:

^([A-Z]+),([A-Z]+),([A-Z]+)$
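Both options can be sketched in Java as follows. The input AB,CD,EF, the unit pattern [A-Z]+, and the class and method names are assumptions made for illustration; the original question's pattern is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedVersusExpanded {
    // Option 1: no anchors, no quantified group; every match is one value.
    static List<String> findAll(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Z]+").matcher(input);
        while (m.find()) {
            values.add(m.group());
        }
        return values;
    }

    // Option 2: one capturing group per expected value (exactly three here).
    static List<String> expanded(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile("^([A-Z]+),([A-Z]+),([A-Z]+)$").matcher(input);
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                values.add(m.group(g));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(findAll("AB,CD,EF"));
        System.out.println(expanded("AB,CD,EF"));
    }
}
```

The first option handles any number of values but does not check the overall format; the second checks the format but only accepts a fixed number of values.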

Regex capture group within repetition

Notice groups 2-5 match only the last (4th) repetition. Why aren't the first 3 repetitions matched? How can I extract all 4 integers from each repetition?

I am confident that you will find that group 1 also matches only the fourth repetition. Group 0, on the other hand, will always correspond to the entire match -- I suspect that's what you saw.

This behavior is documented in the API docs for java.util.regex.Pattern:

Capturing groups are numbered by counting their opening parentheses from left to right.

[...]

Group zero always stands for the entire expression.

[...]

The captured input associated with a group is always the subsequence that the group most recently matched.

That's all quite standard across different regex implementations.

Instead of capturing it all at once, you could process the String one piece at a time by means of Matcher.find() and / or Matcher.lookingAt(), using a pattern that corresponds to exactly one of the repeat units. After each successful match, extract and store the captured groups for that match.
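A minimal sketch of both behaviors, assuming a hypothetical repeat unit of four comma-separated integers terminated by a semicolon (the question's real pattern is not reproduced here, and all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedUnits {
    // Hypothetical repeat unit: four integers separated by commas, ending in ';'.
    private static final Pattern UNIT =
            Pattern.compile("(\\d+),(\\d+),(\\d+),(\\d+);");
    // The same unit wrapped in a repeated capturing group (group 1).
    private static final Pattern WHOLE =
            Pattern.compile("((\\d+),(\\d+),(\\d+),(\\d+);)+");

    // Returns what group 1 holds after matching the whole input:
    // only the most recently matched repetition.
    static String lastRepetition(String input) {
        Matcher m = WHOLE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Processes the input one unit at a time and stores every unit's groups.
    static List<int[]> allUnits(String input) {
        List<int[]> rows = new ArrayList<>();
        Matcher m = UNIT.matcher(input);
        while (m.find()) {
            int[] row = new int[m.groupCount()];
            for (int g = 1; g <= m.groupCount(); g++) {
                row[g - 1] = Integer.parseInt(m.group(g));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "1,2,3,4;5,6,7,8;9,10,11,12;13,14,15,16;";
        System.out.println("group 1 after matches(): " + lastRepetition(input));
        System.out.println("units found with find(): " + allUnits(input).size());
    }
}
```

Note that find() skips over text that does not match the unit, so if the overall format must be validated, a separate matches() check on the whole input is still needed.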

Regular Expression - Capturing all repeating groups

You're right; most regex flavors, Java included, do not allow access to individual matches of a repeated capturing group. (Perl 6 and .NET do allow this, for the record, but that's not helping you).

What else can you do?

Pattern regex = Pattern.compile("@[^@]+@");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // matched text: regexMatcher.group()
    // match start: regexMatcher.start()
    // match end: regexMatcher.end()
}

That will capture @property.one@, @property.two@ etc. one by one.

How do you reference repeated nested capture groups?

It is impossible to reference any repetition of a repeated capture group other than the final one in the sequence. Therefore, if you want to modify each captured repetition, as in this situation, you must apply multiple regexes in sequence:

Step 1: Copy arguments list into position (https://regex101.com/r/uE7aA1/2)

pattern: (public void (\w+\((?:(?:final )?\w+ \w+(?:, )?)*\))) \{(?:.|\n)*?\n    \}
replacement: $1 {\n addAction(BasicActions.$2);\n }
output:
public void jumpTo(final double x, double y) {
addAction(BasicActions.jumpTo(final double x, double y));
}

Step 2: Remove final

pattern: "final " (note the trailing space)
replacement: (empty string)
output:
public void jumpTo(double x, double y) {
addAction(BasicActions.jumpTo(double x, double y));
}

Step 3: Remove type keywords (https://regex101.com/r/kC0nA3/3)

Use a lookahead to match each argument without consuming the arguments that follow it.
pattern: \w+ (\w+)(?=(, \w+ \w+)*\)\);\n })
replacement: $1
output:
public void jumpTo(double x, double y) {
addAction(BasicActions.jumpTo(x, y));
}
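
The three steps could be chained in Java roughly as follows. This is a sketch under assumptions: the indentation widths inside the patterns and the replacement string (four spaces before the closing brace, eight before addAction) are guesses about the source layout, and the class and method names are hypothetical.

```java
class DelegateRewriter {
    // Applies the three replacements in sequence to one method's source text.
    static String rewrite(String source) {
        // Step 1: replace the body with a call that reuses the full argument list.
        String step1 = source.replaceAll(
                "(public void (\\w+\\((?:(?:final )?\\w+ \\w+(?:, )?)*\\))) \\{(?:.|\\n)*?\\n    \\}",
                "$1 {\n        addAction(BasicActions.$2);\n    }");

        // Step 2: remove every "final " (literal replacement, trailing space included).
        String step2 = step1.replace("final ", "");

        // Step 3: drop the type of each argument inside the call. The lookahead
        // only succeeds when the rest of the call and the closing brace follow.
        return step2.replaceAll(
                "\\w+ (\\w+)(?=(, \\w+ \\w+)*\\)\\);\\n    \\})",
                "$1");
    }

    public static void main(String[] args) {
        // Assumed sample method, indented as if it sat inside a class.
        String source =
                "    public void jumpTo(final double x, double y) {\n"
              + "        // original body\n"
              + "    }\n";
        System.out.print(rewrite(source));
    }
}
```

Because the lookahead in step 3 is anchored to the end of the addAction call, the types in the method signature are left untouched.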

Regex capturing group with many repeating white spaces

You can use

\w+:.*?(?=\s*\w+:|$)

See the regex demo.

Details:

  • \w+ - one or more word chars
  • : - a colon
  • .*? - any zero or more chars other than line break chars, as few as possible
  • (?=\s*\w+:|$) - a positive lookahead that requires zero or more whitespaces, one or more word chars and a colon, or end of string, immediately to the right of the current location.
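
A short Java sketch of this pattern in use, assuming a hypothetical input line of key: value pairs separated by runs of spaces (the sample text and the names are illustrative only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueSegments {
    // The pattern described above: key, colon, then as little as possible
    // up to the whitespace before the next key or the end of the string.
    private static final Pattern SEGMENT =
            Pattern.compile("\\w+:.*?(?=\\s*\\w+:|$)");

    // Returns every "key: value" segment without the whitespace that separates them.
    static List<String> segments(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = SEGMENT.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String segment : segments("name: John   age: 30    city: New York")) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

The lookahead consumes nothing, so the separating whitespace is excluded from each match and the next find() call starts right before it.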

