Regex: Repeated Capturing Groups

How to capture multiple repeated groups?

A capturing group holds exactly one value per match. If the group is repeated by the pattern (for example, because you used the + quantifier on the surrounding non-capturing group), only the last value it matched is stored.

To get every value, use your language's find-all-matches function instead. For that to work, remove the anchors and the quantifier on the non-capturing group (you can omit the non-capturing group itself as well).
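As a rough illustration in Python, the sketch below contrasts the two behaviours. The original pattern is not shown in this answer, so the repeated-group pattern and the sample input here are assumptions chosen to be consistent with the expanded regex given next.

```python
import re

text = "ABC,DEF,GHI"  # hypothetical input

# Repeated capturing group: each repetition overwrites group 1,
# so only the last value survives.
repeated = re.match(r"^(?:([A-Z]+),?)+$", text)
last_only = repeated.group(1)

# Without the anchors and the quantified wrapper, a find-all call
# returns one result per occurrence.
all_values = re.findall(r"[A-Z]+", text)

print(last_only)   # GHI
print(all_values)  # ['ABC', 'DEF', 'GHI']
```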

Alternatively, expand your regex and let the pattern contain one capturing group per group you want to get in the result:

^([A-Z]+),([A-Z]+),([A-Z]+)$
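A short Python sketch of the expanded pattern, using a made-up input string. It also shows the trade-off: the number of fields is fixed by the number of groups, so an input with a different field count does not match at all.

```python
import re

pattern = re.compile(r"^([A-Z]+),([A-Z]+),([A-Z]+)$")

# Exactly three fields: each one lands in its own group.
fields = pattern.match("ABC,DEF,GHI").groups()

# Four fields: the pattern no longer matches.
four_fields = pattern.match("ABC,DEF,GHI,JKL")

print(fields)       # ('ABC', 'DEF', 'GHI')
print(four_fields)  # None
```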

Regex - Repeating Capturing Group

Regex doesn't support what you're trying to do. When the engine enters the capturing group a second time, it overwrites what it had captured the first time. Consider a simple example (thanks regular-expressions.info): /(abc|123)+/ used on 'abc123'. It will match "abc" then see the plus and try again, matching the "123". The final capturing group in the output will be "123".
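The same example can be checked in Python (used here only for illustration; the overwrite behaviour is the same in JavaScript):

```python
import re

# Group 1 is overwritten on each repetition of the quantified group.
match = re.fullmatch(r"(abc|123)+", "abc123")
whole = match.group(0)  # the entire match
final = match.group(1)  # only the last repetition

print(whole)  # abc123
print(final)  # 123
```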

This happens no matter what pattern you try and any limitation you set simply changes when the regex will accept the string. Consider /(abc|123){2}/. This accepts 'abc123' with the capturing group as "123" but not 'abc123abc'. Putting a capturing group inside another doesn't work either. When you create a capturing group, it's like creating a variable. It can only have one value and subsequent values overwrite the previous one. You'll never be able to have more capturing groups than you have parentheses pairs (you can definitely have fewer, though).
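A small Python check of both claims, assuming "accepts" means the whole string must match (hence fullmatch). The nested pattern is an illustrative guess at what "putting a capturing group inside another" looks like.

```python
import re

pattern = re.compile(r"(abc|123){2}")
accepted = pattern.fullmatch("abc123")     # two repetitions: accepted
rejected = pattern.fullmatch("abc123abc")  # three repetitions: rejected

# Nesting adds a second group, but both still hold only the last repetition.
nested = re.fullmatch(r"((abc|123))+", "abc123")

print(accepted.group(1))  # 123
print(rejected)           # None
print(nested.groups())    # ('123', '123')
```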

A possible fix then would be to split the string on ';', then each of those on '=', then the right-hand side of those on ','. That would get you [['id', '1', '2'], ['name', 'user1', ...], ['city', ...], ['zip', ...]].

That comes out to be:

function parsePairs(str) {
  // A string separator is matched literally, so use a regex to split on ';' or ':'
  var afterSplit = str.split(/[;:]/);
  if (afterSplit[afterSplit.length - 1] === '') {
    afterSplit.pop(); // a final semicolon creates an empty string
  }
  for (var i = 0; i < afterSplit.length; i++) {
    afterSplit[i] = afterSplit[i].split('=');
    afterSplit[i][1] = afterSplit[i][1].split(','); // optionally, you can flatten the array from here to get something nicer
  }
  return afterSplit;
}
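For comparison, here is a minimal Python sketch of the same splitting idea that produces the flattened shape described above. The function name and the sample input are hypothetical; the real input string is not shown in this answer.

```python
def parse_pairs(text):
    """Split 'key=v1,v2;...' into [[key, v1, v2], ...]."""
    result = []
    for part in text.split(";"):
        if not part:
            continue  # a trailing ';' leaves an empty piece
        key, _, values = part.partition("=")
        result.append([key, *values.split(",")])
    return result


print(parse_pairs("id=1,2;name=user1,user2;"))
# [['id', '1', '2'], ['name', 'user1', 'user2']]
```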

Regex match capture group multiple times

Use

(?:x|(?<!\A)\G).*?\Kb(?=.*z)

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
(?:                      group, but do not capture:
--------------------------------------------------------------------------------
  x                        'x'
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \A                       the beginning of the string
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \G                       where the last m//g left off
--------------------------------------------------------------------------------
)                        end of grouping
--------------------------------------------------------------------------------
.*?                      any character except \n (0 or more times
                         (matching the least amount possible))
--------------------------------------------------------------------------------
\K                       match reset operator (omits matched text)
--------------------------------------------------------------------------------
b                        'b'
--------------------------------------------------------------------------------
(?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
  .*                       any character except line breaks (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  z                        'z'
--------------------------------------------------------------------------------
)                        end of look-ahead
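Python's built-in re module does not support \G or \K, so the pattern above cannot be used there as written. As a rough sketch of what it selects on a single-line input (every 'b' that comes after the first 'x' and before the last 'z'), the same positions can be found without a regex. The function name and the sample string are made up for illustration.

```python
def b_positions(text):
    """Indexes of every 'b' after the first 'x' and before the last 'z'.

    Assumes single-line input; this approximates the PCRE pattern above
    rather than reimplementing it.
    """
    first_x = text.find("x")
    last_z = text.rfind("z")
    if first_x == -1 or last_z == -1:
        return []
    return [i for i, ch in enumerate(text) if ch == "b" and first_x < i < last_z]


print(b_positions("abxbbzbz"))  # [3, 4, 6]
```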

JavaScript - Capture repeated group

Use

(%n)(?:(:\d+)(\+\d+)?)?

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
  %n                       '%n'
--------------------------------------------------------------------------------
)                        end of \1
--------------------------------------------------------------------------------
(?:                      group, but do not capture (optional
                         (matching the most amount possible)):
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    :                        ':'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (                        group and capture to \3 (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \+                       '+'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )?                       end of \3 (NOTE: because you are using a
                           quantifier on this capture, only the
                           LAST repetition of the captured pattern
                           will be stored in \3)
--------------------------------------------------------------------------------
)?                       end of grouping
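The pattern uses no engine-specific syntax, so its grouping can be tried in Python as well. The sample inputs below are assumptions; they show that groups which did not take part in the match come back as None (undefined in JavaScript).

```python
import re

pattern = re.compile(r"(%n)(?:(:\d+)(\+\d+)?)?")

samples = ["%n", "%n:12", "%n:12+3"]  # hypothetical inputs
results = [pattern.search(s).groups() for s in samples]

for sample, groups in zip(samples, results):
    print(sample, "=>", groups)
# %n => ('%n', None, None)
# %n:12 => ('%n', ':12', None)
# %n:12+3 => ('%n', ':12', '+3')
```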

Java regex repeating capture groups

Basically, your regex's main problem is that it matches only at the end of the string, and [A-z] matches many more characters than just letters. Your grouping also seems off.

If you load your regex at regex101, you will see it matches

  • \$\{
  • ( - start of a capturing group

    • (?: - start of a non-capturing group

      • (?:[A-z]+ - start of a non-capturing group, and it matches 1+ chars between A and z (your first mistake)

        • (?:\.[A-z0-9()\[\]\"]+)* - 0 or more repetitions of a . and then 1+ letters, digits, (, ), [, ], ", \, ^, _, and a backtick
      • )+ - repeat the non-capturing group 1 or more times
      • | - or
      • (?:\"[\w/?.&=_\-]*\")+ - 1 or more occurrences of ", 0 or more word, /, ?, ., &, =, _, - chars and then a "
      • )+ - repeat the group pattern 1+ times
    • ) - end of the capturing group
  • }+ - 1+ } chars
  • $ - end of string.
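The [A-z] pitfall from the breakdown above is easy to confirm. The sketch below uses Python for brevity; character ranges work the same way in Java, because the range covers every code point between 'A' and 'z', including the six punctuation characters that sit between 'Z' and 'a'.

```python
import re

punctuation = "[\\]^_`"  # the six characters between 'Z' and 'a' in ASCII

loose = re.findall(r"[A-z]", punctuation)      # matches all of them
strict = re.findall(r"[A-Za-z]", punctuation)  # matches none

print(loose)   # ['[', '\\', ']', '^', '_', '`']
print(strict)  # []
```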

To match any occurrence of your pattern inside a string, you need to use

\$\{(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*)}

See the regex demo, get Group 1 value after a match is found. Details:

  • \$\{ - a ${ substring
  • (\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*) - Capturing group 1:

    • \"[^\"]*\" - ", 0+ chars other than " and then a "
    • | - or
    • \w+(?:\(\))? - 1+ word chars and an optional () substring
    • (?:\.\w+(?:\(\))?)* - 0 or more repetitions of . and then 1+ word chars and an optional () substring
  • } - a } char.

See the Java demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "${test.one}${test.two}\n${test.one}${test.two()}\n${test.one}${\"hello\"}";
Pattern pattern = Pattern.compile("\\$\\{(\"[^\"]*\"|\\w+(?:\\(\\))?(?:\\.\\w+(?:\\(\\))?)*)}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1)); // Group 1 holds the text between ${ and }
}

Output:

test.one
test.two
test.one
test.two()
test.one
"hello"

JS Regex multiple capturing groups return all matches

You cannot get an arbitrary number of groups; their number is fixed by the number of capturing groups in your pattern. Instead, you may match and capture the hyphen-separated values into one group, then split that group's value on - to get the individual items and build the result dynamically:

var strs = ['dn2.33:sc-pts-tt-as3.43','dn2.33:sc3.43','dn2.33:sc-tt-as3.43'];
var rx = /^[^:]+:([a-z]+(?:-[a-z]+)*)([\d.]+)$/; // Define the regex
for (var s of strs) {
  var res = [];             // The resulting array variable
  var m = rx.exec(s);       // Run the regex search
  if (m) {                  // If there is a match...
    res = m[1].split('-');  // Split Group 1 value with - and assign to res
    res.push(m[2]);         // Add Group 2 value to the resulting array
  }
  console.log(s, "=>", res);
}

Multiple capturing groups within non-capturing group using Python regexes

You are close.

To always get the capture as group 1, you can use a lookahead to do the matching and then a separate capturing group to do the capturing:

(?:a (?=[ac]+)|b (?=[bd]+))(.*)

Demo

Or in Python3:

>>> import re
>>> regex=r'(?:a (?=[ac]+)|b (?=[bd]+))(.*)'
>>> re.match(regex, 'a caca').groups()
('caca',)
>>> re.match(regex, 'b bdbd').groups()
('bdbd',)
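To see why the lookahead helps, compare it with a version that puts one capturing group in each alternative. That two-group pattern is an assumption about what the question started from; with it, the value ends up in group 1 or group 2 depending on which branch matched.

```python
import re

# Assumed starting point: one capturing group per alternative.
two_groups = re.compile(r"(?:a ([ac]+)|b ([bd]+))")
# Lookahead version: the branches only check, a single group captures.
one_group = re.compile(r"(?:a (?=[ac]+)|b (?=[bd]+))(.*)")

split_result = two_groups.match("b bdbd").groups()
single_result = one_group.match("b bdbd").groups()
trailing = one_group.match("b bdbd xyz").group(1)

print(split_result)   # (None, 'bdbd')
print(single_result)  # ('bdbd',)
print(trailing)       # bdbd xyz
```

Note that (.*) captures the rest of the line, not only the characters the lookahead checked, as the last line shows.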

