How to Capture an Arbitrary Number of Groups in JavaScript Regexp


When you repeat a capturing group, most regex flavors keep only the last capture; every earlier capture is overwritten. Some flavors, e.g. .NET, let you retrieve all intermediate captures, but JavaScript does not.

That is, in JavaScript, a pattern with N capturing groups yields at most N captured strings per match, even if some of those groups were repeated.
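
A quick way to see this behavior is the minimal sketch below. The function name and the sample string are only illustrative and are not part of the original answer.

```javascript
// A repeated capturing group keeps only its last iteration.
function repeatedGroupDemo(text) {
  // Group 1 sits inside a repeated non-capturing group, so its value is
  // overwritten on every iteration of the + quantifier.
  return /^(?:(\w+);?)+$/.exec(text);
}

const m = repeatedGroupDemo("c;d;e;f");
console.log(m[0]);     // c;d;e;f  (the whole match)
console.log(m[1]);     // f        (only the last iteration survives)
console.log(m.length); // 2        (full match plus one slot for the single group)
```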

So generally speaking, depending on what you need to do:

  • If it's an option, split on delimiters instead
  • Instead of matching /(pattern)+/, maybe match /pattern/g, perhaps in an exec loop

    • Do note that these two aren't exactly equivalent, but it may be an option
  • Do multilevel matching:

    • Capture the repeated group in one match
    • Then run another regex to break that match apart

References

  • regular-expressions.info/Repeating a Capturing Group vs Capturing a Repeating Group

    • Javascript flavor notes

Example

Here's an example of matching <some;words;here> in a text, using an exec loop, and then splitting on ; to get individual words (see also on ideone.com):

var text = "a;b;<c;d;e;f>;g;h;i;<no no no>;j;k;<xx;yy;zz>";

var r = /<(\w+(;\w+)*)>/g;

var match;
while ((match = r.exec(text)) !== null) {
  // Group 1 holds the whole word list; split it on the delimiter.
  console.log(match[1].split(";").join(","));
}
// c,d,e,f
// xx,yy,zz

The pattern used is:

      _2__
     /    \
<(\w+(;\w+)*)>
 \__________/
      1

This matches <word>, <word;another>, <word;another;please>, etc. Group 2 is repeated to capture any number of words, but it can only keep the last capture. The entire list of words is captured by group 1; this string is then split on the semicolon delimiter.

Related questions

  • How do you access the matched groups in a javascript regex?

How to capture multiple repeated groups?

A pattern with one capturing group yields exactly one value for that group per match. If the capture group gets repeated by the pattern (you used the + quantifier on the surrounding non-capturing group), only the last value that matches it is stored.

To get every value, use your language's function for finding all matches of a pattern. For that to work, remove the anchors and the quantifier on the non-capturing group (the non-capturing group itself can then be omitted as well).

Alternatively, expand your regex and let the pattern contain one capturing group per group you want to get in the result:

^([A-Z]+),([A-Z]+),([A-Z]+)$
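
For the first option in JavaScript, one possible sketch is to validate the whole string with an anchored pattern and then collect the items with a global match. The comma-separated uppercase format is assumed from the pattern above, and the function name extractItems is hypothetical.

```javascript
// Validate the overall shape first, then pull out every item with a g-flag match.
function extractItems(input) {
  // Anchored check: one or more uppercase words separated by single commas.
  if (!/^[A-Z]+(?:,[A-Z]+)*$/.test(input)) {
    return null; // input does not follow the expected format
  }
  // No anchors and no repeated group: each word is its own match.
  return input.match(/[A-Z]+/g);
}

console.log(extractItems("AB,CDE,F")); // [ 'AB', 'CDE', 'F' ]
console.log(extractItems("AB,,F"));    // null
```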

Arbitrary number of capture groups in multiline strings

You're trying to repeat a capturing group and then access all of the captures. Unfortunately, that won't work in the JavaScript regex engine (this is true for most of the others too). The .NET engine actually does support it.

I know you didn't want to split first, but that's probably the best option here. If you can somehow use the .NET regex engine from JS, or change your project to use .NET/PowerShell, then you can probably do it in pure regex.

Reference

Repeating a Capturing Group vs. Capturing a Repeated Group

JS Regex multiple capturing groups return all matches

You cannot get an arbitrary number of groups; their number is fixed by the number of capturing groups in your pattern. Instead, you may match and capture the hyphen-separated values into one group, then split that group on - to get the individual items and build the result dynamically:

var strs = ['dn2.33:sc-pts-tt-as3.43','dn2.33:sc3.43','dn2.33:sc-tt-as3.43'];
var rx = /^[^:]+:([a-z]+(?:-[a-z]+)*)([\d.]+)$/; // Define the regex
for (var s of strs) {
  var res = [];             // The resulting array variable
  var m = rx.exec(s);       // Run the regex search
  if (m) {                  // If there is a match...
    res = m[1].split('-');  // Split Group 1 value with - and assign to res
    res.push(m[2]);         // Add Group 2 value to the resulting array
  }
  console.log(s, "=>", res);
}

Matching multiple regex groups in Javascript

Your regex is an example of how repeated capturing group works: (ab)+ only captures the last occurrence of ab in an abababab string.

In your case, you may perform two steps: 1) validate the input string to make sure it follows the pattern you want, 2) extract the parts from the string using a regex with the g flag.

To validate the string you may use

/^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/

It is basically your original regex, but it is more efficient, has no capturing groups (we do not need them at this step), and does not allow | at the start or end of the string (you may add \|* after # and before $ if you need that).

Details

  • ^# - # at the start of the string
  • [^|]+ - 1+ chars other than |
  • \| - a |
  • [^|]+ - 1+ chars other than |
  • (?:\|[^|]+\|[^|]+)* - 0+ sequences of

    • \| - a | char
    • [^|]+\|[^|]+ - 1+ chars other than |, | and again 1+ chars other than |
  • $ - end of string.

To extract the pairs, you may use a simple /([^|]+)\|([^|]+)/g regex (the input will be the substring starting at position 1, i.e. without the leading #).

Whole solution:

var s = "#something|somethingelse|morestuff|evenmorestuff";
var rx_validate = /^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/;
var rx_extract = /([^|]+)\|([^|]+)/g;
var m, result = [];
if (rx_validate.test(s)) {
  var body = s.slice(1); // drop the leading #
  while ((m = rx_extract.exec(body))) {
    result.push([m[1], m[2]]);
  }
}
console.log(result);
// or just pairs as strings
// console.log(s.slice(1).match(rx_extract));
// => [ "something|somethingelse",  "morestuff|evenmorestuff" ]

Capturing Arbitrary Multiple Groups with Regex

You have only 2 capture groups, so you cannot get more than 2 groups in the result. You will have to run a loop to match all the repetitions.

You may use this regex in while loop to get all matches:

(?:([\w/.-]+)\h*=|(?!^)\G,)\h*((\"?)[^\",]*\3)

\G asserts the position at the end of the previous match, or at the start of the string for the first match. Because it is preceded by (?!^), \G can only match at the end of a previous match, never at the start of the string.


Code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String regex = "(?:([\\w/.-]+)\\h*=|(?!^)\\G,)\\h*((\"?)[^\",]*\\3)";
final String string = "SettingName = \"Value1\",0x2,3,\"Value4 contains spaces\", \"Value5 has a space before the string that is ignored\"";

final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);

// Each find() returns one value; group 1 is set only on the first match of a setting.
while (matcher.find()) {
    if (matcher.group(1) != null)
        System.out.println(matcher.group(1));
    System.out.println("\t=> " + matcher.group(2));
}
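
JavaScript's regex engine supports neither \G nor \h, so the pattern above cannot be used there unchanged. A rough JavaScript counterpart, in the spirit of the multilevel matching described earlier, might first isolate the value list and then scan it with a global regex. This is only a sketch: the function name parseSetting and the simplified value syntax (quoted strings, or bare tokens without commas) are assumptions, and unlike the Java version it strips the surrounding quotes.

```javascript
// Two-step parse of a line such as: Name = "quoted value",bare,...
function parseSetting(line) {
  // Step 1: split the line into the setting name and the raw value list.
  const head = /^\s*([\w\/.-]+)\s*=\s*(.*)$/.exec(line);
  if (!head) return null;

  // Step 2: scan the value list; each match is either a quoted string
  // (captured in group 1) or a bare token that starts with a non-space.
  const valueRx = /"([^"]*)"|[^",\s][^",]*/g;
  const values = Array.from(head[2].matchAll(valueRx), (m) =>
    m[1] !== undefined ? m[1] : m[0].trim()
  );
  return { name: head[1], values: values };
}

const parsed = parseSetting('SettingName = "Value1",0x2,3,"Value4 contains spaces", "Value5"');
console.log(parsed.name);   // SettingName
console.log(parsed.values); // [ 'Value1', '0x2', '3', 'Value4 contains spaces', 'Value5' ]
```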

Javascript regex: dynamic capture group

Try putting the \s+ into the optional group with *:

/(MATCH)\s+(?:(X)\s)*(THIS)/g

Note the g modifier to get all matches.
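
To illustrate how the groups are filled, here is a small sketch with made-up input strings. The optional group leaves its slot undefined when it does not participate and keeps only its last iteration when it repeats. The g flag is omitted here because only the first match is inspected, and the helper name captures is hypothetical.

```javascript
// Returns the captures of the first match, or null when nothing matches.
function captures(text) {
  const m = /(MATCH)\s+(?:(X)\s)*(THIS)/.exec(text);
  return m ? m.slice(1) : null;
}

console.log(captures("MATCH THIS"));     // [ 'MATCH', undefined, 'THIS' ]
console.log(captures("MATCH X THIS"));   // [ 'MATCH', 'X', 'THIS' ]
console.log(captures("MATCH X X THIS")); // [ 'MATCH', 'X', 'THIS' ]
```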

Regex to count the number of capturing groups in a regex

Modify your regex so that it will match an empty string, then match an empty string and see how many groups it returns:

var num_groups = (new RegExp(regex.toString() + '|')).exec('').length - 1;

Example: http://jsfiddle.net/EEn6G/
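
A slightly more defensive variant of the same trick, written as a sketch: calling toString() on a RegExp object yields the /pattern/flags text, so the slashes and flag letters would become literal characters of the new pattern. The version below reads the source and flags properties instead. The function name countGroups is hypothetical.

```javascript
// Count capturing groups by forcing a match against the empty string.
function countGroups(regex) {
  // Accept either a RegExp object or a pattern string.
  const source = regex instanceof RegExp ? regex.source : String(regex);
  const flags = regex instanceof RegExp ? regex.flags : "";
  // The trailing "|" adds an empty alternative, so exec("") always succeeds;
  // the result array holds the full match plus one slot per capturing group.
  return new RegExp(source + "|", flags).exec("").length - 1;
}

console.log(countGroups(/<(\w+(;\w+)*)>/g));      // 2
console.log(countGroups(/^[A-Z]+(?:,[A-Z]+)*$/)); // 0
console.log(countGroups("(a)(b)|(c)"));           // 3
```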

Regex Group Capture

Capture groups are provided in the match array starting at index 1:

var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*([^\<]*)/g;
var match = re.exec(str);
while (match != null) {
    console.log(match[1]); // <====
    match = re.exec(str);
}

