How to capture an arbitrary number of groups in JavaScript Regexp?
When you repeat a capturing group, in most flavors, only the last capture is kept; any previous capture is overwritten. In some flavors, e.g. .NET, you can get all intermediate captures, but this is not the case with JavaScript.
That is, in JavaScript, if you have a pattern with N capturing groups, you can only capture exactly N strings per match, even if some of those groups were repeated.
So generally speaking, depending on what you need to do:
- If it's an option, split on delimiters instead
- Instead of matching /(pattern)+/, maybe match /pattern/g, perhaps in an exec loop
  - Do note that these two aren't exactly equivalent, but it may be an option
- Do multilevel matching:
  - Capture the repeated group in one match
  - Then run another regex to break that match apart
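As a minimal sketch of the second option, the following contrasts a repeated capturing group with a global pattern run in an exec loop. The sample string and variable names are illustrative assumptions, not part of the original answer.

```javascript
// Illustrative input; not from the original answer.
var input = "123";

// Repeated capturing group: only the last iteration's capture survives.
var repeated = /(\d)+/.exec(input);
console.log(repeated[0]); // whole match: "123"
console.log(repeated[1]); // last capture only: "3"

// Global pattern in an exec loop: one match per digit.
var each = /\d/g;
var found = [];
var m;
while ((m = each.exec(input)) !== null) {
  found.push(m[0]);
}
console.log(found); // ["1", "2", "3"]
```

Note that /\d/g would also pick up digits that are not adjacent to each other, which is one way the two patterns are not equivalent.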
References
- regular-expressions.info/Repeating a Capturing Group vs Capturing a Repeating Group
- Javascript flavor notes
Example
Here's an example of matching <some;words;here> in a text, using an exec loop, and then splitting on ; to get individual words (see also on ideone.com):
var text = "a;b;<c;d;e;f>;g;h;i;<no no no>;j;k;<xx;yy;zz>";
var r = /<(\w+(;\w+)*)>/g;
var match;
while ((match = r.exec(text)) != null) {
    console.log(match[1].split(";"));
}
// ["c", "d", "e", "f"]
// ["xx", "yy", "zz"]
The pattern used is:
      _2__
     /    \
<(\w+(;\w+)*)>
 \__________/
      1
This matches <word>, <word;another>, <word;another;please>, etc. Group 2 is repeated to capture any number of words, but it can only keep the last capture. The entire list of words is captured by group 1; this string is then split on the semicolon delimiter.
Related questions
- How do you access the matched groups in a javascript regex?
How to capture multiple repeated groups?
With one group in the pattern, you can only get one result in that group. If your capture group gets repeated by the pattern (you used the + quantifier on the surrounding non-capturing group), only the last value that matches it gets stored.
To get every value, use your language's regex functions for finding all matches of a pattern; for that, you would have to remove the anchors and the quantifier of the non-capturing group (and you could omit the non-capturing group itself as well).
Alternatively, expand your regex and let the pattern contain one capturing group per group you want to get in the result:
^([A-Z]+),([A-Z]+),([A-Z]+)$
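A minimal sketch of the find-all-matches approach, assuming the input is a comma-separated list of uppercase words. The sample string is an assumption, since the original input is not shown: validate the overall shape with an anchored pattern first, then collect every value with an unanchored global pattern.

```javascript
// Assumed sample input; the original question's string is not shown here.
var line = "ABC,DEF,GHI,JKL";

// Step 1: validate the overall shape with anchors and a repeated group.
var isValid = /^[A-Z]+(?:,[A-Z]+)*$/.test(line);

// Step 2: drop the anchors and the quantified group, and find all matches.
var values = isValid ? line.match(/[A-Z]+/g) : [];

console.log(values); // ["ABC", "DEF", "GHI", "JKL"]
```

Unlike the fixed three-group pattern, this works for any number of items, at the cost of running two regexes.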
Arbitrary number of capture groups in multiline strings
You're trying to repeat a capturing group and then access all of the captures. Unfortunately, that won't work in the JavaScript regex engine (this is true for most of the others too). The .NET engine actually does support it.
I know you didn't want to split first, but that's probably the best option here. If you can somehow use the .NET regex engine from JS, or change your project to use .NET/PowerShell, then you can probably do it in pure regex.
Reference
Repeating a Capturing Group vs. Capturing a Repeated Group
JS Regex multiple capturing groups return all matches
You cannot get an arbitrary number of groups; their number is fixed by the number of capturing groups in your pattern. You may instead match and capture the hyphen-separated values into one group, and then split it on - to get the individual items and build the result dynamically:
var strs = ['dn2.33:sc-pts-tt-as3.43','dn2.33:sc3.43','dn2.33:sc-tt-as3.43'];
var rx = /^[^:]+:([a-z]+(?:-[a-z]+)*)([\d.]+)$/; // Define the regex
for (var s of strs) {
    var res = [];                // The resulting array variable
    var m = rx.exec(s);          // Run the regex search
    if (m) {                     // If there is a match...
        res = m[1].split('-');   // Split Group 1 value with - and assign to res
        res.push(m[2]);          // Add Group 2 value to the resulting array
    }
    console.log(s, "=>", res);
}
Matching multiple regex groups in Javascript
Your regex is an example of how a repeated capturing group works: (ab)+ only captures the last occurrence of ab in an abababab string.
In your case, you may perform two steps: 1) validate the input string to make sure it follows the pattern you want, 2) extract parts from the string using a regex with the g flag.
To validate the string you may use
/^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/
See the regex demo. It is basically your original regex but it is more efficient, has no capturing groups (we do not need them at this step), and it does not allow | at the start / end of the string (but you may add \|* after # and before $ if you need that).
Details
- ^# - # at the start of the string
- [^|]+ - 1+ chars other than |
- \| - a |
- [^|]+ - 1+ chars other than |
- (?:\|[^|]+\|[^|]+)* - 0+ sequences of:
  - \| - a | char
  - [^|]+\|[^|]+ - 1+ chars other than |, then | and again 1+ chars other than |
- $ - end of string.
To extract the pairs, you may use a simple /([^|]+)\|([^|]+)/g regex (the input will be the substring starting at position 1).
Whole solution:
var s = "#something|somethingelse|morestuff|evenmorestuff";
var rx_validate = /^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/;
var rx_extract = /([^|]+)\|([^|]+)/g;
var m, result = [];
if (rx_validate.test(s)) {
    while (m = rx_extract.exec(s.substr(1))) {
        result.push([m[1], m[2]]);
    }
}
console.log(result);
// or just pairs as strings
// console.log(s.substr(1).match(rx_extract));
// => [ "something|somethingelse", "morestuff|evenmorestuff" ]
Capturing Arbitrary Multiple Groups with Regex
You have only 2 capture groups, so you cannot get more than 2 groups in the result. You will have to run a loop to match all the repetitions.
You may use this regex in a while loop to get all matches:
(?:([\w/.-]+)\h*=|(?!^)\G,)\h*((\"?)[^\",]*\3)
\G asserts the position at the end of the previous match, or at the start of the string for the first match. Since we are using (?!^), we force \G to match only at the end of the previous match.
Code:
final String regex = "(?:([\\w/.-]+)\\h*=|(?!^)\\G,)\\h*((\"?)[^\",]*\\3)";
final String string = "SettingName = \"Value1\",0x2,3,\"Value4 contains spaces\", \"Value5 has a space before the string that is ignored\"";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    if (matcher.group(1) != null)
        System.out.println(matcher.group(1));
    System.out.println("\t=> " + matcher.group(2));
}
Javascript regex: dynamic capture group
Try putting the \s+ into the optional group with *:
/(MATCH)\s+(?:(X)\s)*(THIS)/g
Note the g modifier to get all matches.
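A short sketch of how this pattern behaves in an exec loop. The sample text is hypothetical, chosen to show zero, one, and two X tokens between MATCH and THIS:

```javascript
var re = /(MATCH)\s+(?:(X)\s)*(THIS)/g;

// Hypothetical sample text with zero, one, and two X tokens.
var sample = "MATCH THIS, MATCH X THIS, MATCH X X THIS";

var rows = [];
var m;
while ((m = re.exec(sample)) !== null) {
  // m[2] is undefined when the optional group did not participate.
  rows.push([m[1], m[2], m[3]]);
}
console.log(rows);
```

Even when X appears twice, group 2 holds a single value, because a repeated group keeps only its last capture.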
Regex to count the number of capturing groups in a regex
Modify your regex so that it will match an empty string, then match an empty string and see how many groups it returns:
var num_groups = (new RegExp(regex.toString() + '|')).exec('').length - 1;
Example: http://jsfiddle.net/EEn6G/
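The same idea can be wrapped in a small helper. This is a sketch under assumptions: the function name countCapturingGroups is hypothetical, and it uses the regex's source and flags properties rather than toString(), so that the delimiting slashes and flags do not end up inside the new pattern.

```javascript
// Hypothetical helper name; accepts a RegExp or a pattern string.
function countCapturingGroups(regex) {
  var source = regex instanceof RegExp ? regex.source : String(regex);
  var flags = regex instanceof RegExp ? regex.flags : "";
  // The trailing "|" adds an empty alternative, so exec("") always matches.
  var matchOfEmpty = new RegExp(source + "|", flags).exec("");
  // Index 0 is the whole match; every other slot is one capturing group.
  return matchOfEmpty.length - 1;
}

console.log(countCapturingGroups(/(a)(b(c))/));     // 3
console.log(countCapturingGroups(/(?:a)+b/));       // 0
console.log(countCapturingGroups("(\\d+)-(\\d+)")); // 2
```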
Regex Group Capture
Capture groups are provided in the match array starting at index 1:
var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*([^\<]*)/g;
var match = re.exec(str);
while (match != null) {
    console.log(match[1]); // <====
    match = re.exec(str);
}