How to Access the Matched Groups in a JavaScript Regular Expression

How do you access the matched groups in a JavaScript regular expression?

You can access capturing groups like this:

var myString = "something format_abc";
var myRegexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
// Equivalent constructor form; backslashes must be doubled inside a string literal:
// var myRegexp = new RegExp("(?:^|\\s)format_(.*?)(?:\\s|$)", "g");
var match = myRegexp.exec(myString);
console.log(match[1]); // abc

JavaScript Regex Global Match Groups

To do this with a regex, you will need to iterate over it with .exec() in order to get multiple matched groups. The g flag with match will only return multiple whole matches, not multiple sub-matches like you wanted. Here's a way to do it with .exec().

var input = "'Warehouse','Local Release','Local Release DA'";
var regex = /'(.*?)'/g;

var matches, output = [];
while (matches = regex.exec(input)) {
    output.push(matches[1]);
}
// result is in output here

Working demo: http://jsfiddle.net/jfriend00/VSczR/


With certain assumptions about what's in the strings, you could also just use this:

var input = "'Warehouse','Local Release','Local Release DA'";
var output = input.replace(/^'|'$/g, "").split("','"); // g flag strips both the leading and the trailing quote

Working demo: http://jsfiddle.net/jfriend00/MFNm3/


Note: With modern JavaScript engines (as of 2021), you can use str.matchAll(regex) to get all matches, including their capture groups, in one function call.
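
As a minimal sketch of the matchAll approach mentioned in the note, using the same input string as the exec() example above: matchAll requires the g flag and returns an iterator of match arrays, so each capture group is available at index 1. The variable names are only for illustration.

```javascript
// Same sample input as in the exec() example above.
const quotedInput = "'Warehouse','Local Release','Local Release DA'";

// matchAll() needs the g flag and yields one match array per match;
// the mapping function keeps only the first capture group.
const quotedValues = Array.from(quotedInput.matchAll(/'(.*?)'/g), (m) => m[1]);

console.log(quotedValues); // ["Warehouse", "Local Release", "Local Release DA"]
```

Array.from is used here only to turn the iterator into a plain array; a for...of loop over the iterator would work just as well.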

JavaScript global match with capturing groups

As per the MDN docs for String.prototype.match():

If the regular expression does not include the g flag, returns the same result as RegExp.exec(). The returned Array has an extra input property, which contains the original string that was parsed. In addition, it has an index property, which represents the zero-based index of the match in the string.

If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned. If there were no matches, the method returns null.


If you want to obtain capture groups and the global flag is set, you need to use RegExp.exec() instead.

var myRe = /(\d)(\d)/g;
var str = '12 34';
var myArray;
while (myArray = myRe.exec(str)) {
    console.log(myArray);
}
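
To make the difference described in the MDN excerpt concrete, here is a small sketch comparing String.prototype.match() with and without the g flag on the same '12 34' input. The variable names are illustrative.

```javascript
const digits = '12 34';

// Without g: first match only, plus its capture groups and an index property.
const single = digits.match(/(\d)(\d)/);
console.log(single[0], single[1], single[2], single.index); // 12 1 2 0

// With g: every whole match, but no capture groups.
const all = digits.match(/(\d)(\d)/g);
console.log(all); // ["12", "34"]
```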

Regex Group Capture

Capture groups are provided in the match array starting at index 1:

var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*([^\<]*)/g;

var match = re.exec(str);
while (match != null) {
    console.log(match[1]); // first capture group: "John Smith"
    match = re.exec(str);
}

How can I get matched groups from the regex match function?

Instead of codes.match(regex), use regex.exec(codes) in a loop; each iteration then gives you the captured text for each group.

Example for one input:

const codes = `
require('babel-polyfill');
require('child-process-promise');
require('fs-extra');
require('chalk');
require('ora');
require('querystring');
`;

const regex = /(?:from |require\()'([^/.][^/]*?)'/g;
const arr = [];
let match;

// Collect the first capture group (the module name) from every match.
while ((match = regex.exec(codes)) !== null) arr.push(match[1]);

console.log("arr", arr);

Regex expression to match and store matched groups in an array

You may use this regex to build your output:

/!([^|!]*(?:jpe?g|png|gif|pdf|xlx))(?:\|width=(\d*\.?\d+%?)(?:,height=(\d*\.?\d+%?))?)?!/g

Code and Demo:

let str = `!img2.png|width=83.33333333333334%!
!robot (f05f0216-caf4-4543-a630-99c2477849d5).png|width=400,height=400!
fefeeef !abc.pdf|width=200!
!dfe.xlx !abcd.xlx!`;

var re = /!([^|!]*(?:jpe?g|png|gif|pdf|xlx))(?:\|width=(\d*\.?\d+%?)(?:,height=(\d*\.?\d+%?))?)?!/g;

let m;
let imageConfig = [];

// Groups 2 and 3 are optional, so fall back to an empty string when they did not match.
while ((m = re.exec(str)) !== null) {
    imageConfig.push({file: m[1], width: (m[2] || ''), height: (m[3] || '')});
}
console.log(imageConfig);

Matching multiple regex groups in Javascript

Your regex is an example of how a repeated capturing group works: (ab)+ only captures the last occurrence of ab in an abababab string.
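
A short sketch of that behavior, using the abababab string from the sentence above (variable names are illustrative):

```javascript
// The group is overwritten on each repetition, so only the last "ab" survives.
const repeated = /(ab)+/.exec('abababab');
console.log(repeated[0]);     // "abababab"
console.log(repeated[1]);     // "ab"
console.log(repeated.length); // 2 (whole match plus one group)

// Matching the unit itself with the g flag returns every occurrence instead.
const occurrences = 'abababab'.match(/ab/g);
console.log(occurrences.length); // 4
```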

In your case, you may perform two steps: 1) validate the input string to make sure it follows the pattern you want, 2) extract parts from the string using a regex with the g flag.

To validate the string you may use

/^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/

It is basically your original regex, but it is more efficient, has no capturing groups (they are not needed at this step), and does not allow | at the start or end of the string (you may add \|* after # and before $ if you need that).

Details

  • ^# - # at the start of the string
  • [^|]+ - 1+ chars other than |
  • \| - a |
  • [^|]+ - 1+ chars other than |
  • (?:\|[^|]+\|[^|]+)* - 0+ sequences of:
    • \| - a | char
    • [^|]+\|[^|]+ - 1+ chars other than |, a |, and again 1+ chars other than |
  • $ - end of string.

To extract the pairs, you may use a simple /([^|]+)\|([^|]+)/ regex with the g flag (the input will be the substring starting at position 1, that is, without the leading #).

Whole solution:

var s = "#something|somethingelse|morestuff|evenmorestuff";
var rx_validate = /^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/;
var rx_extract = /([^|]+)\|([^|]+)/g;
var m, result = [];

if (rx_validate.test(s)) {
    var body = s.slice(1); // drop the leading "#"
    while ((m = rx_extract.exec(body)) !== null) {
        result.push([m[1], m[2]]);
    }
}

console.log(result);
// => [ ["something", "somethingelse"], ["morestuff", "evenmorestuff"] ]

// or just pairs as strings
// console.log(s.slice(1).match(rx_extract));
// => [ "something|somethingelse", "morestuff|evenmorestuff" ]

How to find indices of groups in JavaScript regular expressions match?

Without the d flag (added in ECMAScript 2022), you can't directly get the index of a match group. A workaround is to first put every character in a match group, even the ones you don't care about:

var m= /(s+)(.*?)(l)([^l]*?)(o+)/.exec('this is hello to you');

Now you've got the whole match in parts:

['s is hello', 's', ' is hel', 'l', '', 'o']

So you can add up the lengths of the strings before your group to get the offset from the match index to the group index:

function indexOfGroup(match, n) {
    var ix = match.index;
    for (var i = 1; i < n; i++)
        ix += match[i].length;
    return ix;
}

console.log(indexOfGroup(m, 3)); // 11
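
In engines that implement ECMAScript 2022, the d flag (hasIndices) offers an alternative: the match object then carries an indices array with one [start, end] pair per group. A minimal sketch, assuming such an engine and reusing the same pattern and input as above (the variable name is illustrative):

```javascript
// The d flag asks the engine to record start/end offsets for each group.
const withIndices = /(s+)(.*?)(l)([^l]*?)(o+)/d.exec('this is hello to you');

// indices[n] holds [start, end) for group n; indices[0] covers the whole match.
console.log(withIndices.indices[0]); // [3, 13]
console.log(withIndices.indices[3]); // [11, 12]
```

With this flag there is no need to wrap every character in a group, so the pattern can keep only the groups you actually care about.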

Named capturing groups in JavaScript regex?

ECMAScript 2018 introduces named capturing groups into JavaScript regexes.

Example:

const auth = 'Bearer AUTHORIZATION_TOKEN'
const { groups: { token } } = /Bearer (?<token>[^ $]*)/.exec(auth)
console.log(token) // "AUTHORIZATION_TOKEN"

If you need to support older browsers, you can do everything with normal (numbered) capturing groups that you can do with named capturing groups. You just need to keep track of the numbers, which may be cumbersome if the order of the capturing groups in your regex changes.

There are only two "structural" advantages of named capturing groups I can think of:

  1. In some regex flavors (.NET and JGSoft, as far as I know), you can use the same name for different groups in your regex. But most regex flavors do not support this functionality anyway.

  2. If you need to refer to numbered capturing groups in a situation where they are surrounded by digits, you can run into a problem. Let's say you want to add a zero to a digit and therefore want to replace (\d) with $10. In JavaScript, this will work (as long as you have fewer than 10 capturing groups in your regex), but Perl will think you're looking for backreference number 10 instead of number 1, followed by a 0. In Perl, you can use ${1}0 in this case.
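
The replacement case from item 2 can be sketched as follows. The sample string 'a5b' is made up for illustration; the $<name> form is the replacement syntax that accompanies named groups in ECMAScript 2018 and avoids the ambiguity altogether.

```javascript
// Only one capturing group exists, so "$10" is read as "$1" followed by "0".
const appended = 'a5b'.replace(/(\d)/, '$10');
console.log(appended); // "a50b"

// With a named group, the reference cannot be confused with the digit after it.
const appendedNamed = 'a5b'.replace(/(?<digit>\d)/, '$<digit>0');
console.log(appendedNamed); // "a50b"
```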

Other than that, named capturing groups are just "syntactic sugar". It helps to use capturing groups only when you really need them and to use non-capturing groups (?:...) in all other circumstances.

The bigger problem (in my opinion) with JavaScript is that it does not support verbose regexes which would make the creation of readable, complex regular expressions a lot easier.

Steve Levithan's XRegExp library solves these problems.

Regex 4 capture groups

Would you please try the following:

/(?:([^:|\s]*)\|)?(?:([^:|\s]*):)?([^:|\s]*)::([^:|\s]*)/gm
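
A usage sketch for this pattern, assuming input of the hypothetical form a|b:c::d in which the first two parts are optional. The original question's data is not shown here, so the sample strings and variable names below are invented for illustration.

```javascript
const fourGroups = /(?:([^:|\s]*)\|)?(?:([^:|\s]*):)?([^:|\s]*)::([^:|\s]*)/gm;

// Hypothetical sample lines: one with all four parts, one with only the last two.
const sample = 'a|b:c::d\nc::d';

// Keep only the four capture groups of each match.
// Optional groups that did not take part in a match come back as undefined.
const rows = Array.from(sample.matchAll(fourGroups), (m) => m.slice(1));

console.log(rows);
// [ ["a", "b", "c", "d"], [undefined, undefined, "c", "d"] ]
```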
