How to Find Multiple Occurrences with Regex Groups

How to find multiple occurrences with regex groups?

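using System;
using System.Text.RegularExpressions;

string text = "C# is the best language there is in the world.";
string search = "the";
// Regex.Matches returns every non-overlapping match, including the "the" inside "there".
MatchCollection matches = Regex.Matches(text, search);
Console.WriteLine("There were {0} matches for '{1}'", matches.Count, search);
Console.ReadLine();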

How to capture multiple repeated groups?

With one group in the pattern, you can only get one exact result in that group. If your capture group gets repeated by the pattern (you used the + quantifier on the surrounding non-capturing group), only the last value that matches it gets stored.

To get every value, use your language's find-all-matches function instead (for example, Regex.Matches in C# or re.findall in Python). For that to work, remove the anchors and the quantifier on the non-capturing group from the pattern; the non-capturing group itself can then be dropped as well.
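
The difference between the two approaches can be sketched in Python. The original pattern is not shown here, so the repeated-group pattern and the input string below are assumptions made up for illustration:

```python
import re

text = "ABC,DEF,GHI"  # hypothetical input

# A capture group repeated by a quantifier keeps only its last value.
repeated = re.match(r"^(?:([A-Z]+),?)+$", text)
last_only = repeated.group(1)

# Without the anchors, the outer group, and its quantifier,
# findall returns every occurrence instead.
all_values = re.findall(r"[A-Z]+", text)

print(last_only)   # GHI
print(all_values)  # ['ABC', 'DEF', 'GHI']
```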

Alternatively, expand your regex and let the pattern contain one capturing group per group you want to get in the result:

^([A-Z]+),([A-Z]+),([A-Z]+)$
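
As a minimal sketch of this fixed-count approach, assuming a hypothetical input with exactly three comma-separated values, each capturing group ends up with its own result:

```python
import re

text = "ABC,DEF,GHI"  # hypothetical input with exactly three fields

match = re.match(r"^([A-Z]+),([A-Z]+),([A-Z]+)$", text)
# groups() returns one entry per capturing group, in pattern order.
first, second, third = match.groups()

print(first, second, third)  # ABC DEF GHI
```

This only works when the number of values is known in advance; an input with two or four fields would not match at all.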

Regex capture for multiple occurrences and place them in groups

You may slightly modify the regex to use a positive lookbehind and use a simpler code:

def winningSym = /(?<=winline":)[0-9]+/
String s = """{"Id":1,"winline":5,"Winnings":50000, some random text, "winline":4, more random text, "winline":7, more stuff}"""
def res = s.findAll(winningSym)
println(res)

See the Groovy demo, output: [5, 4, 7].

To use your regex and collect Group 1 values use .collect on the matcher (as Matcher supports the iterator() method):

def winningSym = /winline":([0-9]+)/
String line = """{"Id":1,"winline":5,"Winnings":50000, some random text, "winline":4, more random text, "winline":7, more stuff}"""
def res = (line =~ winningSym).collect { it[1] }

See another Groovy demo. Here, it[1] will access the contents inside capturing group 1 and .collect will iterate through all matches.
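
For comparison, both variants can be sketched in Python with re.findall, which returns the whole match when the pattern has no capturing group and the Group 1 values when it has exactly one. This is a translation for illustration, not part of the original Groovy answer:

```python
import re

s = ('{"Id":1,"winline":5,"Winnings":50000, some random text, '
     '"winline":4, more random text, "winline":7, more stuff}')

# Variant 1: the lookbehind keeps the prefix out of the match.
via_lookbehind = re.findall(r'(?<=winline":)[0-9]+', s)

# Variant 2: one capturing group, so findall returns Group 1 values.
via_group = re.findall(r'winline":([0-9]+)', s)

print(via_lookbehind)  # ['5', '4', '7']
print(via_group)       # ['5', '4', '7']
```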

Multiple occurrences of each capturing group in Javascript regex

Assuming str is your string, capture the text between from and where and split it on ", ":

var tables = str.match(/from\s(.*)\swhere/)[1].split(/, /);

Returns:

["table1 t1", "table2", "table3 t3", "table4"]
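
The same capture-then-split idea can be sketched in Python. The input string is not shown in the original, so the query below is an assumption chosen to reproduce the output above:

```python
import re

# Hypothetical input; the original question's string is not shown.
s = "select * from table1 t1, table2, table3 t3, table4 where t1.id = 1"

# Step 1: capture everything between "from" and "where".
between = re.search(r"from\s(.*)\swhere", s).group(1)

# Step 2: split the captured text on the ", " separators.
tables = re.split(r", ", between)

print(tables)  # ['table1 t1', 'table2', 'table3 t3', 'table4']
```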

Regex match capture group multiple times

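Use the following pattern, which needs an engine that supports both \G and \K (for example PCRE or Perl):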

(?:x|(?<!\A)\G).*?\Kb(?=.*z)

See regex proof.

EXPLANATION

  • (?: - group, but do not capture:
  • x - 'x'
  • | - OR
  • (?<! - look behind to see if there is not:
  • \A - the beginning of the string
  • ) - end of look-behind
  • \G - where the last m//g left off
  • ) - end of grouping
  • .*? - any character except \n (0 or more times, matching the least amount possible)
  • \K - match reset operator (omits matched text)
  • b - 'b'
  • (?= - look ahead to see if there is:
  • .* - any character except line breaks (0 or more times, matching the most amount possible)
  • z - 'z'
  • ) - end of look-ahead

Matching multiple regex groups in Javascript

Your regex is an example of how repeated capturing group works: (ab)+ only captures the last occurrence of ab in an abababab string.

In your case, you may perform two steps: 1) validate the input string to make sure it follows the pattern you want, 2) extract the parts from the string using a regex with the g (global) flag.

To validate the string you may use

/^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/

See the regex demo. It is basically your original regex but it is more efficient, has no capturing groups (we do not need them at this step), and it does not allow | at the start / end of the string (but you may add \|* after # and before $ if you need that).

Details

  • ^# - # at the start of the string
  • [^|]+ - 1+ chars other than |
  • \| - a |
  • [^|]+ - 1+ chars other than |
  • (?:\|[^|]+\|[^|]+)* - 0+ sequences of

    • \| - a | char
    • [^|]+\|[^|]+ - 1+ chars other than |, | and again 1+ chars other than |
  • $ - end of string.

To extract the pairs, you may use a simple /([^|]+)\|([^|]+)/ regex (the input will be the substring starting at Position 1).

Whole solution:

var s = "#something|somethingelse|morestuff|evenmorestuff";
var rx_validate = /^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/;
var rx_extract = /([^|]+)\|([^|]+)/g;
var m, result = [];
if (rx_validate.test(s)) {
  while (m = rx_extract.exec(s.substr(1))) {
    result.push([m[1], m[2]]);
  }
}
console.log(result);
// or just pairs as strings
// console.log(s.substr(1).match(rx_extract));
// => [ "something|somethingelse",  "morestuff|evenmorestuff" ]
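
The same validate-then-extract approach can be sketched in Python. The helper name extract_pairs is hypothetical, and re.fullmatch stands in for the ^ and $ anchors of the JavaScript version:

```python
import re

# Same patterns as above; fullmatch replaces the ^ and $ anchors.
RX_VALIDATE = re.compile(r"#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*")
RX_EXTRACT = re.compile(r"([^|]+)\|([^|]+)")

def extract_pairs(s):
    """Return (left, right) pairs, or an empty list if s is malformed."""
    if not RX_VALIDATE.fullmatch(s):
        return []
    # Skip the leading '#'; findall yields one tuple per match.
    return RX_EXTRACT.findall(s[1:])

print(extract_pairs("#something|somethingelse|morestuff|evenmorestuff"))
# [('something', 'somethingelse'), ('morestuff', 'evenmorestuff')]
```

An input with an odd number of fields, such as "#a|b|c", fails validation and yields an empty list.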

regexes: How to access multiple matches of a group?

Drop the * from your regex (so it matches exactly one instance of your pattern). Then use either re.findall(...) or re.finditer(...) to return all matches.
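
A minimal sketch of the two calls follows. The pattern and input are made up for illustration, since the original regex is not shown:

```python
import re

text = "a=1 b=22 c=333"                # hypothetical input
pattern = re.compile(r"(\w+)=(\d+)")   # one instance only, no trailing *

# findall returns one tuple of group values per match.
pairs = pattern.findall(text)

# finditer yields match objects, so positions are available too.
keys_with_offsets = [(m.group(1), m.start()) for m in pattern.finditer(text)]

print(pairs)              # [('a', '1'), ('b', '22'), ('c', '333')]
print(keys_with_offsets)  # [('a', 0), ('b', 4), ('c', 9)]
```

re.finditer is the better fit when you need match positions or want to process matches lazily.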

Update:

It sounds like you're essentially building a recursive descent parser. For relatively simple parsing tasks, it is quite common and entirely reasonable to do that by hand. If you're interested in a library solution (in case your parsing task may become more complicated later on, for example), have a look at pyparsing.


