Regex with Multiple Groups

How to capture multiple repeated groups?

With one capturing group in the pattern, you get exactly one value from that group per match. If the capture group gets repeated by the pattern (for example, you used the + quantifier on the surrounding non-capturing group), only the last value that matches it gets stored.

To get every value, use your language's function for finding all matches of a pattern. For that to work, remove the anchors and the quantifier of the non-capturing group (you can omit the non-capturing group itself as well).

Alternatively, expand your regex and let the pattern contain one capturing group per group you want to get in the result:

^([A-Z]+),([A-Z]+),([A-Z]+)$
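The three behaviours described above can be sketched in Python. This is only an illustration: the sample string and the repeated-group pattern `^(?:([A-Z]+),?)+$` are assumptions, since the original pattern is not shown here.

```python
import re

text = "AB,CD,EF"  # made-up sample input

# A repeated capturing group keeps only the value from its last iteration.
repeated = re.match(r"^(?:([A-Z]+),?)+$", text)
last_only = repeated.group(1)

# Option 1: drop the anchors and the quantified wrapper, then collect every match.
all_matches = re.findall(r"[A-Z]+", text)

# Option 2: one capturing group per field you expect.
expanded = re.match(r"^([A-Z]+),([A-Z]+),([A-Z]+)$", text)
fields = expanded.groups()

print(last_only)    # EF
print(all_matches)  # ['AB', 'CD', 'EF']
print(fields)       # ('AB', 'CD', 'EF')
```

Option 1 works for any number of items, while Option 2 only fits input with exactly three fields.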

How to capture multiple groups in regex?

You may use

(stxt|city):([^,]+)

See the regex demo (note the \n added only for the sake of the demo, you do not need it in real life).

Pattern details:

  • (stxt|city) - Group 1: either the stxt or the city substring (you may add \b before the ( to only match a whole word)
  • : - a colon
  • ([^,]+) - 1 or more characters other than a comma (Group 2).

Java demo:

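import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Demo {
    public static void main(String[] args) {
        String s = "stxt:usa,city:14";
        Pattern pattern = Pattern.compile("(stxt|city):([^,]+)");
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            System.out.println(matcher.group(1)); // key: stxt or city
            System.out.println(matcher.group(2)); // value: text up to the next comma
        }
    }
}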
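The effect of the optional \b mentioned in the pattern details can be checked with a short Python sketch. The input string is made up for the illustration: it contains stxt as the tail of a longer word.

```python
import re

text = "mystxt:usa,city:14"  # hypothetical input where "stxt" is part of "mystxt"

# Without \b the key may start in the middle of a word.
loose = re.findall(r"(stxt|city):([^,]+)", text)

# With \b the key must start at a word boundary, so "mystxt" is skipped.
strict = re.findall(r"\b(stxt|city):([^,]+)", text)

print(loose)   # [('stxt', 'usa'), ('city', '14')]
print(strict)  # [('city', '14')]
```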

Trying to capture multiple groups in regex while skipping others

You may use

^(.*?)(?:\s*\([^()]*\))?:\s*(.*)$

See the regex demo.

Details

  • ^ - start of string
  • (.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible
  • (?:\s*\([^()]*\))? - an optional non-capturing group matching 1 or 0 occurrences of
    • \s* - 0+ whitespaces
    • \([^()]*\) - a (, zero or more chars other than ( and ) and then )
  • : - a colon
  • \s* - 0 or more whitespaces
  • (.*) - Capturing group 2: any zero or more chars other than line break chars, as many as possible
  • $ - end of string.
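As a quick check of the pattern, here is a minimal Python sketch. The sample lines and the `split_entry` helper name are assumptions made for the example, not part of the original question.

```python
import re

pattern = re.compile(r"^(.*?)(?:\s*\([^()]*\))?:\s*(.*)$")

def split_entry(line):
    """Return (Group 1, Group 2), or None when the line contains no colon."""
    m = pattern.match(line)
    return m.groups() if m else None

with_note = split_entry("Name (note): value")    # parenthesized part is skipped
without_note = split_entry("Name: value")        # optional group matches nothing
no_colon = split_entry("no separator here")

print(with_note)     # ('Name', 'value')
print(without_note)  # ('Name', 'value')
print(no_colon)      # None
```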

Regex match capture group multiple times

Use

(?:x|(?<!\A)\G).*?\Kb(?=.*z)

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
x 'x'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
\G where the last m//g left off
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
\K match reset operator (omits matched text)
--------------------------------------------------------------------------------
b 'b'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except line breaks (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
z 'z'
--------------------------------------------------------------------------------
) end of look-ahead
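Note that \G and \K are not available in every regex flavor; Python's built-in re module, for one, supports neither. For single-line input, roughly the same result can be obtained in two steps, as sketched below. This is a swapped-in technique, not the pattern above, and the `b_offsets` helper name and the sample string are hypothetical.

```python
import re

def b_offsets(text):
    """Offsets of each 'b' located after the first 'x' and before the last 'z'."""
    span = re.search(r"x(.*)z", text)  # greedy .* runs up to the last 'z' on the line
    if span is None:
        return []
    base = span.start(1)  # offset of the captured slice within the full text
    return [base + b.start() for b in re.finditer(r"b", span.group(1))]

offsets = b_offsets("b x b b z b")
print(offsets)  # [4, 6]
```

The first and the last b are ignored because they sit before the x and after the z, respectively.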

Matching multiple regex groups in Javascript

Your regex is an example of how a repeated capturing group works: (ab)+ only captures the last occurrence of ab in an abababab string.

In your case, you may perform two steps: 1) validate the input string to make sure it follows the pattern you want, 2) extract parts from the string using a regex with the g (global) flag.

To validate the string you may use

/^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/

See the regex demo. It is basically your original regex but it is more efficient, has no capturing groups (we do not need them at this step), and it does not allow | at the start / end of the string (but you may add \|* after # and before $ if you need that).

Details

  • ^# - # at the start of the string
  • [^|]+ - 1+ chars other than |
  • \| - a |
  • [^|]+ - 1+ chars other than |
  • (?:\|[^|]+\|[^|]+)* - 0+ sequences of
    • \| - a | char
    • [^|]+\|[^|]+ - 1+ chars other than |, | and again 1+ chars other than |
  • $ - end of string.

To extract the pairs, you may use a simple /([^|]+)\|([^|]+)/ regex with the g flag (the input will be the substring starting at index 1, that is, the string without the leading #).

Whole solution:

var s = "#something|somethingelse|morestuff|evenmorestuff";
var rx_validate = /^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$/;
var rx_extract = /([^|]+)\|([^|]+)/g;
var m, result = [];
if (rx_validate.test(s)) {
  while (m=rx_extract.exec(s.substr(1))) {
    result.push([m[1], m[2]]);
  }
}
console.log(result);
// or just pairs as strings
// console.log(s.substr(1).match(rx_extract));
// => [ "something|somethingelse",  "morestuff|evenmorestuff" ]
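For comparison, the same validate-then-extract idea might look like this in Python. It is a rough port under the assumption that the two regexes carry over unchanged; the `extract_pairs` helper and the `#a|b|c` input are made up to show the rejected case.

```python
import re

VALIDATE = re.compile(r"^#[^|]+\|[^|]+(?:\|[^|]+\|[^|]+)*$")
EXTRACT = re.compile(r"([^|]+)\|([^|]+)")

def extract_pairs(s):
    """Return the list of (left, right) pairs, or [] when validation fails."""
    if not VALIDATE.match(s):
        return []
    return EXTRACT.findall(s[1:])  # skip the leading '#'

good = extract_pairs("#something|somethingelse|morestuff|evenmorestuff")
bad = extract_pairs("#a|b|c")  # odd number of items, so validation rejects it

print(good)  # [('something', 'somethingelse'), ('morestuff', 'evenmorestuff')]
print(bad)   # []
```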

Apply multiple conditions to a capturing group

You may use

\b(?=[A-Z]*[a-z])(?=[a-z]*[A-Z])([a-zA-Z]+)\b

See the regex demo

Actually, you do not even need the capturing group: ([a-zA-Z]+) can usually be replaced with [a-zA-Z]+, but it depends on where you are using the regex.

Details

  • \b - word boundary
  • (?=[A-Z]*[a-z]) - a positive lookahead that requires a lowercase letter after 0+ uppercase ones
  • (?=[a-z]*[A-Z]) - a positive lookahead that requires an uppercase letter after 0+ lowercase ones
  • ([a-zA-Z]+) - Group 1: 1 or more letters
  • \b - a word boundary.
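To see the two lookaheads at work, here is a small Python sketch. The sample sentence is an assumption chosen to include all-lowercase, all-uppercase and mixed-case words.

```python
import re

pattern = re.compile(r"\b(?=[A-Z]*[a-z])(?=[a-z]*[A-Z])([a-zA-Z]+)\b")

text = "hello WORLD Hello wORLD heLLo"  # made-up sample

# findall returns the contents of Group 1 for every match.
mixed_case = pattern.findall(text)

print(mixed_case)  # ['Hello', 'wORLD', 'heLLo']
```

hello fails the second lookahead (no uppercase letter) and WORLD fails the first one (no lowercase letter), so only the mixed-case words remain.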

Regex handling multiple groups from a potentially comma delimited list

The (.*?)\s?=\s?(.*?)\s?,? regex has only one obligatory pattern, =. The (.*?) at the start gets expanded up to the leftmost =, so Group 1 captures any text before it (minus one optional whitespace right before the =). The rest of the subpatterns do not have to match: if there is a whitespace after the =, it is matched with \s?; if there are two, the second one is matched by the other \s?; and if a comma follows, it is also matched and consumed. The second (.*?) is simply skipped: it is lazy and nothing after it is required, so Group 2 is always empty.
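The behaviour is easy to confirm with a few lines of Python. The sample text below is shortened and assumed for the illustration.

```python
import re

text = "col1 = 'Test String' , col2= 'Next'"  # shortened, made-up sample

m = re.search(r"(.*?)\s?=\s?(.*?)\s?,?", text)

whole = m.group(0)  # stops right after the whitespace that follows "="
key = m.group(1)
value = m.group(2)  # the lazy group matches nothing

print(repr(whole))  # 'col1 = '
print(repr(key))    # 'col1'
print(repr(value))  # ''
```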

If you want to get the second capturing group with single quotes included, you can use

(?:,|^)\s*([^\s=]+)\s*=\s*('[^']*'|\S+)

See this regex pattern. It matches

  • (?:,|^) - a non-capturing group matching a , or start of string
  • \s* - zero or more whitespaces
  • ([^\s=]+) - Group 1: one or more chars other than whitespace and =
  • \s*=\s* - a = char enclosed with zero or more whitespaces
  • ('[^']*'|\S+) - Group 2: either ', zero or more non-'s, and a ', or one or more non-whitespaces.

If you want to exclude single quotes you can post-process the matches, or use an extra capturing group in '([^']*)', and then check if the group matched or not:

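import re
text = "col1 = 'Test String' , col2= 'Next Test String',col3='Last Text String', col4=37"
# Group 1: key, Group 2: quoted value without the quotes, Group 3: unquoted value
pattern = r"([^,\s=]+)\s*=\s*(?:'([^']*)'|(\S+))"
matches = re.findall(pattern, text)
# Only one of Group 2 / Group 3 takes part in a match; findall returns '' for the other
print({key: quoted or bare for key, quoted, bare in matches})
# => {'col1': 'Test String', 'col2': 'Next Test String', 'col3': 'Last Text String', 'col4': '37'}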

See this Python demo.

If you want to do that with a pure regex, you can use a branch reset group:

import regex  # pip install regex
text = "col1 = 'Test String' , col2= 'Next Test String',col3='Last Text String', col4=37"
print( dict(regex.findall(r"([^,\s=]+)\s*=\s*(?|'([^']*)'|(\S+))", text)) )

See the Python demo (regex demo).

RegEx for splitting a list of words with multiple capturing groups

A large regex that probably does it

(?=.*\b(?:one|two|three|four|five|six|seven|eight|nine)\b)(\b(?:one|two|three)(?:\s+(?:one|two|three))*\b)?.+?(\b(?:four|five|six)(?:\s+(?:four|five|six))*\b)?.+?(\b(?:seven|eight|nine)(?:\s+(?:seven|eight|nine))*\b)?

https://regex101.com/r/rUtkyU/1

Readable version

(?=
    .* \b
    (?:
        one
      | two
      | three
      | four
      | five
      | six
      | seven
      | eight
      | nine
    )
    \b
)
(                              # (1 start)
    \b
    (?: one | two | three )
    (?:
        \s+
        (?: one | two | three )
    )*
    \b
)?                             # (1 end)

.+?
(                              # (2 start)
    \b
    (?: four | five | six )
    (?:
        \s+
        (?: four | five | six )
    )*
    \b
)?                             # (2 end)

.+?
(                              # (3 start)
    \b
    (?: seven | eight | nine )
    (?:
        \s+
        (?: seven | eight | nine )
    )*
    \b
)?                             # (3 end)
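A quick way to try the one-line pattern is sketched below in Python. The sample sentence is made up, and it deliberately separates the word runs with single spaces: each optional group is only attempted one character after the previous part, so other layouts may leave groups unmatched.

```python
import re

pattern = re.compile(
    r"(?=.*\b(?:one|two|three|four|five|six|seven|eight|nine)\b)"
    r"(\b(?:one|two|three)(?:\s+(?:one|two|three))*\b)?.+?"
    r"(\b(?:four|five|six)(?:\s+(?:four|five|six))*\b)?.+?"
    r"(\b(?:seven|eight|nine)(?:\s+(?:seven|eight|nine))*\b)?"
)

match = pattern.search("one two four five seven")  # hypothetical input
groups = match.groups()

print(groups)  # ('one two', 'four five', 'seven')
```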

