How to Read Regex Captures in C#

The C# regex API can be quite confusing. There are groups and captures:

  • A group represents a capturing group; it is used to extract a substring from the text.
  • A group can have several captures if it appears inside a quantifier.

The hierarchy is:

  • Match
    • Group
      • Capture

(A match can have several groups, and each group can have several captures.)

For example:

Subject: aabcabbc
Pattern: ^(?:(a+b+)c)+$

In this example, there is only one group: (a+b+). This group is inside a quantifier, and is matched twice. It generates two captures: aab and abb:

aabcabbc
^^^ ^^^
Cap1 Cap2

When a group is not inside a quantifier, it generates only one capture. In your case, you have three groups, and each group captures once. You can use match.Groups[1].Value, match.Groups[2].Value and match.Groups[3].Value to extract the three substrings you're interested in, without resorting to captures at all.
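
To see the difference between a group and its captures, here is a minimal sketch that runs the subject and pattern from the example above. The class name CaptureDemo and the helper CapturesOfGroup1 are made up for illustration; Groups, Captures and Value are the standard System.Text.RegularExpressions members.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: returns every capture recorded for group 1, in order.
    public static List<string> CapturesOfGroup1(string subject, string pattern)
    {
        var result = new List<string>();
        Match match = Regex.Match(subject, pattern);
        if (!match.Success)
            return result;

        foreach (Capture capture in match.Groups[1].Captures)
            result.Add(capture.Value);
        return result;
    }

    public static void Main()
    {
        const string subject = "aabcabbc";
        const string pattern = @"^(?:(a+b+)c)+$";

        // Group.Value exposes only the last capture of the group.
        Match match = Regex.Match(subject, pattern);
        Console.WriteLine(match.Groups[1].Value);   // abb

        // Captures exposes one entry per iteration of the quantified group.
        foreach (string value in CapturesOfGroup1(subject, pattern))
            Console.WriteLine(value);               // aab, then abb
    }
}
```

So for a quantified group, Groups[1].Value alone is enough when only the last iteration matters; the Captures collection is needed to see the earlier ones.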

How to get the value of a regex capture group?

This is what you need:

string sString = @"docs/horaires/1/images/1";
var pickImage = Regex.Match(sString, @"/horaires/(.*?)/images/");

if (pickImage.Success)
    Console.WriteLine(pickImage.Groups[1].Value);

In your original code, PickImage[0] is a Match object, and Value will return the full match. You want the first captured group, so use match.Groups[1].Value. Note that Groups[0] always contains the full match.

There is no need to use Matches if you want a single result; use Match instead.
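
As a rough illustration of the points above, the sketch below prints the full match, Groups[0] and Groups[1] for the same input, and then shows that the first element returned by Matches carries the same data. The class name SingleMatchDemo and the helper FirstGroupOrNull are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleMatchDemo
{
    // Hypothetical helper: first captured group, or null when nothing matches.
    public static string FirstGroupOrNull(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        const string input = "docs/horaires/1/images/1";
        const string pattern = @"/horaires/(.*?)/images/";

        Match match = Regex.Match(input, pattern);
        Console.WriteLine(match.Value);             // /horaires/1/images/
        Console.WriteLine(match.Groups[0].Value);   // /horaires/1/images/
        Console.WriteLine(match.Groups[1].Value);   // 1

        // Matches returns a collection; its first element is the same match.
        MatchCollection all = Regex.Matches(input, pattern);
        Console.WriteLine(all[0].Groups[1].Value);  // 1
    }
}
```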

Regular Expression Groups in C#

The ( ) acts as a capture group. The matches collection holds every match that C# finds in your string, and each match's Groups collection holds the values of the capture groups inside that match. If you don't want that extra level of capture, just remove the ( ).
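
A small sketch of that extra level, assuming a made-up input "a1 b2" and made-up patterns (none of them come from the original question). It counts the entries in Groups with and without parentheses; GroupCountDemo and GroupCount are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCountDemo
{
    // Hypothetical helper: number of entries in Groups for the first match.
    // Group 0 (the full match) is always included in the count.
    public static int GroupCount(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups.Count;
    }

    public static void Main()
    {
        const string input = "a1 b2";

        // With parentheses, every match carries its own capture groups.
        foreach (Match m in Regex.Matches(input, @"([a-z])(\d)"))
            Console.WriteLine(m.Value + ": " + m.Groups[1].Value + ", " + m.Groups[2].Value);
        // a1: a, 1
        // b2: b, 2

        Console.WriteLine(GroupCount(input, @"([a-z])(\d)"));      // 3
        Console.WriteLine(GroupCount(input, @"[a-z]\d"));          // 1
        Console.WriteLine(GroupCount(input, @"(?:[a-z])(?:\d)"));  // 1
    }
}
```

The last pattern uses (?: ) for the case where the parentheses are needed for grouping but the captured value is not.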

How do I access named capturing groups in a .NET Regex?

Use the group collection of the Match object, indexing it with the capturing group name, e.g.

foreach (Match m in mc)
{
    MessageBox.Show(m.Groups["link"].Value);
}
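
For a version that runs on its own, here is a sketch that assumes mc came from Regex.Matches and that the link group wraps an href value. The pattern, the sample HTML, and the names NamedGroupDemo and Links are all assumptions for illustration, and Console.WriteLine stands in for MessageBox.Show so that no UI framework is required.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Assumed pattern: a named group "link" around the href attribute value.
    private const string LinkPattern = @"href=""(?<link>[^""]*)""";

    // Hypothetical helper: collects the "link" group of every match.
    public static List<string> Links(string html)
    {
        var links = new List<string>();
        MatchCollection mc = Regex.Matches(html, LinkPattern);
        foreach (Match m in mc)
            links.Add(m.Groups["link"].Value);
        return links;
    }

    public static void Main()
    {
        const string html =
            "<a href=\"https://example.com/a\">A</a> <a href=\"https://example.com/b\">B</a>";

        foreach (string link in Links(html))
            Console.WriteLine(link);
        // https://example.com/a
        // https://example.com/b
    }
}
```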

How to match regular expression group capture collection with another group capture collection?

Consider this:

<
(?<closing>/)?
(?<tname>[a-z][a-z0-9]*)
(?:
\s+
(?<aname>[a-z0-9-_:]+)
(?:
=?
(?<quote>['"]?)
(?<avalue>[^'"<>]*)
\k<quote>
)
)*
(?<selfclosing>\s*\/)?
>

It will match some invalid markup:

<input type="text" disabled"" value="Something" />
<input type="text" disabled= value="Something" />

You can fix this by adding lookaheads:

<
(?<closing>/)?
(?<tname>[a-z][a-z0-9]*)
(?:
\s+
(?<aname>[a-z0-9-_:]+)
(?:
(?:
=
(?=\S)|
(?=\s)
)
(?<quote>['"]?)
(?<avalue>[^'"<>]*)
\k<quote>
)
)*
(?<selfclosing>\s*\/)?
>

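The aname and avalue capture collections would then be aligned, so the capture at a given index in aname pairs with the capture at the same index in avalue.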
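
One way to walk the two collections together is sketched below. It assumes the lookahead version of the pattern is compiled with RegexOptions.IgnorePatternWhitespace (so the multi-line layout can be kept) and RegexOptions.IgnoreCase; neither option is stated in the original answer. AttributePairingDemo and Pair are hypothetical names, and the sample tag is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AttributePairingDemo
{
    // The lookahead pattern from above, kept in its multi-line layout.
    private const string TagPattern = @"
        <
        (?<closing>/)?
        (?<tname>[a-z][a-z0-9]*)
        (?:
          \s+
          (?<aname>[a-z0-9-_:]+)
          (?:
            (?:
              =
              (?=\S)|
              (?=\s)
            )
            (?<quote>['""]?)
            (?<avalue>[^'""<>]*)
            \k<quote>
          )
        )*
        (?<selfclosing>\s*\/)?
        >";

    // Hypothetical helper: pairs the i-th aname capture with the i-th avalue capture.
    public static List<string> Pair(string tag)
    {
        var pairs = new List<string>();
        Match m = Regex.Match(tag, TagPattern,
            RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
        if (!m.Success)
            return pairs;

        CaptureCollection names = m.Groups["aname"].Captures;
        CaptureCollection values = m.Groups["avalue"].Captures;
        for (int i = 0; i < names.Count && i < values.Count; i++)
            pairs.Add(names[i].Value + "=" + values[i].Value);
        return pairs;
    }

    public static void Main()
    {
        foreach (string pair in Pair("<input type=\"text\" value=\"Something\" />"))
            Console.WriteLine(pair);
        // type=text
        // value=Something
    }
}
```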

How to use regex to capture 3 different parts from text line

You can use named capturing groups for this:

var item = " john smith (idjs) <js@email.com>";
String[] patternArr =
{
    "(?:\\s*)",
    "(?<fullname>[a-zA-Z\\s]*?[a-zA-Z])", // captures the full name part
    "(?:\\s*)",
    "(?<idjs>\\([a-zA-Z]*\\))", // captures the idjs part
    "(?:.*)",
    "(?<email>(?:<).*@.*(?:>))" // captures the email part
};

var pattern = String.Join("", patternArr);
var m = Regex.Match(item, pattern);

if (m.Success)
{
    Console.WriteLine("fullname: {0}", m.Groups["fullname"]);
    Console.WriteLine("idjs: {0}", m.Groups["idjs"]);
    Console.WriteLine("email: {0}", m.Groups["email"]);
}

Output:

fullname: john smith
idjs: (idjs)
email: <js@email.com>

Demo: https://dotnetfiddle.net/y6U5j4

How can C# Regex capture everything between *| and |*?

match.Value contains the entire match. This includes the delimiters since you specified them in your regex. When I test your regex and input with RegexPal, it highlights *|variablename|*.

You want only the capture group (the part inside the parentheses), so use match.Groups[1]:

String teststring = "This is a *|variablename|*";
Regex regex = new Regex(@"\*\|(.*)\|\*");
Match match = regex.Match(teststring);
Console.WriteLine(match.Groups[1]);
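
To make the contrast concrete, this sketch prints both match.Value and match.Groups[1].Value for the same regex and input. DelimiterDemo and Inner are names invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterDemo
{
    private const string Pattern = @"\*\|(.*)\|\*";

    // Hypothetical helper: text between *| and |*, or null when nothing matches.
    public static string Inner(string input)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        Match match = Regex.Match("This is a *|variablename|*", Pattern);

        Console.WriteLine(match.Value);            // *|variablename|*
        Console.WriteLine(match.Groups[1].Value);  // variablename
    }
}
```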

