Regular Expression Groups in C#

The ( ) acts as a capture group. So the matches array has all of the matches that C# finds in your string, and the sub-array has the values of the capture groups inside those matches. If you don't want that extra level of capture, just remove the ( ).
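As a minimal sketch of that layout, the snippet below runs a made-up pattern over a made-up input (both are assumptions for illustration, not taken from the original question) and prints each match together with its groups. It also shows that a pattern without ( ) leaves only the implicit group 0.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input and pattern, chosen only for illustration.
string input = "a=1, b=22";
MatchCollection matches = Regex.Matches(input, @"(\w)=(\d+)");

foreach (Match m in matches)
{
    // Groups[0] is the whole match; Groups[1] and Groups[2] are the ( ) groups.
    Console.WriteLine($"{m.Value} -> key={m.Groups[1].Value}, value={m.Groups[2].Value}");
}

// Without ( ), only the implicit group 0 (the whole match) remains.
int groupsWithoutParens = Regex.Match(input, @"\w=\d+").Groups.Count;
Console.WriteLine(groupsWithoutParens);
```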

RegEx Capturing Groups in C#

One thing we might try is to test whether our expression works in another language.

Also, we might want to simplify our expression:

^(.*?)([\s:]+)?{([\s\S].*)?.$

where we have three capturing groups. The first and third are the key and value we want.

RegEx

You can modify/simplify/change your expressions in regex101.com.

RegEx Circuit

You can also visualize your expressions in jex.im.

JavaScript Demo

const regex = /^(.*?)([\s:]+)?{([\s\S].*)?.$/gm;
const str = `key:{value}
key:{valu{0}e}
key:{valu
{0}e}
key: {val-u{0}e}
key: {val__[!]
-u{0}{1}e}`;
const subst = `$1,$3`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

How C# Regex groups are identified


  • Group 1 is defined by (\w+?\s); its value holds only the last iteration
  • Group 2 is defined by (U22334)
  • Named group one is defined by the whole regex definition
  • Named group two is defined by (?<two>(U22334))

For a detailed explanation, you can use a page such as regex101.com with your regex pattern and test string.

How to read RegEx Captures in C#

The C# regex API can be quite confusing. There are groups and captures:

  • A group represents a capturing group; it's used to extract a substring from the text.
  • There can be several captures per group, if the group appears inside a quantifier.

The hierarchy is:

  • Match
    • Group
      • Capture

(a match can have several groups, and each group can have several captures)

For example:

Subject: aabcabbc
Pattern: ^(?:(a+b+)c)+$

In this example, there is only one group: (a+b+). This group is inside a quantifier, and is matched twice. It generates two captures: aab and abb:

aabcabbc
^^^ ^^^
Cap1 Cap2

When a group is not inside of a quantifier, it generates only one capture. In your case, you have 3 groups, and each group captures once. You can use match.Groups[1].Value, match.Groups[2].Value and match.Groups[3].Value to extract the 3 substrings you're interested in, without resorting to the capture notion at all.
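The difference between a group and its captures can be checked with the subject and pattern from the example above. This is a small sketch that uses only Match.Groups, Group.Value, and Group.Captures; the variable names are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

Match match = Regex.Match("aabcabbc", @"^(?:(a+b+)c)+$");
Group group = match.Groups[1];

// Group.Value reports only the last capture of the group.
Console.WriteLine(group.Value);               // abb

// Group.Captures keeps every iteration, in order.
foreach (Capture c in group.Captures)
    Console.WriteLine($"{c.Index}: {c.Value}"); // 0: aab, then 4: abb
```

If every group in a pattern sits outside a quantifier, Groups[n].Value is enough and the Captures collection can be ignored.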

What is the regex pattern for named capturing groups in .NET?


 string pattern = @"(?<Person>[\w ]+) has been to (?<NumberOfGames>\d+) bingo games\. The last was on (?<Day>\w+) (?<Date>\d\d/\d\d/\d{4})\. She won with the Numbers: (?<Numbers>.*?)$";

Other posts have mentioned how to pull out the groups, but this regex matches on your input.
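For completeness, here is a sketch of reading the named groups back out. The input sentence is an assumption written to fit the pattern, since the original input is not shown here; the group names come from the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"(?<Person>[\w ]+) has been to (?<NumberOfGames>\d+) bingo games\. The last was on (?<Day>\w+) (?<Date>\d\d/\d\d/\d{4})\. She won with the Numbers: (?<Numbers>.*?)$";

// Hypothetical input, invented to fit the pattern.
string input = "Jane Doe has been to 12 bingo games. The last was on Tuesday 05/03/2019. She won with the Numbers: 4, 8, 15";

Match match = Regex.Match(input, pattern);
if (match.Success)
{
    // Named groups are read by name through the Groups indexer.
    Console.WriteLine(match.Groups["Person"].Value);        // Jane Doe
    Console.WriteLine(match.Groups["NumberOfGames"].Value); // 12
    Console.WriteLine(match.Groups["Day"].Value);           // Tuesday
    Console.WriteLine(match.Groups["Date"].Value);          // 05/03/2019
    Console.WriteLine(match.Groups["Numbers"].Value);       // 4, 8, 15
}
```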

C# Regex: match capture groups multiple times

As per the comments, for the first posted example data, you could omit the anchor ^ and use the 3 capturing groups:

(?<replace>(?<column>\[[^]]+\]\.\w+)\.Contains\("(?<like>%[^%]+\%)"\))(?: And | Or |$)

Explanation

  • (?<replace> Start group replace

    • (?<column> Start group column

      • \[[^]]+\] Match an opening [, then 1+ characters other than ], then a closing ]
      • \.\w+ Match a dot and 1+ word characters
    • ) Close group column
    • \.Contains\(" Match .Contains("
    • (?<like> Start group like

      • %[^%]+\% Match %, then 1+ characters other than %, then %
    • ) Close group like
    • "\) Match ")
  • ) Close group replace
  • (?: And | Or |$) Alternation that matches either And or Or surrounded by spaces, or the end of the string
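A short sketch of how the three named groups could be read for every match. The input string is hypothetical, since the original example data is not reproduced here. Note that the double quotes in the pattern are doubled inside the C# verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above; "" is an escaped double quote in a verbatim string.
string pattern = @"(?<replace>(?<column>\[[^]]+\]\.\w+)\.Contains\(""(?<like>%[^%]+\%)""\))(?: And | Or |$)";

// Hypothetical input, invented for illustration.
string input = @"[Customer].Name.Contains(""%john%"") And [Customer].City.Contains(""%york%"")";

MatchCollection found = Regex.Matches(input, pattern);
foreach (Match m in found)
{
    Console.WriteLine($"column={m.Groups["column"].Value}, like={m.Groups["like"].Value}");
}
```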

Regex in C# How to replace only capture groups and not non-capture groups

Instead of trying to ignore the strings with words and !@#$%^&*()_- in them, I just included them in my search, placed an extra single quote on either end, and then removed all instances of two single quotes, like so:

using System;
using System.Text.RegularExpressions;

// Find any string of words and !@#$%^&*()_- in and out of quotes.
Regex getwords = new Regex(@"(^(?!and\b)(?!or\b)(?!not\b)(?!empty\b)(?!notempty\b)(?!currentdate\b)([\w!@#$%^&*())_-]+)|((?!and\b)(?!or\b)(?!not\b)(?!empty\b)(?!notempty\b)(?!currentdate\b)(?<=\W)([\w!@#$%^&*()_-]+)|('[\w\s!@#$%^&*()_-]+')))", RegexOptions.IgnoreCase);
// Find all cases of two single quotes
Regex getQuotes = new Regex(@"('')");

// Get string from user
Console.WriteLine("Type in a string");
string search = Console.ReadLine();

// Wrap every found term in single quotes, then collapse doubled quotes into one.
search = getwords.Replace(search, "'$1'");
search = getQuotes.Replace(search, "'");

C# Match multiple regex groups, keeping each match/word separate

You seem to seek

GeForce\s+\w+-(\w+)-(\w+)

Pattern explanation:

  • GeForce - a literal substring GeForce
  • \s+ - 1 or more whitespaces
  • \w+- - 1+ word chars and a hyphen
  • (\w+) - Group 1 capturing 1+ word chars
  • - - a hyphen
  • (\w+) - Group 2 capturing 1+ word chars

To access the groups, use Match.Groups[X].Value.

C# demo:

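using System;
using System.Linq;
using System.Text.RegularExpressions;

var re = @"GeForce\s+\w+-(\w+)-(\w+)";
var str = "GeForce TURBO-GTX1080-8G NVIDIA\nGeForce TURBO-GTX1070-4Gi";
// For each match, skip Groups[0] (the whole match) and keep the captured values
var res = Regex.Matches(str, re)
    .Cast<Match>()
    .Select(m => m.Groups.Cast<Group>().Skip(1).Select(g => g.Value))
    .ToList();
foreach (var m in res)
    Console.WriteLine(string.Join(" : ", m));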

If you also need to capture the digits there, use

GeForce\s+\w+-([^\W\d]*(\d+)[^\W\d]*)-([^\W\d]*(\d+)[^\W\d]*)

The code will be the same as above.

Here, \w+ are replaced with [^\W\d]*(\d+)[^\W\d]* that match:

  • [^\W\d]* - zero or more word chars except digits (that is, [\p{L}_], or [\w-[\d]])
  • (\d+) - Group X that captures one or more digits
  • [^\W\d]* - ibid.

Don't use capturing groups in C# Regex

If you are sure you have no %s inside double %%s, you can just use lookarounds like this:

(?<=^%%[^%]*%%)[^%]+(?=%%)
^^^^^^^^^^^^^^^     ^^^^^^

If you have single-% delimited strings (like %text1%text2%text3%text4%text5%text6):

(?<=^%[^%]*%)[^%]+(?=%)

And in case it is between the 4th and the 5th:

(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)
^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^

For single-% delimited strings:

(?<=^%(?:[^%]*%){3})[^%]+(?=%)

Both regexps contain a variable-width lookbehind and the same lookahead, which restrict the context in which the 1 or more characters other than % appear.

The (?<=^%%[^%]*%%) makes sure there is %%[something_other_than_%]%% right after the beginning of the string, and (?<=^%%(?:[^%]*%%){3}) matches %%[substring_not_having_%]%%[substring_not_having_%]%%[substring_not_having_%]%% after the string start.

In case there can be single % symbols inside the double %%, you can use an unroll-the-loop regex:

(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)

This matches the same text as (?<=^%%(?:.*?%%){3}).*?(?=%%). For short strings, the .*? based solution should work faster. For very long input texts, use the unrolled version.
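The lookaround patterns for %%-delimited strings can be tried out with a sketch like the one below. Both input strings are made up for illustration; the patterns are the ones shown above. Because the delimiters sit inside lookarounds, Match.Value is the field itself and no capturing group is needed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical inputs, invented for illustration.
string doubled = "%%one%%two%%three%%four%%five%%";
string mixed = "%%a%b%%c%%d%%e%f%%g%%";

// Field right after the first %%...%% pair.
string secondField = Regex.Match(doubled, @"(?<=^%%[^%]*%%)[^%]+(?=%%)").Value;

// Field between the 4th and the 5th %% delimiters.
string fourthField = Regex.Match(doubled, @"(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)").Value;

// Unrolled version: fields may contain single % symbols.
string unrolledField = Regex.Match(mixed,
    @"(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)").Value;

Console.WriteLine(secondField);   // two
Console.WriteLine(fourthField);   // four
Console.WriteLine(unrolledField); // e%f
```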


