Regular Expression Groups in C#
The ( ) acts as a capture group. The matches array holds all of the matches that C# finds in your string, and the sub-array holds the values of the capture groups inside those matches. If you don't want that extra level of capture, just remove the ( ).
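A minimal C# sketch of the two levels described above. The pattern id=(\d+) and the input string are made-up examples. Groups[0] always holds the whole match, so the first ( ) in the pattern shows up as Groups[1]; without the ( ) only Groups[0] exists.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureLevels
{
    // Describes each whole match together with the value of its first capture group.
    public static string[] Describe(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        var lines = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            Match m = matches[i];
            // Groups[0] is the whole match; Groups[1] exists only if the pattern has a ( ).
            string group1 = m.Groups.Count > 1 ? m.Groups[1].Value : "(no group)";
            lines[i] = $"match='{m.Value}' group1='{group1}'";
        }
        return lines;
    }

    public static void Main()
    {
        // With ( ): every match carries an extra level, the capture group.
        foreach (string line in Describe("id=7; id=42", @"id=(\d+)"))
            Console.WriteLine(line);
        // Without ( ): only the matches themselves remain.
        foreach (string line in Describe("id=7; id=42", @"id=\d+"))
            Console.WriteLine(line);
    }
}
```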
RegEx Capturing Groups in C#
One approach is to test whether our expression works in another language.
Also, we might want to simplify our expression:
^(.*?)([\s:]+)?{([\s\S].*)?.$
where we have three capturing groups. The first and third ones are the key and value we want.
RegEx
You can modify, simplify or change your expressions on regex101.com.
RegEx Circuit
You can also visualize your expressions on jex.im.
JavaScript Demo
const regex = /^(.*?)([\s:]+)?{([\s\S].*)?.$/gm;
const str = `key:{value}
key:{valu{0}e}
key:{valu
{0}e}
key: {val-u{0}e}
key: {val__[!]
-u{0}{1}e}`;
const subst = `$1,$3`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
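A rough C# counterpart of the JavaScript demo, assuming the JavaScript m flag maps to RegexOptions.Multiline (Regex.Replace already replaces every match, so nothing corresponds to the g flag). The { is escaped as \{ here, which does not change what the pattern matches. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyValueDemo
{
    // Same expression as the JavaScript demo, with the literal { escaped.
    private const string Pattern = @"^(.*?)([\s:]+)?\{([\s\S].*)?.$";

    // Multiline lets ^ and $ match at line boundaries, like the JavaScript m flag.
    // $1 is the key and $3 is the value; group 2 (the separator) is dropped.
    public static string ToKeyValue(string input) =>
        Regex.Replace(input, Pattern, "$1,$3", RegexOptions.Multiline);

    public static void Main()
    {
        Console.WriteLine(ToKeyValue("key:{value}\nkey: {val-u{0}e}"));
    }
}
```

Note that a value spanning several lines (like the key:{valu / {0}e} entry in the JavaScript input) is not handled as one entry by this expression, because $ stops at the first line end.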
How C# Regex groups are identified
- Group 1 is defined by (\w+?\s) and captures only the last iteration.
- Group 2 is defined by (U22334).
- Named group one is defined by the whole regex.
- Named group two is defined by (?<two>(U22334)).
For a detailed explanation, you can use a page such as regex101.com with your regex pattern and test string.
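The numbering can be checked directly in C#. The pattern below is a hypothetical reconstruction in the spirit of the description (the question's exact regex is not reproduced here). It relies on the documented .NET rule that unnamed groups are numbered first, left to right, and named groups get the numbers after them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumbering
{
    // Hypothetical pattern: two unnamed groups nested inside two named ones.
    public static readonly Regex Rx = new Regex(@"(?<one>(\w+?\s)+(?<two>(U22334)))");

    public static void Main()
    {
        Match m = Rx.Match("ab cd U22334");
        // Unnamed groups come first (1, 2), then the named ones (one = 3, two = 4).
        foreach (int n in Rx.GetGroupNumbers())
            Console.WriteLine($"{n} ({Rx.GroupNameFromNumber(n)}): '{m.Groups[n].Value}'");
        // Group 1 is quantified: Value holds only the last iteration ("cd "),
        // while Captures still holds both iterations.
        Console.WriteLine(m.Groups[1].Captures.Count);
    }
}
```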
How to read RegEx Captures in C#
The C# regex API can be quite confusing. There are groups and captures:
- A group represents a capturing group; it's used to extract a substring from the text.
- There can be several captures per group if the group appears inside a quantifier.
The hierarchy is:
- Match
  - Group
    - Capture
  - Group
(a match can have several groups, and each group can have several captures)
For example:
Subject: aabcabbc
Pattern: ^(?:(a+b+)c)+$
In this example, there is only one group: (a+b+). This group is inside a quantifier and is matched twice. It generates two captures, aab and abb:
aabcabbc
^^^ ^^^
Cap1 Cap2
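The same walk from Match to Group to Capture in C#, using the subject and pattern from the example. Group.Value only reports the last capture, while Group.Captures keeps both. The class and method names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureHierarchy
{
    private const string Pattern = @"^(?:(a+b+)c)+$";

    // Returns every capture of group 1, in the order they were made.
    public static string[] GroupOneCaptures(string subject)
    {
        Match match = Regex.Match(subject, Pattern);
        return match.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    public static void Main()
    {
        Match match = Regex.Match("aabcabbc", Pattern);
        Console.WriteLine(match.Groups[1].Value);                           // abb (last capture only)
        Console.WriteLine(string.Join(", ", GroupOneCaptures("aabcabbc"))); // aab, abb
    }
}
```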
When a group is not inside a quantifier, it generates only one capture. In your case, you have 3 groups, and each group captures once. You can use match.Groups[1].Value, match.Groups[2].Value and match.Groups[3].Value to extract the 3 substrings you're interested in, without resorting to the capture notion at all.
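For the common case with no quantified groups, a sketch like the following is enough. The yyyy-MM-dd pattern is a hypothetical stand-in for the question's three groups.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThreeGroups
{
    // Hypothetical three-group pattern. No group sits inside a quantifier,
    // so each group has exactly one capture and Groups[n].Value is enough.
    public static (string Year, string Month, string Day) Split(string date)
    {
        Match match = Regex.Match(date, @"^(\d{4})-(\d{2})-(\d{2})$");
        if (!match.Success)
            throw new FormatException($"Not a yyyy-MM-dd date: {date}");
        return (match.Groups[1].Value, match.Groups[2].Value, match.Groups[3].Value);
    }

    public static void Main()
    {
        var (year, month, day) = Split("2024-05-17");
        Console.WriteLine($"{year} / {month} / {day}");
    }
}
```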
What is the regex pattern for named capturing groups in .NET?
string pattern = @"(?<Person>[\w ]+) has been to (?<NumberOfGames>\d+) bingo games\. The last was on (?<Day>\w+) (?<Date>\d\d/\d\d/\d{4})\. She won with the Numbers: (?<Numbers>.*?)$";
Other answers have shown how to pull out the groups; this regex matches your input.
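A short usage sketch for these named groups, reading each value with match.Groups["Name"].Value. The sample sentence is invented to fit the pattern; it is not the asker's actual input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BingoParser
{
    private const string Pattern = @"(?<Person>[\w ]+) has been to (?<NumberOfGames>\d+) bingo games\. The last was on (?<Day>\w+) (?<Date>\d\d/\d\d/\d{4})\. She won with the Numbers: (?<Numbers>.*?)$";

    // Hypothetical input shaped like the sentence the pattern was written for.
    public const string Sample = "Mary has been to 12 bingo games. The last was on Tuesday 05/14/2024. She won with the Numbers: 3, 17, 42";

    // Looks a group up by its name instead of its number; returns "" if there is no match.
    public static string Get(string input, string groupName)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Success ? match.Groups[groupName].Value : "";
    }

    public static void Main()
    {
        foreach (string name in new[] { "Person", "NumberOfGames", "Day", "Date", "Numbers" })
            Console.WriteLine($"{name}: {Get(Sample, name)}");
    }
}
```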
C# Regex: match capture groups multiple times
As per the comments, for the first posted example data, you could omit the anchor ^ and use the 3 capturing groups:
(?<replace>(?<column>\[[^]]+\]\.\w+)\.Contains\("(?<like>%[^%]+\%)"\))(?: And | Or |$)
Explanation
- (?<replace>: start group replace
- (?<column>: start group column
- \[[^]]+\]: match an opening [, then 1+ characters that are not a closing ], and then a closing ]
- \.\w+: match a dot and 1+ word characters
- ): close group column
- \.Contains\(": match .Contains("
- (?<like>: start group like
- %[^%]+\%: match %, then 1+ characters that are not %, and then %
- ): close group like
- "\): match ")
- ): close group replace
- (?: And | Or |$): alternation that matches either And or Or (each surrounded by spaces), or the end of the string
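A sketch of reading the named groups from every match with Regex.Matches. The filter string is a hypothetical example. Inside the character class the ] is escaped as [^\]] to keep the pattern unambiguous, and the double quotes are doubled because the pattern is a C# verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContainsClauseParser
{
    private static readonly Regex Rx = new Regex(
        @"(?<replace>(?<column>\[[^\]]+\]\.\w+)\.Contains\(""(?<like>%[^%]+\%)""\))(?: And | Or |$)");

    // Hypothetical filter string; the question's real data is not reproduced here.
    public const string Sample = @"[Users].Name.Contains(""%john%"") And [Users].City.Contains(""%york%"")";

    // Returns "column => like" for every .Contains(...) clause in the input.
    public static string[] Parse(string filter)
    {
        MatchCollection matches = Rx.Matches(filter);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            result[i] = $"{matches[i].Groups["column"].Value} => {matches[i].Groups["like"].Value}";
        return result;
    }

    public static void Main()
    {
        foreach (string clause in Parse(Sample))
            Console.WriteLine(clause);
    }
}
```

The replace group holds the whole clause (for example [Users].Name.Contains("%john%")), which is convenient if each clause is later substituted with something else.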
Regex in C#: How to replace only capture groups and not non-capture groups
Instead of trying to ignore the strings with words and !@#$%^&*()_- in them, I just included them in my search, placed an extra single quote on either end, and then removed all instances of two single quotes, like so:
using System;
using System.Text.RegularExpressions;

// Find any string of words and !@#$%^&*()_- in and out of quotes.
Regex getwords = new Regex(@"(^(?!and\b)(?!or\b)(?!not\b)(?!empty\b)(?!notempty\b)(?!currentdate\b)([\w!@#$%^&*()_-]+)|((?!and\b)(?!or\b)(?!not\b)(?!empty\b)(?!notempty\b)(?!currentdate\b)(?<=\W)([\w!@#$%^&*()_-]+)|('[\w\s!@#$%^&*()_-]+')))", RegexOptions.IgnoreCase);
// Find all cases of two single quotes
Regex getQuotes = new Regex(@"('')");
// Get string from user
Console.WriteLine("Type in a string");
string search = Console.ReadLine();
// Execute expressions: wrap each match (group 1) in single quotes,
search = getwords.Replace(search, "'$1'");
// then collapse the doubled quotes left around strings that were already quoted.
search = getQuotes.Replace(search, "'");
Console.WriteLine(search);
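The wrap-then-collapse idea can be shown with a much smaller pattern. This is a simplified, hypothetical variant, not the answer's regex: it only excludes the keywords and, or and not, and it uses $0 (the whole match) instead of $1. Strings that are already quoted get a second pair of quotes in the first pass, which the second pass removes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteWrapper
{
    // Either an already quoted string, or a run of word/symbol characters that
    // starts a token and is not one of the (shortened, hypothetical) keywords.
    private static readonly Regex Words = new Regex(
        @"'[^']*'|(?<![\w!@#$%^&*()_-])(?!(?:and|or|not)\b)[\w!@#$%^&*()_-]+",
        RegexOptions.IgnoreCase);

    private static readonly Regex DoubledQuotes = new Regex("''");

    public static string Quote(string search)
    {
        // Pass 1: wrap every match in single quotes ($0 is the whole match).
        string wrapped = Words.Replace(search, "'$0'");
        // Pass 2: already quoted strings now have '' on each side; collapse them.
        return DoubledQuotes.Replace(wrapped, "'");
    }

    public static void Main()
    {
        Console.WriteLine(Quote("name and 'john smith'"));
    }
}
```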
C# Match multiple regex groups, keeping each match/word separate
You seem to seek
GeForce\s+\w+-(\w+)-(\w+)
Pattern explanation:
- GeForce: a literal substring GeForce
- \s+: 1 or more whitespace characters
- \w+-: 1+ word chars and a hyphen
- (\w+): Group 1 capturing 1+ word chars
- -(\w+): a hyphen, then Group 2 capturing 1+ word chars
To access the groups, use Match.Groups[X].Value.
C# demo:
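using System;
using System.Linq;
using System.Text.RegularExpressions;

var re = @"GeForce\s+\w+-(\w+)-(\w+)";
var str = "GeForce TURBO-GTX1080-8G NVIDIA\nGeForce TURBO-GTX1070-4Gi";
// Skip(1) drops Groups[0], the whole match, so only the two capture groups remain
var res = Regex.Matches(str, re)
    .Cast<Match>()
    .Select(m => m.Groups.Cast<Group>().Skip(1).Select(g => g.Value))
    .ToList();
foreach (var m in res)
    Console.WriteLine(string.Join(" : ", m)); // GTX1080 : 8G, then GTX1070 : 4Gi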
If you also need to match digits there, use
GeForce\s+\w+-([^\W\d]*(\d+)[^\W\d]*)-([^\W\d]*(\d+)[^\W\d]*)
The code will be the same as above.
Here, the \w+ parts are replaced with [^\W\d]*(\d+)[^\W\d]*, which matches:
- [^\W\d]*: zero or more word chars except digits (roughly [\p{L}_], or [\w-[\d]] using .NET character class subtraction)
- (\d+): a group (Group 2 or Group 4) that captures one or more digits
- [^\W\d]*: same as the first item
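With the digit-aware pattern, groups are numbered by their opening parentheses, so the outer tokens land in groups 1 and 3 and the digits in groups 2 and 4. A sketch using the two sample lines from the earlier demo; the class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GpuModelParser
{
    private const string Pattern = @"GeForce\s+\w+-([^\W\d]*(\d+)[^\W\d]*)-([^\W\d]*(\d+)[^\W\d]*)";

    // Groups 1 and 3 hold the whole tokens; groups 2 and 4 hold only their digits.
    public static string[] Parse(string line)
    {
        Match m = Regex.Match(line, Pattern);
        return m.Success
            ? new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value }
            : Array.Empty<string>();
    }

    public static void Main()
    {
        foreach (string line in new[] { "GeForce TURBO-GTX1080-8G NVIDIA", "GeForce TURBO-GTX1070-4Gi" })
            Console.WriteLine(string.Join(" : ", Parse(line)));
    }
}
```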
Don't use capturing groups in C# Regex
If you are sure you have no single % characters inside the double %% delimiters, you can just use lookarounds like this:
(?<=^%%[^%]*%%)[^%]+(?=%%)
^^^^^^^^^^^^^^^     ^^^^^^
If you have single-% delimited strings (like %text1%text2%text3%text4%text5%text6):
(?<=^%[^%]*%)[^%]+(?=%)
And in case the value is between the 4th and the 5th delimiters:
(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)
^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^
For single-% delimited strings:
(?<=^%(?:[^%]*%){3})[^%]+(?=%)
Both regexes contain a variable-width lookbehind and the same lookahead, which restrict the context in which the one or more characters other than % can appear.
The (?<=^%%[^%]*%%) lookbehind makes sure there is %%[something_other_than_%]%% right after the beginning of the string, and (?<=^%%(?:[^%]*%%){3}) matches %%[substring_not_having_%]%%[substring_not_having_%]%%[substring_not_having_%]%% after the string start.
In case there can be single % symbols inside the double %%, you can use an unroll-the-loop regex:
(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)
It matches the same text as (?<=^%%(?:.*?%%){3}).*?(?=%%). For short strings, the .*?-based solution should work faster; for very long input texts, use the unrolled version.
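Because the lookarounds only check the surrounding context, the value you want is the whole match, so no capturing group is needed. A sketch using three of the patterns above; the sample strings and names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimitedField
{
    // Value right after the first %%-delimited field.
    public const string SecondField = @"(?<=^%%[^%]*%%)[^%]+(?=%%)";
    // Value between the 4th and 5th %% delimiters.
    public const string FourthField = @"(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)";
    // Same position for single-% delimited strings.
    public const string FourthFieldSingle = @"(?<=^%(?:[^%]*%){3})[^%]+(?=%)";

    // The whole match is the wanted value, so Match.Value is used instead of a group.
    // A failed match yields an empty string.
    public static string Extract(string input, string pattern) => Regex.Match(input, pattern).Value;

    public static void Main()
    {
        Console.WriteLine(Extract("%%a%%b%%c%%d%%e%%", SecondField));
        Console.WriteLine(Extract("%%a%%b%%c%%d%%e%%", FourthField));
        Console.WriteLine(Extract("%text1%text2%text3%text4%text5%text6", FourthFieldSingle));
    }
}
```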