Overlapping matches in Regex
Update 2016:
To get nn, nn, nn, SDJMcHattie proposes in the comments (?=(nn)) (see regex101):
(?=(nn))
Original answer (2008)
A possible solution could be to use a positive lookbehind:
(?<=n)n
It would give you the end position of each overlapping pair (marked with brackets):
- [nn]nn
- n[nn]n
- nn[nn]
As mentioned by Timothy Khouri, a positive lookahead is more intuitive (see example).
Instead of his proposition (?=nn)n, I would prefer the simpler form:
(n)(?=(n))
That would reference the first position of the strings you want and would capture the second n in group(2).
That is so because:
- Any valid regular expression can be used inside the lookahead.
- If it contains capturing parentheses, the backreferences will be saved.
So group(1) and group(2) will capture whatever 'n' represents (even if it is a complicated regex).
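A minimal C# sketch of the two patterns above, assuming n is the literal letter n and the input is nnnn (both chosen only for illustration; the helper name StartsAndCaptures is hypothetical, and the snippet is written as a top-level program for .NET 6 or later). Because a lookahead consumes nothing, each match is identified by its start index and the text is read from the capture group.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns "index:captured text" for every match,
// reading the text from the given capture group.
static List<string> StartsAndCaptures(string input, string pattern, int group)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(input, pattern))
    {
        results.Add(m.Index + ":" + m.Groups[group].Value);
    }
    return results;
}

// (?=(nn)) matches the empty string wherever "nn" follows; group 1 holds "nn".
Console.WriteLine(string.Join(", ", StartsAndCaptures("nnnn", "(?=(nn))", 1)));
// prints: 0:nn, 1:nn, 2:nn

// (n)(?=(n)) consumes one n and captures the following n in group 2.
Console.WriteLine(string.Join(", ", StartsAndCaptures("nnnn", "(n)(?=(n))", 2)));
// prints: 0:n, 1:n, 2:n
```

Both patterns report three matches at indices 0, 1 and 2, one for each overlapping pair.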
Getting overlapping regex matches in C#
Your first match already consumes the 1 in front of the second run of zeros.
100001 0001
^^^^^^
This is the first match. The rest is just 0001, which does not match your regex.
You can circumvent this behavior if you are using lookaheads/lookbehinds:
(?<=1)(0*)(?=1)
If lookbehinds are not available (JavaScript engines did not support them before ES2018), a single lookahead is enough, because only the closing 1 has to stay unconsumed:
1(0*)(?=1)
And a hint for your regex101 example: you did not add the global flag, so only the first match is returned.
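The difference can be checked with a short C# sketch. It assumes the sample input 1000010001 and assumes the question's original pattern was something like 1(0*)1 (the question itself is not shown here); ZeroRuns is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text of capture group 1 for each match.
static string[] ZeroRuns(string text, string pattern) =>
    Regex.Matches(text, pattern)
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

const string sample = "1000010001"; // assumed sample input

// 1(0*)1 consumes the closing 1, so the second run of zeros is missed.
Console.WriteLine(string.Join(",", ZeroRuns(sample, "1(0*)1")));          // 0000
// The lookahead leaves the closing 1 available for the next match.
Console.WriteLine(string.Join(",", ZeroRuns(sample, "1(0*)(?=1)")));      // 0000,000
// With a lookbehind as well, neither 1 is consumed.
Console.WriteLine(string.Join(",", ZeroRuns(sample, "(?<=1)(0*)(?=1)"))); // 0000,000
```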
Counting overlapping matches with Regex in C#
This will return 4, as you expect:
Regex.Matches("020202020", @"0(?=20)").Count;
The lookahead matches the 20 without consuming it, so the next match attempt starts at the position following the first 0. You can even do the whole regex as a lookahead:
Regex.Matches("020202020", @"(?=020)").Count;
The regex engine automatically bumps ahead one position each time it makes a zero-length match. So, to find all runs of three 2's or four 2's, you can use:
Regex.Matches("22222222", @"(?=222)").Count; // 6
...and:
Regex.Matches("22222222", @"(?=2222)").Count; // 5
EDIT: Looking over your question again, it occurs to me you might be looking for 2's interspersed with 0's:
Regex.Matches("020202020", @"(?=20202)").Count; // 2
If you don't know how many 0's there will be, you can use this:
Regex.Matches("020202020", @"(?=20*20*2)").Count; // 2
And of course, you can use quantifiers to reduce repetition in the regex:
Regex.Matches("020202020", @"(?=2(?:0*2){2})").Count; // 2
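The lookahead trick can be wrapped in a small helper. The sketch below is only an illustration: CountOverlapping and CountByRestart are hypothetical names, and the wrapper assumes the pattern cannot match the empty string. The second helper shows an alternative that leaves the pattern as written and restarts the search one character after the start of the previous match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps the pattern in a lookahead so every match is
// zero-length. Assumes the pattern cannot match the empty string.
static int CountOverlapping(string text, string pattern) =>
    Regex.Matches(text, "(?=" + pattern + ")").Count;

// Hypothetical alternative: keep the pattern as written and restart the
// search one character after the start of the previous match.
static int CountByRestart(string text, string pattern)
{
    var regex = new Regex(pattern);
    int count = 0;
    Match m = regex.Match(text);
    while (m.Success)
    {
        count++;
        if (m.Index >= text.Length) break;   // nothing left to scan
        m = regex.Match(text, m.Index + 1);
    }
    return count;
}

Console.WriteLine(CountOverlapping("020202020", "020"));  // 4
Console.WriteLine(CountByRestart("020202020", "020"));    // 4
Console.WriteLine(CountOverlapping("22222222", "222"));   // 6
```

Both helpers count at most one match per start position.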
Replacing overlapping matches in a string (regex or string operations)
I think I would forgo regex and write a simple loop as below (there is room for improvement), because I think it would be quicker and more understandable.
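// Requires: using System.Collections.Generic;
public IEnumerable<int> FindStartingOccurrences(string input, string pattern)
{
    var occurrences = new List<int>();
    // Advance one character at a time so overlapping occurrences are found.
    for (int i = 0; i < input.Length; i++)
    {
        // Only compare while the pattern still fits in the remaining text.
        if (i + pattern.Length <= input.Length &&
            input.Substring(i, pattern.Length) == pattern)
        {
            occurrences.Add(i);
        }
    }
    return occurrences;
}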
and then call like:
var occurrences = FindStartingOccurrences("aaabbaaaaaccaadaaa", "aa");
How to Find All Matches in Regular Expressions when one Overlaps OR Contains the Other?
Regular expressions are designed to find one match at a time. Even a global match operation is simply repeated applications of the same regex, each starting at the end of the previous match in the target string. So no, regexes are not able to find all matches in this way.
I will stick my neck out and say that I don't believe you can even find "all strings beginning with 'a' in 'akzzaz'" with a regex. /(a.*)/g will find the entire string, while /(a.*?)/g will find just 'a' twice.
The way I would code this would be to locate all 'a's, and search each of the substrings from there to the end of the string for all 'z's. So search 'akzzaz' and 'az' for 'z', giving 'akz', 'akzz', 'akzzaz', and 'az'. That is a fairly simple thing to do, but not a job for a regex unless the actual 'a' and 'z' tokens are complex.
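The approach described above could be sketched in C# as follows, assuming the start and end tokens are single characters (the case where no regex is needed). AllSpans is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: every substring that starts at a 'start' character
// and ends at any later 'end' character.
static List<string> AllSpans(string text, char start, char end)
{
    var spans = new List<string>();
    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] != start) continue;
        // From each start position, collect one span per later end character.
        for (int j = i + 1; j < text.Length; j++)
        {
            if (text[j] == end) spans.Add(text.Substring(i, j - i + 1));
        }
    }
    return spans;
}

Console.WriteLine(string.Join(", ", AllSpans("akzzaz", 'a', 'z')));
// prints: akz, akzz, akzzaz, az
```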
How is Regex.Split giving me overlapping matches?
From the documentation:
If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array. For example, splitting the string " plum-pear" on a hyphen placed within capturing parentheses adds a string element that contains the hyphen to the returned array.
You have two sets of capturing parentheses, one inclusive of the quotes and one exclusive. These return the strings you are seeing.
Note that the pattern for Regex.Split isn't supposed to match the desired results; it's supposed to match the delimiters. A quoted string is usually not a delimiter.
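The documented behaviour is easy to confirm with a short sketch. The input plum-pear follows the documentation's example; the variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Without a capture group only the pieces between delimiters are returned.
string[] plain = Regex.Split("plum-pear", "-");
// With a capture group the delimiter text is included in the result.
string[] captured = Regex.Split("plum-pear", "(-)");
// A non-capturing group behaves like the plain pattern.
string[] nonCapturing = Regex.Split("plum-pear", "(?:-)");

Console.WriteLine(string.Join("|", plain));         // plum|pear
Console.WriteLine(string.Join("|", captured));      // plum|-|pear
Console.WriteLine(string.Join("|", nonCapturing));  // plum|pear
```

If grouping is needed inside a split pattern, a non-capturing group keeps the delimiter text out of the result.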
Also, your results seem very odd, because you've used a greedy match. Apparently the requirement "The input string is split as many times as possible." makes matching non-greedy for the entire operation.
Overall, I'd say you're using the wrong tool. Depending on the implementation, regular expressions are either incapable of dealing with nested groupings or extremely inefficient at it. A simple DFA should work much better and never need more than a single scan.
How can I Prioritize Overlapping Patterns in RegEx?
How about the following (regex101.com example):
/((?:[A-Z0-9]{1,4}|[Bb]lank)(?=\h[-–]\h)|[Bb]lank)(?:\h[-–]\h|\|)?(.*?)(?=[Bb]lank|\||[A-Z0-9]{1,4}\h[-–]\h|$)/gm
Explanation
[Bb]lank
All matches for "blank" check for a lower OR uppercase "B"
((?:[A-Z0-9]{1,4}|[Bb]lank)(?=\h[-–]\h)|[Bb]lank)
The 1st capturing group: match either the alpha numeric first value or a "blank" first value with " - " or " – " after (positive lookahead) OR a "blank" first value that won't have a 2nd matching group.
(?:\h[-–]\h|\|)?
A separator of " - " OR " – " OR "|" which will occur zero or one times.
(.*?)
Ungreedily match the 2nd matching group.
(?=[Bb]lank|\||[A-Z0-9]{1,4}\h[-–]\h|$)
Using a positive lookahead, look for a "blank" OR "|" OR an alphanumeric first value followed by " - " or " – " OR the end of the line (to catch the last item on the row). This marks where the second capture should end.
How do I fix this regex? (How to get overlapping matches that share a word?)
With RegexOptions.Multiline | RegexOptions.IgnoreCase
^(?<Title>.*(?:Manager|Officer)).*\n(?<Name>.*)(?:\n(?!.*(?:Manager|Officer))(?<Detail>.*))+$
See: http://regexhero.net/tester/?id=1ac1bd9f-be0a-4bea-ac01-cc32a6605ae7
Retrieve values using
Match.Groups["Name"].Value
Match.Groups["Title"].Value
Match.Groups["Detail"].Captures[i].Value (one capture per detail line, i = 0..n-1)
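A hedged sketch of how the pattern and the group accessors fit together. The sample text is made up for illustration (it is not from the question) and uses \n line endings; the variable names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Made-up sample data with \n line endings; not taken from the question.
const string sampleText =
    "Sales Manager\nJane Doe\n123 Main St\n555-0100\n" +
    "Chief Officer\nJohn Roe\n9 Elm Rd";

const string recordPattern =
    @"^(?<Title>.*(?:Manager|Officer)).*\n(?<Name>.*)" +
    @"(?:\n(?!.*(?:Manager|Officer))(?<Detail>.*))+$";

var records = new List<string>();
foreach (Match m in Regex.Matches(sampleText, recordPattern,
             RegexOptions.Multiline | RegexOptions.IgnoreCase))
{
    // A repeated group keeps every iteration in Captures, not just the last.
    var details = new List<string>();
    foreach (Capture c in m.Groups["Detail"].Captures)
    {
        details.Add(c.Value);
    }
    records.Add(m.Groups["Title"].Value + " / " + m.Groups["Name"].Value +
                " / " + string.Join("; ", details));
}

foreach (string record in records) Console.WriteLine(record);
// prints:
// Sales Manager / Jane Doe / 123 Main St; 555-0100
// Chief Officer / John Roe / 9 Elm Rd
```

Input with \r\n line endings would leave a trailing \r in each captured value, since . matches \r in .NET.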