How can I match overlapping strings with regex?
You can't do this with a regex alone, but you can get pretty close:
var pat = /(?=(\d{3}))\d/g;
var results = [];
var match;
while ( (match = pat.exec( '1234567' ) ) != null ) {
results.push( match[1] );
}
console.log(results);
How to match all overlapping pattern
There's no reason to use a regex for this. Regex is overkill for such a simple task--it's over complex, and less efficient. Instead you should just use strings.Index
, and a for loop:
input := "...#...#....#.....#..#..#..#......."
idx := []int{}
j := 0
for {
i := strings.Index(input[j:], "..#..")
if i == -1 {
break
}
fmt.Println(j)
idx = append(idx, j+i)
j += i+1
}
fmt.Println("Indexes:", idx)
Playground link
Replacing overlapping matches in a string (regex or string operations)
I think I would forgo regex and write a simple loop as below (there is room for improvement), because I think it would be quicker and more understandable.
public IEnumerable<int> FindStartingOccurrences(string input, string pattern)
{
var occurrences = new List<int>();
for (int i=0; i<input.Length; i++)
{
if (input.Length+1 > i+pattern.Length)
{
if (input.Substring(i, pattern.Length) == pattern)
{
occurrences.Add(i);
}
}
}
return occurrences;
}
and then call like:
var occurrences = FindStartingOccurrences("aaabbaaaaaccaadaaa", "aa");
Overlapping matches in Regex
Update 2016:
To get nn
, nn
, nn
, SDJMcHattie proposes in the comments (?=(nn))
(see regex101).
(?=(nn))
Original answer (2008)
A possible solution could be to use a positive look behind:
(?<=n)n
It would give you the end position of:
- nnnn
- nnnn
- nnnn
As mentioned by Timothy Khouri, a positive lookahead is more intuitive (see example)
I would prefer to his proposition (?=nn)n
the simpler form:
(n)(?=(n))
That would reference the first position of the strings you want and would capture the second n in group(2).
That is so because:
- Any valid regular expression can be used inside the lookahead.
- If it contains capturing parentheses, the backreferences will be saved.
So group(1) and group(2) will capture whatever 'n' represents (even if it is a complicated regex).
Regexp to capture overlapping matches
You can use a positive lookahead assertion to get all 3 matches:
(?=(A.A))
RegEx Demo
For your input it finds 3 matches in captured group #1:
ABA
ACA
ADA
PHP Code:
if (preg_match_all('/(?=(A.A))/', "ABACADA", $m))
print_r($m[1]); // printing index 1
Output:
Array
(
[0] => ABA
[1] => ACA
[2] => ADA
)
Regex including overlapping matches with same start
As I've said above, regex is a primarily linear and single-rule-only kind of engine - you can choose between greedy capture or not, but you cannot select both. Also, most regex engines do not support overlapping matches (and even those who support it kind of fake it with substrings / forced head move) because it also doesn't fit regex philosophy.
If you're looking only for simple overlapping matches between two substrings, you can implement it yourself:
def find_substrings(data, start, end):
result = []
s_len = len(start) # a shortcut for `start` length
e_len = len(end) # a shortcut for `end` length
current_pos = data.find(start) # find the first occurrence of `start`
while current_pos != -1: # loop while we can find `start` in our data
# find the first occurrence of `end` after the current occurrence of `start`
end_pos = data.find(end, current_pos + s_len)
while end_pos != -1: # loop while we can find `end` after the current `start`
end_pos += e_len # just so we include the selected substring
result.append(data[current_pos:end_pos]) # add the current substring
end_pos = data.find(end, end_pos) # find the next `end` after the curr. `start`
current_pos = data.find(start, current_pos + s_len) # find the next `start`
return result
Which will yield:
substrings = find_substrings("BADACBA", "B", "A")
# ['BA', 'BADA', 'BADACBA', 'BA']
But you'll have to modify it for more complex matches.
How to find overlapping matches with a regexp?
findall
doesn't yield overlapping matches by default. This expression does however:
>>> re.findall(r'(?=(\w\w))', 'hello')
['he', 'el', 'll', 'lo']
Here (?=...)
is a lookahead assertion:
(?=...)
matches if...
matches next, but doesn’t consume any of the
string. This is called a lookahead assertion. For example,
Isaac (?=Asimov)
will match'Isaac '
only if it’s followed by'Asimov'
.
Related Topics
Can't Perform a React State Update on an Unmounted Component
Graph Visualization Library in JavaScript
Best Way to Detect When a User Leaves a Web Page
How to Read and Write into File Using JavaScript
CSS Hover VS. JavaScript Mouseover
Jquery Event That Triggers After CSS Is Loaded
Screen Styling When Virtual Keyboard Is Active
Invoke Method in Objective C Code from HTML Code Using Uiwebview
Is It Spread "Syntax" or the Spread "Operator"
How Does This JavaScript/Jquery Syntax Work: (Function( Window, Undefined ) { })(Window)
Static Variables in JavaScript
Preloading Images With JavaScript
Variable Name as a String in JavaScript
How to Fetch the Value of a Non-Standard CSS Property via JavaScript
Hide Scrollbar and Show on Hover Like Facebook's New Chat Sidebar