How to Get Possibly Overlapping Matches in a String

How to get all possible overlapping matches for a string

Here's the regex you need: /III/g - simple enough, right? Now here's how you use it:

var text = "MUIIIUIIIU", find = "III", replace "U",
regex = new RegExp(find,"g"), matches = [], match;
while(match = regex.exec(text)) {
matches.push(match);
regex.lastIndex = match.index+1;
}

That regex.lastIndex... line overrides the usual regex behaviour of not matching results that overap. Also I'm using a RegExp constructor to make this more flexible. You could even build it into a function this way.

Now you have an array of match objects, you can do this:

matches.forEach(function(m) { // older browsers need a shim or old-fashioned for loop
console.log(text.substr(0,m.index)+replace+text.substr(m.index+find.length));
});

EDIT: Here is a JSFiddle demonstrating the above code.

How to get possibly overlapping matches in a string

def matching_substrings(string, regex)
string.size.times.each_with_object([]) do |start_index, maching_substrings|
start_index.upto(string.size.pred) do |end_index|
substring = string[start_index..end_index]
maching_substrings.push(substring) if substring =~ /^#{regex}$/
end
end
end

matching_substrings('abcadc', /a.*c/) # => ["abc", "abcadc", "adc"]
matching_substrings('foobarfoo', /(\w+).*\1/)
# => ["foobarf",
# "foobarfo",
# "foobarfoo",
# "oo",
# "oobarfo",
# "oobarfoo",
# "obarfo",
# "obarfoo",
# "oo"]
matching_substrings('why is this downvoted?', /why.*/)
# => ["why",
# "why ",
# "why i",
# "why is",
# "why is ",
# "why is t",
# "why is th",
# "why is thi",
# "why is this",
# "why is this ",
# "why is this d",
# "why is this do",
# "why is this dow",
# "why is this down",
# "why is this downv",
# "why is this downvo",
# "why is this downvot",
# "why is this downvote",
# "why is this downvoted",
# "why is this downvoted?"]

Replacing overlapping matches in a string (regex or string operations)

I think I would forgo regex and write a simple loop as below (there is room for improvement), because I think it would be quicker and more understandable.

        public IEnumerable<int> FindStartingOccurrences(string input, string pattern)
{
var occurrences = new List<int>();

for (int i=0; i<input.Length; i++)
{
if (input.Length+1 > i+pattern.Length)
{
if (input.Substring(i, pattern.Length) == pattern)
{
occurrences.Add(i);
}
}
}

return occurrences;
}

and then call like:

var occurrences = FindStartingOccurrences("aaabbaaaaaccaadaaa", "aa");

How to find overlapping matches with a regexp?

findall doesn't yield overlapping matches by default. This expression does however:

>>> re.findall(r'(?=(\w\w))', 'hello')
['he', 'el', 'll', 'lo']

Here (?=...) is a lookahead assertion:

(?=...) matches if ... matches next, but doesn’t consume any of the
string. This is called a lookahead assertion. For example,
Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.

python 3 regex - find all overlapping matches' start and end index in a string

The problem you get is related to the fact that a lookahead is a zero-width assertion that consumes (i.e. adds to the match result) no text. It is a mere position in the string. Thus, all your matches start and end at the same location in the string.

You need to enclose the lookahead pattern with a capturing group (i.e. (?=(11111))) and access start and end of group 1 (with i.start(1) and i.end(1)):

import re
s = '1'*15
result = re.finditer(r'(?=(11111))', s)

for i in result:
print(i.start(1), i.end(1))

See the Python demo, its output is

(0, 5)
(1, 6)
(2, 7)
(3, 8)
(4, 9)
(5, 10)
(6, 11)
(7, 12)
(8, 13)
(9, 14)
(10, 15)

Overlapping matches in Regex

Update 2016:

To get nn, nn, nn, SDJMcHattie proposes in the comments (?=(nn)) (see regex101).

(?=(nn))

Original answer (2008)

A possible solution could be to use a positive look behind:

(?<=n)n

It would give you the end position of:

  1. nnnn
     
  2. nnnn
     
  3. nnnn

As mentioned by Timothy Khouri, a positive lookahead is more intuitive (see example)

I would prefer to his proposition (?=nn)n the simpler form:

(n)(?=(n))

That would reference the first position of the strings you want and would capture the second n in group(2).

That is so because:

  • Any valid regular expression can be used inside the lookahead.
  • If it contains capturing parentheses, the backreferences will be saved.

So group(1) and group(2) will capture whatever 'n' represents (even if it is a complicated regex).



Get overlapping positions of pattern in string in r

You can use gregexpr and a PCRE regex with a capturing group enclosing the entire positive lookahead pattern:

pattern <- "(?=([1-]{2}))"
s <- "-1-"
res <- gregexpr(pattern, s, perl=TRUE)
starts <- attr(res[[1]],'capture.start')
lengths <- attr(res[[1]],'capture.length')
ends <- starts + lengths - 1
df_positions <- do.call(rbind, Map(data.frame, start=starts, end=ends, length=lengths))
df_positions

Output:

  start end length
1 1 2 2
2 2 3 2

See an R demo



Related Topics



Leave a reply



Submit