How to Find All Occurrences of a Substring

How to find all occurrences of a substring?

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns a generator, so you could change the [] in the above to () to get a generator instead of a list which will be more efficient if you're only iterating through the results once.

Occurrences of substring in a string

The last line was creating a problem. lastIndex would never be at -1, so there would be an infinite loop. This can be fixed by moving the last line of code into the if block.

String str = "helloslkhellodjladfjhello";
String findStr = "hello";
int lastIndex = 0;
int count = 0;

while(lastIndex != -1){

lastIndex = str.indexOf(findStr,lastIndex);

if(lastIndex != -1){
count ++;
lastIndex += findStr.length();
}
}
System.out.println(count);

Find the list indices of all occurrences of a substring within the list

Try this:

indices = [idx for idx, s in enumerate(alphabet) if name in s]

where alphabet is your list of strings, and name is the desired substring.

Get all occurrences of a substring in a very big string

Easy solution:

var str = "...";
var searchKeyword = "...";

var startingIndices = [];

var indexOccurence = str.indexOf(searchKeyword, 0);

while(indexOccurence >= 0) {
startingIndices.push(indexOccurence);

indexOccurence = str.indexOf(searchKeyword, indexOccurence + 1);
}

If you need something highly performant, you may look over specific text search/indexing algorithms like Aho–Corasick algorithm or Boyer–Moore string-search algorithm.

Really depends on your use case and if the text you're searching into is changing or is static and can be indexed beforehand for maximum performance.

PHP Find all occurrences of a substring in a string

Without using regex, something like this should work for returning the string positions:

$html = "dddasdfdddasdffff";
$needle = "asdf";
$lastPos = 0;
$positions = array();

while (($lastPos = strpos($html, $needle, $lastPos))!== false) {
$positions[] = $lastPos;
$lastPos = $lastPos + strlen($needle);
}

// Displays 3 and 10
foreach ($positions as $value) {
echo $value ."<br />";
}

Find all occurrences of substring in string in Java

You can use capturing inside a positive look-ahead to get all overlapping matches and use Matcher#start to get the indices of the captured substrings.

As for the regex, it will look like

(?=(aa))

In Java code:

String s = "aaaaaa";
Matcher m = Pattern.compile("(?=(aa))").matcher(s);
List<Integer> pos = new ArrayList<Integer>();
while (m.find())
{
pos.add(m.start());
}
System.out.println(pos);

Result:

[0, 1, 2, 3, 4]

See IDEONE demo

Finding multiple occurrences of a string within a string in Python

Using regular expressions, you can use re.finditer to find all (non-overlapping) occurences:

>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
print('ll found', m.start(), m.end())

ll found 1 3
ll found 10 12
ll found 16 18

Alternatively, if you don't want the overhead of regular expressions, you can also repeatedly use str.find to get the next index:

>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
index = text.find('ll', index)
if index == -1:
break
print('ll found at', index)
index += 2 # +2 because len('ll') == 2

ll found at 1
ll found at 10
ll found at 16

This also works for lists and other sequences.

How to find all occurrences of a substring?

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns a generator, so you could change the [] in the above to () to get a generator instead of a list which will be more efficient if you're only iterating through the results once.

Get indexes of all occurrences of a substring within a string

Use match_indices. Example from Rust docs:

let v: Vec<_> = "abcXXXabcYYYabc".match_indices("abc").collect();
assert_eq!(v, [(0, "abc"), (6, "abc"), (12, "abc")]);

let v: Vec<_> = "1abcabc2".match_indices("abc").collect();
assert_eq!(v, [(1, "abc"), (4, "abc")]);

let v: Vec<_> = "ababa".match_indices("aba").collect();
assert_eq!(v, [(0, "aba")]); // only the first `aba`


Related Topics



Leave a reply



Submit