Find All Substring's Occurrences and Locations

How to find all occurrences of a substring?

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
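Both patterns above treat the search text as a regular expression, so a substring containing characters such as . or ( would need escaping first. Here is a minimal sketch that wraps the two cases with re.escape; find_starts is a hypothetical helper name, not part of the re module, and returning an empty list for an empty substring is an assumption:

```python
import re

def find_starts(text, sub, overlapping=False):
    """Return the start index of every occurrence of the literal string sub."""
    if not sub:
        return []  # assumption: an empty substring yields no matches
    literal = re.escape(sub)  # neutralise regex metacharacters
    # A lookahead matches without consuming text, so overlapping hits are kept.
    pattern = '(?=%s)' % literal if overlapping else literal
    return [m.start() for m in re.finditer(pattern, text)]

print(find_starts('test test test test', 'test'))  # [0, 5, 10, 15]
print(find_starts('ttt', 'tt', overlapping=True))  # [0, 1]
print(find_starts('a.b.c', '.'))                   # [1, 3]
```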

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
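For 'ttt' this pattern agrees with a right-to-left scan, but it rejects every match that has another match starting inside it. On a longer run such as 'tttt' it reports only [2], where a right-to-left scan would find 0 and 2. If a true reverse scan is what you're after, str.rfind may be simpler. A sketch, with rfind_all as a hypothetical helper name:

```python
def rfind_all(text, sub):
    """Non-overlapping occurrences of sub, found scanning from the right."""
    if not sub:
        raise ValueError('sub must be non-empty')
    positions = []
    end = len(text)
    while True:
        pos = text.rfind(sub, 0, end)  # rightmost match lying entirely before end
        if pos == -1:
            break
        positions.append(pos)
        end = pos  # the next match has to finish at or before this one starts
    positions.reverse()  # report in ascending order
    return positions

print(rfind_all('ttt', 'tt'))   # [1]
print(rfind_all('tttt', 'tt'))  # [0, 2]
```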

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which will be more efficient if you're only iterating through the results once.
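A small sketch of the bracket-to-parenthesis swap, mainly to show that the result can be consumed only once (the variable names are just for illustration):

```python
import re

text = 'test test test test'

# Parentheses instead of brackets: positions are produced one at a time on request.
starts = (m.start() for m in re.finditer('test', text))

first = next(starts)  # pulls just the first position
rest = list(starts)   # consumes everything that is left

print(first, rest)    # 0 [5, 10, 15]
print(list(starts))   # [] because the generator is now exhausted
```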

Find all substring's occurrences and locations

string str,sub; // str is string to search, sub is the substring to search for

vector<size_t> positions; // holds all the positions that sub occurs within str

size_t pos = str.find(sub, 0);
while (pos != string::npos)
{
    positions.push_back(pos);
    pos = str.find(sub, pos + 1);
}

Edit
I misread your post, you said substring, and I assumed you meant you were searching a string. This will still work if you read the file into a string.
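In Python the same loop, including the read-the-file-into-a-string step, might look like the sketch below. The function name positions_in_file, its path argument, and the UTF-8 default are all assumptions made for illustration. Advancing by one character after each hit mirrors pos+1 above, so overlapping occurrences are reported too.

```python
from pathlib import Path

def positions_in_file(path, sub, encoding='utf-8'):
    """Read the whole file into a string and collect every position of sub."""
    text = Path(path).read_text(encoding=encoding)
    positions = []
    pos = text.find(sub)
    while pos != -1:
        positions.append(pos)
        pos = text.find(sub, pos + 1)  # pos + 1 allows overlapping matches
    return positions
```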

PHP Find all occurrences of a substring in a string

Without using regex, something like this should work for returning the string positions:

$html = "dddasdfdddasdffff";
$needle = "asdf";
$lastPos = 0;
$positions = array();

while (($lastPos = strpos($html, $needle, $lastPos)) !== false) {
    $positions[] = $lastPos;
    $lastPos = $lastPos + strlen($needle);
}

// Displays 3 and 10
foreach ($positions as $value) {
    echo $value . "<br />";
}

Swift find all occurrences of a substring

You just keep advancing the search range until you can't find any more instances of the substring:

import Foundation // range(of:range:) comes from Foundation

extension String {
    func indicesOf(string: String) -> [Int] {
        var indices = [Int]()
        var searchStartIndex = self.startIndex

        while searchStartIndex < self.endIndex,
            let range = self.range(of: string, range: searchStartIndex..<self.endIndex),
            !range.isEmpty
        {
            let index = distance(from: self.startIndex, to: range.lowerBound)
            indices.append(index)
            searchStartIndex = range.upperBound
        }

        return indices
    }
}

let keyword = "a"
let html = "aaaa"
let indices = html.indicesOf(string: keyword)
print(indices) // [0, 1, 2, 3]

How to find indices of all occurrences of one string in another in JavaScript?

var str = "I learned to play the Ukulele in Lebanon.";
var regex = /le/gi, result, indices = [];
while ( (result = regex.exec(str)) ) {
    indices.push(result.index);
}

UPDATE

I failed to spot in the original question that the search string needs to be a variable. I've written another version to deal with this case that uses indexOf, so you're back to where you started. As pointed out by Wrikken in the comments, to do this for the general case with regular expressions you would need to escape special regex characters, at which point I think the regex solution becomes more of a headache than it's worth.

function getIndicesOf(searchStr, str, caseSensitive) {
    var searchStrLen = searchStr.length;
    if (searchStrLen == 0) {
        return [];
    }
    var startIndex = 0, index, indices = [];
    if (!caseSensitive) {
        str = str.toLowerCase();
        searchStr = searchStr.toLowerCase();
    }
    while ((index = str.indexOf(searchStr, startIndex)) > -1) {
        indices.push(index);
        startIndex = index + searchStrLen;
    }
    return indices;
}

var indices = getIndicesOf("le", "I learned to play the Ukulele in Lebanon.");
document.getElementById("output").innerHTML = indices + "";
<div id="output"></div>
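
For comparison, here is roughly what the escaped-regex route mentioned above looks like in Python, where re.escape does the escaping. This is a sketch rather than a port of the function above: get_indices_of is a hypothetical name, and like the indexOf version it reports non-overlapping matches and defaults to a case-insensitive search.

```python
import re

def get_indices_of(search, text, case_sensitive=False):
    """Start indices of the literal string search inside text."""
    if not search:
        return []
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape makes characters such as '.' match themselves.
    return [m.start() for m in re.finditer(re.escape(search), text, flags)]

sentence = 'I learned to play the Ukulele in Lebanon.'
print(get_indices_of('le', sentence))                       # [2, 25, 27, 33]
print(get_indices_of('le', sentence, case_sensitive=True))  # [2, 25, 27]
print(get_indices_of('.', sentence))                        # [40]
```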

Finding multiple occurrences of a string within a string in Python

Using regular expressions, you can use re.finditer to find all (non-overlapping) occurrences:

>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
...     print('ll found', m.start(), m.end())
...
ll found 1 3
ll found 10 12
ll found 16 18

Alternatively, if you don't want the overhead of regular expressions, you can also repeatedly use str.find to get the next index:

>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
...     index = text.find('ll', index)
...     if index == -1:
...         break
...     print('ll found at', index)
...     index += 2 # +2 because len('ll') == 2
...
ll found at 1
ll found at 10
ll found at 16

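The same loop pattern carries over to lists and other sequences, with one change: they have no find method, so use index, which raises ValueError instead of returning -1 when nothing is found.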
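
For a run of several elements inside a list or tuple, comparing slices is one way to get the same kind of result. A minimal sketch; find_sublist is a hypothetical helper name, and unlike the loop above it also reports overlapping matches:

```python
def find_sublist(seq, sub):
    """Indices at which the sequence sub starts inside seq (overlaps included)."""
    n = len(sub)
    if n == 0:
        return []  # assumption: an empty pattern yields no matches
    # Slicing works the same way for lists, tuples and strings.
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == sub]

print(find_sublist([1, 2, 1, 2, 1], [1, 2]))  # [0, 2]
print(find_sublist('ttt', 'tt'))              # [0, 1]
```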

Find all locations of substring in NSString (not just first)

You can use rangeOfString:options:range: and set the third argument to be beyond the range of the first occurrence. For example, you can do something like this:

NSRange searchRange = NSMakeRange(0,string.length);
NSRange foundRange;
while (searchRange.location < string.length) {
    searchRange.length = string.length - searchRange.location;
    foundRange = [string rangeOfString:substring options:0 range:searchRange];
    if (foundRange.location != NSNotFound) {
        // found an occurrence of the substring! do stuff here
        searchRange.location = foundRange.location + foundRange.length;
    } else {
        // no more substring to find
        break;
    }
}

Indexes of all occurrences of character in a string

This should print the list of positions without the -1 at the end that Peter Lawrey's solution had.

int index = word.indexOf(guess);
while (index >= 0) {
    System.out.println(index);
    index = word.indexOf(guess, index + 1);
}

It can also be done as a for loop:

for (int index = word.indexOf(guess);
     index >= 0;
     index = word.indexOf(guess, index + 1))
{
    System.out.println(index);
}

[Note: if guess can be longer than a single character, then it is possible, by analyzing the guess string, to loop through word faster than the above loops do. The benchmark for such an approach is the Boyer-Moore algorithm. However, the conditions that would favor using such an approach do not seem to be present.]
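
To give an idea of what such an approach involves, here is a sketch of the Horspool simplification of Boyer-Moore, written in Python for brevity. It is not the full Boyer-Moore algorithm (it uses only a bad-character shift table), and horspool_find_all is a hypothetical name.

```python
def horspool_find_all(text, pattern):
    """All start positions of pattern in text, overlaps included."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # For each character of the pattern except the last one, record how far
    # its rightmost occurrence is from the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    positions = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            positions.append(i)
        # Shift according to the text character aligned with the pattern's
        # last position; characters not in the table allow a full-length jump.
        i += shift.get(text[i + m - 1], m)
    return positions

print(horspool_find_all('abracadabra', 'abra'))  # [0, 7]
print(horspool_find_all('ffff', 'ff'))           # [0, 1, 2]
```

The longer the pattern and the fewer characters it shares with the text, the larger the jumps. For a single character the table is empty and the loop degrades to checking every position.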

Find all the occurrences of a character in a string

The function:

def findOccurrences(s, ch):
    return [i for i, letter in enumerate(s) if letter == ch]

findOccurrences(yourString, '|')

will return a list of the indices in yourString at which | occurs.

How to find occurrences of a string in string in C++?

int occurrences = 0;
string::size_type start = 0;

while ((start = base_string.find(to_find_occurrences_of, start)) != string::npos) {
    ++occurrences;
    start += to_find_occurrences_of.length(); // see the note
}

string::find takes a string to look for in the invoking object and (in this overload) a character position at which to start looking, and returns the position of the occurrence of the string, or string::npos if the string is not found.

The variable start starts at 0 (the first character) and in the condition of the loop, you use start to tell find where to start looking, then assign the return value of find to start. Increment the occurrence count; now that start holds the position of the string, you can skip to_find_occurrences_of.length() characters ahead and start looking again (see the note below).

Note: drhirsch makes the point that if to_find_occurrences_of contains a repeated sequence of characters, doing start += to_find_occurrences_of.length() may skip some occurrences. For instance, if base_string was "ffff" and to_find_occurrences_of was "ff", then only 2 occurrences would be counted if you add to_find_occurrences_of.length() to start. If you want to avoid that, add 1 instead of to_find_occurrences_of.length() to start, and in that example, 3 occurrences would be counted instead of just 2.
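
The two counts for the "ffff" / "ff" example can be checked quickly in Python, where str.count skips past each match and a regex lookahead does not. This is only a cross-check of the numbers, not a translation of the C++ loop, and the variable names are made up for the sketch:

```python
import re

base_string = 'ffff'
to_find = 'ff'

# str.count resumes after the end of each match, like adding length() to start.
non_overlapping = base_string.count(to_find)

# A lookahead consumes nothing, so the search resumes one character later,
# like adding 1 to start.
overlapping = len(re.findall('(?=%s)' % re.escape(to_find), base_string))

print(non_overlapping, overlapping)  # 2 3
```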


