How to Get Indexes of All Occurrences of a Pattern in a String

Indexes of all occurrences of character in a string

This should print the list of positions without the trailing -1 that Peter Lawrey's solution prints.

int index = word.indexOf(guess);
while (index >= 0) {
    System.out.println(index);
    index = word.indexOf(guess, index + 1);
}

It can also be done as a for loop:

for (int index = word.indexOf(guess);
     index >= 0;
     index = word.indexOf(guess, index + 1))
{
    System.out.println(index);
}

[Note: if guess can be longer than a single character, then it is possible, by analyzing the guess string, to loop through word faster than the above loops do. The benchmark for such an approach is the Boyer-Moore algorithm. However, the conditions that would favor using such an approach do not seem to be present.]

How to find indices of all occurrences of one string in another in JavaScript?


var str = "I learned to play the Ukulele in Lebanon.";
var regex = /le/gi, result, indices = [];
while ( (result = regex.exec(str)) ) {
    indices.push(result.index);
}

UPDATE

I failed to spot in the original question that the search string needs to be a variable. I've written another version to deal with this case that uses indexOf, so you're back to where you started. As pointed out by Wrikken in the comments, to do this for the general case with regular expressions you would need to escape special regex characters, at which point I think the regex solution becomes more of a headache than it's worth.





function getIndicesOf(searchStr, str, caseSensitive) {
    var searchStrLen = searchStr.length;
    if (searchStrLen == 0) {
        return [];
    }
    var startIndex = 0, index, indices = [];
    if (!caseSensitive) {
        str = str.toLowerCase();
        searchStr = searchStr.toLowerCase();
    }
    while ((index = str.indexOf(searchStr, startIndex)) > -1) {
        indices.push(index);
        startIndex = index + searchStrLen;
    }
    return indices;
}


var indices = getIndicesOf("le", "I learned to play the Ukulele in Lebanon.");


document.getElementById("output").innerHTML = indices + "";
<div id="output"></div>

How to get all indexes of a pattern in a string?

You can use the RegExp#exec method several times:

var regex = /a/g;
var str = "abcdab";

var result = [];
var match;
while (match = regex.exec(str))
    result.push(match.index);

alert(result); // => [0, 4]

Helper function:

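function getMatchIndices(regex, str) {
    var result = [];
    var match;
    // Work on a copy so the caller's lastIndex is left untouched.
    // The regex must carry the g flag, otherwise exec never advances.
    regex = new RegExp(regex);
    while (match = regex.exec(str))
        result.push(match.index);
    return result;
}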

alert(getMatchIndices(/a/g, "abcdab"));

How to find all occurrences of a substring?

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]
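
The pattern in that example is a fixed literal. If the search text arrives in a variable and may contain regex metacharacters, one option is to pass it through re.escape first. A minimal sketch; the helper name find_literal is made up for illustration:

```python
import re

def find_literal(needle, haystack):
    """Return start indices of non-overlapping literal matches of needle."""
    if not needle:
        return []  # an empty needle would otherwise match at every position
    # re.escape neutralises characters such as '.', '*' and '(' so that
    # needle is matched as plain text rather than as a pattern.
    pattern = re.escape(needle)
    return [m.start() for m in re.finditer(pattern, haystack)]

print(find_literal('a.b', 'a.b axb a.b'))  # [0, 8]
```

Without the escaping step the '.' would match any character, and position 4 ('axb') would be reported as well.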

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
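
For plain substrings, both behaviours can also be had without regular expressions by calling str.find in a loop. The following is only a sketch; find_all and its overlapping parameter are illustrative names, not part of the standard library:

```python
def find_all(text, sub, overlapping=False):
    """Yield start indices of sub in text, scanning left to right."""
    if not sub:
        return  # avoid an endless loop on an empty search string
    # Step past the whole match, or by one character to allow overlaps.
    step = 1 if overlapping else len(sub)
    start = text.find(sub)
    while start != -1:
        yield start
        start = text.find(sub, start + step)

print(list(find_all('ttt', 'tt')))                    # [0]
print(list(find_all('ttt', 'tt', overlapping=True)))  # [0, 1]
```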

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
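
Note that the lookahead expression keeps only the rightmost match of each overlapping run; for 'tttt' it yields [2]. If every non-overlapping match counted from the right is wanted, a loop over str.rfind is one alternative. A rough sketch, with rfind_all as a hypothetical helper name:

```python
def rfind_all(text, sub):
    """Start indices of non-overlapping matches, scanning right to left."""
    if not sub:
        return []
    indices = []
    end = len(text)
    while True:
        # Highest index at which sub fits entirely inside text[0:end].
        i = text.rfind(sub, 0, end)
        if i == -1:
            break
        indices.append(i)
        end = i  # the next match must end at or before this one's start
    return indices

print(rfind_all('ttt', 'tt'))   # [1]
print(rfind_all('tttt', 'tt'))  # [2, 0]
```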

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which is more memory-efficient if you're only iterating through the results once.
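
As a small illustration of that laziness, the sketch below takes only the first two positions and then resumes later. The sample text is reused from above; itertools.islice is just one way to take a prefix:

```python
import re
from itertools import islice

text = 'test test test test'

# A generator expression: no match is searched for until it is requested.
starts = (m.start() for m in re.finditer('test', text))

# Take only the first two positions; the search has not gone further yet.
first_two = list(islice(starts, 2))
print(first_two)  # [0, 5]

# The generator remembers where it stopped, so it can be resumed.
print(next(starts))  # 10
```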

How to find all occurrences of a pattern and their indices in Python

Python's re module to the rescue.

>>> import re
>>> [x.start() for x in re.finditer('foo', 'foo foo foo foo')]
[0, 4, 8, 12]

re.finditer returns an iterator, which means that instead of building a list with a list comprehension you could use it directly in a for loop, which would be more memory-efficient.

You could extend this to get the span of your pattern in the given text, i.e. the start and end index of each match.

>>> [x.span() for x in re.finditer('foo', 'foo foo foo foo')]
[(0, 3), (4, 7), (8, 11), (12, 15)]

Isn't Python Awesome :) couldn't stop myself from quoting XKCD, downvotes or no downvotes...


How to get indexes of all occurrences of a pattern in a string

You can use .scan together with the $` global variable, which holds the string to the left of the last successful match. It doesn't work inside a plain .scan call, so you need this hack (stolen from this answer):

string = "Jack and Jill went up the hill to fetch a pail of water. Jack fell down and broke his crown. And Jill came tumbling after. "
string.to_enum(:scan, /(jack|jill)/i).map do |m,|
    p [$`.size, m]
end

output:

[0, "Jack"]
[9, "Jill"]
[57, "Jack"]
[97, "Jill"]

UPD:

Note the behaviour with lookbehind: you get the index of the part that actually matched, not of the lookbehind:

irb> "ab".to_enum(:scan, /ab/     ).map{ |m,| [$`.size, $~.begin(0), m] }
=> [[0, 0, "ab"]]
irb> "ab".to_enum(:scan, /(?<=a)b/).map{ |m,| [$`.size, $~.begin(0), m] }
=> [[1, 1, "b"]]

Finding all indexes of a specified character within a string

A simple loop works well:

var str = "scissors";
var indices = [];
for (var i = 0; i < str.length; i++) {
    if (str[i] === "s") indices.push(i);
}

Now, you indicate that you want 1,4,5,8. This will give you 0, 3, 4, 7 since indexes are zero-based. So you could add one:

if (str[i] === "s") indices.push(i+1);

and now it will give you your expected result.

A fiddle can be seen here.

"I don't think looping through the whole is terribly efficient"

As far as performance goes, I don't think this is something that you need to be gravely worried about until you start hitting problems.

Here is a jsPerf test comparing various answers. In Safari 5.1, indexOf performs the best. In Chrome 19, the for loop is the fastest.


Find the indexes of all regex matches?

This is what you want: (source)

re.finditer(pattern, string[, flags]) 

Return an iterator yielding MatchObject instances over all
non-overlapping matches for the RE pattern in string. The string is
scanned left-to-right, and matches are returned in the order found. Empty
matches are included in the result unless they touch the beginning of
another match.

You can then get the start and end positions from the MatchObjects.

e.g.

[(m.start(0), m.end(0)) for m in re.finditer(pattern, string)]
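
A concrete run of that one-liner, with an example pattern and string chosen purely for illustration:

```python
import re

text = 'a1bb22ccc333'
pattern = r'\d+'  # example pattern: runs of digits

spans = [(m.start(0), m.end(0)) for m in re.finditer(pattern, text)]
print(spans)  # [(1, 2), (4, 6), (9, 12)]

# end() is exclusive, so each pair can be used directly as a slice.
print([text[start:end] for start, end in spans])  # ['1', '22', '333']
```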

Finding all instances of a string within a string


local str = "honewaidoneaeifjoneaowieone"

-- This one only gives you the substring;
-- it doesn't tell you where it starts or ends
for substring in str:gmatch 'one' do
    print(substring)
end

-- This loop tells you where the substrings
-- start and end. You can use these values in
-- string.sub to get the matched string.
local first, last = 0
while true do
    first, last = str:find("one", first+1)
    if not first then break end
    print(str:sub(first, last), first, last)
end

-- Same as above, but as a recursive function
-- that takes a callback and calls it on the
-- result so it can be reused more easily
local function find(str, substr, callback, init)
    init = init or 1
    local first, last = str:find(substr, init)
    if first then
        callback(str, first, last)
        return find(str, substr, callback, last+1)
    end
end

find(str, 'one', print)

Get all occurrences of a substring in a very big string

Easy solution:

var str = "...";
var searchKeyword = "...";

var startingIndices = [];

var indexOccurence = str.indexOf(searchKeyword, 0);

while (indexOccurence >= 0) {
    startingIndices.push(indexOccurence);

    indexOccurence = str.indexOf(searchKeyword, indexOccurence + 1);
}

If you need something highly performant, you may want to look at dedicated text search/indexing algorithms such as the Aho–Corasick algorithm or the Boyer–Moore string-search algorithm.

It really depends on your use case, and on whether the text you're searching changes or is static and can be indexed beforehand for maximum performance.


