Indexes of all occurrences of character in a string
This should print the list of positions without the -1
at the end that Peter Lawrey's solution has had.
int index = word.indexOf(guess);
while (index >= 0) {
System.out.println(index);
index = word.indexOf(guess, index + 1);
}
It can also be done as a for
loop:
for (int index = word.indexOf(guess);
index >= 0;
index = word.indexOf(guess, index + 1))
{
System.out.println(index);
}
[Note: if guess
can be longer than a single character, then it is possible, by analyzing the guess
string, to loop through word
faster than the above loops do. The benchmark for such an approach is the Boyer-Moore algorithm. However, the conditions that would favor using such an approach do not seem to be present.]
How to find indices of all occurrences of one string in another in JavaScript?
var str = "I learned to play the Ukulele in Lebanon."
var regex = /le/gi, result, indices = [];
while ( (result = regex.exec(str)) ) {
indices.push(result.index);
}
UPDATE
I failed to spot in the original question that the search string needs to be a variable. I've written another version to deal with this case that uses indexOf
, so you're back to where you started. As pointed out by Wrikken in the comments, to do this for the general case with regular expressions you would need to escape special regex characters, at which point I think the regex solution becomes more of a headache than it's worth.
function getIndicesOf(searchStr, str, caseSensitive) {
var searchStrLen = searchStr.length;
if (searchStrLen == 0) {
return [];
}
var startIndex = 0, index, indices = [];
if (!caseSensitive) {
str = str.toLowerCase();
searchStr = searchStr.toLowerCase();
}
while ((index = str.indexOf(searchStr, startIndex)) > -1) {
indices.push(index);
startIndex = index + searchStrLen;
}
return indices;
}
var indices = getIndicesOf("le", "I learned to play the Ukulele in Lebanon.");
document.getElementById("output").innerHTML = indices + "";
<div id="output"></div>
How to get all indexes of a pattern in a string?
You can use the RegExp#exec
method several times:
var regex = /a/g;
var str = "abcdab";
var result = [];
var match;
while (match = regex.exec(str))
result.push(match.index);
alert(result); // => [0, 4]
Helper function:
function getMatchIndices(regex, str) {
var result = [];
var match;
regex = new RegExp(regex);
while (match = regex.exec(str))
result.push(match.index);
return result;
}
alert(getMatchIndices(/a/g, "abcdab"));
How to find all occurrences of a substring?
There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:
import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]
If you want to find overlapping matches, lookahead will do that:
[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:
search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
re.finditer
returns a generator, so you could change the []
in the above to ()
to get a generator instead of a list which will be more efficient if you're only iterating through the results once.
How to find all occurrences of a pattern and their indices in Python
python re
module to the rescue.
>>> import re
>>> [x.start() for x in re.finditer('foo', 'foo foo foo foo')]
[0, 4, 8, 12]
re.finditer
returns a generator, what this means is that instead of using list-comprehensions you could use in in a for-loop
which would be more memory efficient.
You could extend this to get the span of your pattern in the given text. i.e. the start and end index.
>>> [x.span() for x in re.finditer('foo', 'foo foo foo foo')]
[(0, 3), (4, 7), (8, 11), (12, 15)]
Isn't Python Awesome :) couldn't stop myself from quoting XKCD, downvotes or no downvotes...
How to get indexes of all occurrences of a pattern in a string
You can use .scan
and $`
global variable, which means The string to the left of the last successful match, but it doesn't work inside usual .scan
, so you need this hack (stolen from this answer):
string = "Jack and Jill went up the hill to fetch a pail of water. Jack fell down and broke his crown. And Jill came tumbling after. "
string.to_enum(:scan, /(jack|jill)/i).map do |m,|
p [$`.size, m]
end
output:
[0, "Jack"]
[9, "Jill"]
[57, "Jack"]
[97, "Jill"]
UPD:
Note the behaviour of lookbehind – you get the index of the really matched part, not the look one:
irb> "ab".to_enum(:scan, /ab/ ).map{ |m,| [$`.size, $~.begin(0), m] }
=> [[0, 0, "ab"]]
irb> "ab".to_enum(:scan, /(?<=a)b/).map{ |m,| [$`.size, $~.begin(0), m] }
=> [[1, 1, "b"]]
Finding all indexes of a specified character within a string
A simple loop works well:
var str = "scissors";
var indices = [];
for(var i=0; i<str.length;i++) {
if (str[i] === "s") indices.push(i);
}
Now, you indicate that you want 1,4,5,8. This will give you 0, 3, 4, 7 since indexes are zero-based. So you could add one:
if (str[i] === "s") indices.push(i+1);
and now it will give you your expected result.
A fiddle can be see here.
I don't think looping through the whole is terribly efficient
As far as performance goes, I don't think this is something that you need to be gravely worried about until you start hitting problems.
Here is a jsPerf test comparing various answers. In Safari 5.1, the IndexOf performs the best. In Chrome 19, the for loop is the fastest.
Find the indexes of all regex matches?
This is what you want: (source)
re.finditer(pattern, string[, flags])
Return an iterator yielding MatchObject instances over all
non-overlapping matches for the RE pattern in string. The string is
scanned left-to-right, and matches are returned in the order found. Empty
matches are included in the result unless they touch the beginning of
another match.
You can then get the start and end positions from the MatchObjects.
e.g.
[(m.start(0), m.end(0)) for m in re.finditer(pattern, string)]
Finding all instances of a string within a string
local str = "honewaidoneaeifjoneaowieone"
-- This one only gives you the substring;
-- it doesn't tell you where it starts or ends
for substring in str:gmatch 'one' do
print(substring)
end
-- This loop tells you where the substrings
-- start and end. You can use these values in
-- string.find to get the matched string.
local first, last = 0
while true do
first, last = str:find("one", first+1)
if not first then break end
print(str:sub(first, last), first, last)
end
-- Same as above, but as a recursive function
-- that takes a callback and calls it on the
-- result so it can be reused more easily
local function find(str, substr, callback, init)
init = init or 1
local first, last = str:find(substr, init)
if first then
callback(str, first, last)
return find(str, substr, callback, last+1)
end
end
find(str, 'one', print)
Get all occurrences of a substring in a very big string
Easy solution:
var str = "...";
var searchKeyword = "...";
var startingIndices = [];
var indexOccurence = str.indexOf(searchKeyword, 0);
while(indexOccurence >= 0) {
startingIndices.push(indexOccurence);
indexOccurence = str.indexOf(searchKeyword, indexOccurence + 1);
}
If you need something highly performant, you may look over specific text search/indexing algorithms like Aho–Corasick algorithm or Boyer–Moore string-search algorithm.
Really depends on your use case and if the text you're searching into is changing or is static and can be indexed beforehand for maximum performance.
Related Topics
Rails - Local Variables Versus Instance Variables
Order Products by Association Count
Ruby - Digest::Digest Is Deprecated; Use Digest
Ruby - Send Get Request with Headers
Use Rvm to Force Specific Ruby in Xcode Run Script Build Phase
How to Elegantly Rename All Keys in a Hash in Ruby
Insecure World Writable Dir /Users/Username in Path, Mode 040777 When Running Ruby Commands
Allow Public Connections to Local Ruby on Rails Development Server
Is Ruby Really an Interpreted Language If All of Its Implementations Are Compiled into Bytecode
How to Make Httparty Ignore Ssl
Use of Caret Symbol (^) in Ruby
Ruby Regex Error: Incompatible Encoding Regexp Match (Ascii-8Bit Regexp with Utf-8 String)
Ruby 1.9: Regular Expressions with Unknown Input Encoding
No Such File or Directory - Git Ls-Files -- Windows
Form_For "First Argument in Form Cannot Contain Nil or Be Empty" Error