How to find all occurrences of a substring?
There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:
import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]
If you want to find overlapping matches, lookahead will do that:
[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:
search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
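Note that the lookahead expression above only keeps matches that no later match overlaps, so for a longer run such as 'tttt' it yields [2] rather than every right-to-left match. A minimal alternative sketch, assuming the search term is a plain substring rather than a regex, scans from the right with str.rfind. The helper name rfind_all is hypothetical and not part of the original answer:

```python
def rfind_all(needle, haystack):
    """Return start indices of non-overlapping matches, scanning right to left."""
    if not needle:
        return []  # an empty needle has no meaningful positions
    positions = []
    end = len(haystack)
    while True:
        # Search only in haystack[0:end], i.e. left of the previous match.
        index = haystack.rfind(needle, 0, end)
        if index == -1:
            break
        positions.append(index)
        end = index  # the next match must end at or before this start
    return positions

print(rfind_all('tt', 'ttt'))   # [1]
print(rfind_all('tt', 'tttt'))  # [2, 0]
```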
re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which is more efficient if you only iterate through the results once.
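The patterns above are passed to re.finditer unescaped, so a search term containing regex metacharacters (such as '.') would not be matched literally. As a hedged sketch, the ideas can be wrapped in a generator that escapes the term first. The function name find_all and its overlapping parameter are assumptions made for illustration:

```python
import re

def find_all(needle, haystack, overlapping=False):
    """Yield the start index of each occurrence of needle in haystack."""
    if not needle:
        return  # an empty needle would match at every position
    escaped = re.escape(needle)  # treat the needle as literal text
    # A lookahead consumes no characters, so matches may overlap.
    pattern = '(?=%s)' % escaped if overlapping else escaped
    for m in re.finditer(pattern, haystack):
        yield m.start()

print(list(find_all('test', 'test test test test')))  # [0, 5, 10, 15]
print(list(find_all('tt', 'ttt', overlapping=True)))  # [0, 1]
print(list(find_all('a.c', 'abc a.c')))               # [4]
```

Because find_all is a generator, nothing is searched until you iterate over it, which matches the single-pass use case described above.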
Occurrences of substring in a string
The last line was creating a problem: lastIndex would never be -1, so there would be an infinite loop. This can be fixed by moving the last line of code into the if block.
String str = "helloslkhellodjladfjhello";
String findStr = "hello";
int lastIndex = 0;
int count = 0;
while (lastIndex != -1) {
    lastIndex = str.indexOf(findStr, lastIndex);
    if (lastIndex != -1) {
        count++;
        // Skip past this match so the next search starts after it.
        lastIndex += findStr.length();
    }
}
System.out.println(count);
Find the list indices of all occurrences of a substring within the list
Try this:
indices = [idx for idx, s in enumerate(alphabet) if name in s]
where alphabet is your list of strings, and name is the desired substring.
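As a small worked example, with made-up sample values for alphabet and name (both hypothetical, chosen only for illustration):

```python
alphabet = ['alpha', 'beta', 'gamma', 'delta']  # hypothetical sample list
name = 'ta'                                     # hypothetical substring

# Keep the index of every list element that contains the substring.
indices = [idx for idx, s in enumerate(alphabet) if name in s]
print(indices)  # [1, 3]
```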
Get all occurrences of a substring in a very big string
Easy solution:
var str = "...";
var searchKeyword = "...";
var startingIndices = [];
var indexOccurence = str.indexOf(searchKeyword, 0);
while (indexOccurence >= 0) {
    startingIndices.push(indexOccurence);
    // Resume one character later, so overlapping matches are found too.
    indexOccurence = str.indexOf(searchKeyword, indexOccurence + 1);
}
If you need higher performance, look at dedicated text search and indexing algorithms such as the Aho–Corasick algorithm or the Boyer–Moore string-search algorithm.
The right choice depends on your use case and on whether the text you are searching changes over time or is static and can be indexed beforehand.
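To give a feel for how the skip-ahead family of algorithms works, here is a sketch of Boyer–Moore–Horspool, a simplified variant of Boyer–Moore (not the full algorithm, and not Aho–Corasick). It is written in Python purely for illustration; the function name is hypothetical, and a built-in search such as str.find or indexOf may well be faster in practice:

```python
def horspool_find_all(text, pattern):
    """Return start indices of all occurrences (overlapping ones included)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: for each character of the pattern except the last,
    # the distance from its rightmost occurrence to the pattern's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    positions = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            positions.append(i)
        # Skip ahead based on the text character under the pattern's last
        # position; characters not in the table allow a full-length skip.
        i += shift.get(text[i + m - 1], m)
    return positions

print(horspool_find_all("helloslkhellodjladfjhello", "hello"))  # [0, 8, 20]
```

The benefit comes from the skip step: when the character aligned with the end of the pattern does not occur in the pattern at all, the search jumps forward by the whole pattern length instead of one position.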
PHP Find all occurrences of a substring in a string
Without using regex, something like this should work for returning the string positions:
$html = "dddasdfdddasdffff";
$needle = "asdf";
$lastPos = 0;
$positions = array();
while (($lastPos = strpos($html, $needle, $lastPos)) !== false) {
    $positions[] = $lastPos;
    $lastPos = $lastPos + strlen($needle);
}
// Displays 3 and 10
foreach ($positions as $value) {
    echo $value . "<br />";
}
Find all occurrences of substring in string in Java
You can use capturing inside a positive look-ahead to get all overlapping matches and use Matcher#start to get the indices of the captured substrings.
As for the regex, it will look like
(?=(aa))
In Java code:
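import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "aaaaaa";
Matcher m = Pattern.compile("(?=(aa))").matcher(s);
List<Integer> pos = new ArrayList<Integer>();
while (m.find()) {
    // The look-ahead consumes nothing, so every start position is tried.
    pos.add(m.start());
}
System.out.println(pos);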
Result:
[0, 1, 2, 3, 4]
Finding multiple occurrences of a string within a string in Python
Using regular expressions, you can use re.finditer
to find all (non-overlapping) occurences:
>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
...     print('ll found', m.start(), m.end())
...
ll found 1 3
ll found 10 12
ll found 16 18
Alternatively, if you don't want the overhead of regular expressions, you can also repeatedly use str.find to get the next index:
>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
...     index = text.find('ll', index)
...     if index == -1:
...         break
...     print('ll found at', index)
...     index += 2  # +2 because len('ll') == 2
...
ll found at 1
ll found at 10
ll found at 16
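A similar loop works for lists and other sequences through their index method, which raises ValueError rather than returning -1 when nothing is found.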
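As a minimal sketch of that loop for a list (the helper name find_all_in_list is hypothetical): list.index accepts a start argument, much like str.find, but signals "not found" with a ValueError.

```python
def find_all_in_list(items, value):
    """Return every index at which value occurs in items."""
    positions = []
    index = 0
    while True:
        try:
            # Search from the current position onward.
            index = items.index(value, index)
        except ValueError:
            break  # no further occurrence
        positions.append(index)
        index += 1  # continue just past this match
    return positions

print(find_all_in_list(['a', 'b', 'a', 'c', 'a'], 'a'))  # [0, 2, 4]
```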
Get indexes of all occurrences of a substring within a string
Use match_indices. Example from Rust docs:
let v: Vec<_> = "abcXXXabcYYYabc".match_indices("abc").collect();
assert_eq!(v, [(0, "abc"), (6, "abc"), (12, "abc")]);
let v: Vec<_> = "1abcabc2".match_indices("abc").collect();
assert_eq!(v, [(1, "abc"), (4, "abc")]);
let v: Vec<_> = "ababa".match_indices("aba").collect();
assert_eq!(v, [(0, "aba")]); // only the first `aba`