preg_match() vs strpos() for match finding?
I would prefer strpos over preg_match, because regexes are generally more expensive to execute.
According to the official PHP docs for preg_match:
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.
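A minimal sketch of the containment check the manual recommends. The helper name `contains` and the sample strings are illustrative assumptions, not part of the original answer. The detail that matters is comparing against `false` with `!==`, because `strpos()` returns `0` when the needle sits at the very start of the haystack.

```php
<?php
// Hypothetical helper: true when $needle occurs anywhere in $haystack.
function contains(string $haystack, string $needle): bool
{
    // strpos() returns an int offset (possibly 0) or false, so use !==.
    return strpos($haystack, $needle) !== false;
}

// Usage with made-up sample data.
$text = 'microsoft exchange server';
$hasMicrosoft = contains($text, 'microsoft'); // match at offset 0 still counts
$hasOracle = contains($text, 'oracle');
```

On PHP 8 and later, the built-in `str_contains()` covers the same case without a helper.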
strpos vs preg_match - memory and resource differences
Jeffrey Friedl's Mastering Regular Expressions says that the built-in non-regex functions such as strpos() and strstr() are always better and faster than preg_match() (assuming you're using the preg suite), provided that your match text is not a pattern.
Which is more efficient between str_pos and preg_match?
strpos is much faster than preg_match. Here is a benchmark:
$array = array();
for($i=0; $i<1000; $i++) $array[] = $i;
$nbloop = 10000;
$text = <<<EOD
I understand that my pattern must contain only a word per cycle because, in the case reported in that question, I must find "microsoft" and "microsoft exchange" and I can't modify my regexp because these two possibilities are given dinamically from a database!
So my question is: which is the better solution between over 200 preg_match and the same numbers of str_pos to check if a subset of char contains these words?
EOD;
$start = microtime(true);
for ($i = 0; $i < $nbloop; $i++) {
    foreach ($array as $word) {
        $pattern = '<\b(?:' . $word . ')\b>i';
        if (preg_match_all($pattern, $text, $matches)) {
            $fields['skill'][] = $matches[0][0];
        }
    }
}
echo "Elapse regex: ", microtime(true) - $start, "\n";
$start = microtime(true);
for ($i = 0; $i < $nbloop; $i++) {
    foreach ($array as $word) {
        // strpos() takes the haystack first, then the needle; test with !== false
        if (strpos($text, (string) $word) !== false) {
            $fields['skill'][] = $word;
        }
    }
}
echo "Elapse strpos: ", microtime(true) - $start, "\n";
Output:
Elapse regex: 7.9924139976501
Elapse strpos: 0.62015008926392
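That makes strpos about 13 times faster in this run. Keep in mind that strpos() expects the haystack first and the needle second, and that the regex loop also does case-insensitive, whole-word matching, so the ratio is only a rough indication.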
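Because the words in the question come from a database, any word interpolated into a pattern should be escaped first. The sketch below shows one way to do that, next to the plain substring alternative. The helper names `matchesWord` and `containsWord`, the `#` delimiter, and the sample text are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: whole-word, case-insensitive match for a word
// that is only known at run time (for example, loaded from a database).
function matchesWord(string $text, string $word): bool
{
    // preg_quote() escapes regex metacharacters and the chosen delimiter.
    $pattern = '#\b' . preg_quote($word, '#') . '\b#i';
    return preg_match($pattern, $text) === 1;
}

// Hypothetical helper: plain case-insensitive substring test, no regex.
function containsWord(string $text, string $word): bool
{
    return stripos($text, $word) !== false;
}

// Made-up sample text.
$text = 'Experience with Microsoft Exchange and Linux';
```

The two helpers answer different questions: `matchesWord($text, 'micro')` is false because "micro" is not a whole word here, while `containsWord($text, 'micro')` is true. If substring semantics are enough, the non-regex version is the simpler choice.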
preg_match() or stripos()?
It's been my experience that stripos is much faster.
php regex vs strpos for accuracy
Opinions differ on the best method for this kind of comparison.
I would most likely prefer to use strpos() over preg_match() because regular expressions are generally more expensive. Both of these functions are quick, but if you are worried about performance then you should use strpos() to test for a string in this case.
The documentation for preg_match() clearly states:
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.
If you do want to use preg_match(), I would rewrite your expression to something like the following:
preg_match('/\b(?:cb[12]000?|cr[12][25][50])\b/i', $string);
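To see how that rewritten pattern behaves, here is a small sketch that runs it against a few inputs. The sample strings are made up for illustration, since the original question's data is not shown here.

```php
<?php
// The pattern from the answer above, reused unchanged.
$pattern = '/\b(?:cb[12]000?|cr[12][25][50])\b/i';

// Made-up sample inputs, chosen only for illustration.
$samples = ['CB1000', 'cb200', 'cr250', 'cb3000', 'xcb1000'];

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $results[$sample] = preg_match($pattern, $sample) === 1;
}
```

The first three samples match. "cb3000" fails the `[12]` character class, and "xcb1000" fails the leading `\b` because "x" and "c" are both word characters.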
which is the fast process strpos()/stripos() or preg_match() in php
I found a blog that ran some tests on this question. The results were:
- strpos() is 3-16 times faster than preg_match()
- stripos() is 2-30 times slower than strpos()
- stripos() is 20-100 percent faster than preg_match() with the caseless modifier "//i"
- using a regular expression in preg_match() is not faster than using a long string
- using the utf8 modifier "//u" in preg_match() makes it 2 times slower
The code used was:
<?php
function loop(){
    $str_50 = str_repeat('a', 50).str_repeat('b', 50);
    $str_100 = str_repeat('a', 100).str_repeat('b', 100);
    $str_500 = str_repeat('a', 250).str_repeat('b', 250);
    $str_1k = str_repeat('a', 1024).str_repeat('b', 1024);
    $str_10k = str_repeat('a', 10240).str_repeat('b', 1024);
    $str_100k = str_repeat('a', 102400).str_repeat('b', 1024);
    $str_500k = str_repeat('a', 1024*500).str_repeat('b', 1024);
    $str_1m = str_repeat('a', 1024*1024).str_repeat('b', 1024);
    $b = 'b';
    $b_10 = str_repeat('b', 10);
    $b_100 = str_repeat('b', 100);
    $b_1k = str_repeat('b', 1024);
    echo str_replace(',', "\t", ',strpos,preg,preg U,preg S,preg regex,stripos,preg u,'.
        'preg i,preg u i,preg i regex,stripos uc,preg i uc,preg i uc regex').PHP_EOL;
    foreach (array($b, $b_10, $b_100, $b_1k) as $needle) {
        foreach (array($str_50, $str_100, $str_500, $str_1k, $str_10k,
            $str_100k, $str_500k, $str_1m) as $str) {
            echo strlen($needle).'/'.strlen($str);
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = strpos($str, $needle); // strpos
            echo "\t".mt($start);
            $regex = '!'.$needle.'!';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg
            echo "\t".mt($start);
            $regex = '!'.$needle.'!U';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg Ungreedy
            echo "\t".mt($start);
            $regex = '!'.$needle.'!S';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg extra analysiS
            echo "\t".mt($start);
            $regex = "!b{".strlen($needle)."}!";
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg regex
            echo "\t".mt($start);
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = stripos($str, $needle); // stripos
            echo "\t".mt($start);
            $regex = '!'.$needle.'!u';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg utf-8
            echo "\t".mt($start);
            $regex = '!'.$needle.'!i';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg i
            echo "\t".mt($start);
            $regex = '!'.$needle.'!ui';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg i utf-8
            echo "\t".mt($start);
            $regex = "!b{".strlen($needle)."}!i";
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg i regex
            echo "\t".mt($start);
            echo PHP_EOL;
        }
        echo PHP_EOL;
    }
}
function mt($start=null){
    if ($start === null) return microtime(true);
    return number_format(microtime(true)-$start, 4);
}
loop();
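The repeated start/loop/echo pattern in that benchmark can be folded into one helper. This is only a sketch of the same idea, not the blog's code: the name `timeIt` is hypothetical, it uses `hrtime()` (PHP 7.3+) and arrow functions (PHP 7.4+), and it times a single haystack and needle size rather than the full grid.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return elapsed seconds.
function timeIt(callable $fn, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

// One haystack/needle pair in the style of the benchmark above.
$haystack = str_repeat('a', 1024) . str_repeat('b', 1024);
$needle = str_repeat('b', 10);

$strposTime = timeIt(fn() => strpos($haystack, $needle), 25000);
$pregTime = timeIt(fn() => preg_match('!' . $needle . '!', $haystack), 25000);

printf("strpos: %.4f s, preg_match: %.4f s\n", $strposTime, $pregTime);
```

The closure call adds the same overhead to both measurements, so the ratio stays comparable even though the absolute numbers are slightly inflated. Timings depend on the PHP version and hardware.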
PHP preg_match() or strpos to check if string starts with substring
Your strpos call is syntactically incorrect: you close the function call before passing the term to search for.
Your regex requires all terms to be present and doesn't check for the start of the string. You need to use an alternation (or) and a leading anchor. This:
^(?:Bear|Town|Red)
should do it.
Regex Demo: https://regex101.com/r/civdig/3/
Correct strpos usage:
<?php if( strpos($_GET['code'], "Red") === 0 || strpos($_GET['code'], "Bear") === 0 || strpos($_GET['code'], "Town") === 0 ): ?>
<span>new view</span>
<?php endif ?>
You need the === 0 comparison to confirm that the match is at the start of the string; otherwise you are only checking that the term appears somewhere in it.
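Both approaches can be wrapped in small helpers, sketched below. The function names `startsWithAny` and `startsWithAnyRegex` are hypothetical, and `strncmp()` is swapped in for `strpos()` here because it only inspects the first few bytes instead of scanning the whole string.

```php
<?php
// Hypothetical helper: true when $value begins with any of $prefixes.
function startsWithAny(string $value, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        // strncmp() compares only the first strlen($prefix) bytes.
        if (strncmp($value, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

// Regex equivalent using the anchored alternation from the answer.
function startsWithAnyRegex(string $value): bool
{
    return preg_match('/^(?:Bear|Town|Red)/', $value) === 1;
}

// Made-up sample values.
$prefixes = ['Bear', 'Town', 'Red'];
$atStart = startsWithAny('RedPanda', $prefixes);
$inMiddle = startsWithAny('BigRed', $prefixes);
```

On PHP 8 and later, the built-in `str_starts_with()` can replace the `strncmp()` call.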
php - strpos vs preg_match - memory and resource differences
Simba's comment, which recommends KCachegrind for application profiling, is the best answer. You can see more about measuring performance in this answer.
For this particular example's question about memory, I measured preg_match() as consistently better, using PHP's memory_get_peak_usage():
<?php
$keys = ['matchA','matchB','matchC','matchD','matchE'];
foreach ($keys as $key)
    preg_match("~(matchA|matchB|matchC|matchD|matchE)~i",$key);
echo 'Peak Memory: '.memory_get_peak_usage();
Peak Memory: 501624
<?php
$keys = ['matchA','matchB','matchC','matchD','matchE'];
foreach ($keys as $key)
    (strpos($key, 'matchA') !== false || strpos($key, 'matchB') !== false || strpos($key, 'matchC') !== false || strpos($key, 'matchD') !== false || strpos($key, 'matchE') !== false);
echo 'Peak Memory: '.memory_get_peak_usage();
Peak Memory: 504624
This seems reasonable to me because you're making 4 extra strpos() function calls.