preg_match() vs. strpos() for Match Finding

preg_match() vs strpos() for match finding?

I would prefer strpos() over preg_match(), because regular expressions are generally more expensive to execute.

According to the official PHP documentation for preg_match():

Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.
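As a minimal sketch of the check the documentation recommends: strpos() returns a byte offset or false, so the result has to be compared strictly. containsLiteral() is a hypothetical helper name, and str_contains() assumes PHP 8.0 or later.

```php
<?php
// Containment check for a literal needle (no regex involved).
// strpos() returns the offset of the first match or false; a match at
// offset 0 is falsy under loose comparison, so use !== false.
function containsLiteral(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

var_dump(containsLiteral('microsoft exchange', 'microsoft')); // bool(true), found at offset 0
var_dump(containsLiteral('microsoft exchange', 'oracle'));    // bool(false)

// On PHP 8.0+, str_contains() expresses the same check directly.
if (function_exists('str_contains')) {
    var_dump(str_contains('microsoft exchange', 'exchange')); // bool(true)
}
```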

strpos vs preg_match - memory and resource differences

Jeffrey Friedl's Mastering Regular Expressions says that, when the text you are matching is a literal string rather than a pattern, built-in non-regex functions such as strpos() are always better and faster than preg_match() (assuming you're using the preg suite).

Which is more efficient, strpos() or preg_match()?

strpos() is much faster than preg_match(); here is a benchmark:

<?php
$fields = array();
$array = array();
// Cast to string: before PHP 8, strpos() treats an int needle as a character code
for ($i = 0; $i < 1000; $i++) $array[] = (string) $i;
$nbloop = 10000;
$text = <<<EOD
I understand that my pattern must contain only a word per cycle because, in the case reported in that question, I must find "microsoft" and "microsoft exchange" and I can't modify my regexp because these two possibilities are given dynamically from a database!

So my question is: which is the better solution between over 200 preg_match and the same numbers of str_pos to check if a subset of char contains these words?
EOD;

$start = microtime(true);
for ($i = 0; $i < $nbloop; $i++) {
    foreach ($array as $word) {
        // Whole-word, case-insensitive search for $word
        $pattern = '<\b(?:' . preg_quote($word) . ')\b>i';
        if (preg_match_all($pattern, $text, $matches)) {
            $fields['skill'][] = $matches[0][0];
        }
    }
}
echo "Elapse regex: ", microtime(true) - $start, "\n";

$start = microtime(true);
for ($i = 0; $i < $nbloop; $i++) {
    foreach ($array as $word) {
        // strpos($haystack, $needle): the text is the haystack, and a
        // match at offset 0 is valid, so compare strictly against false
        if (strpos($text, $word) !== false) {
            $fields['skill'][] = $word;
        }
    }
}
echo "Elapse strpos: ", microtime(true) - $start, "\n";

Output:

Elapse regex: 7.9924139976501
Elapse strpos: 0.62015008926392

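In this run strpos() was about 13 times faster (7.99 s vs. 0.62 s). Bear in mind that the two loops are not strictly equivalent: the regex also enforces word boundaries and ignores case, and timings vary with machine and PHP version.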
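If you want to reproduce this kind of measurement, one option is to time both approaches on the same haystack and the same literal needle, so they answer the same question. The sketch below assumes PHP 7.3+ for hrtime(); timeIt() is a hypothetical helper and the haystack is made-up filler text. No timings are shown, since they depend on the machine and PHP version.

```php
<?php
// Times $iterations calls of $fn and returns elapsed seconds.
function timeIt(callable $fn, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

// Same haystack and same literal needle for both approaches.
$text   = str_repeat('lorem ipsum ', 200) . 'microsoft exchange';
$needle = 'microsoft';
$regex  = '/' . preg_quote($needle, '/') . '/'; // needle escaped, so treated literally

// Sanity check: both approaches must agree before timing them.
$strposHit = strpos($text, $needle) !== false;
$pregHit   = preg_match($regex, $text) === 1;

$tStrpos = timeIt(function () use ($text, $needle) { strpos($text, $needle); }, 100000);
$tPreg   = timeIt(function () use ($text, $regex) { preg_match($regex, $text); }, 100000);

printf("strpos: %.4fs  preg_match: %.4fs\n", $tStrpos, $tPreg);
```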

preg_match() or stripos()?

In my experience, stripos() is much faster.

PHP regex vs strpos() for accuracy

Opinions differ on many things, and the best way to compare strings is one of them.

I would most likely use strpos() rather than preg_match(), because regular expressions are generally more expensive. Both functions are fast, but if performance is a concern, use strpos() to test for a literal string in this case.

The documentation for preg_match() clearly states:

Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.

If you want to use preg_match(), I would rewrite your expression as something like the following:

preg_match('/\b(?:cb[12]000?|cr[12][25][50])\b/i', $string);
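To illustrate the accuracy difference, the sketch below runs that pattern and a plain stripos() search over a few made-up sample strings (the samples and the two literals passed to stripos() are assumptions for illustration). The \b anchors reject codes embedded in longer tokens, which a substring search cannot do on its own.

```php
<?php
// Whole-word, case-insensitive pattern from the answer.
$pattern = '/\b(?:cb[12]000?|cr[12][25][50])\b/i';

// Hypothetical inputs: two standalone codes, two embedded in longer tokens.
$samples = ['Model CB1000 in stock', 'part xcb1000', 'cr125 rev B', 'cr1255'];

$results = [];
foreach ($samples as $s) {
    // Substring search: true whenever the literal occurs anywhere.
    $substring = stripos($s, 'cb1000') !== false || stripos($s, 'cr125') !== false;
    // Regex search: true only when a code stands as a whole word.
    $wholeWord = preg_match($pattern, $s) === 1;
    $results[$s] = ['substring' => $substring, 'wholeWord' => $wholeWord];
    echo $s, ' => stripos: ', var_export($substring, true),
         ', regex: ', var_export($wholeWord, true), PHP_EOL;
}
```

For 'part xcb1000' and 'cr1255' the substring search reports a hit while the regex does not, which is the accuracy trade-off being discussed.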

Which is faster in PHP: strpos()/stripos() or preg_match()?

I found a blog post that ran some tests on this question; the results were:

  • strpos() is 3-16 times faster than preg_match()
  • stripos() is 2-30 times slower than strpos()
  • stripos() is 20-100 percent faster than preg_match() with the
    caseless modifier "//i"
  • using a quantified pattern in preg_match() (e.g. b{1024}) is not
    faster than using the equivalent literal string
  • using the UTF-8 modifier "//u" in preg_match() makes it 2 times slower
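Comparing stripos() with a caseless regex only makes sense when the regex treats the needle literally. A minimal sketch, assuming ASCII input for the stripos() line: preg_quote() escapes metacharacters (the + in the made-up needle C++ would otherwise be a quantifier), and the u modifier is what lets a caseless match fold non-ASCII letters in UTF-8 text, which is the correctness you buy with that slowdown.

```php
<?php
$haystack = 'Skills: c++, PHP, SQL';
$needle   = 'C++';

// Case-insensitive literal search without a regex.
$viaStripos = stripos($haystack, $needle) !== false;

// Same search via PCRE: preg_quote() turns "C++" into "C\+\+".
$viaPreg = preg_match('/' . preg_quote($needle, '/') . '/i', $haystack) === 1;

// With the u modifier the subject is treated as UTF-8, so caseless
// matching also folds non-ASCII letters such as ä/Ä.
$utf8Hit = preg_match('/' . preg_quote('ä', '/') . '/iu', 'ÄPFEL') === 1;

var_dump($viaStripos, $viaPreg, $utf8Hit);
```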

The code used was:

<?php

function loop(){

    // Haystacks: a run of 'a' followed by a run of 'b'. The needles are runs
    // of 'b', so the longer needles are not found in the shorter haystacks.
    $str_50 = str_repeat('a', 50).str_repeat('b', 50);
    $str_100 = str_repeat('a', 100).str_repeat('b', 100);
    $str_500 = str_repeat('a', 250).str_repeat('b', 250);
    $str_1k = str_repeat('a', 1024).str_repeat('b', 1024);
    $str_10k = str_repeat('a', 10240).str_repeat('b', 1024);
    $str_100k = str_repeat('a', 102400).str_repeat('b', 1024);
    $str_500k = str_repeat('a', 1024*500).str_repeat('b', 1024);
    $str_1m = str_repeat('a', 1024*1024).str_repeat('b', 1024);

    $b = 'b';
    $b_10 = str_repeat('b', 10);
    $b_100 = str_repeat('b', 100);
    $b_1k = str_repeat('b', 1024);

    // Header row: one column per timed variant below, in the same order
    echo str_replace(',', "\t", ',strpos,preg,preg U,preg S,preg regex,stripos,preg u,'.
        'preg i,preg u i,preg i regex').PHP_EOL;

    foreach (array($b, $b_10, $b_100, $b_1k) as $needle) {
        foreach (array($str_50, $str_100, $str_500, $str_1k, $str_10k,
                       $str_100k, $str_500k, $str_1m) as $str) {

            echo strlen($needle).'/'.strlen($str);

            $start = mt();
            for ($i=0; $i<25000; $i++) $j = strpos($str, $needle); // strpos
            echo "\t".mt($start);

            $regex = '!'.$needle.'!';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg
            echo "\t".mt($start);

            $regex = '!'.$needle.'!U';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg Ungreedy
            echo "\t".mt($start);

            $regex = '!'.$needle.'!S';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg extra analysiS
            echo "\t".mt($start);

            $regex = "!b{".strlen($needle)."}!";
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg regex
            echo "\t".mt($start);

            $start = mt();
            for ($i=0; $i<25000; $i++) $j = stripos($str, $needle); // stripos
            echo "\t".mt($start);

            $regex = '!'.$needle.'!u';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg utf-8
            echo "\t".mt($start);

            $regex = '!'.$needle.'!i';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg i
            echo "\t".mt($start);

            $regex = '!'.$needle.'!ui';
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg i utf-8
            echo "\t".mt($start);

            $regex = "!b{".strlen($needle)."}!i";
            $start = mt();
            for ($i=0; $i<25000; $i++) $j = preg_match($regex, $str); // preg i regex
            echo "\t".mt($start);

            echo PHP_EOL;
        }
        echo PHP_EOL;
    }
}

// Returns a start timestamp, or the elapsed seconds since $start
function mt($start=null){
    if ($start === null) return microtime(true);
    return number_format(microtime(true)-$start, 4);
}

loop();

PHP preg_match() or strpos to check if string starts with substring

Your strpos() call is syntactically incorrect: you close the function call before passing the term to search for.

Your regex requires all of the terms to be present and doesn't check the start of the string. You need alternation (an "or") and a leading anchor. This pattern:

^(?:Bear|Town|Red)

should do it.

Regex Demo: https://regex101.com/r/civdig/3/

Correct strpos usage:

<?php if (strpos($_GET['code'], "Red") === 0 || strpos($_GET['code'], "Bear") === 0 || strpos($_GET['code'], "Town") === 0): ?>

<span>new view</span>

<?php endif ?>

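The === 0 check confirms that the term appears at the very start of the string; without it you are only checking that it appears somewhere. The comparison must be strict, because strpos() returns false when there is no match and false == 0 under loose comparison.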
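The same idea generalizes to any list of prefixes. The sketch below shows both forms side by side; startsWithAny() and startsWithAnyRegex() are hypothetical helper names, and the prefixes are assumed to be literals (hence preg_quote()). On PHP 8.0+, str_starts_with() is a direct alternative to the strpos() === 0 test.

```php
<?php
// strpos() form: a prefix matches only if it is found at offset 0.
function startsWithAny(string $subject, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        // Strict === 0: false (not found) == 0 under loose comparison.
        if (strpos($subject, $prefix) === 0) {
            return true;
        }
    }
    return false;
}

// Regex form: ^ anchors at the start, (?:a|b|c) is the "or".
function startsWithAnyRegex(string $subject, array $prefixes): bool
{
    $alternatives = array_map(function ($p) { return preg_quote($p, '/'); }, $prefixes);
    return preg_match('/^(?:' . implode('|', $alternatives) . ')/', $subject) === 1;
}

$prefixes = ['Red', 'Bear', 'Town'];
var_dump(startsWithAny('Bear123', $prefixes));     // bool(true)
var_dump(startsWithAny('123Bear', $prefixes));     // bool(false)
var_dump(startsWithAnyRegex('Town-9', $prefixes)); // bool(true)
```

With the original form data this would be startsWithAny($_GET['code'] ?? '', ['Red', 'Bear', 'Town']); the ?? fallback (PHP 7.0+) avoids an undefined-index notice when code is missing.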

php - strpos vs preg_match - memory and resource differences

Simba's comment recommending KCachegrind for application profiling is the best answer. You can see more about measuring performance in this answer.

As for this example's question about memory, measuring with PHP's memory_get_peak_usage(), I find that preg_match() consistently uses less memory:

<?php
$keys = ['matchA','matchB','matchC','matchD','matchE'];

foreach ($keys as $key)
preg_match("~(matchA|matchB|matchC|matchD|matchE)~i",$key);

echo 'Peak Memory: '.memory_get_peak_usage();

Peak Memory: 501624

<?php
$keys = ['matchA','matchB','matchC','matchD','matchE'];

foreach ($keys as $key)
(strpos($key, 'matchA') !== false || strpos($key, 'matchB') !== false || strpos($key, 'matchC') !== false || strpos($key, 'matchD') !== false || strpos($key, 'matchE') !== false);

echo 'Peak Memory: '.memory_get_peak_usage();

Peak Memory: 504624

This seems reasonable to me, because the strpos() version makes up to four extra function calls per key (the || chain stops at the first match).
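When the keywords come from a database at runtime (as in the "microsoft" / "microsoft exchange" question earlier), a single alternation like the one in the preg_match() example can be built dynamically. A sketch under stated assumptions: the keywords are literals escaped with preg_quote(), and they are sorted longest first, because PCRE takes the first alternative that matches at a position, so 'microsoft exchange' must come before 'microsoft'. buildKeywordPattern() is a hypothetical name.

```php
<?php
// Builds one whole-word, case-insensitive alternation from a keyword list.
function buildKeywordPattern(array $keywords): string
{
    // Longest first, so longer keywords win over their own prefixes.
    usort($keywords, function ($a, $b) { return strlen($b) - strlen($a); });
    $escaped = array_map(function ($k) { return preg_quote($k, '~'); }, $keywords);
    return '~\b(?:' . implode('|', $escaped) . ')\b~i';
}

$pattern = buildKeywordPattern(['microsoft', 'microsoft exchange', 'php']);
preg_match_all($pattern, 'Skills: Microsoft Exchange, PHP', $m);
print_r($m[0]);
```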


