How to Get the Shortest Rather Than Longest Possible Regex Match with Preg_Match()


Use the non-greedy modifier ? after the quantifier:

preg_match("/\{\{(.*?)\}\}/si", $content, $matches); // the ? after .* makes the match non-greedy
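To see the difference, here is a minimal sketch that runs the same subject through a greedy and a lazy version of the pattern. The $content string is made up for illustration:

```php
<?php
// Hypothetical sample input with two {{...}} placeholders.
$content = "{{first}} and {{second}}";

// Greedy: .* runs to the end of the string, then backtracks to the LAST "}}".
preg_match('/\{\{(.*)\}\}/s', $content, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST "}}".
preg_match('/\{\{(.*?)\}\}/s', $content, $lazy);

echo $greedy[1], "\n"; // first}} and {{second
echo $lazy[1], "\n";   // first
```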

php preg_match non greedy?

Add ? after the quantifier to make it ungreedy. In this case, your regex would be .*?\n.
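PCRE in PHP also has a U (PCRE_UNGREEDY) modifier that inverts greediness for the whole pattern, so .* becomes lazy and .*? becomes greedy. A minimal sketch comparing the three variants, with s set so that . can cross newlines (the sample text is made up):

```php
<?php
$text = "Overview: line one\nDetails: line two\n";

// With s, . also matches \n, so greedy .* runs on to the LAST \n.
preg_match('/.*\n/s', $text, $greedy);

// Adding ? makes the quantifier lazy: the match ends at the FIRST \n.
preg_match('/.*?\n/s', $text, $lazy);

// The U modifier inverts greediness, so a plain .* now behaves like .*?
preg_match('/.*\n/sU', $text, $inverted);

var_dump($greedy[0] === $text);                 // bool(true): the whole string
var_dump($lazy[0] === "Overview: line one\n");  // bool(true)
var_dump($inverted[0] === $lazy[0]);            // bool(true)
```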

To specifically match the line beginning with "Overview: ", use this regex:

/^Overview:\s.*$/im

The m modifier makes ^ and $ match at the start and end of each line instead of only at the start and end of the whole subject string. Note that there is no need to make the quantifier ungreedy, since . does not match newlines unless you use the s modifier; in fact, making it ungreedy here would hurt performance, because the engine would have to test for $ after every single character.
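A minimal sketch of the pattern on a made-up multi-line string. The target line is deliberately written in lower case to show what the i modifier contributes:

```php
<?php
// Hypothetical multi-line input.
$text = "Title: Report\noverview: quarterly numbers\nFooter: end";

// m: ^ and $ match at line boundaries; i: case-insensitive.
// No s modifier, so .* cannot run past the end of the line.
if (preg_match('/^Overview:\s.*$/im', $text, $m)) {
    echo $m[0], "\n"; // overview: quarterly numbers
}
```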

regular expression in php: take the shortest match

Use

"/{{(.*?)}}/"

The expression ".*" is greedy, taking as many characters as possible.
If you use ".*?" it takes as few characters as possible; that is, it stops at the first set of closing braces.
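The same idea carries over to preg_match_all() when a string holds several placeholders. A minimal sketch with a made-up template string:

```php
<?php
// Hypothetical template with several placeholders.
$content = "Hello {{name}}, order {{order_id}} ships on {{date}}.";

// Lazy .*? stops at the first "}}" after each "{{": one match per placeholder.
preg_match_all('/{{(.*?)}}/', $content, $lazy);
echo implode(', ', $lazy[1]), "\n"; // name, order_id, date

// Greedy .* swallows everything from the first "{{" to the last "}}".
preg_match_all('/{{(.*)}}/', $content, $greedy);
echo $greedy[1][0], "\n"; // name}}, order {{order_id}} ships on {{date
```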

Are preg_match() and preg_replace() slow?

As Mike Brant said in his answer: There's nothing wrong with using any of the preg_* functions, if you need them.

You want to know whether it's a good idea to have something like 20 preg_match calls in a single file. Honestly, I'd say that's too many. I've often stated that "if your solution to a problem relies on more than 3 regexes at any given time, you're part of the problem". I have occasionally sinned against my own mantra, though.

If you are using 20 preg_match calls, chances are you can halve that number simply by taking a closer look at the actual regular expressions. Regexes, especially Perl-compatible ones (PCRE, which is what PHP's preg_* functions use), are incredibly powerful and well worth the time it takes to get to know them. They tend to be slower simply because the pattern has to be parsed and "translated" into a considerable number of branches and loops at a low level. If, say, you want to replace every lower-case a with an upper-case A, you could use a regular expression, sure, but in PHP this would look like this:

$string = preg_replace('/a/', 'A', $string);

Look at the first argument: it's a string holding the pattern. This string has to be parsed: the delimiters are checked and the pattern is compiled. Then the subject string is iterated, each character is compared against the pattern (in this case a), and every matching substring is replaced.

That seems like a bit of a hassle, especially considering that the last step (comparing substrings and replacing matches) is all we really want:

$string = str_replace('a','A',$string);

str_replace() does just that, without the additional checks performed when a regular expression is parsed and validated.

Don't forget that preg_match also builds an array of matches (when you pass it the $matches argument), and constructing an array isn't free either.
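When all you need is a yes/no answer for a fixed substring, a plain string function avoids both the pattern compilation and the match array. A minimal sketch with a made-up $haystack (on PHP 8+, str_contains() would work as well):

```php
<?php
$haystack = "The quick brown fox";

// Regex version: compiles the pattern and fills $matches.
$viaRegex = preg_match('/brown/', $haystack, $matches) === 1;

// Plain string version: no pattern, no array. Compare with !== false,
// because strpos() returns 0 for a match at the very start of the string.
$viaStrpos = strpos($haystack, 'brown') !== false;

var_dump($viaRegex, $viaStrpos); // both bool(true)
```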

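In short: regexes are slower because the expression is parsed, validated and finally translated into a set of simple, low-level instructions. (PHP does cache compiled patterns, so calling the same pattern repeatedly avoids recompiling it, but the regex engine still does more work than a plain string search.)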

Note that some people use explode() and implode() for string manipulation. This, too, creates an array, which, again, isn't free, especially considering that you implode that very same array shortly afterwards. Another option may be preferable (and in some cases preg_replace() can be faster here).
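A minimal sketch of such a round trip next to a direct replacement; the input string and the space-to-dash task are made up for illustration:

```php
<?php
$string = "one two three";

// explode()/implode() round trip: builds a temporary array just to join it again.
$viaArray = implode('-', explode(' ', $string));

// str_replace() produces the same result without the intermediate array.
$viaReplace = str_replace(' ', '-', $string);

echo $viaArray, "\n";   // one-two-three
echo $viaReplace, "\n"; // one-two-three
```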

Basically: regexes need additional processing that simple string functions don't require. But when in doubt, there's only one way to be absolutely sure: set up a test script.
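Here is a rough sketch of such a test script, timing the a-to-A replacement from above both ways. The input string and iteration count are arbitrary choices, and the printed timings will vary with PHP version and hardware, so treat them only as a relative comparison:

```php
<?php
// Arbitrary test input and loop count; adjust to resemble your real data.
$string = str_repeat('banana ', 1000);
$iterations = 1000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $viaRegex = preg_replace('/a/', 'A', $string);
}
$regexTime = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $viaStr = str_replace('a', 'A', $string);
}
$strTime = microtime(true) - $start;

// Sanity check: both approaches must produce identical output.
var_dump($viaRegex === $viaStr);
printf("preg_replace: %.4f s\nstr_replace:  %.4f s\n", $regexTime, $strTime);
```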

PHP - Check if string contains words longer than 4 characters, then include + *, and for those shorter than 4 characters include only *

You can pass preg_replace() two patterns: one matching words of 1-3 characters and one matching words of 4 or more characters (\w counts letters, digits and underscores):

$string = "This is a short sentence which should include all regex results";
echo preg_replace(array('/\b(\w{1,3})\b/', '/\b(\w{4,})\b/'), array('$1*', '+$1*'), $string);

Output:

+This* is* a* +short* +sentence* +which* +should* +include* all* +regex* +results*
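If you prefer a single pass, preg_replace_callback() can choose the marker based on the length of each match. This is an alternative sketch, not the approach above; like the original it treats \w as the word class, and strlen() counts bytes, so multibyte letters would need mb_strlen() and the u modifier:

```php
<?php
$string = "This is a short sentence which should include all regex results";

// One pattern for every word; the callback decides between "+word*" and "word*".
$result = preg_replace_callback('/\b\w+\b/', function (array $m): string {
    return strlen($m[0]) >= 4 ? '+' . $m[0] . '*' : $m[0] . '*';
}, $string);

echo $result, "\n";
// +This* is* a* +short* +sentence* +which* +should* +include* all* +regex* +results*
```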


How to get first possible match using preg match in php

Try this. The negated character class [^]]* cannot consume a ], so the match stops at the first ]]:

$str = "var str = 'abcd [[test search string]] some text here ]]';";

preg_match("/(\[\[test[^]]*\]\])/im", $str, $match);

print_r($match);
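In the spirit of the rest of this page, a lazy quantifier gives the same first match here. A minimal sketch using the same $str; unlike [^]]*, the lazy version would also accept a single ] inside the brackets:

```php
<?php
$str = "var str = 'abcd [[test search string]] some text here ]]';";

// Lazy .*? stops at the first "]]" after "[[test".
preg_match('/\[\[test.*?\]\]/', $str, $lazy);
echo $lazy[0], "\n"; // [[test search string]]

// Greedy .* backtracks only as far as the LAST "]]" in the string.
preg_match('/\[\[test.*\]\]/', $str, $greedy);
echo $greedy[0], "\n"; // [[test search string]] some text here ]]
```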

Why does this regex only match the last occurrence of the pattern

Okay, I had a few minutes to spare on my mobile phone before bedtime, so I ran with Wiktor's comment and whacked up a series of preg_ functions to try to convert your BBCode to HTML. I don't have any experience with BBCode, so I am purely addressing your sample input and not considering fringe cases. I think PHP has a BBCode parser library somewhere, but I don't know whether your BBCode syntax is standard.

Here is a breakdown of the implemented patterns.

First, isolate each whole [table]...[/table] block in the document. The pattern ~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~ matches each block and passes the full match as $m[0], and the substring between the table tags as $m[1], to BBTableToHTML().

Next, BBTableToHTML() makes three separate passes over the $m[1] string via preg_replace_callback_array(). Each pattern sends its matched strings to the associated custom function, and the modified string is passed on to the next pattern.

Before BBTableToHTML() returns the updated $m[1] to be echoed, it bookends the string with your desired <table ...> and </table> tags.
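Why the outer pattern is built this way: a greedy .* with the s modifier runs from the first [table] to the last [/table], which is why a naive pattern produces a single match spanning every table. A minimal sketch comparing the two on a made-up two-table input:

```php
<?php
$bbcode = "[table]\nA\n[/table]\ntext\n[table]\nB\n[/table]";

// Greedy .* with s runs from the first [table] to the LAST [/table]: one big match.
$greedyCount = preg_match_all('~\[table](.*)\[/table]~s', $bbcode, $greedy);

// The unrolled pattern cannot step over another [table] or [/table] tag,
// so each table block becomes its own match.
$unrolledCount = preg_match_all('~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~', $bbcode, $unrolled);

var_dump($greedyCount);   // int(1)
var_dump($unrolledCount); // int(2)
```

A lazy ~\[table](.*?)\[/table]~s would also give two matches here; the unrolled version generally avoids the character-by-character expansion of .*? and, unlike the lazy one, refuses to include a nested opening [table] tag.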

Demos of the preg_replace_callback_array() patterns:

  1. ~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~ https://regex101.com/r/thINHQ/2
  2. ~(?:\[\*].*\R*)+~ https://regex101.com/r/thINHQ/3
  3. ~\[\*](.*)~ https://regex101.com/r/thINHQ/4

Code:

$bbcode = <<<BBCODE
[b]Check out this demo[/b]
¯\_(ツ)_/¯
[table]
[**]header1[||]header2[||]header3[||]...[/**]
[*]child1.1[|]child1.2[|]child1.3[|]...
[*]child2.1[|]child2.2[|]child2.3[|]...
[*]child3.1[|]child3.2[|]child3.3[|]...
[*]...[|]...[|]...[|]...
[/table]
simple text
[table]
[**]a 1[||]and a 2[/**]
[*]A[|]B
[*]C[|]D
[/table]

[s]3, you're out[/s]
blah
BBCODE;

// Converts the contents of one [table]...[/table] block ($m[1]) and wraps it in <table> tags.
function BBTableToHTML($m) {
    return "<table class=\"ui compact stripet yellow table\">\n" .
        preg_replace_callback_array(
            [
                '~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~' => 'BBTHeadToHTML', // [**]...[/**] header row
                '~(?:\[\*].*\R*)+~' => 'BBTBodyToHTML',                               // run of consecutive [*] rows
                '~\[\*](.*)~' => 'BBTBodyRowToHTML'                                   // each single [*] row
            ],
            $m[1]
        ) .
        "</table>";
}

// Header row: [||] separates the <th> cells.
function BBTHeadToHTML($m) {
    return "\t<thead>\n" .
        "\t\t<tr>\n\t\t\t<th>" . str_replace('[||]', "</th>\n\t\t\t<th>", $m[1]) . "</th>\n\t\t</tr>\n" .
        "\t</thead>";
}

// Wraps the whole run of body rows in <tbody>.
function BBTBodyToHTML($m) {
    return "\t<tbody>\n{$m[0]}\t</tbody>\n";
}

// Body row: [|] separates the <td> cells.
function BBTBodyRowToHTML($m) {
    return "\t\t<tr>\n\t\t\t<td>" . str_replace('[|]', "</td>\n\t\t\t<td>", $m[1]) . "</td>\n\t\t</tr>";
}

echo preg_replace_callback(
    '~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~',
    'BBTableToHTML',
    $bbcode
);

Output:

[b]Check out this demo[/b]
¯\_(ツ)_/¯
<table class="ui compact stripet yellow table">
	<thead>
		<tr>
			<th>header1</th>
			<th>header2</th>
			<th>header3</th>
			<th>...</th>
		</tr>
	</thead>
	<tbody>
		<tr>
			<td>child1.1</td>
			<td>child1.2</td>
			<td>child1.3</td>
			<td>...</td>
		</tr>
		<tr>
			<td>child2.1</td>
			<td>child2.2</td>
			<td>child2.3</td>
			<td>...</td>
		</tr>
		<tr>
			<td>child3.1</td>
			<td>child3.2</td>
			<td>child3.3</td>
			<td>...</td>
		</tr>
		<tr>
			<td>...</td>
			<td>...</td>
			<td>...</td>
			<td>...</td>
		</tr>
	</tbody>
</table>
simple text
<table class="ui compact stripet yellow table">
	<thead>
		<tr>
			<th>a 1</th>
			<th>and a 2</th>
		</tr>
	</thead>
	<tbody>
		<tr>
			<td>A</td>
			<td>B</td>
		</tr>
		<tr>
			<td>C</td>
			<td>D</td>
		</tr>
	</tbody>
</table>

[s]3, you're out[/s]
blah

Extract words from string with preg_match_all

This works if the string is UTF-8 encoded and the words to look for (at least 4 characters long, as per the specs) consist of alphabetic characters from ISO-8859-15 (which covers Spanish, but also English, German, French, etc.):

$n_words = preg_match_all('/([a-zA-Z]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF]|\xC5[\x92\x93\xA0\xA1\xB8\xBD\xBE]){4,}/', $str, $match_arr);
$word_arr = $match_arr[0];
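For comparison, a different technique: with the u modifier PCRE treats the subject as UTF-8, and \p{L} matches any Unicode letter. This is broader than the ISO-8859-15 set above (it also accepts Greek, Cyrillic and so on), so it only fits if that is acceptable. A minimal sketch on a made-up Spanish sentence, running both patterns:

```php
<?php
$str = "El niño come manzanas y pan"; // hypothetical UTF-8 input

// Byte-based pattern from above: ASCII letters plus the UTF-8 sequences of ISO-8859-15 letters.
$n_words = preg_match_all('/([a-zA-Z]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF]|\xC5[\x92\x93\xA0\xA1\xB8\xBD\xBE]){4,}/', $str, $match_arr);

// Unicode-aware alternative: u modifier plus the \p{L} letter property.
$n_unicode = preg_match_all('/\p{L}{4,}/u', $str, $unicode_arr);

echo $n_words, ': ', implode(', ', $match_arr[0]), "\n";     // 3: niño, come, manzanas
echo $n_unicode, ': ', implode(', ', $unicode_arr[0]), "\n"; // 3: niño, come, manzanas
```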

