PHP Preg_Match Non Greedy

php preg_match non greedy?

Add ? after the quantifier to make it ungreedy. In this case, your regex would be .*?\n.

To specifically match the line beginning with "Overview: ", use this regex:

/^Overview:\s.*$/im

The m modifier allows ^ and $ to match the start and end of lines instead of the entire search string. Note that there is no need to make it ungreedy since . does not match newlines unless you use the s modifier - in fact, making it ungreedy here would be bad for performance.
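
As a minimal sketch of the above (the sample text and variable names are made up for illustration, not taken from the question), the pattern picks out just the one line even though .* is greedy:

```php
<?php
// Hypothetical input: three lines, only one of which starts with "Overview: ".
$text = "Title: Example\nOverview: first line of the overview\nDetails: more text";

// m: ^ and $ match at line boundaries; i: case-insensitive.
// There is no s modifier, so . never matches "\n" and the greedy .*
// stops at the end of the line by itself.
$found = preg_match('/^Overview:\s.*$/im', $text, $match);

echo $match[0], PHP_EOL; // Overview: first line of the overview
```

If the input uses Windows line endings, the match may end with a trailing "\r", so trimming the result can be worthwhile.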

Non greedy regex

Try using [\s\S], which matches any character (whitespace or non-whitespace, so newlines too), instead of the dot. Also, there's no need to put <funcion> and </funcion> inside capture groups.

/<funcion>([\s\S]*?)<\/funcion>/s
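
Here is a small sketch of how that pattern behaves on a multi-line input. The input string and the variable names are assumptions for illustration. The s modifier is left out in this sketch because [\s\S] already matches newlines without it:

```php
<?php
// Hypothetical input with two <funcion> blocks, one spanning several lines.
$source = "<funcion>\n  first body\n</funcion>\ntext\n<funcion>second</funcion>";

// [\s\S]*? lazily matches any characters, newlines included,
// so each match stops at the nearest closing tag.
preg_match_all('/<funcion>([\s\S]*?)<\/funcion>/', $source, $matches);

// $matches[1] holds the captured bodies; trim the surrounding whitespace.
$bodies = array_map('trim', $matches[1]);

print_r($bodies); // "first body" and "second"
```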

Also, keep in mind that the best way to do this is to parse the XML with an XML parser. Even if it's not an XML document, as you mentioned in your comment, extract the part that should be parsed and run the XML parser on that.

Using regex with non-greedy second

The following should work. You just have to repeat the part of the expression you want matched a second time. There are a few ways to do it. The simplest way is:

$text = "start text1 end text2 end text3 end";
$regex = "~start.+?end.+?end~";
preg_match($regex, $text, $match);
print_r($match);

You may also want to use an exact quantifier to describe the pattern:

$text = "start text1 end text2 end text3 end";
$regex = "~start(.+?end){2}~";
preg_match($regex, $text, $match);
print_r($match);

The "{2}" tells it to match everything in the parentheses before it exactly twice.
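
One thing to watch with the quantified group, sketched below using the same sample text: a repeated capture group only keeps its last repetition. If you need both pieces, writing the two groups out separately (my own variation, not part of the answer above) keeps them apart:

```php
<?php
$text = "start text1 end text2 end text3 end";

// Quantified group: the whole match covers both repetitions,
// but $match[1] holds only the text of the second (last) one.
preg_match('~start(.+?end){2}~', $text, $match);
echo $match[0], PHP_EOL; // start text1 end text2 end
echo $match[1], PHP_EOL; // " text2 end" (with the leading space)

// Two explicit groups: each piece lands in its own slot.
preg_match('~start(.+?)end(.+?)end~', $text, $parts);
echo trim($parts[1]), PHP_EOL; // text1
echo trim($parts[2]), PHP_EOL; // text2
```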

How can I write a regex which matches non greedy?

The non-greedy ? works perfectly fine. You just need to enable the "dot matches all" option in the regex tool you are testing with (regexpal, the one you used, has this option too). This is because regex engines generally don't let . match line breaks. You have to tell them explicitly that . should match line breaks as well.

For example,

<img\s.*?>

works fine!


Also, read about how dot behaves in various regex flavours.
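
In PHP, the "dot matches all" option is the s modifier. A rough sketch of the difference, using a made-up HTML fragment where the tag is split across lines:

```php
<?php
// Hypothetical input: an <img> tag whose attributes span several lines.
$html = "<p>intro</p><img\n  src=\"a.png\"\n  alt=\"A\"><p>outro</p>";

// Without s, . cannot cross the line breaks inside the tag, so nothing matches.
$withoutS = preg_match('/<img\s.*?>/', $html, $m1);

// With s, . also matches newlines and the lazy .*? stops at the first >.
$withS = preg_match('/<img\s.*?>/s', $html, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
echo $m2[0], PHP_EOL; // the whole <img ...> tag, line breaks included
```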

PHP preg_replace non-greedy trouble

Your non-greedy modifier is working as expected. But preg_replace replaces all occurrences of the (non-greedy) match with the replacement text ("" in your case). If you want only the first one replaced, pass 1 as the optional 4th argument (limit) to preg_replace (see the PHP docs for preg_replace). On the website you linked, this can be accomplished by typing 1 into the text input between the word "Flags" and the word "limit".
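
A quick sketch of the difference, with a made-up pattern and input since the original ones aren't shown here:

```php
<?php
// Hypothetical input with three bracketed chunks.
$text = "a [x] b [y] c [z]";

// Default limit (-1): every lazy match is replaced.
$all = preg_replace('/\[.*?\]/', '', $text);

// limit = 1: only the first match is replaced.
$first = preg_replace('/\[.*?\]/', '', $text, 1);

var_dump($all);   // string(8) "a  b  c "
var_dump($first); // string(14) "a  b [y] c [z]"
```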

Non greedy match does not work

Use this regex instead:

<w:t[^>]*><w:p>

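[^>]* matches any run of characters except >, so it cannot run past the end of the tag the way .*? can when the rest of the pattern fails to match right after it.

See https://regex101.com/r/nuMzTk/1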
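
To see why the negated class helps, here is a sketch with an invented input (the asker's real document isn't reproduced here, and the lazy pattern below is only my guess at the kind of regex that was failing). The lazy version still starts at the first <w:t and stretches until the rest of the pattern fits:

```php
<?php
// Hypothetical input: two <w:t ...> tags, only the second is followed by <w:p>.
$xml = '<w:t a="1"><w:x><w:t b="2"><w:p>';

// Lazy dot: the match starts at the first <w:t and expands across
// other tags until "><w:p>" finally fits.
preg_match('/<w:t.*?><w:p>/', $xml, $lazy);

// Negated class: [^>]* cannot cross a >, so only the tag that is
// directly followed by <w:p> can match.
preg_match('/<w:t[^>]*><w:p>/', $xml, $strict);

echo $lazy[0], PHP_EOL;   // <w:t a="1"><w:x><w:t b="2"><w:p>
echo $strict[0], PHP_EOL; // <w:t b="2"><w:p>
```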

regular expression in php: take the shortest match

Use

"/{{(.*?)}}/"

The expression ".*" is greedy, taking as many characters as possible.
If you use ".*?", it takes as few characters as possible, that is, it stops at the first set of closing brackets.
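
A short sketch of the two behaviours side by side. The template string is made up, and I've escaped the braces in this sketch purely for readability:

```php
<?php
// Hypothetical template with two placeholders.
$template = "Hello {{name}}, welcome to {{site}}!";

// Greedy: .* runs to the LAST closing pair, swallowing everything between.
preg_match('/\{\{(.*)\}\}/', $template, $greedy);

// Lazy: .*? stops at the FIRST closing pair, so each placeholder matches alone.
preg_match_all('/\{\{(.*?)\}\}/', $template, $lazy);

echo $greedy[1], PHP_EOL; // name}}, welcome to {{site
print_r($lazy[1]);        // "name" and "site"
```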

Non-greedy wildcard ignored

It makes sense when you think about the underlying theory behind regular expressions.

In theory, a regular expression corresponds to what is known as a finite state automaton (FSA). In practice, an engine like PCRE processes your string one character at a time from left to right, occasionally going backwards by "giving up" characters (backtracking). In your example, the regex sees the first # and, since the # isn't part of any other token in the pattern, starts matching the next token (.+?, in your case). It does that until it hits the colon, then matches the next token (again, .+?). Since it's going left to right, it'll match up to the next hash and then stop, because it's being lazy.

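This is actually a common misconception - the ? modifier on a quantifier doesn't find the shortest possible match anywhere in the string, it only makes that quantifier lazy. The match still starts at the leftmost position where one can begin, and from there the quantifier takes as few characters as it can, going left to right.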

To fix your original regex, you could modify it like this:

/.+#(.+?):(.+?)#/im

The greedy .+ at the front swallows as much as it can, which pushes the # in the pattern onto the last hash before the colon and forces the first capture group to contain only the text between that hash and the colon. By the same reasoning, that group doesn't need the lazy modifier either, yielding a final regex of:

/.+#(.+):(.+?)#/im
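
Here is a sketch comparing the patterns. The sample line and the lazy-only pattern are my assumptions about what the original looked like, since the question itself isn't shown here:

```php
<?php
// Hypothetical input: a stray hash appears before the "#key:value#" part.
$line = "foo #bar #key:value# baz";

// Assumed original: the match starts at the FIRST hash, so the lazy
// group has to stretch over the second hash to reach the colon.
preg_match('/#(.+?):(.+?)#/', $line, $lazyOnly);

// Leading greedy .+ backtracks only as far as needed, so the literal #
// lands on the last hash before the colon.
preg_match('/.+#(.+?):(.+?)#/im', $line, $fixed);

// Same result with the first group left greedy.
preg_match('/.+#(.+):(.+?)#/im', $line, $final);

echo $lazyOnly[1], PHP_EOL;            // bar #key
echo $fixed[1], ' / ', $fixed[2], PHP_EOL; // key / value
echo $final[1], ' / ', $final[2], PHP_EOL; // key / value
```

Note that the leading .+ needs at least one character before the hash, so a line that starts directly with # would not match these two patterns.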

