PHP Preg_Match (.*) Not Matching Past Line Breaks

PHP preg_match (.*) not matching past line breaks

Use the s modifier.

preg_match('/Para(.*)three/s', $row['file'], $m);

See Pattern Modifiers in the PHP manual.

Regular expressions preg_match with line breaks

Use:

preg_match_all('/<strong>(.*)<\/strong>/s',$html,$data);

preg_match() not working in some cases

The biggest problem you have here is that you are trying to parse HTML code using regex. Even if you can get it to work with the data you have, as soon as the data contains nested <ul> tags, your regex will break, and at that point it will become extremely difficult to get it working. Parsing HTML really ought to be done using a DOM parser (e.g. PHP's DOMDocument class). Regex is the wrong tool for the job.
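As a rough illustration of the DOM approach, here is a minimal sketch using DOMDocument. The $html input and the choice to collect only the top-level <li> items are assumptions made for the example, not part of the original question.

```php
<?php
// Hypothetical input: a nested list of the kind that trips up a simple regex.
$html = "<div><ul>\n<li>One</li>\n<li>Two<ul><li>Nested</li></ul></li>\n</ul></div>";

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Walk the direct children of the first <ul> and keep only its <li> elements.
$items = [];
$firstUl = $doc->getElementsByTagName('ul')->item(0);
foreach ($firstUl->childNodes as $node) {
    if ($node->nodeName === 'li') {
        // In this input the first child of each <li> is its leading text node.
        $items[] = trim($node->firstChild->nodeValue);
    }
}

print_r($items);
```

The nested list stays attached to its parent <li>, so it cannot confuse the extraction of the outer items the way it would with a regex.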

That said, if you must do it with regex, you need to use the s modifier, because the input spans multiple lines. This modifier changes the behaviour of the dot character in the regex so that it also matches newline characters.

So your final pattern needs to look like this:

preg_match('/<ul>(.*)<\/ul>/s', $Real, $VarReal);

Hope that helps.

How do I match any character across multiple lines in a regular expression?

It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:

/(.*)<FooBar>/s

The s at the end causes the dot to match all characters including newlines.
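A small sketch of the effect, assuming a made-up three-line $text: the same pattern is run with and without the s modifier.

```php
<?php
// Hypothetical multi-line input; "Para" and "three" sit on different lines.
$text = "Para one\ntwo\nthree";

// Without s, the dot stops at the first newline, so the pattern cannot match.
$withoutS = preg_match('/Para(.*)three/', $text, $m1);

// With s, the dot also matches newlines, so the capture spans the lines.
$withS = preg_match('/Para(.*)three/s', $text, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
var_dump($m2[1]);    // the text between "Para" and "three", newlines included
```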

Match linebreaks - \n or \r\n?

I will answer in the opposite direction.


  1. For a full explanation about \r and \n I have to refer to this question, which is far more complete than I will post here: Difference between \n and \r?

Long story short, Linux uses \n for a new-line, Windows \r\n and old Macs \r. So there are multiple ways to write a newline. Your second tool (RegExr) does for example match on the single \r.

  2. [\r\n]+ as Ilya suggested will work, but will also match multiple consecutive new-lines. (\r\n|\r|\n) is more correct.
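The difference between the two patterns shows up when splitting text that contains an empty line. This is a sketch with an invented $text that mixes all three line-ending styles:

```php
<?php
// Hypothetical input: Windows, old Mac and Unix line endings, plus one empty line.
$text = "a\r\nb\rc\n\nd";

// One split per line break; the empty line between "c" and "d" is preserved.
$strict = preg_split('/\r\n|\r|\n/', $text);

// A run of consecutive line breaks counts as one separator; the empty line is lost.
$loose = preg_split('/[\r\n]+/', $text);

var_dump(count($strict)); // int(5)
var_dump(count($loose));  // int(4)
```

Note that \r\n has to come first in the alternation; otherwise the \r alone would match and a Windows line ending would be split twice.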

line break on regular expressions

'#.#m'

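The m stands for MULTILINE. I originally thought it made the dot match newlines (line breaks, \n), but it does not: m only changes where ^ and $ match.

EDIT:

As sharp-eyed commenters pointed out, the pattern should be '#.+#s'. The s modifier (DOTALL) is the one that lets the dot match newlines.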
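To make the distinction between the two modifiers concrete, here is a short sketch with an assumed two-line $text. It shows that m affects the anchors ^ and $, while only s affects the dot.

```php
<?php
// Hypothetical two-line input.
$text = "first\nsecond";

// m (MULTILINE): ^ and $ match at every line boundary, so each line is found.
preg_match_all('/^\w+$/m', $text, $lines);

// The dot does not cross the newline with no modifier or with m alone...
$plain = preg_match('/first.second/', $text);
$multi = preg_match('/first.second/m', $text);

// ...but it does with s (DOTALL).
$dotall = preg_match('/first.second/s', $text);

print_r($lines[0]);                // first, second
var_dump($plain, $multi, $dotall); // int(0) int(0) int(1)
```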

EDIT2:

As Michael Goldshteyn said, this should work

$ch = '<tag>\s+<div class="feed_title">Some Title</div>\s+<div class="feed_content">Some text</div>\s+</tag>';

preg_match('#<tag>(.+?)</tag>#s',$ch,$match);

There is another solution that works without the s flag, I think:

preg_match('#<tag>((.|\s)+?)</tag>#',$ch,$match);

But it's more complicated.

EDIT 3:

I think the \s in $ch makes no sense: \s is a regex token, not a string escape.

I wrote it because I was thinking that there could be blanks or tabs before <tag> and at the beginning of the other lines.

\t is a valid escape in a double-quoted string, but that is no reason to write \s in a string as well.

How to make dot match newline characters using regular expressions

You need to use the DOTALL modifier (/s).

'/<div>(.*)<\/div>/s'

This might not give you exactly what you want because you are greedy matching. You might instead try a non-greedy match:

'/<div>(.*?)<\/div>/s'

You could also solve this by matching everything except '<' if there aren't other tags:

'/<div>([^<]*)<\/div>/'

Another observation is that you don't need to use / as your regular expression delimiter. Using another character means that you don't have to escape the / in </div>, which improves readability. This applies to all the above regular expressions. Here's how it would look if you use '#' instead of '/':

'#<div>([^<]*)</div>#'

However all these solutions can fail due to nested divs, extra whitespace, HTML comments and various other things. HTML is too complicated to parse with Regex, so you should consider using an HTML parser instead.

PHP's preg_match() and preg_match_all() functions

preg_match stops looking after the first match. preg_match_all, on the other hand, continues to look until it finishes processing the entire string. Once a match is found, it uses the remainder of the string to try to find another match.

http://php.net/manual/en/function.preg-match-all.php
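A minimal side-by-side sketch, assuming a one-line $html string with two <strong> elements:

```php
<?php
// Hypothetical input with two separate matches.
$html = '<strong>one</strong> and <strong>two</strong>';

// preg_match returns 1 or 0 and fills $first with the first match only.
$countFirst = preg_match('#<strong>(.*?)</strong>#s', $html, $first);

// preg_match_all returns the number of matches and fills $all with every one.
$countAll = preg_match_all('#<strong>(.*?)</strong>#s', $html, $all);

var_dump($countFirst, $first[1]); // int(1) string(3) "one"
var_dump($countAll);              // int(2)
print_r($all[1]);                 // one, two
```

With the default flags, $all[0] holds the full matches and $all[1] holds the contents of the first capture group.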

How to get the shortest rather than longest possible regex match with preg_match()

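Make the quantifier non-greedy by putting a ? after it:

preg_match("/\{\{(.*?)\}\}/si",$content,$matches);

The ? directly after .* is what makes the match stop at the first }} instead of the last.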
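The following sketch compares the greedy and non-greedy forms on an invented $content string with two placeholders:

```php
<?php
// Hypothetical template text with two {{...}} placeholders.
$content = 'Hello {{name}}, welcome to {{site}}.';

// Greedy: .* runs to the last }} in the string.
preg_match('/\{\{(.*)\}\}/s', $content, $greedy);

// Non-greedy: .*? stops at the first }} it can.
preg_match('/\{\{(.*?)\}\}/s', $content, $lazy);

var_dump($greedy[1]); // "name}}, welcome to {{site"
var_dump($lazy[1]);   // "name"
```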

Php preg_match using URL as regex

preg_grep() provides a shorter line of code, but because the substring to be matched doesn't appear to have any variable characters in it, best practice would indicate strpos() is better suited.

Code: (Demo)

$urls=[
'http://www.example.com/eng-gb/products/test-1',
'http://www.example.com/eng-gb/badproducts/test-2',
'http://www.example.com/eng-gb/products/test-3',
'http://www.example.com/eng-gb/badproducts/products/test-4',
'http://www.example.com/products/test-5',
'http://www.example.com/eng-gb/about-us',
];

var_export(preg_grep('~^http://www\.example\.com/eng-gb/products/[^/]*$~',$urls));
echo "\n\n";
var_export(array_filter($urls,function($v){return strpos($v,'http://www.example.com/eng-gb/products/')===0;}));

Output:

array (
0 => 'http://www.example.com/eng-gb/products/test-1',
2 => 'http://www.example.com/eng-gb/products/test-3',
)

array (
0 => 'http://www.example.com/eng-gb/products/test-1',
2 => 'http://www.example.com/eng-gb/products/test-3',
)

Some notes:

Using preg_grep():

  • Use a non-slash pattern delimiter so that you don't have to escape all of the slashes inside the pattern.
  • Escape the dot at .com.
  • Write the full domain and directory path with start and end anchors for tightest validation.
  • Use a negated character class near the end of the pattern to ensure that no additional directories are added (unless of course you wish to include all subdirectories).
  • My pattern will match a URL that ends with /products/ but not one that ends with /products. This is in accordance with the details in your question.

Using strpos():

  • Checking for strpos()===0 means that the substring must be found at the start of the string.
  • This will allow any trailing characters at the end of the string.

