PHP Regex: How to match \r and \n without using [\r\n]?


PCRE and newlines

PCRE has a superfluity of newline-related escape sequences and alternatives.

Well, a nifty escape sequence that you can use here is \R. By default \R will match Unicode newline sequences, but it can be configured using different alternatives.

To match the Unicode newline sequences that fit in a single byte (the ASCII ones plus \x85):

preg_match('~\R~', $string);

This is equivalent to the following group:

(?>\r\n|\n|\r|\f|\x0b|\x85)

To match any Unicode newline sequence, including newline characters outside the ASCII range such as the line separator (U+2028) and paragraph separator (U+2029), turn on the u (unicode) flag:

preg_match('~\R~u', $string);

The u (unicode) modifier turns on additional PCRE functionality, and pattern and subject strings are treated as UTF-8.

This is equivalent to the following group:

(?>\r\n|\n|\r|\f|\x0b|\x85|\x{2028}|\x{2029})

It is possible to restrict \R to match CR, LF, or CRLF only:

preg_match('~(*BSR_ANYCRLF)\R~', $string);

This is equivalent to the following group:

(?>\r\n|\n|\r)
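
The three variants can be compared side by side. The following is only a sketch: the sample string is made up, and "\u{2028}" assumes PHP 7.0 or later.

```php
<?php
// Hypothetical sample: CRLF, LF, CR, then U+2028 (LINE SEPARATOR) between letters.
$sample = "a\r\nb\nc\rd\u{2028}e";

// Without u: the subject is matched byte by byte, so U+2028 is not a newline.
$bytes = preg_split('~\R~', $sample);

// With u: the subject is read as UTF-8 and U+2028 counts as a newline.
$unicode = preg_split('~\R~u', $sample);

// (*BSR_ANYCRLF): only CR, LF and CRLF count, even with u.
$crlfOnly = preg_split('~(*BSR_ANYCRLF)\R~u', $sample);

var_dump(count($bytes), count($unicode), count($crlfOnly)); // int(4) int(5) int(4)
```

Note that \r\n is consumed as one separator in all three cases, so no empty element appears between "a" and "b".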

Additional

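Five different conventions for indicating line breaks in strings are supported. They are set at the very start of the pattern and control where ^ and $ match and what the dot excludes; they do not change what \R matches:

(*CR)       carriage return
(*LF)       linefeed
(*CRLF)     carriage return, followed by linefeed
(*ANYCRLF)  any of the three above
(*ANY)      all Unicode newline sequences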
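
As a rough illustration of what these settings change, the sketch below runs the same multiline pattern under (*LF) and (*ANYCRLF). The sample string is invented for the example.

```php
<?php
// Hypothetical sample with a bare CR and a bare LF between words.
$sample = "one\rtwo\nthree";

// (*LF): only \n ends a line, so "one\rtwo" is a single line that \w+ cannot span.
preg_match_all('~(*LF)^\w+$~m', $sample, $lf);

// (*ANYCRLF): \r, \n and \r\n all end a line for ^ and $.
preg_match_all('~(*ANYCRLF)^\w+$~m', $sample, $any);

print_r($lf[0]);  // only "three"
print_r($any[0]); // "one", "two", "three"
```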

Note: \R does not have special meaning inside of a character class. Like other unrecognized escape sequences, it is treated as the literal character "R" by default.

Match linebreaks - \n or \r\n?

I will answer in the opposite direction.


  1. For a full explanation about \r and \n I have to refer to this question, which is far more complete than I will post here: Difference between \n and \r?

Long story short, Linux uses \n for a new-line, Windows \r\n and old Macs \r. So there are multiple ways to write a newline. Your second tool (RegExr) does for example match on the single \r.

  2. [\r\n]+ as Ilya suggested will work, but will also match multiple consecutive new-lines. (\r\n|\r|\n) is more correct.
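
The difference shows up as soon as the input contains an empty line. A minimal sketch with a made-up input string:

```php
<?php
// Hypothetical input: two Windows line breaks in a row, i.e. one empty line.
$text = "first\r\n\r\nsecond";

// The class with + swallows the whole run of \r and \n as one separator.
$collapsed = preg_split('~[\r\n]+~', $text);

// The alternation treats each \r\n as its own separator, keeping the empty line.
$separate = preg_split('~\r\n|\r|\n~', $text);

var_dump(count($collapsed), count($separate)); // int(2) int(3)
```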

PHP Regex - Detect 3 br or newline characters

This regex should do it:

/(?:(?>\r\n|\r|\n)|<br\b[^>]*>){3,}/

Explanation:

  • Group:
    • Either:
      • An atomic group matching \r\n, \r or \n
      • (This covers all three newline formats. Being atomic, it always counts a \r\n pair as one newline; a looser form such as \r?\n? lets the engine backtrack and count the pair as two)
    • Or:
      • A <br> tag, which may contain other stuff, such as <br />, <br style="clear:both">, etc. ([^>]* rather than [^>]+ so that a bare <br> matches too)
  • Match the group three or more times.
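
A quick way to check such a pattern is to run it against a few made-up strings. The sketch below uses an atomic newline group (?>\r\n|\r|\n), so that a \r\n pair counts once, and <br\b[^>]*>, so that a bare <br> is accepted; the sample inputs are hypothetical.

```php
<?php
// Pattern written out for this sketch: newline (atomic) or <br ...>, three or more times.
$pattern = '~(?:(?>\r\n|\r|\n)|<br\b[^>]*>){3,}~';

// Three bare LFs in a row.
$threeNewlines = preg_match($pattern, "a\n\n\nb");

// A bare <br>, a CRLF, and a <br> with an attribute.
$mixed = preg_match($pattern, "a<br>\r\n<br style=\"clear:both\">b");

// Only two CRLF pairs: should not count as three breaks.
$twoOnly = preg_match($pattern, "a\r\n\r\nb");

var_dump($threeNewlines, $mixed, $twoOnly); // int(1) int(1) int(0)
```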

Regex doesn't match line break

See a post I answered a while back explaining this.

  • How to match \r and \n without using [\r\n]?

But to answer your question: apart from \r and \n, PCRE also has a nifty escape sequence that matches newlines in general, which is \R.

\R matches a generic newline; that is, anything considered a linebreak sequence by Unicode. This includes all characters matched by \v (vertical whitespace) and the multi character sequence \x0D\x0A.

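preg_replace_callback('~\{([^\}]*)\}(\R)?~', function ($matches) {
    print_r($matches);   // dump each match; [2] appears only when a newline followed the tag
    return $matches[0];  // put the matched text back unchanged
}, $string);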

Output

Array
(
    [0] => {$code}
    [1] => $code
)
Array
(
    [0] => {$varinnewline}

    [1] => $varinnewline
    [2] =>

)
Array
(
    [0] => {if $bla}
    [1] => if $bla
)
Array
(
    [0] => {else}
    [1] => else
)
Array
(
    [0] => {/if}

    [1] => /if
    [2] =>

)

Regex \R doesn't work inside character class

From the PCRE manual:

Escape sequences in character classes


All the sequences that define a single character value can be used
both inside and outside character classes. In addition, inside a
character class, \b is interpreted as the backspace character (hex 08).

\N is not allowed in a character class. \B, \R, and \X are not
special inside a character class. Like other unrecognized escape
sequences, they are treated as the literal characters "B", "R",
and "X" by default, but cause an error if the PCRE_EXTRA option is set.
Outside a character class, these sequences have different meanings.

(emphasis on relevant bit added by me)
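
Since \R cannot be used inside a class, one workaround is to alternate between \R and the other characters in a non-capturing group. A minimal sketch with a made-up input string follows. Note that, as far as I know, the PCRE2-based PHP builds (7.3 and later) are stricter than the quoted text and reject [\R] with a compile error rather than treating it as a literal R.

```php
<?php
// Hypothetical input mixing a tab with different line breaks.
$text = "a\tb\r\nc\nd";

// Instead of the class [\R\t], alternate between \R and \t.
$parts = preg_split('~(?:\R|\t)+~', $text);

print_r($parts); // a, b, c, d
```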

Preg_replace with optional newlines

You can make a pattern optional (i.e., allow 0 or 1 instances of it) by following it with a ?, so this should do it for you:

'/^(m:|maken:)(.*)([\r\n])?/i'

But I think it would be easier to just strip all newlines coming in, it's not like they'll render in the output anyway.
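
The stripping approach could look roughly like this. The input line is hypothetical, and rtrim() is just one way to drop the trailing line break. It also sidesteps a subtlety of the optional group: with the default newline setting the dot excludes only \n, so on a \r\n line the (.*) capture would keep the \r.

```php
<?php
// Hypothetical incoming line with a Windows line ending.
$line = "m: hello\r\n";

// Strip any trailing CR/LF before matching.
$clean = rtrim($line, "\r\n");

// No newline group is needed once the line has been cleaned.
preg_match('/^(m:|maken:)(.*)/i', $clean, $m);

var_dump($m[1], $m[2]); // string(2) "m:" and string(6) " hello"
```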

How do I match any character across multiple lines in a regular expression?

It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:

/(.*)<FooBar>/s

The s at the end causes the dot to match all characters including newlines.
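
To see the effect, the same pattern can be run with and without the modifier on a made-up two-line string:

```php
<?php
// Hypothetical two-line input.
$text = "line1\nline2<FooBar>";

// Without s, the dot stops at \n, so only the second line is captured.
preg_match('/(.*)<FooBar>/', $text, $plain);

// With s, the dot also matches \n, so the capture spans both lines.
preg_match('/(.*)<FooBar>/s', $text, $dotall);

var_dump($plain[1]);  // string(5) "line2"
var_dump($dotall[1]); // the string "line1\nline2" with a real newline inside
```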

Matching any amount of words regular expression

Assuming that your input data always looks like your example (title segment, colon, words; all on a single line), this should do it:

preg_match_all('/([A-Za-z]+:)\s*(.*)/', $contents, $array);

This would result in $array[1] matching something like Name:, and then $array[2] would match the rest of the line (you may have to use trim() to strip any leading and/or trailing white space from $array[2]).
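
For instance, with some invented sample data (the field names and values are assumptions made for this sketch):

```php
<?php
// Hypothetical input in the "title: words" format, one entry per line.
$contents = "Name: John Smith\nAge: 30";

preg_match_all('/([A-Za-z]+:)\s*(.*)/', $contents, $array);

// $array[1] holds the titles, $array[2] the rest of each line.
$titles = $array[1];
$values = array_map('trim', $array[2]); // trim() guards against stray whitespace

print_r($titles); // Name:, Age:
print_r($values); // John Smith, 30
```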

If you only want to capture "words" in the second part, I believe you could change the second capture group to something like:

preg_match_all('/([A-Za-z]+:)\s*([\w\s]+)/', $contents, $array);

Note also that you shouldn't use the [A-z] construct, since there are non-alphabetical characters in the ASCII table between the upper case letters and the lower case letters. See the ASCII Table for a character map.
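
A small check makes the point; the test string is made up:

```php
<?php
// '_' (0x5F) lies between 'Z' (0x5A) and 'a' (0x61), so [A-z] accepts it.
$loose = preg_match('/^[A-z]+$/', 'foo_bar');

// [A-Za-z] covers the letters only and rejects the underscore.
$strict = preg_match('/^[A-Za-z]+$/', 'foo_bar');

var_dump($loose, $strict); // int(1) int(0)
```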


