How do I match any character across multiple lines in a regular expression?
It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:
/(.*)<FooBar>/s
The s at the end causes the dot to match all characters including newlines.
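For comparison, here is a minimal JavaScript sketch of the same modifier. It assumes an engine that supports the ES2018 s (dotAll) flag, and the sample string is made up for illustration.

```javascript
// Hypothetical input: the text before <FooBar> spans two lines.
const sample = "first line\nsecond line<FooBar>";

// Without the s flag the dot stops at newlines, so only the last line is captured.
const withoutS = /(.*)<FooBar>/.exec(sample)[1];

// With the s flag the dot also matches newlines, so both lines are captured.
const withS = /(.*)<FooBar>/s.exec(sample)[1];

console.log(JSON.stringify(withoutS)); // "second line"
console.log(JSON.stringify(withS));    // "first line\nsecond line"
```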
regex: match multiple lines until a line contains
You need a positive lookahead.
foo\.[ab][\s\S]*?(?=\n.*?=|$)
[\s\S]*? matches lazily any character
(?=\n.*?=|$) until a newline containing an = is ahead or $ end.
See demo at regex101
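A small JavaScript sketch of how the lazy match and the lookahead work together. The input string is hypothetical; the pattern is the one from the answer.

```javascript
// Hypothetical input: a "foo.a" entry that continues until the next line containing "=".
const input = "foo.a = 1\n  cont\n  more\nbar = 2";

// [\s\S]*? consumes lazily; the lookahead stops it right before a line that contains "=".
const re = /foo\.[ab][\s\S]*?(?=\n.*?=|$)/;

const block = input.match(re)[0];
console.log(JSON.stringify(block)); // "foo.a = 1\n  cont\n  more"
```

The "=" on the first line does not end the match, because the lookahead only looks at lines that start after a newline.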
Regex matching over multiple lines
Your tries were pretty close. In the first one you probably need to set the flag that allows the . to match line feeds. It normally doesn't. In your second, you need to set the non-greedy ? mode on the anything match .*. Otherwise .* tries to match the entire rest of the text.
It would be something like this. /^ <br>\n\d+\s[a-zA-Z"“](.*?\n)*?<hr\/>/
But anyway, this is something that is best done in Perl. Perl is where all the advanced regex comes from.
use strict;
use diagnostics;
our $text =<<EOF;
The figure that now stood by its bows was tall and swart, with one white tooth <br>
evilly protruding from its steel-like lips. <br>
 <br>
1 "Hardly" had they pulled out from under the ship’s lee, when a <br>
fourth keel, coming from the windward side, pulled round under the stern, <br>
and showed the five strangers <br>
127 <br>
<br>
<hr/>
More text.
EOF
our $regex = qr{^ <br>\n\d+ +[A-Z"“].*?<hr/>}ism;
$text =~ s/($regex)/<!-- Removed -->/;
print "Removed text:\n[$1]\n\n";
print "New text:\n[$text]\n";
That prints:
Removed text:
[ <br>
1 "Hardly" had they pulled out from under the ship’s lee, when a <br>
fourth keel, coming from the windward side, pulled round under the stern, <br>
and showed the five strangers <br>
127 <br>
<br>
<hr/>]
New text:
[The figure that now stood by its bows was tall and swart, with one white tooth <br>
evilly protruding from its steel-like lips. <br>
<!-- Removed -->
More text.
]
The qr operator builds a regular expression so that it can be stored in a variable. The ^ at the beginning means to anchor this match at the beginning of a line. The ism on the end stands for case insensitive, single string, multiple embedded lines. s allows . to match line feeds. m allows ^ to match at the beginning of lines embedded in the string. You would add a g flag to the end of the substitution to do a global replacement: s///g
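If you are not in Perl, the same three flags exist in JavaScript too. The following is only a rough sketch, assuming an ES2018 engine (for the s flag) and using a shortened, made-up version of the sample text.

```javascript
// Shortened, hypothetical version of the sample text.
const text = [
  "intro line <br>",
  " <br>",
  '1 "Hardly" had they pulled out <br>',
  "127 <br>",
  "<hr/>",
  "More text.",
  "",
].join("\n");

// i: ignore case, s: dot matches line feeds, m: ^ matches at embedded line starts.
const regex = /^ <br>\n\d+ +[A-Z"“].*?<hr\/>/ism;

const removed = text.match(regex)[0];
const updated = text.replace(regex, "<!-- Removed -->");

console.log("Removed text:\n[" + removed + "]\n");
console.log("New text:\n[" + updated + "]");
```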
The Perl regex documentation explains everything.
https://perldoc.perl.org/perlretut
See also Multiline replace in perl with extended expressions not working.
HTH
regex to match terms in multiple lines
You can use this regex to match across the lines in Javascript:
/^(?=[^]*term1)(?=[^]*term2)(?![^]*term3)[^]*$/
In JS, [^] matches any character including new line.
If not using JS or want to make this regex portable to other flavors then one can use:
/^(?=[\D\d]*term1)(?=[\D\d]*term2)(?![\D\d]*term3)[\D\d]*$/
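A quick way to check the portable variant, sketched in JavaScript with made-up inputs:

```javascript
// Requires term1 and term2 anywhere in the text (on any line) and forbids term3.
const re = /^(?=[\D\d]*term1)(?=[\D\d]*term2)(?![\D\d]*term3)[\D\d]*$/;

const bothTerms = re.test("term1 here\nand term2 there");
const hasTerm3 = re.test("term1\nterm2\nterm3");
const missingTerm2 = re.test("only term1\nis present");

console.log(bothTerms);    // true
console.log(hasTerm3);     // false
console.log(missingTerm2); // false
```

Each lookahead starts from the beginning of the string, so the order in which the terms appear does not matter.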
Matching across multiple lines regular expression
You need to use something like
^0[\s\S]*?[\n\r]Unique:
and replace with Unique:.
^ - start of a line
0 - a literal 0
[\s\S]*? - zero or more characters incl. a newline, as few as possible
[\n\r] - a linebreak symbol
Unique: - a whole word Unique:
Another possible regex is:
^0[^\r]*(?:\r(?!Unique:)[^\r]*)*
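where \r is the line ending used in the current file. Replace with an empty string.
Note that you could also use the (?m)^0.*?[\r\n]Unique: regex (to replace with Unique:) with the (?m) option:
m: multi-line (dot(.) match newline)
This meaning of m is the one used by Ruby-style (Oniguruma) flavors. In most other flavors the option that makes the dot match newlines is (?s).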
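The first pattern can be tried outside the editor as well. Here is a minimal JavaScript sketch, assuming the m flag is set so that ^ matches at line starts; the file content is invented.

```javascript
// Hypothetical file content: a block starting with "0" that ends before "Unique:".
const content = "0 header\nline a\nline b\nUnique: value\n";

// m lets ^ match at any line start; [\s\S]*? crosses line breaks lazily.
const cleaned = content.replace(/^0[\s\S]*?[\n\r]Unique:/m, "Unique:");

console.log(JSON.stringify(cleaned)); // "Unique: value\n"
```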
Regex matching pattern in multiple lines without specific word in the match
You might use
^PAT_A[^;\n]*(?:\n(?![^\n;]*NOT_MATCH_THIS)[^;\n]*)*\n[^;\n]*PAT_B[^;]*;
In parts, the pattern matches:
^ - Start of string
PAT_A - Match literally
[^;\n]* - Optionally match any char except ; or a newline
(?: - Non capture group (to repeat as a whole)
\n(?![^\n;]*NOT_MATCH_THIS) - Match a newline, and assert that the line does not contain NOT_MATCH_THIS, without crossing a ; or a newline, to stay on the same line
[^;\n]* - If the previous assertion is true, match the whole line (not containing a ;)
)* - Close the non capture group, and optionally repeat matching all lines
\n[^;\n]* - Match a newline, and any char except ; or a newline
PAT_B[^;]*; - Then match PAT_B followed by any char except ; followed by matching the ;
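To see the negative lookahead in action, here is a short JavaScript sketch with two made-up inputs. No flags are set, so ^ means the start of the string.

```javascript
const re = /^PAT_A[^;\n]*(?:\n(?![^\n;]*NOT_MATCH_THIS)[^;\n]*)*\n[^;\n]*PAT_B[^;]*;/;

// Hypothetical inputs: the second one has the forbidden word on a line in between.
const allowed = "PAT_A start\n  middle\n  PAT_B end;";
const blocked = "PAT_A start\n  NOT_MATCH_THIS here\n  PAT_B end;";

const allowedMatches = re.test(allowed);
const blockedMatches = re.test(blocked);

console.log(allowedMatches); // true
console.log(blockedMatches); // false
```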
Regex to select multiple lines
Use the (?s) DOTALL modifier to make the dot match newline characters.
(?s)Quick.*?over.*?dog
OR
Add word boundary \b if necessary. \b matches between a word character and a non-word character.
(?s)\bQuick\b.*?\bover\b.*?\bdog\b
OR
If you're running JavaScript, [\s\S]*? matches any character including line breaks. Note that JavaScript engines predating ES2018 have no dotall modifier s.
\bQuick\b[\s\S]*?\bover\b[\s\S]*?\bdog\b
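Both variants can be compared side by side in JavaScript. This is just a sketch: the sample sentence is made up, and the second pattern assumes an ES2018 engine where the s flag is available.

```javascript
// Hypothetical text with the three words spread over three lines.
const text = "The Quick brown\nfox jumps over\nthe lazy dog.";

// Character-class variant: works without any dotall support.
const viaClass = text.match(/\bQuick\b[\s\S]*?\bover\b[\s\S]*?\bdog\b/)[0];

// Flag variant: the s flag plays the role of (?s).
const viaFlag = text.match(/\bQuick\b.*?\bover\b.*?\bdog\b/s)[0];

console.log(JSON.stringify(viaClass)); // "Quick brown\nfox jumps over\nthe lazy dog"
console.log(viaClass === viaFlag);     // true
```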
Regex code not collecting multiple lines of matching pattern
I couldn't help but respond to this as I am familiar with both regex and guitar haha.
For your short regex, please see the following regex on regex101.com:
https://regex101.com/r/NqGhoh/1/
The multiline modifier is required.
The main problem with this is that you are handling newlines on the front and back of the expression. I have modified the expression in a couple ways:
- Made the regex match newlines only on the end, always looking for a ^ at the beginning.
- Matched the carriage return and newline combination as \r?\n, since a carriage return should always be followed by a newline when it is used.
- Used non-capturing groups to reduce overhead and complexity when looking at matches. This is the ?: just inside the parenthesis. It means the group won't be captured in the result, just used for encapsulation.
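The three changes above can be sketched roughly like this in JavaScript. The tab lines and the pattern are hypothetical stand-ins, not the regex from the linked regex101 page.

```javascript
// Hypothetical tab text with mixed line endings.
const tab = "e|--0--|\r\nB|--1--|\nnot a tab line\nG|--0--|";

// ^ anchors each match at a line start (m flag); the line break is matched only at the end.
// (?:...) groups without capturing, so only the string name lands in group 1.
const lineRe = /^([eBGDAE])\|(?:[-\d]+\|)+(?:\r?\n|$)/gm;

const matches = [...tab.matchAll(lineRe)];
const stringNames = matches.map((m) => m[1]);

console.log(stringNames);       // string names: e, B, G
console.log(matches[0].length); // 2 (full match plus one captured group)
```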
I started testing your longer regex and may update that as well, though it sounds like you already know what to do with the shorter one corrected.
Regex - Match multiple lines that don't end with character
This PCRE expression should deliver the required result:
/^.*?(?<! _)$/gms
This is using the negative lookbehind (?<! _) in combination with the multiline flag (m) to match up to a line end that is not preceded by " _". The single-line flag (s) ensures that the dot also matches newlines.
Here's a regex101 example.
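The same expression also runs in JavaScript engines that support lookbehind and the s flag (both ES2018). A minimal sketch with invented input, where a trailing " _" marks a line that continues:

```javascript
// Hypothetical input: the first two lines end with " _", so they continue onto the next line.
const text = "first _\nsecond _\nthird\nfourth";

// m: ^ and $ work per line, s: dot matches newlines, g: collect every match.
const records = text.match(/^.*?(?<! _)$/gms);

console.log(records.length);             // 2
console.log(JSON.stringify(records[0])); // "first _\nsecond _\nthird"
console.log(JSON.stringify(records[1])); // "fourth"
```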
javascript regex to match multiple lines
Before ES2018, JavaScript lacked the s (singleline/dotall) regex option, but you can work around it by replacing . with [\s\S] (match any character that is a whitespace or that is not a whitespace, which basically means match everything). Also, make your quantifier lazy and get rid of the spaces in the pattern, since there's also no x (extended) option in JS:
var regex = /###([\s\S]*?)###/;
Example:
var input = "###\nsome content\nthat I need to match\nand might have some special character in it such as | <> []\n###";
var regex = /###([\s\S]*?)###/;
document.getElementById("output").innerText = regex.exec(input)[1];
<pre id="output"></pre>