Match Multiline Text Using Regular Expression

How do I match any character across multiple lines in a regular expression?

It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:

/(.*)<FooBar>/s

The s at the end causes the dot to match all characters including newlines.
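For comparison, here is a minimal Python sketch of the same idea. It assumes Python's re.DOTALL flag as the counterpart of the PHP s modifier; the sample text is made up for illustration.

```python
import re

# Hypothetical sample: the marker sits on the second line.
text = "first line\nsecond line <FooBar> tail"

# Without DOTALL the dot stops at "\n", so the group only covers line two.
without_flag = re.search(r"(.*)<FooBar>", text)

# With DOTALL the dot also matches "\n", so the group spans both lines.
with_flag = re.search(r"(.*)<FooBar>", text, re.DOTALL)

print(repr(without_flag.group(1)))
print(repr(with_flag.group(1)))
```

The same flag can be written inline as (?s) at the start of the pattern.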

Regex matching over multiple lines

Your tries were pretty close. In the first one you probably need to set the flag that allows the . to match line feeds; it normally doesn't. In your second, you need to make the catch-all .* non-greedy by appending ? to it. Otherwise .* grabs as much of the remaining text as it can.

It would be something like this. /^ <br>\n\d+\s[a-zA-Z"“](.*?\n)*?<hr\/>/

But anyway, this is something that is best done in Perl, where many of the advanced regex features originated.

use strict;
use diagnostics;

our $text =<<EOF;
The figure that now stood by its bows was tall and swart, with one white tooth <br>
evilly protruding from its steel-like lips. <br>
 <br>
1 "Hardly" had they pulled out from under the ship’s lee, when a <br>
fourth keel, coming from the windward side, pulled round under the stern, <br>
and showed the five strangers <br>
127 <br>
<br>
<hr/>
More text.
EOF

our $regex = qr{^ <br>\n\d+ +[A-Z"“].*?<hr/>}ism;
$text =~ s/($regex)/<!-- Removed -->/;
print "Removed text:\n[$1]\n\n";
print "New text:\n[$text]\n";

That prints:

Removed text:
[ <br>
1 "Hardly" had they pulled out from under the ship’s lee, when a <br>
fourth keel, coming from the windward side, pulled round under the stern, <br>
and showed the five strangers <br>
127 <br>
<br>
<hr/>]

New text:
[The figure that now stood by its bows was tall and swart, with one white tooth <br>
evilly protruding from its steel-like lips. <br>
<!-- Removed -->
More text.
]

The qr operator builds a regular expression so that it can be stored in a variable. The ^ at the beginning anchors the match at the beginning of a line. The ism on the end stands for case insensitive, single line, and multiple lines: i ignores case, s treats the string as a single line so that . also matches line feeds, and m lets ^ match at the beginning of every line embedded in the string. To replace every occurrence, add a g flag to the end of the substitution: s///g
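As a rough cross-check, the same substitution can be sketched in Python, assuming re.I, re.S and re.M as the counterparts of Perl's i, s and m. The sample text is a shortened, hypothetical version of the one above.

```python
import re

# Hypothetical, shortened sample; the second line is a single space
# followed by <br>, which is what the pattern anchors on.
text = (
    "evilly protruding from its steel-like lips. <br>\n"
    " <br>\n"
    '1 "Hardly" had they pulled out from under the lee <br>\n'
    "127 <br>\n"
    "<hr/>\n"
    "More text.\n"
)

# re.I, re.S and re.M play the roles of Perl's i, s and m modifiers.
pattern = re.compile(r'^ <br>\n\d+ +[A-Z"“].*?<hr/>', re.I | re.S | re.M)

removed = pattern.search(text).group(0)
# count=1 replaces only the first match, like s/// without the g flag.
new_text = pattern.sub("<!-- Removed -->", text, count=1)

print("Removed text:\n[" + removed + "]\n")
print("New text:\n[" + new_text + "]")
```

Dropping count=1 would replace every match, which corresponds to adding the g flag in Perl.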

The Perl regex documentation explains everything.
https://perldoc.perl.org/perlretut

See also Multiline replace in perl with extended expressions not working.

HTH

regex: match multiple lines until a line contains

You need a positive lookahead.

foo\.[ab][\s\S]*?(?=\n.*?=|$)
  • [\s\S]*? matches lazily any character
  • (?=\n.*?=|$) until a newline containing an = is ahead or $ end.
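A small Python sketch of this pattern follows. The input is invented for illustration, and it assumes each entry starts with foo.a or foo.b and may continue over lines that contain no = sign.

```python
import re

# Hypothetical input: two foo entries with continuation lines, then another key.
text = (
    "foo.a = line one\n"
    "  continued\n"
    "foo.b = line two\n"
    "  more\n"
    "bar.c = other"
)

# [\s\S]*? expands lazily across newlines until the lookahead sees a
# newline followed by a line containing "=", or the end of the string.
pattern = re.compile(r"foo\.[ab][\s\S]*?(?=\n.*?=|$)")

blocks = pattern.findall(text)
for block in blocks:
    print(repr(block))
```

Because the lookahead does not consume anything, the newline before the next key stays outside each match.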

regex to match terms in multiple lines

You can use this regex to match across the lines in Javascript:

/^(?=[^]*term1)(?=[^]*term2)(?![^]*term3)[^]*$/

In JS, [^] matches any character including new line.

If not using JS or want to make this regex portable to other flavors then one can use:

/^(?=[\D\d]*term1)(?=[\D\d]*term2)(?![\D\d]*term3)[\D\d]*$/
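The portable form can be tried in Python as follows. This is only a sketch: has_required_terms is a hypothetical helper name, and term1, term2 and term3 are the placeholder terms from the pattern above.

```python
import re

# [\D\d] matches any character, including newlines, without needing a flag.
pattern = re.compile(
    r"^(?=[\D\d]*term1)(?=[\D\d]*term2)(?![\D\d]*term3)[\D\d]*$"
)

def has_required_terms(text):
    """True if text contains term1 and term2 but not term3, on any lines."""
    return pattern.search(text) is not None

both_present = has_required_terms("term1 on line one\nterm2 on line two")
forbidden_present = has_required_terms("term1\nterm2\nterm3")
one_missing = has_required_terms("only term1 here")

print(both_present, forbidden_present, one_missing)
```

Each lookahead starts from the beginning of the string, so the order in which the terms appear does not matter.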

Matching across multiple lines regular expression

You need to use something like

^0[\s\S]*?[\n\r]Unique:

and replace with Unique:.

  • ^ - start of a line
  • 0 - a literal 0
  • [\s\S]*? - zero or more characters incl. a newline as few as possible
  • [\n\r] - a linebreak symbol
  • Unique: - the literal text Unique:

Another possible regex is:

^0[^\r]*(?:\r(?!Unique:)[^\r]*)*

where \r is the line ending used in the current file. Replace with an empty string.

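Note that you could also use the (?m)^0.*?[\r\n]Unique: regex (to replace with Unique:) in a flavor where the m option makes the dot match newlines, such as Ruby:

m: multi-line (dot(.) match newline)

In most other flavors (PCRE, Perl, Java, Python, .NET), m only changes how ^ and $ behave, and the option that lets the dot match newlines is s, so the equivalent pattern there is (?ms)^0.*?[\r\n]Unique:.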
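Here is a minimal Python sketch of the first replacement, using an invented sample. Note that in Python the m flag only affects the anchors, so the dot-based variant needs the s flag as well.

```python
import re

# Hypothetical sample: everything from the line starting with 0 up to
# the line starting with Unique: should be dropped.
text = "header\n0 first\nsecond\nUnique: kept\ntail"

# re.M lets ^ match at the start of each line; [\s\S] crosses newlines.
result = re.sub(r"^0[\s\S]*?[\n\r]Unique:", "Unique:", text, flags=re.M)

# Dot-based variant: in Python it needs both m (anchors) and s (dot).
result_dot = re.sub(r"(?ms)^0.*?[\r\n]Unique:", "Unique:", text)

print(repr(result))
print(repr(result_dot))
```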

Match multiline text using regular expression

First, you're using the modifiers under an incorrect assumption.

Pattern.MULTILINE or (?m) tells Java to accept the anchors ^ and $ to match at the start and end of each line (otherwise they only match at the start/end of the entire string).

Pattern.DOTALL or (?s) tells Java to allow the dot to match newline characters, too.

Second, in your case, the regex fails because you're using the matches() method which expects the regex to match the entire string - which of course doesn't work since there are some characters left after (\\W)*(\\S)* have matched.

So if you're simply looking for a string that starts with User Comments:, use the regex

^\s*User Comments:\s*(.*)

with the Pattern.DOTALL option:

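// DOTALL lets (.*) capture everything after the label, newlines included.
Pattern regex = Pattern.compile("^\\s*User Comments:\\s*(.*)", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(subjectString);
// find() looks for a match anywhere; matches() would require the whole string.
if (regexMatcher.find()) {
    ResultString = regexMatcher.group(1);
}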

ResultString will then contain the text after User Comments:
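The difference between the two flags can also be seen in a short Python sketch, assuming re.MULTILINE and re.DOTALL as the counterparts of the Java options. The subject string is made up.

```python
import re

# Hypothetical subject string with the label on the first line.
subject = "User Comments: first line\nsecond line\nthird line"

# DOTALL changes the dot: (.*) now runs across newlines to the end.
m = re.search(r"^\s*User Comments:\s*(.*)", subject, re.DOTALL)
result = m.group(1) if m else None

# MULTILINE changes the anchors: ^ now matches at the start of every line.
first_words_multiline = re.findall(r"^\w+", subject, re.MULTILINE)
first_words_default = re.findall(r"^\w+", subject)

print(repr(result))
print(first_words_multiline, first_words_default)
```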

Access multiple lines from a text using Regular Expression and iterate in python

You could make use of the PyPi regex module and use the \G anchor to get iterative matches.

Pattern

(?:UOM\s|\G(?!^))\d\s+([\w.]+)\s([\w\s-]+\s)([0-9.]+)\s(\w+)\n([\w/\s]+\n)
  • (?: Non capture group
    • UOM\s Match UOM and a whitespace char
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, not at the start
  • ) Close group
  • \d\s+ Match a digit and 1+ whitespace chars
  • ([\w.]+)\s Capture group 1 Match word chars or dots and a whitespace char
  • ([\w\s-]+\s) Capture group 2 Match word / whitespace chars or - and an ending whitespace char
  • ([0-9.]+)\s Capture group 3 Match digits or . and a whitespace char
  • (\w+)\n Capture group 4 Match word chars and a newline
  • ([\w/\s]+\n) Capture group 5 Match word / whitespace chars or / and a newline

Example code

import regex

pattern = r"(?:UOM\s|\G(?!^))\d\s+([\w.]+)\s([\w\s-]+\s)([0-9.]+)\s(\w+)\n([\w/\s]+\n)"

test_str = ("Item SKU Product Desc. Qty UOM\n"
"1 _L180.0001352879 Clam Tuatua Medium 20- 1.00 cs\n"
"34pc/Kg 15kg/CS\n\n\n\n"
"Item SKU Product Desc. Qty UOM\n"
"1 L465.0001354266 Yoghurt Passionfruit Organic 4.00 PC\n"
"Vegan 1kg\n"
"2 L465.0001354264 Yoghurt Plain Organic Vegan 4.00 PC\n"
"1kg\n\n\n")


matches = regex.finditer(pattern, test_str)

for matchNum, match in enumerate(matches, start=1):
    # Group 2 is the first description line, group 5 the continuation line.
    print("Product description : " + match.group(2) + match.group(5))
    print("Quantity : " + match.group(3))

Output

Product description : Clam Tuatua Medium 20- 34pc/Kg 15kg/CS
Quantity : 1.00
Product description : Yoghurt Passionfruit Organic Vegan 1kg
Quantity : 4.00
Product description : Yoghurt Plain Organic Vegan 1kg
Quantity : 4.00

How to match regex over multiple lines

You need to use the s flag (not the m flag).

It's called the DOTALL option.

This works for me:

String input = "1stline\n2ndLINE\n3rdline";
boolean b = input.matches("(?is).*2ndline.*");

I found it here.

Note you must use .* before and after the regex if you want to use String.matches().

That's because String.matches() attempts to match the entire string with the pattern.

(.* means zero or more of any character when used in a regex)


Another approach, found here:

String input = "1stline\n2ndLINE\n3rdline";
Pattern p = Pattern.compile("(?i)2ndline", Pattern.DOTALL);
Matcher m = p.matcher(input);
boolean b = m.find();
System.out.println("match found: " + b);

I found it by googling "java regex multiline" and clicking the first result.

(it's almost as if that answer was written just for you...)

There's a ton of info about patterns and regexes here.


If you want to match only if 2ndline appears at the beginning of a line, do this:

boolean b = input.matches("(?is).*\\n2ndline.*");

Or this:

Pattern p = Pattern.compile("(?i)\\n2ndline", Pattern.DOTALL);
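The whole-string versus anywhere distinction is not specific to Java. Below is a rough Python sketch, assuming re.fullmatch as the analogue of String.matches() and re.search as the analogue of Matcher.find().

```python
import re

text = "1stline\n2ndLINE\n3rdline"

# fullmatch must consume the entire string, like String.matches().
bare = re.fullmatch(r"(?is)2ndline", text)
padded = re.fullmatch(r"(?is).*2ndline.*", text)

# search accepts a match anywhere in the string, like Matcher.find().
found = re.search(r"(?i)2ndline", text)

# To require a line start, (?m) lets ^ match right after each newline.
at_line_start = re.search(r"(?im)^2ndline", text)

print(bare is None, padded is not None, found.start(), at_line_start.start())
```

Unlike the \n-prefixed patterns above, the ^ variant would also match if 2ndline were the very first line of the input.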

