Regular expression matching a multiline block of text
Try this:
re.compile(r"^(.+)\n((?:\n.+)+)", re.MULTILINE)
I think your biggest problem is that you're expecting the ^
and $
anchors to match linefeeds, but they don't. In multiline mode, ^
matches the position immediately following a newline and $
matches the position immediately preceding a newline.
Be aware, too, that a newline can consist of a linefeed (\n
), a carriage-return (\r
), or a carriage-return+linefeed (\r\n
). If you aren't certain that your target text uses only linefeeds, you should use this more inclusive version of the regex:
re.compile(r"^(.+)(?:\n|\r\n?)((?:(?:\n|\r\n?).+)+)", re.MULTILINE)
BTW, you don't want to use the DOTALL modifier here; you're relying on the fact that the dot matches everything except newlines.
Python - Regex match multiple patterns in multiple lines
import re
regex = r"""
Orion/PatchDetails/process_form.+?release=\d+
(.+) # use this as your match
zip[^\"]
"""
matches = re.compile(regex, re.MULTILINE | re.DOTALL | re.VERBOSE)
Add re.DOTALL
to let .
include \n
. For your regex, this lets you match multiple lines
https://regex101.com/r/jBwq20/1
Python regex match across multiple lines
You may use
config_value = 'Example'
pattern=r'(?sm)^{}=(.*?)(?=[\r\n]+\w+=|\Z)'.format(config_value)
match = re.search(pattern, s)
if match:
print(match.group(1))
See the Python demo.
Pattern details
(?sm)
-re.DOTALL
andre.M
are on^
- start of a lineExample=
- a substring(.*?)
- Group 1: any 0+ chars, as few as possible(?=[\r\n]+\w+=|\Z)
- a positive lookahead that requires the presence of 1+ CR or LF symbols followed with 1 or more word chars followed with a=
sign, or end of the string (\Z
).
See the regex demo.
Multiline regex match retrieving line numbers and matches
you could first find all \n
characters in the text and their respective position/character index. since each \n
...well...starts a new line, the index of each value in this list indicates the line number the found \n
character terminates. then search all occurrences of you pattern and use the aforementioned list to look up the start/end position of the match...
import re
import bisect
text = """new
File()
aa new File()
new
File()
there is a new File() and new
File() again
new
File()
there is not a matching pattern here File() new
new File() test new File() occurs twice on this line
"""
# character indices of all \n characters in text
nl = [m.start() for m in re.finditer("\n", text, re.MULTILINE|re.DOTALL)]
matches = list(re.finditer(r"(new\s+File\(\))", text, re.MULTILINE|re.DOTALL))
match_count = 0
for m in matches:
match_count += 1
r = range(bisect.bisect(nl, m.start()-1), bisect.bisect(nl, m.end()-1)+1)
print(re.sub(r"\s+", " ", m.group(1), re.DOTALL), "found on line(s)", *r)
print(f"{match_count} occurrences of new File() found in file....")
output:
new File() found on line(s) 0 1
new File() found on line(s) 2
new File() found on line(s) 3 4
new File() found on line(s) 5
new File() found on line(s) 5 6
new File() found on line(s) 7 8 9 10 11
new File() found on line(s) 13
new File() found on line(s) 13
8 occurrences of new File() found in file....
regex multiline not working on repeated patterns
You should use Lazy quantifiers, as they stop on the first match they find:
\bSTDMETHOD\b((.|\n|\r)*?)\bPURE\b
Tested on Regexr.com
regex matching whitespace characters and multiple optional patterns before start of text
Your pattern matches one instance of # ...
or !mycommand
. One way to solve this problem is to put all of them into one match, and use re.search
to find the first match.
To do this, you need to repeat the part that matches # ...
or !mycommand
using *
:
^\s*^(?:#.*\s*|!mycommand\s*)*
I have also changed #.*
to #.*\s*
so that it goes all the way to the next line where a non-whitespace is found.
Demo
Responding to your comment:
if the string begins with code, this regex should not match anything
You can try:
\A\s*^(?:#.*\s*|!mycommand\s*)+
I changed to \A
so that it only matches the absolute start of the string, instead of start of line. I also changed the last *
to +
so at least one # ...
or !mycommand
has to be present.
Matching regular expression to multiple line blocks in python
There are numerous ways that this could be solved, here is one. I've added the information to a dictionary, where you will get a list of dictionaries as an output.
def parse_doc(filename):
with open(filename) as f:
pattern1 = re.compile(r'<sec name="(\D_\d\d_\w+)"\s+sound_freq="(\D\D\D\d+:\d+-\d+)"')
pattern2 = re.compile(r'<per fre="(Volum_+\d+Kb)"+\svalue="(\d+.+)"')
doc = []
for i in f.readlines():
p1 = re.match(pattern1, i)
p2 = re.match(pattern2, i)
line = {}
if p1:
line.update({'sec': p1.group(1), 'sound_freq': p1.group(2)})
if p2:
line.update({p2.group(1): p2.group(2)})
if len(line)>0:
doc.append(line)
return doc
print(parse_doc('doc.txt'))
Output
[{'sec': 'M_20_K40745170', 'sound_freq': 'mhr17:7907527-7907589'}, {'Volum_5Kb': '89.00'}, {'Volum_40Kb': '00.00'}, {'Volum_70Kb': '77.00'}]
If you want to get all the values you could get it using the following:
def parse_doc_all(filename):
with open(filename) as f:
pattern1 = re.compile(r'(.|\w+)="([^\s]+)"')
doc = {}
for i in f.readlines():
doc.update({p[0]: p[1] for p in re.findall(pattern1, i)})
return doc
print(parse_doc_all('doc.txt'))
Which will give you
{'name': 'M_20_K40745170', 'sound_freq': 'mhr17:7907527-7907589', 'tension': 'SGCGSCGSCGSCGSC', 's_c': '0', 'number': '5748', 'v': '0.1466469683747654', 'y': '0.0', 'units': 'sec', 'first_name': 'g7tty', 'description': 'xyz', 'abc': 'trt', 'id': 'abc', 'fre': 'Volum_70Kb', 'value': '77.00'}
Related Topics
How to Change Python Version in Anaconda Spyder
Counting Non Zero Values in Each Column of a Dataframe in Python
Converting Pandas Column of Comma-Separated Strings into Integers
Find the Index of the First Digit in a String
Add Padding to Images to Get Them into the Same Shape
Get Value of Span Tag Using Beautifulsoup
Python Xlsxwriter Set Border Around Multiple Cells
Python: Read Text File and Split File into List Variables, With Each Variable Having 4 Lines Each
How to Eliminate Null Valued Cells from a CSV Dataset Using Python
How to Update a Pyspark Dataframe With New Values from Another Dataframe
How to Kill a While Loop With a Keystroke
How to Select All Elements Greater Than a Given Values in a Dataframe
Comparing Items in Lists Within Same Indices Python