What is the difference between re.search and re.match?
re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.
As the re.match documentation says:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Note: If you want to locate a match anywhere in string, use search() instead.
re.search searches the entire string, as the documentation says:
Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
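So if you need to match at the beginning of the string, or to match the entire string (with a trailing $ or \Z in the pattern), use match. It can be faster, because it gives up as soon as the pattern fails at the start instead of retrying at every later position. Otherwise use search.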
The documentation has a specific section for match vs. search that also covers multiline strings:
Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).
Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The "match" operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.
Now, enough talk. Time to see some example code:
# example code:
import re

string_with_newlines = """something
someotherthing"""

print(re.match('some', string_with_newlines))  # matches
print(re.match('someother', string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines, re.MULTILINE))  # also won't match
print(re.search('someother', string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines, re.MULTILINE))  # also finds something

m = re.compile('thing$', re.MULTILINE)
print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
# The flag is already compiled in; the second argument of a compiled
# pattern's search() is pos, so re.MULTILINE must not be passed here.
print(m.search(string_with_newlines))  # also matches
Python: Difference between re.match(pattern) v/s re.search('^' + pattern)
You should take a look at Python's re.search() vs. re.match() documentation, which clearly mentions the other difference:
Note however that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with '^' will match at the beginning of each line.
>>> re.match('X', 'A\nB\nX', re.MULTILINE) # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE) # Match
<_sre.SRE_Match object; span=(4, 5), match='X'>
The first difference (for future readers) being:
Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).
For example:
>>> re.match("c", "abcdef") # No match
>>> re.search("c", "abcdef") # Match
<_sre.SRE_Match object; span=(2, 3), match='c'>
Regular expressions beginning with '^' can be used with search() to restrict the match to the beginning of the string:
>>> re.match("c", "abcdef") # No match
>>> re.search("^c", "abcdef") # No match
>>> re.search("^a", "abcdef") # Match
<_sre.SRE_Match object; span=(0, 1), match='a'>
Differences between re.match, re.search, re.fullmatch
Giving credit for @Ruzihm's answer since parts of my answer derive from his.
Quick overview
A quick rundown of the differences:
re.match is anchored at the start: ^pattern
- Ensures the string begins with the pattern
re.fullmatch is anchored at the start and end of the pattern: ^pattern$
- Ensures the full string matches the pattern (can be especially useful with alternations as described here)
re.search is not anchored: pattern
- Ensures the string contains the pattern
A more in-depth comparison of re.match vs re.search can be found here
With examples:
aa # string
a|aa # regex
re.match: a
re.search: a
re.fullmatch: aa
ab # string
^a # regex
re.match: a
re.search: a
re.fullmatch: # None (no match)
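The two examples above can be reproduced with a short script. This is only a minimal sketch; the helper name matched_text is made up for illustration and is not part of the re module.

```python
import re

def matched_text(func, pattern, string):
    """Return the matched substring, or None when there is no match.

    Hypothetical helper, used only to keep the comparison compact.
    """
    m = func(pattern, string)
    return m.group() if m else None

# Same (regex, string) pairs as in the examples above.
for pattern, string in [(r"a|aa", "aa"), (r"^a", "ab")]:
    for func in (re.match, re.search, re.fullmatch):
        print(pattern, string, func.__name__, matched_text(func, pattern, string))
```

re.fullmatch has to backtrack into the second alternative of a|aa to consume the whole string, which is why it returns aa while the other two stop at the first a.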
So what about \A and \Z anchors?
The documentation states the following:
Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).
And in the Pattern.fullmatch section it says:
If the whole string matches this regular expression, return a corresponding match object.
And, as initially found and quoted by Ruzihm in his answer:
Note however that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with ^ will match at the beginning of each line.
>>> re.match('X', 'A\nB\nX', re.MULTILINE) # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE) # Match
<re.Match object; span=(4, 5), match='X'>
\A^A
B
X$\Z
# re.match('X', s) no match
# re.search('^X', s) no match
# ------------------------------------------
# and the string above when re.MULTILINE is enabled effectively becomes
\A^A$
^B$
^X$\Z
# re.match('X', s, re.MULTILINE) no match
# re.search('^X', s, re.MULTILINE) match X
With regards to \A and \Z, neither performs differently for re.MULTILINE, since \A and \Z are effectively the only ^ and $ in the whole string.
So using \A and \Z with any of the three methods yields the same results.
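A small check of the line-anchor versus string-anchor behaviour, using the same 'A\nB\nX' string as the quoted documentation. The variable names are only illustrative.

```python
import re

s = "A\nB\nX"

# ^ is a line anchor: with re.MULTILINE it also matches right after a newline.
line_start = re.search(r"^X", s, re.MULTILINE)
# \A is a string anchor: it matches only at the very start, flag or no flag.
string_start = re.search(r"\AX", s, re.MULTILINE)

# $ matches at the end of every line in MULTILINE mode ...
line_end = re.search(r"A$", s, re.MULTILINE)
# ... while \Z matches only at the end of the whole string.
string_end = re.search(r"A\Z", s, re.MULTILINE)

print(line_start)    # a match object for the X on the last line
print(string_start)  # None
print(line_end)      # a match object for the A on the first line
print(string_end)    # None
```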
Answer (line anchors vs string anchors)
What this tells me is that re.match and re.fullmatch don't match line anchors ^ and $ respectively, but that they instead match string anchors \A and \Z respectively.
Trying to understand re.search() vs re.findall()
The method re.findall returns a list of matched substrings, whereas the method re.search returns a match object. You can recover the full matched substring like this:
b.group() # 'Eventin queue contains 5 elements, first element is 20 minutes old'
What you were seeing, <_sre.SRE_Match object; span=(0, 66), match='Eventin queue contains 5 elements, first element >, is only a representation of the object.
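To make the difference in return types concrete, here is a minimal sketch. The original pattern is not shown in the answer, so the pattern r"\d+" below is an assumption chosen only for illustration; the text is the one from the repr above.

```python
import re

text = "Eventin queue contains 5 elements, first element is 20 minutes old"

# Assumed pattern (not from the original question): runs of digits.
first = re.search(r"\d+", text)         # a Match object for the first hit, or None
everything = re.findall(r"\d+", text)   # a plain list of matched substrings

print(type(first).__name__)  # the Match type
print(first.group())         # the matched text of the first hit
print(everything)            # every hit, as strings
```

Because re.search can return None, real code usually checks the result before calling .group().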
same result from both re.search() and re.match() are the same, but not the same by comparison operator
Because the Match type does not have a custom __eq__ method, the equality operation will always return False, unless it's the exact same Match instance.
The default behavior for equality comparison (== and !=) is based on
the identity of the objects. Hence, equality comparison of instances
with the same identity results in equality, and equality comparison of
instances with different identities results in inequality.
https://docs.python.org/3/reference/expressions.html#value-comparisons
Every time you call re.match or re.search, the return value will be a different Match object, even when the input data is exactly the same.
>>> needle, haystack = 's', 'spam'
>>> re.match(needle, haystack) == re.match(needle, haystack)
False
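If the goal is to check that two calls found the same thing, one option is to compare what the match objects contain rather than the objects themselves. The sketch below compares the matched text and the span; which fields count as "the same" is a design choice, not something the re module prescribes.

```python
import re

needle, haystack = "s", "spam"
a = re.match(needle, haystack)
b = re.search(needle, haystack)

# Identity-based comparison: two separate Match instances are never equal.
same_object = a == b

# Content-based comparison: matched text plus its position in the string.
same_result = (a.group(), a.span()) == (b.group(), b.span())

print(same_object, same_result)
```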
Why use re.match(), when re.search() can do the same thing?
"Why" questions are hard to answer. As a matter of fact, you could define the function re.match()
like this:
def match(pattern, string, flags=0):
    return re.search(r"\A(?:" + pattern + ")", string, flags)
(because \A always matches at the start of the string, regardless of the re.M flag status).
So re.match is a useful shortcut but not strictly necessary. It's especially confusing for Java programmers who have Pattern.matches(), which anchors the search to the start and end of the string (which is probably a more common use case than just anchoring to the start).
It's different for the match and search methods of regex objects, though, as Eric has pointed out.
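The difference for compiled pattern objects shows up once a start position is passed. A minimal sketch, assuming a two-character string chosen only for illustration: \A still refers to the real start of the string, while match() anchors at pos.

```python
import re

text = "ab"

# search() with \A: pos=1 does not move the "start of string" anchor,
# so the pattern cannot match at index 1.
anchored_search = re.compile(r"\Ab").search(text, 1)

# match() with pos=1: matching is anchored at index 1 instead.
match_at_pos = re.compile(r"b").match(text, 1)

print(anchored_search)  # None
print(match_at_pos)     # a match object for the b at index 1
```

So the \A rewrite reproduces the module-level re.match(), but not Pattern.match() with a pos argument.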
Python regex - understanding the difference between match and search
When calling the function re.match specifically, the ^ character has little meaning, because this function begins the matching process at the beginning of the string. However, it does have meaning for other functions in the re module, and when calling match on a compiled regular expression object.
For example:
import re

text = """\
Mares eat oats
and does eat oats
"""
print(re.findall(r'^(\w+)', text, re.MULTILINE))
This prints:
['Mares', 'and']
With re.findall() and re.MULTILINE enabled, it gives you the first word (with no leading whitespace) on each line of your text.
It might be useful if doing something more complex, like lexical analysis with regular expressions, and passing into the compiled regular expression a starting position in the text it should start matching at (which you can choose to be the ending position from the previous match). See the documentation for RegexObject.match method.
Simple lexer / scanner as an example:
import re

text = """\
Mares eat oats
and does eat oats
"""
pattern = r"""
(?P<firstword>^\w+)
|(?P<lastword>\w+$)
|(?P<word>\w+)
|(?P<whitespace>\s+)
|(?P<other>.)
"""
rx = re.compile(pattern, re.MULTILINE | re.VERBOSE)

def scan(text):
    # Match repeatedly, each time starting where the previous token ended.
    pos = 0
    m = rx.match(text, pos)
    while m:
        toktype = m.lastgroup
        tokvalue = m.group(toktype)
        pos = m.end()
        yield toktype, tokvalue
        m = rx.match(text, pos)

for tok in scan(text):
    print(tok)
which prints
('firstword', 'Mares')
('whitespace', ' ')
('word', 'eat')
('whitespace', ' ')
('lastword', 'oats')
('whitespace', '\n')
('firstword', 'and')
('whitespace', ' ')
('word', 'does')
('whitespace', ' ')
('word', 'eat')
('whitespace', ' ')
('lastword', 'oats')
('whitespace', '\n')
This distinguishes between types of word; a word at the beginning of a line, a word at the end of a line, and any other word.
re.match versus re.findall
re.match matches the pattern from the start of the string. re.findall, however, searches for occurrences of the pattern anywhere in the string.
If you have the pattern "mail failure" and the string:
subject = "=?UTF-8?B?0JLQsNGI0LUg0YHQvtC+0LHRidC10L3QuNC1INC90LUg0LTQvtGB0YLQsNCy0LvQtdC90L4=?=. Mail failure."
re.match will return None because the string does not start with "mail failure". re.findall, though, will return a non-empty list because the string contains "Mail failure", provided the search ignores case (for example with re.IGNORECASE), since the pattern is lowercase and the text is capitalized.
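A quick check with the subject line above. Note that the pattern is lowercase while the subject says "Mail failure", so this sketch passes re.IGNORECASE; that flag is an assumption about the intent, since the original snippet does not show which flags were used.

```python
import re

subject = "=?UTF-8?B?0JLQsNGI0LUg0YHQvtC+0LHRidC10L3QuNC1INC90LUg0LTQvtGB0YLQsNCy0LvQtdC90L4=?=. Mail failure."
pattern = "mail failure"

# match() only looks at the start of the string, so it finds nothing.
at_start = re.match(pattern, subject, re.IGNORECASE)

# findall() scans the whole string and returns a list of matched substrings.
anywhere = re.findall(pattern, subject, re.IGNORECASE)

# Without the flag, the lowercase pattern does not match "Mail failure".
case_sensitive = re.findall(pattern, subject)

print(at_start, anywhere, case_sensitive)
```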
re.match vs re.search
In your match, the first .* is greedy: it matches as much as it can, including digits.
If you make it less greedy, it will work:
.*?([0-9]{1,})Y.*
(PS I think this greedy issue doesn't make it a fair comparison of re.search and re.match)
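The effect of the greedy prefix can be seen on a made-up input. The sample string below is hypothetical (the original question's data is not shown here); only the lazy pattern is taken from the answer.

```python
import re

sample = "abc123Ydef"  # hypothetical input, for illustration only

# Greedy prefix: .* swallows as many characters as it can and gives back
# only what the rest of the pattern strictly needs, i.e. a single digit.
greedy = re.match(r".*([0-9]{1,})Y.*", sample)

# Lazy prefix: .*? stops as early as possible, so the group gets every digit.
lazy = re.match(r".*?([0-9]{1,})Y.*", sample)

# With search() no leading .* is needed at all.
searched = re.search(r"([0-9]{1,})Y", sample)

print(greedy.group(1), lazy.group(1), searched.group(1))
```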