re.findall not returning full match?
The problem is that if the regex passed to re.findall contains capturing groups (i.e. the portions of the regex enclosed in parentheses), findall returns the groups rather than the matched string.
One way to solve this issue is to use non-capturing groups (prefixed with ?:).
>>> import re
>>> s = 'size=50;size=51;'
>>> re.findall('size=(?:50|51);', s)
['size=50;', 'size=51;']
If the regex that re.findall tries to match does not capture anything, findall returns the whole matched string.
Although a character class (for example 5[01]) might be the simplest option in this particular case, non-capturing groups provide a more general solution.
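For comparison, here is a small sketch of the same string with a capturing group and with a character class. The character-class pattern size=5[01]; is an illustrative alternative, not code from the original answer.

```python
import re

s = 'size=50;size=51;'

# A capturing group makes findall return only the group's text.
captured = re.findall(r'size=(50|51);', s)

# A character class needs no group at all, so the full match is returned.
char_class = re.findall(r'size=5[01];', s)

print(captured)    # ['50', '51']
print(char_class)  # ['size=50;', 'size=51;']
```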
Why does findall not return the whole match when matching with a group?
You should use re.finditer instead of re.findall and then print the whole match with m.group():
>>> for m in re.finditer('(ra|RA)[a-zA-Z0-9]*', "RAJA45909"):
...     print(m.group())
...
RAJA45909
The documentation of findall says:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
Your regex has only one group, so the result is a list of the texts matched by that single group. If we add another group, you see:
>>> for m in re.findall('(ra|RA)([a-zA-Z0-9]*)', "RAJA45909"):
...     print(m)
...
('RA', 'JA45909')
So findall, when used with groups, matches the whole regex but returns only the portions matched by the groups, whereas finditer always returns a complete match object.
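A minimal sketch of the difference, using the two-group pattern from above on a slightly longer, made-up input string: findall yields tuples of group texts, while each finditer match object still carries the whole match as group(0).

```python
import re

pattern = r'(ra|RA)([a-zA-Z0-9]*)'
text = "RAJA45909 raXY1"  # hypothetical input with two matches

# findall: one tuple of group texts per match, never the whole match.
tuples = re.findall(pattern, text)

# finditer: a match object per match, so the whole match and each group are available.
full_matches = [m.group(0) for m in re.finditer(pattern, text)]
first_groups = [m.group(1) for m in re.finditer(pattern, text)]

print(tuples)        # [('RA', 'JA45909'), ('ra', 'XY1')]
print(full_matches)  # ['RAJA45909', 'raXY1']
print(first_groups)  # ['RA', 'ra']
```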
python regex - findall not returning output as expected
When you use parentheses in your regex, re.findall() returns only the parenthesized groups, not the entire matched string. Put ?: right after the ( to tell it not to use the parentheses to extract a group; the results will then be the entire matched strings.
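One way to check whether a pattern will make findall return groups is the compiled pattern's groups attribute, which counts capturing groups. This sketch reuses the (ra|RA) pattern from the previous question; adding ?: drops the count to zero and findall switches to full matches.

```python
import re

text = "RAJA45909"
with_group = r'(ra|RA)[a-zA-Z0-9]*'
without_group = r'(?:ra|RA)[a-zA-Z0-9]*'

# Compiled patterns expose `groups`: the number of capturing groups.
count_with = re.compile(with_group).groups
count_without = re.compile(without_group).groups

result_with = re.findall(with_group, text)
result_without = re.findall(without_group, text)

print(count_with, result_with)        # 1 ['RA']
print(count_without, result_without)  # 0 ['RAJA45909']
```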
re.findall not returning correct results
As per the re.findall documentation:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
So turn all capturing groups into non-capturing ones, or remove them if possible (here it is best to remove them, as they are redundant):
macs = re.findall(r"[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}",test)
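To see the effect, here is a sketch on a made-up test string with Cisco-style dotted MAC addresses (the input is hypothetical; the original test variable is not shown). The group-free pattern and a non-capturing shorthand both return full addresses, while a capturing repeat returns only the last repetition of the group.

```python
import re

# Hypothetical input: dotted MAC addresses (xxxx.xxxx.xxxx).
test = "Gi0/1 0011.22aa.bbcc dynamic\nGi0/2 a0b1.c2d3.e4f5 static"

# No groups at all: findall returns each full match.
macs = re.findall(r"[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}", test)

# Shorter form with a non-capturing repeat: still full matches.
macs_short = re.findall(r"(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}", test)

# Capturing repeat: findall returns only the group's last repetition.
macs_wrong = re.findall(r"([0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}", test)

print(macs)        # ['0011.22aa.bbcc', 'a0b1.c2d3.e4f5']
print(macs_short)  # ['0011.22aa.bbcc', 'a0b1.c2d3.e4f5']
print(macs_wrong)  # ['22aa.', 'c2d3.']
```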
Python regex findall function only returning matchings on groups instead of full string
Use '(b(.a)*)' as your regex pattern instead. findall then returns a tuple per match, and the full match is the first element of each tuple (result[0][0] in the following example).
import re
result = re.findall('(b(.a)*)', 'bcacaca')
print(result)
Output:
[('bcacaca', 'ca')]
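When there are several matches, the outer group can be pulled out of every tuple with unpacking. A sketch on the multi-word text used in the next section (the pattern still requires a leading b, so 'dcaca' is not matched):

```python
import re

text = 'bcacaca dcaca dbcaca'

# With nested groups, findall returns (outer, inner) tuples; the outer
# group spans the whole match, the inner one only its last repetition.
pairs = re.findall(r'(b(.a)*)', text)

# Keep just the outer group, i.e. the full match.
whole = [outer for outer, _inner in pairs]

print(pairs)  # [('bcacaca', 'ca'), ('bcaca', 'ca')]
print(whole)  # ['bcacaca', 'bcaca']
```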
A Better Option - Using a Non-capturing Group
As @Nick mentioned, a non-capturing group could be used here as follows. Consider the following scenario. For a step-by-step explanation, see the next section. I also encourage you to use this resource: regex101.com.
import re
## Define text and pattern
text = 'bcacaca dcaca dbcaca'
pattern = 'b?(?:.a)*'
## Evaluate regex
result = re.findall(pattern, text)
# output
# ['bcacaca', '', '', 'caca', '', '', 'bcaca', '']
## Drop empty strings from result
result = list(filter(None, result))
# output
# ['bcacaca', 'caca', 'bcaca']
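Instead of filtering afterwards, another option is a pattern that cannot match the empty string. This alternation is my own sketch, not from the original answer: either a b followed by any number of .a pairs, or at least one .a pair. On this text it yields the same three strings without producing the empties.

```python
import re

text = 'bcacaca dcaca dbcaca'

# Each branch consumes at least one character, so no empty matches occur.
pattern = r'b(?:.a)*|(?:.a)+'

result = re.findall(pattern, text)
print(result)  # ['bcacaca', 'caca', 'bcaca']
```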
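Explanation for Using a Non-capturing Group
In b?(?:.a)*, b? optionally matches a single b, and (?:.a)* matches zero or more pairs made of any character (other than a newline) followed by a. Because the group is non-capturing, findall returns the whole match instead of the group. Since every part of the pattern is optional, it can also match the empty string, which is why the raw result contains the empty strings that are filtered out above.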
References
- Remove empty strings from a list of strings
How to return a string if a re.findall finds no match
You could do this in a single line:
results += re.findall(pattern, extracted_string) or ["Error"]
BTW, you get no benefit from compiling the pattern inside the vendor loop because you're only using it once.
Your function could also return the whole search result using a single list comprehension:
return [m for v in vendor for m in re.findall(v, extracted_string) or ["Error"]]
It is a bit weird that you would actually want to modify AND return the results list being passed as parameter. This may produce some unexpected side effects when you use the function.
Your "Error" flag may appear several times in the result list, and given that each pattern may return multiple matches, it will be hard to determine which pattern failed to find a value.
If you only want to signal an error when none of the vendor patterns match, you could use the or ["Error"] trick on the whole result:
return [m for v in vendor for m in re.findall(v, extracted_string)] or ["Error"]
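A self-contained sketch contrasting the two placements of or ["Error"]. The function names, vendor patterns and input strings are hypothetical; only the two comprehensions come from the answer above.

```python
import re

def find_per_pattern(vendor, extracted_string):
    # One "Error" entry for every pattern that finds nothing.
    return [m for v in vendor for m in re.findall(v, extracted_string) or ["Error"]]

def find_any(vendor, extracted_string):
    # A single ["Error"] only when no pattern matches at all.
    return [m for v in vendor for m in re.findall(v, extracted_string)] or ["Error"]

patterns = [r"ACME-\d+", r"Globex-\d+"]  # hypothetical vendor patterns
text = "Invoice ACME-101 and ACME-102"   # hypothetical input

print(find_per_pattern(patterns, text))    # ['ACME-101', 'ACME-102', 'Error']
print(find_any(patterns, text))            # ['ACME-101', 'ACME-102']
print(find_any(patterns, "nothing here"))  # ['Error']
```

Neither function mutates a list passed in by the caller, which avoids the side effect mentioned above.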
Why re.findall does not find the match in this case?
Your code works and finds everything; you just misunderstand regex groups and how findall uses them:
# code partially generated by regex101.com to demonstrate the issue
# see https://regex101.com/r/Gngy0r/1
import re
regex = r"\s([0-9A-Z]+\w*)\s+\S*?[Aa]lloy\s"
test_str = " 1AZabc sdfsdfAlloy "
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))
    # groups are numbered from 1; group 0 is the whole match
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))
# use findall and print its results
print(re.findall(regex, test_str))
Output:
# full match that you got
Match 1 was found at 0-20: 1AZabc sdfsdfAlloy
# and what was captured
Group 1 found at 1-7: 1AZabc
# findall only gives you the groups ...
['1AZabc']
Either remove the parentheses, or wrap everything you are interested in inside them:
regex = r"\s([0-9A-Z]+\w*\s+\S*?[Aa]lloy)\s"
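A quick sketch comparing both fixes on the same test string. Without any group, findall returns the whole match, which includes the whitespace consumed by the outer \s; with one group around the interesting part, that whitespace is excluded.

```python
import re

test_str = " 1AZabc sdfsdfAlloy "

# Fix 1: no capturing group, so findall returns the whole match,
# including the leading and trailing whitespace matched by \s.
no_group = re.findall(r"\s[0-9A-Z]+\w*\s+\S*?[Aa]lloy\s", test_str)

# Fix 2: one group around everything of interest, so findall returns
# that span without the surrounding \s.
one_group = re.findall(r"\s([0-9A-Z]+\w*\s+\S*?[Aa]lloy)\s", test_str)

print(no_group)   # [' 1AZabc sdfsdfAlloy ']
print(one_group)  # ['1AZabc sdfsdfAlloy']
```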