How to match any string from a list of strings in regular expressions in python?
Join the list on the pipe character |, which separates alternatives in a regex.
import re
string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun."
print(re.findall(r"(?=(" + '|'.join(string_lst) + r"))", x))
Output: ['fun']
You cannot use match, as it only matches at the start of the string. With search you get only the first match, so use findall instead.
Also use a lookahead if you have overlapping matches that do not start at the same point.
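To see why the lookahead matters, here is a minimal sketch with a made-up word list (words and text are illustrative, not from the question) in which two alternatives overlap. It also passes each word through re.escape, on the assumption that the list holds literal strings rather than patterns.

```python
import re

# Hypothetical data chosen so that two words overlap in the text.
words = ['abc', 'bcd']
text = 'abcd'

alternation = '|'.join(re.escape(w) for w in words)

# Plain alternation consumes 'abc', so 'bcd' is never seen.
plain = re.findall(alternation, text)

# A capturing group inside a lookahead consumes nothing,
# so a match can be reported at every starting position.
overlapping = re.findall('(?=(' + alternation + '))', text)

print(plain)        # ['abc']
print(overlapping)  # ['abc', 'bcd']
```

If the words never overlap, the plain alternation gives the same result and is easier to read.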
Part of string matches any element of a list using regex
You can join and interpolate the list.
regex = r"\d*_(" + '|'.join(lst) + r")_\d*"
As an aside, that's equivalent to
regex = r"_(" + '|'.join(lst) + r")_"
If something may or may not be there at the boundary of the match, you can simplify by taking it out; the regex will still match whether or not it's there. (If you are capturing or anchoring the match, then of course these are necessary for other reasons.)
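The equivalence can be checked with a small sketch. The list and the sample strings below are hypothetical, and re.escape is added on the assumption that the list items are literal strings.

```python
import re

# Hypothetical list and inputs, for illustration only.
lst = ['foo', 'bar']
samples = ['12_foo_34', '_bar_', 'x_foo_', 'foo', '12_baz_34']

alternatives = '|'.join(re.escape(s) for s in lst)
long_rx = re.compile(r"\d*_(" + alternatives + r")_\d*")
short_rx = re.compile(r"_(" + alternatives + r")_")

# Both patterns accept and reject exactly the same strings.
long_hits = [bool(long_rx.search(s)) for s in samples]
short_hits = [bool(short_rx.search(s)) for s in samples]

print(long_hits)                # [True, True, True, False, False]
print(long_hits == short_hits)  # True
```

Only the matched text differs: the longer pattern also includes any surrounding digits in group 0, which matters if you use the whole match rather than just testing for one.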
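The third-party regex library lets you say
regex = r"\d*_\L<lst>_\d*"
where again of course the \d* are redundant. The list itself is passed as a keyword argument with the same name when the pattern is compiled, for example regex.compile(pattern, lst=lst).
You need to pip install regex and then, obviously, import regex instead of import re.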
Matching list of regular expression to list of strings
Use any() to test whether any of the regular expressions match, rather than looping over the entire list.
Compile all the regular expressions first, so this doesn't have to be done repeatedly.
reg_list = [re.compile(rx) for rx in reg_list]
for word in y:
    if any(rx.search(word) for rx in reg_list):
        RESULT_LIST.append(word)
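A self-contained version of the same idea might look like the sketch below. The patterns and words are made up for illustration, and a list comprehension stands in for the explicit loop.

```python
import re

# Hypothetical patterns and words, for illustration only.
patterns = [r'^ab', r'\d+$']
words = ['abacus', 'room101', 'table', 'cab']

# Compile once, outside the loop over words.
compiled = [re.compile(p) for p in patterns]

# Keep each word that matches at least one pattern.
result_list = [w for w in words if any(rx.search(w) for rx in compiled)]

print(result_list)  # ['abacus', 'room101']
```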
If any strings in a list match regex
You can use the builtin any():
r = re.compile('.*search.*')
if any(r.match(line) for line in output):
    do_stuff()
Passing the lazy generator to any() allows it to exit on the first match without checking any further into the iterable.
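The early exit can be made visible with a small sketch. The output lines and the matches helper are hypothetical; the helper only exists to record which lines were actually tested.

```python
import re

r = re.compile('.*search.*')

# Hypothetical output lines; only the second one matches.
output = ['nothing here', 'a search term', 'never examined', 'nor this']

checked = []

def matches(line):
    # Record every line that is actually tested.
    checked.append(line)
    return r.match(line)

found = any(matches(line) for line in output)

print(found)    # True
print(checked)  # ['nothing here', 'a search term']
```

Wrapping the generator in a list first, as in any([...]), would test every line before any() runs.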
Regular Expressions: Search in list
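You can create an iterator in Python 3.x or a list in Python 2.x by using:
filter(r.match, lst)
where lst is your list of strings (avoid naming it list, which would shadow the builtin).
To convert the Python 3.x iterator to a list, wrap it in list(): list(filter(r.match, lst)).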
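As a minimal sketch under Python 3, with a made-up pattern and made-up data:

```python
import re

# Hypothetical pattern and data, for illustration only.
r = re.compile(r'.*cat')
items = ['cat', 'wildcat', 'dog', 'thundercat', 'bird']

# filter() keeps the items for which r.match returns a match object.
matched = list(filter(r.match, items))
print(matched)  # ['cat', 'wildcat', 'thundercat']

# An equivalent list comprehension, which some readers find clearer.
same = [s for s in items if r.match(s)]
print(same == matched)  # True
```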
using regex to extract characters of a list of strings
The pattern \d+\n\d+\n\d+ matches only three runs of one or more digits separated by newlines, so it cannot match numbers that have a decimal part. To match numbers with an optional decimal part, you can use \d+(?:\.\d+)?
In the first list, the numbers sit at the start of a line, and there are also lines that contain no digits at all.
If you want to match all those numbers, regardless of the format, you can match the number at the start of each line (using the re.M flag so that ^ matches at every line start)
^\d+(?:\.\d+)?
Example
import re
lst1 = ['Famalicao\n5.10\nDraw\n1.30\nArouca\n9.50', 'Club America\n1.01\nDraw\n8.75\nClub Necaxa\n100.00', 'AD Pasto\n1.85\nDraw\n3.25\nJaguares de Cordoba\n4.25', 'Red Bull Bragantino\n1.60\nDraw\n3.65\nGuarani FC SP\n5.10']
pattern1 = r"^\d+(?:\.\d+)?"
for s in lst1:
    print(re.findall(pattern1, s, re.M))
Output
['5.10', '1.30', '9.50']
['1.01', '8.75', '100.00']
['1.85', '3.25', '4.25']
['1.60', '3.65', '5.10']
The second list contains only numbers separated by newlines. To get the first 3 numbers you can use 3 capture groups:
^(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)
Example
lst2 = ['9.25\n4.05\n1.45\n2.35\n4.35\n2.35\n2.85\n2.60\n2.90', '1.32\n4.60\n18.0\n3.15\n2.30\n3.10\n3.75\n1.95\n3.65', '2.45\n2.65\n3.80\n2.00\n4.65\n2.70\n2.45\n2.65\n3.80', '1.75\n3.75\n4.65\n2.55\n7.00\n1.80\n3.55\n3.15\n2.10']
pattern2 = r"^(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)"
for s in lst2:
    print(re.findall(pattern2, s))
Output
[('9.25', '4.05', '1.45')]
[('1.32', '4.60', '18.0')]
[('2.45', '2.65', '3.80')]
[('1.75', '3.75', '4.65')]
Find words in a list that match the input string using Regular Expressions
If you don't need to find overlapping matches, you can turn the list into a regular expression that uses | to match alternatives. Then use re.findall() to get all the matches.
import re
words = ["123","hello","nice","red","boy"]
string = "helloniceboy"
regex = re.compile('|'.join(re.escape(x) for x in words))
result = re.findall(regex, string)
re.escape() ensures that the words will be matched literally, even if they contain characters that have special meaning in regular expressions.
If you do need to find overlapping matches, the other answer that uses if word in input in a loop will work better.
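That other answer is not reproduced here, so the following is only a sketch of what such a loop could look like. The extra word "ice" is added to the list for illustration, because it overlaps with "nice".

```python
import re

# Same data as above, plus a hypothetical word "ice" that overlaps "nice".
words = ["123", "hello", "nice", "red", "boy", "ice"]
string = "helloniceboy"

# Substring test per word: overlaps are not a problem.
by_membership = [w for w in words if w in string]

# Alternation consumes "nice", so "ice" is never reported.
regex = re.compile('|'.join(re.escape(w) for w in words))
by_regex = regex.findall(string)

print(by_membership)  # ['hello', 'nice', 'boy', 'ice']
print(by_regex)       # ['hello', 'nice', 'boy']
```

The membership loop reports each word at most once and in list order, whereas findall reports matches in the order they occur in the string.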
How to match two lists of strings with regex in Python
You need to test whether any of the strings in buf matches each regex in desired, and then return True if all of them do:
import re
buf = ["horse101", "elephant5", "dog64", "mouse90", "cat52"]
desired = ["cat52", "dog[0-9]+"]
print(all(any(re.match(d + '$', b) for b in buf) for d in desired))
Output:
True
Note that we add $ to the regex so that (for example) dog[0-9]+ will not match dog4a (adding ^ to the beginning is not necessary, as re.match anchors matches to the start of the string).
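As an alternative to appending $, re.fullmatch (available since Python 3.4) requires the whole string to match. The sketch below swaps it in for the same data; this is a variation on the answer, not the answer's own method.

```python
import re

buf = ["horse101", "elephant5", "dog64", "mouse90", "cat52"]
desired = ["cat52", "dog[0-9]+"]

# Every desired pattern must fully match at least one string in buf.
all_found = all(any(re.fullmatch(d, b) for b in buf) for d in desired)
print(all_found)  # True

# '$' also matches just before a trailing newline; fullmatch does not.
with_dollar = bool(re.match("dog[0-9]+" + "$", "dog64\n"))
with_fullmatch = bool(re.fullmatch("dog[0-9]+", "dog64\n"))
print(with_dollar)     # True
print(with_fullmatch)  # False
```

re.fullmatch also avoids a precedence surprise when a pattern contains a top-level |, since an appended $ would bind only to the last alternative.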