How to Match Any String from a List of Strings in Regular Expressions in Python

How to match any string from a list of strings in regular expressions in Python?

Join the list on the pipe character |, which represents different options in regex.

import re

string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun."

# The lookahead (?=(...)) captures an alternative without consuming text.
print(re.findall(r"(?=(" + '|'.join(string_lst) + r"))", x))

Output: ['fun']

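You cannot use re.match here, because it only matches at the start of the string.
re.search returns only the first match, so use re.findall to get all of them.

The lookahead is needed only if matches can overlap without starting at the same point: because it consumes no characters, findall can report a match at every position.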
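To see what the lookahead changes, here is a minimal sketch that compares plain alternation with the lookahead form. The candidate list and input text are made up for illustration, chosen so that the candidates overlap in the input.

```python
import re

# Hypothetical data chosen so that the candidates overlap in the input.
candidates = ['aba', 'bab']
text = "ababa"

alternation = '|'.join(re.escape(s) for s in candidates)

# Plain alternation consumes each match, so overlapping hits are skipped.
plain = re.findall(alternation, text)

# The lookahead matches at every position without consuming characters.
overlapping = re.findall(r"(?=(" + alternation + r"))", text)

print(plain)        # ['aba']
print(overlapping)  # ['aba', 'bab', 'aba']
```

If the list entries cannot overlap in your data, the plain alternation is enough and is easier to read.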

Part of string matches any element of a list using regex

You can join and interpolate the list.

regex = r"\d*_(" + '|'.join(lst) + r")_\d*"

As an aside, that's equivalent to

regex = r"_(" + '|'.join(lst) + r")_"

If something may or may not be there at the boundary of the match, you can simplify by taking it out; the regex will still match whether or not it's there. (If you are capturing or anchoring the match, then of course these are necessary for other reasons.)
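As a rough check of that equivalence, the sketch below runs both patterns over a few sample strings. The list contents and the samples are assumptions made for illustration only.

```python
import re

# Hypothetical list and inputs, used only to compare the two patterns.
lst = ['foo', 'bar']
with_digits = r"\d*_(" + '|'.join(lst) + r")_\d*"
without_digits = r"_(" + '|'.join(lst) + r")_"

samples = ["12_foo_34", "_bar_", "x_foo_y", "12_baz_34", "foo"]

for s in samples:
    a = re.search(with_digits, s)
    b = re.search(without_digits, s)
    # Both patterns agree on whether there is a match and on the captured word.
    print(s, bool(a), bool(b), a.group(1) if a else None)
```

The two patterns differ only in the full match text: for "12_foo_34" the first pattern's group(0) is "12_foo_34" and the second's is "_foo_", which matters only if you use the whole match rather than the captured group.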

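The third-party regex library supports named lists, which let you say

regex = r"\d*_\L<lst>_\d*"

where again the \d* are redundant. The list itself is supplied as a keyword argument named lst when the pattern is compiled or used.
You need to pip install regex and then import regex instead of import re.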

Matching a list of regular expressions to a list of strings

Use any() to test if any of the regular expressions match, rather than looping over the entire list.

Compile all the regular expressions first, so this doesn't have to be done repeatedly.

reg_list = [re.compile(rx) for rx in reg_list]

for word in y:
    if any(rx.search(word) for rx in reg_list):
        RESULT_LIST.append(word)
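For a version you can run as is, here is a small sketch with made-up patterns and words. The variable names follow the snippet above; the data is hypothetical.

```python
import re

# Hypothetical patterns and words, for illustration only.
reg_list = [r"^a", r"\d+$"]
y = ["apple", "banana", "cherry7", "date"]
RESULT_LIST = []

# Compile once, outside the loop.
compiled = [re.compile(rx) for rx in reg_list]

for word in y:
    # any() stops at the first pattern that matches this word.
    if any(rx.search(word) for rx in compiled):
        RESULT_LIST.append(word)

print(RESULT_LIST)  # ['apple', 'cherry7']
```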

If any strings in a list match regex

You can use the builtin any():

r = re.compile('.*search.*')
if any(r.match(line) for line in output):
    do_stuff()

Passing in the lazy generator to any() will allow it to exit on the first match without having to check any farther into the iterable.
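The early exit can be observed with a small sketch like the one below. The input lines and the checked list are hypothetical, and the pattern is written as a bare word used with r.search, which finds the same lines as r.match with '.*search.*' as long as the lines contain no newline characters.

```python
import re

r = re.compile('search')
checked = []

def lines():
    # Hypothetical input; records each line as any() pulls it.
    for line in ["nothing here", "a search term", "never reached"]:
        checked.append(line)
        yield line

found = any(r.search(line) for line in lines())

print(found)    # True
print(checked)  # ['nothing here', 'a search term']
```

The third line is never pulled from the generator, because any() returns as soon as the second line matches.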

Regular Expressions: Search in list

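Given a compiled pattern r and a list of strings lst, you can create an iterator in Python 3.x or a list in Python 2.x by using:

filter(r.match, lst)

To turn the Python 3.x iterator into a list, wrap it in list(): list(filter(r.match, lst)). Avoid naming your own variable list, since that shadows the built-in and breaks this call.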
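A minimal sketch of this filtering approach, using a made-up pattern and list, also shows the difference between filtering with r.match and with r.search:

```python
import re

# Hypothetical pattern and data.
r = re.compile(r"ca")
names = ["cat", "scar", "camel", "dog"]

# r.match only matches at the start of each string.
starts_with = list(filter(r.match, names))
# r.search matches anywhere in the string.
contains = list(filter(r.search, names))

print(starts_with)  # ['cat', 'camel']
print(contains)     # ['cat', 'scar', 'camel']
```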

Using regex to extract characters of a list of strings

The pattern \d+\n\d+\n\d+ only matches three runs of one or more digits separated by newlines, so it fails on values that have a decimal part. To match numbers with an optional decimal part, you can use \d+(?:\.\d+)?

In the first list, each number sits on its own line, and the other lines contain no digits at all.

If you want to match all those numbers, regardless of the format, you can anchor the number to the start of a line. With the re.M flag, ^ matches at the start of every line rather than only at the start of the string.

^\d+(?:\.\d+)?

Example

import re

lst1 = ['Famalicao\n5.10\nDraw\n1.30\nArouca\n9.50', 'Club America\n1.01\nDraw\n8.75\nClub Necaxa\n100.00', 'AD Pasto\n1.85\nDraw\n3.25\nJaguares de Cordoba\n4.25', 'Red Bull Bragantino\n1.60\nDraw\n3.65\nGuarani FC SP\n5.10']
pattern1 = r"^\d+(?:\.\d+)?"
for s in lst1:
    print(re.findall(pattern1, s, re.M))

Output

['5.10', '1.30', '9.50']
['1.01', '8.75', '100.00']
['1.85', '3.25', '4.25']
['1.60', '3.65', '5.10']

The second list has digits followed by newlines and digits. To get the first 3 numbers you can use 3 capture groups:

^(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)

Example

lst2 = ['9.25\n4.05\n1.45\n2.35\n4.35\n2.35\n2.85\n2.60\n2.90', '1.32\n4.60\n18.0\n3.15\n2.30\n3.10\n3.75\n1.95\n3.65', '2.45\n2.65\n3.80\n2.00\n4.65\n2.70\n2.45\n2.65\n3.80', '1.75\n3.75\n4.65\n2.55\n7.00\n1.80\n3.55\n3.15\n2.10']

pattern2 = r"^(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)"
for s in lst2:
    print(re.findall(pattern2, s))

Output

[('9.25', '4.05', '1.45')]
[('1.32', '4.60', '18.0')]
[('2.45', '2.65', '3.80')]
[('1.75', '3.75', '4.65')]

Find words in a list that match the input string using Regular Expressions

If you don't need to find overlapping matches, you can turn the list into a regular expression that uses | to match alternatives. Then use re.findall() to get all the matches.

import re

words = ["123","hello","nice","red","boy"]
string = "helloniceboy"
regex = re.compile('|'.join(re.escape(x) for x in words))
result = re.findall(regex, string)

re.escape() ensures that the words will be matched literally, even if they contain characters that have special meaning in regular expressions.

If you do need to find overlapping matches, looping over the words and testing if word in input for each one will work better.
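The difference between the two approaches shows up when one word overlaps another in the input. The sketch below uses hypothetical data in which "low" overlaps the end of "hello".

```python
import re

# Hypothetical data: "low" overlaps the end of "hello" in the input.
words = ["hello", "low", "nice"]
string = "hellownice"

regex = re.compile('|'.join(re.escape(w) for w in words))

# findall consumes "hello" first, so the overlapping "low" is never seen.
non_overlapping = regex.findall(string)

# A plain substring test per word is unaffected by overlaps.
overlapping = [w for w in words if w in string]

print(non_overlapping)  # ['hello', 'nice']
print(overlapping)      # ['hello', 'low', 'nice']
```

Note that the loop reports each word at most once and in list order, while findall reports matches in the order they occur in the string.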

How to match two lists of strings with regex in Python

You need to test whether any of the strings in buf match each regex in desired, and then return True if all of them do:

import re

buf = ["horse101", "elephant5", "dog64", "mouse90", "cat52"]
desired = ["cat52", "dog[0-9]+"]

print(all(any(re.match(d + '$', b) for b in buf) for d in desired))

Output:

True

Note that we add $ to the regex so that (for example) dog[0-9]+ will not match dog4a (adding ^ to the beginning is not necessary as re.match anchors matches to the start of the string).
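As an alternative sketch, re.fullmatch (available since Python 3.4) requires the whole string to match, so nothing has to be appended to the pattern. This swaps in a different function from the one used above; the data is the same.

```python
import re

buf = ["horse101", "elephant5", "dog64", "mouse90", "cat52"]
desired = ["cat52", "dog[0-9]+"]

# re.fullmatch requires the whole string to match, so no '$' is appended.
result = all(any(re.fullmatch(d, b) for b in buf) for d in desired)
print(result)  # True

# A string with trailing characters is rejected.
print(bool(re.fullmatch("dog[0-9]+", "dog4a")))  # False
```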


