How to retrieve partial matches from a list of strings
- startswith and in return a Boolean.
- The in operator is a test of membership.
- This can be performed with a list-comprehension or filter.
- Using a list-comprehension, with in, is the fastest implementation tested.
- If case is not an issue, consider mapping all the words to lowercase.
  l = list(map(str.lower, l))
- Tested with python 3.10.0
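As a minimal sketch of the lowercase suggestion, the example below compares lowercasing the whole list with lowercasing only during the comparison. The sample list and the names words_list, lowered, result_lowered and result_original are made up for illustration.

```python
# Hypothetical sample data, mixed case on purpose
words_list = ['Ones', 'twos', 'THREES', 'Threefold']
wanted = 'three'

# Option 1: lowercase the whole list first (the original casing is lost)
lowered = list(map(str.lower, words_list))
result_lowered = [v for v in lowered if wanted in v]

# Option 2: lowercase only for the comparison (the original casing is kept)
result_original = [v for v in words_list if wanted.lower() in v.lower()]

print(result_lowered)   # ['threes', 'threefold']
print(result_original)  # ['THREES', 'Threefold']
```

Option 2 is probably the better fit when the matched values are shown to a user, since it returns the strings unchanged.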
filter:
- Using filter creates a filter object, so list() is used to show all the matching values in a list.
l = ['ones', 'twos', 'threes']
wanted = 'three'
# using startswith
result = list(filter(lambda x: x.startswith(wanted), l))
# using in
result = list(filter(lambda x: wanted in x, l))
print(result)
[out]:
['threes']
list-comprehension
l = ['ones', 'twos', 'threes']
wanted = 'three'
# using startswith
result = [v for v in l if v.startswith(wanted)]
# using in
result = [v for v in l if wanted in v]
print(result)
[out]:
['threes']
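Both variants return the same result for the list above, so here is a small sketch with a made-up list where they differ: startswith only matches at the beginning of each string, while in matches anywhere. The names prefix_matches and substring_matches are illustrative.

```python
# Hypothetical list chosen so that the two tests disagree
l = ['three', 'threes', 'thirty-three', 'twos']
wanted = 'three'

# startswith: the string must begin with wanted
prefix_matches = [v for v in l if v.startswith(wanted)]

# in: wanted may appear anywhere in the string
substring_matches = [v for v in l if wanted in v]

print(prefix_matches)     # ['three', 'threes']
print(substring_matches)  # ['three', 'threes', 'thirty-three']
```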
Which implementation is faster?
- Tested in Jupyter Lab using the words corpus from nltk v3.6.5, which has 236736 words
- Words with 'three':
['three', 'threefold', 'threefolded', 'threefoldedness', 'threefoldly', 'threefoldness', 'threeling', 'threeness', 'threepence', 'threepenny', 'threepennyworth', 'threescore', 'threesome']
from nltk.corpus import words
%timeit list(filter(lambda x: x.startswith(wanted), words.words()))
[out]:
64.8 ms ± 856 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit list(filter(lambda x: wanted in x, words.words()))
[out]:
54.8 ms ± 528 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [v for v in words.words() if v.startswith(wanted)]
[out]:
57.5 ms ± 634 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [v for v in words.words() if wanted in v]
[out]:
50.2 ms ± 791 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
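The %timeit magic only works in IPython or Jupyter. Outside a notebook, a rough comparison could be sketched with the standard timeit module, as below. The generated sample list is a stand-in for the nltk corpus (an assumption, so the numbers will not match those above), and the function names are hypothetical. Note that the timings above call words.words() inside the timed statement, whereas this sketch builds the list once beforehand.

```python
import timeit

# Hypothetical stand-in for the nltk corpus: generated strings plus a few real matches
sample = [f'word{n}' for n in range(50_000)] + ['three', 'threefold', 'threesome']
wanted = 'three'

def with_filter():
    return list(filter(lambda x: wanted in x, sample))

def with_comprehension():
    return [v for v in sample if wanted in v]

# Both approaches must return the same matches before timing means anything
assert with_filter() == with_comprehension()

for func in (with_filter, with_comprehension):
    # best of 3 repeats, 5 calls each; report the time per call
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f'{func.__name__}: {best * 1000:.2f} ms per call')
```

Absolute timings depend on the machine and Python version, so only the relative order is worth comparing.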
Python matching partial strings in list elements between two lists
You don't want to remove elements from the list you are iterating over. Instead, you can add a condition to verify whether the matched word has already been added to your output list.
It should be something like:
lst = []
for i in match:
    has_match = False
    for j in data:
        # match on the first word of i
        if i.split()[0] in j:
            has_match = True
            print(i, j)
            if j not in lst:
                lst.append(j)
        # match on the first two words of i
        if len(i) > 1:
            k = ' '.join(i.split()[:2])
            if k in j:
                has_match = True
                print(i, j)
                if j not in lst:
                    lst.append(j)
    if not has_match:
        lst.append(i + ' - not found')
I also removed the break keywords, since they may stop your code from finding matches in multiple strings in data. Using a boolean should do the work. Let us know if you have further questions.
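For a quick check of the approach, here is a condensed sketch of the same loop wrapped in a function and run on made-up data. The function name find_matches and the sample lists are assumptions, since the original match and data are not shown, and the print calls are dropped in favour of a return value.

```python
def find_matches(match, data):
    """Collect each entry of data matched by the first word, or first two words, of an entry in match."""
    found = []
    for i in match:
        has_match = False
        words = i.split()
        # candidate keys: the first word, and the first two words joined
        keys = (words[0], ' '.join(words[:2]))
        for j in data:
            if any(k in j for k in keys):
                has_match = True
                if j not in found:  # avoid duplicates instead of removing items
                    found.append(j)
        if not has_match:
            found.append(i + ' - not found')
    return found

# Hypothetical sample data
match = ['apple pie recipe', 'kiwi']
data = ['green apple', 'apple pie', 'banana bread']
print(find_matches(match, data))
# ['green apple', 'apple pie', 'kiwi - not found']
```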
Finding partial string matches between list and elements of list of lists
I want to suggest a solution to your problem.
Firstly, we create a function that checks whether a word is a substring of any string in another list, and returns the index of the first such string:
def is_substring_of_element_in_list(word, list_of_str):
    # Return (True, index of the first string containing word), or (False, -1)
    for ix, s in enumerate(list_of_str):
        if word in s:
            return True, ix
    return False, -1
Now, we can use this function to check whether each word from the test list is a substring of a word in your list. Notice that every word in your list can be used only once, so we need to remove a string from it as soon as a given word is found to be a substring of it.
def is_list_is_in_mylist(t, mylist):
    # sort by length so that shorter strings are tried first
    mylist_now = sorted(mylist, key=len)
    test_now = sorted(t, key=len)
    counter = 0
    for word in t:
        is_sub, index = is_substring_of_element_in_list(word, mylist_now)
        if is_sub:
            # each string in mylist may be used only once
            mylist_now.pop(index)
            test_now.remove(word)
            counter += 1
    if counter == len(t) and counter == len(mylist):
        print("success")
    else:
        print("fail")
Pay attention: we need to sort the elements in the list to avoid mistakes caused by the order of the words. For example, if my_list = ['f', 'foo'] and test1 = ['f', 'foo'] and test2 = ['foo', 'f'], then without sorting, one of them will succeed and the other will fail.
Now, you can iterate over your test with a simple for loop:
for t in test:
    is_list_is_in_mylist(t, mylist)
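To see why sorting by length matters, here is a compact, self-contained sketch of the same idea that returns a Boolean instead of printing. The name all_words_matched and the sort_first switch are hypothetical and exist only to show the effect of turning the sort off.

```python
def all_words_matched(test_words, target_words, sort_first=True):
    """Return True if every test word is a substring of a distinct target word and no target is left over."""
    remaining = sorted(target_words, key=len) if sort_first else list(target_words)
    for word in test_words:
        # index of the first remaining target that contains word, or None
        index = next((n for n, s in enumerate(remaining) if word in s), None)
        if index is None:
            return False
        remaining.pop(index)  # each target may be used only once
    return not remaining

my_list = ['foo', 'f']
print(all_words_matched(['f', 'foo'], my_list))                    # True
print(all_words_matched(['f', 'foo'], my_list, sort_first=False))  # False
```

Without the sort, 'f' consumes 'foo' first, which leaves nothing for the word 'foo' to match.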
Finding partial matches in a list of lists in Python
My guess is, you're just not matching the second condition properly, e.g. if you do something like this:
'127.0.0.1' in i and 'Misconfiguration' in i
but i looks like:
['2014', '127.0.0.1', '127', 'DNS sever Misconfiguration']
then '127.0.0.1' will be in i, but 'Misconfiguration' won't, because i is a list, and in for lists is an exact match, whereas what you're looking for is a substring of an element of i. If these are consistent, you can do something like:
'127.0.0.1' in i and 'Misconfiguration' in i[3]
or if they aren't, and you have to substring check all entries:
'127.0.0.1' in i and any('Misconfiguration' in x for x in i)
should do it. That will substring check each item in i for your search term.
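The difference can be checked directly with a short sketch. The row i is the one shown above; rows and hits are hypothetical names, and the two extra rows are made up to show the filter applied to a list of lists.

```python
i = ['2014', '127.0.0.1', '127', 'DNS sever Misconfiguration']

# list membership compares whole elements
print('127.0.0.1' in i)         # True
print('Misconfiguration' in i)  # False

# substring check against each element
print(any('Misconfiguration' in x for x in i))  # True

# applied to a list of lists (the extra rows are made up)
rows = [
    i,
    ['2014', '10.0.0.5', '10', 'Misconfiguration of proxy'],
    ['2015', '127.0.0.1', '127', 'Timeout'],
]
hits = [row for row in rows
        if '127.0.0.1' in row and any('Misconfiguration' in x for x in row)]
print(len(hits))  # 1
```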