Python Searching for Partial Matches in a List

How to retrieve partial matches from a list of strings

startswith and in both return a Boolean.
The in operator is a test of membership; applied to two strings, it tests for a substring.
The filtering can be performed with a list-comprehension or filter.
Using a list-comprehension with in is the fastest implementation tested.
If case is not an issue, consider mapping all the words to lowercase.
- l = list(map(str.lower, l))
Tested with Python 3.10.0
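
As a minimal sketch of the lowercase idea, the comparison can also be made case-insensitive at match time, so the strings in the result keep their original spelling. The sample list below is made up for illustration; str.casefold is the standard string method for caseless comparison.

```python
# Illustrative data, not taken from the original question.
l = ['Ones', 'TWOS', 'Threes', 'thirteen']
wanted = 'THREE'

# Normalize both sides only for the comparison; the stored strings are untouched.
needle = wanted.casefold()
result = [v for v in l if needle in v.casefold()]

print(result)
```

If the original casing is not needed afterwards, lowercasing the whole list once avoids repeating the conversion on every search.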

`filter`:

Using filter creates a filter object, so list() is used to show all the matching values in a list.

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = list(filter(lambda x: x.startswith(wanted), l))

# using in
result = list(filter(lambda x: wanted in x, l))

print(result)
[out]:
['threes']

`list-comprehension`

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = [v for v in l if v.startswith(wanted)]

# using in
result = [v for v in l if wanted in v]

print(result)
[out]:
['threes']

Which implementation is faster?

Tested in Jupyter Lab using the words corpus from nltk v3.6.5, which has 236736 words
Words with 'three'
- ['three', 'threefold', 'threefolded', 'threefoldedness', 'threefoldly', 'threefoldness', 'threeling', 'threeness', 'threepence', 'threepenny', 'threepennyworth', 'threescore', 'threesome']

from nltk.corpus import words

%timeit list(filter(lambda x: x.startswith(wanted), words.words()))
[out]:
64.8 ms ± 856 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit list(filter(lambda x: wanted in x, words.words()))
[out]:
54.8 ms ± 528 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [v for v in words.words() if v.startswith(wanted)]
[out]:
57.5 ms ± 634 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [v for v in words.words() if wanted in v]
[out]:
50.2 ms ± 791 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
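
Outside Jupyter, a comparison along the same lines can be sketched with the standard timeit module. This is only a rough harness under stated assumptions: word_list is a synthetic stand-in for the nltk corpus, and it is built once before timing, whereas the cells above call words.words() inside the timed statement. Absolute numbers will differ by machine, so none are shown here.

```python
import timeit

# Synthetic stand-in for the nltk words corpus (an assumption, to avoid the download).
word_list = [f'{prefix}{i}' for i in range(5_000) for prefix in ('one', 'three', 'xthreex')]
wanted = 'three'

# The four implementations from the sections above, wrapped as zero-argument callables.
candidates = {
    'filter + startswith': lambda: list(filter(lambda x: x.startswith(wanted), word_list)),
    'filter + in': lambda: list(filter(lambda x: wanted in x, word_list)),
    'comprehension + startswith': lambda: [v for v in word_list if v.startswith(wanted)],
    'comprehension + in': lambda: [v for v in word_list if wanted in v],
}

for name, func in candidates.items():
    # Best of 3 repeats, 5 calls per repeat; report the time of a single call.
    best = min(timeit.repeat(func, number=5, repeat=3))
    print(f'{name}: {best / 5 * 1000:.2f} ms per call')
```

Note that startswith and in are not interchangeable: in also returns words that contain the search term in the middle, so the two variants can return different lists.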

Python matching partial strings in list elements between two lists

You don't want to remove elements from the list you are iterating over. Instead, you can add a condition to check whether the matched string has already been added to your output list.

It should be something like:

lst = []
for i in match:
    has_match = False  # becomes True once any string in data matches i
    for j in data:
        # check the first word of i
        if i.split()[0] in j:
            has_match = True
            print(i, j)
            if j not in lst:
                lst.append(j)
        # check the first two words of i, when i has more than one word
        if len(i.split()) > 1:
            k = ' '.join(i.split()[:2])
            if k in j:
                has_match = True
                print(i, j)
                if j not in lst:
                    lst.append(j)
    if not has_match:
        lst.append(i + ' - not found')

I also removed the break statements, since they may stop your code from finding matches in multiple strings in data. Using a Boolean flag instead should do the work. Let us know if you have further questions.
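
To try the idea on small inputs, the loop can be wrapped in a function. This is a sketch, not the original poster's code: the function name and the sample match and data lists are made up for illustration. It checks only the first word, because any string that contains the first two words of a phrase also contains its first word, so the collected list comes out the same.

```python
def find_partial_matches(match, data):
    """Collect strings from data that contain the first word of any phrase in match."""
    found = []
    for phrase in match:
        words = phrase.split()
        if not words:
            continue  # skip empty phrases
        # every string in data that contains the first word of the phrase
        hits = [s for s in data if words[0] in s]
        if hits:
            for s in hits:
                if s not in found:  # avoid duplicates in the output
                    found.append(s)
        else:
            found.append(phrase + ' - not found')
    return found


# Hypothetical sample data.
match = ['red apple', 'green pear', 'blue plum']
data = ['a red car', 'red apple pie', 'pear tree', 'green pear jam']

print(find_partial_matches(match, data))
```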

Finding partial string matches between list and elements of list of lists

I want to suggest a solution to your problem.

First, we create a function that checks whether a word is a substring of any string in another list, and returns the index of the first such string:

def is_substring_of_element_in_list(word, list_of_str):
    # Return (True, index) for the first string that contains word,
    # or (False, -1) if there is none (including for an empty list).
    for ix, s in enumerate(list_of_str):
        if word in s:
            return True, ix
    return False, -1

Now, we can use this function to check whether each word from the test list is a substring of a string in your list. Note that every string can be used only once, so we need to remove a string from the list as soon as a given word is found to be a substring of it.

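def is_list_is_in_mylist(t, mylist):
    # Sort both lists by length so that short words claim short strings first.
    mylist_now = sorted(mylist, key=len)
    test_now = sorted(t, key=len)
    counter = 0
    for word in test_now:
        is_sub, index = is_substring_of_element_in_list(word, mylist_now)
        if is_sub:
            # Each string may be matched only once, so drop it.
            mylist_now.pop(index)
            counter += 1
    # Success only if every word matched and no string is left over.
    if counter == len(t) and counter == len(mylist):
        print("success")
    else:
        print("fail")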

Pay attention: we need to sort the elements in the lists to avoid mistakes caused by the order of the words. For example, with my_list = ['foo', 'f'] and test = ['f', 'foo'], without sorting the word 'f' would be matched against 'foo' first and use it up, leaving only 'f' for the word 'foo', so the check would fail even though a valid pairing exists.

Now, you can iterate over your tests with a simple for loop:

for t in test:
    is_list_is_in_mylist(t, mylist)
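
For experimenting, here is a compact, self-contained variant of the same greedy idea that returns a Boolean instead of printing. The function name and the sample lists are illustrative assumptions, not part of the original question.

```python
def all_words_match_once(test_words, strings):
    """Return True if every word is a substring of a distinct string and no string is left over."""
    # Sort by length so that short words are paired with short strings first.
    remaining = sorted(strings, key=len)
    for word in sorted(test_words, key=len):
        for ix, s in enumerate(remaining):
            if word in s:
                remaining.pop(ix)  # each string may be used only once
                break
        else:
            return False  # no remaining string contains this word
    return not remaining


# Hypothetical sample data.
mylist = ['foo', 'f']
print(all_words_match_once(['f', 'foo'], mylist))
print(all_words_match_once(['foo', 'f'], mylist))
print(all_words_match_once(['f', 'bar'], mylist))
```

The matching is greedy, so it is not guaranteed to find a valid pairing whenever one exists. With words ['b', 'a'] and strings ['ab', 'bx'], for instance, 'b' takes 'ab' and nothing is left for 'a', although pairing 'a' with 'ab' and 'b' with 'bx' would work.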

Finding partial matches in a list of lists in Python

My guess is that you're just not matching the second condition properly, e.g. if you do something like this:

'127.0.0.1' in i and 'Misconfiguration' in i

but i looks like:

['2014', '127.0.0.1', '127', 'DNS sever Misconfiguration']

then '127.0.0.1' will be in i, but 'Misconfiguration' won't, because i is a list, and in on a list only matches whole elements exactly, whereas what you're looking for is a substring of an element of i. If the position of that field is consistent, you can do something like:

'127.0.0.1' in i and 'Misconfiguration' in i[3]

or if they aren't, and you have to substring check all entries:

'127.0.0.1' in i and any('Misconfiguration' in x for x in i)

should do it. That will substring check each item in i for your search term.
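
Put together as a runnable sketch: the first row below is the one shown above, while the other two rows and the name rows are made up for illustration.

```python
rows = [
    ['2014', '127.0.0.1', '127', 'DNS sever Misconfiguration'],  # row from the example above
    ['2014', '127.0.0.1', '128', 'Open port'],                   # hypothetical row
    ['2014', '10.0.0.5', '129', 'Misconfiguration of proxy'],    # hypothetical row
]

# Exact match on the address element, substring match on any element for the term.
matches = [
    i for i in rows
    if '127.0.0.1' in i and any('Misconfiguration' in x for x in i)
]

print(matches)
```

Only the first row satisfies both conditions: the second has the address but not the term, and the third has the term but a different address.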
