Python - Remove any element from a list of strings that is a substring of another element
First building block: substring.
You can use the in operator to check:
>>> 'rest' in 'resting'
True
>>> 'sing' in 'resting'
False
Next, we're going to choose the naive method of creating a new list. We'll add items one by one into the new list, checking if they are a substring or not.
def substringSieve(string_list):
    out = []
    for s in string_list:
        # keep s only if it is not contained in any *other* string
        if not any(s in r for r in string_list if s != r):
            out.append(s)
    return out
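Because of the s != r test, exact duplicates never eliminate each other, so ['ab', 'ab'] comes back unchanged. A minimal sketch, assuming you want exactly one copy of each surviving string, deduplicates first with dict.fromkeys, which keeps first-occurrence order. The name substring_sieve_dedup is a hypothetical helper, not part of the original answer.

```python
def substring_sieve_dedup(string_list):
    # hypothetical helper: dict.fromkeys drops exact duplicates, keeping first-seen order
    unique = list(dict.fromkeys(string_list))
    # keep s only if it is not contained in any *other* unique string
    return [s for s in unique
            if not any(s in r for r in unique if s != r)]


print(substring_sieve_dedup(['ab', 'abc', 'ab', 'xyz', 'y']))  # ['abc', 'xyz']
```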
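You can speed it up by sorting from longest to shortest, which reduces the number of comparisons: a longer string can never be a substring of a shorter or equal-length string, so each string only needs to be checked against the strings already kept. Note that the result comes back in descending length order, and exact duplicates are collapsed to a single copy.
def substringSieve(string_list):
    # sorted() returns a new list, so the caller's list is left untouched
    ordered = sorted(string_list, key=len, reverse=True)
    out = []
    for s in ordered:
        # only strings that were kept earlier (longer or equal length) can contain s
        if not any(s in o for o in out):
            out.append(s)
    return out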
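Since the length-sorted version visits strings longest first, its result is in length order rather than input order. If the original order matters, one option (a sketch, with the hypothetical name substring_sieve_ordered) is to sort indices instead of strings, record which positions survive, and then rebuild the list in input order:

```python
def substring_sieve_ordered(string_list):
    kept = []                                # strings kept so far, longest first
    keep_flags = [False] * len(string_list)  # which input positions survive
    # visit indices longest string first; sorted() is stable, so ties keep input order
    by_length = sorted(range(len(string_list)),
                       key=lambda idx: len(string_list[idx]), reverse=True)
    for i in by_length:
        s = string_list[i]
        # only strings at least as long as s (already visited) can contain it
        if not any(s in k for k in kept):
            kept.append(s)
            keep_flags[i] = True
    # rebuild the result in the caller's original order
    return [s for s, keep in zip(string_list, keep_flags) if keep]


print(substring_sieve_ordered(['y', 'xyz', 'ab', 'abc']))  # ['xyz', 'abc']
```

The comparisons are the same as in the sorted version; only the bookkeeping differs.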
Remove items in a list based on a list of substrings in another list in python3.x
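Using a regular expression, for example:
import re
list1 = ['lunch time', 'sandwich shop', 'starts at noon', 'grocery store']
list2 = ['lunch', 'noon']
# re.escape() stops characters such as '.' or '+' in list2 from acting as regex syntax
pattern = re.compile("|".join(map(re.escape, list2)))
# keep only the items in which none of the substrings occurs
print([i for i in list1 if not pattern.search(i)])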
Output:
['sandwich shop', 'grocery store']
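If the entries of list2 are plain substrings rather than patterns, a regex is not strictly needed; any() with the in operator gives the same result for this data. A sketch reusing the example's list1 and list2:

```python
list1 = ['lunch time', 'sandwich shop', 'starts at noon', 'grocery store']
list2 = ['lunch', 'noon']

# keep an item only if none of the unwanted substrings occurs in it
filtered = [item for item in list1 if not any(sub in item for sub in list2)]
print(filtered)  # ['sandwich shop', 'grocery store']
```

One behavioral difference: with an empty list2 the joined pattern is the empty regex, which matches every string, so the regex version removes everything, while the any() version keeps everything.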
Removing substrings in list of list of strings, maintain order - Python
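Instead of removing elements from the list, why not build a new one that matches your requirements? That is safer than deleting items from a list while you iterate over it.
# True if elem occurs inside some other string in lst
def substr_in_list(elem, lst):
    for s in lst:
        if elem != s and elem in s:
            return True
    return False

# words is the question's list of lists of strings; each inner list is filtered on its own
words = [[j for j in i if not substr_in_list(j, i)] for i in words]
Output: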
[['gamma_ray_bursts', 'merger', 'death', 'throes', 'magnetic_flares', 'neutrino_antineutrino', 'objections', 'double_neutron_star', 'parker_instability', 'positrons'], ['dot', 'gravitational_lensing', 'splittings', 'limits', 'amplifications', 'time_delays', 'extracting_information', 'fix', 'distant_quasars'], ['recoil', 'gamma_ray_bursts', 'neutron_stars', 'jennings', 'possible_origins', 'birthplaces', 'disjoint', 'arrival_directions'], ['sn_sn', 'type_ii_supernovae', 'distances', 'dilution', 'extinction', 'extragalactic_distance_scale', 'expanding_photosphere', 'photospheres', 'supernovae_sn', 'span_wide_range'], ['photon_pair', 'high_energy', 'gamma_ray_burst', 'optical_depth', 'absorbing_medium', 'implications', 'problem', 'annihilation_radiation', 'emergent_spectrum', 'limit', 'radiation_transfer', 'collimation', 'regions']]
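Since the question's input is not shown here, a self-contained run with a small made-up words list illustrates how each inner list is filtered independently and keeps its order. The any() form of substr_in_list below is equivalent to the loop above.

```python
def substr_in_list(elem, lst):
    # True if elem occurs inside some *other* string of lst
    return any(elem != s and elem in s for s in lst)


# hypothetical sample input: a list of lists of strings
words = [['gamma_ray', 'gamma_ray_bursts', 'merger'],
         ['dot', 'dots', 'limits', 'limit']]

# filter each inner list separately, keeping the original order
words = [[w for w in inner if not substr_in_list(w, inner)] for inner in words]
print(words)  # [['gamma_ray_bursts', 'merger'], ['dots', 'limits']]
```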
Python: Remove Strings in a List that are contained by at least one other String in the same List
A function with two nested loops that skips elements already marked for deletion, which saves loop iterations:
def filterlist(l):
    # keep track of elements that will be deleted
    deletelist = [False for _ in l]
    for i, el in enumerate(l):
        # already marked for deletion, jump right to the next el
        if deletelist[i]:
            continue
        for j, el2 in enumerate(l):
            # comparing the item to itself, or el2 already marked?
            # jump to the next el2
            if i == j or deletelist[j]:
                continue
            # el is contained in el2, so el has to go
            if el in el2:
                deletelist[i] = True
                break  # no need to compare el any further
            # also check the other way around;
            # marking el2 now saves loop iterations later
            elif el2 in el:
                deletelist[j] = True
    # create a new list, keeping the elements not marked for deletion
    return [el for i, el in enumerate(l) if not deletelist[i]]
Usually built-in functions are faster, so let's compare it to Ed Ward's solution:
# result of Ed Ward's solution using timeit:
100000 loops, best of 10: 5.38 usec per loop
# filterlist function with loops using timeit:
100000 loops, best of 10: 4.42 usec per loop
Interesting, but to get a really representative result, you should run timeit with a larger element list.
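Ed Ward's solution is not reproduced here, so this sketch instead times the nested-loop approach (written so that it removes the contained strings, as the title asks) against the length-sorted sieve from the first answer, on a larger randomly generated list. The data size, alphabet and seed are arbitrary assumptions; before timing, the script checks that both functions keep the same strings.

```python
import random
import timeit


def filterlist(lst):
    # nested-loop approach: mark contained strings for deletion
    deletelist = [False] * len(lst)
    for i, el in enumerate(lst):
        if deletelist[i]:
            continue
        for j, el2 in enumerate(lst):
            if i == j or deletelist[j]:
                continue
            if el in el2:          # el is contained in el2: drop el
                deletelist[i] = True
                break
            elif el2 in el:        # el2 is contained in el: drop el2
                deletelist[j] = True
    return [el for i, el in enumerate(lst) if not deletelist[i]]


def sieve_sorted(lst):
    # length-sorted approach: compare only against strings already kept
    out = []
    for s in sorted(lst, key=len, reverse=True):
        if not any(s in o for o in out):
            out.append(s)
    return out


# hypothetical test data: 1,000 short random strings, fixed seed for repeatability
random.seed(0)
data = [''.join(random.choices('abcd', k=random.randint(1, 6)))
        for _ in range(1000)]

# both should keep the same strings (in a different order)
assert sorted(filterlist(data)) == sorted(sieve_sorted(data))

for func in (filterlist, sieve_sorted):
    seconds = timeit.timeit(lambda: func(data), number=3)
    print(f'{func.__name__}: {seconds / 3:.4f} s per call')
```

The absolute timings depend on the machine and on how many strings survive, so only the relative numbers are meaningful.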
find and remove some substrings from a long list of string in python
Create a string with all the special characters you'd like to remove, and strip them off the right side:
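strings = ['short', 'club', 'edit', 'post\\C2', 'le\\C3', 'lundi', 'janvier', '2008']
special = ''.join(['\\C2', '\\C3', '\\E2'])
Note that \ is a special character in Python string literals, so escape it as \\ (or use a raw string such as r'\C2') whenever you use it, to avoid ambiguity. An unescaped '\C2' only works because \C is not a recognized escape sequence, and recent Python versions warn about it. You can also simply write a single string literal rather than using str.join:
special = '\\C2\\C3\\E2'  # same characters, simpler
strings[:] = [item.rstrip(special) for item in strings]
This works for the sample data, but keep in mind that str.rstrip treats its argument as a set of characters, not as a list of suffixes: it strips any trailing run of \, C, 2, 3 and E, so a string such as 'ABC' would be cut to 'AB'.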
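To remove whole suffixes instead of a character set, one option is to test each suffix with str.endswith and slice it off. This is a sketch with a hypothetical helper, strip_suffixes, and an extra 'ABC' entry added to show the difference from rstrip; on Python 3.9+ str.removesuffix does the same for a single suffix.

```python
# the backslash is escaped as '\\' in each literal; 'ABC' is an added test value
strings = ['short', 'club', 'edit', 'post\\C2', 'le\\C3', 'lundi', 'janvier', '2008', 'ABC']
suffixes = ('\\C2', '\\C3', '\\E2')


def strip_suffixes(item, suffixes):
    # hypothetical helper: remove the first whole suffix that matches
    for suffix in suffixes:
        if item.endswith(suffix):
            return item[:-len(suffix)]
    return item  # no suffix matched: leave the string unchanged


strings[:] = [strip_suffixes(item, suffixes) for item in strings]
print(strings)
# ['short', 'club', 'edit', 'post', 'le', 'lundi', 'janvier', '2008', 'ABC']
```

Only one suffix is removed per string; if codes can appear back to back, repeat the call until nothing changes.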