Removing List of Words from a String

This is one way to do it:

query = 'What is hello'
stopwords = ['what', 'who', 'is', 'a', 'at', 'he']
querywords = query.split()

resultwords = [word for word in querywords if word.lower() not in stopwords]
result = ' '.join(resultwords)

print(result)

I noticed that you want to also remove a word if its lower-case variant is in the list, so I've added a call to lower() in the condition check.
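
The same idea can be wrapped in a small reusable function. This is only a sketch: the name remove_stopwords is made up for illustration, and storing the stopwords in a set is an assumption made for faster membership checks, not something the answer requires.

```python
def remove_stopwords(query, stopwords):
    """Drop every whitespace-separated word whose lower-case form is a stopword."""
    # Hypothetical helper; a set gives constant-time membership checks.
    stopset = {word.lower() for word in stopwords}
    return ' '.join(word for word in query.split() if word.lower() not in stopset)


result = remove_stopwords('What is hello', ['what', 'who', 'is', 'a', 'at', 'he'])
print(result)  # hello
```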

How to strip a specific word from a string?

Use str.replace.

>>> papa = 'papa is a good man'
>>> app = 'app is important'
>>> papa.replace('papa', '')
' is a good man'
>>> app.replace('papa', '')
'app is important'

Alternatively, use the re module and regular expressions. Capturing the whitespace around the word lets you keep it or remove it along with the word.

>>> import re
>>> papa = 'papa is a good man'
>>> app = 'app is important'
>>> papa3 = 'papa is a papa, and papa'
>>>
>>> patt = re.compile(r'(\s*)papa(\s*)')
>>> patt.sub('\\1mama\\2', papa)
'mama is a good man'
>>> patt.sub('\\1mama\\2', papa3)
'mama is a mama, and mama'
>>> patt.sub('', papa3)
'is a, and'
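
If the goal is to drop the word together with the space that follows it, without touching longer words that merely start with the same letters, a variant along these lines might work. It is a sketch, not part of the answer: the \b word-boundary anchors and the name whole_word are additions made for illustration.

```python
import re

papa = 'papa is a good man'

# str.replace leaves the space that followed the removed word.
print(repr(papa.replace('papa', '')))  # ' is a good man'

# \b restricts the match to the whole word; \s* also consumes the whitespace after it.
whole_word = re.compile(r'\bpapa\b\s*')
print(repr(whole_word.sub('', papa)))  # 'is a good man'
print(repr(whole_word.sub('', 'papaya is a fruit')))  # 'papaya is a fruit'
```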

python: how to remove words from a string

There are a few ways to go about this, and I'll address two. One is to split the string into words and compare each word against the list of words you want to remove. The other is to scan the string for each occurrence of those words and remove it in place. I'll give an example of each, with its advantages and disadvantages.

The first way is to split the string into words. This is good because it goes over every word, and you can use a list comprehension to pull out just the values you want. However, as written it only splits on whitespace, so it misses any word that is touching punctuation (in the example, "isn't" is stuck inside "toy,isn't" and survives). This question addresses how to avoid that problem so that this answer could work.

your_string = "it's a toy,isn't a tool.i don't know anything."
removal_list = ["it's","didn't","isn't","don't"]

edit_string_as_list = your_string.split()

final_list = [word for word in edit_string_as_list if word not in removal_list]

final_string = ' '.join(final_list)
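
One possible way around the punctuation problem is to pull the words out with a regular expression instead of str.split. The sketch below assumes that a "word" is a run of letters, digits, underscores and apostrophes, and it discards the punctuation entirely, which may or may not be acceptable for your data.

```python
import re

your_string = "it's a toy,isn't a tool.i don't know anything."
removal_list = ["it's", "didn't", "isn't", "don't"]

# Assumption: a word is any run of word characters and apostrophes.
words = re.findall(r"[\w']+", your_string)

final_list = [word for word in words if word not in removal_list]
final_string = ' '.join(final_list)
print(final_string)  # a toy a tool i know anything
```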

The second option is to remove all instances of those terms from the string as is. This avoids the punctuation problem, but it has a drawback: if a term is part of another word, that part is removed too. For example, removing "and" from a string that contains "sand" leaves "s" in the string.

your_string = "it's a toy,isn't a tool.i don't know anything."
removal_list = ["it's","didn't","isn't","don't"]

for word in removal_list:
    your_string = your_string.replace(word, "")
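
The "sand"/"and" drawback can probably be avoided by matching on word boundaries with a regular expression rather than using str.replace. This is a sketch of that swapped-in technique, not the approach used above; the helper name remove_whole_words is made up for illustration.

```python
import re


def remove_whole_words(text, words):
    """Remove each listed word only where it appears as a whole word."""
    # Hypothetical helper: re.escape guards against regex metacharacters in the words.
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')
    return pattern.sub('', text)


print(repr('sand and sea'.replace('and', '')))           # 's  sea'
print(repr(remove_whole_words('sand and sea', ['and'])))  # 'sand  sea'
```

Note that the surrounding spaces are left in place, so a removed word still leaves a double space behind.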

I hope one of these solutions meets your needs.

Python - Remove target words from each string in a list

You're taking each string in the list, removing each word separately, and appending each of those results to new_list. Instead, you need to remove all of the target words from a string and only then add it to new_list. This can be accomplished with a bit of reorganizing:

new_list = []

for i in my_list:
    x = i

    for word in remove:
        x = x.replace(word, "")

    new_list.append(x)

However, this will remove occurrences within a word, not only whole words. Removing only whole words takes a bit more logic, such as:

new_list = []

for i in my_list:
    x = i.split()

    new_list.append(" ".join(a if a not in remove else '' for a in x))

This one is a bit more complicated: it splits each string into a list of words, uses a generator expression to replace every word that should be removed with an empty string, and then joins the result with spaces. This could also be done with map. Note that this leaves double spaces where removed words were, which can be remedied with an addition such as:

new_list.append(" ".join(a if a not in remove else '' for a in x).replace("  ", " "))

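Another way to sidestep the double spaces is to leave the unwanted words out of the join altogether, rather than swapping them for empty strings. A minimal sketch, with made-up sample values for my_list and remove since the question's data is not shown here:

```python
my_list = ['the quick brown fox', 'jumps over the lazy dog']  # made-up sample data
remove = ['the', 'over']                                      # made-up sample data

# Filter the words out before joining, so no empty strings reach the join.
new_list = [' '.join(word for word in phrase.split() if word not in remove)
            for phrase in my_list]
print(new_list)  # ['quick brown fox', 'jumps lazy dog']
```
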
Remove specific words from a string in an efficient way

Make an array or Set of the strings you want to remove, then filter by whether the word being iterated over is in the Set.

const input = ["select from table order by asc limit 10 no binding"]
const wordsToExclude = new Set(['limit', 'order', 'by', 'asc', '10']);
const words = input[0].split(' ').filter(word => !wordsToExclude.has(word));
console.log(words);

How to remove list of words from a list of strings

Here is my stab at it. This uses regular expressions.

import re
pattern = re.compile(r"(of|the|in|for|at)\W", re.I)
phrases = ['of New York', 'of the New York']
list(map(lambda phrase: pattern.sub("", phrase), phrases)) # ['New York', 'New York']

Sans lambda:

[pattern.sub("", phrase) for phrase in phrases]

Update

Fix for the bug pointed out by gnibbler (thanks!):

pattern = re.compile("\\b(of|the|in|for|at)\\W", re.I)
phrases = ['of New York', 'of the New York', 'Spain has rain']
[pattern.sub("", phrase) for phrase in phrases] # ['New York', 'New York', 'Spain has rain']

@prabhu: the above change avoids snipping off the trailing "in" from "Spain". To verify, run both versions of the regular expression against the phrase "Spain has rain".
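
That check could look something like the following. The variable names without_boundary and with_boundary are made up for illustration; the two patterns are the ones from the answer, written as raw strings.

```python
import re

without_boundary = re.compile(r"(of|the|in|for|at)\W", re.I)
with_boundary = re.compile(r"\b(of|the|in|for|at)\W", re.I)

phrase = 'Spain has rain'
print(without_boundary.sub("", phrase))  # Spahas rain
print(with_boundary.sub("", phrase))     # Spain has rain
```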

Remove all words from a string that exist in a list

You can try something simpler:

import re

remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'

''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])

And the result would be:

' is walking with , wishing good luck to .'

The important part is the last line:

''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])

What it does:

  • You are converting the string to a list of words with re.split(r'(\W+)', string), preserving all the whitespace and punctuation as list items.
  • You are creating another list with a list comprehension, keeping only the items that are not in remove_list.
  • You are converting the resulting list back to a string with str.join().

The BNF notation for list comprehensions and a little bit more information on them may be found here

PS: Of course, you can make this a little more readable by breaking the one-liner into pieces: assign the result of re.split(r'(\W+)', string) to a variable and decouple the join from the comprehension.
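
Broken into steps, the one-liner might look like this. The intermediate names tokens, kept and result are made up for illustration.

```python
import re

remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'

# Split into words and separators; the capturing group keeps the separators.
tokens = re.split(r'(\W+)', string)
# Keep every token that is not one of the words to remove.
kept = [token for token in tokens if token not in remove_list]
# Glue the remaining words and separators back together.
result = ''.join(kept)
print(repr(result))  # ' is walking with , wishing good luck to .'
```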

How to remove multiple words from a string Java

Programmers often do this:

String sentence = "Hello Java World!";
sentence.replace("Java", "");
System.out.println(sentence);

=> Hello Java World!

Strings are immutable, and the replace function returns a new string object. So instead write:

String sentence = "Hello Java World!";
sentence = sentence.replace("Java", "");
System.out.println(sentence);

=> Hello  World!

(the whitespace still exists)

With that, your replace function could look like

public String remove(String phrase, String[] words) {
    String result = phrase;
    for (String word : words) {
        // Remove the word, then collapse the double space it leaves behind.
        result = result.replace(word, "").replace("  ", " ");
    }
    return result.trim();
}

How to remove an array of words from a string in javascript?

You can use join to build a regex dynamically from the array and replace the matching values:

function removeFromString(arr, str) {
  let regex = new RegExp("\\b" + arr.join('|') + "\\b", "gi")
  return str.replace(regex, '')
}
console.log(removeFromString(["one, two,", "and four"], "Remove one, two, not three and four"));
console.log(removeFromString(["one", "and four"], "Remove one, two, not three and four"));
console.log(removeFromString(["Hello"], "Hello World"))

