How to Remove Strings Present in a List from a Column in Pandas

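Use str.replace if substrings should be removed as well. The joined list is a regular expression, so pass regex=True (recent pandas versions treat the pattern as a literal string by default):

df['name'] = df['name'].str.replace('|'.join(To_remove_lst), '', regex=True)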

If the list entries may contain regex special characters, escape them first:

import re
df['name'] = df['name'].str.replace('|'.join(map(re.escape, To_remove_lst)), '', regex=True)

print (df)
   ID            name
0   1           Kitty
1   2           Puppy
2   3     is  example
3   4   stackoverflow
4   5           World

But if you want to remove only whole words, use a nested list comprehension:

df['name'] = [' '.join([y for y in x.split() if y not in To_remove_lst]) for x in df['name']]
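
To make the difference between the two approaches concrete, here is a minimal sketch. The sample frame and the to_remove list are hypothetical, since the original question's data is not shown here.

```python
import re
import pandas as pd

# Hypothetical sample data, chosen so the two approaches give different results.
df = pd.DataFrame({"ID": [1, 2, 3],
                   "name": ["Kitty cat", "this is example", "category"]})
to_remove = ["this", "cat"]

# Escape each entry so it is matched literally, then build one alternation.
pattern = "|".join(map(re.escape, to_remove))

# Substring removal: matches are deleted even inside longer words ("category").
substr = df["name"].str.replace(pattern, "", regex=True)

# Whole-word removal: only whitespace-separated tokens equal to an entry are dropped.
words = [" ".join(w for w in s.split() if w not in to_remove) for s in df["name"]]

print(substr.tolist())
print(words)
```

The substring version leaves stray spaces behind and turns "category" into "egory", while the word version leaves "category" untouched.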

Removing list of strings from column in pandas

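Use the "word boundary" expression \b, and wrap the alternation in a non-capturing group (?:...) so that \b applies to every entry rather than only to the first and last one:

In [46]: df.My_Column.str.replace(r'\b(?:{})\b'.format('|'.join(list_strings)), '', regex=True)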
Out[46]: 
0         details about your goal
1     expected and actual results
2         show some code anywhere
Name: My_Column, dtype: object
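
A self-contained sketch of the word-boundary approach follows. The series contents and list_strings values are made up for illustration. The final whitespace cleanup is an optional extra step, not part of the original answer.

```python
import re
import pandas as pd

# Hypothetical data: the last row contains the entries only as parts of longer words.
s = pd.Series(["details about your goal",
               "show some code anywhere",
               "codes and goals"])
list_strings = ["goal", "code"]

# Group the alternation so both \b anchors apply to each alternative.
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, list_strings)))
cleaned = s.str.replace(pattern, "", regex=True)

# Optional: collapse the doubled spaces left behind and trim the ends.
tidy = cleaned.str.replace(r"\s+", " ", regex=True).str.strip()
print(tidy.tolist())
```

Because of the boundaries, "codes" and "goals" are left alone, and only the standalone words are removed.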

How to remove specific strings from a list in pyspark dataframe column

You can combine regexp_replace with '|'.join(). regexp_replace replaces every match of a regular expression, and '|'.join(lst) joins the elements of the list with | to form a single alternation pattern. Together they remove any part of the column that matches an entry in the list.

import pyspark.sql.functions as F

df = df.withColumn('column_a', F.regexp_replace('column_a', '|'.join(lst), ''))
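
The pattern is just a string, so it can be built and checked in plain Python before it is handed to regexp_replace. The sketch below uses a hypothetical helper, build_removal_pattern, which escapes each entry and puts longer entries first so that a short entry cannot match ahead of a longer one that starts with it. Spark evaluates the pattern with Java's regex engine. It is an assumption here that the escapes produced by re.escape behave the same way there, so verify this on your own data.

```python
import re

def build_removal_pattern(items):
    """Hypothetical helper: build one alternation pattern from literal strings."""
    # Longest entries first, so "foo" cannot pre-empt "foobar" in the alternation.
    ordered = sorted(items, key=len, reverse=True)
    return "|".join(re.escape(item) for item in ordered)

lst = ["a.b", "foo", "foobar"]
pattern = build_removal_pattern(lst)

# Quick local check with Python's own regex engine.
checked = re.sub(pattern, "", "foobar a.b axb foo")
print(pattern)
print(repr(checked))
```

Without the escaping, the entry "a.b" would also remove "axb", because an unescaped dot matches any character.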

Keep strings present in a list from a column in pandas

Use Series.str.findall with Series.str.join:

To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]

df['name'] = df['name'].str.findall('|'.join(To_keep_lst)).str.join(', ')
print (df)
   ID          name
0   1         Kitty
1   2         Kandy
2   3  Micky, Loudy
3   4              
4   5  Kitty, Wicky
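
The input frame is not shown above, so here is a runnable sketch with made-up input rows that lead to the same kind of result. The re.escape call is a precaution added here, in case an entry contains regex special characters.

```python
import re
import pandas as pd

# Hypothetical input; only the entries of to_keep should survive.
df = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                   "name": ["Kitty", "Puppy Kandy", "Micky and Loudy",
                            "stackoverflow", "Kitty, Wicky World"]})
to_keep = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]

# findall returns a list of matches per row; str.join turns each list into one string.
pattern = "|".join(map(re.escape, to_keep))
df["name"] = df["name"].str.findall(pattern).str.join(", ")
print(df)
```

Rows with no match end up as an empty string, because joining an empty list yields ''.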

Python Pandas remove items from list in one column from the list in other column

You can create your own function to create the new lists and then use apply on the dataframe to execute the function for each row like so:

import pandas as pd

df = pd.DataFrame({'col1':[['a', 'b', 'c'], ['a', 'c', 'f', 'd'], ['d', 'c', 'e', 'f']], 
                   'col2':[['a', 'b'], ['a', 'f'], ['d', 'e', 'f', 'c']]})

def func(row):
    # row is a single row of the dataframe (apply is called with axis=1)
    return list(set(row['col1']) - set(row['col2']))

df['col3'] = df.apply(func, axis = 1)

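The function converts the lists to sets and uses set subtraction to remove values contained in col2 from col1. Note that sets are unordered, so the order of the remaining items is not guaranteed and duplicates are dropped.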
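
If the original order of col1 (or repeated items) matters, a list comprehension can replace the set subtraction. This is an alternative sketch, not the approach from the answer above. It assumes the list items are hashable.

```python
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'c', 'f', 'd'], ['d', 'c', 'e', 'f']],
                   'col2': [['a', 'b'], ['a', 'f'], ['d', 'e', 'f', 'c']]})

def remove_listed(keep_from, exclude_items):
    # Build the lookup set once per row, then filter while preserving order.
    exclude = set(exclude_items)
    return [item for item in keep_from if item not in exclude]

# zip walks both columns row by row without going through DataFrame.apply.
df['col3'] = [remove_listed(a, b) for a, b in zip(df['col1'], df['col2'])]
print(df['col3'].tolist())
```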

Method to remove a caret from list of strings?

I think this produces the result you are looking for: a dataframe with two columns, Name_of_String and AverageTime. Every item in strings is included, and items that are not in the dictionary get an AverageTime of 2.91.

Be careful when typing your code: you switched between Name_of_String and Name_Of_String in your question, which will produce errors if they are supposed to be the same column. Also, dictionaries use {}, not []; a list cannot hold key: value pairs.

import pandas as pd

strings = ['MyString1^111',
           'MyString2',
           'MyString3',
           'MyString4^222',
           'MyString5^888']

noCaret = [x.replace('^', '') for x in strings]

dictionary = {"MyString2": 3.76, "MyString3": 2.66}

stringsDF = pd.DataFrame(data={"Name_of_String": noCaret})
stringsDF["AverageTime"] = stringsDF["Name_of_String"].map(dictionary).fillna(2.91)

stringsDF
#Out: 
#  Name_of_String  AverageTime
#0   MyString1111         2.91
#1      MyString2         3.76
#2      MyString3         2.66
#3   MyString4222         2.91
#4   MyString5888         2.91
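
If the strings already live in a pandas Series, the same removal can be done with the vectorized string methods. The caret is a regex metacharacter (start of string), so this sketch passes regex=False to treat it as a literal character. The shortened sample list is for illustration only.

```python
import pandas as pd

strings = pd.Series(['MyString1^111', 'MyString2', 'MyString4^222'])

# regex=False makes '^' a plain character instead of the start-of-string anchor.
no_caret = strings.str.replace('^', '', regex=False)
print(no_caret.tolist())
```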

Remove string from one column if present in string of another column pandas

Here's a solution:

df = (
    df.reset_index()
    .assign(new_col=df.reset_index()
        .pipe(lambda x: x.assign(x=x['company'].str.split(' ')))
        .explode('x')
        .loc[lambda x: x['x'] != x['city'], 'x']
        .groupby(level=0)
        .agg(list)
        .str.join(' ')
    )
    .set_index('index')
)

Output:

>>> df
                           company  postal_code  name state         city       new_col
index                                                                                 
2000-01-01          abc gresham co        97080  john    mi      gresham        abc co
2000-01-01             startup llc        97080  jeff    hi     portland   startup llc
2001-01-01  beaverton business biz        99999   sam    ca    beaverton  business biz
2002-01-01                 andy co        92222  joey    or  los angeles       andy co

One-liner:

df = df.reset_index().assign(new_col=df.reset_index().pipe(lambda x: x.assign(x=x['company'].str.split(' '))).explode('x').loc[lambda x: x['x'] != x['city'], 'x'].groupby(level=0).agg(list).str.join(' ')).set_index('index')
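
For comparison, here is a shorter row-wise sketch that yields the same new_col values for the sample rows shown above. It is an alternative to the explode/groupby chain, not a drop-in copy of it. Only the company and city columns are reproduced here. Like the original, it compares single words, so a multi-word city such as "los angeles" can never match.

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['abc gresham co', 'startup llc', 'beaverton business biz', 'andy co'],
    'city': ['gresham', 'portland', 'beaverton', 'los angeles'],
})

# For each row, keep the company words that differ from that row's city.
df['new_col'] = [
    ' '.join(word for word in company.split() if word != city)
    for company, city in zip(df['company'], df['city'])
]
print(df)
```

Because the result is assigned by position, the index does not need to be reset first, even when it contains duplicate dates.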
