How to Remove Strings Present in a List from a Column in Pandas


Use str.replace if you also want to remove substrings:

df['name'] = df['name'].str.replace('|'.join(To_remove_lst), '', regex=True)

If the list items may contain regex special characters, escape them first with re.escape:

import re
df['name'] = df['name'].str.replace('|'.join(map(re.escape, To_remove_lst)), '', regex=True)

print(df)
ID name
0 1 Kitty
1 2 Puppy
2 3 is example
3 4 stackoverflow
4 5 World
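A self-contained sketch of the two str.replace variants above, with made-up sample data. regex=True is passed explicitly because pandas 2.0 changed the default of regex in Series.str.replace to False, in which case the joined pattern would be matched as a literal string. The item " (beta)" shows why escaping matters: unescaped, the parentheses form a regex group and the literal text is never matched.

```python
import re

import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"ID": [1, 2, 3],
                   "name": ["Hello Kitty", "Puppy (beta)", "stackoverflow"]})
To_remove_lst = ["Hello ", " (beta)"]

# Unescaped, "(beta)" is read as a capture group, so the pattern looks for
# " beta" and the literal parentheses in row 1 are never matched.
raw = df["name"].str.replace("|".join(To_remove_lst), "", regex=True)

# Escaped, every list item is matched literally.
escaped = df["name"].str.replace("|".join(map(re.escape, To_remove_lst)), "", regex=True)

print(raw.tolist())      # ['Kitty', 'Puppy (beta)', 'stackoverflow']
print(escaped.tolist())  # ['Kitty', 'Puppy', 'stackoverflow']
```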

If you want to remove only whole words, use a nested list comprehension:

df['name'] = [' '.join([y for y in x.split() if y not in To_remove_lst]) for x in df['name']]
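A runnable sketch of the word-only approach on hypothetical data. The list is converted to a set once so that each membership test is constant time; the comprehension itself is the same as above.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"name": ["Hello Kitty", "this is example", "Hellooo World"]})
To_remove_lst = ["Hello", "this"]

# Build the lookup set once instead of scanning the list for every token.
to_remove = set(To_remove_lst)

# Split on whitespace, drop tokens that exactly equal a list item, rejoin.
# "Hellooo" survives because only whole tokens are compared.
# Assumes every value is a string (NaN would fail on .split()).
df["name"] = [" ".join([y for y in x.split() if y not in to_remove]) for x in df["name"]]

print(df["name"].tolist())  # ['Kitty', 'is example', 'Hellooo World']
```

Because tokens are compared exactly, a word with attached punctuation such as "Hello," is not removed, and runs of whitespace are collapsed to single spaces.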

Removing list of strings from column in pandas

Use the word-boundary anchor \b so that only whole words are matched:

In [46]: df.My_Column.str.replace(r'\b(?:{})\b'.format('|'.join(list_strings)), '', regex=True)
Out[46]:
0 details about your goal
1 expected and actual results
2 show some code anywhere
Name: My_Column, dtype: object
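Where the \b anchors sit relative to the | alternation matters. Without a group, r'\b{}\b' only attaches \b to the first and last alternatives; a non-capturing group (?:...) applies both boundaries to every item. A small sketch on made-up strings:

```python
import re

import pandas as pd

# Hypothetical sample data for illustration only.
s = pd.Series(["cat catalog dog", "hotdog dog"])
list_strings = ["cat", "dog"]

words = "|".join(map(re.escape, list_strings))

# Ungrouped: the pattern reads as "\bcat" OR "dog\b", so "catalog" loses its
# prefix and "hotdog" loses its suffix.
ungrouped = s.str.replace(r"\b{}\b".format(words), "", regex=True)

# Grouped: both boundaries apply to every alternative, so only whole words go.
grouped = s.str.replace(r"\b(?:{})\b".format(words), "", regex=True)

print(ungrouped.tolist())  # [' alog ', 'hot ']
print(grouped.tolist())    # [' catalog ', 'hotdog ']
```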

How to remove specific strings from a list in pyspark dataframe column

You can use regexp_replace together with '|'.join(). regexp_replace replaces every substring that matches a regular expression, and '|'.join() combines the list elements into a single alternation pattern separated by |. Together, they remove every part of the column that matches an item in the list.

import pyspark.sql.functions as F

df = df.withColumn('column_a', F.regexp_replace('column_a', '|'.join(lst), ''))
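Spark's regexp_replace uses Java regular expressions, so list items containing metacharacters such as . should be escaped before joining, just as with pandas. For a plain escaped alternation, Python's re should behave the same way, which makes it a convenient place to preview the pattern without a Spark session. The helper build_pattern and the sample values below are hypothetical.

```python
import re


def build_pattern(lst, whole_words=False):
    """Hypothetical helper: join list items into one escaped alternation."""
    body = "|".join(re.escape(item) for item in lst)
    # Optionally wrap in a non-capturing group so \b applies to every item.
    return r"\b(?:{})\b".format(body) if whole_words else body


lst = ["foo", "a.b"]
values = ["foo bar", "a.b axb", "food"]

# Preview locally with Python's re; "axb" survives because "." is escaped.
substr = [re.sub(build_pattern(lst), "", v) for v in values]
words = [re.sub(build_pattern(lst, whole_words=True), "", v) for v in values]

print(substr)  # [' bar', ' axb', 'd']
print(words)   # [' bar', ' axb', 'food']
```

The resulting string would then be passed as the pattern, for example F.regexp_replace('column_a', build_pattern(lst), '').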

Keep strings present in a list from a column in pandas

Use Series.str.findall with Series.str.join:

To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]

df['name'] = df['name'].str.findall('|'.join(To_keep_lst)).str.join(', ')
print(df)
ID name
0 1 Kitty
1 2 Kandy
2 3 Micky, Loudy
3 4
4 5 Kitty, Wicky
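A sketch of the same findall approach on hypothetical data, contrasting the plain alternation with a whole-word version. The group is non-capturing on purpose: with a capturing group, str.findall returns the group contents rather than the full match.

```python
import re

import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"ID": [1, 2, 3],
                   "name": ["Kitty and Kandy", "Kittycat", "nothing here"]})
To_keep_lst = ["Kitty", "Kandy"]

body = "|".join(map(re.escape, To_keep_lst))

# Plain alternation: also picks up "Kitty" inside "Kittycat".
loose = df["name"].str.findall(body).str.join(", ")

# Whole words only; rows without a match become an empty string.
strict = df["name"].str.findall(r"\b(?:{})\b".format(body)).str.join(", ")

print(loose.tolist())   # ['Kitty, Kandy', 'Kitty', '']
print(strict.tolist())  # ['Kitty, Kandy', '', '']
```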

Python Pandas remove items from list in one column from the list in other column

You can write your own function that builds the new list, then use apply with axis=1 to run it on each row:

import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'c', 'f', 'd'], ['d', 'c', 'e', 'f']],
                   'col2': [['a', 'b'], ['a', 'f'], ['d', 'e', 'f', 'c']]})

def func(row):
    # Items of col1 that do not appear in col2 for this row
    return list(set(row['col1']) - set(row['col2']))

df['col3'] = df.apply(func, axis=1)

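The function converts each list to a set and uses set subtraction to remove the values in col2 from col1. Because sets are unordered, the order of the remaining items is not guaranteed, and duplicates in col1 are collapsed.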
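If the order of col1 should be kept, one alternative is to filter col1 against a set built from col2 instead of subtracting sets. This is a sketch using the same data; remove_items is a hypothetical helper name, and a list comprehension over zip replaces apply.

```python
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'c', 'f', 'd'], ['d', 'c', 'e', 'f']],
                   'col2': [['a', 'b'], ['a', 'f'], ['d', 'e', 'f', 'c']]})


def remove_items(left, right):
    """Hypothetical helper: keep items of left (in order) that are not in right."""
    exclude = set(right)  # build the lookup set once per row
    return [item for item in left if item not in exclude]


# Pair the two columns row by row; assignment is positional.
df['col3'] = [remove_items(left, right) for left, right in zip(df['col1'], df['col2'])]

print(df['col3'].tolist())  # [['c'], ['c', 'd'], []]
```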

Method to remove a caret from list of strings?

This produces the result you are looking for, I think: a DataFrame with two columns, Name_of_String and AverageTime, that includes every item in strings; items not found in the dictionary get an AverageTime of 2.91.

Be careful when typing your code: the question switches between Name_of_String and Name_Of_String, which will raise errors if they are meant to be the same column. Also, dictionaries are written with {}, not []; square brackets create a list, which cannot hold key: value pairs.

import pandas as pd

strings = ['MyString1^111',
'MyString2',
'MyString3',
'MyString4^222',
'MyString5^888']

noCaret = [x.replace('^', '') for x in strings]

dictionary = {"MyString2": 3.76, "MyString3": 2.66}

stringsDF = pd.DataFrame(data={"Name_of_String": noCaret})
stringsDF["AverageTime"] = stringsDF["Name_of_String"].map(dictionary).fillna(2.91)

stringsDF
#Out:
# Name_of_String AverageTime
#0 MyString1111 2.91
#1 MyString2 3.76
#2 MyString3 2.66
#3 MyString4222 2.91
#4 MyString5888 2.91
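The caret can also be removed inside pandas with Series.str.replace. Since ^ is the start-of-string anchor in regular expressions, passing regex=False explicitly makes it a literal character. A sketch of the same pipeline under that approach:

```python
import pandas as pd

strings = ['MyString1^111', 'MyString2', 'MyString3', 'MyString4^222', 'MyString5^888']
dictionary = {"MyString2": 3.76, "MyString3": 2.66}

stringsDF = pd.DataFrame({"Name_of_String": strings})

# regex=False: "^" is removed as a literal character, not treated as an anchor.
stringsDF["Name_of_String"] = stringsDF["Name_of_String"].str.replace("^", "", regex=False)

# Look up each name; names missing from the dictionary get the default 2.91.
stringsDF["AverageTime"] = stringsDF["Name_of_String"].map(dictionary).fillna(2.91)

print(stringsDF)
```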

Remove string from one column if present in string of another column pandas

Here's a solution:

df = (
df.reset_index()
.assign(new_col=df.reset_index()
.pipe(lambda x: x.assign(x=x['company'].str.split(' ')))
.explode('x')
.loc[lambda x: x['x'] != x['city'], 'x']
.groupby(level=0)
.agg(list)
.str.join(' ')
)
.set_index('index')
)

Output:

>>> df
company postal_code name state city new_col
index
2000-01-01 abc gresham co 97080 john mi gresham abc co
2000-01-01 startup llc 97080 jeff hi portland startup llc
2001-01-01 beaverton business biz 99999 sam ca beaverton business biz
2002-01-01 andy co 92222 joey or los angeles andy co

One-liner:

df = df.reset_index().assign(new_col=df.reset_index().pipe(lambda x: x.assign(x=x['company'].str.split(' '))).explode('x').loc[lambda x: x['x'] != x['city'], 'x'].groupby(level=0).agg(list).str.join(' ')).set_index('index')
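A simpler row-wise alternative is a list comprehension over the two columns. Because the list is assigned positionally, it should work even though the date index contains duplicates. This sketch rebuilds only the company and city columns from the output above; like the explode version, it compares single words, so a multi-word city such as "los angeles" is not removed from a company name. One difference: a row whose words are all removed becomes an empty string here rather than NaN.

```python
import pandas as pd

# Data reconstructed from the company and city columns shown above.
df = pd.DataFrame(
    {"company": ["abc gresham co", "startup llc", "beaverton business biz", "andy co"],
     "city": ["gresham", "portland", "beaverton", "los angeles"]},
    index=pd.to_datetime(["2000-01-01", "2000-01-01", "2001-01-01", "2002-01-01"]),
)

# For each row, drop the words of company that exactly equal the city.
df["new_col"] = [" ".join(w for w in company.split() if w != city)
                 for company, city in zip(df["company"], df["city"])]

print(df["new_col"].tolist())  # ['abc co', 'startup llc', 'business biz', 'andy co']
```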

