How to remove strings present in a list from a column in pandas
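Use str.replace with a regex alternation if you also want to remove substrings (regex=True is required in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default):
df['name'] = df['name'].str.replace('|'.join(To_remove_lst), '', regex=True)
If the strings may contain regex special characters, escape them first:
import re
df['name'] = df['name'].str.replace('|'.join(map(re.escape, To_remove_lst)), '', regex=True)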
print (df)
   ID           name
0   1          Kitty
1   2          Puppy
2   3     is example
3   4  stackoverflow
4   5          World
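To see why the escaping matters, here is a small sketch using plain re.sub, which applies the same kind of pattern that str.replace applies to each element. The removal list and sample strings are hypothetical, chosen only because they contain regex metacharacters.

```python
import re

# Hypothetical removal list: "(beta)" and "a.b" contain regex metacharacters.
to_remove = ["(beta)", "a.b"]
names = ["Kitty (beta)", "axb a.b"]

raw_pattern = "|".join(to_remove)                       # "(beta)|a.b"
escaped_pattern = "|".join(map(re.escape, to_remove))   # metacharacters escaped

# Unescaped: "(beta)" is read as a group and "." matches any character.
unescaped = [re.sub(raw_pattern, "", s) for s in names]
# Escaped: only the literal strings are removed.
escaped = [re.sub(escaped_pattern, "", s) for s in names]

print(unescaped)  # ['Kitty ()', ' ']
print(escaped)    # ['Kitty ', 'axb ']
```

Without escaping, the parentheses survive and "axb" is removed by accident, so escaping is the safer default when the list comes from user data.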
To remove only whole words, use a nested list comprehension instead:
df['name'] = [' '.join([y for y in x.split() if y not in To_remove_lst]) for x in df['name']]
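The difference between the two approaches is easiest to see side by side. This is a minimal sketch on plain Python strings (the sample data is made up), with the inner expression of the list comprehension applied to each string.

```python
import re

to_remove = ["is"]                        # hypothetical removal list
names = ["this is example", "island"]     # hypothetical column values

pattern = "|".join(map(re.escape, to_remove))

# Substring removal: every occurrence of "is" goes, even inside other words.
substrings_removed = [re.sub(pattern, "", s) for s in names]

# Word removal: only whitespace-separated tokens equal to a list entry go.
words_removed = [" ".join(w for w in s.split() if w not in to_remove)
                 for s in names]

print(substrings_removed)  # ['th  example', 'land']
print(words_removed)       # ['this example', 'island']
```

Note that the split/join version also collapses repeated whitespace, which the regex version does not.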
Removing list of strings from column in pandas
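Use the word-boundary expression \b, wrapping the alternation in a non-capturing group so that \b applies to every alternative:
In [46]: df.My_Column.str.replace(r'\b(?:{})\b'.format('|'.join(list_strings)), '', regex=True)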
Out[46]:
0 details about your goal
1 expected and actual results
2 show some code anywhere
Name: My_Column, dtype: object
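One detail worth checking with this kind of pattern is how \b interacts with |. Alternation has the lowest precedence, so without a group the boundaries attach only to the first and last alternatives. The sketch below uses re.sub on a made-up sentence and word list to show the effect.

```python
import re

list_strings = ["some", "code"]                 # hypothetical words to remove
text = "somewhere a barcode has some code"      # hypothetical cell value

joined = "|".join(map(re.escape, list_strings))

# Ungrouped: parsed as (\bsome) | (code\b), so prefixes and suffixes match.
ungrouped = re.sub(r"\b{}\b".format(joined), "", text)

# Grouped: \b(?:some|code)\b matches whole words only.
grouped = re.sub(r"\b(?:{})\b".format(joined), "", text)

print(repr(ungrouped))  # 'where a bar has  '
print(repr(grouped))    # 'somewhere a barcode has  '
```

The leftover double spaces can be cleaned up afterwards, for example with a second replace of r'\s+' by a single space followed by str.strip().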
How to remove specific strings from a list in pyspark dataframe column
You can use regexp_replace with '|'.join(). The first is commonly used to replace substring matches; the latter joins the elements of the list with |. Combined, they remove every part of the column value that matches an entry in the list.
import pyspark.sql.functions as F
df = df.withColumn('column_a', F.regexp_replace('column_a', '|'.join(lst), ''))
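If some list entries are prefixes of others, the order of the alternatives matters, because backtracking regex engines (Python's re and the Java engine Spark relies on) try alternatives left to right. The sketch below uses Python's re only to illustrate the ordering effect; the list and text are hypothetical, and the same joined pattern string is what would be handed to F.regexp_replace.

```python
import re

lst = ["cat", "catalog"]        # hypothetical list; "cat" is a prefix of "catalog"
text = "catalog cat"

# Naive join: "cat" wins at position 0 and leaves "alog" behind.
naive_pattern = "|".join(lst)
naive = re.sub(naive_pattern, "", text)

# Sorting longest first lets the longer entry match before its prefix.
ordered_pattern = "|".join(sorted(lst, key=len, reverse=True))
longest_first = re.sub(ordered_pattern, "", text)

print(repr(naive))          # 'alog '
print(repr(longest_first))  # ' '
```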
Keep strings present in a list from a column in pandas
Use Series.str.findall with Series.str.join:
To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]
df['name'] = df['name'].str.findall('|'.join(To_keep_lst)).str.join(', ')
print (df)
   ID          name
0   1         Kitty
1   2         Kandy
2   3  Micky, Loudy
3   4
4   5  Kitty, Wicky
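What findall followed by join does to each value can be sketched with the standard library. The input strings here are hypothetical (the original column values are not shown above); only To_keep_lst is taken from the answer.

```python
import re

to_keep = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]
names = ["Kitty is here", "Micky and Loudy", "Nobody"]   # hypothetical values

pattern = "|".join(to_keep)

# findall returns every match in order; join turns the list back into a string.
kept = [", ".join(re.findall(pattern, s)) for s in names]

print(kept)  # ['Kitty', 'Micky, Loudy', '']
```

A value with no matches becomes an empty string, which is why row 3 in the output above is blank.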
Python Pandas remove items from list in one column from the list in other column
You can create your own function to build the new lists and then use apply on the dataframe to execute the function for each row, like so:
import pandas as pd
df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'c', 'f', 'd'], ['d', 'c', 'e', 'f']],
                   'col2': [['a', 'b'], ['a', 'f'], ['d', 'e', 'f', 'c']]})
def func(row):
    # Values in col1 that do not appear in col2 (set difference).
    return list(set(row['col1']) - set(row['col2']))
df['col3'] = df.apply(func, axis=1)
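The function converts the lists to sets and uses set subtraction to remove the values contained in col2 from col1. Because sets are unordered, the order of the remaining items is not guaranteed, and duplicates are dropped.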
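If the original order or duplicates in col1 matter, a list comprehension with a set lookup is one possible alternative. This is a sketch rather than the answer's method; remove_items is a hypothetical helper name, shown here on plain lists that mirror the two columns.

```python
def remove_items(source, to_drop):
    """Return the items of source not present in to_drop, keeping order."""
    drop = set(to_drop)                      # set gives fast membership tests
    return [item for item in source if item not in drop]

col1 = [['a', 'b', 'c'], ['a', 'c', 'f', 'd'], ['d', 'c', 'e', 'f']]
col2 = [['a', 'b'], ['a', 'f'], ['d', 'e', 'f', 'c']]

col3 = [remove_items(a, b) for a, b in zip(col1, col2)]

print(col3)  # [['c'], ['c', 'd'], []]
```

On a dataframe, the same helper could be called row by row with df.apply(lambda row: remove_items(row['col1'], row['col2']), axis=1).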
Method to remove a caret from list of strings?
I think this produces the result you are looking for: a dataframe with two columns, Name_of_String and AverageTime. All items in strings are included, and those that are not in the dictionary get an AverageTime of 2.91.
Be careful when typing your code: you have switched between Name_of_String and Name_Of_String in your question, which will produce errors if they are supposed to be the same column. Also, dictionaries use {}, not [], which cannot take key: value pairs.
import pandas as pd
strings = ['MyString1^111',
           'MyString2',
           'MyString3',
           'MyString4^222',
           'MyString5^888']
noCaret = [x.replace('^', '') for x in strings]
dictionary = {"MyString2": 3.76, "MyString3": 2.66}
stringsDF = pd.DataFrame(data={"Name_of_String": noCaret})
stringsDF["AverageTime"] = stringsDF["Name_of_String"].map(dictionary).fillna(2.91)
stringsDF
#Out:
# Name_of_String AverageTime
#0 MyString1111 2.91
#1 MyString2 3.76
#2 MyString3 2.66
#3 MyString4222 2.91
#4 MyString5888 2.91
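For readers less familiar with map followed by fillna, the per-element logic is roughly a dictionary lookup with a default. The sketch below reproduces the table above without pandas; DEFAULT_TIME is a name introduced here for the 2.91 fallback.

```python
strings = ['MyString1^111', 'MyString2', 'MyString3',
           'MyString4^222', 'MyString5^888']
dictionary = {"MyString2": 3.76, "MyString3": 2.66}

DEFAULT_TIME = 2.91   # fallback for names missing from the dictionary

# Strip the caret, then look the cleaned name up with a default value.
no_caret = [s.replace('^', '') for s in strings]
rows = [(name, dictionary.get(name, DEFAULT_TIME)) for name in no_caret]

for name, average_time in rows:
    print(name, average_time)
```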
Remove string from one column if present in string of another column pandas
Here's a solution:
df = (
    df.reset_index()
    .assign(new_col=df.reset_index()
        .pipe(lambda x: x.assign(x=x['company'].str.split(' ')))
        .explode('x')
        .loc[lambda x: x['x'] != x['city'], 'x']
        .groupby(level=0)
        .agg(list)
        .str.join(' ')
    )
    .set_index('index')
)
Output:
>>> df
                           company  postal_code  name  state         city       new_col
index
2000-01-01          abc gresham co        97080  john     mi      gresham        abc co
2000-01-01             startup llc        97080  jeff     hi     portland   startup llc
2001-01-01  beaverton business biz        99999   sam     ca    beaverton  business biz
2002-01-01                 andy co        92222  joey     or  los angeles       andy co
One-liner:
df = df.reset_index().assign(new_col=df.reset_index().pipe(lambda x: x.assign(x=x['company'].str.split(' '))).explode('x').loc[lambda x: x['x'] != x['city'], 'x'].groupby(level=0).agg(list).str.join(' ')).set_index('index')
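The pipeline above boils down to a per-row rule: split company on spaces, drop the words equal to city, and join the rest. A minimal sketch of that rule on plain tuples follows; drop_city_words is a hypothetical helper name, and the rows are copied from the output shown above.

```python
def drop_city_words(company, city):
    """Remove the words of company that are exactly equal to city."""
    return " ".join(word for word in company.split(" ") if word != city)

# (company, city) pairs taken from the example output.
rows = [
    ("abc gresham co", "gresham"),
    ("startup llc", "portland"),
    ("beaverton business biz", "beaverton"),
    ("andy co", "los angeles"),
]

new_col = [drop_city_words(company, city) for company, city in rows]

print(new_col)  # ['abc co', 'startup llc', 'business biz', 'andy co']
```

As in the original pipeline, a multi-word city such as "los angeles" is never removed, because it is compared against single words. On the dataframe, the helper could be applied with df.apply(lambda r: drop_city_words(r['company'], r['city']), axis=1), which also avoids the reset_index step needed for the duplicated index.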