Pandas: replace substring in string
Use replace with a dict of replacements and regex=True:
df['url'] = df['url'].replace({'icashier.alipay.com': 'aliexpress.com'}, regex=True)
print (df)
url
0 aliexpress.com/catalog/2758186/detail.aspx
1 aliexpress.com/catalog/2758186/detail.aspx
2 aliexpress.com/catalog/2758186/detail.aspx
3 vk.com
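A minimal, self-contained sketch of the same idea follows. The sample URLs are made up for illustration. Because regex=True treats each dict key as a pattern, an unescaped dot matches any character; wrapping the key in re.escape is one way to keep the match literal.

```python
import re
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({
    "url": ["icashier.alipay.com/catalog/2758186/detail.aspx", "vk.com"]
})

# re.escape turns each "." into a literal dot instead of "any character".
pattern = re.escape("icashier.alipay.com")

# With regex=True the key is matched as a substring pattern, not a whole cell.
df["url"] = df["url"].replace({pattern: "aliexpress.com"}, regex=True)
print(df)
```

Without regex=True, Series.replace only swaps cells whose entire value equals the key, so the first row would stay unchanged.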
Replacing Substring with another string from column Pandas
Use the same idea as yours (apply(), replace()), with a small change to how replace() is called.
new_df["String"] = new_df.apply(
    # Type 1 rows take int_id, all other rows take ext_id; str() guards against integer ids.
    lambda row: row["String"].replace("id", str(row["int_id"])) if row["Type"] == 1 else row["String"].replace("id", str(row["ext_id"])),
    axis=1
)
output:
Type String ext_id int_id
0 1 UK2820BC 2393 2820
1 1 UK1068BC 4816 1068
2 0 UK4166BC 4166 3625
3 0 UK2803BC 2803 1006
4 1 UK2697BC 1189 2697
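As an alternative to a row-wise apply(), the same result can be sketched by first choosing the id per row and then doing the string replacement in a list comprehension. The input frame below is an assumption: it guesses that String originally held a literal "id" placeholder (for example "UKidBC") and that the id columns are integers.

```python
import numpy as np
import pandas as pd

# Assumed input: "id" is a literal placeholder inside String.
new_df = pd.DataFrame({
    "Type": [1, 0],
    "String": ["UKidBC", "UKidBC"],
    "ext_id": [2393, 4166],
    "int_id": [2820, 3625],
})

# Pick int_id where Type == 1, otherwise ext_id.
chosen = np.where(new_df["Type"] == 1, new_df["int_id"], new_df["ext_id"])

# str.replace on a Python string needs a string argument, hence str(i).
new_df["String"] = [
    s.replace("id", str(i)) for s, i in zip(new_df["String"], chosen)
]
print(new_df)
```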
How to replace text in a string column of a Pandas dataframe?
Use the vectorised str method replace:
df['range'] = df['range'].str.replace(',','-')
df
range
0 (2-30)
1 (50-290)
EDIT: so if we look at what you tried and why it didn't work:
df['range'].replace(',','-',inplace=True)
From the docs we see this description:
str or regex: str: string exactly matching to_replace will be replaced with value
So because the str values do not match exactly, no replacement occurs. Compare with the following:
df = pd.DataFrame({'range':['(2,30)',',']})
df['range'].replace(',','-', inplace=True)
df['range']
0 (2,30)
1 -
Name: range, dtype: object
Here we get an exact match on the second row, so the replacement occurs.
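To make the difference concrete, here is a small sketch that runs the three variants side by side on the same two values. The variable names are only for illustration.

```python
import pandas as pd

s = pd.Series(["(2,30)", ","])

# Series.replace without regex: only cells equal to "," are changed.
exact = s.replace(",", "-")

# Series.replace with regex=True: the pattern is matched inside each cell.
partial = s.replace(",", "-", regex=True)

# Series.str.replace always works on substrings; regex=False keeps "," literal.
vectorised = s.str.replace(",", "-", regex=False)

print(exact.tolist(), partial.tolist(), vectorised.tolist())
```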
Replace whole string if it contains substring in pandas
You can use str.contains to mask the rows that contain 'ball' and then overwrite with the new value:
In [71]:
df.loc[df['sport'].str.contains('ball'), 'sport'] = 'ball sport'
df
Out[71]:
name sport
0 Bob tennis
1 Jane ball sport
2 Alice ball sport
To make it case-insensitive, pass case=False:
df.loc[df['sport'].str.contains('ball', case=False), 'sport'] = 'ball sport'
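If the column can contain missing values, the result of str.contains may hold missing entries instead of plain True/False (depending on dtype and pandas version), and such a mask may be rejected by .loc. Passing na=False is one way to treat missing cells as non-matches. The sketch below uses made-up data with one missing value.

```python
import pandas as pd

# Hypothetical data; the last row has no sport recorded.
df = pd.DataFrame({
    "name": ["Bob", "Jane", "Alice", "Eve"],
    "sport": ["tennis", "Volleyball", "basketball", None],
})

# na=False makes missing cells count as "does not contain".
mask = df["sport"].str.contains("ball", case=False, na=False)

df.loc[mask, "sport"] = "ball sport"
print(df)
```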
Replace part of the string in pandas data frame
It seems you need Series.replace:
print (df)
val
0 HF - Antartica
1 HF - America
2 HF - Asia
print (df.val.replace({'HF -':'Hi'}, regex=True))
0 Hi Antartica
1 Hi America
2 Hi Asia
Name: val, dtype: object
Similar solution with str.replace:
print (df.val.str.replace('HF -', 'Hi'))
0 Hi Antartica
1 Hi America
2 Hi Asia
Name: val, dtype: object
How to replace substrings in strings in pandas dataframe
One way is to escape your characters using re, then use pd.Series.str.replace.
import pandas as pd
import re
bad_chars = ['?', '!', ',', ';', "'", '|', '-', '--', '(', ')',
'[', ']', '{', '}', ':', '&', '\n']
df = pd.DataFrame({'page': ['hello?', 'problems|here', 'nothingwronghere', 'nobrackets[]']})
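# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
df['page'] = df['page'].str.replace('|'.join([re.escape(s) for s in bad_chars]), '', regex=True)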
print(df)
# page
# 0 hello
# 1 problemshere
# 2 nothingwronghere
# 3 nobrackets
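The escaping step can be wrapped in a small helper. The function name strip_substrings is hypothetical, and sorting the tokens longest-first is an extra precaution of this sketch (so that a token such as '--' is tried before '-'), not something the answer above requires.

```python
import re
import pandas as pd

def strip_substrings(series, substrings):
    """Remove every literal substring in `substrings` from a string Series."""
    # Longest first, so multi-character tokens are tried before their prefixes.
    ordered = sorted(substrings, key=len, reverse=True)
    # re.escape keeps characters like "?", "|" and "[" from acting as regex syntax.
    pattern = "|".join(re.escape(s) for s in ordered)
    return series.str.replace(pattern, "", regex=True)

pages = pd.Series(["hello?", "problems|here", "nothingwronghere", "nobrackets[]", "a--b"])
cleaned = strip_substrings(pages, ["?", "|", "[", "]", "-", "--"])
print(cleaned.tolist())
```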
Pandas Dataframe: Replace substring in col1 identified by string in col2, with string from col3
Use a custom lambda function with if-else to test for missing values such as NaN or None:
# Parentheses let the conditional expression span several lines.
f = lambda x: (x['main_string'].replace(x['target'], x['replacement'])
               if pd.notna(x['target'])
               else x['main_string'])
df['out'] = df.apply(f, axis=1)
print (df)
main_string target replacement out
0 Hello My Name is XXX XXX John Hello My Name is John
1 Hello My name is YYY YYY Mary Hello My name is Mary
2 Hello my Name is Rob Nan None Hello my Name is Rob
3 Hello My name is ZZZ ZZZ Kate Hello My name is Kate
Alternative solution with list comprehension:
df['out'] = [a.replace(b, c) if pd.notna(b) else a
for a,b,c in df[['main_string','target','replacement']].to_numpy()]
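A self-contained version of the list-comprehension approach might look like the sketch below. The two sample rows are cut down from the output above, and the extra pd.notna check on the replacement is an added assumption so that a missing replacement also leaves the string untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "main_string": ["Hello My Name is XXX", "Hello my Name is Rob"],
    "target": ["XXX", np.nan],
    "replacement": ["John", None],
})

# Replace only when both the target and the replacement are present.
df["out"] = [
    a.replace(b, c) if pd.notna(b) and pd.notna(c) else a
    for a, b, c in zip(df["main_string"], df["target"], df["replacement"])
]
print(df["out"].tolist())
```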
Replace whole string which contains substring in whole dataframe in pandas
Note that I changed the example to contain zzabc123zz, since you mention "substring" in your question but the example you provided did not show that use case.
You can use df.replace with a regex.
import pandas as pd
import re
df = pd.DataFrame({'col_1': ['abc', 'abc123', 'abc456'],
'col_2': ['abc123', '123', 'zzabc123zz']})
df.replace(re.compile('.*abc123.*'), 'test', inplace=True)
print(df)
Outputs
col_1 col_2
0 abc test
1 test 123
2 abc456 test
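A variant of the same call, sketched here as an assumption rather than as the answer's own code, passes the pattern as a plain string with regex=True and returns a new frame instead of modifying df in place.

```python
import pandas as pd

df = pd.DataFrame({"col_1": ["abc", "abc123", "abc456"],
                   "col_2": ["abc123", "123", "zzabc123zz"]})

# ".*abc123.*" matches the whole cell whenever it contains "abc123",
# so the entire cell value is replaced by "test".
result = df.replace(r".*abc123.*", "test", regex=True)
print(result)
```

Leaving out inplace=True keeps the original frame available for comparison.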