Pandas: Replace Substring in String

Use replace with a dict of replacements and regex=True, so that substrings are matched rather than whole values:

df['url'] = df['url'].replace({'icashier.alipay.com': 'aliexpress.com'}, regex=True)
print (df)
                                          url
0  aliexpress.com/catalog/2758186/detail.aspx
1  aliexpress.com/catalog/2758186/detail.aspx
2  aliexpress.com/catalog/2758186/detail.aspx
3                                      vk.com
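
With regex=True the dict key is treated as a regular expression, so the dots in the host name would match any character. A minimal sketch of the same replacement with the key escaped follows; the sample frame is hypothetical and only modelled on the output above.

```python
import re
import pandas as pd

# Hypothetical sample data, modelled on the output shown above.
df = pd.DataFrame({'url': ['icashier.alipay.com/catalog/2758186/detail.aspx',
                           'vk.com']})

# Without regex=True only cells equal to the whole key are replaced,
# so nothing changes here.
unchanged = df['url'].replace({'icashier.alipay.com': 'aliexpress.com'})

# re.escape makes the dots literal before the key is used as a pattern.
pattern = re.escape('icashier.alipay.com')
replaced = df['url'].replace({pattern: 'aliexpress.com'}, regex=True)

print(replaced)
```

For a short host name the unescaped key usually gives the same result, but escaping avoids accidental matches.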

Replacing Substring with another string from column Pandas

Use the same idea you already have (apply() with replace()), but change how replace() is called:

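# str() guards against numeric id columns; the column is named "Type" in the output below
new_df["String"] = new_df.apply(
    lambda row: row["String"].replace("id", str(row["int_id"]))
    if row["Type"] == 1
    else row["String"].replace("id", str(row["ext_id"])),
    axis=1
)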

output:

   Type    String  ext_id  int_id
0     1  UK2820BC    2393    2820
1     1  UK1068BC    4816    1068
2     0  UK4166BC    4166    3625
3     0  UK2803BC    2803    1006
4     1  UK2697BC    1189    2697
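
As a self-contained sketch of the same idea: the input frame below is an assumption (the original strings are not shown, so 'UKidBC' is a guess at the placeholder form), and the id is picked per row with Series.where before the replacement.

```python
import pandas as pd

# Hypothetical input: 'UKidBC' is an assumed placeholder format.
new_df = pd.DataFrame({
    'Type': [1, 0],
    'String': ['UKidBC', 'UKidBC'],
    'ext_id': [2393, 4166],
    'int_id': [2820, 3625],
})

# Keep int_id where Type == 1, otherwise fall back to ext_id.
ids = new_df['int_id'].where(new_df['Type'] == 1, new_df['ext_id']).astype(str)

# Plain Python str.replace per row; no DataFrame.apply needed.
new_df['String'] = [s.replace('id', i) for s, i in zip(new_df['String'], ids)]

print(new_df)
```

Choosing the id column up front keeps the per-row work to a single string replacement.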

How to replace text in a string column of a Pandas dataframe?

Use the vectorised str method replace:

df['range'] = df['range'].str.replace(',','-')

df
range
0 (2-30)
1 (50-290)

EDIT: so if we look at what you tried and why it didn't work:

df['range'].replace(',','-',inplace=True)

from the docs we see this description:

str or regex: str: string exactly matching to_replace will be replaced
with value

Because no value matches ',' exactly, no replacement occurs. Compare with the following:

df = pd.DataFrame({'range':['(2,30)',',']})
df['range'].replace(',','-', inplace=True)

df['range']

0 (2,30)
1 -
Name: range, dtype: object

Here we get an exact match on the second row, so the replacement occurs.
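
The three behaviours can be compared side by side. This is a small sketch on the same two-row frame; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'range': ['(2,30)', ',']})

# Series.replace matches whole cell values by default.
exact = df['range'].replace(',', '-')

# Series.str.replace works on substrings; regex=False keeps ',' literal.
substr = df['range'].str.replace(',', '-', regex=False)

# Series.replace with regex=True also substitutes inside each string.
via_regex = df['range'].replace(',', '-', regex=True)

print(exact.tolist(), substr.tolist(), via_regex.tolist())
```

Assigning the result back to the column, rather than using inplace=True on a selected column, is generally the safer pattern, since newer pandas versions may not propagate such in-place changes to the DataFrame.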

Replace whole string if it contains substring in pandas

You can use str.contains to mask the rows that contain 'ball' and then overwrite with the new value:

In [71]:
df.loc[df['sport'].str.contains('ball'), 'sport'] = 'ball sport'
df

Out[71]:
    name       sport
0    Bob      tennis
1   Jane  ball sport
2  Alice  ball sport

To make it case-insensitive, pass case=False:

df.loc[df['sport'].str.contains('ball', case=False), 'sport'] = 'ball sport'
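
If the column can contain missing values, str.contains returns missing entries in the mask, which cannot be used for boolean indexing. A sketch with na=False follows; the sample frame, including the extra row with a missing sport, is made up for illustration.

```python
import pandas as pd

# Hypothetical data; the last row has no sport recorded.
df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice', 'Eve'],
                   'sport': ['tennis', 'Volleyball', 'basketball', None]})

# na=False treats missing values as "does not contain".
mask = df['sport'].str.contains('ball', case=False, na=False)
df.loc[mask, 'sport'] = 'ball sport'

print(df)
```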

replace part of the string in pandas data frame

It seems you need Series.replace:

print (df)
val
0 HF - Antartica
1 HF - America
2 HF - Asia

print (df.val.replace({'HF -':'Hi'}, regex=True))
0 Hi Antartica
1 Hi America
2 Hi Asia
Name: val, dtype: object

Similar solution with str.replace:

print (df.val.str.replace('HF -', 'Hi'))
0 Hi Antartica
1 Hi America
2 Hi Asia
Name: val, dtype: object

How to replace substrings in strings in pandas dataframe

One way is to escape your characters with re.escape, join them into a single pattern, then pass it to pd.Series.str.replace.

import pandas as pd
import re

bad_chars = ['?', '!', ',', ';', "'", '|', '-', '--', '(', ')',
'[', ']', '{', '}', ':', '&', '\n']

df = pd.DataFrame({'page': ['hello?', 'problems|here', 'nothingwronghere', 'nobrackets[]']})

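# regex=True is required in pandas 2.0+, where str.replace treats the pattern literally by default
df['page'] = df['page'].str.replace('|'.join([re.escape(s) for s in bad_chars]), '', regex=True)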

print(df)

# page
# 0 hello
# 1 problemshere
# 2 nothingwronghere
# 3 nobrackets
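
Whether the joined pattern is interpreted as a regular expression depends on the regex argument of str.replace, whose default changed to False in pandas 2.0. A short sketch of the difference, using a reduced character list chosen only for illustration:

```python
import re
import pandas as pd

# Reduced list of characters to strip (illustrative subset).
bad_chars = ['?', '|', '[', ']']
pattern = '|'.join(re.escape(s) for s in bad_chars)

pages = pd.Series(['hello?', 'problems|here', 'nobrackets[]'])

# Pattern used as a regular expression: each escaped character is removed.
cleaned = pages.str.replace(pattern, '', regex=True)

# Pattern used as a literal string: it never occurs, so nothing changes.
literal = pages.str.replace(pattern, '', regex=False)

print(cleaned.tolist())
print(literal.tolist())
```

Passing regex explicitly makes the code behave the same across pandas versions.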

Pandas Dataframe: Replace substring in col1 identified by string in col2, with string from col3

Use a custom lambda function with if-else to test for missing values (NaN or None):

# parentheses let the conditional expression span several lines
f = lambda x: (x['main_string'].replace(x['target'], x['replacement'])
               if pd.notna(x['target'])
               else x['main_string'])
df['out'] = df.apply(f, axis=1)
print (df)
            main_string target replacement                    out
0  Hello My Name is XXX    XXX        John  Hello My Name is John
1  Hello My name is YYY    YYY        Mary  Hello My name is Mary
2  Hello my Name is Rob    NaN        None   Hello my Name is Rob
3  Hello My name is ZZZ    ZZZ        Kate  Hello My name is Kate

Alternative solution with list comprehension:

df['out'] = [a.replace(b, c) if pd.notna(b) else a 
for a,b,c in df[['main_string','target','replacement']].to_numpy()]
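
For reference, a self-contained sketch of the row-wise version with a named function. The helper name substitute is hypothetical, and the extra check on replacement is an assumption added here, since str.replace raises a TypeError when given a non-string replacement.

```python
import pandas as pd

df = pd.DataFrame({
    'main_string': ['Hello My Name is XXX', 'Hello my Name is Rob'],
    'target': ['XXX', None],
    'replacement': ['John', None],
})

def substitute(row):
    # Only replace when both the target and its replacement are present.
    if pd.notna(row['target']) and pd.notna(row['replacement']):
        return row['main_string'].replace(row['target'], row['replacement'])
    return row['main_string']

df['out'] = df.apply(substitute, axis=1)

print(df['out'].tolist())
```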

Replace whole string which contains substring in whole dataframe in pandas

Note that I changed the example to contain zzabc123zz, since you mention "substring" in your question but the example you provided did not show that use case.

You can use df.replace with a regex.

import pandas as pd
import re

df = pd.DataFrame({'col_1': ['abc', 'abc123', 'abc456'],
'col_2': ['abc123', '123', 'zzabc123zz']})

df.replace(re.compile('.*abc123.*'), 'test', inplace=True)
print(df)

Outputs

    col_1 col_2
0     abc  test
1    test   123
2  abc456  test
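
An equivalent sketch without inplace=True passes the pattern as a plain string together with regex=True and keeps the original frame untouched. The name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['abc', 'abc123', 'abc456'],
                   'col_2': ['abc123', '123', 'zzabc123zz']})

# The leading and trailing .* make the pattern cover the whole cell,
# so the entire value is replaced, not only the matching substring.
result = df.replace(r'.*abc123.*', 'test', regex=True)

print(result)
```

Returning a new frame makes it easy to compare the values before and after the replacement.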

