How to Replace Text in a String Column of a Pandas Dataframe


Use the vectorised str method replace:

df['range'] = df['range'].str.replace(',','-')

df
      range
0    (2-30)
1  (50-290)
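
For reference, here is a self-contained version of the above as a minimal sketch. The input values are assumed from the output shown, and regex=False is passed explicitly so that ',' is treated as literal text on both older and newer pandas versions (the default for regex in str.replace changed to False in pandas 2.0).

```python
import pandas as pd

# Sample data assumed from the output shown above.
df = pd.DataFrame({'range': ['(2,30)', '(50,290)']})

# regex=False treats ',' as a literal substring rather than a pattern.
df['range'] = df['range'].str.replace(',', '-', regex=False)

print(df)
```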

EDIT: so if we look at what you tried and why it didn't work:

df['range'].replace(',','-',inplace=True)

from the docs we see this description:

str or regex: str: string exactly matching to_replace will be replaced
with value

Because neither value is exactly ',', no replacement occurs. Compare with the following:

df = pd.DataFrame({'range':['(2,30)',',']})
df['range'] = df['range'].replace(',', '-')  # assigned back; an in-place edit on df['range'] may not reach df

df['range']

0    (2,30)
1         -
Name: range, dtype: object

Here the second row is an exact match, so the replacement occurs.
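
The difference between whole-value and substring matching can be seen side by side in the sketch below. It uses the same two sample values; the variable names exact and partial are only for illustration. Passing regex=True makes Series.replace treat ',' as a pattern that is searched for inside each string.

```python
import pandas as pd

s = pd.Series(['(2,30)', ','])

# Default behaviour: only values that equal ',' in full are replaced.
exact = s.replace(',', '-')

# regex=True: ',' is treated as a pattern and matched inside each string.
partial = s.replace(',', '-', regex=True)

print(exact.tolist())
print(partial.tolist())
```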

Replace part of a string in a pandas DataFrame

It seems you need Series.replace:

print (df)
              val
0  HF - Antartica
1    HF - America
2       HF - Asia

print (df.val.replace({'HF -':'Hi'}, regex=True))
0    Hi Antartica
1      Hi America
2         Hi Asia
Name: val, dtype: object

Similar solution with str.replace:

print (df.val.str.replace('HF -', 'Hi'))
0    Hi Antartica
1      Hi America
2         Hi Asia
Name: val, dtype: object
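
A runnable sketch of both approaches follows, with the input assumed from the printed frame. Since 'HF -' contains no regex metacharacters, it can be passed to str.replace as literal text (regex=False), and the two results should agree.

```python
import pandas as pd

# Input assumed from the frame printed above.
df = pd.DataFrame({'val': ['HF - Antartica', 'HF - America', 'HF - Asia']})

# Series.replace needs regex=True to match inside the strings.
via_replace = df['val'].replace({'HF -': 'Hi'}, regex=True)

# Series.str.replace always works on substrings; regex=False keeps the
# search text literal.
via_str = df['val'].str.replace('HF -', 'Hi', regex=False)

print(via_replace.tolist() == via_str.tolist())
```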

pandas: replace string with another string

Use replace with a dictionary that maps each old value to its new value:

df['prod_type'] = df['prod_type'].replace({'respon':'responsive', 'r':'responsive'})
print (df)
    prod_type
0  responsive
1  responsive
2  responsive
3  responsive
4  responsive
5  responsive
6  responsive

If you need to set every value in the column to the same string:

df['prod_type'] = 'responsive' 
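
As a small illustration of the dictionary form, the sketch below uses made-up input values (the original data is not shown, and 'fixed' is a hypothetical extra value). Without regex=True the keys are compared against whole cell values, so anything not listed in the dictionary is left alone.

```python
import pandas as pd

# Hypothetical input; 'fixed' is included to show an untouched value.
df = pd.DataFrame({'prod_type': ['responsive', 'respon', 'r', 'fixed']})

# Keys are matched against entire values, not substrings.
df['prod_type'] = df['prod_type'].replace(
    {'respon': 'responsive', 'r': 'responsive'}
)

print(df)
```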

Replace whole string if it contains substring in pandas

You can use str.contains to mask the rows that contain 'ball' and then overwrite with the new value:

In [71]:
df.loc[df['sport'].str.contains('ball'), 'sport'] = 'ball sport'
df

Out[71]:
    name       sport
0    Bob      tennis
1   Jane  ball sport
2  Alice  ball sport

To make it case-insensitive, pass case=False:

df.loc[df['sport'].str.contains('ball', case=False), 'sport'] = 'ball sport'
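
One thing to watch for is missing values: str.contains returns a missing value for them, which is not a usable boolean mask. The sketch below assumes sample data similar to the output above, plus a hypothetical fourth row with a missing sport, and passes na=False so that such rows are simply treated as non-matches.

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration; the 'Dan' row is hypothetical.
df = pd.DataFrame({
    'name': ['Bob', 'Jane', 'Alice', 'Dan'],
    'sport': ['tennis', 'football', 'Basketball', np.nan],
})

# na=False makes missing values count as "does not contain".
mask = df['sport'].str.contains('ball', case=False, na=False)
df.loc[mask, 'sport'] = 'ball sport'

print(df)
```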

Pandas Dataframe replace string based on length

Basic Solution

The solution below uses a lambda function passed to pandas.Series.apply() (df['url'] is a Series).

df['url'] = df['url'].apply(lambda x: x if len(x) == 108 else x[:-10])

Here, each value within df['url'] (x) remains the same if len(x) == 108, otherwise it is updated to be x[:-10].

Handling Exceptions

The solution below is similar, but adds basic exception handling inside the url_trim() function that is passed to pandas.Series.apply().

This is more robust than the first solution: unexpected values in df['url'] (for example numpy.nan used for nulls) no longer halt execution with an exception. Such values are simply left unchanged.

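def url_trim(x):
    try:
        # Trim the last 10 characters unless the value is exactly 108 long.
        if len(x) != 108:
            return x[:-10]
        return x
    except TypeError:
        # Values without a length (e.g. numpy.nan) are left unchanged.
        return x

df['url'] = df['url'].apply(url_trim)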
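
A vectorised alternative is also possible. The following is only a sketch on hypothetical data (the real URLs are not shown): it builds a boolean mask with str.len() and uses Series.where to keep the 108-character values while trimming the rest. Missing values pass through untouched because the string accessor propagates them.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a 108-character value, a longer value, and a null.
df = pd.DataFrame({'url': ['a' * 108, 'b' * 120, np.nan]})

# True where the value is exactly 108 characters long; a missing length
# compares as False.
keep = df['url'].str.len() == 108

# Keep matching rows as they are, otherwise drop the last 10 characters.
df['url'] = df['url'].where(keep, df['url'].str[:-10])

print(df['url'].str.len())
```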

Pandas Dataframe replace part of string with value from another column

To replace with a value taken from another column, use DataFrame.apply with axis=1:

df["Formula"]= df.apply(lambda x: x['Formula'].replace('Length', str(x['Length'])), axis=1)
print (df)
   Formula  Length
0        5       5
1    6+1.5       6
2    5-2.5       5
3        4       4
4        5       5

Or use a list comprehension:

df["Formula"] = [x.replace('Length', str(y)) for x, y in df[['Formula','Length']].to_numpy()]
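
For a runnable illustration, the sketch below reconstructs a plausible input from the output above (the original Formula strings are an assumption) and pairs the two columns with zip, a variant of the list comprehension that avoids building a combined array first.

```python
import pandas as pd

# Input assumed from the output above: 'Length' is a placeholder in Formula.
df = pd.DataFrame({
    'Formula': ['Length', 'Length+1.5', 'Length-2.5'],
    'Length': [5, 6, 5],
})

# Pair each formula with the length from the same row.
df['Formula'] = [
    formula.replace('Length', str(length))
    for formula, length in zip(df['Formula'], df['Length'])
]

print(df)
```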

python pandas replacing strings in dataframe with numbers

What about DataFrame.replace?

In [9]: mapping = {'set': 1, 'test': 2}

In [10]: df.replace({'set': mapping, 'tesst': mapping})
Out[10]:
   Unnamed: 0  respondent  brand  engine  country  aware  aware_2  aware_3  age  \
0           0           a  volvo       p      swe      1        0        1   23
1           1           b  volvo    None      swe      0        0        1   45
2           2           c    bmw       p       us      0        0        1   56
3           3           d    bmw       p       us      0        1        1   43
4           4           e    bmw       d  germany      1        0        1   34
5           5           f   audi       d  germany      1        0        1   59
6           6           g  volvo       d      swe      1        0        0   65
7           7           h   audi       d      swe      1        0        0   78
8           8           i  volvo       d       us      1        1        1   32

   tesst  set
0      2    1
1      1    2
2      2    1
3      1    2
4      2    1
5      1    2
6      2    1
7      1    2
8      2    1

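As @Jeff pointed out in the comments, in pandas versions < 0.11.1 you need to tack .convert_objects() onto the end to convert tesst and set to int64 columns, in case that matters in subsequent operations. Note that convert_objects() has since been removed from pandas; on current versions, use astype or pd.to_numeric if the resulting dtype needs fixing.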
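
A reduced, runnable version of the same idea is sketched below. The three-row frame is made up for illustration and keeps only one of the other columns. The nested dictionary maps column name to {old value: new value}, and the explicit astype call is an assumption about what is wanted afterwards, since the dtype left behind by replace can differ between pandas versions.

```python
import pandas as pd

# Small hypothetical frame with the two columns that need converting.
df = pd.DataFrame({
    'brand': ['volvo', 'bmw', 'audi'],
    'tesst': ['test', 'set', 'test'],
    'set': ['set', 'test', 'set'],
})

mapping = {'set': 1, 'test': 2}

# Outer keys are column names, inner dicts map old values to new ones.
out = df.replace({'set': mapping, 'tesst': mapping})

# Make the integer dtype explicit rather than relying on inference.
out = out.astype({'tesst': 'int64', 'set': 'int64'})

print(out.dtypes)
```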

Pandas: Replace string column values

In general, you should avoid manual for loops and use vectorised functionality, where possible, with Pandas. Here you can utilise pd.to_numeric to test and convert values within your series:

import numpy as np
import pandas as pd

s = pd.Series(['$2.75', np.nan, 4.150000, 25.00, '$4.50'])

strs = s.astype(str).str.replace('$', '', regex=False)
res = pd.to_numeric(strs, errors='coerce').fillna(0)

print(res)

0     2.75
1     0.00
2     4.15
3    25.00
4     4.50
dtype: float64

How to replace an exact string with another using replace() of pandas.DataFrame?

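Try the following. The ^ and $ anchors only act as a regular expression when regex=True is passed; without it, replace looks for the literal string "^'0-4'$" and finds nothing:

df['tumor-size'] = df['tumor-size'].replace("^'0-4'$", "'00-04'", regex=True)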
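
To see why the anchors matter, here is a minimal sketch on made-up values (the real column contents are an assumption). An unanchored pattern also hits the '0-4' hidden inside '40-44', whereas the anchored one only touches values that match from start to end.

```python
import pandas as pd

# Hypothetical values; note that '40-44' contains the substring '0-4'.
s = pd.Series(["'0-4'", "'10-14'", "'40-44'"], name='tumor-size')

# Anchored pattern: only a value that is exactly '0-4' (with quotes) matches.
anchored = s.replace("^'0-4'$", "'00-04'", regex=True)

# Unanchored pattern: any occurrence of 0-4 is rewritten, even inside
# another range.
unanchored = s.replace("0-4", "00-04", regex=True)

print(anchored.tolist())
print(unanchored.tolist())
```

When the values to change always match in full, a plain replace("'0-4'", "'00-04'") without regex=True does the same job as the anchored pattern.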

