How to replace text in a string column of a Pandas dataframe?
Use the vectorised str method replace:
df['range'] = df['range'].str.replace(',','-')
df
range
0 (2-30)
1 (50-290)
EDIT: so if we look at what you tried and why it didn't work:
df['range'].replace(',','-',inplace=True)
From the docs we see this description:
str or regex: str: string exactly matching to_replace will be replaced with value
Because the str values do not match exactly, no replacement occurs. Compare with the following:
df = pd.DataFrame({'range':['(2,30)',',']})
df['range'].replace(',','-', inplace=True)
df['range']
0 (2,30)
1 -
Name: range, dtype: object
Here we get an exact match on the second row, so the replacement occurs.
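To make the difference concrete, here is a minimal sketch that runs the same replacement three ways on the two-row frame from the answer. The variable names (exact, substring, regex_based) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'range': ['(2,30)', ',']})

# Series.replace without regex: only cells that are exactly ',' are replaced
exact = df['range'].replace(',', '-')

# Series.str.replace: every ',' inside each string is replaced
substring = df['range'].str.replace(',', '-', regex=False)

# Series.replace with regex=True also matches inside the string
regex_based = df['range'].replace(',', '-', regex=True)

print(exact.tolist())        # ['(2,30)', '-']
print(substring.tolist())    # ['(2-30)', '-']
print(regex_based.tolist())  # ['(2-30)', '-']
```

Assigning the result back to the column, rather than relying on inplace=True on a column selection, is usually the safer habit.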
replace part of the string in pandas data frame
It seems you need Series.replace:
print (df)
val
0 HF - Antartica
1 HF - America
2 HF - Asia
print (df.val.replace({'HF -':'Hi'}, regex=True))
0 Hi Antartica
1 Hi America
2 Hi Asia
Name: val, dtype: object
Similar solution with str.replace:
print (df.val.str.replace('HF -', 'Hi'))
0 Hi Antartica
1 Hi America
2 Hi Asia
Name: val, dtype: object
pandas: replace string with another string
Solution with replace by dictionary:
df['prod_type'] = df['prod_type'].replace({'respon':'responsive', 'r':'responsive'})
print (df)
prod_type
0 responsive
1 responsive
2 responsive
3 responsive
4 responsive
5 responsive
6 responsive
If you need to set all values in the column to the same string:
df['prod_type'] = 'responsive'
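As a rough illustration of how the dictionary form behaves, the sketch below uses a hypothetical four-row column (the original input is not shown in the question). Without regex=True, dictionary keys are compared against whole cell values, so the 'r' inside 'other' is left alone.

```python
import pandas as pd

# Hypothetical input: abbreviated values mixed with a full and an unrelated value
df = pd.DataFrame({'prod_type': ['responsive', 'respon', 'r', 'other']})

# Each key must equal the entire cell value to be replaced
df['prod_type'] = df['prod_type'].replace({'respon': 'responsive', 'r': 'responsive'})

print(df['prod_type'].tolist())  # ['responsive', 'responsive', 'responsive', 'other']
```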
Replace whole string if it contains substring in pandas
You can use str.contains to mask the rows that contain 'ball' and then overwrite with the new value:
In [71]:
df.loc[df['sport'].str.contains('ball'), 'sport'] = 'ball sport'
df
Out[71]:
name sport
0 Bob tennis
1 Jane ball sport
2 Alice ball sport
To make it case-insensitive, pass case=False:
df.loc[df['sport'].str.contains('ball', case=False), 'sport'] = 'ball sport'
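If the column may contain missing values, the mask returned by str.contains can itself contain missing entries, which, depending on the pandas version, may not be usable for boolean indexing. A minimal sketch, assuming a hypothetical frame with one NaN in sport, passes na=False so that missing values count as "no match":

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has no sport recorded
df = pd.DataFrame({
    'name': ['Bob', 'Jane', 'Alice', 'Eve'],
    'sport': ['tennis', 'Volleyball', 'basketball', np.nan],
})

# na=False turns missing values into False, so the mask is purely boolean
mask = df['sport'].str.contains('ball', case=False, na=False)
df.loc[mask, 'sport'] = 'ball sport'

print(df)
```

The row with the missing value is left untouched.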
Pandas Dataframe replace string based on length
Basic Solution
The solution below uses a lambda function passed to pandas.Series.apply() (df['url'] is a Series, so this is the Series method rather than pandas.DataFrame.apply()).
df['url'] = df['url'].apply(lambda x: x if len(x) == 108 else x[:-10])
Here, each value within df['url'] (x) remains the same if len(x) == 108; otherwise it is updated to x[:-10].
Handling Exceptions
The solution below is similar to the one above, but adds basic exception handling inside the url_trim() function that is passed to apply().
This is more robust than the first solution: it does not halt execution when an exception is thrown inside apply() because of unexpected values in df['url'] rows (for example, if numpy.nan is used for null values). In those cases the value is simply left unchanged.
def url_trim(x):
    try:
        if len(x) != 108:
            return x[:-10]
        else:
            return x
    except TypeError:
        # Values without a length (e.g. numpy.nan) are left unchanged
        return x

df['url'] = df['url'].apply(url_trim)
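A vectorised alternative is also possible. The sketch below is not part of the original answer: it swaps apply() for Series.str.len() and Series.where(). It uses short hypothetical strings and a stand-in TARGET_LEN of 12 instead of the 108-character URLs so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical data: TARGET_LEN stands in for the 108 used in the question
TARGET_LEN = 12
df = pd.DataFrame({'url': ['a' * 12, 'b' * 20, np.nan]})

# True where the string already has the target length
keep = df['url'].str.len() == TARGET_LEN

# Keep matching rows as they are; elsewhere drop the last 10 characters
df['url'] = df['url'].where(keep, df['url'].str[:-10])

print(df['url'].tolist())
```

Missing values stay missing, because slicing NaN through the .str accessor yields NaN again.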
Pandas Dataframe replace part of string with value from another column
If you want to replace with values from another column, use DataFrame.apply with axis=1:
df["Formula"]= df.apply(lambda x: x['Formula'].replace('Length', str(x['Length'])), axis=1)
print (df)
Formula Length
0 5 5
1 6+1.5 6
2 5-2.5 5
3 4 4
4 5 5
Or list comprehension:
df["Formula"]= [x.replace('Length', str(y)) for x, y in df[['Formula','Length']].to_numpy()]
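For a version that can be run end to end, here is a minimal sketch. The input frame is hypothetical, chosen to be consistent with the output printed above, and zip is used instead of to_numpy() so that no intermediate array is built.

```python
import pandas as pd

# Hypothetical input, reconstructed to match the printed result
df = pd.DataFrame({
    'Formula': ['Length', 'Length+1.5', 'Length-2.5', 'Length', '5'],
    'Length': [5, 6, 5, 4, 5],
})

# Walk both columns in step and substitute the row's own Length value
df['Formula'] = [f.replace('Length', str(n)) for f, n in zip(df['Formula'], df['Length'])]

print(df['Formula'].tolist())  # ['5', '6+1.5', '5-2.5', '4', '5']
```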
python pandas replacing strings in dataframe with numbers
What about DataFrame.replace?
In [9]: mapping = {'set': 1, 'test': 2}
In [10]: df.replace({'set': mapping, 'tesst': mapping})
Out[10]:
Unnamed: 0 respondent brand engine country aware aware_2 aware_3 age \
0 0 a volvo p swe 1 0 1 23
1 1 b volvo None swe 0 0 1 45
2 2 c bmw p us 0 0 1 56
3 3 d bmw p us 0 1 1 43
4 4 e bmw d germany 1 0 1 34
5 5 f audi d germany 1 0 1 59
6 6 g volvo d swe 1 0 0 65
7 7 h audi d swe 1 0 0 78
8 8 i volvo d us 1 1 1 32
tesst set
0 2 1
1 1 2
2 2 1
3 1 2
4 2 1
5 1 2
6 2 1
7 1 2
8 2 1
As @Jeff pointed out in the comments, in pandas versions < 0.11.1, manually tack .convert_objects() onto the end to properly convert tesst and set to int64 columns, in case that matters in subsequent operations.
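Note that .convert_objects() was removed in later pandas releases. On a current version, one possible alternative (a sketch, not the original answer's method) is Series.map, which returns integer columns directly when every value is found in the mapping. The three-row frame below is a hypothetical slice containing only the two string columns.

```python
import pandas as pd

# Hypothetical slice of the frame: only the two columns being converted
df = pd.DataFrame({
    'tesst': ['test', 'set', 'test'],
    'set': ['set', 'test', 'set'],
})
mapping = {'set': 1, 'test': 2}

# Series.map looks each value up in the dict; values not in the dict become NaN
for col in ['tesst', 'set']:
    df[col] = df[col].map(mapping)

print(df.dtypes)
```

Unlike replace, map does not leave unmapped values as they were, so it fits best when the mapping covers every value in the column.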
Pandas : Replace string column values
In general, you should avoid manual for loops and use vectorised functionality, where possible, with Pandas. Here you can utilise pd.to_numeric to test and convert values within your series:
import numpy as np
import pandas as pd

s = pd.Series(['$2.75', np.nan, 4.150000, 25.00, '$4.50'])
strs = s.astype(str).str.replace('$', '', regex=False)
res = pd.to_numeric(strs, errors='coerce').fillna(0)
print(res)
0 2.75
1 0.00
2 4.15
3 25.00
4 4.50
dtype: float64
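The regex=False argument matters here because '$' is a regex metacharacter (it anchors the end of the string). A small sketch, using a hypothetical two-value series, of two equivalent ways to strip a literal dollar sign:

```python
import pandas as pd

s = pd.Series(['$2.75', '$4.50'])

# Literal match: '$' is treated as a plain character
literal = s.str.replace('$', '', regex=False)

# Regex match: the '$' is escaped so it is not read as "end of string"
escaped = s.str.replace(r'\$', '', regex=True)

print(literal.tolist())  # ['2.75', '4.50']
print(escaped.tolist())  # ['2.75', '4.50']
```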
How to replace an exact string with another using replace() of pandas.DataFrame?
Try:
df['tumor-size'] = df['tumor-size'].replace(r"^'0-4'$", "'00-04'", regex=True)
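For whole-value replacement there are two routes that should give the same result: an anchored pattern with regex=True, or a plain exact match with no regex at all. A minimal sketch on a hypothetical three-value series (the quotes are part of the data, as in the question):

```python
import pandas as pd

# Hypothetical values; the single quotes belong to the strings themselves
s = pd.Series(["'0-4'", "'10-14'", "'40-44'"])

# Anchored regex: ^ and $ force the pattern to cover the entire value
anchored = s.replace(r"^'0-4'$", "'00-04'", regex=True)

# Plain Series.replace: already an exact, whole-value match
exact = s.replace("'0-4'", "'00-04'")

print(anchored.tolist())  # ["'00-04'", "'10-14'", "'40-44'"]
print(exact.tolist())     # ["'00-04'", "'10-14'", "'40-44'"]
```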