Convert pandas.Series from dtype object to float, and errors to NaNs
Use pd.to_numeric with errors='coerce':
import pandas as pd
# Setup
s = pd.Series(['1', '2', '3', '4', '.'])
s
0 1
1 2
2 3
3 4
4 .
dtype: object
pd.to_numeric(s, errors='coerce')
0 1.0
1 2.0
2 3.0
3 4.0
4 NaN
dtype: float64
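For contrast, here is a minimal sketch of what happens without errors='coerce'. The default, errors='raise', stops at the first unparseable entry (the '.' from the setup above) with a ValueError, while 'coerce' turns that entry into NaN and returns float64.

```python
import pandas as pd

s = pd.Series(['1', '2', '3', '4', '.'])

# Default errors='raise': the '.' entry aborts the whole conversion.
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True

# errors='coerce': unparseable entries become NaN, the rest become floats.
coerced = pd.to_numeric(s, errors='coerce')

print(raised)                   # True
print(coerced.dtype)            # float64
print(coerced.isna().tolist())  # [False, False, False, False, True]
```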
If you need the NaNs filled in, use Series.fillna.
pd.to_numeric(s, errors='coerce').fillna(0, downcast='infer')
0 1
1 2
2 3
3 4
4 0
dtype: int64
Note that downcast='infer' will attempt to downcast floats to integers where possible; remove the argument if you don't want that. The downcast keyword of fillna is deprecated as of pandas 2.1.
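Instead of relying on downcast='infer', you can cast explicitly after filling. This is a minimal sketch that assumes every value is a whole number once the NaNs are replaced; astype('int64') does not round, so fractional values would lose their decimals.

```python
import pandas as pd

s = pd.Series(['1', '2', '3', '4', '.'])

# Coerce bad entries to NaN, fill them with 0, then cast explicitly.
# Assumes all remaining values are whole numbers.
result = pd.to_numeric(s, errors='coerce').fillna(0).astype('int64')

print(result.tolist())  # [1, 2, 3, 4, 0]
print(result.dtype)     # int64
```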
From v0.24+, pandas has a nullable integer type, which allows integers to coexist with NaNs. If you have integers in your column, you can use it:
pd.__version__
# '0.24.1'
pd.to_numeric(s, errors='coerce').astype('Int32')
0 1
1 2
2 3
3 4
4 NaN
dtype: Int32
There are other options to choose from as well; read the docs for more.
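One such option is the downcast parameter of pd.to_numeric itself. In this minimal sketch (sample values are made up), downcast='integer' picks the smallest signed integer type that holds the values; a coerced NaN prevents integer downcasting, so that result stays a float dtype.

```python
import pandas as pd

clean = pd.Series(['1', '2', '3'])
dirty = pd.Series(['1', '2', '.'])

# All values parse as small integers, so they fit in int8.
small = pd.to_numeric(clean, downcast='integer')

# The NaN produced by coercion blocks integer downcasting.
with_nan = pd.to_numeric(dirty, errors='coerce', downcast='integer')

print(small.dtype)     # int8
print(with_nan.dtype)  # float64
```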
Extension for DataFrames
If you need to extend this to DataFrames, you will need to apply it to each column. You can do this using DataFrame.apply.
import numpy as np
import pandas as pd
# Setup.
np.random.seed(0)
df = pd.DataFrame({
'A' : np.random.choice(10, 5),
'C' : np.random.choice(10, 5),
'B' : ['1', '###', '...', 50, '234'],
'D' : ['23', '1', '...', '268', '$$']}
)[list('ABCD')]
df
A B C D
0 5 1 9 23
1 0 ### 3 1
2 3 ... 5 ...
3 3 50 2 268
4 7 234 4 $$
df.dtypes
A int64
B object
C int64
D object
dtype: object
df2 = df.apply(pd.to_numeric, errors='coerce')
df2
A B C D
0 5 1.0 9 23.0
1 0 NaN 3 1.0
2 3 NaN 5 NaN
3 3 50.0 2 268.0
4 7 234.0 4 NaN
df2.dtypes
A int64
B float64
C int64
D float64
dtype: object
You can also do this with DataFrame.transform, although my tests indicate this is marginally slower:
df.transform(pd.to_numeric, errors='coerce')
A B C D
0 5 1.0 9 23.0
1 0 NaN 3 1.0
2 3 NaN 5 NaN
3 3 50.0 2 268.0
4 7 234.0 4 NaN
If you have many columns (a mix of numeric and non-numeric), you can make this a little more performant by applying pd.to_numeric to the non-numeric columns only.
df.dtypes.eq(object)
A False
B True
C False
D True
dtype: bool
cols = df.columns[df.dtypes.eq(object)]
# Actually, `cols` can be any list of columns you need to convert.
cols
# Index(['B', 'D'], dtype='object')
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# Alternatively,
# for c in cols:
# df[c] = pd.to_numeric(df[c], errors='coerce')
df
A B C D
0 5 1.0 9 23.0
1 0 NaN 3 1.0
2 3 NaN 5 NaN
3 3 50.0 2 268.0
4 7 234.0 4 NaN
Applying pd.to_numeric along the columns (i.e., axis=0, the default) should be slightly faster for long DataFrames.
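If you'd rather not build the column mask by hand, DataFrame.select_dtypes can pick the non-numeric columns directly. This is a minimal sketch using fixed values copied from the setup output above; exclude='number' also keeps working if a column uses a dedicated string dtype instead of object.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 0, 3, 3, 7],
    'B': ['1', '###', '...', 50, '234'],
    'C': [9, 3, 5, 2, 4],
    'D': ['23', '1', '...', '268', '$$'],
})

# Every column that is not numeric is a candidate for conversion.
cols = df.select_dtypes(exclude='number').columns
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

print(list(cols))  # ['B', 'D']
print(df.dtypes)
```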
After converting object to float, all values are NaN. How to solve this?
It doesn't work because there is a comma. If you want the values before the comma and after the dollar sign, try this:
df["Value_conv"] = df["Value"].str.split('$').str[1].str.split(',').str[0].astype(float)
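Here is a minimal sketch of that chain on made-up data (the Value column contents are hypothetical). It keeps the text between the dollar sign and the first comma. If the comma is really a decimal separator, this drops the decimals; replacing ',' with '.' would keep them instead.

```python
import pandas as pd

# Hypothetical data: a dollar amount followed by a comma and more digits.
df = pd.DataFrame({"Value": ["$12,50", "$7,00", "$300,99"]})

df["Value_conv"] = (
    df["Value"]
    .str.split('$').str[1]   # text after the dollar sign
    .str.split(',').str[0]   # text before the first comma
    .astype(float)
)

print(df["Value_conv"].tolist())  # [12.0, 7.0, 300.0]
```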
The reason you get NaN is that errors='coerce' sets any value that isn't integer- or float-like to NaN. As the documentation says:
If ‘coerce’, then invalid parsing will be set as NaN.
Converting object type column to float type converts all to NaN?
pd.to_numeric with parameter errors='coerce' returns NaN for entries that can't be parsed as numbers.
The official documentation states:
errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
If ‘raise’, then invalid parsing will raise an exception.
If ‘coerce’, then invalid parsing will be set as NaN.
If ‘ignore’, then invalid parsing will return the input.
Most likely, the elements in your DataFrame are non-numeric and can't be converted.
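To see which entries are the culprits, one approach (a minimal sketch on a hypothetical column) is to compare missingness before and after coercion: anything that is NaN afterwards but was not missing before failed to parse.

```python
import pandas as pd

# Hypothetical column with a few entries that can't be parsed as numbers.
raw = pd.Series(['1.5', '2', 'abc', None, '1,000'])

converted = pd.to_numeric(raw, errors='coerce')

# NaN after coercion but present before it means "failed to parse".
failed = raw[converted.isna() & raw.notna()]

print(failed.tolist())  # ['abc', '1,000']
```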
How to convert object to float in Pandas?
Try this:
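import pandas as pd
# sample dataframe
d = {'Quantidade': ['0,20939', '0,0082525', '0,009852', '0,012920', '0,0252'],
     'price': ['R$ 165.000,00', 'R$ 100.000,00', 'R$ 61.500,00', 'R$ 65.900,00', 'R$ 49.375,12']}
df = pd.DataFrame(data=d)
# 'Quantidade': swap the decimal comma for a decimal point
df["Quantidade"] = df["Quantidade"].str.replace(',', '.', regex=False).astype(float)
# 'price': drop the "R$ " prefix (regex), remove the thousands separators,
# then swap the decimal comma; regex is passed explicitly because its default changed in pandas 2.0
df['price'] = (df['price'].str.replace(r'\w+\$\s+', '', regex=True)
                          .str.replace('.', '', regex=False)
                          .str.replace(',', '.', regex=False)
                          .astype(float))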
Output:
Quantidade price
0 0.209390 165000.00
1 0.008252 100000.00
2 0.009852 61500.00
3 0.012920 65900.00
4 0.025200 49375.12
Python - How to convert from object to float
Now using nucsit026's answer to create a slightly different DataFrame with strings:
import numpy as np
import pandas as pd
dic = {'revenue':['7980.79',np.nan,'1000.25','17800.85','None','2457.85','6789.33']}
df = pd.DataFrame(dic)
print(df)
print(df['revenue'].dtypes)
Output:
revenue
0 7980.79
1 NaN
2 1000.25
3 17800.85
4 None
5 2457.85
6 6789.33
object
Try this:
df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce').fillna(0, downcast='infer')
It replaces the NaNs (including the one produced from the 'None' string) with 0s.
Output:
0 7980.79
1 0.00
2 1000.25
3 17800.85
4 0.00
5 2457.85
6 6789.33
Name: revenue, dtype: float64
EDIT:
Based on the error you shared, if stray quotes are the problem you can strip them first:
df['revenue'] = df['revenue'].str.strip("'")
and then convert to float using the code above.
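Put together, stripping and converting might look like this minimal sketch. The quoted values are hypothetical stand-ins for the data behind the error; fillna(0) mirrors the earlier answer but skips downcast.

```python
import pandas as pd

# Hypothetical values wrapped in single quotes.
df = pd.DataFrame({'revenue': ["'7980.79'", "'1000.25'", "None"]})

# Remove the quotes first; otherwise every quoted value coerces to NaN.
df['revenue'] = df['revenue'].str.strip("'")

# Unparseable entries (here the 'None' string) become NaN, then 0.
df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce').fillna(0)

print(df['revenue'])
```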
EDIT2:
The OP had spaces inside the column values, like this:
Month Revenue
Apr-13 16 004 258.24
May-13
Jun-13 16 469 157.71
Jul-13 19 054 861.01
Aug-13 20 021 803.71
Sep-13 21 285 537.45
Oct-13 22 193 453.80
Nov-13 21 862 298.20
Dec-13 10 053 557.64
Jan-14 17 358 063.34
Feb-14 19 469 161.04
Mar-14 22 567 078.21
Apr-14 20 401 188.64
In this case, remove the spaces first:
df['revenue'] = df['revenue'].replace(' ', '', regex=True)
and then perform the conversion.
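A minimal sketch of both steps on three rows from the table above. It assumes the empty May-13 cell is read as NaN and uses the lowercase column name from the code rather than the table header.

```python
import numpy as np
import pandas as pd

# A few rows from the table above; May-13 has no value.
df = pd.DataFrame({
    'Month': ['Apr-13', 'May-13', 'Jun-13'],
    'revenue': ['16 004 258.24', np.nan, '16 469 157.71'],
})

# Drop the spaces used as thousands separators, then convert.
df['revenue'] = df['revenue'].replace(' ', '', regex=True)
df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce')

print(df)
```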
Converting Object Data type to float data type in pandas results NaN values
pd.to_numeric attempts to convert a sequence to numeric and coerces when told to do so.
errors='coerce' will convert anything it can to float, and anything it can't to NaN.
If you'd like to see what it couldn't convert in its original form for debugging, errors='ignore' returns the input unchanged. Note that this applies to the whole input (a single bad value means nothing is converted), and that option is deprecated as of pandas 2.2.
Also, can you please post the original data in the PROTEIN_SEQUENCE column? Cleaning it a bit before conversion would probably help.
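To convert what can be converted while keeping only the failing entries in their original form, one option is to coerce first and then restore the original values wherever coercion produced NaN. This is a minimal sketch; the sample strings are hypothetical stand-ins for the PROTEIN_SEQUENCE data.

```python
import pandas as pd

# object dtype so floats and strings can coexist in the result.
raw = pd.Series(['1.5', '2', 'MKTAYIAK', '3.25'], dtype=object)

converted = pd.to_numeric(raw, errors='coerce')

# Where parsing succeeded, use the number; elsewhere keep the original text.
debug = raw.mask(converted.notna(), converted)

print(debug.tolist())  # [1.5, 2.0, 'MKTAYIAK', 3.25]
```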
pandas convert objects with numbers and nans to ints or floats
You can convert to numeric with to_numeric and errors='coerce' to get floats; for integers, use the nullable integer data type (pandas 0.24+):
df['column_name'] = pd.to_numeric(df['column_name'], errors='coerce').astype('Int64')
print (df)
column_name
0 10
1 5
2 20
3 NaN
4 5
5 NaN
6 6
Detail:
print (pd.to_numeric(df['column_name'], errors='coerce'))
0 10.0
1 5.0
2 20.0
3 NaN
4 5.0
5 NaN
6 6.0
Name: column_name, dtype: float64
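One caveat, shown in a minimal sketch with hypothetical values: astype('Int64') only accepts floats that are whole numbers, so a value like 2.5 raises a TypeError instead of converting. Rounding first (or staying with float64) avoids that.

```python
import pandas as pd

col = pd.Series(['10', '2.5', 'n/a'])
as_float = pd.to_numeric(col, errors='coerce')  # 10.0, 2.5, NaN

try:
    as_float.astype('Int64')
    cast_ok = True
except TypeError:
    cast_ok = False  # 2.5 has no exact integer equivalent

# Rounding first makes the cast safe; NaN becomes <NA>.
rounded = as_float.round().astype('Int64')

print(cast_ok)  # False
print(rounded)
```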
Convert Pandas column containing NaNs to dtype `int`
The lack of a NaN representation in integer columns is a pandas "gotcha".
The usual workaround is to simply use floats.
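A minimal sketch of the gotcha and of the nullable alternative mentioned in the earlier answer (pandas 0.24+): an integer column that gains a missing value is upcast to float64, while the Int64 dtype keeps integers and marks the gap as <NA>.

```python
import pandas as pd

# A plain integer column with a missing value is upcast to float64.
plain = pd.Series([1, 2, None])

# The nullable Int64 dtype keeps integers and marks the gap as <NA>.
nullable = pd.Series([1, 2, None], dtype='Int64')

print(plain.dtype)     # float64
print(nullable.dtype)  # Int64
```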