Convert pandas.Series from dtype object to float, and errors to NaNs

Use pd.to_numeric with errors='coerce':

# Setup
import pandas as pd

s = pd.Series(['1', '2', '3', '4', '.'])
s

0 1
1 2
2 3
3 4
4 .
dtype: object

pd.to_numeric(s, errors='coerce')

0 1.0
1 2.0
2 3.0
3 4.0
4 NaN
dtype: float64

If you need the NaNs filled in, use Series.fillna.

pd.to_numeric(s, errors='coerce').fillna(0, downcast='infer')

0 1
1 2
2 3
3 4
4 0
dtype: int64

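Note that downcast='infer' attempts to downcast floats to integers where possible; remove the argument if you don't want that. Recent pandas releases deprecate the downcast argument of fillna, so check the documentation for your version.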
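If the downcast argument is unavailable in your pandas version, or you prefer to be explicit, one alternative is to fill the NaNs and then cast with astype. This is a minimal sketch that reuses the Series from the setup above; the name filled is only illustrative, and the cast assumes every remaining value is a whole number.

```python
import pandas as pd

s = pd.Series(['1', '2', '3', '4', '.'])

# Coerce unparseable entries to NaN, replace them with 0,
# then cast explicitly instead of relying on downcast='infer'.
filled = pd.to_numeric(s, errors='coerce').fillna(0).astype('int64')
print(filled)
```

If the column can contain fractional values, skip the astype call and keep the float result.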

Starting with v0.24, pandas provides a nullable integer type, which allows
integers to coexist with NaNs. If your column holds integers,
you can use:

pd.__version__
# '0.24.1'

pd.to_numeric(s, errors='coerce').astype('Int32')

0 1
1 2
2 3
3 4
4 NaN
dtype: Int32

There are other options to choose from as well; read the docs for more.
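To illustrate how a nullable integer column behaves once it has been created, here is a small sketch using 'Int64' on the same Series. The variable name nullable is an assumption made for the example.

```python
import pandas as pd

s = pd.Series(['1', '2', '3', '4', '.'])

# Coerce to float first, then cast to the nullable integer dtype.
nullable = pd.to_numeric(s, errors='coerce').astype('Int64')

print(nullable.isna().tolist())  # marks which entries are missing
print(nullable.sum())            # missing entries are skipped by default
```

The missing entry stays missing, while the remaining values keep an integer dtype instead of being upcast to float.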


Extension for DataFrames

If you need to extend this to DataFrames, you will need to apply it to each column. You can do this using DataFrame.apply.

# Setup.
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.choice(10, 5),
    'C': np.random.choice(10, 5),
    'B': ['1', '###', '...', 50, '234'],
    'D': ['23', '1', '...', '268', '$$']}
)[list('ABCD')]
df

   A    B  C    D
0  5    1  9   23
1  0  ###  3    1
2  3  ...  5  ...
3  3   50  2  268
4  7  234  4   $$

df.dtypes

A int64
B object
C int64
D object
dtype: object

df2 = df.apply(pd.to_numeric, errors='coerce')
df2

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN

df2.dtypes

A int64
B float64
C int64
D float64
dtype: object

You can also do this with DataFrame.transform, although my tests indicate this is marginally slower:

df.transform(pd.to_numeric, errors='coerce')

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN

If you have many columns (a mix of numeric and non-numeric), you can make this a little more performant by applying pd.to_numeric to the non-numeric columns only.

df.dtypes.eq(object)

A False
B True
C False
D True
dtype: bool

cols = df.columns[df.dtypes.eq(object)]
# Actually, `cols` can be any list of columns you need to convert.
cols
# Index(['B', 'D'], dtype='object')

df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# Alternatively,
# for c in cols:
#     df[c] = pd.to_numeric(df[c], errors='coerce')

df

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN

Applying pd.to_numeric along the columns (i.e., axis=0, the default) should be slightly faster for long DataFrames.

After converting object to float, all values are NaN - how to solve this?

It doesn't work because the values contain a comma, so they cannot be parsed as numbers. If you want the part after the dollar sign and before the comma, try this:

df["Value_conv"] = df["Value"].str.split('$').str[1].str.split(',').str[0].astype(float)

You get NaN because errors='coerce' replaces every value that cannot be parsed as an integer or float with NaN. As the documentation says:

If ‘coerce’, then invalid parsing will be set as NaN.
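If the comma is a thousands separator rather than a point to cut at, another option is to strip the symbols and let pd.to_numeric parse what remains. The sketch below uses made-up values, since the question's actual data is not shown; the column names follow the answer's code.

```python
import pandas as pd

# Hypothetical data: the original question's values are not shown.
df = pd.DataFrame({'Value': ['$1,234.56', '$99.00', '$12,000']})

cleaned = (
    df['Value']
    .str.replace('$', '', regex=False)  # drop the currency symbol
    .str.replace(',', '', regex=False)  # drop thousands separators
)
df['Value_conv'] = pd.to_numeric(cleaned, errors='coerce')
print(df)
```

Passing regex=False treats '$' as a literal character rather than as a regular-expression anchor.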

Converting object type column to float type converts all to NaN?

pd.to_numeric with errors='coerce' returns NaN for every entry that can't be converted to a numeric value.

The official documentation states:

errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’

If ‘raise’, then invalid parsing will raise an exception.

If ‘coerce’, then invalid parsing will be set as NaN.

If ‘ignore’, then invalid parsing will return the input.

Most likely, some elements in your DataFrame are non-numeric and can't be converted.
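One way to see why a conversion fails is to run it once with the default errors='raise' and read the exception. This is a small sketch on illustrative data; the names s and message are assumptions made for the example.

```python
import pandas as pd

s = pd.Series(['1', '2', 'abc'])  # illustrative data

try:
    pd.to_numeric(s)  # errors='raise' is the default
    message = None
except ValueError as exc:
    message = str(exc)  # describes the value that could not be parsed

print(message)
```

Once you know which kind of value is causing trouble, you can clean it up or switch to errors='coerce'.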

How to convert object to float in Pandas?

Try this:

import pandas as pd

# sample dataframe
d = {'Quantidade': ['0,20939', '0,0082525', '0,009852', '0,012920', '0,0252'],
     'price': ['R$ 165.000,00', 'R$ 100.000,00', 'R$ 61.500,00', 'R$ 65.900,00', 'R$ 49.375,12']}
df = pd.DataFrame(data=d)

# First column: swap the decimal comma for a dot
df['Quantidade'] = df['Quantidade'].str.replace(',', '.', regex=False).astype(float)

# Second column: drop the "R$ " prefix (regex), remove the thousands dots,
# then swap the decimal comma for a dot
df['price'] = (df['price'].str.replace(r'\w+\$\s+', '', regex=True)
                          .str.replace('.', '', regex=False)
                          .str.replace(',', '.', regex=False)
                          .astype(float))

Output:

   Quantidade      price
0    0.209390  165000.00
1    0.008252  100000.00
2    0.009852   61500.00
3    0.012920   65900.00
4    0.025200   49375.12
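If the numbers come from a CSV file, a possible alternative is to let the reader handle the separators through the decimal and thousands parameters of pd.read_csv. The sketch below assumes a semicolon-separated file with no currency prefix; the inline text stands in for a real file, and a prefix such as "R$ " would still have to be stripped as shown above.

```python
import io

import pandas as pd

# Illustrative CSV text; the separator and layout are assumptions.
raw = "Quantidade;price\n0,20939;165.000,00\n0,0252;49.375,12\n"

# decimal=',' and thousands='.' tell the parser how the numbers are written.
df = pd.read_csv(io.StringIO(raw), sep=';', decimal=',', thousands='.')
print(df.dtypes)
```

Parsing at read time avoids a separate string-cleaning pass, but it only helps when the column contains nothing besides digits and separators.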

Python - How to convert from object to float

Using nucsit026's answer to create a slightly different DataFrame with strings:

import numpy as np
import pandas as pd
dic = {'revenue': ['7980.79', np.nan, '1000.25', '17800.85', 'None', '2457.85', '6789.33']}
df = pd.DataFrame(dic)
print(df)
print(df['revenue'].dtypes)

Output:

    revenue
0 7980.79
1 NaN
2 1000.25
3 17800.85
4 None
5 2457.85
6 6789.33

object

Try this:

df['revenue'] = pd.to_numeric(df['revenue'], errors='coerce').fillna(0)

This replaces every NaN with 0.

Output:

0     7980.79
1 0.00
2 1000.25
3 17800.85
4 0.00
5 2457.85
6 6789.33
Name: revenue, dtype: float64

EDIT:

From the error you shared, if quotes are the problem, you can strip them with

df['revenue'] = df['revenue'].str.strip("'")

and then convert to float using the code above.

EDIT2

OP had some spaces in the column values, like this:

Month   Revenue
Apr-13  16 004 258.24
May-13
Jun-13  16 469 157.71
Jul-13  19 054 861.01
Aug-13  20 021 803.71
Sep-13  21 285 537.45
Oct-13  22 193 453.80
Nov-13  21 862 298.20
Dec-13  10 053 557.64
Jan-14  17 358 063.34
Feb-14  19 469 161.04
Mar-14  22 567 078.21
Apr-14  20 401 188.64

In this case, use the following code:

df['revenue']=df['revenue'].replace(' ', '', regex=True)

and then perform the conversion.
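Putting the two steps together, a sketch of the full clean-up for space-separated numbers could look like the following. The three rows are a shortened, illustrative version of the data above, and the lowercase column name follows the answer's code.

```python
import pandas as pd

# Shortened illustrative data, including one empty value.
df = pd.DataFrame({
    'Month': ['Apr-13', 'May-13', 'Jun-13'],
    'revenue': ['16 004 258.24', '', '16 469 157.71'],
})

# Remove the spaces used as digit-group separators, then convert.
cleaned = df['revenue'].str.replace(' ', '', regex=False)
df['revenue'] = pd.to_numeric(cleaned, errors='coerce')
print(df)
```

With errors='coerce', the empty entry ends up as NaN, which you can fill afterwards if needed.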

Converting Object Data type to float data type in pandas results NaN values

pd.to_numeric attempts to convert a sequence to numeric and coerces when told to do so.

  • errors = 'coerce' will convert anything it can to float, and anything it can't to NaNs
  • If you'd like to keep whatever it couldn't convert to float in its
    original form for debugging, do errors = 'ignore'

Also, can you please post the original data in PROTEIN_SEQUENCE column? Perhaps, cleaning it a bit before conversion would be helpful.
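Another way to inspect what could not be converted is to coerce and then select the entries that were present in the input but ended up as NaN. This is a sketch on made-up data, since the actual PROTEIN_SEQUENCE values are not shown; the names converted and failed are illustrative.

```python
import pandas as pd

# Illustrative data; the original PROTEIN_SEQUENCE values are not shown.
s = pd.Series(['1.5', '2', 'abc', None, '7e3'])

converted = pd.to_numeric(s, errors='coerce')

# Entries that had a value in the input but became NaN after conversion.
failed = s[converted.isna() & s.notna()]
print(failed)
```

This keeps the original values next to their index labels, so you can decide how to clean them before converting again.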

pandas convert objects with numbers and nans to ints or floats

You can convert to numeric with to_numeric and errors='coerce', which yields floats. For integer columns, follow it with the nullable integer data type (pandas 0.24+):

df['column_name'] = pd.to_numeric(df['column_name'], errors='coerce').astype('Int64')
print (df)
column_name
0 10
1 5
2 20
3 NaN
4 5
5 NaN
6 6

Detail:

print (pd.to_numeric(df['column_name'], errors='coerce'))
0 10.0
1 5.0
2 20.0
3 NaN
4 5.0
5 NaN
6 6.0
Name: column_name, dtype: float64

Convert Pandas column containing NaNs to dtype `int`

The lack of a NaN representation in integer columns is a pandas "gotcha".

The usual workaround is to simply use floats.
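The difference between the float workaround and the nullable integer dtype can be seen in a short sketch; the variable names are illustrative.

```python
import pandas as pd

# Default behaviour: the missing value forces an upcast to float64.
as_float = pd.Series([1, 2, None])

# Nullable integer dtype: values stay integers and the gap is kept as missing.
as_nullable = pd.Series([1, 2, None], dtype='Int64')

print(as_float.dtype, as_nullable.dtype)
```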
