Pandas: Converting to Numeric, Creating Nans When Necessary

In pandas 0.17.0 convert_objects raises a warning:

FutureWarning: convert_objects is deprecated. Use the data-type
specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

You can apply pd.to_numeric to every column of the DataFrame, passing 'coerce' as the errors argument:

df1 = df.apply(pd.to_numeric, args=('coerce',))

or, more readably, with errors passed as a keyword argument:

df1 = df.apply(pd.to_numeric, errors='coerce')
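To see the effect of the call above, here is a minimal sketch. The frame and its column names are made up for illustration; only pd.to_numeric and errors='coerce' come from the answer.

```python
import pandas as pd

# Hypothetical frame: every value is a string, some are not parseable as numbers.
df = pd.DataFrame({"a": ["1", "2", "x"], "b": ["4.5", "oops", "6"]})

# Apply pd.to_numeric column by column; unparseable entries become NaN.
df1 = df.apply(pd.to_numeric, errors="coerce")

print(df1)
print(df1.dtypes)
```

Without errors='coerce', the same call would raise on the first value it cannot parse.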

EDIT

The above method only works for pandas >= 0.17.0. From the docs, "What's new in pandas 0.17.0":

pd.to_numeric is a new function to coerce strings to numbers (possibly with coercion) (GH11133)

Convert Pandas column containing NaNs to dtype `int`

The lack of a NaN representation in integer columns is a pandas "gotcha".

The usual workaround is to simply use floats.
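As a small illustration of that gotcha (the values are invented), an integer Series turns into floats as soon as a missing value appears, whether it is there from the start or introduced later by reindexing:

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
print(ints.dtype)  # an integer dtype

# A NaN in the data forces the whole column to float.
with_nan = pd.Series([1, np.nan, 3])
print(with_nan.dtype)

# Reindexing to a label that does not exist introduces NaN, so the result is float too.
reindexed = ints.reindex([0, 1, 5])
print(reindexed.dtype)
```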

convert pandas values to int and when containing nan values

See https://stackoverflow.com/a/51997100/11103175. Using dtype 'Int64' (the nullable integer type) lets an integer column keep its missing values.

You can specify the dtype when you create the DataFrame, or afterwards with astype:

import pandas as pd
import numpy as np

ind = list(range(5))
values = [1.0,np.nan,3.0,4.0,5.0]
df5 = pd.DataFrame(index=ind, data={'users':values},dtype='Int64')
#df5 = df5.astype('Int64')
df5

pandas convert objects with numbers and nans to ints or floats

You can convert to numeric with to_numeric and errors='coerce', which produces floats. To get integers, follow it with the nullable integer data type (pandas 0.24+):

df['column_name'] = pd.to_numeric(df['column_name'], errors='coerce').astype('Int64')
print (df)
   column_name
0           10
1            5
2           20
3          NaN
4            5
5          NaN
6            6

Detail:

print (pd.to_numeric(df['column_name'], errors='coerce'))
0    10.0
1     5.0
2    20.0
3     NaN
4     5.0
5     NaN
6     6.0
Name: column_name, dtype: float64
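The answer does not show the input data, so here is a self-contained sketch with a hypothetical column of text values. It assumes pandas 0.24 or later, since it relies on the 'Int64' dtype.

```python
import pandas as pd

# Hypothetical input: numbers stored as text, plus two unparseable entries.
df = pd.DataFrame({"column_name": ["10", "5", "20", "abc", "5", "?", "6"]})

# Step 1: coerce to numbers; the bad entries become NaN and the column is float.
as_float = pd.to_numeric(df["column_name"], errors="coerce")

# Step 2: convert to nullable integers; the missing values are kept as missing.
as_int = as_float.astype("Int64")

print(as_int)
```

Note that the second step expects whole-number floats; a value such as 5.5 cannot be cast safely to 'Int64' and raises an error.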

Pandas to numeric returning NaN

You need to replace £ with an empty string before converting to numeric:

hsbcraw[cols]=hsbcraw[cols].replace('£','', regex=True).apply(pd.to_numeric, errors='coerce')
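A runnable version of the same idea, using invented data in place of the real hsbcraw frame (the column names paid_in and paid_out are assumptions, not taken from the question):

```python
import pandas as pd

# Hypothetical stand-in for hsbcraw; the column names are made up.
hsbcraw = pd.DataFrame({
    "paid_in": ["£10.50", "£3", "n/a"],
    "paid_out": ["£1.25", "£0", "£7"],
})
cols = ["paid_in", "paid_out"]

# Strip the currency symbol first, then coerce what is left to numbers.
hsbcraw[cols] = (
    hsbcraw[cols]
    .replace("£", "", regex=True)
    .apply(pd.to_numeric, errors="coerce")
)

print(hsbcraw)
```

Only values that still fail to parse after the symbol is removed (here "n/a") end up as NaN.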

Convert pandas series to int with NaN values

NaN is a float, so pandas always upcasts your column to float as long as it contains NaN. You can use the nullable integer type, available from pandas 0.24.0:

df['month_added'] = df['month_added'].astype('Int64')

If that's not possible, you can force object dtype (not recommended):

df['month_added'] = pd.Series([int(x) if x > 0 else x for x in df.month_added], dtype='O')

Or since your data is positive and NaN, you can mask NaN with 0:

df['month_added'] = df['month_added'].fillna(0).astype(int)
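To compare the nullable-integer and fillna options side by side, here is a short sketch on a hypothetical month_added column:

```python
import numpy as np
import pandas as pd

# Hypothetical data: whole-number months stored as floats because of the NaN.
df = pd.DataFrame({"month_added": [1.0, np.nan, 12.0]})

# Option 1: nullable integers keep the missing value as missing.
nullable = df["month_added"].astype("Int64")

# Option 2: fill NaN with 0, then cast to plain int.
filled = df["month_added"].fillna(0).astype(int)

print(nullable)
print(filled)
```

The trade-off with the second option is that a filled 0 can no longer be told apart from a genuine 0, so it only suits data where 0 is never a real value.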

Why does pandas Series.str convert numbers to NaN?

In this line:

df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)

x.dtype describes the entire Series (column). The column is not numeric, so the whole column is treated as strings.

In your second example, the number is not preserved, it is a string '42'.

The difference in the output is due to the difference between pandas' .str and Python's str.

In the case of pandas .str, this is not a conversion. It is an accessor that lets you apply .strip() to each element. This means you attempt to apply .strip() to an integer. That fails, and pandas responds by returning NaN for that element.

In the case of .apply(str), you are actually converting the values to a string. Later when you apply .strip() this succeeds, since the value is already a string, and thus can be stripped.
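The two behaviours can be reproduced on a small mixed column. The Series below is invented for illustration:

```python
import pandas as pd

# Hypothetical object column mixing strings and an integer.
s = pd.Series(["  a ", 42, " b"], dtype=object)

# The .str accessor only handles string elements; the integer comes back as NaN.
via_accessor = s.str.strip()

# Converting every element with str first turns 42 into "42", which can be stripped.
via_conversion = s.apply(str).str.strip()

print(via_accessor)
print(via_conversion)
```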

Python Pandas: After converting dataframe to Str NAN is no more NAN

Look at .fillna()

You can use that to edit the NaNs before converting the column values to strings.
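A minimal sketch of that approach, with a made-up Series and a placeholder string chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with one missing value.
s = pd.Series([1.5, np.nan, 3.0])

# Replace NaN with a placeholder of your choice BEFORE converting to strings,
# so the missing entry gets the text you picked rather than whatever the
# string conversion would produce for NaN.
as_text = s.fillna("missing").astype(str)

print(as_text.tolist())
```

If you need to find the missing entries again later, it may be simpler to test with isna() on the original column before converting.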
