Pandas: Converting to numeric, creating NaNs when necessary
In pandas 0.17.0, convert_objects raises a warning:
FutureWarning: convert_objects is deprecated. Use the data-type
specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
You could use the pd.to_numeric function and apply it to the DataFrame with the argument 'coerce':
df1 = df.apply(pd.to_numeric, args=('coerce',))
or maybe more appropriately:
df1 = df.apply(pd.to_numeric, errors='coerce')
EDIT
The above method is only valid for pandas version >= 0.17.0. From the "What's New in pandas 0.17.0" docs:
pd.to_numeric is a new function to coerce strings to numbers (possibly with coercion) (GH11133)
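As a minimal sketch of the approach above, the following applies pd.to_numeric to every column of a small DataFrame. The sample data and the column names "a" and "b" are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: every column holds strings, some of them not numeric.
df = pd.DataFrame({"a": ["1", "2", "x"], "b": ["4.5", "n/a", "6"]})

# Apply pd.to_numeric column by column; values that cannot be parsed become NaN.
df1 = df.apply(pd.to_numeric, errors="coerce")

print(df1)
print(df1.dtypes)
```

Because each column now contains at least one NaN, both columns come back as float64.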
Convert Pandas column containing NaNs to dtype `int`
The lack of a NaN representation in integer columns is a pandas "gotcha".
The usual workaround is to simply use floats.
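A short illustration of the gotcha, using made-up values: as soon as a missing value enters an integer Series, pandas stores the data as floats.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])            # integer dtype
with_nan = pd.Series([1, np.nan, 3])   # a NaN forces a float dtype

# Reindexing onto a label that does not exist introduces NaN,
# so the integer Series is upcast to float as well.
reindexed = ints.reindex([0, 1, 5])

print(ints.dtype)
print(with_nan.dtype)
print(reindexed.dtype)
```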
convert pandas values to int and when containing nan values
Check out https://stackoverflow.com/a/51997100/11103175. The nullable integer dtype 'Int64' lets an integer column keep its missing values.
You can specify the dtype when you create the DataFrame or after the fact:
import pandas as pd
import numpy as np
ind = list(range(5))
values = [1.0,np.nan,3.0,4.0,5.0]
df5 = pd.DataFrame(index=ind, data={'users':values},dtype='Int64')
#df5 = df5.astype('Int64')
df5
Giving:
users
0 1
1 <NA>
2 3
3 4
4 5
pandas convert objects with numbers and nans to ints or floats
You can convert to numeric with to_numeric and errors='coerce', which gives floats in columns that contain NaN. For integers, use the nullable integer data type (pandas 0.24+):
df['column_name'] = pd.to_numeric(df['column_name'], errors='coerce').astype('Int64')
print (df)
column_name
0 10
1 5
2 20
3 NaN
4 5
5 NaN
6 6
Detail:
print (pd.to_numeric(df['column_name'], errors='coerce'))
0 10.0
1 5.0
2 20.0
3 NaN
4 5.0
5 NaN
6 6.0
Name: column_name, dtype: float64
Pandas to numeric returning NaN
You need to replace £ with an empty string before converting to numeric:
hsbcraw[cols]=hsbcraw[cols].replace('£','', regex=True).apply(pd.to_numeric, errors='coerce')
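To see why the replacement matters, here is a small sketch with a hypothetical stand-in for hsbcraw and cols (the column names and values are invented). Without stripping the currency symbol, every value is coerced to NaN.

```python
import pandas as pd

# Hypothetical stand-in for the original hsbcraw frame and cols list.
hsbcraw = pd.DataFrame({
    "paid_in": ["£10.50", "£3", "n/a"],
    "paid_out": ["£1", "-", "£2.25"],
})
cols = ["paid_in", "paid_out"]

# Converting directly: the leading £ makes every value unparseable.
before = hsbcraw[cols].apply(pd.to_numeric, errors="coerce")

# Stripping £ first leaves NaN only where the value is not a number.
cleaned = (
    hsbcraw[cols]
    .replace("£", "", regex=True)
    .apply(pd.to_numeric, errors="coerce")
)

print(before)
print(cleaned)
```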
Convert pandas series to int with NaN values
NaN is float typed, so pandas will always upcast your column to float as long as it contains NaN. You can use the nullable integer dtype, available from pandas 0.24.0:
df['month_added'] = df['month_added'].astype('Int64')
If that's not possible, you can force object dtype (not recommended):
df['month_added'] = pd.Series([int(x) if x > 0 else x for x in df.month_added], dtype='O')
Or, since your data is either positive or NaN, you can replace NaN with 0:
df['month_added'] = df['month_added'].fillna(0).astype(int)
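The sketch below compares the nullable-integer and fill-with-zero options side by side on a made-up month_added column. The values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: whole-number floats with one missing value.
df = pd.DataFrame({"month_added": [1.0, np.nan, 12.0]})

# Option 1: nullable integer dtype keeps the missing value as <NA>.
nullable = df["month_added"].astype("Int64")

# Option 2: replace the missing value with 0, then cast to a plain int.
filled = df["month_added"].fillna(0).astype(int)

print(nullable)
print(filled)
```

The first option preserves the information that a value was missing. The second loses it, so it only makes sense when 0 cannot be confused with real data.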
Why does pandas Series.str convert numbers to NaN?
In this line:
df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)
The x.dtype is looking at the entire Series (column). The column is not numeric. Thus the entire column is operated on like strings.
In your second example, the number is not preserved; it is the string '42'.
The difference in the output is due to the difference between pandas' .str and Python's str.
In the case of pandas .str, this is not a conversion; it is an accessor that lets you apply .strip() to each element. This means you attempt to apply .strip() to an integer. That fails, and pandas responds by returning NaN for that element.
In the case of .apply(str), you are actually converting the values to strings. When you later apply .strip(), it succeeds because the value is already a string and can be stripped.
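A minimal reproduction of the two behaviours, using an invented object-dtype Series that mixes strings with the integer 42:

```python
import pandas as pd

# Hypothetical mixed column: two strings and one integer.
s = pd.Series([" a ", 42, " b"], dtype=object)

# The .str accessor only handles string elements; 42 comes back as NaN.
via_accessor = s.str.strip()

# Converting every element with Python's str first turns 42 into '42',
# so .str.strip() then works on all elements.
via_convert = s.apply(str).str.strip()

print(via_accessor)
print(via_convert)
```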
Python Pandas: After converting dataframe to Str NAN is no more NAN
Look at .fillna(). You can use it to replace the NaNs before converting the column values to strings.
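A small sketch of the idea, on made-up values. It converts with Python's str via map(str), which is an assumption about how the conversion was done, and the fill value 0 is likewise only an example. A second variant keeps the missing entries as missing.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan, 3.0])

# Calling str on each element turns NaN into the literal text 'nan',
# which is no longer detected as missing.
as_text = s.map(str)

# Fill the missing values first, then convert.
filled = s.fillna(0).map(str)

# Alternative: convert, then blank out positions that were originally NaN.
kept = s.map(str).where(s.notna())

print(as_text.tolist())
print(filled.tolist())
print(kept.isna().tolist())
```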