Convert Pandas column containing NaNs to dtype `int`
The lack of a NaN representation in integer columns is a pandas "gotcha".
The usual workaround is to simply use floats.
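A minimal sketch of the gotcha, using a small hypothetical Series `s` that is not part of the original question: as soon as a missing value appears, pandas falls back to a float dtype.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
s = pd.Series([1, 2, 3])
print(s.dtype)  # an integer dtype

# Reindexing adds a label with no value, which pandas fills with NaN.
# NaN is a float, so the whole Series is upcast to float64.
s_with_nan = s.reindex([0, 1, 2, 3])
print(s_with_nan.dtype)
print(s_with_nan.tolist())
```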
Convert pandas values to int when the column contains NaN values
See https://stackoverflow.com/a/51997100/11103175. The nullable dtype 'Int64' (note the capital I) lets an integer column keep its missing values, which are then shown as <NA>.
You can specify the dtype when you create the dataframe or after the fact:
import pandas as pd
import numpy as np
ind = list(range(5))
values = [1.0,np.nan,3.0,4.0,5.0]
df5 = pd.DataFrame(index=ind, data={'users':values},dtype='Int64')
#df5 = df5.astype('Int64')
df5
Giving:
users
0 1
1 <NA>
2 3
3 4
4 5
How to change format of float values in column with also NaN values in Pandas Data Frame in Python?
You can use convert_dtypes to perform an automatic conversion. It takes no column name, so for a single column call it on that column:
df['col'] = df['col'].convert_dtypes()
For all columns:
df = df.convert_dtypes()
output:
col
0 7
1 2
2 <NA>
After conversion:
df.dtypes
col Int64
dtype: object
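As a rough illustration of what convert_dtypes infers, here is a sketch on a hypothetical frame (the column names `whole` and `fractional` are made up for this example). Whole-number floats become Int64, while floats with a fractional part cannot be cast faithfully and stay floating point.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column of whole numbers, one with fractions.
df = pd.DataFrame({
    'whole': [7.0, 2.0, np.nan],
    'fractional': [7.5, 2.0, np.nan],
})

converted = df.convert_dtypes()

# 'whole' is expected to become Int64; 'fractional' remains a float dtype
# (the nullable Float64 in recent pandas versions).
print(converted.dtypes)
```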
Convert pandas series to int with NaN values
NaN is float typed, so pandas always upcasts your column to float as long as it contains NaN. You can use the nullable integer dtype, available from pandas 0.24.0:
df['month_added'] = df['month_added'].astype('Int64')
If that's not possible, you can force the object dtype (not recommended):
df['month_added'] = pd.Series([int(x) if x > 0 else x for x in df.month_added], dtype='O')
Or, since your data is positive and NaN, you can mask NaN with 0:
df['month_added'] = df['month_added'].fillna(0).astype(int)
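The two main options behave differently downstream. The sketch below uses hypothetical values for `month_added` to show one consequence: a 0 sentinel takes part in aggregations, whereas <NA> is skipped.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the month_added column.
df = pd.DataFrame({'month_added': [3.0, np.nan, 12.0]})

nullable = df['month_added'].astype('Int64')      # gap kept as <NA>
filled = df['month_added'].fillna(0).astype(int)  # gap replaced by 0

# The mean skips <NA> but counts the 0 sentinel as a real value.
print(nullable.mean())  # 7.5
print(filled.mean())    # 5.0
```

Whether the sentinel is acceptable depends on whether 0 can ever be a legitimate value in the column.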
cannot convert nan to int (but there are no nans)
Basically the error is telling you that you have NaN values, and I will show why your attempts didn't reveal this:
In [7]:
# setup some data
df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})  # np.NaN was removed in NumPy 2.0
df
Out[7]:
a
0 1.0
1 NaN
2 3.0
3 4.0
now try to cast:
df['a'].astype(int)
this raises:
ValueError: Cannot convert NA to integer
but then you tried something like this:
In [5]:
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for index, row in df['a'].items():
    if row == np.nan:
        print('index:', index, 'isnull')
This printed nothing, because NaN cannot be tested with equality. In fact it has the special property that it compares unequal to itself, so row == np.nan is always False and row != row is True only for NaN:
In [6]:
for index, row in df['a'].items():
    if row != row:
        print('index:', index, 'isnull')
index: 1 isnull
Now it prints the row. For readability, you should use isnull instead:
In [9]:
for index, row in df['a'].items():
    if pd.isnull(row):
        print('index:', index, 'isnull')
index: 1 isnull
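The loops above are useful for seeing how NaN behaves, but the same check can be done for the whole column at once. A short sketch, rebuilding the same sample frame so it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Boolean mask marking every missing value in the column.
mask = df['a'].isna()

print(mask.any())               # whether the column has any NaN at all
print(df.index[mask].tolist())  # index labels of the rows holding NaN
```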
So what to do? We can drop the rows with df.dropna(subset=['a']), or we can replace the NaN values using fillna:
In [8]:
df['a'].fillna(0).astype(int)
Out[8]:
0 1
1 0
2 3
3 4
Name: a, dtype: int32
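For the other option mentioned above, dropping the rows, a minimal sketch on the same sample data might look like this. The variable name `cleaned` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Remove rows where 'a' is missing, then cast what is left to int.
cleaned = df.dropna(subset=['a'])
cleaned = cleaned.assign(a=cleaned['a'].astype(int))

# The original index labels are kept, so label 1 is simply absent.
print(cleaned)
```

Unlike fillna(0), this changes the number of rows, so it only fits when the rows with missing values are not needed.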
Pandas - convert float to int when there are NaN values in the column
NaN is itself a float and can't be converted to a regular int. You can use pd.Int64Dtype() for nullable integers:
# sample data:
df = pd.DataFrame({'id':[1, np.nan]})
df['id'] = df['id'].astype(pd.Int64Dtype())
Output:
id
0 1
1 <NA>
Another option is to use apply, but the result is not a true integer column: its dtype will be object (or, in many pandas versions, inferred back to float64) rather than int:
df['id'] = df['id'].apply(lambda x: x if np.isnan(x) else int(x))
Pandas: convert dtype 'object' to int
Documenting the answer that worked for me based on the comment by @piRSquared.
I needed to convert to a string first, then an integer.
>>> df['purchase'].astype(str).astype(int)
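If the object column also contains missing or unparseable entries, the str-then-int chain raises an error. One possible alternative, sketched here on hypothetical `purchase` values, is to parse with pd.to_numeric and then cast to the nullable Int64 dtype.

```python
import pandas as pd

# Hypothetical object column: numeric strings, a gap, and a bad value.
df = pd.DataFrame({'purchase': ['10', '25', None, 'n/a']})

# errors='coerce' turns anything that cannot be parsed into NaN
# instead of raising an exception.
numeric = pd.to_numeric(df['purchase'], errors='coerce')

# The nullable integer dtype keeps those gaps as <NA>.
df['purchase'] = numeric.astype('Int64')
print(df['purchase'].tolist())
```

Note that coercion silently hides bad input such as 'n/a', so it is worth counting the resulting missing values before relying on the column.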