Convert Pandas Column Containing NaNs to Dtype 'Int'

Convert Pandas column containing NaNs to dtype `int`

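The lack of a NaN representation in integer columns is a well-known pandas "gotcha": NumPy's integer dtypes have no missing-value marker, so an integer column that contains NaN is stored as float64.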

The usual workaround is to simply use floats.
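A minimal sketch of the upcast described above. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# A list of Python ints gives an integer dtype.
ints = pd.Series([1, 2, 3])
print(ints.dtype)

# Adding a missing value forces an upcast to float64, because
# NumPy's integer dtypes cannot represent NaN.
with_nan = pd.Series([1, 2, np.nan])
print(with_nan.dtype)     # float64
print(with_nan.tolist())  # [1.0, 2.0, nan]
```

Keeping the column as float64 is the "use floats" workaround: values stay numeric and vectorized, at the cost of printing as 1.0 instead of 1.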

Convert pandas values to int when they contain NaN values

Check out https://stackoverflow.com/a/51997100/11103175. The nullable integer dtype 'Int64' (note the capital I) lets you keep missing values as <NA> instead of falling back to float.

You can specify the dtype when you create the DataFrame, or convert it afterwards with astype:

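import pandas as pd
import numpy as np

ind = list(range(5))
values = [1.0, np.nan, 3.0, 4.0, 5.0]
# Option 1: request the nullable integer dtype when building the frame
df5 = pd.DataFrame(index=ind, data={'users': values}, dtype='Int64')
# Option 2: convert an existing frame after the fact
#df5 = df5.astype('Int64')
df5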

Giving:

   users
0      1
1   <NA>
2      3
3      4
4      5
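A small sketch, using the same values as the answer above, suggesting that the two options (dtype at construction, or astype afterwards) should end up with the same frame:

```python
import numpy as np
import pandas as pd

values = [1.0, np.nan, 3.0, 4.0, 5.0]

# Nullable dtype requested when the frame is built ...
at_creation = pd.DataFrame({"users": values}, dtype="Int64")
# ... versus a float64 frame converted afterwards.
afterwards = pd.DataFrame({"users": values}).astype("Int64")

print(at_creation.equals(afterwards))        # True
print(at_creation["users"].isna().tolist())  # [False, True, False, False, False]
```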

How to change the format of float values in a column that also contains NaN values in a pandas DataFrame in Python?

You can use convert_dtypes (added in pandas 1.0) to infer the best nullable dtype automatically. For a single column:

df['col'] = df['col'].convert_dtypes()

For all columns:

df = df.convert_dtypes()

Output:

    col
0     7
1     2
2  <NA>

After conversion:

df.dtypes

col Int64
dtype: object
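A minimal sketch of what convert_dtypes infers, assuming pandas 1.2 or later (needed for the nullable Float64 dtype). The column names and values here are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [7.0, 2.0, np.nan], "ratio": [0.5, 1.0, np.nan]})

# Whole-number floats plus NaN are inferred as nullable Int64;
# genuinely fractional floats become nullable Float64 instead.
converted = df.convert_dtypes()
print(converted.dtypes)

# Converting just one column works the same way on a Series.
df["col"] = df["col"].convert_dtypes()
```

Because convert_dtypes only picks an integer dtype when every non-missing value is a whole number, it will not silently truncate values like 0.5.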

Convert pandas series to int with NaN values

NaN is a float, so pandas always upcasts an integer column to float64 as long as it contains NaN. You can use the nullable integer dtype, available since pandas 0.24.0:

df['month_added'] = df['month_added'].astype('Int64')

If that's not possible (for example, on an older pandas version), you can force object dtype (not recommended):

df['month_added'] = pd.Series([int(x) if pd.notna(x) else x for x in df['month_added']], index=df.index, dtype=object)

Or, since your data contains only positive values and NaN, you can replace NaN with 0 and then cast to int:

df['month_added'] = df['month_added'].fillna(0).astype(int)
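A side-by-side sketch of the three approaches above. The month values are hypothetical sample data, assumed positive so that 0 can serve as the "missing" marker.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: positive month numbers with one missing value.
df = pd.DataFrame({"month_added": [3.0, np.nan, 12.0]})

# 1) Nullable integer dtype: keeps <NA> and stays vectorized.
nullable = df["month_added"].astype("Int64")

# 2) Object dtype: real Python ints next to NaN, but slow, generic storage.
as_object = pd.Series(
    [int(x) if pd.notna(x) else x for x in df["month_added"]],
    index=df.index,
    dtype=object,
)

# 3) Sentinel: 0 stands in for "missing" (safe only if 0 never occurs).
filled = df["month_added"].fillna(0).astype(int)

print(nullable.tolist())
print(as_object.tolist())
print(filled.tolist())  # [3, 0, 12]
```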

Cannot convert NaN to int (but there are no NaNs)

The error is telling you that you do have NaN values. Here is why your attempts didn't reveal them:

In [7]:
import numpy as np
import pandas as pd

# set up some data
df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})
df
Out[7]:
     a
0  1.0
1  NaN
2  3.0
3  4.0

Now try to cast:

df['a'].astype(int)

This raises (the exact wording varies between pandas versions):

ValueError: Cannot convert NA to integer

But then you tried something like this:

In [5]:
for index, row in df['a'].items():
    if row == np.nan:
        print('index:', index, 'isnull')

This prints nothing, because NaN cannot be detected with an equality test: under IEEE 754 floating-point rules, NaN compares unequal to every value, including itself. That special property means `row != row` is True only for NaN:

In [6]:
for index, row in df['a'].items():
    if row != row:
        print('index:', index, 'isnull')

index: 1 isnull

Now it prints the row. For readability, use pd.isnull (an alias of pd.isna) instead:

In [9]:
for index, row in df['a'].items():
    if pd.isnull(row):
        print('index:', index, 'isnull')

index: 1 isnull
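The row-by-row loops above can also be replaced by vectorized checks. A minimal sketch on the same sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0]})

# NaN never compares equal to anything, itself included.
print(np.nan == np.nan)  # False

# Vectorized alternatives to looping over the rows.
mask = df["a"].isna()
print(df["a"].hasnans)          # True
print(df.index[mask].tolist())  # [1]
```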

So what to do? You can drop the rows with df.dropna(subset=['a']), or replace the missing values using fillna:

In [8]:
df['a'].fillna(0).astype(int)

Out[8]:
0 1
1 0
2 3
3 4
Name: a, dtype: int32
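A sketch comparing the remedies on the same data. The nullable 'Int64' option is borrowed from the other answers on this page, not from this one. Note that the integer width shown for astype(int) (int32 above) can depend on the platform and NumPy version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0]})

raised = False
try:
    df["a"].astype(int)
except ValueError:  # exact exception class/message varies between pandas versions
    raised = True

dropped = df.dropna(subset=["a"])["a"].astype(int)  # rows with NaN removed
filled = df["a"].fillna(0).astype(int)              # NaN replaced by 0
nullable = df["a"].astype("Int64")                  # NaN kept as <NA>

print(raised)            # True
print(dropped.tolist())  # [1, 3, 4]
print(filled.tolist())   # [1, 0, 3, 4]
```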

Pandas - convert float to int when there are NaN values in the column

NaN is itself a float and can't be converted to a regular NumPy int. You can use pd.Int64Dtype() (the same dtype as the string alias 'Int64') for nullable integers:

# sample data:
df = pd.DataFrame({'id':[1, np.nan]})

df['id'] = df['id'].astype(pd.Int64Dtype())

Output:

     id
0     1
1  <NA>

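Another option you may come across is apply, but pandas re-infers the dtype of the result, so a mix of Python ints and NaN comes back as float64. To really keep ints next to NaN you have to build an object-dtype column explicitly (slow, and you lose vectorized integer operations):

df['id'] = pd.Series([x if np.isnan(x) else int(x) for x in df['id']], index=df.index, dtype=object)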
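Compared with an object column, the nullable dtype keeps arithmetic and reductions vectorized. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3]).astype(pd.Int64Dtype())

doubled = s * 2  # arithmetic keeps the Int64 dtype; <NA> propagates
total = s.sum()  # reductions skip <NA> by default (skipna=True)

print(doubled.dtype)  # Int64
print(total)          # 4
```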

Pandas: convert dtype 'object' to int

Documenting the answer that worked for me based on the comment by @piRSquared.

I needed to convert to a string first, then an integer.

>>> df['purchase'].astype(str).astype(int)
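A sketch of when the string-then-int route works and when it doesn't. The data is hypothetical, and pd.to_numeric with errors="coerce" is a different technique swapped in here for columns that also contain missing or non-numeric entries; it is not part of the answer above.

```python
import pandas as pd

# Hypothetical object column mixing numeric strings and a plain int.
clean = pd.Series(["10", 7, "3"], dtype=object)
print(clean.astype(str).astype(int).tolist())  # [10, 7, 3]

# With missing or non-numeric entries, astype(int) cannot parse them.
# pd.to_numeric(errors="coerce") turns unparseable entries into NaN,
# and the nullable Int64 dtype then keeps them as <NA>.
messy = pd.Series(["10", 7, "3", None, "n/a"], dtype=object)
as_int = pd.to_numeric(messy, errors="coerce").astype("Int64")
print(as_int.tolist())
```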

