NumPy or Pandas: Keeping Array Type as Integer While Having a NaN Value

NumPy or Pandas: Keeping array type as integer while having a NaN value

This capability has been added to pandas (beginning with version 0.24):
https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support

At this point, it requires the use of the extension dtype Int64 (capitalized), rather than the default dtype int64 (lowercase).
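A minimal sketch (assuming pandas >= 0.24) of what the nullable dtype looks like in practice:

import pandas as pd
import numpy as np

# 'Int64' (capital I) is the nullable extension dtype; NaN becomes <NA>
s = pd.Series([1, 2, np.nan], dtype='Int64')
print(s)

0       1
1       2
2    <NA>
dtype: Int64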

Convert pandas values to int when containing NaN values

Check out https://stackoverflow.com/a/51997100/11103175. There is functionality to keep NaN values by using the dtype 'Int64'.

You can specify the dtype when you create the DataFrame, or after the fact:

import pandas as pd
import numpy as np

ind = list(range(5))
values = [1.0, np.nan, 3.0, 4.0, 5.0]
df5 = pd.DataFrame(index=ind, data={'users': values}, dtype='Int64')
# or, after the fact:
# df5 = df5.astype('Int64')
df5

Giving:

   users
0      1
1   <NA>
2      3
3      4
4      5

Convert Pandas column containing NaNs to dtype `int`

The lack of a NaN representation in integer columns is a pandas "gotcha".

The usual workaround is to simply use floats, as illustrated in the sketch below.
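A minimal sketch of the gotcha: introducing a missing value into an int64 Series silently upcasts the whole thing to float64.

import pandas as pd

s = pd.Series([1, 2, 3])       # dtype: int64
s2 = s.reindex(range(4))       # index 3 has no value, so NaN must be stored
print(s2.dtype)                # float64 -- the column was upcast to hold NaN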

NumPy integer NaN

No, you can't, at least with the current version of NumPy. NaN is a special value for float arrays only.

There is talk of introducing a special bit that would allow non-float arrays to store what would in practice correspond to a NaN, but so far (2012/10) it is only talk.

In the meantime, you may want to consider the numpy.ma package: instead of picking an invalid integer like -99999, you could use the special numpy.ma.masked value to represent an invalid value.

import numpy as np

a = np.ma.array([1, 2, 3, 4, 5], dtype=int)
a[1] = np.ma.masked
a

masked_array(data = [1 -- 3 4 5],
             mask = [False True False False False],
       fill_value = 999999)
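To get a plain integer ndarray back, the masked entries can be replaced with a sentinel via filled() (the sentinel -1 below is an arbitrary choice):

a.filled(-1)

array([ 1, -1,  3,  4,  5])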

NumPy: check for integer NaN

np.nan is a float, not an integer. You either have to change the dtype of the last column or use a different structure to store your NaN alongside integers.

import numpy as np

datadef = [('i', '<i4'), ('f', '<f8'), ('g', '<f8'), ('j', '<f4')]
arr = np.full((4,), np.nan, dtype=datadef)  # float fields start as NaN; the
# integer field 'i' cannot hold NaN and gets an arbitrary value (overwritten below)

# fill array with data
arr['i'] = np.array([1, 2, 3, 4])
arr['f'] = np.array([1.3333333333, np.nan, 2.6666666666666666, 5.0])
arr['g'] = np.array([2.77777777777, 5.4, 3.4, np.nan])
# nothing for 'j', so it stays NaN

Now check the result with np.isnan (row 1, field index 3, i.e. column 'j'):

print(np.isnan(arr[1][3]))

True

More elegant way to do value comparison while preserving NaN in Pandas and Numpy Python

Here is a slightly improved/changed version of your solution (a sample DataFrame a is assumed for illustration):

import pandas as pd
import numpy as np

# sample data, assumed for illustration
a = pd.DataFrame({"x": [1, np.nan, 2, 5], "y": [2, 3, 4, 1]})

value_comparison = a["x"] > a["y"]
nan_comparison = a[["x", "y"]].notna().all(axis=1)
# alternative:
# nan_comparison = a["x"].notna() & a["y"].notna()
m = value_comparison.where(nan_comparison)
print(m)

0    0.0
1    NaN
2    0.0
3    1.0
dtype: float64

Finally, it is possible to convert to a nullable boolean dtype:

m = value_comparison.where(nan_comparison).astype('boolean')
print(m)

0    False
1     <NA>
2    False
3     True
dtype: boolean
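One caveat if m is used as a boolean indexer: pandas raises on <NA> entries, so a common pattern is to fill them first (a small sketch, using the sample DataFrame a from above):

print(a[m.fillna(False)])

     x  y
3  5.0  1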

