NumPy or Pandas: Keeping array type as integer while having a NaN value
This capability has been added to pandas (beginning with version 0.24):
https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support
At this point, it requires the use of the extension dtype Int64 (capitalized), rather than the default dtype int64 (lowercase).
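A minimal sketch of the nullable dtype in action (the sample values are just illustrative):

```python
import pandas as pd

# A Series with the nullable "Int64" extension dtype keeps integer
# values alongside a missing-value marker (pd.NA) instead of
# silently upcasting the whole column to float
s = pd.Series([1, 2, None], dtype="Int64")
print(s.dtype)            # Int64
print(s.isna().tolist())  # [False, False, True]
```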
Convert pandas values to int when containing nan values
Check out https://stackoverflow.com/a/51997100/11103175. There is functionality to keep NaN values by using the dtype 'Int64'.
You can specify the dtype when you create the dataframe, or after the fact:
import pandas as pd
import numpy as np
ind = list(range(5))
values = [1.0,np.nan,3.0,4.0,5.0]
df5 = pd.DataFrame(index=ind, data={'users':values},dtype='Int64')
#df5 = df5.astype('Int64')
df5
Giving:
users
0 1
1 <NA>
2 3
3 4
4 5
Convert Pandas column containing NaNs to dtype `int`
The lack of NaN rep in integer columns is a pandas "gotcha".
The usual workaround is to simply use floats.
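The gotcha is easy to reproduce (a throwaway frame, assumed here for illustration):

```python
import pandas as pd

# An all-integer column starts out as int64
df = pd.DataFrame({'n': [1, 2, 3]})
print(df['n'].dtype)   # int64

# Reindexing to a label with no data introduces NaN and
# silently upcasts the whole column to float64
df2 = df.reindex([0, 1, 2, 3])
print(df2['n'].dtype)  # float64
```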
Numpy integer nan
No, you can't, at least with the current version of NumPy. A nan is a special value for float arrays only. There has been talk of introducing a special bit that would allow non-float arrays to store what would in practice correspond to a nan, but so far (2012/10), it's only talk.
In the meantime, you may want to consider the numpy.ma package: instead of picking an invalid integer like -99999, you could use the special numpy.ma.masked value to represent an invalid value.
a = np.ma.array([1, 2, 3, 4, 5], dtype=int)
a[1] = np.ma.masked
a
Giving:
masked_array(data = [1 -- 3 4 5],
             mask = [False True False False False],
       fill_value = 999999)
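Masked entries are skipped by reductions, and filled() converts back to a plain integer array with a fill value of your choice (the -1 sentinel below is just an assumed example):

```python
import numpy as np

a = np.ma.array([1, 2, 3, 4, 5], dtype=int)
a[1] = np.ma.masked

# Reductions ignore masked entries: 1 + 3 + 4 + 5
print(a.sum())       # 13

# filled() returns an ordinary int array, substituting the
# given fill value for each masked slot
print(a.filled(-1))  # [ 1 -1  3  4  5]
```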
Numpy : check for integer NaN
np.nan is a float, not an integer. You either have to change the dtype of the last column or use a different structure to store your nan as an integer.
datadef = [('i', '<i4'), ('f', '<f8'), ('g', '<f8'), ('j', '<f4')]
# np.full broadcasts nan into every field; the float fields hold a
# real nan, while the int field 'i' receives an undefined cast value
arr = np.full((4,), np.nan, dtype=datadef)
# fill array with data
arr['i'] = np.array([1, 2, 3, 4])
arr['f'] = np.array([1.3333333333, np.nan, 2.6666666666666666, 5.0])
arr['g'] = np.array([2.77777777777, 5.4, 3.4, np.nan])
# nothing for 'j'
Now check with np.isnan:
print(np.isnan(arr[1][3]))
True
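np.isnan can also be applied to a whole float field at once. A small sketch (rebuilding a similar array with np.zeros, an assumed substitute that sidesteps the nan-to-int cast in the integer field):

```python
import numpy as np

datadef = [('i', '<i4'), ('f', '<f8'), ('g', '<f8'), ('j', '<f4')]
arr = np.zeros((4,), dtype=datadef)
arr['f'] = np.array([1.3333333333, np.nan, 2.6666666666666666, 5.0])
arr['g'] = np.array([2.77777777777, 5.4, 3.4, np.nan])

# Vectorized nan check over an entire float field
print(np.isnan(arr['f']))  # [False  True False False]
print(np.isnan(arr['g']))  # [False False False  True]
```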
More elegant way to do value comparison while preserving Nan in Pandas and Numpy Python
A slightly improved/changed version of your solution:
import pandas as pd
import numpy as np

# sample data (assumed; values chosen to match the output below)
a = pd.DataFrame({'x': [1, np.nan, 2, 5], 'y': [2, 3, 4, 1]})

value_comparison = (a["x"] > a["y"])
nan_comparison = a[["x", "y"]].notna().all(axis=1)
#alternative
#nan_comparison = a["x"].notna() & a["y"].notna()
m = value_comparison.where(nan_comparison)
print(m)
0 0.0
1 NaN
2 0.0
3 1.0
dtype: float64
Finally, it is possible to convert to the nullable boolean dtype:
m = value_comparison.where(nan_comparison).astype('boolean')
print (m)
0 False
1 <NA>
2 False
3 True
dtype: boolean