How to check if Pandas/NumPy arbitrary object contains or IS NaT/NaN/Null
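Wrap x inside a list, pass it to pd.notna, and chain any. This works because pd.notna returns a NumPy ndarray, so the any is actually ndarray.any. When ndarray.any is called without the axis parameter, it reduces over all dimensions. Therefore, it works whether x is a list or a single value. Read the results this way: pd.notna([x]).all() is False when x is null or contains at least one null, and pd.notna([x]).any() is False only when x is null or every element of it is null.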
x = [1,2,3,pd.NaT]
In [369]: pd.notna([x])
Out[369]: array([[ True, True, True, False]]) #it is 2d-array
In [370]: type(pd.notna([x]))
Out[370]: numpy.ndarray
In [373]: pd.notna([x]).any() #`ndarray.any` checks on all dimensions of this 2d-array
Out[373]: True
In [374]: pd.notna([x]).all() #`ndarray.all` checks on all dimensions of this 2d-array
Out[374]: False
When x is a single pd.NaT:
x = pd.NaT
In [377]: pd.notna([x])
Out[377]: array([False]) #it is 1d-array
In [378]: pd.notna([x]).any()
Out[378]: False
In [379]: pd.notna([x]).all()
Out[379]: False
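The any/all reading above can be wrapped in two small helpers. This is a minimal sketch. contains_null and is_null are hypothetical names, not part of the pandas API, and they rely only on the pd.notna([x]) trick shown in this answer.

```python
import numpy as np
import pandas as pd

def contains_null(x):
    """Hypothetical helper: True if x is null or has at least one null element."""
    # Wrapping x in a list makes pd.notna return an ndarray even for scalars;
    # ndarray.all() without an axis reduces over every dimension.
    return not bool(pd.notna([x]).all())

def is_null(x):
    """Hypothetical helper: True if x is null, or every element of x is null."""
    return not bool(pd.notna([x]).any())

print(contains_null([1, 2, 3, pd.NaT]), is_null([1, 2, 3, pd.NaT]))  # True False
print(contains_null(pd.NaT), is_null(pd.NaT))  # True True
print(contains_null(5), is_null(5))  # False False
```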
NumPy - Testing equality including np.nan, np.nat, np.NZERO and np.PZERO in a vectorized way
It seems that comparing the underlying view (the raw bytes of each element) does exactly what I want:
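import numpy as np

def compare(x, y):
    # Broadcast both inputs to a common shape
    x, y = np.broadcast_arrays(x, y)
    dtx = x.dtype
    dty = y.dtype
    # Treat arrays of different dtypes as never equal
    if dtx != dty:
        return np.zeros(x.shape, dtype=bool)
    # Reinterpret each element as its raw bytes (adds a trailing axis of length itemsize)
    xv = x.view((np.uint8, x.itemsize))
    yv = y.view((np.uint8, y.itemsize))
    # Equal only if every byte matches: identical NaN/NaT bit patterns match, 0.0 differs from -0.0
    return np.all(xv == yv, axis=-1)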
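If you only deal with float arrays, a common alternative to the view-based method is to combine np.isnan with np.signbit. This is a minimal sketch with a hypothetical helper name. Unlike the byte comparison, it treats all NaNs as equal regardless of their payload, and it does not handle NaT or non-float dtypes.

```python
import numpy as np

def float_equal_nan_signed_zero(x, y):
    """Hypothetical helper: element-wise equality for floats where NaN == NaN and 0.0 != -0.0."""
    x, y = np.broadcast_arrays(np.asarray(x, dtype=float), np.asarray(y, dtype=float))
    # Positions where both sides are NaN count as equal
    both_nan = np.isnan(x) & np.isnan(y)
    # Ordinary equality, but the sign bits must also match so 0.0 and -0.0 differ
    same_value = (x == y) & (np.signbit(x) == np.signbit(y))
    return both_nan | same_value

a = np.array([1.0, np.nan, 0.0, -0.0])
b = np.array([1.0, np.nan, -0.0, -0.0])
print(float_equal_nan_signed_zero(a, b))  # [ True  True False  True]
```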
Efficiently checking if arbitrary object is NaN in Python / numpy / pandas?
pandas.isnull() (also available as pd.isna() in newer versions) checks for missing values in both numeric and string/object arrays. According to the documentation, it detects NaN in numeric arrays and None/NaN in object arrays.
Quick example:
import pandas as pd
import numpy as np
s = pd.Series(['apple', np.nan, 'banana'])
pd.isnull(s)
Out[9]:
0 False
1 True
2 False
dtype: bool
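pd.isna also accepts a single scalar, which is what the "arbitrary object" case usually needs. The sketch below runs it over a few sample values; the list itself is just illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative scalars: the first five are all treated as missing by pandas
values = [None, np.nan, float("nan"), pd.NaT, np.datetime64("NaT"), "apple", 0]
flags = [bool(pd.isna(v)) for v in values]
print(flags)  # [True, True, True, True, True, False, False]
```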
The idea of using numpy.nan to represent missing values is something that pandas introduced, which is why pandas has the tools to deal with it.
Datetimes work too (if you use pd.NaT, you won't need to specify the dtype):
In [24]: s = pd.Series([pd.Timestamp('20130101'), np.nan, pd.Timestamp('20130102 9:30')], dtype='M8[ns]')
In [25]: s
Out[25]:
0 2013-01-01 00:00:00
1 NaT
2 2013-01-02 09:30:00
dtype: datetime64[ns]
In [26]: pd.isnull(s)
Out[26]:
0 False
1 True
2 False
dtype: bool
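This sketch illustrates the note about pd.NaT by building the same kind of Series from pd.NaT without passing a dtype. pandas infers a datetime64 dtype on its own. The exact resolution may depend on the pandas version, so the check below only tests the dtype family.

```python
import pandas as pd

# No dtype argument: pd.NaT lets pandas infer a datetime64 column
s = pd.Series([pd.Timestamp("2013-01-01"), pd.NaT, pd.Timestamp("2013-01-02 09:30")])
print(s.dtype)
print(pd.api.types.is_datetime64_any_dtype(s))  # True
print(pd.isna(s).tolist())  # [False, True, False]
```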
Check if NaT changes to datetime and update value
Use np.where after coercing the dates to datetime.
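import numpy as np
import pandas as pd
# Coerce both date columns to datetime64 (missing values become NaT)
df_1['date'] = pd.to_datetime(df_1['date'])
df_2['date'] = pd.to_datetime(df_2['date'])
df = pd.merge(df_2, df_1, how='left', on='order_id', suffixes=('_left', ''))
# Take date_left where date is NaT or date_left is later by at least one full day
df = df.assign(date=np.where(df['date'].isna() | df['date_left'].sub(df['date']).dt.days.gt(0),
                             df['date_left'], df['date'])).drop(columns='date_left')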
order_id date
0 123 2020-01-02
1 456 2021-01-01
2 789 2020-10-11
3 135 2020-06-01
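The merge-and-np.where step can be tried end to end on small frames. The values in df_1 and df_2 below are hypothetical. They cover three cases: a later replacement date, a missing date, and an earlier replacement date that should be ignored.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: df_1 has the current dates (one missing), df_2 has candidate dates
df_1 = pd.DataFrame({"order_id": [123, 456, 789],
                     "date": ["2020-01-01", None, "2020-10-11"]})
df_2 = pd.DataFrame({"order_id": [123, 456, 789],
                     "date": ["2020-01-02", "2021-01-01", "2020-05-01"]})

df_1["date"] = pd.to_datetime(df_1["date"])
df_2["date"] = pd.to_datetime(df_2["date"])

# df_2 is the left frame, so its date column becomes date_left
df = pd.merge(df_2, df_1, how="left", on="order_id", suffixes=("_left", ""))

# Same rule as above: replace when date is NaT or date_left is at least a day later
take_left = df["date"].isna() | df["date_left"].sub(df["date"]).dt.days.gt(0)
df["date"] = np.where(take_left, df["date_left"], df["date"])
df = df.drop(columns="date_left")
print(df)
```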
Checking for both NaT or pandas timestamp
You can use the isna or fillna method on it:
import pandas as pd
import numpy as np
time = pd.Series(['2017-12-02 20:40:30','2017-12-02 00:00:00',np.nan])
time = time.apply(lambda x: pd.Timestamp(x))
print(time)
0 2017-12-02 20:40:30
1 2017-12-02 00:00:00
2 NaT
time.isna()
0 False
1 False
2 True
time.fillna("missing")
0 2017-12-02 20:40:30
1 2017-12-02 00:00:00
2 missing
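When parsing many strings, pd.to_datetime is a vectorised alternative to apply with pd.Timestamp, and it also turns missing entries into NaT. To classify a single element as NaT or Timestamp, one option is to check identity with pd.NaT first and then use isinstance with pd.Timestamp. The helper kind below is hypothetical.

```python
import numpy as np
import pandas as pd

raw = pd.Series(["2017-12-02 20:40:30", "2017-12-02 00:00:00", np.nan])
time = pd.to_datetime(raw)  # vectorised parse; the missing entry becomes NaT

def kind(value):
    """Hypothetical helper: label one element as 'NaT', 'Timestamp' or 'other'."""
    if value is pd.NaT:  # check NaT first so it is never mistaken for a Timestamp
        return "NaT"
    if isinstance(value, pd.Timestamp):
        return "Timestamp"
    return "other"

print([kind(v) for v in time])  # ['Timestamp', 'Timestamp', 'NaT']
print(time.isna().tolist())  # [False, False, True]
```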
How to properly declare 'NaT' in a python function to be applied on a pandas dataframe?
The code below does what you want. I also made some changes to your code.
import pandas as pd
def some_fun(x):
    # pd.isnull treats NaT, NaN and None alike
    if pd.isnull(x):
        return 'something else'
    else:
        return 'something'
df['new_col'] = [some_fun(x) for x in df['date']]
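Unfortunately, np.isnat() failed in my code, so I used pd.isnull() instead, following another answer. A likely reason is that np.isnat() only accepts datetime64/timedelta64 input, while iterating over df['date'] yields pd.Timestamp and pd.NaT objects. If np.isnat() works for you, use it.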
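Assuming df['date'] is a datetime64 column, np.where can replace the row-by-row list comprehension. np.isnat can also work if it is applied to the underlying datetime64 array from to_numpy() rather than to individual Timestamp/NaT objects. The toy frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the question's df
df = pd.DataFrame({"date": pd.to_datetime(["2020-01-01", None, "2020-03-15"])})

# Vectorised equivalent of the list comprehension over some_fun
df["new_col"] = np.where(df["date"].isna(), "something else", "something")

# np.isnat works on the datetime64 array itself, not on boxed Timestamp/NaT scalars
mask = np.isnat(df["date"].to_numpy())
print(df["new_col"].tolist())  # ['something', 'something else', 'something']
print(mask.tolist())  # [False, True, False]
```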