Python Pandas .isnull() Does Not Work on NaT in Object Dtype

Pandas to_json not outputting null for NaT

A bit hacky, but you could do this:

In [13]: s = pd.Series(pd.to_datetime(['20130101', None]))

In [14]: s
Out[14]:
0   2013-01-01 00:00:00
1                   NaT
dtype: datetime64[ns]

In [15]: def f(x):
   ....:     if pd.isnull(x):
   ....:         return 'null'
   ....:     return x.isoformat()
   ....:

In [16]: s.apply(f).to_json()
Out[16]: '{"0":"2013-01-01T00:00:00","1":"null"}'
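The same workaround runs as a plain script; here is a minimal sketch (to_iso_or_null is just a descriptive name for the f above):

import pandas as pd

s = pd.Series(pd.to_datetime(['20130101', None]))

def to_iso_or_null(x):
    # pd.isnull catches NaT (as well as NaN and None), whatever the dtype
    if pd.isnull(x):
        return 'null'
    return x.isoformat()

print(s.apply(to_iso_or_null).to_json())
# expected: {"0":"2013-01-01T00:00:00","1":"null"}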

Pandas DataFrame Replace NaT with None

Make the dtype object, then replace the NaT values with None:

dfTest2 = pd.DataFrame(dict(InvoiceDate=pd.to_datetime(['2017-06-01', pd.NaT])))

dfTest2.InvoiceDate.astype(object).where(dfTest2.InvoiceDate.notnull(), None)

0    2017-06-01 00:00:00
1                   None
Name: InvoiceDate, dtype: object
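Assigning the result back gives a column whose missing entries are real Python None objects, which is what database drivers and ORMs usually expect. A sketch reusing the expression above (the handling of None by where can vary a little between pandas versions, so check the output on yours):

import pandas as pd

dfTest2 = pd.DataFrame(dict(InvoiceDate=pd.to_datetime(['2017-06-01', pd.NaT])))

# Go through object dtype so the column can hold None, then assign back.
dfTest2['InvoiceDate'] = dfTest2['InvoiceDate'].astype(object).where(
    dfTest2['InvoiceDate'].notnull(), None)

print(dfTest2['InvoiceDate'].tolist())
# e.g. [Timestamp('2017-06-01 00:00:00'), None]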

Remove dtype datetime NaT

This won't win any speed awards, but if the DataFrame is not too long, reassignment using a list comprehension will do the job:

df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]

import numpy as np
import pandas as pd
Timestamp = pd.Timestamp
nan = np.nan
NaT = pd.NaT
df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', nan],
    'date': (Timestamp('2014-08-14'), NaT)
})

df1['col2'] = df1['col2'].fillna('')
df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]

print(df1)

yields

  col1 col2        date
0    a    b  2014-08-14
1    c
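If the date column is genuinely datetime64 (as it is here), a vectorized alternative to the list comprehension is to let pandas do the formatting and then fill in the blanks; this is a sketch, not part of the original answer:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', np.nan],
    'date': (pd.Timestamp('2014-08-14'), pd.NaT)
})

# dt.strftime formats the valid timestamps and leaves the NaT slot missing,
# which fillna('') then turns into an empty string.
df1['date'] = df1['date'].dt.strftime('%Y-%m-%d').fillna('')
df1['col2'] = df1['col2'].fillna('')

print(df1)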

python-pandas: dealing with NaT type values in a date column of a pandas dataframe

Say you start with something like this:

df = pd.DataFrame({
    'CUSTOMER_name': ['abc', 'def', 'abc', 'def', 'abc', 'fff'],
    'DATE': ['NaT', 'NaT', '2010-04-15 19:09:08', '2011-01-25 15:29:37',
             '2010-04-10 12:29:02', 'NaT']})
df.DATE = pd.to_datetime(df.DATE)

(note that the only difference is adding fff mapped to NaT).

Then the following does what you ask:

>>> pd.to_datetime(df.DATE.groupby(df.CUSTOMER_name).min())
CUSTOMER_name
abc   2010-04-10 12:29:02
def   2011-01-25 15:29:37
fff                   NaT
Name: DATE, dtype: datetime64[ns]

This is because groupby-min already excludes missing data where applicable (albeit changing the format of the results), and the final pd.to_datetime coerces the result again to a datetime.


To get the date part of the result (which I think is a separate question), use .dt.date:

>>> pd.to_datetime(df.DATE.groupby(df.CUSTOMER_name).min()).dt.date
CUSTOMER_name
abc    2010-04-10
def    2011-01-25
fff           NaN
Name: DATE, dtype: object
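If you want the per-customer minimums in a display-friendly form, with blanks instead of NaT/NaN, the pd.isnull check from the earlier answers applies here too; a small self-contained sketch:

import pandas as pd

df = pd.DataFrame({
    'CUSTOMER_name': ['abc', 'def', 'abc', 'def', 'abc', 'fff'],
    'DATE': ['NaT', 'NaT', '2010-04-15 19:09:08', '2011-01-25 15:29:37',
             '2010-04-10 12:29:02', 'NaT']})
df.DATE = pd.to_datetime(df.DATE)

earliest = df.DATE.groupby(df.CUSTOMER_name).min()

# Format the valid dates; the all-NaT group ('fff') becomes an empty string.
print(earliest.apply(lambda d: '' if pd.isnull(d) else d.strftime('%Y-%m-%d')))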

