Pandas to_json not outputting null for NaT
A bit hacky, but you could do this:

In [13]: s = pd.Series(pd.to_datetime(['20130101', None]))

In [14]: s
Out[14]:
0   2013-01-01 00:00:00
1                   NaT
dtype: datetime64[ns]

In [15]: def f(x):
    ...:     if pd.isnull(x):
    ...:         return None   # None serializes as a bare JSON null, not the string "null"
    ...:     return x.isoformat()
    ...:

In [16]: s.apply(f).to_json()
Out[16]: '{"0":"2013-01-01T00:00:00","1":null}'
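The same trick extends to a DataFrame with several datetime columns. A sketch (the column names `start` and `end` are made up for illustration), mapping every NaT to None so it serializes as a bare JSON null:

```python
import json
import pandas as pd

df = pd.DataFrame({
    'start': pd.to_datetime(['20130101', None]),
    'end': pd.to_datetime([None, '20130201']),
})

def iso_or_none(x):
    # None (unlike the string 'null') comes out as a bare JSON null
    return None if pd.isnull(x) else x.isoformat()

# Apply per column, then serialize the object-dtype result
out = df.apply(lambda col: col.apply(iso_or_none)).to_json()
print(out)
```
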
Pandas DataFrame Replace NaT with None
Make the dtype object:
dfTest2 = pd.DataFrame(dict(InvoiceDate=pd.to_datetime(['2017-06-01', pd.NaT])))
dfTest2.InvoiceDate.astype(object).where(dfTest2.InvoiceDate.notnull(), None)
0 2017-06-01 00:00:00
1 None
Name: InvoiceDate, dtype: object
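If several columns may contain missing values, the same mask can be applied frame-wide before handing rows to, say, a database driver. A minimal sketch (the Amount column here is invented for illustration):

```python
import pandas as pd

dfTest2 = pd.DataFrame(dict(
    InvoiceDate=pd.to_datetime(['2017-06-01', pd.NaT]),
    Amount=[10.0, float('nan')],
))

# Cast to object first so the missing cells can hold a real None,
# then overwrite every missing cell (NaT and NaN alike) with None.
cleaned = dfTest2.astype(object).where(dfTest2.notnull(), None)
records = cleaned.to_dict('records')
print(records)
```
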
Remove dtype datetime NaT
This won't win any speed awards, but if the DataFrame is not too long, reassignment using a list comprehension will do the job:
df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]
import numpy as np
import pandas as pd

Timestamp = pd.Timestamp
nan = np.nan
NaT = pd.NaT

df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', nan],
    'date': (Timestamp('2014-08-14'), NaT)
})
df1['col2'] = df1['col2'].fillna('')
df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]
print(df1)
yields
  col1 col2        date
0    a    b  2014-08-14
1    c
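For longer frames, the list comprehension can be replaced by a vectorized equivalent: Series.dt.strftime formats the valid timestamps and yields a missing value for NaT, which fillna then blanks out. A sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'date': pd.to_datetime(['2014-08-14', None])})

# strftime leaves a missing value where the entry was NaT; fillna('') turns it into ''
df1['date'] = df1['date'].dt.strftime('%Y-%m-%d').fillna('')
print(df1['date'].tolist())
```
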
python-pandas: dealing with NaT type values in a date column of a pandas dataframe
Say you start with something like this:
df = pd.DataFrame({
    'CUSTOMER_name': ['abc', 'def', 'abc', 'def', 'abc', 'fff'],
    'DATE': ['NaT', 'NaT', '2010-04-15 19:09:08', '2011-01-25 15:29:37',
             '2010-04-10 12:29:02', 'NaT']})
df.DATE = pd.to_datetime(df.DATE)
(note that the only difference is adding fff mapped to NaT).
Then the following does what you ask:
>>> pd.to_datetime(df.DATE.groupby(df.CUSTOMER_name).min())
CUSTOMER_name
abc 2010-04-10 12:29:02
def 2011-01-25 15:29:37
fff NaT
Name: DATE, dtype: datetime64[ns]
This is because groupby + min already excludes missing data where applicable (albeit changing the format of the results), and the final pd.to_datetime coerces the result back to a datetime.
To get the date part of the result (which I think is a separate question), use .dt.date:
>>> pd.to_datetime(df.DATE.groupby(df.CUSTOMER_name).min()).dt.date
CUSTOMER_name
abc 2010-04-10
def 2011-01-25
fff NaN
Name: DATE, dtype: object
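The skipna behaviour described above can be checked in isolation: within a group, min ignores NaT, and only an all-NaT group comes back as NaT. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'CUSTOMER_name': ['abc', 'abc', 'fff'],
    'DATE': pd.to_datetime(['NaT', '2010-04-15 19:09:08', 'NaT']),
})

# min skips NaT within each group; a group with no valid dates yields NaT
earliest = df.groupby('CUSTOMER_name')['DATE'].min()
print(earliest)
```
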