pandas out of bounds nanosecond timestamp after offset rollforward plus adding a month offset
Because pandas represents timestamps with nanosecond resolution, the timespan that can be represented using a 64-bit integer is limited to approximately 584 years:
In [54]: pd.Timestamp.min
Out[54]: Timestamp('1677-09-21 00:12:43.145225')
In [55]: pd.Timestamp.max
Out[55]: Timestamp('2262-04-11 23:47:16.854775807')
Your value, 2262-05-01 00:00:00, is outside this range, hence the OutOfBoundsDatetime error.
Source: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
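A minimal sketch that reproduces the error from the title. Starting from pd.Timestamp.max.normalize() is an assumption chosen for illustration, because pd.Timestamp.max always has nanosecond resolution; rolling forward with MonthBegin then lands on 2262-05-01, past the limit. A monthly Period, by contrast, can step past it.

```python
import pandas as pd
from pandas.errors import OutOfBoundsDatetime

# pd.Timestamp.max is stored in nanoseconds; normalize() keeps that resolution
near_max = pd.Timestamp.max.normalize()  # 2262-04-11 00:00:00

try:
    # Rolling forward to the next month start would give 2262-05-01,
    # which is later than pd.Timestamp.max
    shifted = near_max + pd.offsets.MonthBegin(1)
except (OutOfBoundsDatetime, OverflowError) as exc:
    shifted = None
    print(f"{type(exc).__name__}: {exc}")

# A monthly Period is not bound by the nanosecond limit
next_month = near_max.to_period("M") + 1
print(next_month)  # 2262-05
```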
Workaround: this forces the dates that are outside the bounds to NaT:
pd.to_datetime(date_col_to_force, errors='coerce')
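A small sketch of the workaround on a hypothetical Series (the sample strings are made up; date_col_to_force is just the placeholder name from the line above). With nanosecond timestamps, entries past pd.Timestamp.max become NaT instead of raising, while in-range entries parse normally.

```python
import pandas as pd

# Hypothetical sample: one in-range date and two past pd.Timestamp.max
date_col_to_force = pd.Series(["2001-01-01", "2262-05-01", "9999-12-31"])

parsed = pd.to_datetime(date_col_to_force, format="%Y-%m-%d", errors="coerce")
print(parsed)

# Rows that held a value but could not be represented
lost = date_col_to_force.notna() & parsed.isna()
print(lost.sum(), "value(s) coerced to NaT")
```

It is worth checking a count like lost.sum() after coercing, since errors='coerce' silently discards the out-of-range values.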
Out of bounds nanosecond timestamp: 1-01-01 00:00:00
You need to pass a format string to to_datetime:
In[20]:
series.index = pd.to_datetime(series.index, format='%d-%m')
series.index
Out[20]:
DatetimeIndex(['1900-01-01', '1900-02-01', '1900-03-01', '1900-04-01',
'1900-05-01', '1900-06-01', '1900-07-01', '1900-08-01',
'1900-09-01', '1900-10-01', '1900-11-01', '1900-12-01',
'1900-01-02', '1900-02-02', '1900-03-02', '1900-04-02',
'1900-05-02', '1900-06-02', '1900-07-02', '1900-08-02',
'1900-09-02', '1900-10-02', '1900-11-02', '1900-12-02',
'1900-01-03', '1900-02-03', '1900-03-03', '1900-04-03',
'1900-05-03', '1900-06-03', '1900-07-03', '1900-08-03',
'1900-09-03', '1900-10-03', '1900-11-03', '1900-12-03'],
dtype='datetime64[ns]', name='Month', freq=None)
By default it tries to infer the format and assumes the year comes first, so the string 01-01 translates to year 1, month 1 (1-01-01), which is out of bounds for nanosecond timestamps. With format='%d-%m' there is no year in the format, which is why every date above lands in 1900.
If you want a monotonically increasing index, which is what your data actually looks like (two-digit year, then month), you can just prepend the string '20' to the index and then convert:
In[24]:
series.index = '20' + series.index
series.index
Out[24]:
Index(['2001-01', '2001-02', '2001-03', '2001-04', '2001-05', '2001-06',
'2001-07', '2001-08', '2001-09', '2001-10', '2001-11', '2001-12',
'2002-01', '2002-02', '2002-03', '2002-04', '2002-05', '2002-06',
'2002-07', '2002-08', '2002-09', '2002-10', '2002-11', '2002-12',
'2003-01', '2003-02', '2003-03', '2003-04', '2003-05', '2003-06',
'2003-07', '2003-08', '2003-09', '2003-10', '2003-11', '2003-12'],
dtype='object')
In[25]:
series.index = pd.to_datetime(series.index, format='%Y-%m')
series
Out[25]:
2001-01-01 266.0
2001-02-01 145.9
2001-03-01 183.1
2001-04-01 119.3
2001-05-01 180.3
2001-06-01 168.5
2001-07-01 231.8
2001-08-01 224.5
2001-09-01 192.8
2001-10-01 122.9
2001-11-01 336.5
2001-12-01 185.9
2002-01-01 194.3
2002-02-01 149.5
2002-03-01 210.1
2002-04-01 273.3
2002-05-01 191.4
2002-06-01 287.0
2002-07-01 226.0
2002-08-01 303.6
2002-09-01 289.9
2002-10-01 421.6
2002-11-01 264.5
2002-12-01 342.3
2003-01-01 339.7
2003-02-01 440.4
2003-03-01 315.9
2003-04-01 439.3
2003-05-01 401.3
2003-06-01 437.4
2003-07-01 575.5
2003-08-01 407.6
2003-09-01 682.0
2003-10-01 475.3
2003-11-01 581.3
2003-12-01 646.9
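Since the original strings are a two-digit year followed by a month, a possibly simpler alternative to prepending '20' is the %y directive, which maps 00-68 to 2000-2068. A minimal sketch on a few hypothetical index values; either way the resulting index is the same.

```python
import pandas as pd

# Hypothetical excerpt of the original index: two-digit year, then month
idx = pd.Index(["01-01", "01-02", "02-12", "03-12"])

# '%d-%m' would read "01-02" as 1 February 1900 (no year in the format);
# '%y-%m' reads it as February 2001
by_year = pd.to_datetime(idx, format="%y-%m")
print(by_year)
```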
Then your code will work:
In[28]:
X = series.rename("actual").to_frame()
X = X.loc[~X.index.duplicated(keep='last')].asfreq('D', method='ffill')
X
Out[28]:
actual
2001-01-01 266.0
2001-01-02 266.0
2001-01-03 266.0
2001-01-04 266.0
2001-01-05 266.0
2001-01-06 266.0
2001-01-07 266.0
2001-01-08 266.0
2001-01-09 266.0
2001-01-10 266.0
2001-01-11 266.0
2001-01-12 266.0
2001-01-13 266.0
2001-01-14 266.0
2001-01-15 266.0
2001-01-16 266.0
2001-01-17 266.0
2001-01-18 266.0
2001-01-19 266.0
2001-01-20 266.0
2001-01-21 266.0
2001-01-22 266.0
2001-01-23 266.0
2001-01-24 266.0
2001-01-25 266.0
2001-01-26 266.0
2001-01-27 266.0
2001-01-28 266.0
2001-01-29 266.0
2001-01-30 266.0
...
2003-11-02 581.3
2003-11-03 581.3
2003-11-04 581.3
2003-11-05 581.3
2003-11-06 581.3
2003-11-07 581.3
2003-11-08 581.3
2003-11-09 581.3
2003-11-10 581.3
2003-11-11 581.3
2003-11-12 581.3
2003-11-13 581.3
2003-11-14 581.3
2003-11-15 581.3
2003-11-16 581.3
2003-11-17 581.3
2003-11-18 581.3
2003-11-19 581.3
2003-11-20 581.3
2003-11-21 581.3
2003-11-22 581.3
2003-11-23 581.3
2003-11-24 581.3
2003-11-25 581.3
2003-11-26 581.3
2003-11-27 581.3
2003-11-28 581.3
2003-11-29 581.3
2003-11-30 581.3
2003-12-01 646.9
[1065 rows x 1 columns]
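To see what asfreq('D', method='ffill') does without the full data, here is a sketch using only the first three monthly values from the series above: each month's value is carried forward to every day until the next month start, and the daily index runs from the first to the last timestamp inclusive.

```python
import pandas as pd

# First three values of the monthly series, indexed by month start
monthly = pd.Series(
    [266.0, 145.9, 183.1],
    index=pd.to_datetime(["2001-01-01", "2001-02-01", "2001-03-01"]),
    name="actual",
)

daily = monthly.to_frame().asfreq("D", method="ffill")
print(len(daily))                         # 31 (Jan) + 28 (Feb) + 1 = 60
print(daily.loc["2001-01-31", "actual"])  # 266.0, still January's value
```

The same counting explains the 1065 rows above: every day from 2001-01-01 through 2003-12-01 inclusive.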
Getting Out of bounds nanosecond timestamp error while using fillna in python?
The problem is that the maximal Timestamp in pandas is:
print (pd.Timestamp.max)
2262-04-11 23:47:16.854775807
so pandas raises an error:
print (pd.to_datetime('9999-12-31'))
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
Sample:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'eventDate0142': [np.nan, np.nan, '2016-04-01'],
                    'statusDateTi': [np.nan, '2019-01-01', '2017-04-01']})
df3 = df1.apply(pd.to_datetime)
print (df3)
eventDate0142 statusDateTi
0 NaT NaT
1 NaT 2019-01-01
2 2016-04-01 2017-04-01
One possible solution is to use pure Python dates, but then the pandas datetimelike methods stop working, because the data is converted to datetime.date objects (object dtype):
from datetime import date
print (date.fromisoformat('9999-12-31'))
9999-12-31
df3['check_date'] = (df3['eventDate0142'].dt.date
.fillna(df3['statusDateTi'].dt.date
.fillna(date.fromisoformat('9999-12-31'))))
print (df3)
eventDate0142 statusDateTi check_date
0 NaT NaT 9999-12-31
1 NaT 2019-01-01 2019-01-01
2 2016-04-01 2017-04-01 2016-04-01
print (df3.dtypes)
eventDate0142 datetime64[ns]
statusDateTi datetime64[ns]
check_date object
dtype: object
Or convert the timestamps to daily periods with Series.dt.to_period, and then use Periods to represent out-of-bounds spans:
print (pd.Period('9999-12-31'))
9999-12-31
df3['check_date'] = (df3['eventDate0142'].dt.to_period('d')
.fillna(df3['statusDateTi'].dt.to_period('d')
.fillna(pd.Period('9999-12-31'))))
print (df3)
eventDate0142 statusDateTi check_date
0 NaT NaT 9999-12-31
1 NaT 2019-01-01 2019-01-01
2 2016-04-01 2017-04-01 2016-04-01
print (df3.dtypes)
eventDate0142 datetime64[ns]
statusDateTi datetime64[ns]
check_date period[D]
dtype: object
If you convert all the columns to periods and assign them back:
df3['eventDate0142'] = df3['eventDate0142'].dt.to_period('d')
df3['statusDateTi'] = df3['statusDateTi'].dt.to_period('d')
df3['check_date'] = (df3['eventDate0142']
.fillna(df3['statusDateTi']
.fillna(pd.Period('9999-12-31'))))
print (df3)
eventDate0142 statusDateTi check_date
0 NaT NaT 9999-12-31
1 NaT 2019-01-01 2019-01-01
2 2016-04-01 2017-04-01 2016-04-01
print (df3.dtypes)
eventDate0142 period[D]
statusDateTi period[D]
check_date period[D]
dtype: object
Return rows of df of particular month and year python pandas OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00
You don't have to call month = pd.to_datetime(month).month() and year = pd.to_datetime(year).year() (month and year are attributes, not methods, anyway).
Also, '%B' returns the full month name, e.g. January. To get only the abbreviation (Jan, Feb, ...), use '%b':
import pandas as pd
def return_data_month_year(df, month, year):
    # month is an abbreviated name such as 'Jan', year a string such as '2015'
    return df[(df['order_date'].dt.strftime('%b') == month) &
              (df['order_date'].dt.strftime('%Y') == year)]
# convert column 'order_date' to datetime first:
df['order_date'] = pd.to_datetime(df['order_date'])
print(return_data_month_year(df, 'Jan', '2015'))
Prints:
order_date Type
0 2015-01-01 A
9 2015-01-27 E
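If the month and year are available as numbers, a variant (a sketch, not the method above) compares .dt.month and .dt.year directly, which avoids the locale-dependent month names produced by %b. The function name rows_for_month_year and the DataFrame are hypothetical stand-ins for the question's code and data.

```python
import pandas as pd

def rows_for_month_year(df, month, year):
    """Hypothetical helper: rows whose 'order_date' is in the given month (1-12) and year."""
    dates = df["order_date"]
    return df[(dates.dt.month == month) & (dates.dt.year == year)]

# Hypothetical sample data
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2015-01-01", "2015-02-10", "2016-01-05", "2015-01-27"]),
    "Type": ["A", "B", "C", "E"],
})

print(rows_for_month_year(df, 1, 2015))
```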
Out of bounds nanosecond timestamp
According to the documentation, the dayfirst parameter defaults to False:
dayfirst : boolean, default False
So it probably decided there was a malformed date and tried to interpret part of it as a time of day.
Even then, it probably did not accept 16-point-something as hours or minutes, so it tried to treat it as seconds. The extra decimal point made that fail too, so it gave up and complained about the fractional seconds. (Or something like that.)
I think you can fix it by giving an explicit format string, or at least by setting dayfirst=True.
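A sketch of both fixes, assuming (hypothetically) that the column holds day-first strings such as '16.05.2019'; the exact input in the question is not shown, so the format string may need adjusting.

```python
import pandas as pd

# Hypothetical day-first strings
raw = pd.Series(["16.05.2019", "03.06.2019"])

# An explicit format removes all guessing
explicit = pd.to_datetime(raw, format="%d.%m.%Y")

# dayfirst=True makes the parser prefer day-first when it has to infer the format
day_first = pd.to_datetime(raw, dayfirst=True)

print(explicit)
print(day_first)
```

The explicit format is the stricter choice: a string that does not match raises instead of being silently reinterpreted.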
How to convert data in pandas? (0021- to 2021)
You can simply do:
from datetime import datetime
new['Days'] = datetime.today() - pd.to_datetime(new['DME'], format='00%y-%m-%d %H:%M:%S')
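A minimal check of the trick on made-up values: the literal 00 in the format consumes the two leading zeros, and %y reads the remaining two digits as the year (21 becomes 2021). The column name DME is from the question; the sample strings are hypothetical.

```python
from datetime import datetime

import pandas as pd

# Hypothetical values in the question's "0021-..." style
new = pd.DataFrame({"DME": ["0021-05-01 12:30:00", "0021-11-15 08:00:00"]})

parsed = pd.to_datetime(new["DME"], format="00%y-%m-%d %H:%M:%S")
print(parsed)

# Elapsed time from each parsed date until now, as Timedelta values
new["Days"] = datetime.today() - parsed
print(new["Days"].dt.days)
```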
pandas to_datetime() function is not converting for date 08-12-1600 in dataframe
That is because the provided dates are outside the range of Timestamp.
pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145225')
pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')
Details are in the pandas user guide section on timestamp limitations.
If you need to keep the dates that are out of range, you can convert them to Periods instead:
import pandas as pd
raw_data = {'Event': ['A', 'B', 'C', 'D', 'E'],
            'dates': ['08-12-1600', '26-09-1400', '04-11-1991', '25-03-1991', '10-05-1991']}
df_1 = pd.DataFrame(raw_data, columns=['Event', 'dates'])
def conv(x):
    # the strings are day-month-year
    day, month, year = x.split('-')
    return pd.Period(year=int(year), month=int(month), day=int(day), freq='D')
df_1['dates'] = df_1.dates.apply(conv)
df_1
Output
Event dates
0 A 1600-12-08
1 B 1400-09-26
2 C 1991-11-04
3 D 1991-03-25
4 E 1991-05-10
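If you can ignore dates outside the range, start again from the raw strings and coerce them to NaT. Passing the day-first format explicitly keeps strings such as 04-11-1991 from being read month-first:
df_1 = pd.DataFrame(raw_data, columns=['Event', 'dates'])
df_1['dates'] = pd.to_datetime(df_1.dates, format='%d-%m-%Y', errors='coerce')
df_1
Output
Event dates
0 A NaT
1 B NaT
2 C 1991-11-04
3 D 1991-03-25
4 E 1991-05-10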
Bonus Fact
Why can a Timestamp only hold values for about 584 years (1677-2262)?
Because timestamps have nanosecond precision and are stored in a signed 64-bit integer, the 2**64 possible values cover only about 584 years, centred on the 1970 epoch.
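The arithmetic can be checked with the standard library alone. This sketch assumes the usual layout: a signed 64-bit count of nanoseconds since 1970-01-01, with the most negative value (-2**63) reserved for NaT. Python's datetime only goes down to microseconds, so the nanosecond counts are floored to microseconds.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1             # largest signed 64-bit integer
INT64_MIN_USABLE = -(2**63 - 1)   # -2**63 itself is reserved for NaT

EPOCH = datetime(1970, 1, 1)

# datetime has microsecond resolution, so floor the nanosecond counts to microseconds
upper = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
lower = EPOCH + timedelta(microseconds=INT64_MIN_USABLE // 1000)

# Total span of 2**64 nanoseconds, in average Gregorian years
NS_PER_YEAR = 365.2425 * 24 * 3600 * 10**9
span_years = 2**64 / NS_PER_YEAR

print(lower)                 # 1677-09-21 00:12:43.145224
print(upper)                 # 2262-04-11 23:47:16.854775
print(f"{span_years:.2f}")   # 584.55
```

The two endpoints match pd.Timestamp.min and pd.Timestamp.max up to the dropped nanosecond digits.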