Pandas Out of Bounds Nanosecond Timestamp After Offset Rollforward Plus Adding a Month Offset


Since pandas represents timestamps in nanosecond resolution, the timespan that can be represented using a 64-bit integer is limited to approximately 584 years:

In [54]: pd.Timestamp.min
Out[54]: Timestamp('1677-09-21 00:12:43.145224193')

In [55]: pd.Timestamp.max
Out[55]: Timestamp('2262-04-11 23:47:16.854775807')

Your value, 2262-05-01 00:00:00, is outside this range, hence the OutOfBoundsDatetime error.

Straight out of: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
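To see the failure from the title in isolation, here is a minimal sketch (the helper name month_begin_after is hypothetical). It starts from pd.Timestamp.max normalized to midnight, i.e. 2262-04-11, and adds a MonthBegin offset, which would land on 2262-05-01. The exact exception type may differ between pandas versions, so the sketch catches both OutOfBoundsDatetime and OverflowError.

```python
import pandas as pd
from pandas.errors import OutOfBoundsDatetime

# Start just inside the upper nanosecond bound: 2262-04-11 00:00:00.
ts = pd.Timestamp.max.normalize()

def month_begin_after(stamp):
    """Roll forward to the next month start, returning None if it overflows."""
    try:
        # For 2262-04-11 this would be 2262-05-01, past pd.Timestamp.max.
        return stamp + pd.offsets.MonthBegin(1)
    except (OutOfBoundsDatetime, OverflowError):
        return None

result = month_begin_after(ts)
print(ts, result)
```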

Workaround:

This forces the dates that are outside the bounds to NaT:

pd.to_datetime(date_col_to_force, errors='coerce')
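A minimal sketch of the workaround, assuming a pandas version (1.x or 2.x) in which to_datetime parses strings to datetime64[ns]; the Series name raw and its values are made up for illustration. The middle value is past Timestamp.max, so it becomes NaT instead of raising, while the in-range values parse normally.

```python
import pandas as pd

raw = pd.Series(["2262-04-11", "2262-05-01", "2020-01-31"])

# Out-of-range strings are coerced to NaT rather than raising OutOfBoundsDatetime.
converted = pd.to_datetime(raw, errors="coerce")
print(converted)
```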

Out of bounds nanosecond timestamp: 1-01-01 00:00:00

You need to pass a format string to to_datetime via its format parameter:

In[20]:
series.index = pd.to_datetime(series.index, format='%d-%m')
series.index

Out[20]:
DatetimeIndex(['1900-01-01', '1900-02-01', '1900-03-01', '1900-04-01',
'1900-05-01', '1900-06-01', '1900-07-01', '1900-08-01',
'1900-09-01', '1900-10-01', '1900-11-01', '1900-12-01',
'1900-01-02', '1900-02-02', '1900-03-02', '1900-04-02',
'1900-05-02', '1900-06-02', '1900-07-02', '1900-08-02',
'1900-09-02', '1900-10-02', '1900-11-02', '1900-12-02',
'1900-01-03', '1900-02-03', '1900-03-03', '1900-04-03',
'1900-05-03', '1900-06-03', '1900-07-03', '1900-08-03',
'1900-09-03', '1900-10-03', '1900-11-03', '1900-12-03'],
dtype='datetime64[ns]', name='Month', freq=None)

By default, to_datetime tries to infer the format, and here it assumes the format is YYYY-MM-DD, so the string 01-01 translates to year 1, month 1 (1-01-01), which is out of bounds for nanosecond resolution.
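The explicit-format fix can be checked on a few hypothetical labels in the same style as the question's index. With format='%d-%m' the first number is the day and the second the month, and strptime fills in the missing year with 1900, which is why the dates above jump around within 1900:

```python
import pandas as pd

# Hypothetical sample of the original index labels.
idx = pd.Index(["01-01", "01-02", "02-01"])

# Read each label as day-month; the year defaults to 1900.
as_day_month = pd.to_datetime(idx, format="%d-%m")
print(as_day_month)
```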

If you want a monotonically increasing index, which is what your data actually already looks like, we can just prepend the string '20' to the index and then convert:

In[24]:
series.index = '20' + series.index
series.index

Out[24]:
Index(['2001-01', '2001-02', '2001-03', '2001-04', '2001-05', '2001-06',
'2001-07', '2001-08', '2001-09', '2001-10', '2001-11', '2001-12',
'2002-01', '2002-02', '2002-03', '2002-04', '2002-05', '2002-06',
'2002-07', '2002-08', '2002-09', '2002-10', '2002-11', '2002-12',
'2003-01', '2003-02', '2003-03', '2003-04', '2003-05', '2003-06',
'2003-07', '2003-08', '2003-09', '2003-10', '2003-11', '2003-12'],
dtype='object')

In[25]:
series.index = pd.to_datetime(series.index, format='%Y-%m')
series

Out[25]:
2001-01-01 266.0
2001-02-01 145.9
2001-03-01 183.1
2001-04-01 119.3
2001-05-01 180.3
2001-06-01 168.5
2001-07-01 231.8
2001-08-01 224.5
2001-09-01 192.8
2001-10-01 122.9
2001-11-01 336.5
2001-12-01 185.9
2002-01-01 194.3
2002-02-01 149.5
2002-03-01 210.1
2002-04-01 273.3
2002-05-01 191.4
2002-06-01 287.0
2002-07-01 226.0
2002-08-01 303.6
2002-09-01 289.9
2002-10-01 421.6
2002-11-01 264.5
2002-12-01 342.3
2003-01-01 339.7
2003-02-01 440.4
2003-03-01 315.9
2003-04-01 439.3
2003-05-01 401.3
2003-06-01 437.4
2003-07-01 575.5
2003-08-01 407.6
2003-09-01 682.0
2003-10-01 475.3
2003-11-01 581.3
2003-12-01 646.9
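The prefix-then-parse step on the same kind of hypothetical labels: prepending "20" turns each YY-MM label into YYYY-MM, and format='%Y-%m' then yields month starts in increasing order.

```python
import pandas as pd

idx = pd.Index(["01-01", "01-02", "02-01"])  # hypothetical "YY-MM" labels

# Prepend the century so each label becomes "YYYY-MM", then parse explicitly.
as_year_month = pd.to_datetime("20" + idx, format="%Y-%m")
print(as_year_month)
```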

Then your code will work:

In[28]:
X = series.rename("actual").to_frame()
X = X.loc[~X.index.duplicated(keep='last')].asfreq('D', method='ffill')
X

Out[28]:
actual
2001-01-01 266.0
2001-01-02 266.0
2001-01-03 266.0
2001-01-04 266.0
2001-01-05 266.0
2001-01-06 266.0
2001-01-07 266.0
2001-01-08 266.0
2001-01-09 266.0
2001-01-10 266.0
2001-01-11 266.0
2001-01-12 266.0
2001-01-13 266.0
2001-01-14 266.0
2001-01-15 266.0
2001-01-16 266.0
2001-01-17 266.0
2001-01-18 266.0
2001-01-19 266.0
2001-01-20 266.0
2001-01-21 266.0
2001-01-22 266.0
2001-01-23 266.0
2001-01-24 266.0
2001-01-25 266.0
2001-01-26 266.0
2001-01-27 266.0
2001-01-28 266.0
2001-01-29 266.0
2001-01-30 266.0
...
2003-11-02 581.3
2003-11-03 581.3
2003-11-04 581.3
2003-11-05 581.3
2003-11-06 581.3
2003-11-07 581.3
2003-11-08 581.3
2003-11-09 581.3
2003-11-10 581.3
2003-11-11 581.3
2003-11-12 581.3
2003-11-13 581.3
2003-11-14 581.3
2003-11-15 581.3
2003-11-16 581.3
2003-11-17 581.3
2003-11-18 581.3
2003-11-19 581.3
2003-11-20 581.3
2003-11-21 581.3
2003-11-22 581.3
2003-11-23 581.3
2003-11-24 581.3
2003-11-25 581.3
2003-11-26 581.3
2003-11-27 581.3
2003-11-28 581.3
2003-11-29 581.3
2003-11-30 581.3
2003-12-01 646.9

[1065 rows x 1 columns]
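The dedupe-then-upsample step can be tried on a tiny made-up series (three months, with a duplicated February label) to see what keep='last' and forward filling do. Jan 1 through Mar 1, 2001 is 31 + 28 + 1 = 60 days, just as 2001-01-01 through 2003-12-01 gives the 1065 rows above.

```python
import pandas as pd

# Hypothetical monthly data with a duplicated label for 2001-02-01.
monthly = pd.Series(
    [266.0, 100.0, 145.9, 183.1],
    index=pd.to_datetime(["2001-01-01", "2001-02-01", "2001-02-01", "2001-03-01"]),
    name="actual",
)

X = monthly.to_frame()
# Keep the last row for each duplicated date (asfreq needs a unique index),
# then upsample to daily frequency, carrying each monthly value forward.
X = X.loc[~X.index.duplicated(keep="last")].asfreq("D", method="ffill")
print(len(X))  # 60
```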

Getting Out of bounds nanosecond timestamp error while using fillna in python?

The problem is that the maximal Timestamp in pandas is:

print (pd.Timestamp.max)
2262-04-11 23:47:16.854775807

So pandas raises an error:

print (pd.to_datetime('9999-12-31'))
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-12-31 00:00:00

Sample:

df1 = pd.DataFrame({'eventDate0142': [np.nan, np.nan, '2016-04-01'],
                    'statusDateTi': [np.nan, '2019-01-01', '2017-04-01']})

df3 = df1.apply(pd.to_datetime)

print (df3)
eventDate0142 statusDateTi
0 NaT NaT
1 NaT 2019-01-01
2 2016-04-01 2017-04-01

One possible solution is to use pure Python dates, but then the pandas datetimelike methods no longer work, because all the data is converted to plain date objects:

from datetime import date

print (date.fromisoformat('9999-12-31'))
9999-12-31

df3['check_date'] = (df3['eventDate0142'].dt.date
.fillna(df3['statusDateTi'].dt.date
.fillna(date.fromisoformat('9999-12-31'))))
print (df3)
eventDate0142 statusDateTi check_date
0 NaT NaT 9999-12-31
1 NaT 2019-01-01 2019-01-01
2 2016-04-01 2017-04-01 2016-04-01

print (df3.dtypes)
eventDate0142 datetime64[ns]
statusDateTi datetime64[ns]
check_date object
dtype: object

Or convert the timestamps to daily periods with Series.dt.to_period, and then use Periods to represent out-of-bounds spans:

print (pd.Period('9999-12-31'))
9999-12-31

df3['check_date'] = (df3['eventDate0142'].dt.to_period('d')
.fillna(df3['statusDateTi'].dt.to_period('d')
.fillna(pd.Period('9999-12-31'))))
print (df3)
eventDate0142 statusDateTi check_date
0 NaT NaT 9999-12-31
1 NaT 2019-01-01 2019-01-01
2 2016-04-01 2017-04-01 2016-04-01

print (df3.dtypes)
eventDate0142 datetime64[ns]
statusDateTi datetime64[ns]
check_date period[D]
dtype: object

If you assign all the columns back as periods:

df3['eventDate0142'] = df3['eventDate0142'].dt.to_period('d')
df3['statusDateTi'] = df3['statusDateTi'].dt.to_period('d')
df3['check_date'] = (df3['eventDate0142']
.fillna(df3['statusDateTi']
.fillna(pd.Period('9999-12-31'))))
print (df3)
eventDate0142 statusDateTi check_date
0 NaT NaT 9999-12-31
1 NaT 2019-01-01 2019-01-01
2 2016-04-01 2017-04-01 2016-04-01

print (df3.dtypes)
eventDate0142 period[D]
statusDateTi period[D]
check_date period[D]
dtype: object
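Because Periods are not limited to the nanosecond range, the month arithmetic from the title also works on them. A minimal sketch, assuming "roll forward to the next month start" is the goal; the asfreq round trip shown here is one way to do it, not the only one.

```python
import pandas as pd

day = pd.Period("2262-04-11", freq="D")

# Roll forward to the start of the next month: switch to monthly frequency,
# add one month, then take the first day of that month.
next_month_start = (day.asfreq("M") + 1).asfreq("D", how="start")
print(next_month_start)

# Periods far beyond pd.Timestamp.max still compare normally.
far_future = pd.Period("9999-12-31", freq="D")
print(far_future > next_month_start)
```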

Return rows of df of particular month and year python pandas OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00

You don't have to call month = pd.to_datetime(month).month() and year = pd.to_datetime(year).year() at all (and .month and .year are attributes, not methods, so calling them would fail anyway).

Also, '%B' returns the full month name, e.g. January. To get only the abbreviation (Jan, Feb, ...), use '%b':

def return_data_month_year(df, month, year):
    return df[(df['order_date'].dt.strftime('%b') == month) & (df['order_date'].dt.strftime('%Y') == year)]

# to convert column 'order_date' to datetime:
df['order_date'] = pd.to_datetime(df['order_date'])

print(return_data_month_year(df, 'Jan', '2015'))

Prints:

  order_date Type
0 2015-01-01 A
9 2015-01-27 E
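A self-contained check of the function on made-up order data. It also shows a hypothetical variant, rows_for_month_year, that compares the integer .dt.month and .dt.year components instead of formatted strings, which avoids depending on locale-specific month names; this is an alternative, not the approach used in the answer.

```python
import pandas as pd

def return_data_month_year(df, month, year):
    # month like 'Jan' (%b) and year like '2015' (%Y), both strings
    d = df["order_date"]
    return df[(d.dt.strftime("%b") == month) & (d.dt.strftime("%Y") == year)]

def rows_for_month_year(df, month, year):
    # hypothetical variant: month and year passed as integers
    d = df["order_date"]
    return df[(d.dt.month == month) & (d.dt.year == year)]

df = pd.DataFrame({
    "order_date": pd.to_datetime(["2015-01-01", "2015-02-10", "2016-01-05", "2015-01-27"]),
    "Type": ["A", "B", "C", "E"],
})

by_name = return_data_month_year(df, "Jan", "2015")
by_number = rows_for_month_year(df, 1, 2015)
print(by_name)
```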

Out of bounds nanosecond timestamp

According to the documentation, the dayfirst parameter defaults to False:

dayfirst : boolean, default False

So it must have decided that there was a malformed date there and tried to interpret it as a time of day.

Even then, it probably didn't think that a value like 16.xx could be hours or minutes, so it tried to convert it as seconds. But there is an extra decimal point, so it gave up and complained about the fractional seconds (or something like that).

I think you can fix it by giving an explicit format string, or at least by setting dayfirst=True.
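A minimal sketch of both suggestions. The original data isn't shown, so the day-first strings below (16.03.2016 style) are an assumption:

```python
import pandas as pd

raw = pd.Series(["16.03.2016", "01.04.2016"])  # hypothetical day-first strings

# Option 1: spell out the format, so nothing has to be guessed.
explicit = pd.to_datetime(raw, format="%d.%m.%Y")

# Option 2: let pandas parse, but tell it the day comes first.
day_first = pd.to_datetime(raw, dayfirst=True)
print(explicit)
```

The explicit format is the stricter of the two: a string that does not match it raises (or becomes NaT with errors='coerce') instead of being silently reinterpreted.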

How to convert data in pandas? (0021- to 2021)

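You can simply do:

from datetime import datetime
new['Days'] = datetime.today() - pd.to_datetime(new['DME'], format="00%y-%m-%d %H:%M:%S")

The literal 00 in the format consumes the two leading zeros, and %y reads the remaining two digits as the year, so 0021 becomes 2021.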
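A quick check of that format string on hypothetical DME values (the sample DataFrame is made up). Like Python's strptime, %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999:

```python
from datetime import datetime

import pandas as pd

# Hypothetical data in the question's "00YY-MM-DD HH:MM:SS" style.
new = pd.DataFrame({"DME": ["0021-06-15 08:30:00", "0019-12-01 00:00:00"]})

# "00" is matched literally; %y then parses the two-digit year.
parsed = pd.to_datetime(new["DME"], format="00%y-%m-%d %H:%M:%S")
new["Days"] = datetime.today() - parsed  # Timedelta per row
print(parsed)
```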

pandas to_datetime() function is not converting for date 08-12-1600 in dataframe

That is because the provided dates are outside the range of Timestamp.

pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145224193')

pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')

Details are in the pandas documentation on timestamp limitations.

If we need the dates even when they are out of range

Then we can convert them to periods using the code below:

raw_data = {'Event': ['A','B','C','D', 'E'],
'dates': ['08-12-1600','26-09-1400', '04-11-1991','25-03-1991', '10-05-1991']}
df_1 = pd.DataFrame(raw_data, columns = ['Event', 'dates'])

def conv(x):
    # split 'DD-MM-YYYY' into its parts and build a daily Period
    day, month, year = x.split('-')
    return pd.Period(year=int(year), month=int(month), day=int(day), freq="D")

df_1['dates'] = df_1.dates.apply(conv)
df_1

Output

    Event   dates
0 A 1600-12-08
1 B 1400-09-26
2 C 1991-11-04
3 D 1991-03-25
4 E 1991-05-10

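If we can ignore dates outside the range

Then we can coerce them to NaT. Passing format='%d-%m-%Y' makes ambiguous strings such as 04-11-1991 parse day-first, matching the Period version above:

df_1['dates'] = pd.to_datetime(df_1.dates, format='%d-%m-%Y', errors='coerce')
df_1

Output

    Event   dates
0 A NaT
1 B NaT
2 C 1991-11-04
3 D 1991-03-25
4 E 1991-05-10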

Bonus Fact

Why can a Timestamp only hold values for around 584 years (1677-2262)?

Because a Timestamp has nanosecond precision and is stored as a 64-bit integer count of nanoseconds since the Unix epoch (1970-01-01), it can cover only about 292 years on either side of 1970, roughly 584 years in total.
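The arithmetic behind the 584 years, as a short sketch. It assumes a mean Gregorian year of 365.2425 days for the conversion; the pandas checks at the end confirm that Timestamp.max and Timestamp.min sit at the int64 limits (the very smallest int64 value is reserved for NaT).

```python
import pandas as pd

max_ns = 2**63 - 1                              # largest int64 value
ns_per_year = 10**9 * 60 * 60 * 24 * 365.2425   # mean Gregorian year in ns

years_each_side = max_ns / ns_per_year          # about 292.3 years
total_span = 2 * years_each_side                # about 584.6 years
print(round(years_each_side, 1), round(total_span, 1))

# The pandas bounds are these int64 limits, counted in ns from 1970-01-01.
print(pd.Timestamp.max.value == max_ns, pd.Timestamp.min.value == -max_ns)
```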


