Converting Between Datetime, Timestamp and Datetime64

To convert a numpy.datetime64 to a datetime object that represents time in UTC (on numpy-1.8):

>>> from datetime import datetime
>>> import numpy as np
>>> dt = datetime.utcnow()
>>> dt
datetime.datetime(2012, 12, 4, 19, 51, 25, 362455)
>>> dt64 = np.datetime64(dt)
>>> ts = (dt64 - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')
>>> ts
1354650685.3624549
>>> datetime.utcfromtimestamp(ts)
datetime.datetime(2012, 12, 4, 19, 51, 25, 362455)
>>> np.__version__
'1.8.0.dev-7b75899'

The above example assumes that a naive datetime object is interpreted by np.datetime64 as time in UTC.
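
On current Python and NumPy versions, the same conversion can be sketched without datetime.utcnow() and datetime.utcfromtimestamp() (both deprecated since Python 3.12) and without the 'Z' suffix, which newer NumPy releases deprecate in datetime64 strings. The helper name dt64_to_utc_datetime is hypothetical, and the sketch keeps the assumption that the datetime64 value holds UTC time:

```python
from datetime import datetime, timezone

import numpy as np


def dt64_to_utc_datetime(dt64):
    """Convert a numpy.datetime64 scalar (assumed UTC) to an aware datetime."""
    epoch = np.datetime64("1970-01-01T00:00:00")
    # Dividing by a one-second timedelta64 yields float seconds since the epoch.
    seconds = (dt64 - epoch) / np.timedelta64(1, "s")
    return datetime.fromtimestamp(float(seconds), tz=timezone.utc)


dt64 = np.datetime64("2012-12-04T19:51:25")
result = dt64_to_utc_datetime(dt64)
print(result)  # 2012-12-04 19:51:25+00:00
```

Returning a timezone-aware object makes the UTC assumption visible to the caller instead of leaving it implicit in a naive datetime.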


To convert datetime to np.datetime64 and back (numpy-1.6):

>>> np.datetime64(datetime.utcnow()).astype(datetime)
datetime.datetime(2012, 12, 4, 13, 34, 52, 827542)

It works both on a single np.datetime64 object and a numpy array of np.datetime64.
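
A small sketch of the array case on a recent NumPy, using microsecond resolution so that every element fits into a datetime.datetime (the sample values and variable names are illustrative):

```python
from datetime import datetime

import numpy as np

# Microsecond resolution matches what datetime.datetime can represent.
arr = np.array(
    ["2012-12-04T13:34:52.827542", "2012-12-05T00:00:00"],
    dtype="datetime64[us]",
)

as_objects = arr.astype(object)  # object array of datetime.datetime
single = arr[0].item()           # scalar -> datetime.datetime
```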

Think of np.datetime64 the same way you would think of np.int8, np.int16, etc., and apply the same methods to convert between Python objects such as int and datetime and the corresponding numpy objects.

Your "nasty example" works correctly:

>>> from datetime import datetime
>>> import numpy
>>> numpy.datetime64('2002-06-28T01:00:00.000000000+0100').astype(datetime)
datetime.datetime(2002, 6, 28, 0, 0)
>>> numpy.__version__
'1.6.2' # current version available via pip install numpy

I can reproduce the long value on numpy-1.8.0 installed as:

pip install git+https://github.com/numpy/numpy.git#egg=numpy-dev

The same example:

>>> from datetime import datetime
>>> import numpy
>>> numpy.datetime64('2002-06-28T01:00:00.000000000+0100').astype(datetime)
1025222400000000000L
>>> numpy.__version__
'1.8.0.dev-7b75899'

It returns a long because, for the numpy.datetime64 type, .astype(datetime) is equivalent to .astype(object), which returns a Python integer (long) on numpy-1.8.
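
On recent NumPy versions the result appears to depend on the unit: datetime.datetime only has microsecond resolution, so a nanosecond value comes back as a plain integer, while a microsecond value comes back as a datetime object. A small sketch (variable names are illustrative):

```python
import datetime

import numpy as np

stamp = "2002-06-28T00:00:00"

as_ns = np.datetime64(stamp, "ns").astype(object)  # int: nanoseconds since epoch
as_us = np.datetime64(stamp, "us").astype(object)  # datetime.datetime

print(type(as_ns).__name__, as_ns)  # int 1025222400000000000
print(type(as_us).__name__, as_us)  # datetime 2002-06-28 00:00:00
```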

To get a datetime object, you could:

>>> dt64.dtype
dtype('<M8[ns]')
>>> ns = 1e-9 # number of seconds in a nanosecond
>>> datetime.utcfromtimestamp(dt64.astype(int) * ns)
datetime.datetime(2002, 6, 28, 0, 0)

To get a datetime64 that uses seconds directly:

>>> dt64 = numpy.datetime64('2002-06-28T01:00:00.000000000+0100', 's')
>>> dt64.dtype
dtype('<M8[s]')
>>> datetime.utcfromtimestamp(dt64.astype(int))
datetime.datetime(2002, 6, 28, 0, 0)

The numpy docs say that the datetime API is experimental and may change in future numpy versions.

How to convert numpy datetime64 into datetime

Borrowing from "Converting between datetime, Timestamp and datetime64":

In [220]: x
Out[220]: numpy.datetime64('2012-06-17T23:00:05.453000000-0700')

In [221]: datetime.datetime.utcfromtimestamp(x.tolist()/1e9)
Out[221]: datetime.datetime(2012, 6, 18, 6, 0, 5, 452999)

Accounting for timezones I think that's right. Looks rather clunky though.

Using int() is more explicit (I think) than tolist():

In [294]: datetime.datetime.utcfromtimestamp(int(x)/1e9)
Out[294]: datetime.datetime(2012, 6, 18, 6, 0, 5, 452999)

Or, to get the datetime in local time:

In [295]: datetime.datetime.fromtimestamp(x.astype('O')/1e9)

But in the test_datetime.py file
https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_datetime.py

I find some other options. First, convert the general datetime64 to one of the formats that specify units:

In [296]: x.astype('M8[D]').astype('O')
Out[296]: datetime.date(2012, 6, 18)

In [297]: x.astype('M8[ms]').astype('O')
Out[297]: datetime.datetime(2012, 6, 18, 6, 0, 5, 453000)

This works for arrays:

In [303]: np.array([[x,x],[x,x]],dtype='M8[ms]').astype('O')[0,1]
Out[303]: datetime.datetime(2012, 6, 18, 6, 0, 5, 453000)

Error when converting numpy.datetime64 to int

The difference is whether the value includes time components such as hours, minutes, and seconds.

When you convert a datetime (or np.datetime64) to int (or np.int64), the result is an epoch time: the number of time units (seconds, microseconds, or nanoseconds, depending on the datetime64 unit) elapsed since 1970-01-01 00:00:00 (UTC).

(See epoch time calculator: https://www.epochconverter.com/)

However, "2017-09-26" carries only a date. It has no hour, minute, second, or timezone information, so converting it to a number of seconds since 1970-01-01 00:00:00 is not well defined.

To make it convertible, add the time information, as follows:

a = np.datetime64('2017-09-26T00:00:00.000000000')
print(int(a)) # 1506384000000000000 --> epoch time in nanoseconds for 2017-09-26 00:00:00

a = np.datetime64('2017-09-26','us').astype(np.int64) # not int, use np.int64
print(a) # 1506384000000000 -> also an epoch time for 2017-09-26 00:00:00, in microseconds

In addition, use astype(np.int64) instead of astype(int) to get the exact epoch time from a datetime64 value. On platforms where int maps to a 32-bit integer (for example Windows with NumPy versions before 2.0), astype(int) silently truncates the 64-bit value.

a = np.datetime64('2017-09-26T15:20:11.546205184').astype(int)
print(a) # 1072585728 on such a platform -> not an epoch time, only the low 32 bits of the value below

a = np.datetime64('2017-09-26T15:20:11.546205184').astype(np.int64)
print(a) # 1506439211546205184 -> the correct epoch time of 2017-09-26 15:20:11.546205184, in nanoseconds
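
The integer you get back is a count of the value's own time unit, so casting to an explicit unit first makes the meaning of the number unambiguous. A minimal sketch (variable names are illustrative):

```python
import numpy as np

d = np.datetime64("2017-09-26")  # date-only value, unit is days

days = int(d.astype("datetime64[D]").astype(np.int64))
secs = int(d.astype("datetime64[s]").astype(np.int64))
nanos = int(d.astype("datetime64[ns]").astype(np.int64))

print(days)   # 17435
print(secs)   # 1506384000
print(nanos)  # 1506384000000000000
```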

Why is datetime64 converted to timedelta64 when converting into a YYYY-MM string

Don't use a custom function; use strftime with %-m (the minus strips the leading zero):

series_nat.dt.strftime('%Y-%-m')

output:

0     2019-4
1    2017-12
2        NaN
dtype: object

%m would keep the leading zeros:

series_nat.dt.strftime('%Y-%m')

output:

0    2019-04
1    2017-12
2        NaN
dtype: object
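
Because %-m is a platform-specific directive (it is not supported on Windows), one portable alternative is to format with %m and strip the zero afterwards. A sketch, with a made-up series standing in for series_nat:

```python
import pandas as pd

# Hypothetical stand-in for series_nat: two dates and one missing value.
series_nat = pd.Series(pd.to_datetime(["2019-04-01", "2017-12-15", None]))

# Zero-padded month; the missing value stays missing.
padded = series_nat.dt.strftime("%Y-%m")

# Drop the zero that follows the hyphen, if there is one.
unpadded = padded.str.replace("-0", "-", regex=False)

print(unpadded.tolist()[:2])  # ['2019-4', '2017-12']
```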

Why is pd.Timestamp converted to np.datetime64 when calling '.values'?

Found a workaround: use .array instead of .values.

print(type(df['A'].array[0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>

This prevents the conversion and gives me access to the objects I wanted to use.
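
A minimal sketch of the difference, using a made-up two-row frame in place of the original df:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a single datetime column named "A".
df = pd.DataFrame({"A": pd.to_datetime(["2020-01-01", "2020-01-02"])})

from_values = df["A"].values[0]  # element of a NumPy datetime64 array
from_array = df["A"].array[0]    # element of a pandas DatetimeArray

print(type(from_values).__name__)  # datetime64
print(type(from_array).__name__)   # Timestamp
```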

Pyarrow timestamp keeps converting to 1970

As FObersteiner mentioned, the issue here was because I was telling pyarrow to convert from an assumed microsecond-level timestamp. In case anyone encounters this issue in the future, it's as simple as changing the 'us' above to 's'. And if you want millisecond-level timestamping, you can do it like so:

from datetime import datetime
import pyarrow as pa
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.uniform(size=(20,10)))
df = pd.DataFrame(data)
df.columns = [str(i) for i in range(data.shape[1])]
schema = [(str(i), pa.float32()) for i in range(data.shape[1])]
schema = pa.schema(schema)

ts = int(datetime.now().timestamp() * 1000)  # integer milliseconds since the epoch
print('DateTime timestamp:', ts)
table = pa.Table.from_pandas(df, schema)
pa_ts = pa.array([ts] * len(table), pa.timestamp('ms'))
print('PyArrow timestamp:', pa_ts)

Converting between datetime and Pandas Timestamp objects

You can use the to_pydatetime method to be more explicit:

In [11]: ts = pd.Timestamp('2014-01-23 00:00:00', tz=None)

In [12]: ts.to_pydatetime()
Out[12]: datetime.datetime(2014, 1, 23, 0, 0)

It's also available on a DatetimeIndex:

In [13]: rng = pd.date_range('1/10/2011', periods=3, freq='D')

In [14]: rng.to_pydatetime()
Out[14]:
array([datetime.datetime(2011, 1, 10, 0, 0),
       datetime.datetime(2011, 1, 11, 0, 0),
       datetime.datetime(2011, 1, 12, 0, 0)], dtype=object)
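
The reverse direction is the Timestamp constructor, which accepts both datetime and numpy.datetime64 values. A short round-trip sketch (variable names are illustrative):

```python
from datetime import datetime

import numpy as np
import pandas as pd

py_dt = datetime(2014, 1, 23, 12, 30)

ts = pd.Timestamp(py_dt)                    # datetime -> Timestamp
dt64 = ts.to_datetime64()                   # Timestamp -> numpy.datetime64
back = pd.Timestamp(dt64).to_pydatetime()   # datetime64 -> Timestamp -> datetime

print(back)  # 2014-01-23 12:30:00
```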

