Converting between datetime, Timestamp and datetime64
To convert numpy.datetime64 to a datetime object that represents time in UTC on numpy-1.8:
>>> from datetime import datetime
>>> import numpy as np
>>> dt = datetime.utcnow()
>>> dt
datetime.datetime(2012, 12, 4, 19, 51, 25, 362455)
>>> dt64 = np.datetime64(dt)
>>> ts = (dt64 - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')
>>> ts
1354650685.3624549
>>> datetime.utcfromtimestamp(ts)
datetime.datetime(2012, 12, 4, 19, 51, 25, 362455)
>>> np.__version__
'1.8.0.dev-7b75899'
The above example assumes that a naive datetime object is interpreted by np.datetime64 as time in UTC.
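Newer NumPy releases warn about timezone suffixes such as the trailing Z, so here is a minimal sketch of the same idea without it, still assuming the datetime64 value is meant as UTC. The helper name dt64_to_utc_datetime is hypothetical, and it uses integer microseconds instead of float seconds to avoid rounding:

```python
from datetime import datetime, timedelta, timezone
import numpy as np

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def dt64_to_utc_datetime(dt64):
    """Hypothetical helper: treat a datetime64 scalar as UTC, return an aware datetime."""
    # Cast to microseconds (the finest resolution datetime supports),
    # then read the value as an integer count since the epoch.
    micros = int(dt64.astype("datetime64[us]").astype(np.int64))
    return EPOCH + timedelta(microseconds=micros)

dt64 = np.datetime64("2012-12-04T19:51:25.362455")
result = dt64_to_utc_datetime(dt64)
print(result)
```

The result is timezone-aware; call .replace(tzinfo=None) on it if a naive UTC datetime is needed.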
To convert datetime to np.datetime64 and back (numpy-1.6):
>>> np.datetime64(datetime.utcnow()).astype(datetime)
datetime.datetime(2012, 12, 4, 13, 34, 52, 827542)
It works both on a single np.datetime64 object and on a numpy array of np.datetime64.
Think of np.datetime64 the same way you would think of np.int8, np.int16, etc., and apply the same methods to convert between Python objects such as int and datetime and the corresponding numpy objects.
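To make the analogy concrete, here is a small sketch of the round trip using .item() and .tolist(), the same methods that turn an np.int8 into a Python int. It assumes microsecond units, which is what NumPy picks when it is handed a datetime:

```python
from datetime import datetime
import numpy as np

dt = datetime(2012, 12, 4, 13, 34, 52, 827542)

# A datetime carries microsecond precision, so the scalar gets the 'us' unit.
dt64 = np.datetime64(dt)

# .item() converts a NumPy scalar to the matching Python object.
back = dt64.item()

# The same conversion applied elementwise to an array.
arr = np.array([dt, dt], dtype="datetime64[us]")
back_list = arr.tolist()

print(back)
print(back_list)
```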
Your "nasty example" works correctly:
>>> from datetime import datetime
>>> import numpy
>>> numpy.datetime64('2002-06-28T01:00:00.000000000+0100').astype(datetime)
datetime.datetime(2002, 6, 28, 0, 0)
>>> numpy.__version__
'1.6.2' # current version available via pip install numpy
I can reproduce the long value on numpy-1.8.0 installed as:
pip install git+https://github.com/numpy/numpy.git#egg=numpy-dev
The same example:
>>> from datetime import datetime
>>> import numpy
>>> numpy.datetime64('2002-06-28T01:00:00.000000000+0100').astype(datetime)
1025222400000000000L
>>> numpy.__version__
'1.8.0.dev-7b75899'
It returns long because, for the numpy.datetime64 type, .astype(datetime) is equivalent to .astype(object), which on numpy-1.8 returns a Python integer (long) for a value with nanosecond units, as here.
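The unit is what decides the outcome: datetime only resolves microseconds, so a nanosecond value cannot be represented as one. A short sketch of the difference, assuming a reasonably recent NumPy:

```python
from datetime import datetime
import numpy as np

ns_value = np.datetime64("2002-06-28T00:00:00", "ns")

# Nanosecond units do not fit into datetime, so the object form is an integer.
as_int = ns_value.astype(object)

# Casting to microseconds first yields a datetime object instead.
as_dt = ns_value.astype("datetime64[us]").astype(object)

print(type(as_int), as_int)
print(type(as_dt), as_dt)
```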
To get a datetime object, you could:
>>> dt64.dtype
dtype('<M8[ns]')
>>> ns = 1e-9 # number of seconds in a nanosecond
>>> datetime.utcfromtimestamp(dt64.astype(int) * ns)
datetime.datetime(2002, 6, 28, 0, 0)
To get a datetime64 that uses seconds directly:
>>> dt64 = numpy.datetime64('2002-06-28T01:00:00.000000000+0100', 's')
>>> dt64.dtype
dtype('<M8[s]')
>>> datetime.utcfromtimestamp(dt64.astype(int))
datetime.datetime(2002, 6, 28, 0, 0)
The numpy docs say that the datetime API is experimental and may change in future numpy versions.
How to convert numpy datetime64 into datetime
Borrowing from "Converting between datetime, Timestamp and datetime64":
In [220]: x
Out[220]: numpy.datetime64('2012-06-17T23:00:05.453000000-0700')
In [221]: datetime.datetime.utcfromtimestamp(x.tolist()/1e9)
Out[221]: datetime.datetime(2012, 6, 18, 6, 0, 5, 452999)
Accounting for timezones I think that's right. Looks rather clunky though.
Using int() is more explicit (I think) than tolist():
In [294]: datetime.datetime.utcfromtimestamp(int(x)/1e9)
Out[294]: datetime.datetime(2012, 6, 18, 6, 0, 5, 452999)
Or, to get the datetime in local time:
In [295]: datetime.datetime.fromtimestamp(x.astype('O')/1e9)
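The 452999 microseconds in the outputs above come from dividing by 1e9 in floating point. A sketch that keeps the arithmetic in integers avoids that rounding; it assumes the datetime64 value is meant as UTC, and the variable names are illustrative:

```python
from datetime import datetime, timezone
import numpy as np

x = np.datetime64("2012-06-18T06:00:05.453", "ns")

# Split the nanosecond count into whole seconds and a remainder.
total_ns = int(x.astype(np.int64))
seconds, rem_ns = divmod(total_ns, 1_000_000_000)

# Build an aware UTC datetime, then fill in the sub-second part exactly.
utc_dt = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
    microsecond=rem_ns // 1000
)

# Same instant expressed in the machine's local timezone.
local_dt = utc_dt.astimezone()

print(utc_dt)
print(local_dt)
```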
But in the test_datetime.py file (https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_datetime.py) I find some other options. First convert the general datetime64 to one of the formats that specify units:
In [296]: x.astype('M8[D]').astype('O')
Out[296]: datetime.date(2012, 6, 18)
In [297]: x.astype('M8[ms]').astype('O')
Out[297]: datetime.datetime(2012, 6, 18, 6, 0, 5, 453000)
This works for arrays:
In [303]: np.array([[x,x],[x,x]],dtype='M8[ms]').astype('O')[0,1]
Out[303]: datetime.datetime(2012, 6, 18, 6, 0, 5, 453000)
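As a self-contained sketch of how the chosen unit determines the Python type that comes back (the sample values are made up for illustration):

```python
from datetime import date, datetime
import numpy as np

arr = np.array(
    ["2012-06-18T06:00:05.453", "2012-06-19T00:00:00"],
    dtype="datetime64[ns]",
)

# Day units give datetime.date objects.
as_dates = arr.astype("datetime64[D]").astype(object)

# Millisecond units give datetime.datetime objects.
as_datetimes = arr.astype("datetime64[ms]").astype(object)

# Nanosecond units are too fine for datetime, so plain integers come back.
as_ints = arr.astype(object)

print(as_dates[0], as_datetimes[0], as_ints[0])
```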
Error when converting numpy.datetime64 to int
The difference is whether the value includes a time component (hours, minutes, and seconds).
When you convert a datetime (or np.datetime64) to int (or np.int64), the result is an epoch time: the number of units (seconds, microseconds, or nanoseconds, depending on the datetime64 unit) elapsed since 1970-01-01 00:00:00 (UTC).
(See epoch time calculator: https://www.epochconverter.com/)
However, "2017-09-26" on its own carries no hour, minute, second, or timezone information, so it is stored with day units and int() cannot turn it into a count of seconds since 1970-01-01 00:00:00.
To make it convertible, add time information as follows:
a = np.datetime64('2017-09-26T00:00:00.000000000')
print(int(a)) # 1506384000000000000 --> epoch time in nanoseconds for 2017-09-26 00:00:00
a = np.datetime64('2017-09-26','us').astype(np.int64) # not int, use np.int64
print(a) # 1506384000000000 -> also an epoch time for 2017-09-26 00:00:00, in microseconds
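In addition, use astype(np.int64) instead of astype(int) to get the exact epoch time when your value is stored as datetime64. On platforms where int maps to a 32-bit integer (for example, Windows with older NumPy releases), astype(int) can truncate the value, and the result is neither an epoch time nor a day count.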
a = np.datetime64('2017-09-26T15:20:11.546205184').astype(int)
print(a) # 1072585728 -> not an epoch time; the value was truncated (seen where int is 32-bit)
a = np.datetime64('2017-09-26T15:20:11.546205184').astype(np.int64)
print(a) # 1506439211546205184 -> the correct epoch time of 2017-09-26 15:20:11, with nanoseconds
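To see how the integer depends on the unit, here is a short sketch that casts the same date to several units before reading it as np.int64 (variable names are illustrative):

```python
import numpy as np

d = np.datetime64("2017-09-26")  # stored with day units

# The integer is always a count of the value's own unit since 1970-01-01.
days = int(d.astype(np.int64))
seconds = int(d.astype("datetime64[s]").astype(np.int64))
nanos = int(d.astype("datetime64[ns]").astype(np.int64))

print(days)     # days since the epoch
print(seconds)  # seconds since the epoch
print(nanos)    # nanoseconds since the epoch
```

Casting to an explicit unit first makes it unambiguous which kind of epoch value the integer holds.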
- edited with consideration of @FObersteiner 's comment, Thanks!
Why is datetime64 converted to timedelta64 when converting into a YYYY-MM string
Don't use a custom function; use strftime with %-m (the minus strips the leading zero):
series_nat.dt.strftime('%Y-%-m')
output:
0 2019-4
1 2017-12
2 NaN
dtype: object
%m would keep the leading zeros:
series_nat.dt.strftime('%Y-%m')
output:
0 2019-04
1 2017-12
2 NaN
dtype: object
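Note that %-m is a platform-specific strftime extension (it works with glibc but not on Windows). Where portability matters, a sketch like the following builds the unpadded string from the year and month attributes instead. The series contents are reconstructed from the output above, and mapping a small lambda is just one possible approach:

```python
import pandas as pd

series_nat = pd.Series(pd.to_datetime(["2019-04-01", "2017-12-01", None]))

# Portable zero-padded form; missing timestamps stay missing.
padded = series_nat.dt.strftime("%Y-%m")

# Portable unpadded form, skipping missing timestamps explicitly.
unpadded = series_nat.map(
    lambda t: f"{t.year}-{t.month}" if pd.notna(t) else None
)

print(padded)
print(unpadded)
```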
Why is pd.Timestamp converted to np.datetime64 when calling '.values'?
Found a workaround: using .array instead of .values:
print(type(df['A'].array[0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
This prevents the conversion and gives me access to the objects I wanted to use.
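A minimal sketch comparing the two accessors side by side; the column name A follows the snippet above and the sample dates are made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": pd.to_datetime(["2021-01-01", "2021-01-02"])})

# .values hands back a plain NumPy array, so elements are np.datetime64.
from_values = df["A"].values[0]

# .array keeps the pandas extension array, so elements stay pd.Timestamp.
from_array = df["A"].array[0]

print(type(from_values))
print(type(from_array))
```

Wrapping a single value with pd.Timestamp(from_values) is another way to get back to a Timestamp if only .values is available.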
Pyarrow timestamp keeps converting to 1970
As FObersteiner mentioned, the issue was that I was telling pyarrow to interpret the value as a microsecond-level timestamp. In case anyone encounters this in the future, the fix is as simple as changing the 'us' above to 's'. And if you want millisecond-level timestamps, you can do it like so:
from datetime import datetime
import numpy as np
import pandas as pd
import pyarrow as pa

# 20 rows x 10 columns of random floats, with string column names
df = pd.DataFrame(np.random.uniform(size=(20, 10)))
df.columns = [str(i) for i in range(df.shape[1])]
schema = pa.schema([(str(i), pa.float32()) for i in range(df.shape[1])])

# Epoch time in milliseconds, as an integer count to match pa.timestamp('ms')
ts = int(datetime.now().timestamp() * 1000)
print('DateTime timestamp:', ts)
table = pa.Table.from_pandas(df, schema)
pa_ts = pa.array([ts] * len(table), pa.timestamp('ms'))
print('PyArrow timestamp:', pa_ts)
Converting between datetime and Pandas Timestamp objects
You can use the to_pydatetime method to be more explicit:
In [11]: ts = pd.Timestamp('2014-01-23 00:00:00', tz=None)
In [12]: ts.to_pydatetime()
Out[12]: datetime.datetime(2014, 1, 23, 0, 0)
It's also available on a DatetimeIndex:
In [13]: rng = pd.date_range('1/10/2011', periods=3, freq='D')
In [14]: rng.to_pydatetime()
Out[14]:
array([datetime.datetime(2011, 1, 10, 0, 0),
datetime.datetime(2011, 1, 11, 0, 0),
datetime.datetime(2011, 1, 12, 0, 0)], dtype=object)
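For completeness, a short sketch of the round trip in both directions. It relies on the pd.Timestamp constructor accepting a datetime; the variable names are illustrative:

```python
from datetime import datetime
import pandas as pd

ts = pd.Timestamp("2014-01-23 00:00:00")

# Timestamp -> datetime
py_dt = ts.to_pydatetime()

# datetime -> Timestamp
back = pd.Timestamp(py_dt)

# DatetimeIndex -> list of datetime objects
rng = pd.date_range("2011-01-10", periods=3, freq="D")
py_list = rng.to_pydatetime().tolist()

print(py_dt, back, py_list)
```

Since datetime only resolves microseconds, any nanosecond component of a Timestamp is not carried over by to_pydatetime.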