Convert unix time to readable date in pandas dataframe
These appear to be seconds since epoch.
In [20]: df = DataFrame(data['values'])
In [21]: df.columns = ["date","price"]
In [22]: df
Out[22]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 358 entries, 0 to 357
Data columns (total 2 columns):
date 358 non-null values
price 358 non-null values
dtypes: float64(1), int64(1)
In [23]: df.head()
Out[23]:
date price
0 1349720105 12.08
1 1349806505 12.35
2 1349892905 12.15
3 1349979305 12.19
4 1350065705 12.15
In [25]: df['date'] = pd.to_datetime(df['date'],unit='s')
In [26]: df.head()
Out[26]:
date price
0 2012-10-08 18:15:05 12.08
1 2012-10-09 18:15:05 12.35
2 2012-10-10 18:15:05 12.15
3 2012-10-11 18:15:05 12.19
4 2012-10-12 18:15:05 12.15
In [27]: df.dtypes
Out[27]:
date datetime64[ns]
price float64
dtype: object
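The session above can be reproduced as a standalone script; a minimal sketch, with the data values taken from the transcript:

```python
import pandas as pd

# Rebuild the frame from the transcript above
df = pd.DataFrame({"date": [1349720105, 1349806505, 1349892905],
                   "price": [12.08, 12.35, 12.15]})

# unit='s' tells pandas the integers are seconds since the Unix epoch
df["date"] = pd.to_datetime(df["date"], unit="s")
```

The result is timezone-naive: the values are UTC wall times with no tz attached.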
How to convert unix epoch time to datetime with timezone in pandas
- The question pertains to pandas; the pure Python version is covered in "Converting unix timestamp string to readable date".
- pandas.Series.dt.tz_localize and pandas.Series.dt.tz_convert are both vectorized functions, which don't require using .apply().
- The vectorized implementation is 8159 times faster than .apply().
- The .dt accessor must be used.
- It may be better to use pd.to_datetime(df['DT'], unit='s', utc=True) and remove .dt.tz_localize('UTC').
import pandas as pd
# test dataframe with 1M rows
df = pd.DataFrame({'DT': [1349720105, 1349806505, 1349892905, 1349979305, 1350065705]})
df['DT'] = pd.to_datetime(df['DT'], unit='s')
df = pd.concat([df]*200000).reset_index(drop=True)
# display(df.head())
DT
2012-10-08 18:15:05
2012-10-09 18:15:05
2012-10-10 18:15:05
2012-10-11 18:15:05
2012-10-12 18:15:05
# convert the column
df['DT'] = df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')
# display(df.head())
DT
2012-10-08 20:15:05+02:00
2012-10-09 20:15:05+02:00
2012-10-10 20:15:05+02:00
2012-10-11 20:15:05+02:00
2012-10-12 20:15:05+02:00
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 DT 1000000 non-null datetime64[ns, Europe/Amsterdam]
dtypes: datetime64[ns, Europe/Amsterdam](1)
memory usage: 7.6 MB
Alternative
- This option is more concise, and localizes to 'UTC' while converting to a datetime dtype with pandas.to_datetime().
df['DT'] = pd.to_datetime(df['DT'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')
- The most time consuming aspect of the original implementation from the OP was x['dt'].tz_localize('UTC') within the .apply().
- The following two lines run in about the same amount of time, within a few milliseconds of each other.
df['DT_1'] = pd.to_datetime(df['DT'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')
df['DT_2'] = pd.to_datetime(df['DT'], unit='s', utc=True).apply(lambda x: x.tz_convert('Europe/Amsterdam'))
Testing
- 1M rows
- This tests the comparable vectorized version against the version with .apply() from the OP, where 'DT' has already been converted to a datetime dtype.
%%timeit
df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')
[out]:
4.4 ms ± 494 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df.apply(lambda x: x['DT'].tz_localize('UTC').tz_convert('Europe/Amsterdam'), axis=1)
[out]:
35.9 s ± 572 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Convert Pandas time series: UNIX epoch to datetime
The issue was that the elements were strings, and not ints. pd.to_datetime() won't interpret numeric strings as epoch seconds on its own.
My solution was this:
>> val.astype('int').astype("datetime64[s]")
0 2015-08-27 02:51:15
1 2015-08-27 02:56:31
2 2015-08-27 03:20:38
3 2015-08-31 05:25:20
dtype: datetime64[ns]
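A sketch of the same fix without chaining astype calls: cast the strings to integers first, then hand them to pd.to_datetime with unit='s' (timestamp values borrowed from the output above):

```python
import pandas as pd

# Epoch seconds that arrived as strings (e.g. read from a CSV)
val = pd.Series(["1440643875", "1440644191"])

# Cast to integers first; pd.to_datetime then interprets them as epoch seconds
converted = pd.to_datetime(pd.to_numeric(val), unit="s")
```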
Convert Unix epoch time to datetime in Pandas
Your problem has to do with converting the values that you've read (which look like seconds after the Unix epoch, i.e. January 1, 1970) into datetime
objects. The error occurs because your times are plain floating-point numbers, but you are not handling them as such.
Assuming these are seconds after Unix epoch, you need to create your datetimes using a timedelta
from a start point defined as the Unix epoch:
from datetime import datetime, timedelta
start = datetime(1970, 1, 1) # Unix epoch start time
df['datetime'] = df.Time.apply(lambda x: start + timedelta(seconds=x))
The last line creates a new column in your dataframe called 'datetime'
and populates it by reading the 'Time'
column in as x
, and calculating the time x
seconds after Unix epoch.
Note: if you want to convert these datetime
objects into the time string that you specified, we can do this by creating a new column with strftime()
:
df['string_time'] = df.datetime.apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))
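Both apply calls above can be replaced with vectorized equivalents; a sketch with hypothetical epoch values in a 'Time' column:

```python
import pandas as pd

# Hypothetical frame mirroring the 'Time' column of epoch seconds
df = pd.DataFrame({"Time": [1349720105.0, 1349806505.0]})

# Vectorized equivalent of the start + timedelta(seconds=x) apply
df["datetime"] = pd.to_datetime(df["Time"], unit="s")

# Vectorized equivalent of the strftime apply, via the .dt accessor
df["string_time"] = df["datetime"].dt.strftime("%Y-%m-%d %H:%M:%S")
```

On large frames the vectorized form is dramatically faster than per-row lambdas, as the timings earlier in this page show.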
Convert unix time stamp since epoch to date in dataframe
Your times are not in ISO 8601 format (https://en.wikipedia.org/wiki/ISO_8601), so pandas won't parse them as dates automatically.
You can provide the header when creating the dataframe and apply a transformation to your time column:
import pandas as pd
import datetime
data = [[ 1520942700, 174.10, 174.62, 174.33, 174.50, 169.447085],
[ 1520942640, 174.23, 174.46, 174.23, 174.46, 25.634600],
[ 1520942580, 173.56, 174.60, 173.56, 174.52, 298.726679],
[ 1520942520, 173.50, 174.11, 174.11, 173.55, 672.756311],
[ 1520942460, 174.11, 174.81, 174.80, 174.11, 441.636742]]
# create with headers
df = pd.DataFrame(data, columns=['time', 'low', 'high', 'open', 'close', 'volume'])
# convert to datetime (adapted from https://stackoverflow.com/a/26763810/7505395)
df['time'] = df['time'].apply(lambda x:datetime.datetime.fromtimestamp(x))
print(df)
Output:
time low high open close volume
0 2018-03-13 13:05:00 174.10 174.62 174.33 174.50 169.447085
1 2018-03-13 13:04:00 174.23 174.46 174.23 174.46 25.634600
2 2018-03-13 13:03:00 173.56 174.60 173.56 174.52 298.726679
3 2018-03-13 13:02:00 173.50 174.11 174.11 173.55 672.756311
4 2018-03-13 13:01:00 174.11 174.81 174.80 174.11 441.636742
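Note that datetime.fromtimestamp converts each value to the machine's local timezone, so the output above depends on where the code runs. A vectorized, timezone-explicit sketch of the same conversion:

```python
import pandas as pd

# Same epoch seconds as in the example above
df = pd.DataFrame({"time": [1520942700, 1520942640]})

# utc=True pins the result to UTC regardless of the local timezone
df["time"] = pd.to_datetime(df["time"], unit="s", utc=True)
```

Chain .dt.tz_convert(...) afterwards if a specific local timezone is wanted.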
pandas datetime to unix timestamp seconds
I think you misunderstood what the argument is for. The purpose of origin='unix'
is to convert an integer timestamp to datetime
, not the other way.
pd.to_datetime(1.547559e+09, unit='s', origin='unix')
# Timestamp('2019-01-15 13:30:00')
Here are some options:
Option 1: integer division
Going the other way, you can get the timestamp by converting to integer (to get nanoseconds) and dividing by 10**9.
pd.to_datetime(['2019-01-15 13:30:00']).astype(int) / 10**9
# Float64Index([1547559000.0], dtype='float64')
Pros:
- super fast
Cons:
- makes assumptions about how pandas internally stores dates
Option 2: recommended by pandas
Pandas docs recommend using the following method:
# create test data
dates = pd.to_datetime(['2019-01-15 13:30:00'])
# calculate unix datetime
(dates - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')
[out]:
Int64Index([1547559000], dtype='int64')
Pros:
- "idiomatic", recommended by the library
Cons:
- unwieldy
- not as performant as integer division
Option 3: pd.Timestamp
If you have a single date string, you can use pd.Timestamp
as shown in the other answer:
pd.Timestamp('2019-01-15 13:30:00').timestamp()
# 1547559000.0
If you have to coerce multiple datetimes (where pd.to_datetime
is your only option), you can initialize and map:
pd.to_datetime(['2019-01-15 13:30:00']).map(pd.Timestamp.timestamp)
# Float64Index([1547559000.0], dtype='float64')
Pros:
- best method for a single datetime string
- easy to remember
Cons:
- not as performant as integer division
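All three options can be checked against each other; a sketch using the same example timestamp as above:

```python
import pandas as pd

dates = pd.to_datetime(["2019-01-15 13:30:00"])

# Option 1: underlying nanoseconds, integer-divided down to seconds
opt1 = dates.astype("int64") // 10**9

# Option 2: subtract the epoch, floor-divide by one second
opt2 = (dates - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

# Option 3: Timestamp.timestamp applied element-wise
opt3 = dates.map(pd.Timestamp.timestamp).astype("int64")
```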
How to properly convert a UNIX timestamp to pd.Timestamp object via pandas?
The problem is quite simple but not obvious. utcnow()
gives you a naive datetime object, meaning that it is not aware of the fact that it represents UTC. Therefore, once you call .timestamp()
, Python assumes local time because the datetime object is naive, and converts to UTC before calculating Unix time, adding whatever UTC offset your local timezone has.
Solution: construct a datetime object that is aware of UTC. The same goes for fromtimestamp
: set UTC as the tz!
from datetime import datetime, timezone
import pandas as pd
d = datetime.now(timezone.utc)
timestamp = d.timestamp()
assert datetime.fromtimestamp(timestamp, tz=timezone.utc) == d
assert pd.to_datetime(timestamp, unit="s", utc=True).to_pydatetime() == d
pandas
is kind of a different story; naive datetime is treated internally as UTC, so pd.to_datetime(timestamp, unit="s")
gives you the UTC timestamp. But the conversion to Python datetime does not take into account that Python will treat it as local time again... Here, keeping it consistent and setting utc=True
(i.e. using an aware Timestamp) makes it work nicely.
- Further reading: Stop using utcnow and utcfromtimestamp
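The naive-vs-aware distinction can be pinned down with a fixed epoch value; a sketch (1547559000 is 2019-01-15 13:30:00 UTC):

```python
from datetime import datetime, timezone
import pandas as pd

ts = 1547559000  # 2019-01-15 13:30:00 UTC

# pandas treats the naive result as a UTC wall time...
naive = pd.to_datetime(ts, unit="s")
# ...but only the aware Timestamp round-trips cleanly to a Python datetime
aware = pd.to_datetime(ts, unit="s", utc=True)
```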
python dataframe convert epoch to readable datetime hour minutes seconds as zero
Note that https://www.epochconverter.com/ displays the value with a timezone applied.
If need add timezones to column use Series.dt.tz_localize
and then Series.dt.tz_convert
:
df['period'] = (pd.to_datetime(df['period'], unit='ms')
.dt.tz_localize('GMT')
.dt.tz_convert('Asia/Kathmandu'))
print (df)
period
0 2022-05-04 05:45:00+05:45
1 2022-05-03 05:45:00+05:45
2 2022-05-02 05:45:00+05:45
3 2022-05-01 05:45:00+05:45
4 2022-04-30 05:45:00+05:45
5 2022-04-29 05:45:00+05:45
6 2022-04-28 05:45:00+05:45
7 2022-04-27 05:45:00+05:45
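The same pipeline, self-contained with hypothetical millisecond values (1651622400000 ms is 2022-05-04 00:00:00 UTC):

```python
import pandas as pd

# Hypothetical epoch milliseconds: midnight UTC on consecutive days
df = pd.DataFrame({"period": [1651622400000, 1651536000000]})

# unit='ms' for milliseconds, then localize and convert as above
df["period"] = (pd.to_datetime(df["period"], unit="ms")
                  .dt.tz_localize("GMT")
                  .dt.tz_convert("Asia/Kathmandu"))
```

Kathmandu has a fixed +05:45 offset, which is why midnight UTC appears as 05:45 local time.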