Convert Unix Time to Readable Date in Pandas Dataframe

These appear to be seconds since epoch.

In [20]: df = DataFrame(data['values'])

In [21]: df.columns = ["date","price"]

In [22]: df
Out[22]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 358 entries, 0 to 357
Data columns (total 2 columns):
date     358  non-null values
price    358  non-null values
dtypes: float64(1), int64(1)

In [23]: df.head()
Out[23]:
         date  price
0  1349720105  12.08
1  1349806505  12.35
2  1349892905  12.15
3  1349979305  12.19
4  1350065705  12.15

In [25]: df['date'] = pd.to_datetime(df['date'],unit='s')

In [26]: df.head()
Out[26]:
                 date  price
0 2012-10-08 18:15:05  12.08
1 2012-10-09 18:15:05  12.35
2 2012-10-10 18:15:05  12.15
3 2012-10-11 18:15:05  12.19
4 2012-10-12 18:15:05  12.15

In [27]: df.dtypes
Out[27]:
date     datetime64[ns]
price           float64
dtype: object
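
If the raw values had been milliseconds or microseconds since the epoch rather than seconds, the same call works with a different unit argument. A minimal sketch, with millisecond values constructed from the instants above:

import pandas as pd

# the same instants as above, expressed in milliseconds
ms = pd.Series([1349720105000, 1349806505000])
print(pd.to_datetime(ms, unit='ms'))
# 0   2012-10-08 18:15:05
# 1   2012-10-09 18:15:05
# dtype: datetime64[ns]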

How to convert unix epoch time to datetime with timezone in pandas

  • This question pertains to pandas; the pure-Python version is covered in Converting unix timestamp string to readable date.
  • pandas.Series.dt.tz_localize & pandas.Series.dt.tz_convert are both vectorized functions, which don't require using .apply().
    • The vectorized implementation is 8159 times faster than .apply().
    • The .dt accessor must be used.
  • It may be better to use pd.to_datetime(df['DT'], unit='s', utc=True) and remove .dt.tz_localize('UTC').
import pandas as pd

# test dataframe with 1M rows
df = pd.DataFrame({'DT': [1349720105, 1349806505, 1349892905, 1349979305, 1350065705]})
df['DT'] = pd.to_datetime(df['DT'], unit='s')
df = pd.concat([df]*200000).reset_index(drop=True)

# display(df.head())
DT
2012-10-08 18:15:05
2012-10-09 18:15:05
2012-10-10 18:15:05
2012-10-11 18:15:05
2012-10-12 18:15:05

# convert the column
df['DT'] = df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')

# display(df.head())
DT
2012-10-08 20:15:05+02:00
2012-10-09 20:15:05+02:00
2012-10-10 20:15:05+02:00
2012-10-11 20:15:05+02:00
2012-10-12 20:15:05+02:00

print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 1 columns):
 #   Column  Non-Null Count    Dtype
---  ------  --------------    -----
 0   DT      1000000 non-null  datetime64[ns, Europe/Amsterdam]
dtypes: datetime64[ns, Europe/Amsterdam](1)
memory usage: 7.6 MB

Alternative

  • This option is more concise: it localizes to 'UTC' while converting to a datetime dtype with pandas.to_datetime(), so the separate .dt.tz_localize('UTC') step is unnecessary.
df['DT'] = pd.to_datetime(df['DT'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')
  • The most time-consuming aspect of the original implementation from the OP was the x['dt'].tz_localize('UTC') call inside .apply().
  • The following code runs in about the same amount of time, within a few milliseconds.
df['DT_1'] = pd.to_datetime(df['DT'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')
df['DT_2'] = pd.to_datetime(df['DT'], unit='s', utc=True).apply(lambda x: x.tz_convert('Europe/Amsterdam'))

%%timeit Testing

  • 1M rows
  • This tests the comparable vectorized version against the version with .apply() from the OP, where 'DT' has already been converted to a datetime dtype.
%%timeit
df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')
[out]:
4.4 ms ± 494 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
df.apply(lambda x: x['DT'].tz_localize('UTC').tz_convert('Europe/Amsterdam'), axis=1)
[out]:
35.9 s ± 572 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Convert Pandas time series: UNIX epoch to datetime

The issue was that the elements were strings, not ints. pd.to_datetime() isn't smart enough to interpret numeric strings as epoch seconds.

My solution was this:

>>> val.astype('int').astype("datetime64[s]")
0   2015-08-27 02:51:15
1   2015-08-27 02:56:31
2   2015-08-27 03:20:38
3   2015-08-31 05:25:20
dtype: datetime64[ns]
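
A vectorized alternative under the same assumption (a Series of numeric strings holding epoch seconds) is to cast to integers and let pd.to_datetime do the rest; the sample values below are reconstructed from the output above:

import pandas as pd

val = pd.Series(['1440643875', '1440644191'])
print(pd.to_datetime(val.astype('int64'), unit='s'))
# 0   2015-08-27 02:51:15
# 1   2015-08-27 02:56:31
# dtype: datetime64[ns]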

Convert Unix epoch time to datetime in Pandas

Your problem has to do with converting the values you've read (they look like seconds after the Unix epoch, i.e. January 1, 1970) into datetime objects. The error you are getting is because your times are just floating-point numbers, not datetime objects, yet you are trying to handle them as if they were.

Assuming these are seconds after Unix epoch, you need to create your datetimes using a timedelta from a start point defined as the Unix epoch:

from datetime import datetime, timedelta
start = datetime(1970, 1, 1) # Unix epoch start time
df['datetime'] = df.Time.apply(lambda x: start + timedelta(seconds=x))

The last line creates a new column in your dataframe called 'datetime' and populates it by reading each value of the 'Time' column as x and calculating the time x seconds after the Unix epoch.

Note: if you want to convert these datetime objects into the time string that you specified, we can do this by creating a new column with strftime():

df['string_time'] = df.datetime.apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))
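
For larger frames, both .apply() calls can be replaced with vectorized equivalents; a sketch assuming the same df with a 'Time' column of numeric epoch seconds:

import pandas as pd

# vectorized equivalents of the two apply() lines above
df['datetime'] = pd.to_datetime(df['Time'], unit='s')
df['string_time'] = df['datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')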

Convert unix time stamp since epoch to date in dataframe

Your times are not in ISO 8601 format (https://en.wikipedia.org/wiki/ISO_8601).

You can provide the header when creating the dataframe and apply a transformation to your time column:

import pandas as pd
import datetime

data = [[1520942700, 174.10, 174.62, 174.33, 174.50, 169.447085],
        [1520942640, 174.23, 174.46, 174.23, 174.46,  25.634600],
        [1520942580, 173.56, 174.60, 173.56, 174.52, 298.726679],
        [1520942520, 173.50, 174.11, 174.11, 173.55, 672.756311],
        [1520942460, 174.11, 174.81, 174.80, 174.11, 441.636742]]

# create with headers
df = pd.DataFrame(data, columns=['time', 'low', 'high', 'open', 'close', 'volume'])

# convert to datetime (adapted from https://stackoverflow.com/a/26763810/7505395)
df['time'] = df['time'].apply(lambda x: datetime.datetime.fromtimestamp(x))

print(df)

Output:

                 time     low    high    open   close      volume
0 2018-03-13 13:05:00  174.10  174.62  174.33  174.50  169.447085
1 2018-03-13 13:04:00  174.23  174.46  174.23  174.46   25.634600
2 2018-03-13 13:03:00  173.56  174.60  173.56  174.52  298.726679
3 2018-03-13 13:02:00  173.50  174.11  174.11  173.55  672.756311
4 2018-03-13 13:01:00  174.11  174.81  174.80  174.11  441.636742

pandas datetime to unix timestamp seconds

I think you misunderstood what the argument is for. The purpose of origin='unix' is to convert an integer timestamp to datetime, not the other way around.

pd.to_datetime(1.547559e+09, unit='s', origin='unix') 
# Timestamp('2019-01-15 13:30:00')

Here are some options:

Option 1: integer division

Conversely, you can get the timestamp by converting to integer (to get nanoseconds) and dividing by 10⁹.

pd.to_datetime(['2019-01-15 13:30:00']).astype(int) / 10**9
# Float64Index([1547559000.0], dtype='float64')

Pros:

  • super fast

Cons:

  • makes assumptions about how pandas internally stores dates


Option 2: recommended by pandas

Pandas docs recommend using the following method:

# create test data
dates = pd.to_datetime(['2019-01-15 13:30:00'])

# calculate unix datetime
(dates - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')

[out]:
Int64Index([1547559000], dtype='int64')

Pros:

  • "idiomatic", recommended by the library

Cons:

  • unwieldy
  • not as performant as integer division


Option 3: pd.Timestamp

If you have a single date string, you can use pd.Timestamp as shown in the other answer:

pd.Timestamp('2019-01-15 13:30:00').timestamp()
# 1547559000.0

If you have to coerce multiple datetimes (where pd.to_datetime is your only option), you can initialize and map:

pd.to_datetime(['2019-01-15 13:30:00']).map(pd.Timestamp.timestamp)
# Float64Index([1547559000.0], dtype='float64')

Pros:

  • best method for a single datetime string
  • easy to remember

Cons:

  • not as performant as integer division

How to properly convert a UNIX timestamp to pd.Timestamp object via pandas?

The problem is simple but not obvious. utcnow() gives you a naive datetime object, meaning it is not aware of the fact that it represents UTC. Therefore, once you call .timestamp(), Python assumes local time because the datetime object is naive, and converts to UTC before calculating Unix time, shifting the result by whatever UTC offset your local timezone has.

Solution: construct a datetime object that is aware of UTC. The same goes for fromtimestamp: set UTC as the tz!

from datetime import datetime, timezone
import pandas as pd

d = datetime.now(timezone.utc)
timestamp = d.timestamp()

assert datetime.fromtimestamp(timestamp, tz=timezone.utc) == d
assert pd.to_datetime(timestamp, unit="s", utc=True).to_pydatetime() == d
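
To see the pitfall itself, compare a naive utcnow() against an aware now(timezone.utc); this sketch only prints a nonzero difference when the local timezone is not UTC:

from datetime import datetime, timezone

naive = datetime.utcnow()           # naive: UTC wall time, but no tzinfo
aware = datetime.now(timezone.utc)  # aware: knows it represents UTC

# naive.timestamp() assumes *local* time, so on a UTC+2 machine the result
# is 7200 seconds too small (modulo the tiny delay between the two calls)
print(aware.timestamp() - naive.timestamp())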

pandas is kind of a different story; a naive datetime is treated internally as UTC, so pd.to_datetime(timestamp, unit="s") gives you the UTC timestamp. But the conversion to a Python datetime does not take into account that Python will treat it as local time again... Here, keeping it consistent and setting utc=True (i.e. using an aware Timestamp) makes it work nicely.
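
A short sketch of that asymmetry: the naive Timestamp converts back to a naive Python datetime, which .timestamp() would again read as local time, while the utc=True version stays unambiguous:

import pandas as pd
from datetime import datetime, timezone

d = datetime.now(timezone.utc)
ts = d.timestamp()

naive_pd = pd.to_datetime(ts, unit="s")            # tz-naive Timestamp, internally UTC
aware_pd = pd.to_datetime(ts, unit="s", utc=True)  # tz-aware Timestamp

assert naive_pd.to_pydatetime().tzinfo is None     # naive again -> local-time trap
assert aware_pd.to_pydatetime() == d               # round-trips cleanly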

  • Further reading: Stop using utcnow and utcfromtimestamp

python dataframe convert epoch to readable datetime hour minutes seconds as zero

If you use https://www.epochconverter.com/, a timezone is added to the displayed result.

If you need to add a timezone to the column, use Series.dt.tz_localize and then Series.dt.tz_convert:

df['period'] = (pd.to_datetime(df['period'], unit='ms')
.dt.tz_localize('GMT')
.dt.tz_convert('Asia/Kathmandu'))
print(df)
                     period
0 2022-05-04 05:45:00+05:45
1 2022-05-03 05:45:00+05:45
2 2022-05-02 05:45:00+05:45
3 2022-05-01 05:45:00+05:45
4 2022-04-30 05:45:00+05:45
5 2022-04-29 05:45:00+05:45
6 2022-04-28 05:45:00+05:45
7 2022-04-27 05:45:00+05:45

