Convert Unix Time to Readable Date in Pandas Dataframe

These appear to be seconds since epoch.

In [20]: df = DataFrame(data['values'])

In [21]: df.columns = ["date","price"]

In [22]: df
Out[22]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 358 entries, 0 to 357
Data columns (total 2 columns):
date     358  non-null values
price    358  non-null values
dtypes: float64(1), int64(1)

In [23]: df.head()
Out[23]:
         date  price
0  1349720105  12.08
1  1349806505  12.35
2  1349892905  12.15
3  1349979305  12.19
4  1350065705  12.15

In [25]: df['date'] = pd.to_datetime(df['date'],unit='s')

In [26]: df.head()
Out[26]:
                 date  price
0 2012-10-08 18:15:05  12.08
1 2012-10-09 18:15:05  12.35
2 2012-10-10 18:15:05  12.15
3 2012-10-11 18:15:05  12.19
4 2012-10-12 18:15:05  12.15

In [27]: df.dtypes
Out[27]:
date     datetime64[ns]
price           float64
dtype: object
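
If the raw values had been milliseconds or microseconds since the epoch rather than seconds, the same call works with a different unit argument. A minimal sketch, with millisecond values constructed from the instants above:

import pandas as pd

# the same instants as above, expressed in milliseconds
ms = pd.Series([1349720105000, 1349806505000])
print(pd.to_datetime(ms, unit='ms'))
# 0   2012-10-08 18:15:05
# 1   2012-10-09 18:15:05
# dtype: datetime64[ns]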

How to convert unix epoch time to datetime with timezone in pandas

  • This question pertains to pandas; the pure-Python version is covered in Converting unix timestamp string to readable date.
  • pandas.Series.dt.tz_localize & pandas.Series.dt.tz_convert are both vectorized functions, which don't require using .apply().
    • The vectorized implementation is 8159 times faster than .apply().
    • The .dt accessor must be used.
  • It may be better to use pd.to_datetime(df['DT'], unit='s', utc=True) and remove .dt.tz_localize('UTC').
import pandas as pd

# test dataframe with 1M rows
df = pd.DataFrame({'DT': [1349720105, 1349806505, 1349892905, 1349979305, 1350065705]})
df['DT'] = pd.to_datetime(df['DT'], unit='s')
df = pd.concat([df]*200000).reset_index(drop=True)

# display(df.head())
DT
2012-10-08 18:15:05
2012-10-09 18:15:05
2012-10-10 18:15:05
2012-10-11 18:15:05
2012-10-12 18:15:05

# convert the column
df['DT'] = df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')

# display(df.head())
DT
2012-10-08 20:15:05+02:00
2012-10-09 20:15:05+02:00
2012-10-10 20:15:05+02:00
2012-10-11 20:15:05+02:00
2012-10-12 20:15:05+02:00

print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 1 columns):
 #   Column  Non-Null Count    Dtype
---  ------  --------------    -----
 0   DT      1000000 non-null  datetime64[ns, Europe/Amsterdam]
dtypes: datetime64[ns, Europe/Amsterdam](1)
memory usage: 7.6 MB

Alternative

  • This option is more concise: it localizes to 'UTC' while converting to a datetime dtype with pandas.to_datetime(), so the separate .dt.tz_localize('UTC') step is unnecessary.
df['DT'] = pd.to_datetime(df['DT'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')
  • The most time-consuming aspect of the original implementation from the OP was the x['dt'].tz_localize('UTC') call inside .apply().
  • The following code runs in about the same amount of time, within a few milliseconds.
df['DT_1'] = pd.to_datetime(df['DT'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')
df['DT_2'] = pd.to_datetime(df['DT'], unit='s', utc=True).apply(lambda x: x.tz_convert('Europe/Amsterdam'))

%%timeit Testing

  • 1M rows
  • This tests the comparable vectorized version against the version with .apply() from the OP, where 'DT' has already been converted to a datetime dtype.
%%timeit
df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')
[out]:
4.4 ms ± 494 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
df.apply(lambda x: x['DT'].tz_localize('UTC').tz_convert('Europe/Amsterdam'), axis=1)
[out]:
35.9 s ± 572 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Convert Pandas time series: UNIX epoch to datetime

The issue was that the elements were strings, not ints. pd.to_datetime() isn't smart enough to interpret numeric strings as epoch seconds.

My solution was this:

>>> val.astype('int').astype("datetime64[s]")
0   2015-08-27 02:51:15
1   2015-08-27 02:56:31
2   2015-08-27 03:20:38
3   2015-08-31 05:25:20
dtype: datetime64[ns]
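
A vectorized alternative under the same assumption (a Series of numeric strings holding epoch seconds) is to cast to integers and let pd.to_datetime do the rest; the sample values below are reconstructed from the output above:

import pandas as pd

val = pd.Series(['1440643875', '1440644191'])
print(pd.to_datetime(val.astype('int64'), unit='s'))
# 0   2015-08-27 02:51:15
# 1   2015-08-27 02:56:31
# dtype: datetime64[ns]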

Convert Unix epoch time to datetime in Pandas

Your problem has to do with converting the values you've read (they look like seconds after the Unix epoch, i.e. January 1, 1970) into datetime objects. The error you are getting is because your times are just floating-point numbers, not datetime objects, yet you are trying to handle them as if they were.

Assuming these are seconds after Unix epoch, you need to create your datetimes using a timedelta from a start point defined as the Unix epoch:

from datetime import datetime, timedelta
start = datetime(1970, 1, 1) # Unix epoch start time
df['datetime'] = df.Time.apply(lambda x: start + timedelta(seconds=x))

The last line creates a new column in your dataframe called 'datetime' and populates it by reading each value of the 'Time' column as x and calculating the time x seconds after the Unix epoch.

Note: if you want to convert these datetime objects into the time string that you specified, we can do this by creating a new column with strftime():

df['string_time'] = df.datetime.apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))
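
For larger frames, both .apply() calls can be replaced with vectorized equivalents; a sketch assuming the same df with a 'Time' column of numeric epoch seconds:

import pandas as pd

# vectorized equivalents of the two apply() lines above
df['datetime'] = pd.to_datetime(df['Time'], unit='s')
df['string_time'] = df['datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')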

Convert unix time stamp since epoch to date in dataframe

Your times are not in ISO 8601 format (https://en.wikipedia.org/wiki/ISO_8601).

You can provide the header when creating the dataframe and apply a transformation to your time column:

import pandas as pd
import datetime

data = [[1520942700, 174.10, 174.62, 174.33, 174.50, 169.447085],
        [1520942640, 174.23, 174.46, 174.23, 174.46,  25.634600],
        [1520942580, 173.56, 174.60, 173.56, 174.52, 298.726679],
        [1520942520, 173.50, 174.11, 174.11, 173.55, 672.756311],
        [1520942460, 174.11, 174.81, 174.80, 174.11, 441.636742]]

# create with headers
df = pd.DataFrame(data, columns=['time', 'low', 'high', 'open', 'close', 'volume'])

# convert to datetime (adapted from https://stackoverflow.com/a/26763810/7505395)
df['time'] = df['time'].apply(lambda x: datetime.datetime.fromtimestamp(x))

print(df)

Output:

                 time     low    high    open   close      volume
0 2018-03-13 13:05:00  174.10  174.62  174.33  174.50  169.447085
1 2018-03-13 13:04:00  174.23  174.46  174.23  174.46   25.634600
2 2018-03-13 13:03:00  173.56  174.60  173.56  174.52  298.726679
3 2018-03-13 13:02:00  173.50  174.11  174.11  173.55  672.756311
4 2018-03-13 13:01:00  174.11  174.81  174.80  174.11  441.636742

pandas datetime to unix timestamp seconds

I think you misunderstood what the argument is for. The purpose of origin='unix' is to convert an integer timestamp to datetime, not the other way around.

pd.to_datetime(1.547559e+09, unit='s', origin='unix') 
# Timestamp('2019-01-15 13:30:00')

Here are some options:

Option 1: integer division

Conversely, you can get the timestamp by converting to integer (to get nanoseconds) and dividing by 10⁹.

pd.to_datetime(['2019-01-15 13:30:00']).astype(int) / 10**9
# Float64Index([1547559000.0], dtype='float64')

Pros:

  • super fast

Cons:

  • makes assumptions about how pandas internally stores dates


Option 2: recommended by pandas

Pandas docs recommend using the following method:

# create test data
dates = pd.to_datetime(['2019-01-15 13:30:00'])

# calculate unix datetime
(dates - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')

[out]:
Int64Index([1547559000], dtype='int64')

Pros:

  • "idiomatic", recommended by the library

Cons:

  • unwieldy
  • not as performant as integer division


Option 3: pd.Timestamp

If you have a single date string, you can use pd.Timestamp as shown in the other answer:

pd.Timestamp('2019-01-15 13:30:00').timestamp()
# 1547559000.0

If you have to coerce multiple datetimes (where pd.to_datetime is your only option), you can initialize and map:

pd.to_datetime(['2019-01-15 13:30:00']).map(pd.Timestamp.timestamp)
# Float64Index([1547559000.0], dtype='float64')

Pros:

  • best method for a single datetime string
  • easy to remember

Cons:

  • not as performant as integer division

How to properly convert a UNIX timestamp to pd.Timestamp object via pandas?

The problem is simple but not obvious. utcnow() gives you a naive datetime object, meaning it is not aware of the fact that it represents UTC. Therefore, once you call .timestamp(), Python assumes local time because the datetime object is naive, and converts to UTC before calculating Unix time, shifting the result by whatever UTC offset your local timezone has.

Solution: construct a datetime object that is aware of UTC. The same goes for fromtimestamp: set UTC as the tz!

from datetime import datetime, timezone
import pandas as pd

d = datetime.now(timezone.utc)
timestamp = d.timestamp()

assert datetime.fromtimestamp(timestamp, tz=timezone.utc) == d
assert pd.to_datetime(timestamp, unit="s", utc=True).to_pydatetime() == d
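
To see the pitfall itself, compare a naive utcnow() against an aware now(timezone.utc); this sketch only prints a nonzero difference when the local timezone is not UTC:

from datetime import datetime, timezone

naive = datetime.utcnow()           # naive: UTC wall time, but no tzinfo
aware = datetime.now(timezone.utc)  # aware: knows it represents UTC

# naive.timestamp() assumes *local* time, so on a UTC+2 machine the result
# is 7200 seconds too small (modulo the tiny delay between the two calls)
print(aware.timestamp() - naive.timestamp())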

pandas is kind of a different story; a naive datetime is treated internally as UTC, so pd.to_datetime(timestamp, unit="s") gives you the UTC timestamp. But the conversion to a Python datetime does not take into account that Python will treat it as local time again... Here, keeping it consistent and setting utc=True (i.e. using an aware Timestamp) makes it work nicely.
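
A short sketch of that asymmetry: the naive Timestamp converts back to a naive Python datetime, which .timestamp() would again read as local time, while the utc=True version stays unambiguous:

import pandas as pd
from datetime import datetime, timezone

d = datetime.now(timezone.utc)
ts = d.timestamp()

naive_pd = pd.to_datetime(ts, unit="s")            # tz-naive Timestamp, internally UTC
aware_pd = pd.to_datetime(ts, unit="s", utc=True)  # tz-aware Timestamp

assert naive_pd.to_pydatetime().tzinfo is None     # naive again -> local-time trap
assert aware_pd.to_pydatetime() == d               # round-trips cleanly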

  • Further reading: Stop using utcnow and utcfromtimestamp

python dataframe convert epoch to readable datetime hour minutes seconds as zero

If you use https://www.epochconverter.com/, a timezone is added to the displayed result.

If you need to add a timezone to the column, use Series.dt.tz_localize and then Series.dt.tz_convert:

df['period'] = (pd.to_datetime(df['period'], unit='ms')
.dt.tz_localize('GMT')
.dt.tz_convert('Asia/Kathmandu'))
print(df)
                     period
0 2022-05-04 05:45:00+05:45
1 2022-05-03 05:45:00+05:45
2 2022-05-02 05:45:00+05:45
3 2022-05-01 05:45:00+05:45
4 2022-04-30 05:45:00+05:45
5 2022-04-29 05:45:00+05:45
6 2022-04-28 05:45:00+05:45
7 2022-04-27 05:45:00+05:45

