How to Convert Columns into One Datetime Column in Pandas

How to convert columns into one datetime column in pandas?

In pandas 0.13 this is heavily optimized and quite fast (it is still reasonably fast in 0.12); in both versions it is orders of magnitude faster than looping.

In [3]: df
Out[3]:
M D Y Apples Oranges
0 5 6 1990 12 3
1 5 7 1990 14 4
2 5 8 1990 15 34
3 5 9 1990 23 21

In [4]: df.dtypes
Out[4]:
M int64
D int64
Y int64
Apples int64
Oranges int64
dtype: object

# in 0.12, use this
In [5]: pd.to_datetime((df.Y*10000+df.M*100+df.D).apply(str),format='%Y%m%d')

# in 0.13 the above or this will work
In [5]: pd.to_datetime(df.Y*10000+df.M*100+df.D,format='%Y%m%d')
Out[5]:
0 1990-05-06 00:00:00
1 1990-05-07 00:00:00
2 1990-05-08 00:00:00
3 1990-05-09 00:00:00
dtype: datetime64[ns]
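For readers on a current pandas release, the same idea can be sketched as follows. This is a minimal example that rebuilds the frame shown above; the new column name `date` is an assumption made for illustration. Casting the integer to a string first mirrors the 0.12 variant and avoids relying on how integers are matched against a format.

```python
import pandas as pd

# Rebuild the example frame from the transcript above.
df = pd.DataFrame({
    "M": [5, 5, 5, 5],
    "D": [6, 7, 8, 9],
    "Y": [1990, 1990, 1990, 1990],
    "Apples": [12, 14, 15, 23],
    "Oranges": [3, 4, 34, 21],
})

# Build an integer such as 19900506, convert it to a string, then parse it.
as_int = df["Y"] * 10000 + df["M"] * 100 + df["D"]
df["date"] = pd.to_datetime(as_int.astype(str), format="%Y%m%d")

print(df["date"])
```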

Pandas convert two separate columns into a single datetime column?

If you need to convert a week of the year, you also have to define the day of the week with %w:

%w - Weekday as a decimal number, where 0 is Sunday and 6 is Saturday.

# use 0 as the weekday so that each date falls on a Sunday
s = df['WEEK_OF_YEAR'].astype(str) + '-0-' + df['YEAR'].astype(str)
df['date'] = pd.to_datetime(s, format='%W-%w-%Y')
print(df)
WEEK_OF_YEAR YEAR date
0 1 2016 2016-01-10
1 2 2016 2016-01-17
2 52 2016 2017-01-01
3 1 2017 2017-01-08
4 2 2017 2017-01-15
5 3 2017 2017-01-22
6 52 2017 2017-12-31
7 1 2018 2018-01-07
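As a self-contained sketch of the same conversion, the snippet below rebuilds a few of the rows above and also shows a Monday variant. The column names `sunday` and `monday` are assumptions chosen for illustration. With %W, weeks start on Monday, so weekday 1 gives the first day of the week and weekday 0 gives the last.

```python
import pandas as pd

df = pd.DataFrame({"WEEK_OF_YEAR": [1, 2, 52], "YEAR": [2016, 2016, 2016]})

week = df["WEEK_OF_YEAR"].astype(str)
year = df["YEAR"].astype(str)

# %w = 0 selects Sunday, which is the last day of a %W (Monday-based) week.
df["sunday"] = pd.to_datetime(week + "-0-" + year, format="%W-%w-%Y")

# %w = 1 selects Monday, the first day of the same week.
df["monday"] = pd.to_datetime(week + "-1-" + year, format="%W-%w-%Y")

print(df)
```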

Convert Pandas Column to DateTime

Use the to_datetime function, specifying a format to match your data.

raw_data['Mycol'] =  pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
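To make the format string concrete, here is a small sketch with made-up sample values; the strings below are assumptions shaped to fit the format, not data from the original question. Note that %b matches abbreviated month names for the current locale, so this assumes an English locale.

```python
import pandas as pd

# Hypothetical sample values matching '%d%b%Y:%H:%M:%S.%f'.
raw_data = pd.DataFrame(
    {"Mycol": ["05SEP2014:00:00:00.000", "17JAN2015:13:45:10.250"]}
)

raw_data["Mycol"] = pd.to_datetime(
    raw_data["Mycol"], format="%d%b%Y:%H:%M:%S.%f"
)

print(raw_data.dtypes)
```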

change multiple columns in pandas dataframe to datetime

You can use apply to run pd.to_datetime on each column in turn:

data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce')

As part of the changes in pandas 1.3.0, iloc/loc will no longer update the column dtype on assignment. Use column labels directly instead:

cols = data.columns[7:12]
data[cols] = data[cols].apply(pd.to_datetime, errors='coerce')
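A minimal runnable sketch of the label-based version follows. The frame and its column names (`id`, `start`, `end`) are invented for illustration. With errors='coerce', values that cannot be parsed become NaT instead of raising.

```python
import pandas as pd

# Hypothetical frame: two string columns that should become datetimes.
data = pd.DataFrame({
    "id": [1, 2],
    "start": ["2021-01-01", "not a date"],
    "end": ["2021-02-01", "2021-03-01"],
})

cols = ["start", "end"]
# Keyword arguments given to apply are forwarded to pd.to_datetime.
data[cols] = data[cols].apply(pd.to_datetime, errors="coerce")

print(data.dtypes)
```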

Combine Date and Time columns using pandas

It's worth mentioning that you may be able to read this in directly, e.g. by calling read_csv with parse_dates=[['Date', 'Time']].

Assuming these are just strings, you can simply add them together (with a space between them) and pass the result to to_datetime, which works here without specifying the format= parameter:

In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
dtype: object

In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
dtype: datetime64[ns]

Alternatively, you can concatenate without the + ' ', but then the format= parameter must be given. Pandas is good at inferring the format, but specifying the exact format is faster.

pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')

Note: surprisingly (for me), this works fine with NaNs, which are converted to NaT, but it is worth checking that the conversion did what you expect (perhaps by passing errors='raise').
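The NaN behaviour can be checked with a small sketch. The frame below is an assumption that reuses the Date and Time column names with a few missing values. A missing value in either column makes the concatenated string missing, and to_datetime turns that into NaT.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-06-2013", np.nan, "02-06-2013"],
    "Time": ["23:00:00", "01:00:00", np.nan],
})

# NaN in either operand propagates through the string concatenation.
combined = df["Date"] + " " + df["Time"]

# Missing entries become NaT; the valid row is parsed with the given format.
result = pd.to_datetime(combined, format="%m-%d-%Y %H:%M:%S")

print(result)
```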

Timing comparison:

# sample dataframe with 10000000 rows using df from the OP
df = pd.concat([df for _ in range(1000000)]).reset_index(drop=True)

%%timeit
pd.to_datetime(df['Date'] + ' ' + df['Time'])
[result]:
1.73 s ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
[result]:
1.33 s ± 9.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Combine columns to one date time object Python

Solution

df['datetime_col'] = pd.to_datetime(df['date'] + ' ' + df['time'])

With details

>>> df = pd.DataFrame({'date':['2020-04-14', '2020-04-14'], 'time':['06:03', '09:03']})
>>> df
date time
0 2020-04-14 06:03
1 2020-04-14 09:03

>>> df['datetime_col'] = pd.to_datetime(df['date'] + ' ' + df['time'])
>>> df
date time datetime_col
0 2020-04-14 06:03 2020-04-14 06:03:00
1 2020-04-14 09:03 2020-04-14 09:03:00

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 2 non-null object
1 time 2 non-null object
2 datetime_col 2 non-null datetime64[ns]
dtypes: datetime64[ns](1), object(2)
memory usage: 176.0+ bytes

Combine year, month and day in Python to create a date

Solution

You could use datetime.datetime along with .apply().

import datetime

d = datetime.datetime(2020, 5, 17)
date = d.date()
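The snippet above builds a single date. Applied row by row to a frame, the same constructor could be used as in the sketch below, which assumes columns named year, month and day and a new column named `date`. Row-wise apply is simple but slow on large frames.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})

# axis=1 passes each row to the function; int() guards against numpy integers.
df["date"] = df.apply(
    lambda row: datetime.datetime(
        int(row["year"]), int(row["month"]), int(row["day"])
    ),
    axis=1,
)

print(df)
```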

For pandas.to_datetime(df)

It looks like your code is fine. See pandas.to_datetime documentation and How to convert columns into one datetime column in pandas?.

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
pd.to_datetime(df[["year", "month", "day"]])

Output:

0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]

What if your YEAR, MONTH and DAY columns have different headers?

Let's say your YEAR, MONTH and DAY columns are labeled as yy, mm and dd respectively. And you prefer to keep your column names unchanged. In that case you could do it as follows.

import pandas as pd

df = pd.DataFrame({'yy': [2015, 2016],
                   'mm': [2, 3],
                   'dd': [4, 5]})
df2 = df[["yy", "mm", "dd"]].copy()
df2.columns = ["year", "month", "day"]
pd.to_datetime(df2)

Output:

0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
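A slightly shorter variant, offered as a sketch rather than the only way to do it, uses DataFrame.rename. Since rename returns a new frame, the original column names stay unchanged. The variable name `dates` is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"yy": [2015, 2016], "mm": [2, 3], "dd": [4, 5]})

# rename returns a copy, so df keeps its yy/mm/dd headers.
dates = pd.to_datetime(
    df[["yy", "mm", "dd"]].rename(
        columns={"yy": "year", "mm": "month", "dd": "day"}
    )
)

print(dates)
```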

Python pandas - join date & time columns into datetime column with timezone

Upon calling read_csv, set dayfirst=True so that the date is parsed correctly. Floor to minutes using dt.floor:

data = pd.read_csv(f'{data_path}/{symbol}.csv', parse_dates=[['Date','Time']], dayfirst=True)

data = data.set_index(data['Date_Time'].dt.floor('min')).tz_localize('Asia/Kolkata')

# need to drop col used as index separately here:
data = data.drop(['Date_Time'], axis=1)
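Recent pandas releases deprecate nested lists in parse_dates, so the same result can also be sketched by combining the columns after reading. The CSV text and the Close column below are made up for illustration, and the explicit day-first format is an assumption about how the dates are written.

```python
import io

import pandas as pd

# Hypothetical file contents with day-first dates.
csv_text = """Date,Time,Close
01-06-2013,09:15:59,100.5
02-06-2013,09:16:30,101.0
"""

data = pd.read_csv(io.StringIO(csv_text))

# Join the two string columns and parse them as day-month-year.
stamp = pd.to_datetime(
    data["Date"] + " " + data["Time"], format="%d-%m-%Y %H:%M:%S"
)

# Floor to whole minutes, attach the timezone, and use the result as the index.
data = data.drop(columns=["Date", "Time"])
data.index = stamp.dt.floor("min").dt.tz_localize("Asia/Kolkata")
data.index.name = "Date_Time"

print(data)
```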

