How to convert columns into one datetime column in pandas?
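Combine the columns into a single YYYYMMDD integer and pass it to pd.to_datetime. In 0.13 (coming very soon) this is heavily optimized and quite fast, and it is still pretty fast in 0.12; in both versions it is orders of magnitude faster than looping.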
In [3]: df
Out[3]:
   M  D     Y  Apples  Oranges
0  5  6  1990      12        3
1  5  7  1990      14        4
2  5  8  1990      15       34
3  5  9  1990      23       21
In [4]: df.dtypes
Out[4]:
M int64
D int64
Y int64
Apples int64
Oranges int64
dtype: object
# in 0.12, use this
In [5]: pd.to_datetime((df.Y*10000+df.M*100+df.D).apply(str),format='%Y%m%d')
# in 0.13 the above or this will work
In [5]: pd.to_datetime(df.Y*10000+df.M*100+df.D,format='%Y%m%d')
Out[5]:
0 1990-05-06 00:00:00
1 1990-05-07 00:00:00
2 1990-05-08 00:00:00
3 1990-05-09 00:00:00
dtype: datetime64[ns]
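As a self-contained sketch of the same idea, the snippet below rebuilds a small frame with the M, D and Y columns from above and parses the combined integer. The `as_int` name and the `date` column are illustrative choices, not part of the original answer, and the string conversion is kept so the parse does not depend on how a given pandas version treats integers with `format=`.

```python
import pandas as pd

# Small stand-in for the frame shown above (only the date parts).
df = pd.DataFrame({"M": [5, 5], "D": [6, 7], "Y": [1990, 1990]})

# Encode each row as a YYYYMMDD integer, e.g. 1990*10000 + 5*100 + 6 = 19900506.
as_int = df["Y"] * 10000 + df["M"] * 100 + df["D"]

# Parse the digits with an explicit format; this is vectorized, no Python loop.
df["date"] = pd.to_datetime(as_int.astype(str), format="%Y%m%d")

print(df)
```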
Pandas convert two separate columns into a single datetime column?
If you need to convert a week of the year, you must also define the day of the week with %w:
%w - Weekday as a decimal number, where 0 is Sunday and 6 is Saturday.
# set the weekday to 0 (Sunday) for every row
s = df['WEEK_OF_YEAR'].astype(str) + '-0-' + df['YEAR'].astype(str)
df['date'] = pd.to_datetime(s, format='%W-%w-%Y')
print (df)
   WEEK_OF_YEAR  YEAR       date
0             1  2016 2016-01-10
1             2  2016 2016-01-17
2            52  2016 2017-01-01
3             1  2017 2017-01-08
4             2  2017 2017-01-15
5             3  2017 2017-01-22
6            52  2017 2017-12-31
7             1  2018 2018-01-07
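Because %W counts weeks starting on Monday, Sunday (0) is the last day of each week, which is why week 1 of 2016 maps to 2016-01-10 above. If you would rather get the Monday that starts the week, a small variation is to use 1 as the weekday. The `week_start` column name is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({"WEEK_OF_YEAR": [1, 2, 1], "YEAR": [2016, 2016, 2017]})

# Use weekday 1 (Monday) instead of 0 (Sunday) to get the start of each %W week.
s = df["WEEK_OF_YEAR"].astype(str) + "-1-" + df["YEAR"].astype(str)
df["week_start"] = pd.to_datetime(s, format="%W-%w-%Y")

print(df)
```

Days before the first Monday of a year belong to week 0 under %W, so this numbering is not the same as ISO week numbering.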
Convert Pandas Column to DateTime
Use the to_datetime function, specifying a format to match your data.
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
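For a runnable illustration, here is the same call on made-up sample strings that match that format (day, abbreviated month name, year, then a colon-separated time with fractional seconds). The sample values are assumptions; only the column name and the format string come from the answer.

```python
import pandas as pd

# Hypothetical sample values shaped like '%d%b%Y:%H:%M:%S.%f'.
raw_data = pd.DataFrame(
    {"Mycol": ["05Sep2014:10:30:15.250", "17Dec2015:23:59:59.000"]}
)

raw_data["Mycol"] = pd.to_datetime(raw_data["Mycol"], format="%d%b%Y:%H:%M:%S.%f")

print(raw_data.dtypes)
```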
change multiple columns in pandas dataframe to datetime
You can use apply to iterate through each column using pd.to_datetime:
data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce')
As part of the changes in pandas 1.3.0, iloc/loc will no longer update the column dtype on assignment. Use column labels directly instead:
cols = data.columns[7:12]
data[cols] = data[cols].apply(pd.to_datetime, errors='coerce')
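A minimal, self-contained version of the column-label approach is sketched below. The frame and its column names (`id`, `start`, `end`) are invented for illustration; the point is that `errors='coerce'` turns unparseable values into NaT rather than raising.

```python
import pandas as pd

data = pd.DataFrame({
    "id": [1, 2],
    "start": ["2021-01-01", "2021-02-15"],
    "end": ["2021-01-31", "not a date"],  # deliberately invalid second value
})

# Select the columns by label, convert each one, and assign back by label.
cols = ["start", "end"]
data[cols] = data[cols].apply(pd.to_datetime, errors="coerce")

print(data.dtypes)
```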
Combine Date and Time columns using pandas
It's worth mentioning that you may have been able to read this in directly, e.g. if you were using read_csv, by passing parse_dates=[['Date', 'Time']].
Assuming these are just strings, you could simply add them together (with a space), allowing you to use to_datetime, which works without specifying the format= parameter:
In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
dtype: object
In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
dtype: datetime64[ns]
Alternatively, you can concatenate without the + ' ', but then the format= parameter must be used. Additionally, pandas is good at inferring the format to be converted to a datetime; however, specifying the exact format is faster.
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
Note: surprisingly (for me), this works fine with NaNs being converted to NaT, but it is worth checking that the conversion succeeded (perhaps using the errors='raise' argument).
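To make the NaN behaviour concrete, here is a small sketch with one missing date and one deliberately invalid string. The sample values and the `raised` flag are assumptions for illustration; `errors="raise"` is spelled out even though it is the default in the pandas versions I am aware of.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-06-2013", np.nan],
    "Time": ["23:00:00", "01:00:00"],
})

# String concatenation with a missing value stays missing.
combined = df["Date"] + " " + df["Time"]

# The missing entry becomes NaT rather than raising.
result = pd.to_datetime(combined, format="%m-%d-%Y %H:%M:%S", errors="raise")

# A value that does not fit the format raises instead of passing silently.
try:
    pd.to_datetime(
        pd.Series(["13-45-2013 23:00:00"]),
        format="%m-%d-%Y %H:%M:%S",
        errors="raise",
    )
    raised = False
except ValueError:
    raised = True

print(result, raised)
```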
Timing comparison with %%timeit:
# sample dataframe with 10000000 rows using df from the OP
df = pd.concat([df for _ in range(1000000)]).reset_index(drop=True)
%%timeit
pd.to_datetime(df['Date'] + ' ' + df['Time'])
[result]:
1.73 s ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
[result]:
1.33 s ± 9.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Combine columns to one date time object Python
Solution
df['datetime_col'] = pd.to_datetime(df['date'] + ' ' + df['time'])
With details
>>> df = pd.DataFrame({'date':['2020-04-14', '2020-04-14'], 'time':['06:03', '09:03']})
>>> df
date time
0 2020-04-14 06:03
1 2020-04-14 09:03
>>> df['datetime_col'] = pd.to_datetime(df['date'] + ' ' + df['time'])
>>> df
date time datetime_col
0 2020-04-14 06:03 2020-04-14 06:03:00
1 2020-04-14 09:03 2020-04-14 09:03:00
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 2 non-null object
1 time 2 non-null object
2 datetime_col 2 non-null datetime64[ns]
dtypes: datetime64[ns](1), object(2)
memory usage: 176.0+ bytes
Combine year, month and day in Python to create a date
Solution
You could use datetime.datetime along with .apply().
import datetime
d = datetime.datetime(2020, 5, 17)
date = d.date()
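The snippet above builds a single date; applied row by row to a frame it might look like the sketch below. The frame and the `date` column name are assumptions, and the `int(...)` casts are only there to hand plain Python integers to the constructor. This runs a Python function per row, so it is slower than the vectorized pd.to_datetime call on large frames.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})

# Build one datetime per row from the three integer columns.
df["date"] = df.apply(
    lambda row: datetime.datetime(
        int(row["year"]), int(row["month"]), int(row["day"])
    ),
    axis=1,
)

print(df)
```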
For pandas.to_datetime(df)
It looks like your code is fine. See the pandas.to_datetime documentation and How to convert columns into one datetime column in pandas?.
df = pd.DataFrame({'year': [2015, 2016],
'month': [2, 3],
'day': [4, 5]})
pd.to_datetime(df[["year", "month", "day"]])
Output:
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
What if your YEAR, MONTH and DAY columns have different headers?
Let's say your YEAR, MONTH and DAY columns are labeled as yy, mm and dd respectively, and you prefer to keep your column names unchanged. In that case you could do it as follows.
import pandas as pd
df = pd.DataFrame({'yy': [2015, 2016],
'mm': [2, 3],
'dd': [4, 5]})
df2 = df[["yy", "mm", "dd"]].copy()
df2.columns = ["year", "month", "day"]
pd.to_datetime(df2)
Output:
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
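A slightly shorter variant of the same idea, offered as a sketch rather than the answer's own method, is to rename on the fly with DataFrame.rename, which returns a new frame and leaves the original column names untouched. The `date` column name is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"yy": [2015, 2016], "mm": [2, 3], "dd": [4, 5]})

# rename() returns a renamed copy; df itself keeps yy/mm/dd.
renamed = df.rename(columns={"yy": "year", "mm": "month", "dd": "day"})
df["date"] = pd.to_datetime(renamed[["year", "month", "day"]])

print(df)
```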
Python pandas - join date & time columns into datetime column with timezone
Upon calling read_csv, set dayfirst=True so that the date is parsed correctly. Floor to minutes using dt.floor:
data = pd.read_csv(f'{data_path}/{symbol}.csv', parse_dates=[['Date','Time']], dayfirst=True)
data = data.set_index(data['Date_Time'].dt.floor('min')).tz_localize('Asia/Kolkata')
# need to drop col used as index separately here:
data = data.drop(['Date_Time'], axis=1)
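If the data is already in memory, or if your pandas version warns about the nested-list form of parse_dates (I believe it has been deprecated in recent releases), the same result can be sketched without read_csv. The sample rows and the `Close` column are made up for illustration; `stamps` is a helper name of my own.

```python
import pandas as pd

data = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013"],  # day-month-year strings
    "Time": ["09:15:59", "09:16:30"],
    "Close": [100.5, 101.0],               # hypothetical value column
})

# Join the two string columns and parse them day-first.
stamps = pd.to_datetime(data["Date"] + " " + data["Time"], dayfirst=True)

# Drop the source columns, index by the minute-floored timestamps,
# then attach the timezone to the (naive) index.
data = data.drop(columns=["Date", "Time"])
data.index = stamps.dt.floor("min")
data = data.tz_localize("Asia/Kolkata")

print(data)
```

tz_localize only labels the existing wall-clock times with a timezone; it does not shift them, which is what you want when the file already records local time.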