How to Convert Strings in a Pandas Data Frame to a 'Date' Data Type

How do I convert strings in a Pandas data frame to a 'date' data type?

Use astype:

In [31]: df
Out[31]:
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [32]: df['time'] = df['time'].astype('datetime64[ns]')

In [33]: df
Out[33]:
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00
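
As a self-contained sketch of the same conversion, the frame can be rebuilt by hand. The sample values mirror the output above, but the construction itself is an assumption, since the answer does not show how df was created:

```python
import pandas as pd

# Hypothetical reconstruction of the frame shown above;
# the 'time' column starts out as plain strings.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "time": ["2013-01-01", "2013-01-02", "2013-01-03"],
})

# Cast the string column to a datetime dtype.
df["time"] = df["time"].astype("datetime64[ns]")

print(df.dtypes)
```

Printing df.dtypes is a quick way to confirm that the column is no longer stored as strings.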

Convert String Column directly to Date format (not Datetime) in Pandas DataFrame

pandas.Series.apply (like pandas.DataFrame.apply) is essentially a native Python for loop: it calls the function once per element.

pandas.to_datetime is a vectorized function: it is meant to operate on whole sequences (lists, arrays, Series) and runs the inner loop in C.

If we start with a larger dataframe:

import pandas
df = pandas.DataFrame({'a': ['2020-01-02', '2020-01-02'] * 5000})

And then run (in a Jupyter notebook):

%%timeit
df['a'].apply(pandas.to_datetime).dt.date

We get a pretty slow result:

1.03 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

But if we rearrange just slightly to pass the entire column:

%%timeit
pandas.to_datetime(df['a']).dt.date

We get a much faster result:

6.07 ms ± 232 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
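
Outside a notebook, the same comparison could be sketched with the standard timeit module. The helper names per_element and whole_column are made up for this illustration, and the frame is smaller than the one above so the script finishes quickly; absolute timings will differ by machine:

```python
import timeit

import pandas as pd

df = pd.DataFrame({"a": ["2020-01-02", "2020-01-02"] * 500})


def per_element():
    # Calls pd.to_datetime once for every row (a Python-level loop).
    return df["a"].apply(pd.to_datetime).dt.date


def whole_column():
    # Passes the entire column in one call (vectorized).
    return pd.to_datetime(df["a"]).dt.date


# Both approaches should produce the same dates.
same_result = bool((per_element() == whole_column()).all())

slow = timeit.timeit(per_element, number=3)
fast = timeit.timeit(whole_column, number=3)
print(f"same result: {same_result}")
print(f"apply: {slow:.4f} s, vectorized: {fast:.4f} s (3 runs each)")
```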

Convert DataFrame column type from string to datetime

The easiest way is to use to_datetime:

df['col'] = pd.to_datetime(df['col'])

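It also offers a dayfirst argument for European-style dates, where the day comes before the month. Beware that it is not strict: a string such as 05/23/2005, which cannot be read day-first, is still parsed month-first.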

Here it is in action:

In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
Out[11]:
0 2005-05-23 00:00:00
dtype: datetime64[ns]

You can pass a specific format:

In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
Out[12]:
0 2005-05-23
dtype: datetime64[ns]
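
To make the effect of dayfirst concrete, here is a small sketch with an ambiguous date. The sample string is an assumption chosen for illustration, because both of its first two fields could be a month:

```python
import pandas as pd

# "10/11/2005" can be read as October 11 or as November 10.
ambiguous = pd.Series(["10/11/2005"])

# Default behaviour: the first field is taken as the month.
month_first = pd.to_datetime(ambiguous)

# dayfirst=True: the first field is taken as the day.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# An explicit format removes the ambiguity altogether.
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y")

print(month_first[0], day_first[0], explicit[0])
```

When the layout of the input is known, passing format is usually the safer choice, because nothing is left to inference.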

Python / Pandas parse string to date and time

Just use pd.to_datetime():

import pandas as pd

df = pd.DataFrame([
['Fri Oct 19 17:42:31 2018'],
['Fri Oct 19 17:42:31 2018'],
['Fri Oct 19 17:42:31 2018'],
['Fri Oct 19 17:42:31 2018'],
['Fri Oct 19 17:42:31 2018']],
columns=['Date'])

df['Date'] = pd.to_datetime(df['Date'])

Yields:

                 Date
0 2018-10-19 17:42:31
1 2018-10-19 17:42:31
2 2018-10-19 17:42:31
3 2018-10-19 17:42:31
4 2018-10-19 17:42:31

Per @ALollz's comment, you can specify the format to improve performance:

df['Date'] = pd.to_datetime(df['Date'], format='%a %b %d %H:%M:%S %Y')

Convert string to datetime - python dataframe

Set the format parameter to '%d/%m/%Y' to state the date format explicitly, as mentioned in the comments, or set dayfirst to True. A datetime object always carries the year, month, day, and time, so to display only the month and year you have to convert back to a string:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True).dt.strftime('%Y-%m')
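
A minimal sketch of the explicit-format variant, using made-up sample dates and a hypothetical YearMonth column, shows that the result of strftime is a column of strings rather than datetimes:

```python
import pandas as pd

# Hypothetical day-first input strings.
df = pd.DataFrame({"Date": ["25/12/2019", "03/01/2020"]})

# Parse with an explicit day/month/year format.
parsed = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Format back to year-month strings; the datetime type is lost here.
df["YearMonth"] = parsed.dt.strftime("%Y-%m")

print(df)
```

Keeping the parsed datetimes in their own variable or column is worth considering if later steps still need date arithmetic.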

Convert string to date format in python pandas dataframe

I think you need to_datetime, but first strip the first 4 and last 4 characters by indexing with str, then prepend the year 2017 with radd:

df['new'] = pd.to_datetime(df['from'].str[4:-4].radd('2017-'), format='%Y-%d. %b, %H:%M')
print(df)
                     from                 new
0  Di, 15. Aug, 21:52 Uhr 2017-08-15 21:52:00
1  Di, 15. Aug, 22:46 Uhr 2017-08-15 22:46:00
2  Di, 15. Aug, 22:46 Uhr 2017-08-15 22:46:00
3  Di, 15. Aug, 21:52 Uhr 2017-08-15 21:52:00
4  Di, 15. Aug, 22:46 Uhr 2017-08-15 22:46:00
5  Di, 15. Aug, 21:52 Uhr 2017-08-15 21:52:00
6  Di, 15. Aug, 22:46 Uhr 2017-08-15 22:46:00
7  Di, 15. Aug, 21:52 Uhr 2017-08-15 21:52:00
8  Di, 15. Aug, 22:46 Uhr 2017-08-15 22:46:00

Finally, to compare with today's date, use boolean indexing with dt.date, which converts pandas datetimes to Python dates:

today_date = pd.Timestamp.today().date()  # pd.datetime was removed in pandas 2.0

df1 = df[df['new'].dt.date == today_date]
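
Put together, the steps of this answer might look like the sketch below. The two-row frame is a cut-down stand-in for the data shown above, the names trimmed and df_today are made up, and %b is locale-dependent, so "Aug" is assumed to be recognized by the current locale:

```python
import pandas as pd

df = pd.DataFrame({"from": ["Di, 15. Aug, 21:52 Uhr",
                            "Di, 15. Aug, 22:46 Uhr"]})

# Drop the leading "Di, " (4 chars) and the trailing " Uhr" (4 chars).
trimmed = df["from"].str[4:-4]

# Prepend the year, then parse with a matching format.
df["new"] = pd.to_datetime(trimmed.radd("2017-"),
                           format="%Y-%d. %b, %H:%M")

# Keep only the rows whose date part equals today's date.
today_date = pd.Timestamp.today().date()
df_today = df[df["new"].dt.date == today_date]

print(df)
print(len(df_today), "rows match today's date")
```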

How to convert string to datetime format in pandas python?

Use to_datetime. There is no need for a format string since the parser is able to handle it:

In [51]:
pd.to_datetime(df['I_DATE'])

Out[51]:
0 2012-03-28 14:15:00
1 2012-03-28 14:17:28
2 2012-03-28 14:50:50
Name: I_DATE, dtype: datetime64[ns]

To access the date, day, or time component, use the dt accessor (after assigning the converted column back to df['I_DATE']):

In [54]:
df['I_DATE'].dt.date

Out[54]:
0 2012-03-28
1 2012-03-28
2 2012-03-28
dtype: object

In [56]:
df['I_DATE'].dt.time

Out[56]:
0 14:15:00
1 14:17:28
2 14:50:50
dtype: object

You can also filter a datetime column by comparing it with date strings, for example:

In [59]:
import datetime as dt
df = pd.DataFrame({'date': pd.date_range(start=dt.datetime(2015, 1, 1), end=dt.datetime.now())})
df[(df['date'] > '2015-02-04') & (df['date'] < '2015-02-10')]

Out[59]:
         date
35 2015-02-05
36 2015-02-06
37 2015-02-07
38 2015-02-08
39 2015-02-09
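
For a reproducible version of the filter, a fixed end date can stand in for datetime.now(). The end date below is an arbitrary assumption, and the variable name result is made up:

```python
import pandas as pd

# One row per day over a fixed range, so the output never changes.
df = pd.DataFrame({"date": pd.date_range(start="2015-01-01",
                                         end="2015-03-01")})

# Date strings are compared against the datetime column directly.
result = df[(df["date"] > "2015-02-04") & (df["date"] < "2015-02-10")]

print(result)
```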

How can I convert a string that contains only a year number to a date?

If you keep the datetime type for your column, you will likely benefit from it. What you see in the column ("2019-01-01") is only a representation of the datetime object. The real question here is: why do you need a datetime object?

Actually, I don't care about datetime type:

Use a string ('2019') or, preferably, an integer (2019), which lets you sort, do calculations, and so on.

I need the datetime type but I really want to see only the year:

Use style to format your column while retaining the underlying type:

df.style.format({'dates': lambda t: t.strftime('%Y')})

This lets you keep the type while having a clean visual format.
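
The two options can be compared side by side in a short sketch. The sample years are invented, and the use of format='%Y' and dt.year is an assumption about how one might do this, not something taken from the answer:

```python
import pandas as pd

# Hypothetical column of year-only strings.
df = pd.DataFrame({"dates": ["2017", "2018", "2019"]})

# Option 1: plain integers, enough for sorting and arithmetic.
as_int = df["dates"].astype(int)

# Option 2: real datetimes; each year becomes January 1 of that year.
as_datetime = pd.to_datetime(df["dates"], format="%Y")

# The year can be read back from the datetime column at any time.
year_from_dt = as_datetime.dt.year

print(as_int.tolist())
print(as_datetime.tolist())
```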

Python/Pandas convert string to time only

These two lines:

dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]

Can be written as:

dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'], format='%H:%M:%S').dt.time
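
As a self-contained check of the one-liner, the sketch below uses two invented time strings. The resulting column holds Python datetime.time objects, so its dtype is object:

```python
import datetime

import pandas as pd

# Hypothetical sample data in HH:MM:SS form.
dfc = pd.DataFrame({"Time_of_Sail": ["14:15:00", "09:05:30"]})

# Parse the strings, then keep only the time-of-day part.
dfc["Time_of_Sail"] = pd.to_datetime(dfc["Time_of_Sail"],
                                     format="%H:%M:%S").dt.time

print(dfc["Time_of_Sail"].tolist())
```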

