Pandas To_Datetime Parsing Wrong Year

pandas to_datetime parsing wrong year

That seems to be the behavior of Python's datetime library. I ran a test to find the cutoff, which falls between 68 and 69:

>>> import datetime
>>> datetime.datetime.strptime('31-Dec-68', '%d-%b-%y').date()
datetime.date(2068, 12, 31)

>>> datetime.datetime.strptime('1-Jan-69', '%d-%b-%y').date()
datetime.date(1969, 1, 1)

Two-digit year ambiguity

So any %y year below 69 (00 to 68) is assigned to the 2000s, and 69 and above (69 to 99) is assigned to the 1900s.

A two-digit %y year can only range from 00 to 99, which becomes ambiguous as soon as the data spans more than one century.

If there is no overlap, you can process the data manually and annotate the century (which removes the ambiguity)

I suggest you process your data manually and specify the century. For example, you can decide that any year between 17 and 68 in your data belongs to 1917 - 1968 (instead of 2017 - 2068).
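One way to do that is to rewrite the two-digit year as four digits before parsing. The sketch below is only an illustration: the helper name expand_year and the pivot of 17 are assumptions, and it presumes every string ends in a two-digit year and that no year below the pivot belongs to the 1900s.

```python
import pandas as pd

def expand_year(dates, pivot=17):
    """Rewrite a trailing two-digit year as four digits.

    Hypothetical helper: years >= pivot become 19yy, years below become 20yy.
    """
    yy = dates.str[-2:].astype(int)
    century = yy.map(lambda y: "19" if y >= pivot else "20")
    return dates.str[:-2] + century + dates.str[-2:]

raw = pd.Series(["31-Dec-68", "01-Jan-69", "15-Mar-17", "02-Feb-05"])

# With a four-digit year, %Y leaves nothing for the parser to guess.
parsed = pd.to_datetime(expand_year(raw), format="%d-%b-%Y")
print(parsed.dt.year.tolist())
```

Because the century is written into the string, the result no longer depends on the parser's 68/69 cutoff.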

If you have overlap, the year information is insufficient to parse the data, unless e.g. you have ordered data and a reference point

If you have overlap, e.g. data from both 2016 and 1916 that were both logged as '16', the dates are ambiguous and there isn't enough information to parse them. The exception is data ordered by date, in which case you can use heuristics to switch the century as you parse it.
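As a rough sketch of such a heuristic, assuming the records are in chronological order, the first record's century is known, and no two consecutive records are 100 or more years apart, you can bump the century each time the two-digit year drops. The function name assign_centuries is made up for this example.

```python
def assign_centuries(two_digit_years, start_century=1900):
    """Expand chronologically ordered two-digit years to four digits.

    Hypothetical helper: a drop in the two-digit year is taken to mean
    the data has crossed into the next century.
    """
    century = start_century
    previous = None
    full_years = []
    for yy in two_digit_years:
        if previous is not None and yy < previous:
            century += 100
        full_years.append(century + yy)
        previous = yy
    return full_years

print(assign_centuries([16, 45, 99, 3, 16]))
```

Here the second '16' lands in 2016 because the sequence has already wrapped past 99 once.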

Pandas to_datetime changes year unexpectedly

Use:

df['date'] = pd.to_datetime(df['date'].str[:-2] + '19' + df['date'].str[-2:])

Another solution with replace:

df['date'] = pd.to_datetime(df['date'].str.replace(r'-(\d+)$', r'-19\1', regex=True))

Sample:

print(df)
       date
0  01-06-70
1  01-06-69
2  01-06-68
3  01-06-67

df['date'] = pd.to_datetime(df['date'].str.replace(r'-(\d+)$', r'-19\1', regex=True))
print(df)
        date
0 1970-01-06
1 1969-01-06
2 1968-01-06
3 1967-01-06

When converting into datetime why is the result parsing wrong year and month using pandas?

You can pass the origin parameter to to_datetime:

df1['a_final'] = pd.to_datetime(df1['a'], unit='D', origin='1899-12-30').dt.strftime("%d/%m/%Y")
print(df1)
           a     a_final
0      44140  05/11/2020
1      44266  11/03/2021
2      44266  11/03/2021
3      44265  10/03/2021
4      44265  10/03/2021
39640  44143  08/11/2020
39641  44109  05/10/2020
39642  44232  05/02/2021
39643  44125  21/10/2020
39644  44222  26/01/2021
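To see why origin matters, here is a small sketch that assumes the integers are spreadsheet-style serial day counts (1899-12-30 is the origin commonly used for those). Without origin, unit='D' counts days from 1970-01-01, which pushes the same numbers decades into the future.

```python
import pandas as pd

serials = pd.Series([44140, 44266, 44109])

# Default origin is the Unix epoch, so the dates land far in the future.
from_unix_epoch = pd.to_datetime(serials, unit="D")

# Counting from 1899-12-30 gives the intended dates.
from_serial_origin = pd.to_datetime(serials, unit="D", origin="1899-12-30")

print(from_unix_epoch.dt.year.tolist())
print(from_serial_origin.dt.strftime("%d/%m/%Y").tolist())
```

The two results differ by a constant 25569 days, the gap between the two origins.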

pandas to_datetime converting 71 to 2071 instead of 1971

The year column is ambiguous: because no century is given, the parser falls back on its default rule for two-digit years. You can read the reasoning here.

There is a partial solution found here. You basically offset the affected years by 100 (a century). This is a janky fix, and you would apply it after getting your second dataframe.

import pandas as pd
import numpy as np

df['Date'] = np.where(df['Date'].dt.year > 2022, df['Date'] - pd.offsets.DateOffset(years=100), df['Date'])
# Dates after 2022 get 100 years subtracted. 2022 was the current year when this was written, so update the cutoff as the years progress.
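If you would rather not hard-code the cutoff, a variation is to default it to the current year. This is only a sketch: roll_back_future_years is a hypothetical helper, and it assumes that no date in your data is genuinely in the future.

```python
import pandas as pd

def roll_back_future_years(dates, cutoff_year=None):
    """Subtract a century from dates whose year is after cutoff_year.

    Hypothetical helper: cutoff_year defaults to the current year.
    """
    if cutoff_year is None:
        cutoff_year = pd.Timestamp.now().year
    in_future = dates.dt.year > cutoff_year
    return dates.mask(in_future, dates - pd.DateOffset(years=100))

# Dates as they might look after a two-digit year was given the wrong century.
parsed = pd.to_datetime(pd.Series(["2071-06-01", "2005-06-01"]))
fixed = roll_back_future_years(parsed, cutoff_year=2022)
print(fixed.dt.year.tolist())
```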

Pandas pandas.to_datetime(), incorrect parsing

Your format string is wrong:

"%Y%M%d"

%M means minutes, which is why your month defaulted to 1 and your datetimes contain minutes.

Use:

"%Y%m%d"

See the docs for the correct format specifiers.
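A quick way to see the difference is to parse the same string with both format strings. The input value here is made up for illustration.

```python
import pandas as pd

value = "20211130"

# %M consumes "11" as minutes, so the month falls back to January.
wrong = pd.to_datetime(value, format="%Y%M%d")

# %m consumes "11" as the month, which is what was intended.
right = pd.to_datetime(value, format="%Y%m%d")

print(wrong)
print(right)
```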

pd.to_datetime errors = 'ignore' strange behavior

If errors is set to 'ignore', invalid parsing returns the input unchanged. In your case the input is result["Action"] (the entire column), so the whole column comes back unparsed.

The solution is to apply pd.to_datetime row-wise with errors='ignore'. That way, any row that does not match the format is returned as-is.

>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'Action': ['Tuesday November 30 2021', 'Appointment time clicked']})
>>> df
                     Action
0  Tuesday November 30 2021
1  Appointment time clicked
>>>
>>> def custom(action):
...     date_time = pd.to_datetime(action, format='%A %B %d %Y', errors='ignore')
...     return date_time
...
>>> df.Action = df.Action.apply(custom)
>>> df
                     Action
0       2021-11-30 00:00:00
1  Appointment time clicked
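Note that recent pandas releases deprecate errors='ignore' (since 2.2, as far as I know) and recommend catching the exception yourself. A sketch of the same row-wise idea with try/except, where parse_or_keep is a hypothetical helper name:

```python
import pandas as pd

def parse_or_keep(value, fmt="%A %B %d %Y"):
    """Return a Timestamp if value matches fmt, otherwise the value unchanged."""
    try:
        return pd.to_datetime(value, format=fmt)
    except (ValueError, TypeError):
        return value

df = pd.DataFrame({"Action": ["Tuesday November 30 2021", "Appointment time clicked"]})
df["Action"] = df["Action"].apply(parse_or_keep)
print(df)
```

The column stays of type object because it mixes timestamps and strings, just as in the errors='ignore' version.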

Pandas Converting to Datetime, dateutilparser error

There are some bad values in the time column, such as 84, so use errors='coerce' to convert them to NaT.

df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')

Pandas - Datetime format change to '%m/%d/%Y'

The reason you have to use errors="ignore" is that not all the dates you are parsing are in the correct format. If you use errors="coerce", as @phi has mentioned, any dates that cannot be converted will be set to NaT. The column's data type will still be converted to datetime64, and you can then format it as you like and handle the NaT values however you want.

Example

A dataframe with one item in Date that is not a valid Year/Month/Day value (month 25 does not exist):

>>> df = pd.DataFrame({'ID': [91060, 91061, 91062, 91063], 'Date': ['2017/11/10', '2022/05/01', '2022/04/01', '2055/25/25']})
>>> df
      ID        Date
0  91060  2017/11/10
1  91061  2022/05/01
2  91062  2022/04/01
3  91063  2055/25/25

>>> df.dtypes
ID       int64
Date    object
dtype: object

Using errors="ignore":

>>> df['Date'] = pd.to_datetime(df['Date'], errors='ignore')
>>> df
      ID        Date
0  91060  2017/11/10
1  91061  2022/05/01
2  91062  2022/04/01
3  91063  2055/25/25

>>> df.dtypes
ID       int64
Date    object
dtype: object

Column Date is still of type object because not all the values could be converted. Running df['Date'] = df['Date'].dt.strftime("%m/%d/%Y") raises an AttributeError, because the .dt accessor only works on datetime-like columns.

Using errors="coerce":

>>> df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
>>> df
      ID       Date
0  91060 2017-11-10
1  91061 2022-05-01
2  91062 2022-04-01
3  91063        NaT

>>> df.dtypes
ID               int64
Date    datetime64[ns]
dtype: object

Invalid dates are set to NaT and the column is now of type datetime64, so you can format it:

>>> df['Date'] = df['Date'].dt.strftime("%m/%d/%Y")
>>> df
      ID        Date
0  91060  11/10/2017
1  91061  05/01/2022
2  91062  04/01/2022
3  91063         NaN

Note: When you format a datetime64 column, it is converted back to type object, so NaT values become NaN. The issue you are having comes down to some dirty data that is not in the correct format.
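If you also want to know which rows were dirty, you can compare the coerced result with the original column before overwriting it. A minimal sketch using the same sample data (the explicit format and the empty-string fill are my own choices, not part of the question):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [91060, 91061, 91062, 91063],
    "Date": ["2017/11/10", "2022/05/01", "2022/04/01", "2055/25/25"],
})

parsed = pd.to_datetime(df["Date"], format="%Y/%m/%d", errors="coerce")

# Rows that had a value but failed to parse are the dirty ones.
bad_rows = df[parsed.isna() & df["Date"].notna()]
print(bad_rows)

# Format the good dates and leave an empty string where parsing failed.
df["Date"] = parsed.dt.strftime("%m/%d/%Y").fillna("")
print(df)
```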
