pandas to_datetime parsing wrong year
That seems to be the behavior of Python's datetime library. I ran a test to find the cutoff, which falls between 68 and 69:
>>> import datetime
>>> datetime.datetime.strptime('31-Dec-68', '%d-%b-%y').date()
datetime.date(2068, 12, 31)
>>> datetime.datetime.strptime('1-Jan-69', '%d-%b-%y').date()
datetime.date(1969, 1, 1)
Two-digit year ambiguity
So any %y year below 69 is placed in the 2000s (00 to 68 become 2000 to 2068), and 69 upwards is placed in the 1900s (69 to 99 become 1969 to 1999).
The %y directive has only two digits, 00 to 99, so it becomes ambiguous as soon as your dates span more than one century.
If there is no overlap, you can process the data manually and annotate the century (removing the ambiguity)
I suggest you process your data manually and specify the century. For example, you can decide that any year between 17 and 68 in your data belongs to 1917 to 1968 (instead of 2017 to 2068).
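A minimal sketch of that manual approach, assuming the strings look like '31-Dec-68' and that all your dates fall in one 100-year window starting at 1917 (both are assumptions; expand_year is a hypothetical helper, not a pandas function). It splits off the two-digit year, expands it to four digits, and parses with %Y so the library's own 68/69 cutoff never applies:

```python
import pandas as pd


def expand_year(yy: int, window_start: int) -> int:
    """Return the only year in [window_start, window_start + 99] ending in yy."""
    century = window_start - window_start % 100
    year = century + yy
    if year < window_start:
        # yy falls before the window in this century, so it belongs to the next one
        year += 100
    return year


dates = pd.Series(['31-Dec-68', '1-Jan-17', '15-Mar-05'])

# Split off the two-digit year: '31-Dec-68' -> ['31-Dec', '68']
parts = dates.str.rsplit('-', n=1, expand=True)
years = parts[1].astype(int).map(lambda yy: expand_year(yy, window_start=1917))

# Rebuild each string with a four-digit year and parse it with %Y instead of %y
parsed = pd.to_datetime(parts[0] + '-' + years.astype(str), format='%d-%b-%Y')
```

With window_start=1917, '68' becomes 1968 and '05' becomes 2005; moving the window start is how you pick a different cutoff.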
If there is overlap, the year information is insufficient to parse it, unless, e.g., your data is ordered and you have a reference point
If there is overlap, e.g. you have data from both 2016 and 1916 and both were logged as '16', the value is ambiguous and there isn't enough information to parse it. The exception is data ordered by date, in which case you can use heuristics to switch the century as you parse it.
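One such heuristic, sketched under stated assumptions: the rows are in ascending chronological order, you know the century of the first row, and consecutive rows are less than 100 years apart. Whenever the two-digit year drops below the previous one, the sketch assumes the data wrapped past 99 and moves to the next century (assign_centuries is a hypothetical helper):

```python
def assign_centuries(two_digit_years, start_century):
    """Expand ordered two-digit years, bumping the century each time they wrap.

    Assumes ascending chronological order and gaps of less than 100 years.
    """
    result = []
    century = start_century
    previous = None
    for yy in two_digit_years:
        if previous is not None and yy < previous:
            # The year went "backwards", so we must have crossed into a new century
            century += 100
        result.append(century + yy)
        previous = yy
    return result


years = assign_centuries([16, 45, 99, 3, 16], start_century=1900)
```

Here the two '16' entries come out as 1916 and 2016, which is exactly the distinction the raw values cannot make on their own.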
Pandas to_datetime changes year unexpectedly
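Use:
df['date'] = pd.to_datetime(df['date'].str[:-2] + '19' + df['date'].str[-2:])
Another solution with replace (pass regex=True, because pandas 2.0 and later treat the pattern as a literal string by default):
df['date'] = pd.to_datetime(df['date'].str.replace(r'-(\d+)$', r'-19\1', regex=True))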
Sample:
print(df)
       date
0  01-06-70
1  01-06-69
2  01-06-68
3  01-06-67
df['date'] = pd.to_datetime(df['date'].str.replace(r'-(\d+)$', r'-19\1', regex=True))
print(df)
        date
0 1970-01-06
1 1969-01-06
2 1968-01-06
3 1967-01-06
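The pattern -(\d+)$ also matches a value that already has a four-digit year, turning '01-06-1985' into '01-06-191985'. A slightly more defensive variant, sketched here, only rewrites values ending in exactly two digits. It also passes an explicit format; month-first is an assumption chosen to match how pandas read the sample above (01-06-70 became 1970-01-06), so swap it to '%d-%m-%Y' if your data is day-first:

```python
import pandas as pd

df = pd.DataFrame({'date': ['01-06-70', '01-06-69', '01-06-1985']})

# Only values ending in "-" plus exactly two digits get the "19" prefix
fixed = df['date'].str.replace(r'-(\d{2})$', r'-19\1', regex=True)

# Assumption: month comes first, as in the sample output above
df['date'] = pd.to_datetime(fixed, format='%m-%d-%Y')
```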
When converting into datetime why is the result parsing wrong year and month using pandas?
You can add the origin parameter to to_datetime. The numbers are days counted from 1899-12-30, the origin used by Excel serial dates:
df1['a_final'] = pd.to_datetime(df1['a'], unit='D', origin='1899-12-30').dt.strftime("%d/%m/%Y")
print(df1)
           a     a_final
0      44140  05/11/2020
1      44266  11/03/2021
2      44266  11/03/2021
3      44265  10/03/2021
4      44265  10/03/2021
39640  44143  08/11/2020
39641  44109  05/10/2020
39642  44232  05/02/2021
39643  44125  21/10/2020
39644  44222  26/01/2021
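A small self-contained check of why origin matters, using a few serial numbers (two taken from the output above, plus 43831 for 2020-01-01). Without origin, unit='D' counts from the default 'unix' origin, 1970-01-01, so the same numbers land roughly 90 years too late:

```python
import pandas as pd

serials = pd.Series([44140, 44266, 43831])

# Default origin is the Unix epoch (1970-01-01): the year comes out wrong
wrong = pd.to_datetime(serials, unit='D')

# Excel-style serial dates count days from 1899-12-30
right = pd.to_datetime(serials, unit='D', origin='1899-12-30')
formatted = right.dt.strftime('%d/%m/%Y')
```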
pandas to_datetime converting 71 to 2071 instead of 1971
The year column is ambiguous: since no century is given, Python interprets the two-digit years according to its own cutoff rule. You can read the reasoning here.
There is a partial solution here. You basically offset the affected years by 100 (one century) to fix the issue. It is a hacky fix, and you would apply it after building your second dataframe.
import pandas as pd
import numpy as np

# Any date later than the current year must have landed in the wrong century,
# so subtract 100 years from it (using the current year avoids a hard-coded 2022)
current_year = pd.Timestamp.now().year
df['Date'] = np.where(df['Date'].dt.year > current_year,
                      df['Date'] - pd.offsets.DateOffset(years=100),
                      df['Date'])
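The same idea as a self-contained sketch, written with Series.mask instead of np.where so the result stays a pandas datetime Series. The sample dates and the reference_year argument are illustrative assumptions; pass the current year in real use. shift_future_dates_back is a hypothetical helper:

```python
import pandas as pd


def shift_future_dates_back(dates: pd.Series, reference_year: int) -> pd.Series:
    """Subtract 100 years from every date whose year is after reference_year."""
    too_late = dates.dt.year > reference_year
    return dates.mask(too_late, dates - pd.DateOffset(years=100))


# '71' parsed as 2071 is the failure described in the question
dates = pd.to_datetime(pd.Series(['2071-03-01', '2015-06-30']))
fixed = shift_future_dates_back(dates, reference_year=2022)
```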
Pandas pandas.to_datetime(), incorrect parsing
Your format string "%Y%M%d" is wrong: %M means minutes, which is why your month defaulted to 1 and your datetimes have minutes in them. Use "%Y%m%d" instead (%m is the month).
See the docs for the correct format specifiers.
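A quick way to see the difference on a single value (the sample string '20210315' is an assumption standing in for your data). With %M the middle digits are read as minutes, so the month falls back to January:

```python
import pandas as pd

# %M is minutes: "03" becomes 00:03 and the month defaults to 1
wrong = pd.to_datetime('20210315', format='%Y%M%d')

# %m is the month
right = pd.to_datetime('20210315', format='%Y%m%d')
```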
pd.to_datetime errors = 'ignore' strange behavior
If errors is set to 'ignore', invalid parsing returns the input. In your case, the input is result["Action"] (the entire column), so a single unparseable value makes the whole column come back unchanged.
The solution is to apply pd.to_datetime row-wise with errors='ignore'. That way, only the rows that do not follow the format are returned unchanged.
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'Action': ['Tuesday November 30 2021', 'Appointment time clicked']})
>>> df
Action
0 Tuesday November 30 2021
1 Appointment time clicked
>>>
>>> def custom(action):
... date_time = pd.to_datetime(action, format='%A %B %d %Y', errors='ignore')
... return date_time
...
>>> df.Action = df.Action.apply(custom)
>>> df
Action
0 2021-11-30 00:00:00
1 Appointment time clicked
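Note that errors='ignore' is deprecated in to_datetime as of pandas 2.2. A sketch that gives the same row-wise result without it catches the parsing error per value and keeps the original text (parse_or_keep is a hypothetical helper; the format is the one used above):

```python
import pandas as pd


def parse_or_keep(value, fmt='%A %B %d %Y'):
    """Return a Timestamp if value matches fmt, otherwise the value unchanged."""
    try:
        return pd.to_datetime(value, format=fmt)
    except (ValueError, TypeError):
        # Not a date in this format: keep the original value
        return value


df = pd.DataFrame({'Action': ['Tuesday November 30 2021', 'Appointment time clicked']})
df['Action'] = df['Action'].apply(parse_or_keep)
```

The column ends up as object dtype holding a mix of Timestamps and strings, which is the same shape of result as the errors='ignore' version.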
Pandas Converting to Datetime, dateutilparser error
There are some bad values in the time column, like 84, so use errors='coerce' to convert them to NaT:
df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')
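A self-contained sketch with made-up sample values. Passing an explicit format (an assumption about your data's layout) makes a bare value like 84 fail the match deterministically instead of leaving it to the parser to guess what it means:

```python
import pandas as pd

df = pd.DataFrame({'datetime': ['2021-03-01 08:30', '84', '2021-03-02 17:45']})

# Values that do not match the format become NaT instead of raising
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M', errors='coerce')
```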
Pandas - Datetime format change to '%m/%d/%Y'
You had to use errors="ignore" because not all the dates you are parsing are in the correct format. If you use errors="coerce", as @phi mentioned, any date that cannot be converted is set to NaT. The column's datatype is still converted to datetime64, so you can format it as you like and deal with the NaT values as you want.
Example
A dataframe with one item in Date not written as Year/Month/Day (month 25 is invalid):
>>> df = pd.DataFrame({'ID': [91060, 91061, 91062, 91063], 'Date': ['2017/11/10', '2022/05/01', '2022/04/01', '2055/25/25']})
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Using errors="ignore":
>>> df['Date'] = pd.to_datetime(df['Date'], errors='ignore')
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Column Date is still an object because not all the values were converted. Running df['Date'] = df['Date'].dt.strftime("%m/%d/%Y") will therefore raise an AttributeError, since the .dt accessor only works on datetime-like values.
Using errors="coerce":
>>> df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
>>> df
ID Date
0 91060 2017-11-10
1 91061 2022-05-01
2 91062 2022-04-01
3 91063 NaT
>>> df.dtypes
ID int64
Date datetime64[ns]
dtype: object
Invalid dates are set to NaT and the column is now of type datetime64, so you can format it:
>>> df['Date'] = df['Date'].dt.strftime("%m/%d/%Y")
>>> df
ID Date
0 91060 11/10/2017
1 91061 05/01/2022
2 91062 04/01/2022
3 91063 NaN
Note: formatting a datetime64 column with strftime converts it back to type object, so NaT values become NaN. The issue you are having comes from some dirty data that is not in the correct format.
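To deal with the dirty data rather than just hide it, one option is to keep the coerced result in a separate variable and compare it with the raw column: rows that had a value but came out as NaT are the ones that failed. This sketch reuses the example dataframe and assumes the intended layout is Year/Month/Day, passed as an explicit format:

```python
import pandas as pd

df = pd.DataFrame({'ID': [91060, 91061, 91062, 91063],
                   'Date': ['2017/11/10', '2022/05/01', '2022/04/01', '2055/25/25']})

parsed = pd.to_datetime(df['Date'], format='%Y/%m/%d', errors='coerce')

# A value was present but could not be parsed: these are the rows to inspect
bad_rows = df[parsed.isna() & df['Date'].notna()]

df['Date'] = parsed.dt.strftime('%m/%d/%Y')
```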