Parsing a Date That Can Be in Several Formats in Python

How to format date string via multiple formats in python

Try each format and see if it works:

from datetime import datetime

def try_parsing_date(text):
    for fmt in ('%Y-%m-%d', '%d.%m.%Y', '%d/%m/%Y'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    raise ValueError('no valid date format found')
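
A short usage sketch for the helper above. The sample strings are made up for illustration, and the function is repeated so the block runs on its own:

```python
from datetime import datetime

# Same helper as above, repeated so this sketch is self-contained.
def try_parsing_date(text):
    for fmt in ('%Y-%m-%d', '%d.%m.%Y', '%d/%m/%Y'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    raise ValueError('no valid date format found')

# Hypothetical sample inputs, one per supported format.
samples = ['2012-01-31', '31.01.2012', '31/01/2012']
parsed = [try_parsing_date(s) for s in samples]

# An input that matches none of the formats raises ValueError.
try:
    try_parsing_date('Jan 31, 2012')
    failed = False
except ValueError:
    failed = True
```

The formats are tried in order and the first match wins, so if two formats can both match the same string (for example '%d/%m/%Y' and '%m/%d/%Y'), put the one you trust more first.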

Parsing a date that can be in several formats in python

You can use try/except to catch the ValueError raised when a format does not match. As @Bakuriu mentions, you can stop iterating as soon as one format matches, which avoids unnecessary parsing. The else clause of the for loop then defines what happens when no format matches; here my_date is set to None:

from datetime import datetime

DATE_FORMATS = ['%m/%d/%Y %I:%M:%S %p', '%Y/%m/%d %H:%M:%S', '%d/%m/%Y %H:%M', '%m/%d/%Y', '%Y/%m/%d']
test_date = '2012/1/1 12:32:11'

for date_format in DATE_FORMATS:
    try:
        my_date = datetime.strptime(test_date, date_format)
    except ValueError:
        pass
    else:
        break
else:
    my_date = None

print(my_date)  # 2012-01-01 12:32:11
print(type(my_date))  # <class 'datetime.datetime'>

Dealing with different date formats in python

Try it one way, and if it doesn't work, try it the other way.

try:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except ValueError:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")

pd.to_datetime raises ValueError when a value does not match the given format; if your code raises a different exception, catch that one instead. This approach works when the whole column uses one format or the other, not a mix of both.


Parse Date from string present in multiple formats into datetime format

dateutil's parser can help:

from dateutil import parser

for d in ["20200618", "18-june-2020"]:
    print(parser.parse(d))
    
2020-06-18 00:00:00
2020-06-18 00:00:00

How can I parse multiple (unknown) date formats in python?

import re

ss = '''10/02/09
07/22/09
09-08-2008
9/9/2008
11/4/2010
03-07-2009
09/01/2010'''


regx = re.compile('[-/]')
for xd in ss.splitlines():
    m,d,y = regx.split(xd)
    print(xd, '   ', '/'.join((m.zfill(2), d.zfill(2), '20' + y.zfill(2) if len(y) == 2 else y)))

result

10/02/09     10/02/2009
07/22/09     07/22/2009
09-08-2008     09/08/2008
9/9/2008     09/09/2008
11/4/2010     11/04/2010
03-07-2009     03/07/2009
09/01/2010     09/01/2010
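
The same split-and-pad idea can be wrapped in a small function. This is only a sketch: the name normalize_mdy and the validation step are my own additions, and it keeps the answer's assumptions that input is in month-day-year order and that two-digit years belong to the 2000s.

```python
import re

# Same separator pattern as in the snippet above.
SEPARATOR = re.compile('[-/]')

def normalize_mdy(text):
    """Return text as MM/DD/YYYY, assuming month-day-year input order."""
    parts = SEPARATOR.split(text.strip())
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError('not a month/day/year date: %r' % text)
    m, d, y = parts
    # Assumption carried over from the answer: two-digit years mean 20xx.
    if len(y) == 2:
        y = '20' + y
    return '/'.join((m.zfill(2), d.zfill(2), y))

normalized = [normalize_mdy(s) for s in ('10/02/09', '9/9/2008', ' 03-07-2009')]
```

Note that this only normalizes the text; it does not check that the result is a real calendar date. Passing the output to datetime.strptime with '%m/%d/%Y' would add that check.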

Edit: taking into account JBernardo's tip about '{0:0>2}'.format(day), I added a fourth solution, which appears to be the fastest:

import re
from time import perf_counter  # time.clock() was removed in Python 3.8
from datetime import datetime

iterat = 100

dates = ['10/02/09', '07/22/09', '09-08-2008', '9/9/2008', '11/4/2010',
         ' 03-07-2009', '09/01/2010']

reobj = re.compile(
r"""\s*  # optional whitespace
(\d+)    # Month
[-/]     # separator
(\d+)    # Day
[-/]     # separator
(?:20)?  # century (optional)
(\d+)    # years (YY)
\s*      # optional whitespace""",
re.VERBOSE)

# 1) Rewrite with the regex, then round-trip through strptime/strftime
te = perf_counter()
for i in range(iterat):
    ndates = (reobj.sub(r"\1/\2/20\3", date) for date in dates)
    fdates1 = [datetime.strftime(datetime.strptime(date, "%m/%d/%Y"), "%m/%d/%Y")
               for date in ndates]
print("Tim's method   ", perf_counter() - te, 'seconds')

regx = re.compile('[-/]')

# 2) Capture groups with the regex, pad each with zfill
te = perf_counter()
for i in range(iterat):
    ndates = (reobj.match(date).groups() for date in dates)
    fdates2 = ['%s/%s/20%s' % tuple(x.zfill(2) for x in tu) for tu in ndates]
print("mixing solution", perf_counter() - te, 'seconds')

# 3) Split on the separator, pad each part with zfill
te = perf_counter()
for i in range(iterat):
    ndates = (regx.split(date.strip()) for date in dates)
    fdates3 = ['/'.join((m.zfill(2), d.zfill(2), ('20' + y.zfill(2) if len(y) == 2 else y)))
               for m, d, y in ndates]
print("eyquem's method", perf_counter() - te, 'seconds')

# 4) Capture groups with the regex, pad with str.format
te = perf_counter()
for i in range(iterat):
    fdates4 = ['{:0>2}/{:0>2}/20{}'.format(*reobj.match(date).groups()) for date in dates]
print("Tim + format   ", perf_counter() - te, 'seconds')

print(fdates1 == fdates2 == fdates3 == fdates4)

result

number of iteration's turns : 100
Tim's method    0.295053700959 seconds
mixing solution 0.0459111423379 seconds
eyquem's method 0.0192239516475 seconds
Tim + format    0.0153756971906 seconds 
True

The mixing solution is interesting because it combines the speed of my solution with the ability of Tim Pietzcker's regex to detect dates in a string.

That is even more true for the solution combining Tim's regex with {:0>2} formatting. I can't combine {:0>2} with my own approach because regx.split(date.strip()) produces years with either 2 or 4 digits.

Parse date string and change format

The datetime module can help with that:

datetime.datetime.strptime(date_string, format1).strftime(format2)

For the specific example you could do:

>>> import datetime
>>> datetime.datetime.strptime('Mon Feb 15 2010', '%a %b %d %Y').strftime('%d/%m/%Y')
'15/02/2010'
>>>
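
Combining this with the try-each-format approach gives a helper that accepts several input formats and emits a single output format. This is a sketch only; the name reformat_date and the sample formats are assumptions for illustration:

```python
from datetime import datetime

def reformat_date(text, in_formats, out_format):
    """Parse text with the first matching input format, then re-format it."""
    for fmt in in_formats:
        try:
            return datetime.strptime(text, fmt).strftime(out_format)
        except ValueError:
            continue
    raise ValueError('no valid date format found for %r' % text)

# Hypothetical list of accepted input formats.
IN_FORMATS = ['%a %b %d %Y', '%Y-%m-%d']

a = reformat_date('Mon Feb 15 2010', IN_FORMATS, '%d/%m/%Y')
b = reformat_date('2010-02-15', IN_FORMATS, '%d/%m/%Y')
```

Keep in mind that %a and %b match day and month names in the current locale, so 'Mon Feb 15 2010' parses as shown only when English names are in effect (the default unless the program changes the locale).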

Can I parse dates in different formats?

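You can use to_datetime. Note that pandas 2.0 and later infer a single format from the first value, so a mixed column like this one may need format='mixed' there: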

First format (YYYY-MM-DD):

print (df)
        dates
0  13/11/2016
1  21/01/2017
2  22/01/2017
3  2017-02-02
4  2016-12-11
5  13/11/2016
6  2016-12-12
7  21/01/2017
8  22/01/2017
9  2017-02-02
9  2017-02-25 <- YYYY-MM-DD

dates = pd.to_datetime(df.dates)
print (dates)
0   2016-11-13
1   2017-01-21
2   2017-01-22
3   2017-02-02
4   2016-12-11
5   2016-11-13
6   2016-12-12
7   2017-01-21
8   2017-01-22
9   2017-02-02
9   2017-02-25
Name: dates, dtype: datetime64[ns]

Second format (YYYY-DD-MM)

This case is a bit more problematic. Call to_datetime once per format, passing format and errors='coerce' so that non-matching values become NaT, then merge the two results with combine_first or fillna:

print (df)
        dates
0  13/11/2016
1  21/01/2017
2  22/01/2017
3  2017-02-02
4  2016-12-11
5  13/11/2016
6  2016-12-12
7  21/01/2017
8  22/01/2017
9  2017-02-02
9  2017-25-02 <- YYYY-DD-MM

dates1 = pd.to_datetime(df.dates, format='%d/%m/%Y', errors='coerce')
dates2 = pd.to_datetime(df.dates, format='%Y-%d-%m', errors='coerce')

dates = dates1.combine_first(dates2)
#dates = dates1.fillna(dates2)
print (dates)
0   2016-11-13
1   2017-01-21
2   2017-01-22
3   2017-02-02
4   2016-11-12
5   2016-11-13
6   2016-12-12
7   2017-01-21
8   2017-01-22
9   2017-02-02
9   2017-02-25
Name: dates, dtype: datetime64[ns]

parse multiple date format pandas

We can use pd.to_datetime with errors='coerce' to parse the dates in steps.

Assuming your column is called date:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

df['date_fixed'] = s

print(df)

         date date_fixed
0  2001-12-25 2001-12-25
1   2002-9-27 2002-09-27
2   2001-2-24 2001-02-24
3    2001-5-3 2001-05-03
4      200510 2005-10-01
5       20078 2007-08-01

In steps:

First, we parse the regular dates into a new series called s:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

print(s)

0   2001-12-25
1   2002-09-27
2   2001-02-24
3   2001-05-03
4          NaT
5          NaT
Name: date, dtype: datetime64[ns]

As you can see, the series has two NaT (null datetime) values; these correspond to the entries that are missing a day.

We then call to_datetime again with the other format and use the result to fill the missing values of s:

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

print(s)


0   2001-12-25
1   2002-09-27
2   2001-02-24
3   2001-05-03
4   2005-10-01
5   2007-08-01

Then we assign the result back to your dataframe.
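
The stepwise approach generalizes to any number of formats. The following is a minimal sketch, assuming pandas is installed; the helper name parse_with_formats and the sample data are hypothetical:

```python
import pandas as pd

def parse_with_formats(series, formats):
    """Try each format in order; later formats only fill values still missing."""
    result = pd.to_datetime(series, format=formats[0], errors='coerce')
    for fmt in formats[1:]:
        # errors='coerce' turns non-matching values into NaT,
        # and fillna only touches positions that are still NaT.
        result = result.fillna(pd.to_datetime(series, format=fmt, errors='coerce'))
    return result

# Hypothetical sample data: two formats plus one unparseable value.
df = pd.DataFrame({'date': ['2001-12-25', '13/11/2016', 'not a date']})
df['date_fixed'] = parse_with_formats(df['date'], ['%Y-%m-%d', '%d/%m/%Y'])
```

Values that match none of the formats stay NaT, so you can find them afterwards with df['date_fixed'].isna().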