Parsing a Date That Can Be in Several Formats in Python

How to format date string via multiple formats in python

Try each format and see if it works:

from datetime import datetime

def try_parsing_date(text):
    for fmt in ('%Y-%m-%d', '%d.%m.%Y', '%d/%m/%Y'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    raise ValueError('no valid date format found')
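A quick usage sketch of the function above, repeated here so the snippet runs on its own. The sample strings and the FORMATS constant name are made up for illustration. Formats are tried in order, so the first one that matches wins.

```python
from datetime import datetime

# Same three formats as above; the constant name is an illustrative choice.
FORMATS = ('%Y-%m-%d', '%d.%m.%Y', '%d/%m/%Y')

def try_parsing_date(text):
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    raise ValueError('no valid date format found')

# Each sample is written in a different one of the supported formats.
for sample in ('2020-06-18', '18.06.2020', '18/06/2020'):
    print(sample, '->', try_parsing_date(sample).date())

# A string that matches none of the formats raises ValueError.
try:
    try_parsing_date('June 18, 2020')
except ValueError as exc:
    print('rejected:', exc)
```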

Parsing a date that can be in several formats in python

You can use try/except to catch the ValueError raised when a format does not match. As @Bakuriu mentions, you can stop iterating as soon as a format matches to avoid unnecessary parsing, and use the loop's else clause to define what happens when no format matches and my_date would otherwise never be defined:

from datetime import datetime

DATE_FORMATS = ['%m/%d/%Y %I:%M:%S %p', '%Y/%m/%d %H:%M:%S', '%d/%m/%Y %H:%M', '%m/%d/%Y', '%Y/%m/%d']
test_date = '2012/1/1 12:32:11'

for date_format in DATE_FORMATS:
    try:
        my_date = datetime.strptime(test_date, date_format)
    except ValueError:
        pass
    else:
        break  # stop at the first format that matches
else:
    my_date = None  # the loop finished without any format matching

print(my_date)        # 2012-01-01 12:32:11
print(type(my_date))  # <class 'datetime.datetime'>
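If it is useful to know which format matched, the same loop can be wrapped in a function. This is only a sketch: parse_first_match is a hypothetical helper name, not something from the answer above, and it returns (None, None) instead of raising when nothing matches.

```python
from datetime import datetime

DATE_FORMATS = ['%m/%d/%Y %I:%M:%S %p', '%Y/%m/%d %H:%M:%S',
                '%d/%m/%Y %H:%M', '%m/%d/%Y', '%Y/%m/%d']

def parse_first_match(text, formats=DATE_FORMATS):
    """Return (datetime, format) for the first matching format, else (None, None)."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt), fmt
        except ValueError:
            continue  # this format did not match, try the next one
    return None, None

parsed, used = parse_first_match('2012/1/1 12:32:11')
print(parsed, used)
```

Because the formats are tried in order, an ambiguous string such as '01/02/2012' is read with whichever matching format comes first in the list, here '%m/%d/%Y'.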

Dealing with different date formats in python

Try it one way, and if it doesn't work, try it the other way.

try:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except ValueError:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")

pd.to_datetime raises a ValueError when a value does not match the given format; if your code raises a different exception, catch that one instead. Note that this only helps when the whole column uses one format or the other, not a mix of both.


Parse Date from string present in multiple formats into datetime format

dateutil's parser can help:

from dateutil import parser

for d in ["20200618", "18-june-2020"]:
    print(parser.parse(d))

2020-06-18 00:00:00
2020-06-18 00:00:00

How can I parse multiple (unknown) date formats in python?

import re

ss = '''10/02/09
07/22/09
09-08-2008
9/9/2008
11/4/2010
03-07-2009
09/01/2010'''


regx = re.compile('[-/]')
for xd in ss.splitlines():
    m, d, y = regx.split(xd)
    print(xd, '   ', '/'.join((m.zfill(2), d.zfill(2), '20' + y.zfill(2) if len(y) == 2 else y)))

Result:

10/02/09     10/02/2009
07/22/09     07/22/2009
09-08-2008     09/08/2008
9/9/2008     09/09/2008
11/4/2010     11/04/2010
03-07-2009     03/07/2009
09/01/2010     09/01/2010
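The regex split only rearranges digits; it does not check that the result is a real calendar date. One possible follow-up, sketched here with a hypothetical helper name (normalize_mdy) and assuming month/day/year order as in the sample data, is to pass the normalized string through strptime so that impossible dates are rejected.

```python
import re
from datetime import datetime

SEPARATOR = re.compile('[-/]')

def normalize_mdy(text):
    """Normalize M/D/YY or M-D-YYYY style input to MM/DD/YYYY."""
    # Unpacking raises ValueError if there are not exactly three parts.
    m, d, y = SEPARATOR.split(text.strip())
    if len(y) == 2:
        y = '20' + y  # assumption: two-digit years belong to the 2000s
    normalized = '/'.join((m.zfill(2), d.zfill(2), y))
    # strptime raises ValueError for impossible dates such as month 13.
    datetime.strptime(normalized, '%m/%d/%Y')
    return normalized

for raw in ('10/02/09', '9/9/2008', ' 03-07-2009'):
    print(repr(raw), '->', normalize_mdy(raw))
```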

Edit: taking into account the information on '{0:0>2}'.format(day) from JBernardo, I added a fourth solution, which appears to be the fastest.

import re
from datetime import datetime
from time import perf_counter  # time.clock was removed in Python 3.8

iterat = 100
dates = ['10/02/09', '07/22/09', '09-08-2008', '9/9/2008', '11/4/2010',
         ' 03-07-2009', '09/01/2010']

reobj = re.compile(
    r"""\s*  # optional whitespace
    (\d+)    # month
    [-/]     # separator
    (\d+)    # day
    [-/]     # separator
    (?:20)?  # century (optional)
    (\d+)    # year (YY)
    \s*      # optional whitespace""",
    re.VERBOSE)

regx = re.compile('[-/]')

print("number of iteration's turns :", iterat)

# 1. Regex substitution, then a strptime/strftime round trip
te = perf_counter()
for i in range(iterat):
    ndates = (reobj.sub(r"\1/\2/20\3", date) for date in dates)
    fdates1 = [datetime.strftime(datetime.strptime(date, "%m/%d/%Y"), "%m/%d/%Y")
               for date in ndates]
print("Tim's method   ", perf_counter() - te, 'seconds')

# 2. Regex match, then zfill on each captured group
te = perf_counter()
for i in range(iterat):
    ndates = (reobj.match(date).groups() for date in dates)
    fdates2 = ['%s/%s/20%s' % tuple(x.zfill(2) for x in tu) for tu in ndates]
print("mixing solution", perf_counter() - te, 'seconds')

# 3. Split on the separator, then zfill
te = perf_counter()
for i in range(iterat):
    ndates = (regx.split(date.strip()) for date in dates)
    fdates3 = ['/'.join((m.zfill(2), d.zfill(2), ('20' + y.zfill(2) if len(y) == 2 else y)))
               for m, d, y in ndates]
print("eyquem's method", perf_counter() - te, 'seconds')

# 4. Regex match, then str.format padding
te = perf_counter()
for i in range(iterat):
    fdates4 = ['{:0>2}/{:0>2}/20{}'.format(*reobj.match(date).groups()) for date in dates]
print("Tim + format   ", perf_counter() - te, 'seconds')

print(fdates1 == fdates2 == fdates3 == fdates4)

result

number of iteration's turns : 100
Tim's method 0.295053700959 seconds
mixing solution 0.0459111423379 seconds
eyquem's method 0.0192239516475 seconds
Tim + format 0.0153756971906 seconds
True

The mixing solution is interesting because it combines the speed of my solution with the ability of Tim Pietzcker's regex to detect dates in a string.

That is even more true of the solution that combines Tim's regex with {:0>2} formatting. I can't combine {:0>2} with my own method because regx.split(date.strip()) produces a year with either 2 or 4 digits.
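The standard-library timeit module is another way to run this kind of comparison. The sketch below times only two of the four approaches, wrapped in functions whose names (split_method, format_method) are made up here. It prints whatever timings your machine produces, so no expected numbers are shown.

```python
import re
import timeit

dates = ['10/02/09', '07/22/09', '09-08-2008', '9/9/2008', '11/4/2010',
         ' 03-07-2009', '09/01/2010']

regx = re.compile('[-/]')
reobj = re.compile(r'\s*(\d+)[-/](\d+)[-/](?:20)?(\d+)\s*')

def split_method():
    # Split on the separator, pad month and day, expand two-digit years.
    return ['/'.join((m.zfill(2), d.zfill(2), '20' + y if len(y) == 2 else y))
            for m, d, y in (regx.split(date.strip()) for date in dates)]

def format_method():
    # Let the regex drop the century, then pad with str.format.
    return ['{:0>2}/{:0>2}/20{}'.format(*reobj.match(date).groups())
            for date in dates]

# Both approaches should agree before their speed is compared.
assert split_method() == format_method()

for func in (split_method, format_method):
    elapsed = timeit.timeit(func, number=1000)
    print(f'{func.__name__}: {elapsed:.4f} seconds for 1000 runs')
```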

Parse date string and change format

The datetime module can help with that:

datetime.datetime.strptime(date_string, format1).strftime(format2)
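The one-liner above assumes a single known input format. When the input may arrive in several formats, it can be combined with the try-each-format loop. The following is a sketch only: reformat_date and its parameter names are hypothetical, and the %a and %b directives assume an English locale.

```python
from datetime import datetime

def reformat_date(text, input_formats, output_format):
    """Parse text with the first matching input format and re-emit it."""
    for fmt in input_formats:
        try:
            return datetime.strptime(text, fmt).strftime(output_format)
        except ValueError:
            continue  # this input format did not match, try the next one
    raise ValueError(f'no matching format for {text!r}')

INPUT_FORMATS = ['%Y-%m-%d', '%a %b %d %Y']

print(reformat_date('Mon Feb 15 2010', INPUT_FORMATS, '%d/%m/%Y'))
print(reformat_date('2010-02-15', INPUT_FORMATS, '%d/%m/%Y'))
```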

For the specific example you could do

>>> import datetime
>>> datetime.datetime.strptime('Mon Feb 15 2010', '%a %b %d %Y').strftime('%d/%m/%Y')
'15/02/2010'
>>>

Can I parse dates in different formats?

You can use to_datetime:

First format (YYYY-MM-DD):

print (df)
        dates
0  13/11/2016
1  21/01/2017
2  22/01/2017
3  2017-02-02
4  2016-12-11
5  13/11/2016
6  2016-12-12
7  21/01/2017
8  22/01/2017
9  2017-02-02
9  2017-02-25  <- YYYY-MM-DD

dates = pd.to_datetime(df.dates)
print (dates)
0 2016-11-13
1 2017-01-21
2 2017-01-22
3 2017-02-02
4 2016-12-11
5 2016-11-13
6 2016-12-12
7 2017-01-21
8 2017-01-22
9 2017-02-02
9 2017-02-25
Name: dates, dtype: datetime64[ns]

Second format (YYYY-DD-MM)

This one is a bit problematic: pass format and errors='coerce' to to_datetime once per format, then merge the results with combine_first or fillna:

print (df)
        dates
0  13/11/2016
1  21/01/2017
2  22/01/2017
3  2017-02-02
4  2016-12-11
5  13/11/2016
6  2016-12-12
7  21/01/2017
8  22/01/2017
9  2017-02-02
9  2017-25-02  <- YYYY-DD-MM

dates1 = pd.to_datetime(df.dates, format='%d/%m/%Y', errors='coerce')
dates2 = pd.to_datetime(df.dates, format='%Y-%d-%m', errors='coerce')

dates = dates1.combine_first(dates2)
#dates = dates1.fillna(dates2)
print (dates)
0 2016-11-13
1 2017-01-21
2 2017-01-22
3 2017-02-02
4 2016-11-12
5 2016-11-13
6 2016-12-12
7 2017-01-21
8 2017-01-22
9 2017-02-02
9 2017-02-25
Name: dates, dtype: datetime64[ns]

parse multiple date format pandas

We can use pd.to_datetime with errors='coerce' to parse the dates in steps.

Assuming your column is called date:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

df['date_fixed'] = s

print(df)

         date date_fixed
0  2001-12-25 2001-12-25
1   2002-9-27 2002-09-27
2   2001-2-24 2001-02-24
3    2001-5-3 2001-05-03
4      200510 2005-10-01
5       20078 2007-08-01

In steps:

First we parse the regular dates into a new series called s:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

print(s)

0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 NaT
5 NaT
Name: date, dtype: datetime64[ns]

As you can see, the series has two NaT values, which are null datetimes. They correspond to the dates that are missing a day.

We then apply to_datetime again with the other format and use the result to fill the missing values of s:

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

print(s)


0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 2005-10-01
5 2007-08-01

Then we assign the result back to your dataframe.
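The same coerce-then-fill pattern extends to any number of formats. The sketch below assumes pandas is installed; parse_mixed_dates is a hypothetical helper name and the sample values are made up. Earlier formats take priority, and values that match no format stay NaT.

```python
import pandas as pd

def parse_mixed_dates(series, formats):
    """Parse a string Series by trying each format in turn."""
    result = None
    for fmt in formats:
        # Values that do not match fmt become NaT instead of raising.
        parsed = pd.to_datetime(series, format=fmt, errors='coerce')
        # Keep dates that are already parsed and fill only the gaps.
        result = parsed if result is None else result.fillna(parsed)
    return result

raw = pd.Series(['13/11/2016', '2017-02-02', 'not a date'])
parsed = parse_mixed_dates(raw, ['%d/%m/%Y', '%Y-%m-%d'])
print(parsed)
```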


