Python/Regex - How to Extract Date from Filename Using Regular Expression

Extract date from file name with import re in python

It looks to me like the regex you're using is also at fault: when it finds no match, re.search returns None, so calling group(0) on that result raises an AttributeError.

Assuming all your dates are stored as digits, the following regex I've made seems to work quite well.

(?!.+_)\d+(?=\.xlsx)
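
To see what the pattern picks up, here is a small sketch run against a few made-up file names (the names are assumptions for illustration, not taken from the question). The negative lookahead rejects any position that still has an underscore somewhere after it, so only digits after the last underscore can match.

```python
import re

# Same pattern as above, written as a raw string so the backslashes
# reach the regex engine unchanged.
PATTERN = re.compile(r"(?!.+_)\d+(?=\.xlsx)")

# Hypothetical file names, for illustration only.
samples = ["report_12112019.xlsx", "site_A_01022020.xlsx", "notes.xlsx"]

for name in samples:
    match = PATTERN.search(name)
    # search() returns None when nothing matches, so test before calling group().
    print(name, "->", match.group(0) if match else None)
```

The first two names yield their trailing digit runs; the last one has no digits, so search returns None.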

The next issue is the date format. To me, 12112019 looks like 12/11/2019 (day first), although it could just as well be 11/12/2019 (month first). Either way, the fix is to change the format string passed to strptime.

So for the day / month / year format we would use

# %d%m%Y
event_date_obj = datetime.strptime(event_date, '%d%m%Y')

And we would simply swap %d and %m for the month / day / year format. So your complete code would look something like this:

import os
import re
from datetime import datetime

date = os.path.basename(xls)  # xls is the file path from the question
pattern = r"(?!.+_)\d+(?=\.xlsx)"
event_date = re.search(pattern, date).group(0)
event_date_obj = datetime.strptime(event_date, '%d%m%Y')

For further information on the format codes shared by strptime and strftime, see https://strftime.org/.
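
If some file names may not contain a date at all, a more defensive variant could look like the sketch below. The function name parse_file_date, its fmt parameter, and the example paths are my own assumptions, not part of the question.

```python
import os
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"(?!.+_)\d+(?=\.xlsx)")


def parse_file_date(path, fmt="%d%m%Y"):
    """Return the date embedded in the file name, or None if there is none.

    Hypothetical helper: the name and the fmt default are assumptions.
    """
    name = os.path.basename(path)
    match = DATE_PATTERN.search(name)
    if match is None:
        # No trailing digit run before ".xlsx".
        return None
    try:
        return datetime.strptime(match.group(0), fmt)
    except ValueError:
        # Digits were found but they do not form a valid date in this format.
        return None


print(parse_file_date("/data/events_12112019.xlsx"))            # read as day, month, year
print(parse_file_date("/data/events_12112019.xlsx", "%m%d%Y"))  # read as month, day, year
print(parse_file_date("/data/readme.xlsx"))                     # no date in the name
```

The same digits give 12 November 2019 with the default format and 11 December 2019 with the swapped one, so you need to know which convention your files follow.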

How to extract string from a filename containing date using regex in python?

You can use re.search in your if statement:

test = re.search(r'(.*)_\d{4}[_]\d{2}[_]\d{2}', flow_id)
if test:
    your_match = test[1]

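test[0] is the whole match (the name up to and including the date, without the extension)

test[1] is the first parentheses (the part before the date)

The date itself is not captured, because the pattern has only one group, so test[2] would raise an IndexError. Wrap the date part in its own parentheses if you need it as a string.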

--

Edit

def get_clean_flow_id(filename):
    test = re.search(r'(.*)_\d{4}[_]\d{2}[_]\d{2}', filename)
    if test:
        return test[1]
    else:
        return None

Returns:

>>> get_clean_flow_id("Livongo_Weekly_Enrollment_2019_03_19.csv")
'Livongo_Weekly_Enrollment'
>>> get_clean_flow_id("Omada_weekly_fle_20190319120301.csv")
>>> get_clean_flow_id("tivity_weekly_fle_20190319120301.json")
>>>
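
If you also want the date portion, one option is to give it its own capture group. The sketch below is a hypothetical variant of the function above (the name split_flow_id is mine), not something from the original question.

```python
import re

# Variant of the pattern above with a second capture group for the date.
FLOW_PATTERN = re.compile(r"(.*)_(\d{4}_\d{2}_\d{2})")


def split_flow_id(filename):
    """Return (prefix, date_string), or None when the name has no YYYY_MM_DD part."""
    match = FLOW_PATTERN.search(filename)
    if match is None:
        return None
    # match[1] is the prefix, match[2] is the date as a string.
    return match[1], match[2]


print(split_flow_id("Livongo_Weekly_Enrollment_2019_03_19.csv"))
print(split_flow_id("Omada_weekly_fle_20190319120301.csv"))
```

The first call returns ('Livongo_Weekly_Enrollment', '2019_03_19'); the second returns None because its timestamp is not separated by underscores.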

Regex to extract date and time from string

You can use email.utils to parse the date and then convert it to whatever format you wish:

from email import utils
date = utils.parsedate_to_datetime('Thu Jun 07 01:13:25 +0000 2018')

date.strftime('%d/%b/%Y')  # '07/Jun/2018'
date.strftime('%H:%M:%S')  # '01:13:25'
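
As an alternative sketch, the same string can be parsed with datetime.strptime and an explicit format. This assumes the input always has exactly this layout and that the process runs with an English (or default C) locale, since %a and %b are locale dependent.

```python
from datetime import datetime

raw = 'Thu Jun 07 01:13:25 +0000 2018'

# Weekday, month name, day, time, UTC offset, year, in that order.
dt = datetime.strptime(raw, '%a %b %d %H:%M:%S %z %Y')

print(dt.strftime('%d/%b/%Y'))  # date part
print(dt.strftime('%H:%M:%S'))  # time part
```

This spells out the expected layout, whereas email.utils is more forgiving about how the fields are arranged.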

How to extract the date from the file name using python

You can get datetime data like this:

In [1]: import csv

In [2]: import datetime

In [3]: with open('del.csv') as f:
   ...:     csv_reader = csv.reader(f, delimiter=" ")
   ...:     next(csv_reader)
   ...:     for row in csv_reader:
   ...:         dt = datetime.datetime.strptime('-'.join(row[:2]), '%Y_%m_%d-%H:%M:%S')
   ...:         print(dt)

2012-01-01 00:02:14
2012-01-01 00:08:29
2012-01-01 00:14:45

del.csv: Name of your csv file

dt: a Python datetime object
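
To try this without a file on disk, here is a self-contained sketch that feeds the same parsing logic from an in-memory string. The column layout (a header row, then space-separated date, time, and value columns) is an assumption about what del.csv looks like.

```python
import csv
import io
from datetime import datetime

# Hypothetical stand-in for del.csv: a header row followed by
# space-separated date, time and value columns.
sample = io.StringIO(
    "date time value\n"
    "2012_01_01 00:02:14 1.5\n"
    "2012_01_01 00:08:29 2.0\n"
)

reader = csv.reader(sample, delimiter=" ")
next(reader)  # skip the first row, assumed to be a header

# Join the first two columns and parse them as a single timestamp.
timestamps = [
    datetime.strptime("-".join(row[:2]), "%Y_%m_%d-%H:%M:%S")
    for row in reader
]

for ts in timestamps:
    print(ts)
```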


