Extracting Date from a String in Python

If the date is given in a fixed form, you can simply use a regular expression to extract the date and "datetime.datetime.strptime" to parse the date:

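import re
from datetime import datetime

text = 'Report generated on 2021-03-04 by the nightly job'  # example input

# Find the first YYYY-MM-DD substring, then parse it into a date object.
match = re.search(r'\d{4}-\d{2}-\d{2}', text)
date = datetime.strptime(match.group(), '%Y-%m-%d').date()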

Otherwise, if the date can appear in an arbitrary form, no single regular expression will extract it reliably.
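When the set of possible formats is at least known in advance, one option is to try each of them in turn. The following is a minimal sketch under that assumption; the KNOWN_FORMATS list and the extract_date name are hypothetical, and the list would need to be adapted to your data. It also returns None instead of raising when nothing matches.

```python
import re
from datetime import datetime

# Hypothetical (regex, strptime format) pairs; extend or reorder for your data.
KNOWN_FORMATS = [
    (r"\d{4}-\d{2}-\d{2}", "%Y-%m-%d"),
    (r"\d{2}\.\d{2}\.\d{4}", "%d.%m.%Y"),
    (r"\d{2}/\d{2}/\d{4}", "%m/%d/%Y"),  # assumes month-first order
]

def extract_date(text):
    """Return the first date found in any known format, or None."""
    for pattern, fmt in KNOWN_FORMATS:
        match = re.search(pattern, text)
        if match is None:
            continue
        try:
            return datetime.strptime(match.group(), fmt).date()
        except ValueError:
            # Looked like a date but is not a valid one (e.g. month 13).
            continue
    return None

print(extract_date("Campaign on 01.11.2015"))  # 2015-11-01
print(extract_date("no date here"))            # None
```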

How to extract date from a string?

Based on your string, if you just want 1899-12-30, you could do:

'": "1899-12-30 14:50:00.000"": " "'.split(' ')[1][1:]

If you want the full 1899-12-30 14:50:00.000, you could do:

'": "1899-12-30 14:50:00.000"": " "'.split('"')[2]

Explanation:

We split the string on a character that surrounds the date: a space in the first example and a double quote in the second. The split function returns a list, from which we pick the element we want. In the first case that is the second element (index 1, since lists are 0-indexed); in the second case it is the third element (index 2). In the first example, the selected element still has a double quote in front of the date, so we slice off the first character with [1:] to get only the date.
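Splitting depends on the exact position of the quotes and spaces. If the surrounding text may vary, a regex combined with strptime is a possible alternative. This is only a sketch: the parse_timestamp name is made up for illustration, and it assumes the timestamp always has exactly three fractional digits.

```python
import re
from datetime import datetime

def parse_timestamp(raw):
    """Find a 'YYYY-MM-DD HH:MM:SS.mmm' timestamp in raw and parse it."""
    match = re.search(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}", raw)
    if match is None:
        return None
    # %f accepts one to six fractional digits, so '.000' parses fine.
    return datetime.strptime(match.group(), "%Y-%m-%d %H:%M:%S.%f")

stamp = parse_timestamp('": "1899-12-30 14:50:00.000"": " "')
print(stamp.date())  # 1899-12-30
```

The result is a datetime object, so the date part and the time part are both available without further string slicing.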

Extract date from a string with a lot of numbers

Although I don't know exactly how your dates are formatted, here's a regex solution that works with dates separated by '/'. It handles months and days written either as a single digit or with a leading zero.

If your dates are separated by hyphens instead, replace the 9th and 18th characters of the regex (counting from the first character inside the quotes) with a hyphen instead of /. The backslash in front of each is then unnecessary but harmless. (If using the second print statement, replace the 12th and 31st characters.)

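Edit: Added a second print statement with a regex that is stricter about which digits may appear in the month and day. It is not strictly better, though: because [0-1] only admits the digits 0 and 1, it drops the month from dates such as 12/25/2021 or 5/3/2021 (returning '/25/2021' and '/3/2021').

import re
mystring = r'joasidj9238nlsd93901/01/2021oijweo8939n'
print(re.findall(r'\d{1,2}\/\d{1,2}\/\d{2,4}', mystring)) # This would probably work in most cases; prints ['01/01/2021']
print(re.findall(r'[0-1]{0,2}\/[0-3]{0,1}\d{0,1}\/\d{2,4}', mystring)) # Stricter about digits, but loses months that contain a digit above 1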
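Rather than encoding the valid ranges in the regex itself, one option is to keep the regex loose and let strptime reject impossible dates. Here is a rough sketch of that idea; the find_slash_dates name is hypothetical, and it assumes month-first order and a four-digit year, which may not match your data.

```python
import re
from datetime import datetime

def find_slash_dates(text, fmt="%m/%d/%Y"):
    """Return every valid date written as M/D/YYYY in text."""
    dates = []
    for candidate in re.findall(r"\d{1,2}/\d{1,2}/\d{4}", text):
        try:
            dates.append(datetime.strptime(candidate, fmt).date())
        except ValueError:
            # Matched the digit pattern but is not a real date; skip it.
            pass
    return dates

print(find_slash_dates("joasidj9238nlsd93901/01/2021oijweo8939n"))
# [datetime.date(2021, 1, 1)]
```

For day-first dates, pass fmt="%d/%m/%Y" instead.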

Edit #2: Here's a way to do it with the month name spelled out (in full, or 3-character abbreviation), followed by day, followed by comma, followed by a 2 or 4 digit year.

import re
mystring = r'Jan 1, 2020'
print(re.findall(r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+\d{1,2}\,\s+\d{2,4}',mystring))
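The strings found this way can be turned into date objects with strptime, using %b for abbreviated and %B for full month names. The sketch below assumes a four-digit year and English month names (both directives depend on the current locale); the parse_month_name_dates name is made up for this example.

```python
import re
from datetime import datetime

MONTH_DATE = re.compile(
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?"
    r"|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?"
    r"|Dec(?:ember)?)\s+\d{1,2},\s+\d{4}"
)

def parse_month_name_dates(text):
    """Return a date object for each 'Month D, YYYY' found in text."""
    results = []
    for found in MONTH_DATE.findall(text):
        # Try the abbreviated month name first, then the full name.
        for fmt in ("%b %d, %Y", "%B %d, %Y"):
            try:
                results.append(datetime.strptime(found, fmt).date())
                break
            except ValueError:
                continue
    return results

print(parse_month_name_dates("Signed Jan 1, 2020 and renewed October 31, 2021"))
# [datetime.date(2020, 1, 1), datetime.date(2021, 10, 31)]
```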


Extract date from string in python

With minor tweaks to the regex-plus-strptime approach shown at the top of this page, you can get it to work for dates written as DD.MM.YYYY.

import re
from datetime import datetime

text = "Campaign on 01.11.2015"

# Escape the dots so they match a literal '.' rather than any character.
match = re.search(r'\d{2}\.\d{2}\.\d{4}', text)
date = datetime.strptime(match.group(), '%d.%m.%Y').date()
print(str(date).replace("-", ""))
# 20151101
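As an alternative to removing the hyphens from the string form, strftime can produce the compact form directly. A small sketch, with a hypothetical compact_date helper that also guards against input that contains no date:

```python
import re
from datetime import datetime

def compact_date(text):
    """Return the first DD.MM.YYYY date in text as 'YYYYMMDD', or None."""
    match = re.search(r"\d{2}\.\d{2}\.\d{4}", text)
    if match is None:
        return None
    return datetime.strptime(match.group(), "%d.%m.%Y").strftime("%Y%m%d")

print(compact_date("Campaign on 01.11.2015"))  # 20151101
```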

