Extract Date Text from String

Extracting date from a string in Python

If the date appears in a fixed format, you can use a regular expression to extract it and datetime.strptime to parse it:

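import re
from datetime import datetime

# text is the input string that contains a date such as 2021-01-08
match = re.search(r'\d{4}-\d{2}-\d{2}', text)
if match:  # re.search returns None when nothing matches
    date = datetime.strptime(match.group(), '%Y-%m-%d').date()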

Otherwise, if the date can appear in an arbitrary format, a single regular expression will not extract it reliably.
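When the set of possible formats is at least known in advance, one workaround is to try each one in turn. The following is a minimal sketch under that assumption; the CANDIDATES list and the find_date name are hypothetical and would need adjusting to the formats your data actually contains:

```python
import re
from datetime import date, datetime
from typing import Optional

# (regex, strptime format) pairs. This list is an assumption; extend as needed.
CANDIDATES = [
    (r'\d{4}-\d{2}-\d{2}', '%Y-%m-%d'),    # e.g. 2021-01-08
    (r'\d{2}/\d{2}/\d{4}', '%d/%m/%Y'),    # e.g. 08/01/2021, read as day/month
    (r'\d{2}\.\d{2}\.\d{4}', '%d.%m.%Y'),  # e.g. 08.01.2021
]

def find_date(text: str) -> Optional[date]:
    """Return the first valid date found for any candidate format, else None."""
    for pattern, fmt in CANDIDATES:
        for match in re.finditer(pattern, text):
            try:
                return datetime.strptime(match.group(), fmt).date()
            except ValueError:
                continue  # matched the pattern but is not a real date
    return None

print(find_date('invoice dated 08/01/2021'))
```

The order of the list matters: an ambiguous string such as 08/01/2021 is interpreted by whichever matching format comes first.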

How to extract dates from a text string?

We could use parse_date from the parsedate package. It should be able to parse most date formats, but two-digit years can be an issue, e.g. if '22' should be parsed as 1922 instead of 2022.

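library(parsedate)
library(stringr)  # provides str_extract_all()
# extract every digits/digits/digits token, then parse each one
as.Date(parse_date(unlist(str_extract_all(str1, "\\d+/\\d+/\\d+"))))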

-output

[1] "2022-08-22" "2022-08-22" "2022-08-07" "2022-08-15"

data

str1 <- c("08/22/22 FC yusubclavio derecho", "22/08/2022 FC yusubclavio derecho", 
"08/07/2022 FC 08/15/2022 yusubclavio derecho")
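For comparison, the same extraction can be sketched in Python with only the standard library. This is not what parsedate does internally; it is a simplified illustration that assumes month-first order and falls back to day-first when that fails. The parse_slash_date name is hypothetical. On the two-digit-year concern: Python's %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999, so '22' becomes 2022.

```python
import re
from datetime import date, datetime

def parse_slash_date(token: str) -> date:
    """Parse one a/b/year token, trying month-first and then day-first."""
    year_text = token.rsplit('/', 1)[1]
    year_code = '%Y' if len(year_text) == 4 else '%y'
    for fmt in (f'%m/%d/{year_code}', f'%d/%m/{year_code}'):
        try:
            return datetime.strptime(token, fmt).date()
        except ValueError:
            continue  # wrong field order for this token, try the next format
    raise ValueError(f'unrecognised date: {token!r}')

str1 = [
    '08/22/22 FC yusubclavio derecho',
    '22/08/2022 FC yusubclavio derecho',
    '08/07/2022 FC 08/15/2022 yusubclavio derecho',
]

# Pull every digits/digits/digits token out of each string, then parse it.
dates = [
    parse_slash_date(token)
    for text in str1
    for token in re.findall(r'\d+/\d+/\d+', text)
]
print([d.isoformat() for d in dates])
```

A token such as 08/07/2022 is valid in both orders, so the month-first assumption decides the result; swap the two formats if your data is day-first.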

How to extract date from text string

Because the date values are zero-padded, each term has a fixed length, and the prefix is fixed too. You therefore do not need (slow) regular expressions and can use simple string functions instead:

SELECT TO_DATE(SUBSTR(value, 6, 10), 'DD-MM-YYYY')
FROM table_name;

(Note: if you still want it as a string, rather than as a date, then just use SUBSTR without wrapping it in TO_DATE.)

For example:

WITH table_name ( value ) AS (
SELECT 'Date-08-01-2021-Trans-1000008-PH.0000-BA-CR-9999.21' FROM DUAL
)
SELECT TO_DATE(SUBSTR(value, 6, 10), 'DD-MM-YYYY') AS date_value
FROM table_name;

Outputs:

DATE_VALUE
08-JAN-21
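The same fixed-offset idea carries over to other languages. A minimal Python sketch, reusing the sample value from the query above, with the only change being the switch from 1-based SUBSTR positions to a 0-based slice:

```python
from datetime import datetime

value = 'Date-08-01-2021-Trans-1000008-PH.0000-BA-CR-9999.21'

# SUBSTR(value, 6, 10) is 1-based: start at character 6, take 10 characters.
# The equivalent 0-based Python slice is value[5:15].
date_text = value[5:15]
date_value = datetime.strptime(date_text, '%d-%m-%Y').date()

print(date_text)   # the date kept as a string
print(date_value)  # the parsed date object
```

As with the SQL version, this only works while the prefix length and the zero-padding stay fixed.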

How to extract dates from string?

If you want to go purely XPath, you could try to fully validate your pattern dd/mm/yyyy in a few steps [1]:

=TRANSPOSE(TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[substring(., 3, 1)= '/'][substring(., 6, 1)= '/'][string-length(translate(., '/' , '')) = 8][translate(., '/' , '')*0=0]"),"dd/mm/e"))
  • "<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>" - Create a valid XML-construct.
  • //s - Select s-nodes where:
    • [substring(., 3, 1)= '/'] - There is a forward slash at the 3rd index;
    • [substring(., 6, 1)= '/'] - There is a forward slash at the 6th index;
    • [string-length(translate(., '/' , '')) = 8] - The remainder of the node when we replace the forward slashes is of length eight.
    • [translate(., '/' , '')*0=0] - The remainder of the node when we replace the forward slashes is numeric.
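The four predicates can also be mirrored outside Excel. Below is a minimal Python sketch, assuming space-separated tokens as in the formula; the function names and the sample text are hypothetical. The numeric check here accepts ASCII digits only, which is slightly stricter than the XPath multiplication test.

```python
def looks_like_ddmmyyyy(token: str) -> bool:
    """Apply the same four checks as the XPath predicates."""
    digits = token.replace('/', '')
    return (
        token[2:3] == '/'          # slash at the 3rd character
        and token[5:6] == '/'      # slash at the 6th character
        and len(digits) == 8       # eight characters remain without slashes
        and digits.isascii()
        and digits.isdigit()       # the remainder is numeric
    )

def extract_date_tokens(text: str) -> list:
    """Split on spaces (like SUBSTITUTE into <s> nodes) and keep date-like tokens."""
    return [token for token in text.split(' ') if looks_like_ddmmyyyy(token)]

sample = 'Paid 01/02/2021 and 3/4/21 then 15/12/2020 x/y'
print(extract_date_tokens(sample))
```

Like the formula, this validates the shape of each token only; it does not check that the day and month are in range.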

If your string contains no forward slashes other than those in the dates, you can simplify the above significantly [1]:

=TRANSPOSE(TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[contains(., '/')]"),"dd/mm/e"))


Note that if Excel recognizes the "dd/mm/yyyy" strings as dates, simply using //s returns an array containing the numeric equivalents of those dates. If the string contains no other numeric values, you can take advantage of that with Microsoft 365 functionality [1]:

=LET(X,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s"),TRANSPOSE(TEXT(FILTER(X,ISNUMBER(X)),"dd/mm/e")))

[1] Note that you can also remove the nested TEXT() function and apply the number format dd/mm/e to your cells instead.

How to extract date from string

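Try split_part, which splits a string on a delimiter and returns the n-th field. Here it returns the second space-separated field of the date column:

select
split_part(date, ' ', 2) as month
from table_name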
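To make the behaviour concrete, here is a rough Python imitation of that call. It is a sketch only: the input string is a hypothetical example, and the helper reuses the SQL function's name purely for readability.

```python
def split_part(text: str, delimiter: str, n: int) -> str:
    """Return the n-th (1-based) field of text split on delimiter, or '' if absent."""
    fields = text.split(delimiter)
    return fields[n - 1] if 1 <= n <= len(fields) else ''

# Hypothetical value of the date column
print(split_part('Monday August 22 2022', ' ', 2))
```

This only works when the month always sits in the same position within the string.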

How to Extract Date from String in Google Sheets

Try

=regexextract(A1,"\w{3} \d{2} \d{4}")*1

Then apply the date format you want to the result; multiplying by 1 converts the matched text into a date value.

For a whole column (here, column B):

=arrayformula(iferror(regexextract(B1:B,"\w{3} \d{2} \d{4}")*1))
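The same pattern can be tried in Python. This is a minimal sketch assuming English three-letter month abbreviations (the %b directive depends on the locale); the function name and sample strings are hypothetical. Returning None plays the role of iferror in the column formula.

```python
import re
from datetime import date, datetime
from typing import Optional

def extract_month_day_year(text: str) -> Optional[date]:
    """Find a 'Mon DD YYYY' fragment and parse it; None if absent or invalid."""
    match = re.search(r'\w{3} \d{2} \d{4}', text)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(), '%b %d %Y').date()
    except ValueError:
        return None  # matched the pattern but is not a real date

rows = ['Posted on Aug 22 2022 by admin', 'no date in this row']
print([extract_month_day_year(row) for row in rows])
```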

