Extracting date from a string in Python
If the date is given in a fixed form, you can use a regular expression to extract it and datetime.datetime.strptime to parse it:
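import re
from datetime import datetime

text = 'monkey 2010-07-10 love banana'  # example input; substitute your own string
match = re.search(r'\d{4}-\d{2}-\d{2}', text)
# re.search returns None when nothing matches, so guard before calling .group()
date = datetime.strptime(match.group(), '%Y-%m-%d').date() if match else None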
Otherwise, if the date is given in an arbitrary form, you can't extract it easily.
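When the form is not fixed but you can at least list the layouts you expect, one possible approach is to try several pattern/format pairs in turn. The sketch below is only an illustration under that assumption: the CANDIDATES list and the extract_date helper are hypothetical names, and the day-first reading of slash-separated dates is a guess you would adjust to your data.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical (regex, strptime format) pairs; extend for the layouts you expect.
CANDIDATES = [
    (r'\d{4}-\d{2}-\d{2}', '%Y-%m-%d'),    # 2021-01-08
    (r'\d{2}/\d{2}/\d{4}', '%d/%m/%Y'),    # 08/01/2021, assumed day-first
    (r'\d{2}\.\d{2}\.\d{4}', '%d.%m.%Y'),  # 08.01.2021
]

def extract_date(text: str) -> Optional[date]:
    """Try each candidate in list order; return the first valid date, else None."""
    for pattern, fmt in CANDIDATES:
        match = re.search(pattern, text)
        if match is None:
            continue
        try:
            return datetime.strptime(match.group(), fmt).date()
        except ValueError:
            # The digits had the right shape but are not a real date (e.g. month 13).
            continue
    return None
```

Truly free-form dates (month names in several languages, relative dates and so on) are beyond what a short list like this can cover.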
How to extract dates from a text string?
We could use parse_date from the parsedate package. It should be able to parse most date formats, but a 2-digit year can be an issue, i.e. whether '22' should be parsed as 1922 or as 2022.
library(parsedate)
library(stringr)  # provides str_extract_all
as.Date( parse_date(unlist(str_extract_all(str1, "\\d+/\\d+/\\d+"))))
Output:
[1] "2022-08-22" "2022-08-22" "2022-08-07" "2022-08-15"
Data:
str1 <- c("08/22/22 FC yusubclavio derecho", "22/08/2022 FC yusubclavio derecho",
"08/07/2022 FC 08/15/2022 yusubclavio derecho")
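The 2-digit-year caveat mentioned above is not specific to R. As a small illustration in Python, the %y directive of strptime resolves the century with a fixed pivot, so it is worth checking which side of the pivot your data falls on. The two input strings here are made up for the example.

```python
from datetime import datetime

# %y follows the POSIX pivot: 00-68 -> 2000-2068, 69-99 -> 1969-1999.
recent = datetime.strptime('08/22/22', '%m/%d/%y').date()
older = datetime.strptime('08/22/69', '%m/%d/%y').date()

print(recent)  # 2022-08-22
print(older)   # 1969-08-22
```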
How to extract date from text string
Your date values are zero-padded, so each term has a fixed length, and the string has a fixed prefix. You therefore do not need (slow) regular expressions and can just use simple string functions:
SELECT TO_DATE(SUBSTR(value, 6, 10), 'DD-MM-YYYY')
FROM table_name;
(Note: if you still want it as a string, rather than as a date, then just use SUBSTR without wrapping it in TO_DATE.)
For example:
WITH table_name ( value ) AS (
SELECT 'Date-08-01-2021-Trans-1000008-PH.0000-BA-CR-9999.21' FROM DUAL
)
SELECT TO_DATE(SUBSTR(value, 6, 10), 'DD-MM-YYYY') AS date_value
FROM table_name;
Outputs:
DATE_VALUE
08-JAN-21
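The same fixed-position idea carries over to other languages. As a rough Python counterpart of the query above, a slice replaces SUBSTR and strptime replaces TO_DATE; the variable names are only illustrative.

```python
from datetime import datetime

value = 'Date-08-01-2021-Trans-1000008-PH.0000-BA-CR-9999.21'

# SUBSTR(value, 6, 10) is 1-based: start at character 6, take 10 characters.
# The 0-based Python slice covering the same span is value[5:15].
date_text = value[5:15]
date_value = datetime.strptime(date_text, '%d-%m-%Y').date()

print(date_text)   # 08-01-2021
print(date_value)  # 2021-01-08
```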
How to extract dates from string?
If you want to go purely XPath, then you could try to fully validate your pattern dd/mm/yyyy in a few steps¹:
=TRANSPOSE(TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[substring(., 3, 1)= '/'][substring(., 6, 1)= '/'][string-length(translate(., '/' , '')) = 8][translate(., '/' , '')*0=0]"),"dd/mm/e"))
- "<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>" - Create a valid XML construct.
- //s - Select s-nodes where:
- [substring(., 3, 1)= '/'] - There is a forward slash at the 3rd index;
- [substring(., 6, 1)= '/'] - There is a forward slash at the 6th index;
- [string-length(translate(., '/' , '')) = 8] - The remainder of the node when we replace the forward slashes is of length eight;
- [translate(., '/' , '')*0=0] - The remainder of the node when we replace the forward slashes is numeric.
Needless to say, if your string does not hold any other forward slashes but those in the dates, you can simplify the above significantly¹:
=TRANSPOSE(TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[contains(., '/')]"),"dd/mm/e"))
Notice that if "dd/mm/yyyy" is recognized by Excel as dates, simply using //s would return the numeric equivalent of these dates. If no other numeric values exist in your string, you could benefit from that using Microsoft 365 functionality¹:
=LET(X,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s"),TRANSPOSE(TEXT(FILTER(X,ISNUMBER(X)),"dd/mm/e")))
¹ Note that you can remove the nested TEXT() function and number-format your cells to dd/mm/e too.
How to extract date from string
Try:
select
split_part(date, ' ', 2) as month
from table_name;
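For comparison, a rough Python counterpart of this query can be written with str.split. The split_part helper below is a hypothetical re-implementation for illustration, not a library function, and the sample string is invented because the original question's data is not shown. Returning an empty string for a missing field mirrors what PostgreSQL's split_part does.

```python
def split_part(text: str, delimiter: str, n: int) -> str:
    """Return the n-th field (1-based) of text split on delimiter, or '' if absent."""
    fields = text.split(delimiter)
    return fields[n - 1] if 1 <= n <= len(fields) else ''

# Hypothetical input; the original question's data is not shown.
sample = '05 January 2021'
print(split_part(sample, ' ', 2))  # January
```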
How to Extract Date from String in Google Sheets
Try
=regexextract(A1,"\w{3} \d{2} \d{4}")*1
then apply the date format you wish to the result.
To process a complete column (here, column B):
=arrayformula(iferror(regexextract(B1:B,"\w{3} \d{2} \d{4}")*1))
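The same pattern can be tried outside Sheets. The following Python sketch reuses the regular expression from the formula and parses the match with strptime instead of multiplying by 1. The cell text is a made-up example, and the %b directive assumes English month abbreviations, which holds as long as the program has not switched the LC_TIME locale.

```python
import re
from datetime import datetime

# Hypothetical cell content; the pattern is the one used in the formula above.
cell = 'Invoice issued Jan 05 2021 to customer'

match = re.search(r'\w{3} \d{2} \d{4}', cell)
# %b expects an abbreviated month name in the current locale.
parsed = datetime.strptime(match.group(), '%b %d %Y').date() if match else None

print(parsed)  # 2021-01-05
```

As in the arrayformula with iferror, a cell without a matching date simply yields an empty result (None here) rather than an error.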