How to convert variable with mixed date formats to one format?
You may try parse_date_time in package lubridate, which "allows the user to specify several format-orders to handle heterogeneous date-time character representations" using the orders argument. Something like...
library(lubridate)
parse_date_time(x = df$date,
                orders = c("d m y", "d B Y", "m/d/y"),
                locale = "eng")
...should be able to handle most of your formats. Please note that the b/B formats are locale sensitive.
Other date-time formats which can be used in orders are listed in the Details section in ?strptime.
Fixing mixed date formats in data frame?
The parse_date_time function in lubridate allows you to parse vectors with heterogeneous formats using the orders argument:
require(lubridate)
x <- c("November 2, 2014", "13 August, 2014")
parse_date_time(x, orders = c("mdy", "dmy"))
[1] "2014-11-02 UTC" "2014-08-13 UTC"
convert mixed date formats in a data frame
If you know that you only have those two formats, you can first identify them (the first one has alphabetic characters in it, the other does not) and convert accordingly.
dateDF$date <- as.POSIXlt(
  dateDF$date,
  # Pick the format per element: values containing letters use the month-name format
  format = ifelse(
    grepl("[a-z]", dateDF$date),
    "%H:%M %d-%b-%y",
    "%d/%m/%Y %H:%M"
  )
)
How do I change all dates into the same format in R (if a column has a bunch of different character formats)?
Try one format and then the other for the values that returned NA.
y <- as.Date(input)
na <- is.na(y)
y[na] <- as.Date(input[na], "%m/%d/%Y")
y
#[1] "2019-01-22" "2019-04-17" "2009-05-17" "2010-05-17" "2015-05-17"
#[6] "2019-07-30"
How to format multiple date formats into single date in python
In an ideal world, you know the format of your inputs.
Where this is not possible, I recommend you use a third-party library for mixed-format dates.
Two libraries that come to mind are dateutil (via dateutil.parser.parse) and pandas (via pandas.to_datetime). Below is an example implementation with the former.
Note that the one case where parser.parse was unsuccessful had to be covered with a manual conversion via datetime.strptime. datetime is part of the standard Python library.
from datetime import datetime
from dateutil import parser

list1 = ["30-4-1994", "1994-30-04", "30/04/1994",
         "30-apr-1994", "30/apr/1994", "1994-30-apr"]

def converter(lst):
    # Try each parser in turn; yield the raw string if none succeeds.
    for i in lst:
        try:
            yield parser.parse(i)
        except ValueError:
            try:
                yield parser.parse(i, dayfirst=True)
            except ValueError:
                try:
                    yield datetime.strptime(i, '%Y-%d-%b')
                except ValueError:
                    yield i

res = list(converter(list1))
# [datetime.datetime(1994, 4, 30, 0, 0),
# datetime.datetime(1994, 4, 30, 0, 0),
# datetime.datetime(1994, 4, 30, 0, 0),
# datetime.datetime(1994, 4, 30, 0, 0),
# datetime.datetime(1994, 4, 30, 0, 0),
# datetime.datetime(1994, 4, 30, 0, 0)]
You can then format the results as strings any way you like using strftime:
res_str = [i.strftime('%d-%m-%Y') for i in res]
# ['30-04-1994',
# '30-04-1994',
# '30-04-1994',
# '30-04-1994',
# '30-04-1994',
# '30-04-1994']
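When the set of possible formats is known in advance, an alternative to letting a parser guess is to try each format explicitly with datetime.strptime. The following is only a minimal sketch using the standard library: the KNOWN_FORMATS list and the parse_known helper are hypothetical names introduced for illustration, and the month-name formats (%b) assume an English locale.

```python
from datetime import datetime

# Hypothetical list of formats, assumed to be known in advance.
# Order matters: the first format that matches wins.
KNOWN_FORMATS = [
    "%d-%m-%Y", "%d/%m/%Y", "%Y-%d-%m",
    "%d-%b-%Y", "%d/%b/%Y", "%Y-%d-%b",
]

def parse_known(value, formats=KNOWN_FORMATS):
    """Return a datetime for the first format that matches, else raise ValueError."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"no known format matches {value!r}")

samples = ["30-4-1994", "1994-30-04", "30/04/1994",
           "30-apr-1994", "30/apr/1994", "1994-30-apr"]
parsed = [parse_known(s) for s in samples]
normalized = [d.strftime("%d-%m-%Y") for d in parsed]
```

Unlike the generator above, this version raises instead of passing unparsed strings through, so unexpected inputs are noticed rather than silently mixed into the result.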
Date Field Mixed (Date and Text) Formatting
If you have "1190729 " in A2 then the formula
=DATE(2000+(MID(A2,2,2)),(MID(A2,4,2)),(RIGHT(TRIM(A2),2)))
will produce the date 29th July 2019 in your current date format.
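For comparison, the same slicing logic can be sketched in Python. This is an illustrative equivalent, not part of the spreadsheet answer: decode_packed_date is a hypothetical helper name, and, like the formula, it assumes the first character is ignored and the year falls in the 2000s.

```python
from datetime import date

def decode_packed_date(raw):
    """Mirror the spreadsheet formula: skip the first character, then read YY, MM, DD."""
    text = raw.strip()              # like TRIM; assumes no leading whitespace matters
    year = 2000 + int(text[1:3])    # MID(A2,2,2)
    month = int(text[3:5])          # MID(A2,4,2)
    day = int(text[-2:])            # RIGHT(TRIM(A2),2)
    return date(year, month, day)

result = decode_packed_date("1190729 ")
```

Here result is date(2019, 7, 29), matching the 29th July 2019 produced by the formula.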