Add correct century to dates with year provided as Year without century, %y
1) chron. chron uses a cutoff of 30 by default, so this converts the dates in three steps. It first converts them to Date, since chron can't read dates in this form. It then reformats them as character strings with two-digit years in a format that chron understands. Finally it converts them back to Date.
library(chron)
xx <- c("01AUG11", "01AUG12", "01AUG13") # sample data
as.Date(chron(format(as.Date(xx, "%d%b%y"), "%m/%d/%y")))
That gives a cutoff of 30, but we can get a cutoff of 13 by setting chron's chron.year.expand option:
library(chron)
options(chron.year.expand =
  function (y, cut.off = 12, century = c(1900, 2000), ...) {
    chron:::year.expand(y, cut.off = cut.off, century = century, ...)
  }
)
and then repeating the original conversion. For example, assuming we have already run this options statement, we get the following with our xx:
> as.Date(chron(format(as.Date(xx, "%d%b%y"), "%m/%d/%y")))
[1] "2011-08-01" "2012-08-01" "1913-08-01"
2) Date only. Here is an alternative that does not use chron. You might want to replace "2012-12-31" with Sys.Date() if the idea is that any date that would otherwise fall in the future should really be set 100 years back:
d <- as.Date(xx, "%d%b%y")
as.Date(ifelse(d > "2012-12-31", format(d, "19%y-%m-%d"), format(d)))
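The same idea can be wrapped in a small reusable function. This is a sketch under stated assumptions, not part of chron or base R:

- `fix_century` is a hypothetical name.
- It assumes that any parsed date later than `cutoff` should really be 100 years earlier.
- It moves the year back through the POSIXlt `year` field instead of pasting "19" into the string.
- It uses numeric day/month/year strings so the example does not depend on the locale's month names.

```r
# Hypothetical helper (not from chron or base R): parse with the usual %y
# rule, then move any date later than `cutoff` back by 100 years.
fix_century <- function(x, format, cutoff = Sys.Date()) {
  d <- as.Date(x, format = format)          # %y: 00-68 -> 20xx, 69-99 -> 19xx
  lt <- as.POSIXlt(d)                       # exposes $year (years since 1900)
  late <- !is.na(d) & d > as.Date(cutoff)   # dates assumed to be 100 years off
  lt$year[late] <- lt$year[late] - 100      # shift only those dates back
  as.Date(lt)
}

x <- c("01/08/11", "01/08/12", "01/08/13")  # day/month/two-digit year
fix_century(x, "%d/%m/%y", cutoff = "2012-12-31")
# [1] "2011-08-01" "2012-08-01" "1913-08-01"
```

Leaving `cutoff` at its default of `Sys.Date()` corresponds to the "future dates are really 100 years back" reading mentioned above.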
Adding the Century to 2-Digit Year
Try
df$date <- as.Date(with(df, paste(1900+YR, MO, DA,sep="-")), "%Y-%m-%d")
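For illustration, here is that line applied to a small made-up data frame. The column names YR, MO and DA come from the answer, but the values are invented. The sketch assumes YR is numeric, since `1900 + YR` fails on a character column.

```r
# Made-up sample data: two-digit years plus month and day columns
df <- data.frame(YR = c(63, 75, 99), MO = c(4, 12, 1), DA = c(1, 31, 15))

# Every year is assumed to belong to the 1900s
df$date <- as.Date(with(df, paste(1900 + YR, MO, DA, sep = "-")), "%Y-%m-%d")
df$date
# [1] "1963-04-01" "1975-12-31" "1999-01-15"
```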
Define year for two digits year date format
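We can create a function to do this:
library(lubridate)
f1 <- function(x, year=1970){
  x <- dmy(x)
  # two-digit part of the parsed year
  m <- year(x) %% 100
  # two-digit years above year %% 100 go to the 2000s, the rest to the 1900s
  year(x) <- ifelse(m > year %% 100, 2000+m, 1900+m)
  x
}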
f1(character_date)
#[1] "1944-01-19"
If the year always has 19 as its prefix:
dmy(sub("-(\\d+)", "-19\\1", character_date))
#[1] "1944-01-19"
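The original character_date is not shown. The sub() pattern above replaces the first "-" followed by digits. That works when the month is a name such as "Jan", but it would hit the month instead if the month were numeric.

Here is a hedged variant. It assumes the two-digit year always comes last and anchors the pattern to the end of the string; the inputs are hypothetical.

```r
x <- c("19-Jan-44", "19-01-44")   # hypothetical inputs, day-month-year

sub("-(\\d+)", "-19\\1", x)       # first "-digits" match
# [1] "19-Jan-1944" "19-1901-44"

sub("-(\\d{2})$", "-19\\1", x)    # only the trailing two-digit year
# [1] "19-Jan-1944" "19-01-1944"
```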
posixct time not understanding the '60s
Two-digit years are ambiguous. You can add "19" with a regex and then parse with %Y instead of %y:
library(tidyverse)
discharge %>%
rownames_to_column(var="date") %>%
as_tibble() %>%
mutate(date = strptime(sub("^(\\d+/\\d+/)(\\d+)$", "\\119\\2", date),
format = "%m/%d/%Y"))
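If the dates are a plain character vector rather than data-frame row names, the same regex works in base R without the tidyverse. The vector below is invented for illustration.

This sketch swaps the original `(\\d+)` for `(\\d{2})`, on the assumption that every short year has exactly two digits. A value that already has a four-digit year is then left alone instead of becoming "191963".

```r
x <- c("4/1/63", "12/1/69", "4/1/1963")   # month/day/year, made-up values

# %y alone applies the POSIX rule and puts 63 in the 2060s
as.Date(x[1], format = "%m/%d/%y")
# [1] "2063-04-01"

# Insert "19" before a trailing two-digit year, then parse with %Y
as.Date(sub("^(\\d+/\\d+/)(\\d{2})$", "\\119\\2", x), format = "%m/%d/%Y")
# [1] "1963-04-01" "1969-12-01" "1963-04-01"
```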
#> # A tibble: 261 x 2
#> date Original
#> <dttm> <int>
#> 1 1963-04-01 00:00:00 1100
#> 2 1963-05-01 00:00:00 1030
#> 3 1963-06-01 00:00:00 982
#> 4 1963-07-01 00:00:00 703
#> 5 1963-08-01 00:00:00 587
#> 6 1963-09-01 00:00:00 512
#> 7 1963-10-01 00:00:00 606
#> 8 1963-11-01 00:00:00 667
#> 9 1963-12-01 00:00:00 1010
#> 10 1964-01-01 00:00:00 1400
#> # ... with 251 more rows
Created on 2022-04-28 by the reprex package (v2.0.1)
R Lubridate Returns Unwanted Century When Given Two Digit Year
Lubridate v1.7.1 does not have this issue.
How to display the correct date century in Pandas?
In this specific case, I would use this:
pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:])
Note that this will break if you have DOBs after 1999!
Output:
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-09-12
5 1990-08-09
6 1988-01-06
7 1989-04-10
8 1991-11-15
9 1968-01-06
dtype: datetime64[ns]
convert character format to date format in r lubridate, leading year 20 not 19
I think you have to add "19" yourself, unless you want to use hydrostats::four.digit.year:
hydrostats::four.digit.year(dmy("4/11/64"), year=1900)
(the function is only a few lines long, so you could just copy it if you didn't want to depend on the package)
function (x, year = 1968) {
    n <- as.numeric(strftime(x, format = "%y")) %% 100
    Y <- ifelse(n > year %% 100, 1900 + n, 2000 + n)
    return(Y)
}
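four.digit.year() returns only the year. Below is a sketch of a variant that returns a full Date instead, using base R only. `with_pivot` is a hypothetical name, not a hydrostats function.

It follows the same rule as the function quoted above: two-digit years greater than year %% 100 go to the 1900s, and the rest go to the 2000s.

```r
# Hypothetical helper, modelled on the four.digit.year() body above
with_pivot <- function(d, year = 1968) {
  lt <- as.POSIXlt(d)                        # Date -> POSIXlt (UTC)
  n <- (lt$year + 1900) %% 100               # two-digit year
  full <- ifelse(n > year %% 100, 1900 + n, 2000 + n)
  lt$year <- full - 1900                     # POSIXlt counts years from 1900
  as.Date(lt)
}

d <- as.Date("4/11/64", format = "%d/%m/%y") # POSIX %y gives 2064-11-04
with_pivot(d, year = 1900)
# [1] "1964-11-04"
with_pivot(d)                                # default year = 1968
# [1] "2064-11-04"
```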
Otherwise, you're stuck with the POSIX standard. "%y" is the standard tag for converting two- (or one-) digit years, and from ?strptime:
‘%y’ Year without century (00-99). On input, values 00 to 68 are
prefixed by 20 and 69 to 99 by 19 - that is the behaviour
specified by the 2018 POSIX standard, but it does also say
‘it is expected that in a future version the default century
inferred from a 2-digit year will change’.
The standard itself says:
If century is not specified, then values in the range [69,99] shall refer to years 1969 to 1999 inclusive, and values in the range [00,68] shall refer to years 2000 to 2068 inclusive.
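A quick check of that pivot in R, with dates chosen to straddle the boundary between 68 and 69. As the ?strptime note warns, this reflects current behaviour and could change in a future version.

```r
as.Date(c("01/01/68", "01/01/69"), format = "%m/%d/%y")
# [1] "2068-01-01" "1969-01-01"
```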
See also: why strptime for two digit year for 69 returns 1969 in python?
date format in R
It doesn't look (from the documentation for %y in ?strptime) like there's any obvious option for changing the default century inferred from 2-digit years.
Since the objects returned by strptime() have class POSIXlt, though, it's a pretty simple matter to subtract 100 years from any dates after today (or after any other cutoff date you'd like to use).
# Use strptime() to create object of class POSIXlt
dd <- c("20-Sep-90", "24-Feb-05", "16-Aug-65",
"19-Nov-56", "28-Nov-59", "19-Apr-86")
DD <- strptime(dd, '%d-%b-%y')
# Subtract 100 years from any date after today
DD$year <- ifelse(DD > Sys.time(), DD$year-100, DD$year)
DD
[1] "1990-09-20" "2005-02-24" "1965-08-16" "1956-11-19" "1959-11-28" "1986-04-19"
date import, incorrect century
Dates are stored internally as a number of days since 1970-01-01, so formatting only comes into play at input or output. As for input without century information, I think you are out of luck. Here's what ?strptime says about the %y format spec: "On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’."
as.Date( "01/01/64", "%m/%d/%y", origin="1970-01-01") -100*365.25
#[1] "1964-01-01"
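Subtracting 100 * 365.25 days is only an approximation of 100 years. It happens to be exact for the 1964 example because that span contains 25 leap days. It is not exact for a span that includes the non-leap year 1900.

Below is a sketch of the alternative, which changes the year field of a POSIXlt object directly:

```r
d <- as.Date("01/01/00", format = "%m/%d/%y")  # parsed as 2000-01-01

d - 100 * 365.25          # 36525 days, one more than the 36524 in this span
# [1] "1899-12-31"

lt <- as.POSIXlt(d)
lt$year <- lt$year - 100  # move the calendar year back by exactly 100
as.Date(lt)
# [1] "1900-01-01"
```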
It might be possible to start a bar fight about programmers who allow removal of century information, given how recent Y2K is.
Since the default is to assume year 00-68 is 2000-2068, it is certainly possible to create an as.Dateshift
as.Date with two-digit years
x = format(as.Date("10.10.61", "%d.%m.%y"), "19%y-%m-%d")
x = as.Date(x)
x
class(x)