Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date
Ok, so I found this function in another post:
age <- function(from, to) {
  from_lt = as.POSIXlt(from)
  to_lt = as.POSIXlt(to)
  age = to_lt$year - from_lt$year
  ifelse(to_lt$mon < from_lt$mon |
           (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday),
         age - 1, age)
}
It was posted by @Jim saying "The following function takes a vectors of Date objects and calculates the ages, correctly accounting for leap years. Seems to be a simpler solution than any of the other answers".
It is indeed simpler and it does the trick I was looking for. On average, it is actually faster than the arithmetic method (about 75% faster).
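library(microbenchmark)
library(lubridate)
library(eeptools)
library(ggplot2)  # provides autoplot()

# birthdate and givendate are Date vectors of equal length
mbm <- microbenchmark(
  arithmetic = (givendate - birthdate) / 365.25,
  lubridate = interval(start = birthdate, end = givendate) /
    duration(num = 1, units = "years"),
  eeptools = age_calc(dob = birthdate, enddate = givendate,
                      units = "years"),
  age = age(from = birthdate, to = givendate),
  times = 1000
)
mbm
autoplot(mbm)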
And at least in my examples it does not make any mistakes (and it should not in any example; it is a pretty straightforward function built on ifelse).
toy_df <- data.frame(
  birthdate = birthdate,
  givendate = givendate,
  arithmetic = as.numeric((givendate - birthdate) / 365.25),
  lubridate = interval(start = birthdate, end = givendate) /
    duration(num = 1, units = "years"),
  eeptools = age_calc(dob = birthdate, enddate = givendate,
                      units = "years"),
  age = age(from = birthdate, to = givendate)
)
toy_df[, 3:6] <- floor(toy_df[, 3:6])
toy_df
    birthdate  givendate arithmetic lubridate eeptools age
1  1978-12-30 2015-12-31         37        37       37  37
2  1978-12-31 2015-12-31         36        37       37  37
3  1979-01-01 2015-12-31         36        37       36  36
4  1962-12-30 2015-12-31         53        53       53  53
5  1962-12-31 2015-12-31         52        53       53  53
6  1963-01-01 2015-12-31         52        53       52  52
7  2000-06-16 2050-06-17         50        50       50  50
8  2000-06-17 2050-06-17         49        50       50  50
9  2000-06-18 2050-06-17         49        50       49  49
10 2007-03-18 2008-03-19          1         1        1   1
11 2007-03-19 2008-03-19          1         1        1   1
12 2007-03-20 2008-03-19          0         1        0   0
13 1968-02-29 2015-02-28         46        47       46  46
14 1968-02-29 2015-03-01         47        47       47  47
15 1968-02-29 2015-03-02         47        47       47  47
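Rows 13 to 15 are the leap-day cases. As a minimal check of just the first two of them, the sketch below repeats the function so that it runs on its own in base R; the variable name born is only illustrative.

```r
# Same logic as the age() function quoted at the top, repeated here
# so that the snippet is self-contained.
age <- function(from, to) {
  from_lt <- as.POSIXlt(from)
  to_lt <- as.POSIXlt(to)
  age <- to_lt$year - from_lt$year
  # Subtract one year if the birthday has not yet been reached in the "to" year.
  ifelse(to_lt$mon < from_lt$mon |
           (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday),
         age - 1, age)
}

born <- as.Date("1968-02-29")  # illustrative leap-day birth date
age(born, as.Date(c("2015-02-28", "2015-03-01")))
# 46 47
```

On 28 February the day of the month (28) is still below 29, so the year is not yet counted; from 1 March onwards the month comparison alone settles it.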
I do not consider this a complete solution because I also wanted age in months and weeks, and this function only handles years. I post it here anyway because it solves the problem for age in years. I will not accept it because:
- I would rather wait for @Jim to post it as an answer.
- I will wait to see if someone else comes up with a complete solution (efficient, accurate, and producing age in years, months, or weeks as desired).
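One possible direction for the months and weeks part, offered only as a sketch and not as the complete solution asked for: the same POSIXlt field comparison extends to completed months, and completed weeks follow from the day difference. The helper names age_months and age_weeks are hypothetical, and the end-of-month convention (a month counts only once the day of the month is reached) is an assumption.

```r
# Hypothetical helper: completed months between two dates (base R only).
age_months <- function(from, to) {
  from_lt <- as.POSIXlt(from)
  to_lt <- as.POSIXlt(to)
  months <- 12 * (to_lt$year - from_lt$year) + (to_lt$mon - from_lt$mon)
  # Assumption: a month is complete only once the day of the month is reached.
  ifelse(to_lt$mday < from_lt$mday, months - 1, months)
}

# Hypothetical helper: completed weeks, from the difference in days.
age_weeks <- function(from, to) {
  as.numeric(as.Date(to) - as.Date(from)) %/% 7
}

age_months(as.Date("2007-03-20"), as.Date("2008-03-19"))  # 11
age_weeks(as.Date("2007-03-20"), as.Date("2008-03-19"))   # 52
```

Both helpers are vectorised in the same way as the years function, so they should scale similarly, though that has not been benchmarked here.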
How to calculate the mean age from the data stated as 2 digit years in R
We could use lubridate's make_date() to turn the individual columns into a date column and then calculate the age. I have shown here how you could take care of the missing 19/20 in birthyear, but you might need to tweak it for your data.
library(dplyr)
library(lubridate)
mydf |>
  mutate(
    # two-digit years above 21 are read as 19xx, the rest as 20xx
    date = make_date(if_else(birthyear > 21, birthyear + 1900, birthyear + 2000),
                     birthmonth, birthday),
    age = as.period(interval(date, today()))$year
  )
Output:
  ID birthday birthmonth birthyear       date age
1  A       12          8        79 1979-08-12  43
2  B       23         10        62 1962-10-23  59
3  C        2          3        66 1966-03-02  56
4  D       20          9        83 1983-09-20  38
And to get the mean age with summarise:
mydf |>
  mutate(
    # two-digit years above 21 are read as 19xx, the rest as 20xx
    date = make_date(if_else(birthyear > 21, birthyear + 1900, birthyear + 2000),
                     birthmonth, birthday),
    age = as.period(interval(date, today()))$year
  ) |>
  summarise(mean_age = mean(age))
Output:
mean_age
1 49
Update: Getting the age calculation both right and fast can be non-trivial; see, e.g., "Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date".
Calculating age using mutate with lubridate functions
We can use do:
df %>%
  mutate(age = as.period(interval(start = birthdate, end = givendate))) %>%
  do(data.frame(.[setdiff(names(.), "age")],
                age = ifelse(!is.na(.$age), .$age$year, .$age)))
# birthdate givendate age
#1 <NA> <NA> NA
#2 1978-12-31 2015-12-31 37
#3 1979-01-01 2015-12-31 36
#4 1962-12-30 <NA> NA
Because as.period() returns an object of the S4 class Period, we can also extract the year directly from its slot with @:
df %>%
  mutate(age = as.period(interval(start = birthdate, end = givendate))) %>%
  .$age %>%
  .@year %>%
  mutate(df, age = .)
# birthdate givendate age
#1 <NA> <NA> NA
#2 1978-12-31 2015-12-31 37
#3 1979-01-01 2015-12-31 36
#4 1962-12-30 <NA> NA
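For comparison, a base R sketch that avoids both do() and the Period class: it compares month and day fields of POSIXlt objects, and NA dates simply propagate to NA ages. The data frame is rebuilt from the printed output above, so its exact construction is an assumption.

```r
# Data frame reconstructed from the printed output (assumed, not the original object).
df <- data.frame(
  birthdate = as.Date(c(NA, "1978-12-31", "1979-01-01", "1962-12-30")),
  givendate = as.Date(c(NA, "2015-12-31", "2015-12-31", NA))
)

from_lt <- as.POSIXlt(df$birthdate)
to_lt <- as.POSIXlt(df$givendate)

# TRUE where the birthday has not yet occurred in the given year; NA if either date is NA.
before_birthday <- to_lt$mon < from_lt$mon |
  (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday)

# Year difference, minus one where the birthday is still ahead.
df$age <- (to_lt$year - from_lt$year) - before_birthday
df$age
# NA 37 36 NA
```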
change a column from birth date to age in r
From the comments of this blog entry, I found the age_calc function in the eeptools package. It takes care of edge cases (leap years, etc.), checks inputs and looks quite robust.
library(eeptools)
x <- as.Date(c("2011-01-01", "1996-02-29"))
age_calc(x) # default is age in months; enddate defaults to Sys.Date()
[1] 46.73333 224.83118
age_calc(x, units = "years") # but you can set it to years
[1] 3.893151 18.731507
floor(age_calc(x, units = "years"))
[1] 3 18
For your data
yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years"))
assuming you want age in integer years.
Getting negative ages using lubridate to calculate age from birth date and current date
If you check the output of the dmy function:
head(df$DATE_OF_BIRTH)
#[1] "20/10/01" "15/04/88" "16/12/58" "15/10/91" "09/02/66" "02/07/03"
head(dmy(df$DATE_OF_BIRTH))
#[1] "2001-10-20" "1988-04-15" "2058-12-16" "1991-10-15" "2066-02-09" "2003-07-02"
R interprets the two-digit years 00 - 68 as 2000 - 2068 and 69 - 99 as 1969 - 1999. Hence, 58 is read as 2058 and 66 as 2066, but 88 is read as 1988.
From ?strptime
%y
Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
Where the resulting age is negative, you can add 100 to it to get the equivalent positive value:
library(dplyr)
library(lubridate)
df %>%
mutate(age = interval(start = dmy(DATE_OF_BIRTH), end = dmy('01/07/17')) /
duration(num = 1, units = "years"),
age = if_else(age < 0, age + 100, age))
# DATE_OF_BIRTH age
#1 20/10/01 15.706849
#2 15/04/88 29.230137
#3 16/12/58 58.512329
#4 15/10/91 25.728767
#5 09/02/66 51.356164
#6 02/07/03 14.008219
#7 20/08/96 20.876712
#....
To get the difference between dates in years, you could also divide the interval by years(1), like this:
df %>%
mutate(age = interval(dmy(DATE_OF_BIRTH), dmy('01/07/17')) / years(1),
age = if_else(age < 0, age + 100, age))
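Adding 100 to negative ages works for a single reference date. An alternative is to correct the parsed dates themselves, so that any later calculation sees the right birth year. The sketch below does this in base R, assuming no birth date can lie after the reference date; dob_raw and ref are illustrative names, and the three values are taken from the data shown above.

```r
dob_raw <- c("20/10/01", "15/04/88", "16/12/58")  # first three values from above
ref <- as.Date("2017-07-01")                      # reference date used above

# %y applies the 00-68 -> 20xx rule, so "58" is parsed as 2058.
dob <- as.POSIXlt(as.Date(dob_raw, format = "%d/%m/%y"))

# Assumption: a birth date after the reference date belongs to the previous century.
in_future <- as.Date(dob) > ref
dob$year[in_future] <- dob$year[in_future] - 100L

dob <- as.Date(dob)
format(dob)
# "2001-10-20" "1988-04-15" "1958-12-16"
```

This only helps when nobody in the data is older than about 100 years at the reference date; beyond that, two-digit years are genuinely ambiguous.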
Calculate age (with decimal places) in R
Is this OK?
round(as.numeric((Testday - DOB) / 365.25), 2)
#[1] 3.28 11.18 0.92
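Dividing by 365.25 is fine for a rounded decimal age, but it can land just below a whole number on an exact birthday, which matters if the result is later truncated. A small base R illustration with made-up dates (dob and testday below are not the asker's data):

```r
dob <- as.Date("1978-12-31")      # illustrative birth date
testday <- as.Date("2015-12-31")  # exactly the 37th birthday

# 13514 days elapsed, while 37 * 365.25 = 13514.25
x <- as.numeric(testday - dob) / 365.25

round(x, 2)  # 37
floor(x)     # 36, one year short of the completed years
```

So the rounded value is a reasonable decimal age, while whole-year ages are safer computed by comparing month and day.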
In a single chain get a max date - n years
We could use a lambda function
aus_livestock$Month |>
max() |>
as_date() |>
(\(x) x - years(6))()
[1] "2012-12-01"