Efficient and Accurate Age Calculation (In Years, Months, or Weeks) in R Given Birth Date and an Arbitrary Date

Ok, so I found this function in another post:

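age <- function(from, to) {
  from_lt = as.POSIXlt(from)
  to_lt = as.POSIXlt(to)

  # difference in calendar years
  age = to_lt$year - from_lt$year

  # subtract one year if the birthday has not yet been reached in `to`'s year
  ifelse(to_lt$mon < from_lt$mon |
           (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday),
         age - 1, age)
}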

It was posted by @Jim saying "The following function takes a vectors of Date objects and calculates the ages, correctly accounting for leap years. Seems to be a simpler solution than any of the other answers".

It is indeed simpler and it does the trick I was looking for. On average, it is actually faster than the arithmetic method (about 75% faster).
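
The benchmark and the comparison table in this post use two Date vectors, birthdate and givendate, that are never defined in the text. A minimal sketch of that setup, assuming the values are exactly those printed in the result table of this post; the package list in the comments is also an assumption, inferred from the functions that are called:

```r
# Assumed setup, reconstructed from the result table in this post.
# Packages the later snippets appear to rely on (not loaded here):
#   microbenchmark, ggplot2 (for autoplot), lubridate, eeptools
birthdate <- as.Date(c(
  "1978-12-30", "1978-12-31", "1979-01-01",
  "1962-12-30", "1962-12-31", "1963-01-01",
  "2000-06-16", "2000-06-17", "2000-06-18",
  "2007-03-18", "2007-03-19", "2007-03-20",
  "1968-02-29", "1968-02-29", "1968-02-29"
))
givendate <- as.Date(c(
  rep("2015-12-31", 6),
  rep("2050-06-17", 3),
  rep("2008-03-19", 3),
  "2015-02-28", "2015-03-01", "2015-03-02"
))
```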

mbm <- microbenchmark(
  arithmetic = (givendate - birthdate) / 365.25,
  lubridate = interval(start = birthdate, end = givendate) /
    duration(num = 1, units = "years"),
  eeptools = age_calc(dob = birthdate, enddate = givendate,
                      units = "years"),
  age = age(from = birthdate, to = givendate),
  times = 1000
)
mbm
autoplot(mbm)

At least in my examples it makes no mistakes (nor should it in general: it is a straightforward function built on ifelse()).

toy_df <- data.frame(
  birthdate = birthdate,
  givendate = givendate,
  arithmetic = as.numeric((givendate - birthdate) / 365.25),
  lubridate = interval(start = birthdate, end = givendate) /
    duration(num = 1, units = "years"),
  eeptools = age_calc(dob = birthdate, enddate = givendate,
                      units = "years"),
  age = age(from = birthdate, to = givendate)
)
toy_df[, 3:6] <- floor(toy_df[, 3:6])
toy_df

    birthdate  givendate arithmetic lubridate eeptools age
1  1978-12-30 2015-12-31         37        37       37  37
2  1978-12-31 2015-12-31         36        37       37  37
3  1979-01-01 2015-12-31         36        37       36  36
4  1962-12-30 2015-12-31         53        53       53  53
5  1962-12-31 2015-12-31         52        53       53  53
6  1963-01-01 2015-12-31         52        53       52  52
7  2000-06-16 2050-06-17         50        50       50  50
8  2000-06-17 2050-06-17         49        50       50  50
9  2000-06-18 2050-06-17         49        50       49  49
10 2007-03-18 2008-03-19          1         1        1   1
11 2007-03-19 2008-03-19          1         1        1   1
12 2007-03-20 2008-03-19          0         1        0   0
13 1968-02-29 2015-02-28         46        47       46  46
14 1968-02-29 2015-03-01         47        47       47  47
15 1968-02-29 2015-03-02         47        47       47  47

I do not consider it a complete solution because I also wanted age in months and weeks, and this function only handles years. I post it here anyway because it solves the problem for age in years. I will not accept it because:

  1. I will wait for @Jim to post it as an answer.
  2. I will wait to see if someone else comes up with a complete solution (efficient, accurate and producing age in years, months or weeks as desired).
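
One way the same POSIXlt idea might be stretched to months and weeks is sketched below. This is not @Jim's function: age_in and its units argument are hypothetical names, and the convention that a month only counts once the day of the month has been reached (so 31 January to 28 February is 0 months) is an assumption you may want to change.

```r
# Hypothetical helper: completed years, months or weeks between two dates.
age_in <- function(from, to, units = c("years", "months", "weeks")) {
  units <- match.arg(units)
  from <- as.Date(from)
  to <- as.Date(to)

  if (units == "weeks") {
    # completed 7-day blocks
    return(as.numeric(to - from) %/% 7)
  }

  from_lt <- as.POSIXlt(from)
  to_lt <- as.POSIXlt(to)

  # calendar months elapsed, minus one if the day of month is not yet reached
  months <- 12 * (to_lt$year - from_lt$year) +
    (to_lt$mon - from_lt$mon) -
    (to_lt$mday < from_lt$mday)

  if (units == "months") months else months %/% 12
}

age_in(as.Date("1968-02-29"), as.Date(c("2015-02-28", "2015-03-01")))  # 46 47
age_in(as.Date("2007-03-18"), as.Date("2008-03-19"), "months")         # 12
age_in(as.Date("2007-03-18"), as.Date("2008-03-19"), "weeks")          # 52
```

For years this agrees with the age column of the table above on the leap-day rows; it has not been benchmarked against the other methods.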

How to calculate the mean age from the data stated as 2 digit years in R

We could use lubridate's make_date() to turn the individual columns into a date column and then calculate the age. I have shown here how you could take care of the missing 19/20 in birthyear, but you might need to tweak it for your data.

library(dplyr)
library(lubridate)

mydf |>
  mutate(
    # two-digit years above 21 are read as 19xx, the rest as 20xx
    date = make_date(if_else(birthyear > 21, birthyear + 1900, birthyear + 2000),
                     birthmonth, birthday),
    age = as.period(interval(date, today()))$year
  )

Output:

  ID birthday birthmonth birthyear       date age
1  A       12          8        79 1979-08-12  43
2  B       23         10        62 1962-10-23  59
3  C        2          3        66 1966-03-02  56
4  D       20          9        83 1983-09-20  38

And to get the mean age with summarise:

mydf |>
  mutate(
    # two-digit years above 21 are read as 19xx, the rest as 20xx
    date = make_date(if_else(birthyear > 21, birthyear + 1900, birthyear + 2000),
                     birthmonth, birthday),
    age = as.period(interval(date, today()))$year
  ) |>
  summarise(mean_age = mean(age))

Output:

  mean_age
1 49
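
The century rule used in this answer can be isolated in a small helper, which makes the cut-off easier to tweak. A minimal base-R sketch: expand_year and its pivot argument are hypothetical names, the cut-off of 21 is taken from the answer's code, and years at or below the pivot are assumed to belong to the 2000s.

```r
# Hypothetical helper: map two-digit years onto four-digit years.
# Values above `pivot` are assumed to be 19xx, the rest 20xx.
expand_year <- function(yy, pivot = 21) {
  ifelse(yy > pivot, 1900 + yy, 2000 + yy)
}

expand_year(c(79, 62, 66, 83, 5))
# [1] 1979 1962 1966 1983 2005
```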

Update: Getting an age calculation that is both correct and fast can be non-trivial; see e.g. "Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date".

Calculating age using mutate with lubridate functions

We can use do

df %>%
  mutate(age = as.period(interval(start = birthdate, end = givendate))) %>%
  do(data.frame(.[setdiff(names(.), "age")],
                age = ifelse(!is.na(.$age), .$age$year, .$age)))
#   birthdate  givendate age
#1       <NA>       <NA>  NA
#2 1978-12-31 2015-12-31  37
#3 1979-01-01 2015-12-31  36
#4 1962-12-30       <NA>  NA

Because as.period() returns an S4 object of class Period, we may need the S4 slot operator (@) to extract the year:

df %>%
  mutate(age = as.period(interval(start = birthdate, end = givendate))) %>%
  .$age %>%
  .@year %>%
  mutate(df, age = .)
#   birthdate  givendate age
#1       <NA>       <NA>  NA
#2 1978-12-31 2015-12-31  37
#3 1979-01-01 2015-12-31  36
#4 1962-12-30       <NA>  NA

Change a column from birth date to age in R

From the comments of this blog entry, I found the age_calc function in the eeptools package. It takes care of edge cases (leap years, etc.), checks inputs and looks quite robust.

library(eeptools)
x <- as.Date(c("2011-01-01", "1996-02-29"))
age_calc(x) # default is age in months; enddate defaults to Sys.Date(), so results depend on the run date

[1] 46.73333 224.83118

age_calc(x, units = "years") # but you can set it to years

[1] 3.893151 18.731507

floor(age_calc(x, units = "years"))

[1] 3 18

For your data

yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years"))

assuming you want age in integer years.

Getting negative ages using lubridate to calculate age from birth date and current date

If you check the output of the dmy() function:

head(df$DATE_OF_BIRTH)
#[1] "20/10/01" "15/04/88" "16/12/58" "15/10/91" "09/02/66" "02/07/03"

head(dmy(df$DATE_OF_BIRTH))
#[1] "2001-10-20" "1988-04-15" "2058-12-16" "1991-10-15" "2066-02-09" "2003-07-02"

R interprets two-digit years 00-68 as 2000-2068 and 69-99 as 1969-1999. Hence 58 is read as 2058 and 66 as 2066, but 88 is read as 1988.

From ?strptime

%y
Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.


Birth years wrongly read as 20xx produce negative ages that are a century off, so you can add 100 to the negative values to get the intended positive ages:

library(dplyr)
library(lubridate)

df %>%
  mutate(age = interval(start = dmy(DATE_OF_BIRTH), end = dmy('01/07/17')) /
           duration(num = 1, units = "years"),
         age = if_else(age < 0, age + 100, age))

#  DATE_OF_BIRTH       age
#1      20/10/01 15.706849
#2      15/04/88 29.230137
#3      16/12/58 58.512329
#4      15/10/91 25.728767
#5      09/02/66 51.356164
#6      02/07/03 14.008219
#7      20/08/96 20.876712
#....

To get difference between dates in years, you could also use interval like this

df %>%
  mutate(age = interval(dmy(DATE_OF_BIRTH), dmy('01/07/17')) / years(1),
         age = if_else(age < 0, age + 100, age))
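
An alternative to correcting the ages afterwards is to repair the parsed dates themselves. A base-R sketch, under the assumption that no birth date can lie after the reference date; parse_dob and ref are hypothetical names and not part of lubridate:

```r
# Hypothetical helper: parse dd/mm/yy strings and move any date that falls
# after `ref` back by one century.
parse_dob <- function(x, ref = as.Date("2017-07-01")) {
  d <- as.Date(x, format = "%d/%m/%y")   # %y maps 00-68 to 20xx, 69-99 to 19xx
  lt <- as.POSIXlt(d)
  future <- !is.na(d) & d > ref          # impossible birth dates
  lt$year[future] <- lt$year[future] - 100L
  as.Date(lt)
}

parse_dob(c("20/10/01", "16/12/58", "15/04/88"))
# [1] "2001-10-20" "1958-12-16" "1988-04-15"
```

Ages computed from the repaired dates need no further adjustment, at the cost of assuming nobody in the data is older than about 100 at the reference date.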

Calculate age (with decimal places) in R

Is this OK?

round(as.numeric((Testday - DOB) / 365.25), 2)
#[1] 3.28 11.18 0.92
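
Dividing by 365.25 is an approximation, which matters if the result is later truncated to whole years. A small illustration using one pair of dates from the comparison table in the first post on this page (dob and testday here are illustrative values, not the asker's data):

```r
dob <- as.Date("1978-12-31")
testday <- as.Date("2015-12-31")  # exactly the 37th birthday

# 13514 days / 365.25 is just under 37
age_dec <- as.numeric(testday - dob) / 365.25

round(age_dec, 2)
# [1] 37
floor(age_dec)
# [1] 36
```

So the rounded decimal age is fine for reporting, but taking floor() of it can be one year short on an exact birthday.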

In a single chain get a max date - n years

We could use a lambda function

aus_livestock$Month |>
  max() |>
  as_date() |>
  (\(x) x - years(6))()
# [1] "2012-12-01"
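
The same lambda pattern also works with base R alone (R 4.1 or later for |> and \(x)). A sketch with a made-up Date vector standing in for aus_livestock$Month; months_seen and cutoff are hypothetical names, and seq() with by = "-6 years" is used here in place of lubridate's years(6):

```r
# Hypothetical stand-in for aus_livestock$Month
months_seen <- as.Date(c("2016-03-01", "2018-12-01", "2017-07-01"))

cutoff <- months_seen |>
  max() |>
  (\(x) seq(x, by = "-6 years", length.out = 2)[2])()  # step back six years

cutoff
# [1] "2012-12-01"
```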

