Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date

OK, so I found this function in another post:

age <- function(from, to) {
  from_lt <- as.POSIXlt(from)
  to_lt <- as.POSIXlt(to)

  age <- to_lt$year - from_lt$year

  # subtract one year if the birthday has not yet been reached in `to`'s year
  ifelse(to_lt$mon < from_lt$mon |
           (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday),
         age - 1, age)
}

It was posted by @Jim, who said: "The following function takes vectors of Date objects and calculates the ages, correctly accounting for leap years. Seems to be a simpler solution than any of the other answers."
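A quick check of the leap-year claim, as a sketch: the block repeats the same function (with `<-` assignment) so that it runs on its own, and tests a 29 February birthday. Under this function's logic such a birthday counts as reached on 1 March in non-leap years, which matches rows 13 and 14 of the comparison table further down.

```r
# Same logic as age() above, repeated so this block is self-contained
age <- function(from, to) {
  from_lt <- as.POSIXlt(from)
  to_lt <- as.POSIXlt(to)
  age <- to_lt$year - from_lt$year
  # subtract one year if the birthday has not yet been reached in `to`'s year
  ifelse(to_lt$mon < from_lt$mon |
           (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday),
         age - 1, age)
}

birth <- as.Date("1968-02-29")
ages <- age(birth, as.Date(c("2015-02-28", "2015-03-01", "2016-02-29")))
ages
# [1] 46 47 48
```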

It is indeed simpler, and it does exactly what I was looking for. In my benchmark it was also faster on average than the arithmetic method (about 75% faster):

library(microbenchmark)
library(lubridate)  # interval(), duration()
library(eeptools)   # age_calc()

mbm <- microbenchmark(
  arithmetic = (givendate - birthdate) / 365.25,
  lubridate = interval(start = birthdate, end = givendate) /
    duration(num = 1, units = "years"),
  eeptools = age_calc(dob = birthdate, enddate = givendate,
                      units = "years"),
  age = age(from = birthdate, to = givendate),
  times = 1000
)

And, at least in my examples, it does not make any mistakes (nor should it in any example; it is a straightforward function built on ifelse()):

toy_df <- data.frame(
  birthdate = birthdate,
  givendate = givendate,
  arithmetic = as.numeric((givendate - birthdate) / 365.25),
  lubridate = interval(start = birthdate, end = givendate) /
    duration(num = 1, units = "years"),
  eeptools = age_calc(dob = birthdate, enddate = givendate,
                      units = "years"),
  age = age(from = birthdate, to = givendate)
)
# keep only completed years for the comparison
toy_df[, 3:6] <- floor(toy_df[, 3:6])

birthdate givendate arithmetic lubridate eeptools age
1 1978-12-30 2015-12-31 37 37 37 37
2 1978-12-31 2015-12-31 36 37 37 37
3 1979-01-01 2015-12-31 36 37 36 36
4 1962-12-30 2015-12-31 53 53 53 53
5 1962-12-31 2015-12-31 52 53 53 53
6 1963-01-01 2015-12-31 52 53 52 52
7 2000-06-16 2050-06-17 50 50 50 50
8 2000-06-17 2050-06-17 49 50 50 50
9 2000-06-18 2050-06-17 49 50 49 49
10 2007-03-18 2008-03-19 1 1 1 1
11 2007-03-19 2008-03-19 1 1 1 1
12 2007-03-20 2008-03-19 0 1 0 0
13 1968-02-29 2015-02-28 46 47 46 46
14 1968-02-29 2015-03-01 47 47 47 47
15 1968-02-29 2015-03-02 47 47 47 47
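Why the arithmetic column falls short exactly on birthdays (rows 2, 5 and 8): a minimal base-R sketch using row 2. Between 1978-12-31 and 2015-12-31 there are 37 calendar years but only 9 leap days, whereas dividing by 365.25 implicitly assumes 37 × 0.25 = 9.25 of them, so the quotient lands just below 37.

```r
birthdate <- as.Date("1978-12-31")
givendate <- as.Date("2015-12-31")  # the 37th birthday itself

days <- as.numeric(givendate - birthdate)
days                  # 13514 = 37 * 365 + 9 leap days
days / 365.25         # 36.99932, just short of 37
floor(days / 365.25)  # 36, although the completed age is 37
```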

I do not consider it a complete solution, because I also wanted the age in months and weeks, and this function is specific to years. I am posting it anyway because it solves the problem for age in years. I will not accept it because:

  1. I would like to wait for @Jim to post it as an answer.
  2. I will wait to see whether someone else comes up with a complete solution (efficient, accurate, and able to give the age in years, months, or weeks as desired).
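Since months and weeks are the remaining gap, here is a sketch of how the same POSIXlt comparison could be extended. age_months() and age_weeks() are hypothetical helper names, not functions from any package. The convention assumed is that a month counts as completed only once the birth day-of-month is reached (so a 31 January birth gives 0 months on 28 February); weeks are a fixed 7 days, so plain day arithmetic is exact for them.

```r
# Hypothetical helper: completed calendar months between two dates
age_months <- function(from, to) {
  from_lt <- as.POSIXlt(from)
  to_lt <- as.POSIXlt(to)
  # total calendar months between the two year/month pairs
  months <- (to_lt$year - from_lt$year) * 12 + (to_lt$mon - from_lt$mon)
  # subtract one if the day of month has not been reached yet
  months - (to_lt$mday < from_lt$mday)
}

# Hypothetical helper: completed weeks (7-day blocks) between two dates
age_weeks <- function(from, to) {
  floor(as.numeric(difftime(to, from, units = "days")) / 7)
}

birth <- as.Date("2007-03-20")
given <- as.Date("2008-03-19")
age_months(birth, given)  # 11
age_weeks(birth, given)   # 52 (365 days, since 2008 is a leap year)
```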

How to calculate the mean age from the data stated as 2 digit years in R

We could use lubridate's make_date() to turn the individual columns into a date column and then calculate the age. I have shown here how you could supply the missing century (19 or 20) in birthyear, but you may need to tweak the cutoff for your data.


library(dplyr)
library(lubridate)

mydf |>
  mutate(date = make_date(if_else(birthyear > 21, birthyear + 1900, birthyear + 2000),
                          birthmonth, birthday),
         age = as.period(interval(date, today()))$year)


  ID birthday birthmonth birthyear       date age
1 A 12 8 79 1979-08-12 43
2 B 23 10 62 1962-10-23 59
3 C 2 3 66 1966-03-02 56
4 D 20 9 83 1983-09-20 38
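If you would rather avoid the lubridate dependency for this step, a base-R sketch of the same century handling follows. The data frame rebuilds the four rows shown above, and the pivot of 21 is the same assumption as in the dplyr version; adjust it for your data.

```r
# The four example rows from the output above
mydf <- data.frame(
  ID = c("A", "B", "C", "D"),
  birthday = c(12, 23, 2, 20),
  birthmonth = c(8, 10, 3, 9),
  birthyear = c(79, 62, 66, 83)
)

# Assumed pivot: two-digit years above 21 are 19xx, the rest 20xx
full_year <- ifelse(mydf$birthyear > 21, 1900, 2000) + mydf$birthyear

# ISOdate() builds a date-time at 12:00 GMT; as.Date() keeps the calendar date
mydf$date <- as.Date(ISOdate(full_year, mydf$birthmonth, mydf$birthday))
mydf$date
# [1] "1979-08-12" "1962-10-23" "1966-03-02" "1983-09-20"
```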

And to get the mean age with summarise:

mydf |>
  mutate(date = make_date(if_else(birthyear > 21, birthyear + 1900, birthyear + 2000),
                          birthmonth, birthday),
         age = as.period(interval(date, today()))$year
  ) |>
  summarise(mean_age = mean(age))


  mean_age
1       49

Update: it can be non-trivial to calculate age both correctly and quickly; see, e.g., Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date

Calculating age using mutate with lubridate functions

We can use do():

df %>%
mutate(age=as.period(interval(start = birthdate, end = givendate))) %>%
do(data.frame(.[setdiff(names(.), "age")],
age = ifelse(!is.na(.$age), .$age$year, .$age)))
# birthdate givendate age
#1 <NA> <NA> NA
#2 1978-12-31 2015-12-31 37
#3 1979-01-01 2015-12-31 36
#4 1962-12-30 <NA> NA

Since as.period() returns an S4 object of class Period, we can also extract its year slot directly with @:

df %>% 
mutate(age=as.period(interval(start = birthdate, end = givendate))) %>%
.$age %>%
.@year %>%
mutate(df, age = .)
# birthdate givendate age
#1 <NA> <NA> NA
#2 1978-12-31 2015-12-31 37
#3 1979-01-01 2015-12-31 36
#4 1962-12-30 <NA> NA

change a column from birth date to age in r

From the comments of this blog entry, I found the age_calc() function in the eeptools package. It takes care of edge cases (leap years, etc.), checks its inputs, and looks quite robust.

library(eeptools)

x <- as.Date(c("2011-01-01", "1996-02-29"))
age_calc(x) # enddate defaults to Sys.Date(); default units are months

[1] 46.73333 224.83118

age_calc(x, units = "years") # but you can set it to years

[1] 3.893151 18.731507

floor(age_calc(x, units = "years"))

[1] 3 18

For your data

yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years"))

assuming you want age in integer years.

Getting negative ages using lubridate to calculate age from birth date and current date

If you compare the input strings with the output of the dmy() function:

#[1] "20/10/01" "15/04/88" "16/12/58" "15/10/91" "09/02/66" "02/07/03"

#[1] "2001-10-20" "1988-04-15" "2058-12-16" "1991-10-15" "2066-02-09" "2003-07-02"

R interprets the years 00–68 as 2000–2068 and 69–99 as 1969–1999. Hence 58 becomes 2058 and 66 becomes 2066, but 88 becomes 1988.

From ?strptime, on the %y format:

Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
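Base R's as.Date() follows the same %y rule, as this short sketch shows (the strings are taken from the dmy() example above):

```r
dob <- c("20/10/01", "15/04/88", "16/12/58", "09/02/66")

# %y maps 00-68 to 20xx and 69-99 to 19xx
parsed <- as.Date(dob, format = "%d/%m/%y")
parsed
# [1] "2001-10-20" "1988-04-15" "2058-12-16" "2066-02-09"
```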

For negative ages, you can add 100 to get the intended positive value (this assumes no one in the data is older than 100):


library(dplyr)
library(lubridate)

df %>%
  mutate(age = interval(start = dmy(DATE_OF_BIRTH), end = dmy('01/07/17')) /
           duration(num = 1, units = "years"),
         age = if_else(age < 0, age + 100, age))

#1 20/10/01 15.706849
#2 15/04/88 29.230137
#3 16/12/58 58.512329
#4 15/10/91 25.728767
#5 09/02/66 51.356164
#6 02/07/03 14.008219
#7 20/08/96 20.876712

To get the difference between dates in years, you could also divide the interval by the period years(1), like this:

df %>%
mutate(age = interval(dmy(DATE_OF_BIRTH), dmy('01/07/17')) / years(1),
age = if_else(age < 0, age + 100, age))
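Rather than adding 100 to negative ages, you could also correct the century on the dates themselves before computing any age. Here is a base-R sketch, assuming no birth date can lie after the reference date; fix_century() is a hypothetical helper, not part of lubridate or base R.

```r
dob <- as.Date(c("20/10/01", "15/04/88", "16/12/58", "09/02/66"),
               format = "%d/%m/%y")
ref <- as.Date("2017-07-01")  # reference date used in the answer above

# Hypothetical helper: move any date after `ref` back by 100 years.
# Assumes no one in the data was born after `ref`.
fix_century <- function(d, ref) {
  lt <- as.POSIXlt(d)
  future <- d > ref
  lt$year[future] <- lt$year[future] - 100L  # POSIXlt years count from 1900
  as.Date(lt)
}

dob_fixed <- fix_century(dob, ref)
dob_fixed
# [1] "2001-10-20" "1988-04-15" "1958-12-16" "1966-02-09"
```

With the dates corrected, any of the age calculations above return positive values without a separate fix-up step.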

Calculate age (with decimal places) in R

Is this OK?

round(as.numeric((Testday - DOB) / 365.25), 2)
#[1] 3.28 11.18 0.92

In a single chain get a max date - n years

We could use an anonymous (lambda) function, written with the \(x) shorthand available since R 4.1:

aus_livestock$Month |>
max() |>
as_date() |>
(\(x) x - years(6))()
[1] "2012-12-01"
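The same chain can also be written without lubridate, as a sketch: seq() on a Date accepts a negative step such as "-6 years". The value 2018-12-01 below is an assumed stand-in for max(aus_livestock$Month), chosen to match the output above; the |> pipe and the \(x) shorthand need R 4.1 or later.

```r
# Assumed stand-in for as_date(max(aus_livestock$Month))
last_month <- as.Date("2018-12-01")

# seq() steps back 6 years; the second element is the shifted date
six_years_back <- last_month |>
  (\(x) seq(x, by = "-6 years", length.out = 2)[2])()
six_years_back
# [1] "2012-12-01"
```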

