Change a column from birth date to age in R
From the comments of this blog entry, I found the age_calc function in the eeptools package. It takes care of edge cases (leap years, etc.), checks inputs and looks quite robust.
library(eeptools)
x <- as.Date(c("2011-01-01", "1996-02-29"))
age_calc(x) # enddate defaults to Sys.Date(); default is age in months
[1] 46.73333 224.83118
age_calc(x, units = "years") # but you can set it to years
[1] 3.893151 18.731507
floor(age_calc(x, units = "years"))
[1] 3 18
For your data
yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years"))
assuming you want age in integer years.
Convert date of birth to age
"11-10-1969" (month day year or day month year) is not an unambiguous date format. To get it properly converted you will need to specify the format argument to as.Date().
Note also that a 4-digit year needs a capital Y in the format string: "%d-%m-%Y" (or "%d/%m/%Y" for /). Sys.Date() is already a Date object, so you don't need the format argument with the /s in it.
> as.numeric(Sys.Date() - as.Date("11-10-1969", format="%d-%m-%Y")) / 365.25
#> [1] 52.56674
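To see why the format argument matters, here is a small sketch (the variable names are only illustrative) that parses the same string under both readings:

```r
# The same string parsed under two equally plausible readings
s <- "11-10-1969"
dmy <- as.Date(s, format = "%d-%m-%Y") # day-month-year
mdy <- as.Date(s, format = "%m-%d-%Y") # month-day-year
format(dmy) # "1969-10-11"
format(mdy) # "1969-11-10"
as.numeric(mdy - dmy) # the two readings are 30 days apart
```

Which reading is right depends on where the data came from, so it is worth checking a few rows where the day is greater than 12.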
EDIT: use 365.25 to approximate leap years per Henry's suggestion in comment
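Keep in mind that dividing by 365.25 is only an approximation, so flooring the result can land one year short right on a birthday. A small sketch with illustrative dates:

```r
birth <- as.Date("1978-12-31")
on <- as.Date("2015-12-31")    # the 37th birthday
days <- as.numeric(on - birth) # 13514 days
approx_age <- days / 365.25    # just under 37
floor(approx_age)              # 36, although 37 full years have passed
```

If you need exact integer ages, comparing the calendar fields (year, month, day) avoids this.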
Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date
Ok, so I found this function in another post:
age <- function(from, to) {
  from_lt = as.POSIXlt(from)
  to_lt = as.POSIXlt(to)
  age = to_lt$year - from_lt$year
  ifelse(to_lt$mon < from_lt$mon |
           (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday),
         age - 1, age)
}
It was posted by @Jim saying "The following function takes a vectors of Date objects and calculates the ages, correctly accounting for leap years. Seems to be a simpler solution than any of the other answers".
It is indeed simpler and it does the trick I was looking for. On average, it is actually faster than the arithmetic method (about 75% faster).
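library(microbenchmark)
library(lubridate)
library(eeptools)
library(ggplot2) # provides the autoplot() generic

# birthdate and givendate are Date vectors of equal length
mbm <- microbenchmark(
  arithmetic = (givendate - birthdate) / 365.25,
  lubridate = interval(start = birthdate, end = givendate) /
    duration(num = 1, units = "years"),
  eeptools = age_calc(dob = birthdate, enddate = givendate,
                      units = "years"),
  age = age(from = birthdate, to = givendate),
  times = 1000
)
mbm
autoplot(mbm)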
And at least in my examples it does not make any mistake (and it should not in any example; it's a pretty straightforward function using ifelse).
toy_df <- data.frame(
  birthdate = birthdate,
  givendate = givendate,
  arithmetic = as.numeric((givendate - birthdate) / 365.25),
  lubridate = interval(start = birthdate, end = givendate) /
    duration(num = 1, units = "years"),
  eeptools = age_calc(dob = birthdate, enddate = givendate,
                      units = "years"),
  age = age(from = birthdate, to = givendate)
)
toy_df[, 3:6] <- floor(toy_df[, 3:6])
toy_df
    birthdate  givendate arithmetic lubridate eeptools age
1  1978-12-30 2015-12-31         37        37       37  37
2  1978-12-31 2015-12-31         36        37       37  37
3  1979-01-01 2015-12-31         36        37       36  36
4  1962-12-30 2015-12-31         53        53       53  53
5  1962-12-31 2015-12-31         52        53       53  53
6  1963-01-01 2015-12-31         52        53       52  52
7  2000-06-16 2050-06-17         50        50       50  50
8  2000-06-17 2050-06-17         49        50       50  50
9  2000-06-18 2050-06-17         49        50       49  49
10 2007-03-18 2008-03-19          1         1        1   1
11 2007-03-19 2008-03-19          1         1        1   1
12 2007-03-20 2008-03-19          0         1        0   0
13 1968-02-29 2015-02-28         46        47       46  46
14 1968-02-29 2015-03-01         47        47       47  47
15 1968-02-29 2015-03-02         47        47       47  47
I do not consider it a complete solution because I also wanted age in months and weeks, and this function only handles years. I am posting it here anyway because it solves the problem for age in years. I will not accept it because:
- I would rather wait for @Jim to post it as an answer.
- I will wait to see if someone else comes up with a complete solution (efficient, accurate and producing age in years, months or weeks as desired).
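One possible way to extend the same calendar-field idea to months and weeks is sketched below. The helper name age_in and its units argument are hypothetical, not from @Jim's post, and the month rule (a month only counts once the day of the month has been reached) is an assumption that may not match every convention, e.g. for births on the 31st.

```r
# Hypothetical helper: whole years, months or weeks between two dates
age_in <- function(from, to, units = c("years", "months", "weeks")) {
  units <- match.arg(units)
  if (units == "weeks") {
    # a week is always 7 days, so day arithmetic is exact here
    return(floor(as.numeric(as.Date(to) - as.Date(from)) / 7))
  }
  from_lt <- as.POSIXlt(from)
  to_lt <- as.POSIXlt(to)
  # calendar months elapsed, minus one if the day of month is not reached yet
  months <- 12 * (to_lt$year - from_lt$year) + (to_lt$mon - from_lt$mon) -
    (to_lt$mday < from_lt$mday)
  if (units == "months") months else months %/% 12
}

b <- as.Date(c("1968-02-29", "2007-03-20", "1978-12-31"))
g <- as.Date(c("2015-02-28", "2008-03-19", "2015-12-31"))
age_in(b, g, "years")  # 46 0 37, same as the age column above
age_in(b, g, "months") # 563 11 444
age_in(b, g, "weeks")  # whole 7-day periods
```

For years this reduces to the same comparison as the age function, since a year is complete exactly when twelve whole months are.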
Calculating age in R from dob
First, copy and paste the function age_calc from the blog post to which you linked into your R console (or RStudio console) and hit 'Enter' to store it.
The function takes 3 arguments: dob, enddate and units. The dob argument needs to be of class Date. Units can be days, months or years. Assuming that you want years, this should add a column age to your data frame:
P4PA$age <- age_calc(as.Date(P4PA$DDN, "%m/%d/%Y"), units = "years")
P4PA
         DDN age
1  4/22/1956  60
2 12/26/1964  52
3  4/16/1963  53
4  1/28/1970  47
5  7/15/1972  44
6  1/18/1956  61
In R, how can I calculate age based on birth date using eeptools?
It looks like eeptools has an age_calc() function.
your_data <- data.frame(stringsAsFactors=FALSE,
  Born = c("1946-05-27", "1979-06-19", "1980-04-18", "1958-06-12",
           "1948-03-23", "1973-07-24", "1949-09-15", "1950-03-12",
           "1952-04-20", "1950-06-20"),
  bioguide = c("A000370", "A000371", "A000367", "A000369", "B001291",
               "B000213", "B001281", "B001271", "B001292", "B001293")
)
library(eeptools)
#> Loading required package: ggplot2
your_data$age <- eeptools::age_calc(dob = as.Date(your_data$Born),
                                    enddate = Sys.Date(),
                                    units = 'years')
your_data
#>          Born bioguide      age
#> 1  1946-05-27  A000370 73.62459
#> 2  1979-06-19  A000371 40.56158
#> 3  1980-04-18  A000367 39.73224
#> 4  1958-06-12  A000369 61.58075
#> 5  1948-03-23  B001291 71.80328
#> 6  1973-07-24  B000213 46.46569
#> 7  1949-09-15  B001281 70.32048
#> 8  1950-03-12  B001271 69.83281
#> 9  1952-04-20  B001292 67.72678
#> 10 1950-06-20  B001293 69.55884
Created on 2020-01-10 by the reprex package (v0.3.0)
More on eeptools here: https://github.com/jknowles/eeptools
Calculate age at first record for each ID
We can use difftime to get the difference in days and divide by 365
library(dplyr)
d %>%
  group_by(ID) %>%
  mutate(age_first_record = as.numeric(difftime(min(service_date),
                                                dob, units = 'days') / 365)) %>%
  ungroup()
-output
# A tibble: 4 x 4
  ID    dob        service_date age_first_record
  <chr> <date>     <date>                  <dbl>
1 a     2004-04-17 2018-01-01              13.7
2 a     2004-04-17 2019-07-12              13.7
3 b     2009-04-24 2014-12-23               5.67
4 b     2009-04-24 2016-04-27               5.67
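If dplyr is not available, roughly the same result can be had in base R with ave(). This is only a sketch: the data frame d is rebuilt here from the output shown above, and first_service is an illustrative name.

```r
# Data rebuilt from the printed output above
d <- data.frame(
  ID = c("a", "a", "b", "b"),
  dob = as.Date(c("2004-04-17", "2004-04-17", "2009-04-24", "2009-04-24")),
  service_date = as.Date(c("2018-01-01", "2019-07-12",
                           "2014-12-23", "2016-04-27"))
)

# Earliest service date per ID, as a day count repeated on every row of that ID
first_service <- ave(as.numeric(d$service_date), d$ID, FUN = min)

# Days between birth and first record, divided by 365 as in the answer above
d$age_first_record <- (first_service - as.numeric(d$dob)) / 365
round(d$age_first_record, 2) # 13.72 13.72 5.67 5.67
```

Dividing by 365 ignores leap days, so the result runs slightly high; use 365.25 or a calendar-based comparison if that matters.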