Check If a Date Is Within an Interval in R

Everybody has their favourite tool for this; mine happens to be data.table, because of what it refers to as its dt[i, j, by] logic.

library(data.table)

dt <- data.table(date = as.IDate(pt))

dt[, YR := 0.0 ] # I am using a numeric for year here...

dt[ date >= as.IDate("2002-09-01") & date <= as.IDate("2003-08-31"), YR := 1 ]
dt[ date >= as.IDate("2003-09-01") & date <= as.IDate("2004-08-31"), YR := 2 ]
dt[ date >= as.IDate("2004-09-01") & date <= as.IDate("2005-08-31"), YR := 3 ]

I create a data.table object, converting your times to dates for later comparison. I then set up a new column, defaulting to zero.

I then execute three conditional statements: for each of the three intervals (which I just create by hand using the endpoints), YR is set to 1, 2 or 3. Dates outside all three intervals keep the default of 0.

This does have the desired effect as we can see from

R> print(dt, topn=5, nrows=10)
           date YR
  1: 2003-06-11  1
  2: 2004-08-11  2
  3: 2004-06-03  2
  4: 2004-01-20  2
  5: 2005-02-25  3
 ---
 96: 2002-08-07  0
 97: 2004-02-04  2
 98: 2006-04-10  0
 99: 2005-03-21  3
100: 2003-12-01  2
R> table(dt[, YR])

 0  1  2  3
26 31 31 12
R>

One could also have done this simply by computing date differences and truncating, but it is nice to be a little explicit at times.

Edit: A more generic form just uses arithmetic on the dates:

R> dt[, YR2 := trunc(as.numeric(difftime(as.Date(date),
+                                        as.Date("2001-09-01"),
+                                        units="days"))/365.25)]
R> table(dt[, YR2])

 0  1  2  3  4  5  6  7  9
 7 31 31 12  9  5  1  2  1
R>

This does the job in one line.
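
One thing worth checking before relying on the one-liner: dividing by 365.25 only approximates calendar years, so a date sitting exactly on a 1 September boundary can land in the neighbouring bin. The sketch below runs both versions side by side on a handful of made-up dates (they stand in for the `pt` vector, which is not shown here) and lists the rows where the labels disagree.

```r
library(data.table)

# Hypothetical sample dates standing in for the question's `pt` vector
dt <- data.table(date = as.IDate(c("2002-08-07", "2002-09-01", "2003-06-11",
                                   "2003-09-01", "2004-08-11", "2005-02-25")))

# Explicit version: label each September-to-August year by its endpoints
dt[, YR := 0]
dt[date >= as.IDate("2002-09-01") & date <= as.IDate("2003-08-31"), YR := 1]
dt[date >= as.IDate("2003-09-01") & date <= as.IDate("2004-08-31"), YR := 2]
dt[date >= as.IDate("2004-09-01") & date <= as.IDate("2005-08-31"), YR := 3]

# Arithmetic version: approximate whole years elapsed since 2001-09-01
dt[, YR2 := trunc(as.numeric(difftime(as.Date(date), as.Date("2001-09-01"),
                                      units = "days")) / 365.25)]

# Rows where the two labels differ: here 2002-09-01 and 2003-09-01,
# because 365 / 365.25 and 730 / 365.25 both truncate one year short
mismatch <- dt[YR != YR2]
print(mismatch)
```

If the boundary days matter, the explicit endpoints are the safer choice; otherwise the arithmetic form is more compact and extends to any number of years.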

How to Check if a Date is Within a List of Intervals in R?

First, note that you can test whether any element from a vector of "negative" dates falls within the "positive" interval like so:

any(dates.neg %within% interval(dates.pos[1], dates.pos[1] + days(2)))
# [1] FALSE

This suggests the following approach using map2 -- or more usefully, map2_lgl:

df.TOTAL <- df.POS %>%
  left_join(df.NEG, by = 'ID') %>%
  mutate(TIME = interval(DATE, DATE + days(2)),
         RESULT = map2_lgl(data, TIME, ~any(.x$DATE %within% .y)))
# # A tibble: 6 x 5
#   ID    DATE       data             TIME                           RESULT
#   <chr> <date>     <list>           <S4: Interval>                 <lgl>
# 1 ID_1  2018-02-07 <tibble [5 x 1]> 2018-02-07 UTC--2018-02-09 UTC FALSE
# 2 ID_1  2018-02-12 <tibble [5 x 1]> 2018-02-12 UTC--2018-02-14 UTC FALSE
# 3 ID_1  2018-02-13 <tibble [5 x 1]> 2018-02-13 UTC--2018-02-15 UTC FALSE
# 4 ID_1  2018-02-20 <tibble [5 x 1]> 2018-02-20 UTC--2018-02-22 UTC TRUE
# 5 ID_1  2018-02-21 <tibble [5 x 1]> 2018-02-21 UTC--2018-02-23 UTC TRUE
# 6 ID_1  2018-03-18 <tibble [5 x 1]> 2018-03-18 UTC--2018-03-20 UTC FALSE
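
For readers who want to try the idea without the nested data frame, here is a minimal sketch of the same check on two plain vectors. The dates are made up for illustration, and the loop with `sapply()` is a stand-in for `map2_lgl()`, not the code used above: for each "positive" date it builds the two-day interval and asks whether any "negative" date falls inside it.

```r
library(lubridate)

# Hypothetical example vectors (not the data from the question)
dates.pos <- ymd(c("2018-02-07", "2018-02-20"))
dates.neg <- ymd(c("2018-02-01", "2018-02-21"))

# For each positive date, test whether any negative date lies in
# the closed interval [date, date + 2 days]
hits <- sapply(seq_along(dates.pos), function(i) {
  window <- interval(dates.pos[i], dates.pos[i] + days(2))
  any(dates.neg %within% window)
})

print(hits)  # FALSE TRUE
```

Only the second positive date has a negative date (2018-02-21) inside its window, so only the second element is TRUE.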

Thanks to @ubutun for improving the answer.

How to check if several dates lie within an interval using data.table and lubridate?

An alternative:

DT[, new := rowSums(sapply(.SD, between, initial, final)) > 0,
   .SDcols = c("d1", "d2", "d3")]
DT
#            d1         d2         d3    initial      final    new
#        <Date>     <Date>     <Date>     <Date>     <Date> <lgcl>
# 1: 2019-01-01 2019-03-01 2020-01-01 2020-01-01 2020-03-01   TRUE
# 2: 2019-02-02 2022-02-02 2021-02-02 2022-05-05 2023-01-01  FALSE

Make sure you're using data.table::between. If it conflicts with dplyr::between, the latter may complain, since (at least in older dplyr versions) it requires its lower/upper bounds to be length 1.

This answer is vectorized and scales well with an arbitrary number of columns: it calls between once per column (vectorized over rows) and rowSums only once, regardless of the number of columns. (Also, rowSums(.) is generally faster than apply(., 1, any) or similar canonical R methods.)
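
A self-contained version may help when experimenting. The sketch below rebuilds the two-row table from the printed output and calls `data.table::between` explicitly to sidestep the dplyr conflict. The `new2` column is an addition for comparison only: it combines the per-column results with `Reduce()` and `|` instead of a matrix, which should also cope with a single-row table, where `sapply()` simplifies to a plain vector and `rowSums()` would fail.

```r
library(data.table)

# Data reconstructed from the printed output
DT <- data.table(
  d1      = as.Date(c("2019-01-01", "2019-02-02")),
  d2      = as.Date(c("2019-03-01", "2022-02-02")),
  d3      = as.Date(c("2020-01-01", "2021-02-02")),
  initial = as.Date(c("2020-01-01", "2022-05-05")),
  final   = as.Date(c("2020-03-01", "2023-01-01"))
)

# Matrix approach: one between() call per column, one rowSums() call
DT[, new := rowSums(sapply(.SD, data.table::between, initial, final)) > 0,
   .SDcols = c("d1", "d2", "d3")]

# Alternative sketch: OR the per-column logical vectors together
DT[, new2 := Reduce(`|`, lapply(.SD, data.table::between, initial, final)),
   .SDcols = c("d1", "d2", "d3")]

print(DT)
```

Both columns should agree on this data: TRUE for the first row (d3 equals initial, and the bounds are inclusive by default) and FALSE for the second.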

Checking if a date falls within an interval in a dataset in long format

You can do the following:

library(dplyr)

tests %>%
  left_join(meds) %>%
  group_by(id) %>%
  mutate(received_med_within = between(med_date, test_date[1], test_date[2])) %>%
  tidyr::replace_na(list(received_med_within = FALSE)) %>%
  dplyr::select(-4)

# A tibble: 6 x 4
# Groups:   id [3]
#      id test_date  test_result received_med_within
#   <dbl> <date>           <dbl> <lgl>
# 1     1 2000-01-01           1 TRUE
# 2     1 2000-01-07           1 TRUE
# 3     2 2000-02-14           0 FALSE
# 4     2 2000-03-19           1 FALSE
# 5     3 2000-05-14           0 FALSE
# 6     3 2000-09-30           0 FALSE
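
To make the snippet reproducible, here is a sketch with the `tests` table rebuilt from the output above and a hypothetical `meds` table (the real medication dates are not shown, so the two values below are assumptions chosen to give the same result). It differs from the answer in three small ways: it uses `min()`/`max()` instead of `test_date[1]`/`test_date[2]` so row order does not matter, `coalesce()` instead of `replace_na()`, and it drops `med_date` by name rather than by position.

```r
library(dplyr)

# Rebuilt from the printed output
tests <- tibble(
  id          = c(1, 1, 2, 2, 3, 3),
  test_date   = as.Date(c("2000-01-01", "2000-01-07", "2000-02-14",
                          "2000-03-19", "2000-05-14", "2000-09-30")),
  test_result = c(1, 1, 0, 1, 0, 0)
)

# Hypothetical medication dates; id 3 received no medication
meds <- tibble(
  id       = c(1, 2),
  med_date = as.Date(c("2000-01-03", "2000-04-01"))
)

result <- tests %>%
  left_join(meds, by = "id") %>%
  group_by(id) %>%
  # TRUE when the medication date lies between the first and last test;
  # a missing med_date yields NA, which coalesce() turns into FALSE
  mutate(received_med_within =
           coalesce(between(med_date, min(test_date), max(test_date)), FALSE)) %>%
  ungroup() %>%
  select(-med_date)

print(result)
```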

How to find if a date is within a given time interval

Assuming your data is already converted with lubridate,

input <- df %>%
  mutate(start_date=ymd(start_date)) %>%
  mutate(end_date=ymd(end_date)) %>%
  mutate(a_date=ymd(a_date)) %>%
  mutate(b_date=ymd(b_date)) %>%
  mutate(c_date=ymd(c_date)) %>%
  mutate(Intrvl=interval(start_date, end_date))

you could use the %within% operator in lubridate

result <- input %>%
  mutate(AinIntrvl=if_else(a_date %within% Intrvl,"a","")) %>%
  mutate(BinIntrvl=if_else(b_date %within% Intrvl,"b","")) %>%
  mutate(CinIntrvl=if_else(c_date %within% Intrvl,"c","")) %>%
  mutate(Within_Intrvl=paste(AinIntrvl,BinIntrvl,CinIntrvl,sep="_")) %>%
  select(-start_date,-end_date,-Intrvl,-a_date,-b_date,-c_date )

You can format the Within_Intrvl column as you like, as well as decide how you want to deal with NAs.
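
As a variation, the repeated mutate() calls can be collapsed with across(). This is only a sketch on a made-up two-row `df` (the column names follow the code above, the values are hypothetical), and it returns one logical column per date column instead of the pasted label, leaving NA where a date is missing.

```r
library(dplyr)
library(lubridate)

# Hypothetical input with dates stored as character strings
df <- tibble(
  start_date = c("2020-01-01", "2020-06-01"),
  end_date   = c("2020-01-31", "2020-06-30"),
  a_date     = c("2020-01-15", "2020-07-04"),
  b_date     = c("2020-02-01", "2020-06-10"),
  c_date     = c("2020-01-31", NA)
)

result <- df %>%
  # Parse every column to Date
  mutate(across(everything(), ymd)) %>%
  # Test each candidate column against the row's own interval;
  # %within% treats both endpoints as inside the interval
  mutate(across(c(a_date, b_date, c_date),
                ~ .x %within% interval(start_date, end_date),
                .names = "{.col}_in"))

print(select(result, ends_with("_in")))
```

Keeping the results as logicals makes the NA decision explicit: a missing date stays NA until you choose to map it to FALSE or to some label.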


