How to Filter Rows Based on Difference in Dates Between Rows in R

How to filter rows based on difference in dates between rows in R?

An alternative that uses slice from dplyr is to define the following recursive function:

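library(dplyr)
f <- function(d, ind=1) {
  # first index whose date is more than 90 days after d[ind] (NA if none)
  ind.next <- first(which(difftime(d,d[ind], units="days") > 90))
  if (is.na(ind.next))
    return(ind)                     # no later date qualifies: stop here
  else
    return(c(ind, f(d,ind.next)))   # keep ind, then continue from ind.next
}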

This function operates on the date column, starting at ind = 1. It finds the next index, ind.next, as the first index whose date is more than 90 days (for Date values, at least 91 days) after the date at ind. If there is no such index, ind.next is NA and we simply return ind. Otherwise, we call f recursively starting at ind.next and return ind concatenated with that result. The end result is the set of row indices in which each selected date is at least 91 days after the previously selected one.
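To see the rule on a single vector, here is a minimal base-R sketch of the same recursion. The name f_base and the dates are made up for illustration; which(...)[1] stands in for dplyr's first(which(...)), so no packages are needed.

```r
# Hypothetical base-R twin of f(): same greedy rule, no dplyr needed.
f_base <- function(d, ind = 1) {
  # first position whose date is more than 90 days after d[ind] (NA if none)
  ind.next <- which(difftime(d, d[ind], units = "days") > 90)[1]
  if (is.na(ind.next)) {
    ind
  } else {
    c(ind, f_base(d, ind.next))
  }
}

# Made-up dates, already sorted ascending
d <- as.Date(c("2006-01-01", "2006-02-15", "2006-06-02",
               "2006-07-01", "2007-12-01"))
f_base(d)
```

Here 2006-02-15 is only 45 days after 2006-01-01 and 2006-07-01 only 29 days after 2006-06-02, so positions 1, 3 and 5 are returned.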

With this function, we can do:

result <- df %>% group_by(id) %>% slice(f(as.Date(date, format="%Y-%m-%d")))
##Source: local data frame [4 x 3]
##Groups: id [2]
##
##     id  var1       date
##  <int> <chr>      <chr>
##1     1     A 2006-01-01
##2     1     C 2006-06-02
##3     1     E 2007-12-01
##4     2     F 2007-04-20

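This function assumes that the date column is sorted in ascending order within each id group; if it is not, sort by id and date (for example with arrange(id, date)) before slicing. I am not sure about the efficiency of this approach or the risks of recursive calls in R: each selected row adds one level of recursion, so a group with a very large number of selected rows could run into R's nesting limit. Hopefully David Arenburg or others can comment on this.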
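One way to sidestep the recursion question is an iterative version of the same greedy rule. The sketch below is not from the answer above: keep_spaced() and the data are hypothetical, and it uses base R's ave() to apply the rule per id after sorting.

```r
# Hypothetical iterative version of the greedy rule (no recursion):
# keep a row if it is more than `gap` days after the last kept row.
keep_spaced <- function(d, gap = 90) {
  x <- as.numeric(d)          # Date -> days since 1970-01-01
  keep <- logical(length(x))  # all FALSE to start
  last <- -Inf                # nothing kept yet, so the first row is kept
  for (i in seq_along(x)) {
    if (x[i] - last > gap) {
      keep[i] <- TRUE
      last <- x[i]
    }
  }
  keep
}

# Made-up data, deliberately unsorted within id
df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  date = as.Date(c("2006-06-02", "2006-01-01", "2006-02-15",
                   "2007-04-20", "2007-05-01"))
)

# Sort within id, then apply keep_spaced() group by group with ave()
df <- df[order(df$id, df$date), ]
keep <- as.logical(ave(as.numeric(df$date), df$id, FUN = keep_spaced))
df_kept <- df[keep, ]
df_kept
```

Each row is visited once, so the work grows linearly with group size, and on sorted input it selects the same rows as f().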

As David Arenburg suggested, it is better to convert date to the Date class once, up front, rather than inside each group:

result <- df %>% mutate(date=as.Date(date, format="%Y-%m-%d")) %>%
                 group_by(id) %>% slice(f(date))
##Source: local data frame [4 x 3]
##Groups: id [2]
##
##     id  var1       date
##  <int> <chr>     <date>
##1     1     A 2006-01-01
##2     1     C 2006-06-02
##3     1     E 2007-12-01
##4     2     F 2007-04-20

How to select rows from a dataset between two dates?

The solution to what you are asking is straightforward, because you can filter on dates and compare dates across multiple columns. Please try the code below and confirm for yourself that it works as you expect. If this approach does not work on your own dataset, please share more about your data and processing, because there is probably an error in your code. (One error I already spotted: select(Date < Surgery_date) does not work, because select() chooses columns; to keep rows you need filter().)

This is how I would approach your problem. As you can see, the code is very straightforward.

library(dplyr)
library(lubridate)

df <- data.frame(
  Name = c(rep('Pierre', 3), rep('Paul', 3)),
  Date = c('2016-03-15', '2017-03-26', '2017-08-09', '2016-07-03', '2016-09-30', '2017-04-12'),
  Measurement = c(5.12, 4.16, 5.08, 5.47, 4.98, 4.51),
  Surgery_date = c(rep('2017-03-21', 3), rep('2017-03-25', 3))
) %>%
  mutate(Surgery_date = ymd(Surgery_date),
         Date = ymd(Date))

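# measurements taken before surgery
df %>%
  filter(Date < Surgery_date)

# measurements taken after surgery but less than 5 days after it
df %>%
  filter(Date > Surgery_date & Date < (Surgery_date + days(5)))

# measurements taken after surgery
df %>%
  filter(Date > Surgery_date)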
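If you would rather avoid packages, here is a base-R sketch of the same filters on the same example data. Adding an integer to a Date adds that many days, so Surgery_date + 5 plays the role of Surgery_date + days(5). One deliberate change: the middle filter uses <= so that a measurement taken exactly 5 days after surgery is kept. With the strict < above, Pierre's 2017-03-26 row (exactly 5 days after his 2017-03-21 surgery) is excluded and that filter returns no rows. The name within5 is just for illustration.

```r
# Base-R sketch of the same three filters (no dplyr/lubridate).
df <- data.frame(
  Name = c(rep("Pierre", 3), rep("Paul", 3)),
  Date = as.Date(c("2016-03-15", "2017-03-26", "2017-08-09",
                   "2016-07-03", "2016-09-30", "2017-04-12")),
  Measurement = c(5.12, 4.16, 5.08, 5.47, 4.98, 4.51),
  Surgery_date = as.Date(c(rep("2017-03-21", 3), rep("2017-03-25", 3)))
)

before  <- subset(df, Date < Surgery_date)
# adding an integer to a Date adds days; <= keeps a row exactly 5 days after
within5 <- subset(df, Date > Surgery_date & Date <= Surgery_date + 5)
after   <- subset(df, Date > Surgery_date)
```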

Remove rows based on condition and date difference between different events in R with dplyr

This is probably not the cleanest solution either, but pivoting to wide format and then back to long works:

library(tidyverse)
library(lubridate)

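dat %>%
  # split e.g. "GESTATION POSITIF" into name = "GESTATION", gest = "POSITIF"
  separate(name, into = c("name", "gest"), fill = "right") %>%
  # one row per id, with date_* and gest_* columns for each event type
  pivot_wider(names_from = name, values_from = c(date, gest)) %>%
  # blank out the BREEDING date when GESTATION follows 34 to 36 days later
  mutate(date_BREEDING = if_else((date_GESTATION - date_BREEDING) %in% c(34, 35, 36), NA_Date_, date_BREEDING)) %>%
  # back to long format, dropping the blanked-out (NA) dates
  pivot_longer(cols = c(date_BREEDING, date_OTHER, date_GESTATION), values_to = "date", values_drop_na = TRUE) %>%
  select(-gest_BREEDING, -gest_OTHER) %>%
  # strip the "date_" prefix from the event names
  mutate(name = str_sub(name, 6))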

The output is:

     id gest_GESTATION name      date      
  <dbl> <chr>          <chr>     <date>    
1    10 NA             BREEDING  2019-05-17
2    10 NA             OTHER     2020-01-01
3    11 POSITIF        BREEDING  2020-07-01
4    11 POSITIF        GESTATION 2020-09-01
5    12 NEGATIF        GESTATION 2020-08-01
6    21 POSITIF        OTHER     2018-06-20
7    21 POSITIF        GESTATION 2018-10-15
8    22 POSITIF        GESTATION 2020-09-11

This has the additional advantage of storing whether "GESTATION" is positive or negative in a separate variable. If you do not need that and want exactly the output specified in your question, you can append:

%>%
  mutate(name = if_else(is.na(gest_GESTATION), name, str_c(name, gest_GESTATION, sep = " "))) %>%
  select(-gest_GESTATION)
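For comparison, the core rule can also be sketched in base R without pivoting: look up each id's GESTATION date and drop BREEDING rows that fall 34 to 36 days before it. The data below are made up; the column names id, name and date are assumed from the code above, and the sketch assumes at most one GESTATION row per id. It keeps the combined "GESTATION POSITIF"/"GESTATION NEGATIF" labels rather than splitting them out.

```r
# Hypothetical data in the shape the code above expects
dat <- data.frame(
  id   = c(10, 10, 11, 11, 12, 12),
  name = c("BREEDING", "OTHER", "BREEDING", "GESTATION POSITIF",
           "BREEDING", "GESTATION NEGATIF"),
  date = as.Date(c("2019-05-17", "2020-01-01", "2020-07-01", "2020-09-01",
                   "2020-06-27", "2020-08-01"))
)

# GESTATION date for each row's id (NA when the id has no GESTATION row);
# match() takes the first hit, hence the one-GESTATION-per-id assumption
gest <- dat[startsWith(dat$name, "GESTATION"), c("id", "date")]
gest_date <- gest$date[match(dat$id, gest$id)]

# days from each row to its id's GESTATION; NA %in% 34:36 is FALSE
gap <- as.numeric(gest_date - dat$date, units = "days")
drop <- dat$name == "BREEDING" & gap %in% 34:36
result <- dat[!drop, ]
result
```

Only id 12's BREEDING row is dropped here, since its GESTATION comes 35 days later; id 11's comes 62 days later and is kept.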

How to find time difference between previous and following rows from specific rows

Using fuzzyjoin might be useful here:

library(dplyr)
library(fuzzyjoin)

df_grp <- df %>% 
  filter(start == "yes") %>% 
  select(time) %>% 
  group_by(grp = row_number()) %>% 
  mutate(begin = time - 5,
         end = time + 5)

First we create a data frame holding the time of each start == "yes" row, together with a window running from 5 below (begin) to 5 above (end) that time:

# A tibble: 2 x 4
   time   grp begin   end
  <dbl> <int> <dbl> <dbl>
1  2.82     1 -2.17  7.82
2 16.8      2 11.8  21.8

Next we use fuzzy_left_join() to attach every row of the original data frame to the window that contains its time, and calculate each row's difference from that window's "yes" time:

df %>% 
  fuzzy_left_join(df_grp, 
                  by = c("time" = "begin", "time" = "end"),
                  match_fun = list(`>`, `<`)) %>% 
  group_by(grp) %>% 
  mutate(diff = time.x - time.y) %>% 
  ungroup()

This returns:

# A tibble: 14 x 8
   initiate start time.x time.y   grp begin   end     diff
      <int> <chr>  <dbl>  <dbl> <int> <dbl> <dbl>    <dbl>
 1        0 no      2.82   2.82     1 -2.17  7.82 -0.00250
 2        0 no      2.82   2.82     1 -2.17  7.82 -0.00125
 3        1 yes     2.82   2.82     1 -2.17  7.82  0      
 4        1 no      2.83   2.82     1 -2.17  7.82  0.00125
 5        1 no      2.83   2.82     1 -2.17  7.82  0.00200
 6        1 no      2.83   2.82     1 -2.17  7.82  0.00225
 7        0 no     16.8   16.8      2 11.8  21.8  -0.0137 
 8        0 no     16.8   16.8      2 11.8  21.8  -0.0112 
 9        0 no     16.8   16.8      2 11.8  21.8  -0.00120
10        1 yes    16.8   16.8      2 11.8  21.8   0      
11        1 no     16.8   16.8      2 11.8  21.8   0.00380
12        0 no     16.8   16.8      2 11.8  21.8   0.00500
13        1 no     16.8   16.8      2 11.8  21.8   0.00630
14        1 no     16.8   16.8      2 11.8  21.8   0.00880
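If you would rather not depend on fuzzyjoin, the same windowed differences can be sketched in base R by looping over the "yes" times. The data and the names yes_times and res are hypothetical. Unlike fuzzy_left_join(), which keeps rows outside every window with NA in the joined columns, this version drops them (the row at time 40 below); a row that falls in two overlapping windows appears once per window, as with the join.

```r
# Made-up data: two "yes" rows and a few neighbours
df <- data.frame(
  start = c("no", "yes", "no", "no", "yes", "no"),
  time  = c(1, 2, 3, 20, 21, 40)
)

yes_times <- df$time[df$start == "yes"]

# For each "yes" time, keep rows strictly inside (time - 5, time + 5)
# and record the difference from that "yes" time. Each window contains
# at least its own "yes" row, so w is never empty.
res <- do.call(rbind, lapply(seq_along(yes_times), function(g) {
  w <- df[df$time > yes_times[g] - 5 & df$time < yes_times[g] + 5, ]
  w$grp  <- g
  w$diff <- w$time - yes_times[g]
  w
}))
res
```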

Filtering rows in a data frame based on date column

We can use subset:

subset(df1, as.Date(day) > Sys.Date()-21)
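Because Sys.Date() changes every day, the result above depends on when it is run. The sketch below swaps in a fixed, hypothetical reference date so the arithmetic is visible: subtracting a number from a Date subtracts that many days. The data and the name ref are made up; day is stored as character, as the call above assumes.

```r
# Made-up data with dates stored as character
df1 <- data.frame(day = c("2024-01-01", "2024-01-20", "2024-01-28"),
                  value = 1:3)

ref <- as.Date("2024-01-31")   # stands in for Sys.Date()
ref - 21                       # 2024-01-10
recent <- subset(df1, as.Date(day) > ref - 21)
recent
```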

Get rows with same date and month for each year

Convert the column to a Date object, and then you can filter:

library(dplyr)
library(lubridate)

result <- df %>%
  mutate(Date = mdy(Date)) %>%
  filter(month(Date) == 5 & day(Date) == 2)

In base R:

df$Date <- as.Date(df$Date, '%m-%d-%Y')
result <- subset(df, format(Date, '%m-%d') == '05-02')

#        Date      Name
#1 2010-05-02 Alexander
#3 2011-05-02 Alexander
#5 2018-05-02     Chris
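Another base-R option reads the month and day straight from a POSIXlt object. The catch is that POSIXlt months are 0-based, so May is 4, not 5. This is a sketch on made-up data in the same month-day-year text format; lt is a hypothetical name.

```r
# Made-up data in month-day-year text format
df <- data.frame(Date = c("05-02-2010", "06-02-2010", "05-02-2011"),
                 Name = c("A", "B", "C"))

lt <- as.POSIXlt(as.Date(df$Date, "%m-%d-%Y"))
# POSIXlt months are 0-based (January = 0), so May is 4
result <- df[lt$mon == 4 & lt$mday == 2, ]
result
```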

R: How to filter/subset a sequence of dates

You could use subset.

Generating your sample data:

temp <-
read.table(text="date     sessions
2014-12-01  1932
2014-12-02  1828
2014-12-03  2349
2014-12-04  8192
2014-12-05  3188
2014-12-06  3277", header=TRUE)

Making sure it's in date format:

temp$date <- as.Date(temp$date, format= "%Y-%m-%d")

temp

#         date sessions
# 1 2014-12-01     1932
# 2 2014-12-02     1828
# 3 2014-12-03     2349
# 4 2014-12-04     8192
# 5 2014-12-05     3188
# 6 2014-12-06     3277

Using subset (when a Date column is compared with a character string, R converts the string to Date first):

subset(temp, date > "2014-12-03" & date < "2014-12-05")

which gives:

#         date sessions
# 4 2014-12-04     8192

You could also use []:

temp[(temp$date > "2014-12-03" & temp$date < "2014-12-05"),]
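The strict > and < above exclude both endpoints, which is why only 2014-12-04 is returned. If you want an inclusive range, or an arbitrary set of days, one option is to build the sequence with seq() and test membership with %in%. This sketch rebuilds temp from the sample data so it runs on its own; wanted and in_range are hypothetical names. Using >= and <= in the subset() call would give the same rows for a contiguous range.

```r
temp <- data.frame(
  date = as.Date("2014-12-01") + 0:5,
  sessions = c(1932, 1828, 2349, 8192, 3188, 3277)
)

# every day from 2014-12-03 to 2014-12-05, both ends included
wanted <- seq(as.Date("2014-12-03"), as.Date("2014-12-05"), by = "day")
in_range <- subset(temp, date %in% wanted)
in_range
```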
