How to filter rows based on difference in dates between rows in R?
An alternative that uses slice from dplyr is to define the following recursive function:
library(dplyr)
f <- function(d, ind=1) {
ind.next <- first(which(difftime(d,d[ind], units="days") > 90))
if (is.na(ind.next))
return(ind)
else
return(c(ind, f(d,ind.next)))
}
This function operates on the date column, starting at ind = 1. It finds the next index, ind.next, which is the first index whose date is more than 90 days (at least 91 days) after the date indexed by ind. If there is no such index, ind.next is NA and we just return ind. Otherwise, we recursively call f starting at ind.next and return its result concatenated with ind. The end result of this function call is the vector of row indices whose dates are separated by at least 91 days.
With this function, we can do:
result <- df %>% group_by(id) %>% slice(f(as.Date(date, format="%Y-%m-%d")))
##Source: local data frame [4 x 3]
##Groups: id [2]
##
## id var1 date
## <int> <chr> <chr>
##1 1 A 2006-01-01
##2 1 C 2006-06-02
##3 1 E 2007-12-01
##4 2 F 2007-04-20
The use of this function assumes that the date column is sorted in ascending order within each id group. If not, we can just sort the dates before slicing. Not sure about the efficiency of this or the dangers of recursive calls in R. Hopefully, David Arenburg or others can comment on this.
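On the recursion concern: a loop can apply the same "more than 90 days after the last kept row" rule without nested calls. The following is only a sketch under the same sorted-dates assumption; f_iter, its gap argument, and the sample dates are made up for illustration and are not part of the original answer.

```r
# Hypothetical iterative counterpart of f(); base R only.
# Assumes d is a Date vector sorted in ascending order.
f_iter <- function(d, gap = 90) {
  if (length(d) == 0) return(integer(0))
  keep <- 1L   # the first row is always kept
  last <- 1L   # index of the most recently kept row
  for (i in seq_along(d)[-1]) {
    # keep row i only if it is more than `gap` days after the last kept row
    if (as.numeric(difftime(d[i], d[last], units = "days")) > gap) {
      keep <- c(keep, i)
      last <- i
    }
  }
  keep
}

# Made-up dates for illustration
d <- as.Date(c("2006-01-01", "2006-01-02", "2006-06-02",
               "2006-07-01", "2007-12-01"))
f_iter(d)
# [1] 1 3 5
```

It should be usable in the same place as f, for example slice(f_iter(date)) after group_by(id).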
As suggested by David Arenburg, it is better to convert date to the Date class once, up front, rather than separately within each group:
result <- df %>% mutate(date=as.Date(date, format="%Y-%m-%d")) %>%
group_by(id) %>% slice(f(date))
##Source: local data frame [4 x 3]
##Groups: id [2]
##
## id var1 date
## <int> <chr> <date>
##1 1 A 2006-01-01
##2 1 C 2006-06-02
##3 1 E 2007-12-01
##4 2 F 2007-04-20
How to select rows from a dataset between two dates?
The solution to what you are asking is straightforward, because you can in fact filter on dates and compare dates in multiple columns. Please try the code below and confirm for yourself that this works as you would expect. If this approach does not work on your own dataset, please share more about your data and processing because there is probably an error in your code. (One error I already saw: you can't use select(Date < Surgery_date). You need to use filter.)
This is how I would approach your problem. As you can see, the code is very straightforward.
library(dplyr)      # %>%, mutate(), filter()
library(lubridate)  # ymd(), days()
# Sample data; the date columns are parsed from character to Date
df <- data.frame(
  Name = c(rep('Pierre', 3), rep('Paul', 3)),
  Date = c('2016-03-15', '2017-03-26', '2017-08-09', '2016-07-03', '2016-09-30', '2017-04-12'),
  Measurement = c(5.12, 4.16, 5.08, 5.47, 4.98, 4.51),
  Surgery_date = c(rep('2017-03-21', 3), rep('2017-03-25', 3))
) %>%
  mutate(Surgery_date = ymd(Surgery_date),
         Date = ymd(Date))
df %>%
filter(Date < Surgery_date)
df %>%
filter(Date > Surgery_date & Date < (Surgery_date + days(5)))
df %>%
filter(Date > Surgery_date)
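One detail worth checking on the sample data is the boundary of the 5-day window. The sketch below repeats the same dates in base R (no dplyr or lubridate) and compares a strict upper bound with an inclusive one; the variable names days_after, strict and inclusive are made up for illustration.

```r
# Same sample dates as above, built directly as Date columns
df <- data.frame(
  Name = c(rep("Pierre", 3), rep("Paul", 3)),
  Date = as.Date(c("2016-03-15", "2017-03-26", "2017-08-09",
                   "2016-07-03", "2016-09-30", "2017-04-12")),
  Surgery_date = as.Date(c(rep("2017-03-21", 3), rep("2017-03-25", 3)))
)

# Whole days between surgery and measurement (negative = before surgery)
days_after <- as.numeric(df$Date - df$Surgery_date)

# Strict upper bound, as in the filter above: a measurement taken
# exactly 5 days after surgery is excluded
strict <- df[days_after > 0 & days_after < 5, ]

# Inclusive upper bound keeps that measurement
inclusive <- df[days_after > 0 & days_after <= 5, ]

nrow(strict)     # 0
nrow(inclusive)  # 1 (Pierre, 2017-03-26)
```

Whether the window should be strict or inclusive depends on the question being asked; with this data the choice decides whether any row is returned at all.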
Remove rows based on condition and date difference between different events in R with dplyr
This is probably not the cleanest solution either, but pivoting to wide format and then back to long works:
library(tidyverse)
library(lubridate)
dat %>%
separate(name, into = c("name", "gest"), fill = "right") %>%
pivot_wider(names_from = name, values_from = c(date, gest)) %>%
mutate(date_BREEDING = if_else((date_GESTATION - date_BREEDING) %in% c(34, 35, 36), NA_Date_, date_BREEDING)) %>%
pivot_longer(cols = c(date_BREEDING, date_OTHER, date_GESTATION), values_to = "date", values_drop_na = TRUE) %>%
select(-gest_BREEDING, -gest_OTHER) %>%
mutate(name = str_sub(name, 6))
The output is:
id gest_GESTATION name date
<dbl> <chr> <chr> <date>
1 10 NA BREEDING 2019-05-17
2 10 NA OTHER 2020-01-01
3 11 POSITIF BREEDING 2020-07-01
4 11 POSITIF GESTATION 2020-09-01
5 12 NEGATIF GESTATION 2020-08-01
6 21 POSITIF OTHER 2018-06-20
7 21 POSITIF GESTATION 2018-10-15
8 22 POSITIF GESTATION 2020-09-11
This has the additional advantage of saving whether "GESTATION" is positive or negative in a separate variable. If you do not need that and want exactly the desired output specified in your question, you can add:
%>%
mutate(name = if_else(is.na(gest_GESTATION), name, str_c(name, gest_GESTATION, sep = " "))) %>%
select(-gest_GESTATION)
How to find time difference between previous and following rows from specific rows
Using fuzzyjoin might be useful here:
library(dplyr)
library(fuzzyjoin)
df_grp <- df %>%
filter(start == "yes") %>%
select(time) %>%
group_by(grp = row_number()) %>%
mutate(begin = time - 5,
end = time + 5)
First we create a data.frame of the start times together with their -5 and +5 bounds:
# A tibble: 2 x 4
time grp begin end
<dbl> <int> <dbl> <dbl>
1 2.82 1 -2.17 7.82
2 16.8 2 11.8 21.8
Next we use fuzzy_left_join to attach it to the original data.frame and calculate the differences:
df %>%
fuzzy_left_join(df_grp,
by = c("time" = "begin", "time" = "end"),
match_fun = list(`>`, `<`)) %>%
group_by(grp) %>%
mutate(diff = time.x - time.y) %>%
ungroup()
This returns
# A tibble: 14 x 8
initiate start time.x time.y grp begin end diff
<int> <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0 no 2.82 2.82 1 -2.17 7.82 -0.00250
2 0 no 2.82 2.82 1 -2.17 7.82 -0.00125
3 1 yes 2.82 2.82 1 -2.17 7.82 0
4 1 no 2.83 2.82 1 -2.17 7.82 0.00125
5 1 no 2.83 2.82 1 -2.17 7.82 0.00200
6 1 no 2.83 2.82 1 -2.17 7.82 0.00225
7 0 no 16.8 16.8 2 11.8 21.8 -0.0137
8 0 no 16.8 16.8 2 11.8 21.8 -0.0112
9 0 no 16.8 16.8 2 11.8 21.8 -0.00120
10 1 yes 16.8 16.8 2 11.8 21.8 0
11 1 no 16.8 16.8 2 11.8 21.8 0.00380
12 0 no 16.8 16.8 2 11.8 21.8 0.00500
13 1 no 16.8 16.8 2 11.8 21.8 0.00630
14 1 no 16.8 16.8 2 11.8 21.8 0.00880
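If fuzzyjoin is not available, the same interval match can be approximated in base R with a cross join followed by a filter. This is only a sketch on made-up data (the time values, starts and pairs are hypothetical), and unlike fuzzy_left_join it drops rows that fall in no window instead of keeping them with NA.

```r
# Made-up data: two "yes" rows mark the start times
df <- data.frame(
  time  = c(1, 2.5, 3, 9, 16, 17, 30),
  start = c("no", "no", "yes", "no", "no", "yes", "no")
)

# One row per start time, with a group id
starts <- data.frame(
  grp        = seq_len(sum(df$start == "yes")),
  start_time = df$time[df$start == "yes"]
)

# Cross join (by = NULL gives the Cartesian product), then keep only
# the rows that lie strictly within 5 units of a start time
pairs <- merge(df, starts, by = NULL)
pairs <- pairs[pairs$time > pairs$start_time - 5 &
               pairs$time < pairs$start_time + 5, ]

# Signed difference to the matching start time
pairs$diff <- pairs$time - pairs$start_time
pairs
```

The cross join grows with the number of rows times the number of start times, so for large data the fuzzy join above is likely the better fit.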
Filtering rows in a data frame based on date column
We can use subset:
subset(df1, as.Date(day) > Sys.Date()-21)
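To see what that condition keeps, here is a small self-contained sketch. The df1 built here, its value column, and the today variable are made up for illustration; the day column is stored as character to mirror the as.Date(day) conversion above.

```r
today <- Sys.Date()

# Made-up rows dated 1, 10, 20, 21 and 40 days before today
df1 <- data.frame(
  day   = as.character(today - c(1, 10, 20, 21, 40)),
  value = 1:5
)

# Keep rows strictly newer than 21 days ago; the row dated exactly
# 21 days ago is excluded because the comparison is `>`, not `>=`
recent <- subset(df1, as.Date(day) > today - 21)
recent$value
# [1] 1 2 3
```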
Get rows with same date and month for each year
Convert to a Date object and then you can filter:
library(dplyr)
library(lubridate)
result <- df %>%
mutate(Date = mdy(Date)) %>%
filter(month(Date) == 5 & day(Date) == 2)
In base R -
df$Date <- as.Date(df$Date, '%m-%d-%Y')
result <- subset(df, format(Date, '%m-%d') == '05-02')
# Date Name
#1 2010-05-02 Alexander
#3 2011-05-02 Alexander
#5 2018-05-02 Chris
R: How to filter/subset a sequence of dates
You could use subset.
Generating your sample data:
temp<-
read.table(text="date sessions
2014-12-01 1932
2014-12-02 1828
2014-12-03 2349
2014-12-04 8192
2014-12-05 3188
2014-12-06 3277", header=TRUE)
Making sure it's in date format:
temp$date <- as.Date(temp$date, format= "%Y-%m-%d")
temp
# date sessions
# 1 2014-12-01 1932
# 2 2014-12-02 1828
# 3 2014-12-03 2349
# 4 2014-12-04 8192
# 5 2014-12-05 3188
# 6 2014-12-06 3277
Using subset:
subset(temp, date> "2014-12-03" & date < "2014-12-05")
which gives:
# date sessions
# 4 2014-12-04 8192
You could also use []:
temp[(temp$date> "2014-12-03" & temp$date < "2014-12-05"),]
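The comparisons above rely on R converting the character bounds to dates for you. A slightly more explicit sketch, with the bounds stored as Date objects and an inclusive variant for comparison, could look like this (from, to, exclusive and inclusive are names chosen here for illustration):

```r
# Same sample data, built directly as a Date sequence
temp <- data.frame(
  date     = as.Date("2014-12-01") + 0:5,
  sessions = c(1932, 1828, 2349, 8192, 3188, 3277)
)

# Explicit Date bounds
from <- as.Date("2014-12-03")
to   <- as.Date("2014-12-05")

# Strictly between the bounds: only 2014-12-04
exclusive <- subset(temp, date > from & date < to)

# Including both bounds: 2014-12-03 through 2014-12-05
inclusive <- subset(temp, date >= from & date <= to)

exclusive$sessions  # 8192
inclusive$sessions  # 2349 8192 3188
```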