Calculate Difference Between Dates by Group in R

how to calculate time difference between dates by group

if we need to group by 'location'

library(dplyr)
df %>%
group_by(location) %>%
mutate(time.diff.mins = difftime(date.time, lag(date.time), unit = 'min'))

Calculating difference between dates based on grouping one or more columns

We can use diff to subtract Date and select groups where there is at-least one value which is less than equal to 5 days.

library(dplyr)

df %>%
group_by(id, Buyer) %>%
filter(any(diff(Date) <= 5))

# id Date Buyer
# <dbl> <date> <chr>
# 1 9 2018-11-29 Jenny
# 2 9 2018-11-29 Jenny
# 3 9 2018-11-29 Jenny
# 4 5 2018-05-25 Chunfei
# 5 5 2019-02-13 Chunfei
# 6 5 2019-02-16 Chunfei
# 7 5 2019-02-16 Chunfei
# 8 5 2019-02-23 Chunfei
# 9 5 2019-02-25 Chunfei
#10 8 2019-02-28 Chunfei
#11 8 2019-02-28 Chunfei

After re-reading the question I think you might not be looking to filter entire groups but only those rows which have difference of 5 days. We can get indices which have diff value of less than 5 and select it's previous index as well.

df %>%
group_by(id, Buyer) %>%
mutate(diff = c(NA, diff(Date))) %>%
slice({i1 <- which(diff <= 5); unique(c(i1, i1-1))}) %>%
select(-diff)

# id Date Buyer
# <dbl> <date> <chr>
# 1 5 2019-02-16 Chunfei
# 2 5 2019-02-16 Chunfei
# 3 5 2019-02-25 Chunfei
# 4 5 2019-02-13 Chunfei
# 5 5 2019-02-23 Chunfei
# 6 8 2019-02-28 Chunfei
# 7 8 2019-02-28 Chunfei
# 8 9 2018-11-29 Jenny
# 9 9 2018-11-29 Jenny
#10 9 2018-11-29 Jenny

data

df <- structure(list(id = c(9, 9, 9, 4, 4, 4, 5, 5, 5, 5, 5, 5, 8, 
8), Date = structure(c(17864, 17864, 17864, 17681, 17716, 17760,
17676, 17940, 17943, 17943, 17950, 17952, 17955, 17955), class = "Date"),
Buyer = c("Jenny", "Jenny", "Jenny", "Chang", "Chang", "Chang",
"Chunfei", "Chunfei", "Chunfei", "Chunfei", "Chunfei", "Chunfei",
"Chunfei", "Chunfei")), row.names = c(NA, -14L), class = "data.frame")

r group by date difference with respect to first date

Change Date1 to date class.

df$Date1 = as.Date(df$Date1)

You can subtract with the first value in each Id. This can be done using dplyr.

library(dplyr)
df %>% group_by(Id) %>% mutate(Diff = as.integer(Date1 - first(Date1)))

# Id Date1 Cars Diff
# <int> <date> <int> <int>
# 1 1 2007-04-05 72 0
# 2 2 2014-01-07 12 0
# 3 2 2018-07-09 10 1644
# 4 2 2018-07-09 13 1644
# 5 3 2005-11-19 22 0
# 6 3 2005-11-23 13 4
# 7 4 2010-06-17 38 0
# 8 4 2010-09-23 57 98
# 9 4 2010-09-23 41 98
#10 4 2010-10-04 17 109

data.table

setDT(df)[, Diff := as.integer(Date1 - first(Date1)), Id]

OR base R :

df$diff <- with(df, ave(as.integer(Date1), Id, FUN = function(x) x - x[1]))

Replace 0's to NA if you want output as such.

Calculate difference between multiple rows by a group in R

You can use match to get the corresponding sbd value at wk 1 and 2.

library(dplyr)

df %>%
group_by(code, tmp) %>%
summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])

# code tmp diff
# <chr> <chr> <dbl>
#1 abc01 T1 -0.67
#2 abc01 T2 0.34

If you want to add a new column in the dataframe keeping the rows same, use mutate instead of summarise.

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1",
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83,
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22,
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)),
class = "data.frame", row.names = c(NA, -18L))

Calculating the difference of two values within a group and given period (R)

You can do as follows:

df$Date <- as.Date(df$Date, format = "%Y-%m-%d")

df1 <- df %>%
group_by(Country) %>%
mutate(diffValue = Value[Date == as.Date("2021-01-04")] - Value[Date == as.Date("2021-01-02")])

It will give you output like this:

df1
# A tibble: 12 x 4
# Groups: Country [3]
Country Date Value diffValue
<chr> <date> <dbl> <dbl>
1 Germany 2021-01-01 5 3
2 Germany 2021-01-02 6 3
3 Germany 2021-01-03 7 3
4 Germany 2021-01-04 9 3
5 USA 2021-01-01 1 2
6 USA 2021-01-02 3 2
7 USA 2021-01-03 4 2
8 USA 2021-01-04 5 2
9 Canada 2021-01-01 0 10
10 Canada 2021-01-02 5 10
11 Canada 2021-01-03 10 10
12 Canada 2021-01-04 15 10

P.S: I've hardcoded the dates in the code, to comply with your question.

EDIT
In order to get the nearest date corresponding to what you are looking for, you can use the birk library. There's a function called which.closest. It will give you the nearest value.

The code looks like below:

df1 <- df %>%
group_by(Country) %>%
mutate(diffValue = Value[Date == as.Date("2021-01-04")] -
Value[Date == as.Date(Date[which.closest(df$Date, as.Date("2020-12-31"))])])

And output:

# A tibble: 12 x 4
# Groups: Country [3]
Country Date Value diffValue
<chr> <date> <dbl> <dbl>
1 Germany 2021-01-01 5 4
2 Germany 2021-01-02 6 4
3 Germany 2021-01-03 7 4
4 Germany 2021-01-04 9 4
5 USA 2021-01-01 1 4
6 USA 2021-01-02 3 4
7 USA 2021-01-03 4 4
8 USA 2021-01-04 5 4
9 Canada 2021-01-01 0 15
10 Canada 2021-01-02 5 15
11 Canada 2021-01-03 10 15
12 Canada 2021-01-04 15 15

In the above example, I have checked the nearest date in the second part and not the first. You can use the same syntax there as well.

Calculate the difference in time between two dates and add them to a new column

You need to make some changes in your code.

First and foremost, don't use $ in dplyr pipes. Pipes (%>%) were created to avoid using df$column_name everytime you want to use variable from the dataframe. Using $ can have unintended consequences when grouping the data or using rowwise as you can see in your case.

Secondly, difftime is vectorised so no need of rowwise here.

Finally, if you want time difference in minutes you should change the values to POSIXct type and not dates. Try the following -

library(dplyr)

df <- df %>%
mutate(trip_duration = difftime(as.POSIXct(`end time`),
as.POSIXct(`start time`), units = "mins"))

Calculate difference b/w dates by group in R - unsolved

It works with just dplyr loaded.

Change summarize to dplyr::summarize to make it unambiguous. I would suggest not using plyr as you can do everything with dplyr and tidyverse.



Related Topics



Leave a reply



Submit