how to calculate time difference between dates by group
if we need to group by 'location'
library(dplyr)
df %>%
group_by(location) %>%
mutate(time.diff.mins = difftime(date.time, lag(date.time), unit = 'min'))
Calculating difference between dates based on grouping one or more columns
We can use diff
to subtract Date
and select groups where there is at-least one value which is less than equal to 5 days.
library(dplyr)
df %>%
group_by(id, Buyer) %>%
filter(any(diff(Date) <= 5))
# id Date Buyer
# <dbl> <date> <chr>
# 1 9 2018-11-29 Jenny
# 2 9 2018-11-29 Jenny
# 3 9 2018-11-29 Jenny
# 4 5 2018-05-25 Chunfei
# 5 5 2019-02-13 Chunfei
# 6 5 2019-02-16 Chunfei
# 7 5 2019-02-16 Chunfei
# 8 5 2019-02-23 Chunfei
# 9 5 2019-02-25 Chunfei
#10 8 2019-02-28 Chunfei
#11 8 2019-02-28 Chunfei
After re-reading the question I think you might not be looking to filter
entire groups but only those rows which have difference of 5 days. We can get indices which have diff
value of less than 5 and select it's previous index as well.
df %>%
group_by(id, Buyer) %>%
mutate(diff = c(NA, diff(Date))) %>%
slice({i1 <- which(diff <= 5); unique(c(i1, i1-1))}) %>%
select(-diff)
# id Date Buyer
# <dbl> <date> <chr>
# 1 5 2019-02-16 Chunfei
# 2 5 2019-02-16 Chunfei
# 3 5 2019-02-25 Chunfei
# 4 5 2019-02-13 Chunfei
# 5 5 2019-02-23 Chunfei
# 6 8 2019-02-28 Chunfei
# 7 8 2019-02-28 Chunfei
# 8 9 2018-11-29 Jenny
# 9 9 2018-11-29 Jenny
#10 9 2018-11-29 Jenny
data
df <- structure(list(id = c(9, 9, 9, 4, 4, 4, 5, 5, 5, 5, 5, 5, 8,
8), Date = structure(c(17864, 17864, 17864, 17681, 17716, 17760,
17676, 17940, 17943, 17943, 17950, 17952, 17955, 17955), class = "Date"),
Buyer = c("Jenny", "Jenny", "Jenny", "Chang", "Chang", "Chang",
"Chunfei", "Chunfei", "Chunfei", "Chunfei", "Chunfei", "Chunfei",
"Chunfei", "Chunfei")), row.names = c(NA, -14L), class = "data.frame")
r group by date difference with respect to first date
Change Date1
to date class.
df$Date1 = as.Date(df$Date1)
You can subtract with the first value in each Id
. This can be done using dplyr
.
library(dplyr)
df %>% group_by(Id) %>% mutate(Diff = as.integer(Date1 - first(Date1)))
# Id Date1 Cars Diff
# <int> <date> <int> <int>
# 1 1 2007-04-05 72 0
# 2 2 2014-01-07 12 0
# 3 2 2018-07-09 10 1644
# 4 2 2018-07-09 13 1644
# 5 3 2005-11-19 22 0
# 6 3 2005-11-23 13 4
# 7 4 2010-06-17 38 0
# 8 4 2010-09-23 57 98
# 9 4 2010-09-23 41 98
#10 4 2010-10-04 17 109
data.table
setDT(df)[, Diff := as.integer(Date1 - first(Date1)), Id]
OR base R :
df$diff <- with(df, ave(as.integer(Date1), Id, FUN = function(x) x - x[1]))
Replace 0's to NA
if you want output as such.
Calculate difference between multiple rows by a group in R
You can use match
to get the corresponding sbd
value at wk
1 and 2.
library(dplyr)
df %>%
group_by(code, tmp) %>%
summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])
# code tmp diff
# <chr> <chr> <dbl>
#1 abc01 T1 -0.67
#2 abc01 T2 0.34
If you want to add a new column in the dataframe keeping the rows same, use mutate
instead of summarise
.
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1",
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83,
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22,
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)),
class = "data.frame", row.names = c(NA, -18L))
Calculating the difference of two values within a group and given period (R)
You can do as follows:
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
df1 <- df %>%
group_by(Country) %>%
mutate(diffValue = Value[Date == as.Date("2021-01-04")] - Value[Date == as.Date("2021-01-02")])
It will give you output like this:
df1
# A tibble: 12 x 4
# Groups: Country [3]
Country Date Value diffValue
<chr> <date> <dbl> <dbl>
1 Germany 2021-01-01 5 3
2 Germany 2021-01-02 6 3
3 Germany 2021-01-03 7 3
4 Germany 2021-01-04 9 3
5 USA 2021-01-01 1 2
6 USA 2021-01-02 3 2
7 USA 2021-01-03 4 2
8 USA 2021-01-04 5 2
9 Canada 2021-01-01 0 10
10 Canada 2021-01-02 5 10
11 Canada 2021-01-03 10 10
12 Canada 2021-01-04 15 10
P.S: I've hardcoded the dates in the code, to comply with your question.
EDIT
In order to get the nearest date corresponding to what you are looking for, you can use the birk
library. There's a function called which.closest
. It will give you the nearest value.
The code looks like below:
df1 <- df %>%
group_by(Country) %>%
mutate(diffValue = Value[Date == as.Date("2021-01-04")] -
Value[Date == as.Date(Date[which.closest(df$Date, as.Date("2020-12-31"))])])
And output:
# A tibble: 12 x 4
# Groups: Country [3]
Country Date Value diffValue
<chr> <date> <dbl> <dbl>
1 Germany 2021-01-01 5 4
2 Germany 2021-01-02 6 4
3 Germany 2021-01-03 7 4
4 Germany 2021-01-04 9 4
5 USA 2021-01-01 1 4
6 USA 2021-01-02 3 4
7 USA 2021-01-03 4 4
8 USA 2021-01-04 5 4
9 Canada 2021-01-01 0 15
10 Canada 2021-01-02 5 15
11 Canada 2021-01-03 10 15
12 Canada 2021-01-04 15 15
In the above example, I have checked the nearest date in the second part and not the first. You can use the same syntax there as well.
Calculate the difference in time between two dates and add them to a new column
You need to make some changes in your code.
First and foremost, don't use $
in dplyr
pipes. Pipes (%>%
) were created to avoid using df$column_name
everytime you want to use variable from the dataframe. Using $
can have unintended consequences when grouping the data or using rowwise
as you can see in your case.
Secondly, difftime
is vectorised so no need of rowwise
here.
Finally, if you want time difference in minutes you should change the values to POSIXct
type and not dates. Try the following -
library(dplyr)
df <- df %>%
mutate(trip_duration = difftime(as.POSIXct(`end time`),
as.POSIXct(`start time`), units = "mins"))
Calculate difference b/w dates by group in R - unsolved
It works with just dplyr
loaded.
Change summarize
to dplyr::summarize
to make it unambiguous. I would suggest not using plyr
as you can do everything with dplyr
and tidyverse.
Related Topics
How to Pop Up the Graphics Window from Rscript
Convert Numeric Vector to Binary (0/1) Based on Limit
R Bnlearn Eval Inside Function
How to Create a Dropdown List in a Shiny Table Using Datatable When Editing the Table
Choose Specific Number with Probability
Making Sure a Function Does Not Use a Global Variable
Add Months of Zero Demand to Zoo Time Series
R Cmd Check Not Looking for Gcc in Rtools Directory
How to Highlight Area Between Two Lines? Ggplot
Change Values in Row Based on a Column Value R
Twitter Throws Forbidden Error After Entering Twitter API Pin
R Geom_Tile Ggplot2 What Kind of Stat Is Applied
Tls V1.1/Tls V1.2 Support in Rcurl
Format a Vector of Rows in Italic and Red Font in R Dt (Datatable)