Date-time differences between rows in R
Try this (I am assuming that your data is in a data.frame called mydf and that you want the difference between the first timestamp and all subsequent timestamps):
c_time <- as.POSIXlt( mydf$c_time )
difftime( c_time[1] , c_time[2:length(c_time)] )
#Time differences in secs
#[1] -59.886 -120.373
#attr(,"tzone")
#[1] ""
Edit
But if you want the difference between successive timestamps, you need to reverse your observations (the first way round you get time1 - time2, which is negative), so you can use this instead:
c_time <- rev( c_time )
difftime(c_time[1:(length(c_time)-1)] , c_time[2:length(c_time)])
#Time differences in secs
#[1] 60.487 59.886
#attr(,"tzone")
#[1] ""
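As an alternative to reversing the vector, here is a minimal sketch that uses diff(), which subtracts each element from the one after it. The timestamps are hypothetical, built as offsets from an arbitrary base time so that the gaps match the output above; the names base_time, gaps and gaps_secs are only for illustration.

```r
# Hypothetical timestamps: an arbitrary base time plus offsets in seconds
base_time <- as.POSIXct("2013-04-02 14:00:00", tz = "UTC")
c_time <- base_time + c(0, 59.886, 120.373)

# diff() returns a difftime of (later - earlier) for each consecutive pair,
# so the values are positive when the data is in chronological order
gaps <- diff(c_time)

# Force a known unit before using the numbers elsewhere
gaps_secs <- as.numeric(gaps, units = "secs")
print(gaps_secs)
```

This returns one value fewer than the number of timestamps and leaves the original order untouched.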
Time difference between rows in R dplyr, different units
ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
mutate(time_diff_2 = as.numeric(Cl_Date-lag(Cl_Date), units = 'mins'))
Convert the time difference to a numeric value. You can use the units argument to make the returned values consistent.
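To see why the units argument matters, here is a small base R sketch with two made-up timestamps. Subtracting POSIXct values picks a unit automatically, so the bare number depends on how large the gap is.

```r
# Two made-up timestamps, 2.5 hours apart
t1 <- as.POSIXct("2020-01-01 10:00:00", tz = "UTC")
t2 <- as.POSIXct("2020-01-01 12:30:00", tz = "UTC")

d <- t2 - t1                      # difftime; unit chosen automatically
auto_unit <- units(d)             # "hours" for this gap

in_mins <- as.numeric(d, units = "mins")   # same gap expressed in minutes
in_secs <- as.numeric(d, units = "secs")   # same gap expressed in seconds
print(c(mins = in_mins, secs = in_secs))
```

Asking for the unit explicitly means every row is on the same scale, whatever unit R would have chosen on its own.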
How to calculate time difference in consecutive rows
When you just add default = strptime(v_time, "%d/%m/%Y %H:%M")[1] to the lag part:
df <- df %>%
arrange(visitor, v_time) %>%
group_by(visitor) %>%
mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]),
diff_secs = as.numeric(diff, units = 'secs'))
you get the result you expect:
> df
# A tibble: 8 x 6
# Groups: visitor [3]
visitor v_time payment items diff diff_secs
<fct> <fct> <dbl> <dbl> <time> <dbl>
1 David 1/2/2018 16:12 25. 2. 0 0.
2 David 1/2/2018 16:21 25. 5. 540 540.
3 Jack 1/2/2018 16:07 35. 3. 0 0.
4 Jack 1/2/2018 16:09 160. 1. 120 120.
5 Jack 1/2/2018 16:32 85. 5. 1380 1380.
6 Jack 1/2/2018 16:55 6. 2. 1380 1380.
7 Kate 1/2/2018 16:16 3. 3. 0 0.
8 Kate 1/2/2018 16:33 639. 3. 1020 1020.
Another option is to use difftime:
df <- df %>%
arrange(visitor, v_time) %>%
group_by(visitor) %>%
mutate(diff = difftime(strptime(v_time, "%d/%m/%Y %H:%M"), lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]), units = 'mins'),
diff_secs = as.numeric(diff, units = 'secs'))
Now the diff column is in minutes and the diff_secs column is in seconds:
> df
# A tibble: 8 x 6
# Groups: visitor [3]
visitor v_time payment items diff diff_secs
<fct> <fct> <dbl> <dbl> <time> <dbl>
1 David 1/2/2018 16:12 25. 2. 0 0.
2 David 1/2/2018 16:21 25. 5. 9 540.
3 Jack 1/2/2018 16:07 35. 3. 0 0.
4 Jack 1/2/2018 16:09 160. 1. 2 120.
5 Jack 1/2/2018 16:32 85. 5. 23 1380.
6 Jack 1/2/2018 16:55 6. 2. 23 1380.
7 Kate 1/2/2018 16:16 3. 3. 0 0.
8 Kate 1/2/2018 16:33 639. 3. 17 1020.
You can now save the result again with write.csv(df,"C:/output.csv", row.names = FALSE)
Calculate the difference in time between two dates and add them to a new column
You need to make some changes in your code.
First and foremost, don't use $ in dplyr pipes. Pipes (%>%) were created to avoid writing df$column_name every time you want to use a variable from the data frame. Using $ can have unintended consequences when grouping the data or using rowwise, as you can see in your case.
Secondly, difftime is vectorised, so there is no need for rowwise here.
Finally, if you want the time difference in minutes, you should convert the values to POSIXct rather than to dates. Try the following:
library(dplyr)
df <- df %>%
mutate(trip_duration = difftime(as.POSIXct(`end time`),
as.POSIXct(`start time`), units = "mins"))
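A short base R sketch of the last point, using made-up start and end strings: as.Date() keeps only the calendar date, so two times on the same day come out as zero minutes apart, while as.POSIXct() keeps the time of day.

```r
# Made-up trip on a single day
start_time <- "2021-03-01 08:15:00"
end_time   <- "2021-03-01 09:45:00"

# Dates drop the time of day, so the difference collapses to 0
by_date <- difftime(as.Date(end_time), as.Date(start_time), units = "mins")

# POSIXct keeps hours and minutes, so the difference is the real duration
by_time <- difftime(as.POSIXct(end_time, tz = "UTC"),
                    as.POSIXct(start_time, tz = "UTC"),
                    units = "mins")

print(as.numeric(by_date))
print(as.numeric(by_time))
```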
R Difference in time between rows
You can use lag and difftime (per Hadley):
df %>%
mutate(time = as.POSIXct(start, format = "%m/%d/%y %H:%M")) %>%
group_by(id) %>%
mutate(diff = difftime(time, lag(time)))
# A tibble: 6 x 4
# Groups: id [2]
id start time diff
<dbl> <fct> <dttm> <time>
1 1. 1/31/17 10:00 2017-01-31 10:00:00 <NA>
2 1. 1/31/17 10:02 2017-01-31 10:02:00 2
3 1. 1/31/17 10:45 2017-01-31 10:45:00 43
4 2. 2/10/17 12:00 2017-02-10 12:00:00 <NA>
5 2. 2/10/17 12:20 2017-02-10 12:20:00 20
6 2. 2/11/17 09:40 2017-02-11 09:40:00 1280
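If dplyr is not available, the same per-group lag can be sketched in base R with ave(). This is a swapped-in technique rather than the approach above, and it assumes the rows are already in time order within each id. The data below is retyped from the output shown, and the column name diff_mins is only for illustration.

```r
# Data retyped from the output above
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  start = c("1/31/17 10:00", "1/31/17 10:02", "1/31/17 10:45",
            "2/10/17 12:00", "2/10/17 12:20", "2/11/17 09:40"),
  stringsAsFactors = FALSE
)
df$time <- as.POSIXct(df$start, format = "%m/%d/%y %H:%M", tz = "UTC")

# Within each id: NA for the first row, then current minus previous.
# Working on the numeric seconds keeps the unit fixed; divide by 60 for minutes.
df$diff_mins <- ave(as.numeric(df$time), df$id,
                    FUN = function(x) c(NA, diff(x))) / 60
print(df)
```

Replacing NA with 0 inside the function gives the same effect as passing a default to lag.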
How to find time difference between previous and following rows from specific rows
Using fuzzyjoin might be useful here:
library(dplyr)
library(fuzzyjoin)
df_grp <- df %>%
filter(start == "yes") %>%
select(time) %>%
group_by(grp = row_number()) %>%
mutate(begin = time - 5,
end = time + 5)
First we create a data.frame of the rows where start is "yes", with begin and end set to time - 5 and time + 5:
# A tibble: 2 x 4
time grp begin end
<dbl> <int> <dbl> <dbl>
1 2.82 1 -2.17 7.82
2 16.8 2 11.8 21.8
Next we use a fuzzy_join to attach it to the original data.frame and calculate the differences:
df %>%
fuzzy_left_join(df_grp,
by = c("time" = "begin", "time" = "end"),
match_fun = list(`>`, `<`)) %>%
group_by(grp) %>%
mutate(diff = time.x - time.y) %>%
ungroup()
This returns
# A tibble: 14 x 8
initiate start time.x time.y grp begin end diff
<int> <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0 no 2.82 2.82 1 -2.17 7.82 -0.00250
2 0 no 2.82 2.82 1 -2.17 7.82 -0.00125
3 1 yes 2.82 2.82 1 -2.17 7.82 0
4 1 no 2.83 2.82 1 -2.17 7.82 0.00125
5 1 no 2.83 2.82 1 -2.17 7.82 0.00200
6 1 no 2.83 2.82 1 -2.17 7.82 0.00225
7 0 no 16.8 16.8 2 11.8 21.8 -0.0137
8 0 no 16.8 16.8 2 11.8 21.8 -0.0112
9 0 no 16.8 16.8 2 11.8 21.8 -0.00120
10 1 yes 16.8 16.8 2 11.8 21.8 0
11 1 no 16.8 16.8 2 11.8 21.8 0.00380
12 0 no 16.8 16.8 2 11.8 21.8 0.00500
13 1 no 16.8 16.8 2 11.8 21.8 0.00630
14 1 no 16.8 16.8 2 11.8 21.8 0.00880
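The window logic behind the join can also be sketched in base R, which may help when checking what the fuzzy join is doing. The data below is made up and much coarser than the original, and unlike fuzzy_left_join this sketch drops rows that fall outside every window instead of keeping them with NA.

```r
# Made-up data: two "yes" rows at time 4 and time 22
df <- data.frame(
  time  = c(1, 3, 4, 6, 20, 22, 30),
  start = c("no", "no", "yes", "no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)
anchors <- df$time[df$start == "yes"]

# For each anchor, keep rows strictly inside (anchor - 5, anchor + 5)
# and record the offset from the anchor
res <- do.call(rbind, lapply(seq_along(anchors), function(g) {
  a <- anchors[g]
  keep <- df$time > a - 5 & df$time < a + 5
  data.frame(df[keep, ], grp = g, diff = df$time[keep] - a)
}))
print(res)
```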
R calculating time differences in a (layered) long dataset
Using base R (no extra packages):
- sort the data, ordering by customer Id, then by timestamp.
- calculate the time difference between consecutive rows (using the diff() function), grouping by customer id (tapply() does the grouping).
- find the average
- squish that into a data.frame.
# 1 sort the data
df$Timestamp <- as.POSIXct(df$Timestamp)
# not debugged
df <- df[order(df$Customer, df$Timestamp),]
# 2 apply a diff.
# if you want to force the time units to seconds, convert
# the timestamp to numeric first.
# without conversion
diffs <- tapply(df$Timestamp, df$Customer, diff)
# ======OR======
# convert to seconds
diffs <- tapply(as.numeric(df$Timestamp), df$Customer, diff)
# 3 find the averages
diffs.mean <- lapply(diffs, mean)
# 4 squish that into a data.frame
diffs.df <- data.frame(do.call(rbind, diffs.mean))
diffs.df$Customer <- names(diffs.mean)
# 4a tidy up the data.frame names
names(diffs.df)[1] <- "Avg_Interval"
diffs.df
You haven't shown your timestamp strings, but when you need to wrangle them, the lubridate package is your friend.
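Since the code above is marked as not debugged, here is a small check of the same steps on made-up data. The customers and timestamps are hypothetical, and sapply() is used in place of lapply() only to get a plain vector of averages.

```r
# Made-up data: customer A has gaps of 600 s and 1200 s, customer B one gap of 300 s
df <- data.frame(
  Customer  = c("A", "B", "A", "A", "B"),
  Timestamp = as.POSIXct(c("2020-01-01 10:30:00", "2020-01-01 09:05:00",
                           "2020-01-01 10:00:00", "2020-01-01 10:10:00",
                           "2020-01-01 09:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# 1 sort by customer, then by timestamp
df <- df[order(df$Customer, df$Timestamp), ]

# 2 consecutive differences in seconds within each customer
diffs <- tapply(as.numeric(df$Timestamp), df$Customer, diff)

# 3 average interval per customer
avg_interval <- sapply(diffs, mean)
print(avg_interval)
```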
How to calculate difference in time between variable rows in R?
Here is a solution using data.table:
work[status %in% c("start", "end"),
time.diff := ifelse(status == "start",
difftime(shift(dt, fill = NA, type = "lead"), dt, units = "hours"), NA),
by = worker][status == "start", sum(time.diff), worker]
we get:
worker V1
1: VOuRp 580.4989
2: u8zw5 540.0453
where V1 has the sum of all hours from each start-end interval for each worker.
Let's explain it step by step for better understanding.
STEP 1: Select all rows with start or end status:
work.se <- work[status %in% c("start", "end")]
dt worker status
1: 2012-01-04 23:11:31 VOuRp start
2: 2012-01-20 16:27:31 VOuRp end
3: 2012-01-22 15:34:05 VOuRp start
4: 2012-01-31 02:48:01 VOuRp end
5: 2012-01-04 10:24:38 u8zw5 start
6: 2012-01-18 03:53:15 u8zw5 end
7: 2012-01-21 03:48:08 u8zw5 start
8: 2012-01-29 22:22:14 u8zw5 end
>
STEP 2: Create a function for calculating the time differences between the current row and the next one. This function will be invoked inside the data.table object. We use the shift function from the same package:
getDiff <- function(x) {
difftime(shift(x, fill = NA, type = "lead"), x, units = "hours")
}
getDiff computes the time difference between the next record (within the group) and the current one. It assigns NA to the last row because there is no next value. Later we exclude the NA values from the calculation.
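To make the "lead" logic concrete, here is a base R sketch of the same calculation for a single worker, using the four VOuRp rows from STEP 1. It replaces shift() with a shifted copy of the numeric timestamps; the names secs, lead_hours and total are only for illustration, and UTC is an assumed time zone.

```r
# The four start/end rows for worker VOuRp, retyped from STEP 1
dt <- as.POSIXct(c("2012-01-04 23:11:31", "2012-01-20 16:27:31",
                   "2012-01-22 15:34:05", "2012-01-31 02:48:01"), tz = "UTC")
status <- c("start", "end", "start", "end")

# Next timestamp minus current timestamp, in hours; the last row has no next value
secs <- as.numeric(dt)
lead_hours <- (c(secs[-1], NA) - secs) / 3600

# Keep only the start -> end intervals, as ifelse() does in the data.table version
time_diff <- ifelse(status == "start", lead_hours, NA)

total <- sum(time_diff, na.rm = TRUE)
print(round(time_diff, 4))
print(round(total, 4))
```

The two non-NA values and their sum agree with the figures reported for VOuRp.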
STEP 3: Invoke it within the data.table syntax:
work.result <- work.se[, time.diff := ifelse(status == "start",
getDiff(dt), NA), by = worker]
we get this:
dt worker status time.diff
1: 2012-01-04 23:11:31 VOuRp start 377.2667
2: 2012-01-20 16:27:31 VOuRp end NA
3: 2012-01-22 15:34:05 VOuRp start 203.2322
4: 2012-01-31 02:48:01 VOuRp end NA
5: 2012-01-04 10:24:38 u8zw5 start 329.4769
6: 2012-01-18 03:53:15 u8zw5 end NA
7: 2012-01-21 03:48:08 u8zw5 start 210.5683
8: 2012-01-29 22:22:14 u8zw5 end NA
STEP 4: Sum the non-NA values of the time.diff column for each worker:
> work.result[status == "start", sum(time.diff), worker]
worker V1
1: VOuRp 580.4989
2: u8zw5 540.0453
data.table calls can be chained by appending another [], so the last two steps can be consolidated into a single statement:
work.se[, time.diff := ifelse(status == "start",
getDiff(dt), NA), by = worker][status == "start", sum(time.diff), worker]
FINAL: Putting it all together into one single statement:
work[status %in% c("start", "end"),
time.diff := ifelse(status == "start",
difftime(shift(dt, fill = NA, type = "lead"), dt, units = "hours"), NA),
by = worker][status == "start", sum(time.diff), worker]
See the data.table documentation for the basic syntax.
I hope this helps; please let us know if it is what you wanted.