Date-Time Differences Between Rows in R

Try this, assuming your data are in a data.frame called mydf and that you want the difference between the first timestamp and all subsequent timestamps:

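# parse the timestamp column into a date-time class
c_time <- as.POSIXlt( mydf$c_time )
# first timestamp minus every later one, which is why the values are negative
difftime( c_time[1] , c_time[2:length(c_time)] )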
#Time differences in secs
#[1] -59.886 -120.373
#attr(,"tzone")
#[1] ""
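If you would rather get positive values in fixed units, you can swap the arguments of difftime() (it computes time1 - time2) and set units explicitly. A minimal sketch; the mydf contents below are hypothetical, not the asker's data:

```r
# Hypothetical timestamps, one per row (not the asker's data)
mydf <- data.frame(c_time = c("2018-01-02 16:00:00",
                              "2018-01-02 16:01:00",
                              "2018-01-02 16:03:00"))
c_time <- as.POSIXct(mydf$c_time, tz = "UTC")

# every later timestamp minus the first one, so the values are positive;
# units = "secs" stops difftime() from choosing the unit on its own
since_first <- difftime(c_time[-1], c_time[1], units = "secs")
print(as.numeric(since_first))
```

Here the result is 60 and 180 seconds.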

If you instead want the difference between consecutive timestamps, reverse the observations first (in the original order you get time1 - time2, which is negative), and then use:

c_time <- rev( c_time )
difftime(c_time[1:(length(c_time)-1)] , c_time[2:length(c_time)])
#Time differences in secs
#[1] 60.487 59.886
#attr(,"tzone")
#[1] ""
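For consecutive differences, base R's diff() also works directly on a POSIXct vector: it returns later-minus-earlier values as a difftime, in the original order, so no reversal is needed. A small sketch with made-up timestamps:

```r
# Hypothetical timestamps, one and two minutes apart (not the asker's data)
c_time <- as.POSIXct(c("2018-01-02 16:00:00",
                       "2018-01-02 16:01:00",
                       "2018-01-02 16:03:00"), tz = "UTC")

# diff() returns c_time[i + 1] - c_time[i] as a difftime with automatic units;
# as.numeric(..., units = "secs") pins the result to seconds
deltas <- as.numeric(diff(c_time), units = "secs")
print(deltas)
```

This gives 60 and 120 seconds.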

Time difference between rows in R dplyr, different units

library(dplyr)
ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
# difference to the previous row within each MDN, always expressed in minutes
mutate(time_diff_2 = as.numeric(Cl_Date - lag(Cl_Date), units = 'mins'))

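Convert the time difference to a numeric value. The units argument makes the returned values consistent; otherwise subtracting two date-times picks the unit (seconds, minutes, hours, ...) automatically from the size of the differences.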
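A self-contained sketch of the approach above. The data are hypothetical (only the column names MDN and Cl_Date come from the question), the result is stored in `calls` rather than `ts` to avoid masking base R's ts(), and arrange(.by_group = TRUE) is used so rows are sorted within each MDN:

```r
library(dplyr)

# Hypothetical call log: only the column names MDN and Cl_Date follow the answer
calls <- data.frame(
  MDN = c("a", "a", "a", "b", "b"),
  Cl_Date = as.POSIXct(c("2020-01-01 10:00:00", "2020-01-01 10:00:30",
                         "2020-01-01 12:00:30", "2020-01-02 09:00:00",
                         "2020-01-02 09:45:00"), tz = "UTC")
)

calls <- calls %>%
  group_by(MDN) %>%
  # sort within each MDN so lag() sees the previous call of the same MDN
  arrange(Cl_Date, .by_group = TRUE) %>%
  # explicit units: every group is reported in minutes (first row of a group is NA)
  mutate(time_diff_2 = as.numeric(Cl_Date - lag(Cl_Date), units = "mins")) %>%
  ungroup()

print(calls)
```

time_diff_2 comes out as NA, 0.5, 120 for MDN a and NA, 45 for MDN b, all in minutes, even though the smallest gap (30 seconds) would otherwise have been reported in seconds.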

How to calculate time difference in consecutive rows

If you add default = strptime(v_time, "%d/%m/%Y %H:%M")[1] to the lag() call, the first row of each visitor is compared with itself, so its difference is 0 instead of NA:

library(dplyr)

df <- df %>%
arrange(visitor, v_time) %>%
group_by(visitor) %>%
# the lag() default is the group's first timestamp, so the first difference is 0
mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]),
diff_secs = as.numeric(diff, units = 'secs'))

You get the result you expect:

> df
# A tibble: 8 x 6
# Groups: visitor [3]
visitor v_time payment items diff diff_secs
<fct> <fct> <dbl> <dbl> <time> <dbl>
1 David 1/2/2018 16:12 25. 2. 0 0.
2 David 1/2/2018 16:21 25. 5. 540 540.
3 Jack 1/2/2018 16:07 35. 3. 0 0.
4 Jack 1/2/2018 16:09 160. 1. 120 120.
5 Jack 1/2/2018 16:32 85. 5. 1380 1380.
6 Jack 1/2/2018 16:55 6. 2. 1380 1380.
7 Kate 1/2/2018 16:16 3. 3. 0 0.
8 Kate 1/2/2018 16:33 639. 3. 1020 1020.

Another option is to use difftime:

df <- df %>%
arrange(visitor, v_time) %>%
group_by(visitor) %>%
mutate(diff = difftime(strptime(v_time, "%d/%m/%Y %H:%M"), lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]), units = 'mins'),
diff_secs = as.numeric(diff, units = 'secs'))

Now the diff column is in minutes and the diff_secs column is in seconds:

> df
# A tibble: 8 x 6
# Groups: visitor [3]
visitor v_time payment items diff diff_secs
<fct> <fct> <dbl> <dbl> <time> <dbl>
1 David 1/2/2018 16:12 25. 2. 0 0.
2 David 1/2/2018 16:21 25. 5. 9 540.
3 Jack 1/2/2018 16:07 35. 3. 0 0.
4 Jack 1/2/2018 16:09 160. 1. 2 120.
5 Jack 1/2/2018 16:32 85. 5. 23 1380.
6 Jack 1/2/2018 16:55 6. 2. 23 1380.
7 Kate 1/2/2018 16:16 3. 3. 0 0.
8 Kate 1/2/2018 16:33 639. 3. 17 1020.

You can now save the result again with write.csv(df,"C:/output.csv", row.names = FALSE)
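Note that arrange(visitor, v_time) sorts the raw strings, which is only chronological when they all share the same date. A sketch that parses the timestamps once with as.POSIXct() and uses dplyr's first() as the lag() default; the visit data and the v_dt and diff_mins column names are hypothetical:

```r
library(dplyr)

# Hypothetical visits with day/month/year strings like those in the question
df <- data.frame(visitor = c("Jack", "David", "Jack", "David", "Jack"),
                 v_time  = c("1/2/2018 16:09", "1/2/2018 16:12", "1/2/2018 16:07",
                             "1/2/2018 16:21", "1/2/2018 16:32"))

df <- df %>%
  # parse once into POSIXct so sorting is chronological, not alphabetical
  mutate(v_dt = as.POSIXct(v_time, format = "%d/%m/%Y %H:%M", tz = "UTC")) %>%
  arrange(visitor, v_dt) %>%
  group_by(visitor) %>%
  # first() as the lag() default makes each visitor's first difference 0, not NA
  mutate(diff_mins = as.numeric(difftime(v_dt, lag(v_dt, default = first(v_dt)),
                                         units = "mins"))) %>%
  ungroup()

print(df)
```

David's rows get 0 and 9 minutes, Jack's get 0, 2 and 23 minutes.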

Calculate the difference in time between two dates and add them to a new column

You need to make some changes in your code.

First and foremost, don't use $ inside dplyr verbs. Verbs such as mutate() already look up column names inside the data frame, so you never need to write df$column_name there. Inside a verb, df$column_name refers to the whole original column and ignores any grouping or rowwise(), which has unintended consequences, as you can see in your case.

Second, difftime() is vectorised, so rowwise() isn't needed here.

Finally, if you want the time difference in minutes, convert the values to POSIXct rather than Date: a Date carries no time of day, so differences between dates are whole days. Try the following:

library(dplyr)

df <- df %>%
mutate(trip_duration = difftime(as.POSIXct(`end time`),
as.POSIXct(`start time`), units = "mins"))
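To see why POSIXct matters, here is a small sketch with two hypothetical trips (the column names `start time` and `end time` follow the answer; the values are made up). The same difference computed on Date values loses the time of day:

```r
library(dplyr)

# Hypothetical trips; check.names = FALSE keeps the spaces in the column names
df <- data.frame(`start time` = c("2021-03-01 08:00:00", "2021-03-01 09:15:00"),
                 `end time`   = c("2021-03-01 08:25:00", "2021-03-01 10:45:00"),
                 check.names = FALSE)

df <- df %>%
  mutate(trip_duration = difftime(as.POSIXct(`end time`, tz = "UTC"),
                                  as.POSIXct(`start time`, tz = "UTC"),
                                  units = "mins"),
         # the same difference on Date values: time of day is dropped, so 0 days
         date_only = as.Date(`end time`) - as.Date(`start time`))

print(df)
```

trip_duration is 25 and 90 minutes, while date_only is 0 days for both trips. Passing tz = "UTC" is an assumption made here to keep daylight-saving shifts out of the example.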

R Difference in time between rows

You can use lag and difftime (per Hadley):

library(dplyr)

df %>%
mutate(time = as.POSIXct(start, format = "%m/%d/%y %H:%M")) %>%
group_by(id) %>%
mutate(diff = difftime(time, lag(time), units = "mins"))

# A tibble: 6 x 4
# Groups: id [2]
id start time diff
<dbl> <fct> <dttm> <time>
1 1. 1/31/17 10:00 2017-01-31 10:00:00 <NA>
2 1. 1/31/17 10:02 2017-01-31 10:02:00 2
3 1. 1/31/17 10:45 2017-01-31 10:45:00 43
4 2. 2/10/17 12:00 2017-02-10 12:00:00 <NA>
5 2. 2/10/17 12:20 2017-02-10 12:20:00 20
6 2. 2/11/17 09:40 2017-02-11 09:40:00 1280

How to find time difference between previous and following rows from specific rows

Using fuzzyjoin might be useful here:

library(dplyr)
library(fuzzyjoin)

df_grp <- df %>%
filter(start == "yes") %>%
select(time) %>%
group_by(grp = row_number()) %>%
mutate(begin = time - 5,
end = time + 5)

This first creates a data.frame of your start times, each with a window from time - 5 to time + 5:

# A tibble: 2 x 4
time grp begin end
<dbl> <int> <dbl> <dbl>
1 2.82 1 -2.17 7.82
2 16.8 2 11.8 21.8

Next, fuzzy_left_join() attaches each row of the original data.frame to the window it falls into (strictly between begin and end), and the differences are calculated within each window:

df %>% 
fuzzy_left_join(df_grp,
by = c("time" = "begin", "time" = "end"),
match_fun = list(`>`, `<`)) %>%
group_by(grp) %>%
mutate(diff = time.x - time.y) %>%
ungroup()

This returns:

# A tibble: 14 x 8
initiate start time.x time.y grp begin end diff
<int> <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0 no 2.82 2.82 1 -2.17 7.82 -0.00250
2 0 no 2.82 2.82 1 -2.17 7.82 -0.00125
3 1 yes 2.82 2.82 1 -2.17 7.82 0
4 1 no 2.83 2.82 1 -2.17 7.82 0.00125
5 1 no 2.83 2.82 1 -2.17 7.82 0.00200
6 1 no 2.83 2.82 1 -2.17 7.82 0.00225
7 0 no 16.8 16.8 2 11.8 21.8 -0.0137
8 0 no 16.8 16.8 2 11.8 21.8 -0.0112
9 0 no 16.8 16.8 2 11.8 21.8 -0.00120
10 1 yes 16.8 16.8 2 11.8 21.8 0
11 1 no 16.8 16.8 2 11.8 21.8 0.00380
12 0 no 16.8 16.8 2 11.8 21.8 0.00500
13 1 no 16.8 16.8 2 11.8 21.8 0.00630
14 1 no 16.8 16.8 2 11.8 21.8 0.00880
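If you would rather avoid the fuzzyjoin dependency, the same window matching can be sketched in base R by pairing every row with every start time and keeping the pairs that fall strictly inside the ±5 window. This is a swapped-in technique, not the answer's method; the data are hypothetical, and unlike fuzzy_left_join() it drops rows that fall in no window (an inner rather than a left join):

```r
# Hypothetical data in the shape of the question: a time column and a start flag
df <- data.frame(time  = c(1.0, 2.5, 3.0, 3.2, 10.0, 20.0, 21.0, 22.5),
                 start = c("no", "no", "yes", "no", "no", "no", "yes", "no"))

starts <- df$time[df$start == "yes"]

# every (row, start) combination; grp numbers the start times like row_number()
pairs <- expand.grid(row = seq_len(nrow(df)), grp = seq_along(starts))

# keep a pair when the row lies strictly inside (start - 5, start + 5),
# mirroring match_fun = list(`>`, `<`) in the fuzzy join
inside <- df$time[pairs$row] > starts[pairs$grp] - 5 &
          df$time[pairs$row] < starts[pairs$grp] + 5
pairs <- pairs[inside, ]

result <- data.frame(time = df$time[pairs$row],
                     grp  = pairs$grp,
                     diff = df$time[pairs$row] - starts[pairs$grp])
print(result)
```

The row at time 10.0 is outside both windows and therefore does not appear; every other row gets its offset from the start time of its window.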

R calculating time differences in a (layered) long dataset

Using base R (no extra packages):

  1. Sort the data by customer ID, then by timestamp.
  2. Calculate the time difference between consecutive rows with diff(), grouped by customer ID (tapply() does the grouping).
  3. Find the average for each customer.
  4. Squish that into a data.frame.
# 1 sort the data
df$Timestamp <- as.POSIXct(df$Timestamp)
# not debugged
df <- df[order(df$Customer, df$Timestamp),]

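# 2 apply diff() within each customer.
# diff() on POSIXct picks its units automatically (secs, mins, hours, ...)
# and may pick different units for different customers; to force seconds,
# convert the timestamp to numeric first.

# without conversion (units may differ between customers)
diffs <- tapply(df$Timestamp, df$Customer, diff)
# ======OR======
# convert to seconds
diffs <- tapply(as.numeric(df$Timestamp), df$Customer, diff)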

# 3 find the averages
diffs.mean <- lapply(diffs, mean)

# 4 squish that into a data.frame
diffs.df <- data.frame(do.call(rbind, diffs.mean))
diffs.df$Customer <- names(diffs.mean)

# 4a tidy up the data.frame names
names(diffs.df)[1] <- "Avg_Interval"
diffs.df
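A compact, runnable variant of the same four steps, assuming hypothetical Customer and Timestamp columns holding ISO-formatted strings. Computing the mean inside tapply() on the numeric timestamps keeps every customer's average in seconds:

```r
# Hypothetical long data set: several timestamps per customer, deliberately unsorted
df <- data.frame(
  Customer  = c("B", "A", "A", "B", "A", "B"),
  Timestamp = c("2020-05-01 10:10:00", "2020-05-01 09:00:00",
                "2020-05-01 09:02:00", "2020-05-01 10:00:00",
                "2020-05-01 09:10:00", "2020-05-01 10:40:00"),
  stringsAsFactors = FALSE
)

# 1 parse, then sort by customer and timestamp
df$Timestamp <- as.POSIXct(df$Timestamp, tz = "UTC")
df <- df[order(df$Customer, df$Timestamp), ]

# 2 + 3 gaps between consecutive rows (in seconds) and their mean, per customer
avg_secs <- tapply(as.numeric(df$Timestamp), df$Customer,
                   function(x) mean(diff(x)))

# 4 one row per customer
diffs.df <- data.frame(Customer = names(avg_secs),
                       Avg_Interval = as.vector(avg_secs),
                       stringsAsFactors = FALSE)
print(diffs.df)
```

Customer A's gaps are 120 and 480 seconds (average 300); customer B's are 600 and 1800 seconds (average 1200).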

You haven't shown your timestamp strings, but when you need to wrangle them, the lubridate package is your friend.

How to calculate difference in time between variable rows in R?

Here is a solution using data.table:

library(data.table)

work[status %in% c("start", "end"), 
time.diff := ifelse(status == "start",
difftime(shift(dt, fill = NA, type = "lead"), dt, units = "hours"), NA),
by = worker][status == "start", sum(time.diff), worker]

we get:

 worker       V1
1: VOuRp 580.4989
2: u8zw5 540.0453

where V1 holds, for each worker, the total number of hours over all start-end intervals.

Let's explain it step by step for better understanding.

STEP 1: Select all rows with start or end status:

work.se <- work[status %in% c("start", "end")]

dt worker status
1: 2012-01-04 23:11:31 VOuRp start
2: 2012-01-20 16:27:31 VOuRp end
3: 2012-01-22 15:34:05 VOuRp start
4: 2012-01-31 02:48:01 VOuRp end
5: 2012-01-04 10:24:38 u8zw5 start
6: 2012-01-18 03:53:15 u8zw5 end
7: 2012-01-21 03:48:08 u8zw5 start
8: 2012-01-29 22:22:14 u8zw5 end

STEP 2: Create a function that calculates the time difference between the current row and the next one. It will be called inside the data.table expression, and it uses the shift() function from the same package:

getDiff <- function(x) {
difftime(shift(x, fill = NA, type = "lead"), x, units = "hours")
}

getDiff computes the time difference between the next record (within the group) and the current one. The last row of each group gets NA because there is no next value; those NA values are left out of the total because only the start rows are summed.

STEP 3: Invoke it within the data.table syntax:

work.result <- work.se[, time.diff := ifelse(status == "start", 
getDiff(dt), NA), by = worker]

we get this:

                    dt worker status time.diff
1: 2012-01-04 23:11:31 VOuRp start 377.2667
2: 2012-01-20 16:27:31 VOuRp end NA
3: 2012-01-22 15:34:05 VOuRp start 203.2322
4: 2012-01-31 02:48:01 VOuRp end NA
5: 2012-01-04 10:24:38 u8zw5 start 329.4769
6: 2012-01-18 03:53:15 u8zw5 end NA
7: 2012-01-21 03:48:08 u8zw5 start 210.5683
8: 2012-01-29 22:22:14 u8zw5 end NA

STEP 4: Sum the non-NA values of the time.diff column for each worker:

> work.result[status == "start", sum(time.diff), worker]
worker V1
1: VOuRp 580.4989
2: u8zw5 540.0453

data.table calls can be chained by appending another [...], so the last two steps can be consolidated into a single statement:

work.se[, time.diff := ifelse(status == "start", 
getDiff(dt), NA), by = worker][status == "start", sum(time.diff), worker]

FINAL: Putting it all together into a single statement:

work[status %in% c("start", "end"), 
time.diff := ifelse(status == "start",
difftime(shift(dt, fill = NA, type = "lead"), dt, units = "hours"), NA),
by = worker][status == "start", sum(time.diff), worker]
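A quick way to check the one-liner is to run it on a tiny hypothetical log (the values are made up; only the column names dt, worker and status follow the answer). The pause row is ignored because only start and end rows are selected before shift() looks for the next record:

```r
library(data.table)

# Hypothetical log, already sorted by worker and time; "pause" rows are not start/end
work <- data.table(
  dt = as.POSIXct(c("2012-01-04 08:00:00", "2012-01-04 12:00:00",
                    "2012-01-04 16:00:00", "2012-01-05 08:00:00",
                    "2012-01-05 10:30:00"), tz = "UTC"),
  worker = c("w1", "w1", "w1", "w2", "w2"),
  status = c("start", "pause", "end", "start", "end")
)

# same chain as in the answer: hours from each start to the next selected row
# (its end) within the worker, then summed over that worker's start rows
res <- work[status %in% c("start", "end"),
            time.diff := ifelse(status == "start",
                                difftime(shift(dt, fill = NA, type = "lead"), dt,
                                         units = "hours"), NA),
            by = worker][status == "start", sum(time.diff), worker]
print(res)
```

Worker w1 gets 8 hours (08:00 to 16:00, skipping the pause) and w2 gets 2.5 hours.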

See the data.table documentation for its basic syntax.
I hope this helps; please let us know if it is what you wanted.


