Calculating the Difference Between Consecutive Rows by Group Using Dplyr

Calculating the difference between consecutive rows by group using dplyr?

Group the data by id, then subtract the previous row's time from the current one with lag():

library(dplyr)

dat %>% 
  group_by(id) %>% 
  mutate(time.difference = time - lag(time))
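
A minimal reproducible sketch of the same pipeline. The data frame and its values are made up for illustration; only the id and time column names come from the snippet above.

```r
library(dplyr)

# Hypothetical data: two ids, rows already sorted by time within each id
dat <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  time = c(3, 7, 12, 5, 6)
)

res <- dat %>%
  group_by(id) %>%
  mutate(time.difference = time - lag(time)) %>%
  ungroup()

# The first row of each id has no previous row, so its difference is NA
res$time.difference
# NA  4  5 NA  1
```

lag() follows the current row order, so if the rows are not already in time order it is probably worth calling arrange(id, time) first.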

Calculate difference between values in consecutive rows by group

The package data.table can do this fairly quickly, using the shift function.

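library(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10, 20, 25, 5, 10, 15))
# setDT(df)  # use this instead if df is already a data.frame

# shift() lags value by one row within each group; := adds the column by reference
df[ , diff := value - shift(value), by = group]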
#   group value diff
#1:     1    10   NA
#2:     1    20   10
#3:     1    25    5
#4:     2     5   NA
#5:     2    10    5
#6:     2    15    5
setDF(df)  # optional: convert back to a plain data.frame

Or use the lag function in dplyr, starting from the same group and value columns:

library(dplyr)

df %>%
    group_by(group) %>%
    mutate(Diff = value - lag(value))
#   group value  Diff
#   <dbl> <dbl> <dbl>
# 1     1    10    NA
# 2     1    20    10
# 3     1    25     5
# 4     2     5    NA
# 5     2    10     5
# 6     2    15     5


R Difference between consecutive rows while retaining first row

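Use default = 0 in lag(). The first row of each group then has 0 subtracted from it instead of NA, so the first time value is retained rather than turned into NA.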

library(dplyr)

df %>% group_by(id) %>% mutate(rt = time - lag(time, default = 0))

# A tibble: 10 × 3
# Groups:   id [2]
      id   time     rt
   <int>  <dbl>  <dbl>
 1     1  26204  26204
 2     1  46692  20488
 3     1  60268  13576
 4     1  86240  25972
 5     1  91872   5632
 6     2 291242 291242
 7     2 312311  21069
 8     2 333983  21672
 9     2 355122  21139
10     2 364841   9719
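
To see what default changes, here is a small sketch on a plain vector, using the first three times of id 1 from the output above. The variable names are only illustrative.

```r
library(dplyr)

time <- c(26204, 46692, 60268)

lag(time)               # NA 26204 46692
lag(time, default = 0)  #     0 26204 46692

# Without a default the first difference is NA
with_na <- time - lag(time)
# NA 20488 13576

# With default = 0 the first value is kept as is
with_zero <- time - lag(time, default = 0)
# 26204 20488 13576
```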

Calculate difference between multiple rows by a group in R

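You can use match to get the sbd value at wek 1 and wek 2 within each group. match returns the position of the first match only, so this relies on sbd being constant for a given wek within each code and tmp combination, as it is in the data below.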

library(dplyr)

df %>%
  group_by(code, tmp) %>%
  summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])

#  code  tmp    diff
#  <chr> <chr> <dbl>
#1 abc01 T1    -0.67
#2 abc01 T2     0.34

If you want to add the difference as a new column while keeping every row of the data frame, use mutate instead of summarise.

Data

It is easier to help if you provide data in a reproducible format, such as the dput() output below:

df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1", 
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", 
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83, 
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22, 
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)), 
class = "data.frame", row.names = c(NA, -18L))
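
The mutate variant mentioned above could look roughly like this. It is a sketch on a six-row subset with values copied from the data above; df_small and out are names chosen for illustration.

```r
library(dplyr)

# Six rows in the same shape as the full data (values taken from it)
df_small <- data.frame(
  code = "abc01",
  tmp  = c("T1", "T1", "T1", "T2", "T2", "T2"),
  wek  = c(1L, 2L, 1L, 1L, 2L, 1L),
  sbd  = c(7.83, 8.5, 7.83, 7.56, 7.22, 7.56)
)

# match(2, wek) is the position of the first row where wek == 2
first_wek2 <- match(2, df_small$wek)
# 2

out <- df_small %>%
  group_by(code, tmp) %>%
  mutate(diff = sbd[match(1, wek)] - sbd[match(2, wek)]) %>%
  ungroup()

# All six rows are kept; every row in a group carries that group's diff
out$diff
# -0.67 -0.67 -0.67  0.34  0.34  0.34
```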

Calculate difference between values by group and matched for time

You're nearly there! The key is to convert Tb_Period to an ordered factor, such that PreI is treated as "less than" DayI, which is in turn less than PostI. Once this is established, we can group by each bird and hour, and sort by Tb_Period to ensure that differences are calculated in the correct order:

df <- read.table(text = 'Tb_Period  Hour  Bird_ID  Treatment  meanHourTb
PreI        9      3500       LPS    41.55000
PreI        10     3500       LPS    41.75000       
DayI        9      3500       LPS    40.88182
DayI        10     3500       LPS    41.24000', header = T, stringsAsFactors = F)

library(dplyr)

df <- df %>% 
  mutate(Tb_Period = factor(Tb_Period, c('PreI', 'DayI', 'PostI'), ordered = TRUE)) %>% 
  arrange(Tb_Period) %>%   # sort so PreI rows come before DayI rows
  group_by(Bird_ID, Hour) %>% 
  mutate(diff = meanHourTb - lag(meanHourTb, 1))

# A tibble: 4 x 6
# Groups:   Bird_ID, Hour [2]
  Tb_Period  Hour Bird_ID Treatment meanHourTb     diff
      <ord> <int>   <int>     <chr>      <dbl>    <dbl>
1      PreI     9    3500       LPS   41.55000       NA
2      PreI    10    3500       LPS   41.75000       NA
3      DayI     9    3500       LPS   40.88182 -0.66818
4      DayI    10    3500       LPS   41.24000 -0.51000
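
Why the factor conversion matters can be seen in a few lines of base R. This is only an illustration with the three period labels; the level order is what drives sorting, and ordered = TRUE additionally allows comparisons with < and >.

```r
# Levels listed in chronological order, as in the answer above
lv <- c("PreI", "DayI", "PostI")
period <- factor(c("DayI", "PostI", "PreI"), levels = lv, ordered = TRUE)

# Sorting the factor follows the level order
sorted_ordered <- as.character(sort(period))
# "PreI" "DayI" "PostI"

# Sorting plain character strings is alphabetical, which puts DayI first
sorted_plain <- sort(c("DayI", "PostI", "PreI"))
# "DayI" "PostI" "PreI"

# Ordered factors can be compared directly
prei_before_dayi <- period[3] < period[1]
# TRUE
```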

Keep first row after calculating difference between rows with dplyr::lag

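A possible solution: compute the weight difference within each individual, overwrite wt with that individual's first weight, and keep only the last row per individual: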

library(tidyverse)

df <- structure(list(ind_id = c(1002, 1002, 2340, 2340), wt = c(25, 
15, 30, 52), date = structure(c(6416, 6699, 6285, 7166), class = "Date")), row.names = c(NA, 
-4L), class = "data.frame")

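df %>% 
  group_by(ind_id) %>%
  mutate(mass_diff = wt - lag(wt)) %>%   # change in weight from the previous record
  mutate(wt = first(wt)) %>%             # keep the first recorded weight
  slice_tail(n = 1) %>%                  # one row per individual: the last record
  ungroup()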

#> # A tibble: 2 × 4
#>   ind_id    wt date       mass_diff
#>    <dbl> <dbl> <date>         <dbl>
#> 1   1002    25 1988-05-05       -10
#> 2   2340    30 1989-08-15        22

Divide difference in consecutive rows by number of NA rows in between and reassign fraction to NA rows

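You could use na.approx from package zoo, which fills the NA values by linear interpolation, and then take the lagged difference of the filled series: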

library(dplyr)
library(zoo)

df %>% 
  mutate(diff = na.approx(x) - lag(na.approx(x)))

which gives you

# A tibble: 5 x 3
  date           x  diff
  <date>     <dbl> <dbl>
1 2021-01-01     0    NA
2 2021-01-02    10    10
3 2021-01-02    30    20
4 2021-01-03    NA    15
5 2021-01-04    60    15

With lag(na.approx(x), default = 0) you can handle the NA at the beginning of your data.frame: the first difference is then computed against 0 instead of being NA.
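
A short sketch of why interpolation answers the question, using the x values from the output above on a plain vector. The second part is an assumption worth checking on your own data: it concerns NA values at the very start or end of the column, which the example data does not have.

```r
library(zoo)

x <- c(0, 10, 30, NA, 60)

# Linear interpolation fills the gap halfway between 30 and 60
filled <- as.numeric(na.approx(x))
# 0 10 30 45 60

# The 30-point jump is split evenly over the NA row and the row after it
steps <- diff(filled)
# 10 20 15 15

# With the default na.rm = TRUE, leading or trailing NAs are dropped, so the
# result is shorter than the input and no longer fits into mutate().
# na.rm = FALSE keeps them as NA and preserves the length.
kept <- as.numeric(na.approx(c(NA, 10, NA, 30), na.rm = FALSE))
# NA 10 20 30
```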
