Calculating the Difference Between Consecutive Rows by Group Using Dplyr

Calculating the difference between consecutive rows by group using dplyr?

Like this:

dat %>% 
group_by(id) %>%
mutate(time.difference = time - lag(time))
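
A minimal reproducible sketch of the same pattern. The question's data is not shown here, so the contents of dat below are made up for illustration:

```r
library(dplyr)

# Hypothetical data: two ids, each with three increasing time stamps
dat <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  time = c(5, 8, 14, 2, 3, 9)
)

res <- dat %>%
  group_by(id) %>%
  # lag(time) is the previous row's time within the same id;
  # the first row of each group has no predecessor, so it gets NA
  mutate(time.difference = time - lag(time)) %>%
  ungroup()

res$time.difference
# NA 3 6 NA 1 6
```

The NA at the start of each group shows that lag() restarts at every group boundary instead of reaching back into the previous id.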

Calculate difference between values in consecutive rows by group

The package data.table can do this fairly quickly, using the shift function.

library(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10, 20, 25, 5, 10, 15))
# setDT(df) # use this instead if df is already a data.frame

df[ , diff := value - shift(value), by = group]
# group value diff
#1: 1 10 NA
#2: 1 20 10
#3: 1 25 5
#4: 2 5 NA
#5: 2 10 5
#6: 2 15 5
setDF(df) #if you want to convert back to old data.frame syntax

Or using the lag function in dplyr

df %>%
group_by(group) %>%
mutate(Diff = value - lag(value))
# group value Diff
# <dbl> <dbl> <dbl>
# 1 1 10 NA
# 2 1 20 10
# 3 1 25 5
# 4 2 5 NA
# 5 2 10 5
# 6 2 15 5
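
If neither package is available, the same column can probably be built in base R with ave() and diff(). This is a sketch rather than part of the original answer, and it rebuilds df as a plain data.frame with the same values:

```r
# Same example data as above, as a plain data.frame
df <- data.frame(group = rep(c(1, 2), each = 3), value = c(10, 20, 25, 5, 10, 15))

# ave() applies FUN to the values of each group and returns the results
# in the original row order; diff() yields n - 1 differences, so an NA is
# prepended to keep one value per row
df$diff <- ave(df$value, df$group, FUN = function(v) c(NA, diff(v)))

df$diff
# NA 10 5 NA 5 5
```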


R Difference between consecutive rows while retaining first row

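Pass default = 0 to lag(). The first row of each group is then compared with 0 instead of NA, so it keeps its own time value as the difference.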

library(dplyr)

df %>% group_by(id) %>% mutate(rt = time - lag(time, default = 0))

# A tibble: 10 × 3
# Groups: id [2]
id time rt
<int> <dbl> <dbl>
1 1 26204 26204
2 1 46692 20488
3 1 60268 13576
4 1 86240 25972
5 1 91872 5632
6 2 291242 291242
7 2 312311 21069
8 2 333983 21672
9 2 355122 21139
10 2 364841 9719
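
To see what default = 0 changes, the sketch below computes both variants side by side. The data frame is a small hypothetical stand-in for the question's df, and the column names rt_na and rt_zero are made up for the comparison:

```r
library(dplyr)

# Hypothetical stand-in for the question's data
df <- data.frame(id = c(1, 1, 2, 2), time = c(100, 250, 40, 90))

out <- df %>%
  group_by(id) %>%
  mutate(
    rt_na   = time - lag(time),              # first row per group is NA
    rt_zero = time - lag(time, default = 0)  # first row per group keeps its own time
  ) %>%
  ungroup()

out$rt_na
# NA 150 NA 50
out$rt_zero
# 100 150 40 50
```

Only the first row of each group differs between the two columns; every later row is the same difference from its predecessor.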

Calculate difference between multiple rows by a group in R

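You can use match to get the corresponding sbd value at wek 1 and wek 2. match returns the position of the first match only, so sbd[match(1, wek)] is the first sbd value in the group whose wek is 1.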

library(dplyr)

df %>%
group_by(code, tmp) %>%
summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])

# code tmp diff
# <chr> <chr> <dbl>
#1 abc01 T1 -0.67
#2 abc01 T2 0.34

If you want to add the difference as a new column and keep every row of the data frame, use mutate instead of summarise.
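
A sketch of the mutate variant on a shortened, hypothetical version of the data (df_small is not part of the original answer; its values are taken from the data listed under "data"):

```r
library(dplyr)

# Shortened stand-in for the full data set
df_small <- data.frame(
  code = "abc01",
  tmp  = c("T1", "T1", "T1", "T2", "T2"),
  wek  = c(1L, 2L, 1L, 1L, 2L),
  sbd  = c(7.83, 8.5, 7.83, 7.56, 7.22)
)

out <- df_small %>%
  group_by(code, tmp) %>%
  # the single per-group difference is recycled onto every row of the group
  mutate(diff = sbd[match(1, wek)] - sbd[match(2, wek)]) %>%
  ungroup()

nrow(out)
# 5
```

All five rows are kept, and each row carries the difference for its own code and tmp group (about -0.67 for T1 and 0.34 for T2).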

data

It is easier to help if you provide data in a reproducible format, for example:

df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1",
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83,
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22,
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)),
class = "data.frame", row.names = c(NA, -18L))

Calculate difference between values by group and matched for time

You're nearly there! The key is to convert Tb_Period to an ordered factor, such that PreI is treated as "less than" DayI, which is in turn less than PostI. Once this is established, we can group by each bird and hour, and sort by Tb_Period to ensure that differences are calculated in the correct order:

library(dplyr)

df <- read.table(text = 'Tb_Period Hour Bird_ID Treatment meanHourTb
PreI 9 3500 LPS 41.55000
PreI 10 3500 LPS 41.75000
DayI 9 3500 LPS 40.88182
DayI 10 3500 LPS 41.24000', header = TRUE, stringsAsFactors = FALSE)

df <- df %>%
# levels are listed in chronological order: PreI < DayI < PostI
mutate(Tb_Period = factor(Tb_Period, c('PreI', 'DayI', 'PostI'), ordered = TRUE)) %>%
# sort by period so that lag() always looks back at the earlier period
arrange(Tb_Period) %>%
group_by(Bird_ID, Hour) %>%
mutate(diff = meanHourTb - lag(meanHourTb, 1))

# A tibble: 4 x 6
# Groups: Bird_ID, Hour [2]
Tb_Period Hour Bird_ID Treatment meanHourTb diff
<ord> <int> <int> <chr> <dbl> <dbl>
1 PreI 9 3500 LPS 41.55000 NA
2 PreI 10 3500 LPS 41.75000 NA
3 DayI 9 3500 LPS 40.88182 -0.66818
4 DayI 10 3500 LPS 41.24000 -0.51000
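
If the rows should stay in their original order, lag() also accepts an order_by argument, which sorts only for the purpose of finding the previous value. The sketch below assumes rows that arrive out of period order; the row arrangement is made up to show the effect, and the values are those from the example above:

```r
library(dplyr)

# Same measurements as above, but deliberately listed with DayI before PreI
df <- data.frame(
  Tb_Period  = c("DayI", "PreI", "DayI", "PreI"),
  Hour       = c(9, 9, 10, 10),
  Bird_ID    = 3500,
  meanHourTb = c(40.88182, 41.55, 41.24, 41.75)
)

out <- df %>%
  mutate(Tb_Period = factor(Tb_Period, c("PreI", "DayI", "PostI"), ordered = TRUE)) %>%
  group_by(Bird_ID, Hour) %>%
  # order_by makes lag() look back along Tb_Period, not along row position
  mutate(diff = meanHourTb - lag(meanHourTb, order_by = Tb_Period)) %>%
  ungroup()

out$diff
# -0.66818 NA -0.51 NA
```

The PreI rows get NA because nothing precedes them in the period order, even though they are not the first rows of their groups.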

Keep first row after calculating difference between rows with dplyr::lag

A possible solution:

library(tidyverse)

df <- structure(list(ind_id = c(1002, 1002, 2340, 2340), wt = c(25,
15, 30, 52), date = structure(c(6416, 6699, 6285, 7166), class = "Date")), row.names = c(NA,
-4L), class = "data.frame")

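df %>%
group_by(ind_id) %>%
mutate(mass_diff = wt - lag(wt)) %>% # change from the previous weight
mutate(wt = first(wt)) %>% # carry the first weight onto every row
slice_tail(n = 1) %>% # keep only the last row of each individual
ungroup()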

#> # A tibble: 2 × 4
#> ind_id wt date mass_diff
#> <dbl> <dbl> <date> <dbl>
#> 1 1002 25 1988-05-05 -10
#> 2 2340 30 1989-08-15 22

Divide difference in consecutive rows by number of NA rows in between and reassign fraction to NA rows. R dplyr() mutate() lag()

You could use na.approx from package zoo, which fills each NA by linear interpolation between its neighbouring values:

library(dplyr)
library(zoo)

df %>%
mutate(diff = na.approx(x) - lag(na.approx(x)))

which gives you

# A tibble: 5 x 3
date x diff
<date> <dbl> <dbl>
1 2021-01-01 0 NA
2 2021-01-02 10 10
3 2021-01-02 30 20
4 2021-01-03 NA 15
5 2021-01-04 60 15

To avoid the NA in the first row of diff, give lag() a default, for example lag(na.approx(x), default = 0).
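
A runnable sketch of this approach. The question's df is not shown, so the one below is reconstructed from the printed output; the helper column x_filled and the na.rm = FALSE argument are additions for illustration:

```r
library(dplyr)
library(zoo)

# Reconstructed from the printed output above
df <- data.frame(
  date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-02",
                   "2021-01-03", "2021-01-04")),
  x    = c(0, 10, 30, NA, 60)
)

out <- df %>%
  mutate(
    # na.rm = FALSE keeps the vector length even if x starts or ends with NA
    x_filled = na.approx(x, na.rm = FALSE),
    # default = 0 compares the first row with 0 instead of producing NA
    diff = x_filled - lag(x_filled, default = 0)
  )

out$diff
# 0 10 20 15 15
```

The interpolated value for the NA row is 45, halfway between 30 and 60, so the gap of 30 is split into two steps of 15.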


