Calculating the difference between consecutive rows by group using dplyr?
Like this:
dat %>%
group_by(id) %>%
mutate(time.difference = time - lag(time))
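To try the pattern end to end, here is a minimal sketch. The data frame dat and its values are made up for illustration; only the column names id and time come from the snippet above.

```r
library(dplyr)

# Hypothetical sample data: two ids with three time stamps each
dat <- data.frame(
  id   = c("a", "a", "a", "b", "b", "b"),
  time = c(1, 4, 9, 2, 3, 7)
)

res <- dat %>%
  group_by(id) %>%
  # lag() shifts time down by one row within each id,
  # so the first row of every group has no predecessor and gets NA
  mutate(time.difference = time - lag(time)) %>%
  ungroup()

res$time.difference
```

The grouping is what stops the first row of id "b" from being compared with the last row of id "a".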
Calculate difference between values in consecutive rows by group
The package data.table can do this fairly quickly, using the shift function.
library(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
#setDT(df) #if df is already a data frame
df[ , diff := value - shift(value), by = group]
# group value diff
#1: 1 10 NA
#2: 1 20 10
#3: 1 25 5
#4: 2 5 NA
#5: 2 10 5
#6: 2 15 5
setDF(df) #if you want to convert back to old data.frame syntax
Or using the lag function in dplyr:
library(dplyr)
df %>%
group_by(group) %>%
mutate(Diff = value - lag(value))
# group value Diff
# <int> <int> <int>
# 1 1 10 NA
# 2 1 20 10
# 3 1 25 5
# 4 2 5 NA
# 5 2 10 5
# 6 2 15 5
For alternatives that predate data.table::shift and dplyr::lag, see the edit history of the original answer.
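One possible alternative that needs neither package is sketched below in base R. This is an assumption about what such an alternative could look like, not necessarily what the earlier edits contained; it reuses the toy data from the data.table example.

```r
# Same toy data as in the data.table example
df <- data.frame(group = rep(c(1, 2), each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# ave() applies FUN to value within each group and returns a vector
# aligned with the original rows. diff() is one element shorter than
# its input, so NA is prepended for the first row of each group.
df$diff <- ave(df$value, df$group, FUN = function(x) c(NA, diff(x)))

df
```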
R Difference between consecutive rows while retaining first row
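Use default = 0 in lag(). The first row of each group then has 0 subtracted from it and keeps its original value instead of becoming NA.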
library(dplyr)
df %>% group_by(id) %>% mutate(rt = time - lag(time, default = 0))
# A tibble: 10 × 3
# Groups: id [2]
id time rt
<int> <dbl> <dbl>
1 1 26204 26204
2 1 46692 20488
3 1 60268 13576
4 1 86240 25972
5 1 91872 5632
6 2 291242 291242
7 2 312311 21069
8 2 333983 21672
9 2 355122 21139
10 2 364841 9719
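The effect of the default argument is easiest to see side by side. The sketch below uses a small made-up data frame (the values are not from the question) and computes both variants.

```r
library(dplyr)

# Hypothetical data: two ids with cumulative time stamps
df <- data.frame(id   = c(1, 1, 1, 2, 2),
                 time = c(100, 250, 400, 1000, 1300))

out <- df %>%
  group_by(id) %>%
  mutate(
    rt_na = time - lag(time),               # first row per id is NA
    rt    = time - lag(time, default = 0)   # first row per id keeps time
  ) %>%
  ungroup()

out
```

Note that default = 0 only makes sense when 0 is a meaningful starting point for the series; otherwise the first value of rt is not a true difference.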
Calculate difference between multiple rows by a group in R
You can use match to get the corresponding sbd value at wek 1 and 2. match returns the position of the first matching row within each group.
library(dplyr)
df %>%
group_by(code, tmp) %>%
summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])
# code tmp diff
# <chr> <chr> <dbl>
#1 abc01 T1 -0.67
#2 abc01 T2 0.34
If you want to add a new column to the data frame while keeping all rows, use mutate instead of summarise.
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1",
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83,
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22,
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)),
class = "data.frame", row.names = c(NA, -18L))
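As a sketch of the mutate variant mentioned in the answer, the following uses a shortened, made-up version of the data (same column names, fewer rows) so that it runs on its own.

```r
library(dplyr)

# Hypothetical, shortened data with the same columns as the question
df_small <- data.frame(
  code = "abc01",
  tmp  = rep(c("T1", "T2"), each = 3),
  wek  = c(1L, 2L, 1L, 1L, 1L, 2L),
  sbd  = c(7.83, 8.5, 7.83, 7.56, 7.56, 7.22)
)

out <- df_small %>%
  group_by(code, tmp) %>%
  # The difference is a single value per group; mutate() recycles it
  # across every row of the group instead of collapsing the group
  mutate(diff = sbd[match(1, wek)] - sbd[match(2, wek)]) %>%
  ungroup()

out
```

All six rows are kept, and each row carries the difference for its code and tmp combination.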
Calculate difference between values by group and matched for time
You're nearly there! The key is to convert Tb_Period to an ordered factor, such that PreI is treated as "less than" DayI, which is in turn less than PostI. Once this is established, we can group by each bird and hour, and sort by Tb_Period to ensure that differences are calculated in the correct order:
df <- read.table(text = 'Tb_Period Hour Bird_ID Treatment meanHourTb
PreI 9 3500 LPS 41.55000
PreI 10 3500 LPS 41.75000
DayI 9 3500 LPS 40.88182
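DayI 10 3500 LPS 41.24000', header = TRUE, stringsAsFactors = FALSE)
library(dplyr)
df <- df %>%
# An ordered factor makes PreI < DayI < PostI explicit
mutate(Tb_Period = factor(Tb_Period, c('PreI', 'DayI', 'PostI'), ordered = TRUE)) %>%
group_by(Bird_ID, Hour) %>%
# Sort by period so that lag() looks back at the earlier period
arrange(Tb_Period) %>%
mutate(diff = meanHourTb - lag(meanHourTb, 1))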
# A tibble: 4 x 6
# Groups: Bird_ID, Hour [2]
Tb_Period Hour Bird_ID Treatment meanHourTb diff
<ord> <int> <int> <chr> <dbl> <dbl>
1 PreI 9 3500 LPS 41.55000 NA
2 PreI 10 3500 LPS 41.75000 NA
3 DayI 9 3500 LPS 40.88182 -0.66818
4 DayI 10 3500 LPS 41.24000 -0.51000
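To see why the ordered factor matters, here is a small base R sketch. The vector period is made up for illustration and deliberately listed out of chronological order.

```r
# Period labels in scrambled order
period <- factor(c("DayI", "PreI", "PostI"),
                 levels = c("PreI", "DayI", "PostI"),
                 ordered = TRUE)

# Comparisons follow the level order, not the alphabet
pre_before_day <- period[2] < period[1]   # PreI < DayI

# Sorting also follows the level order
sorted_labels <- as.character(sort(period))

pre_before_day
sorted_labels
```

With a plain character column, sorting would be alphabetical (DayI, PostI, PreI), and lag() would subtract the wrong period.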
Keep first row after calculating difference between rows with dplyr::lag
A possible solution:
library(tidyverse)
df <- structure(list(ind_id = c(1002, 1002, 2340, 2340), wt = c(25,
15, 30, 52), date = structure(c(6416, 6699, 6285, 7166), class = "Date")), row.names = c(NA,
-4L), class = "data.frame")
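df %>%
group_by(ind_id) %>%
# Difference between each weight and the previous one for the same individual
mutate(mass_diff = wt - lag(wt)) %>%
# Overwrite wt with the individual's first weight
mutate(wt = first(wt)) %>%
# Keep the last row per individual, which carries the difference
slice_tail(n = 1) %>%
ungroup()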
#> # A tibble: 2 × 4
#> ind_id wt date mass_diff
#> <dbl> <dbl> <date> <dbl>
#> 1 1002 25 1988-05-05 -10
#> 2 2340 30 1989-08-15 22
Divide difference in consecutive rows by number of NA rows in between and reassign fraction to NA rows. R dplyr() mutate() lag()
You could use na.approx from package zoo:
library(dplyr)
library(zoo)
df %>%
mutate(diff = na.approx(x) - lag(na.approx(x)))
which gives you
# A tibble: 5 x 3
date x diff
<date> <dbl> <dbl>
1 2021-01-01 0 NA
2 2021-01-02 10 10
3 2021-01-02 30 20
4 2021-01-03 NA 15
5 2021-01-04 60 15
Passing default = 0 to lag(), as in lag(na.approx(x), default = 0), handles the NA at the beginning of your data.frame.
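Putting both pieces together might look like the sketch below. The data frame is a made-up stand-in with only the x column from the output above, and the intermediate column filled is a name introduced here for readability.

```r
library(dplyr)
library(zoo)

# Hypothetical data: one missing value between 30 and 60
df <- data.frame(x = c(0, 10, 30, NA, 60))

out <- df %>%
  mutate(
    # Linear interpolation fills the interior NA (30 -> 45 -> 60).
    # Assumption: x has no leading or trailing NA; those would need
    # na.approx(x, na.rm = FALSE) to keep the vector length unchanged.
    filled = na.approx(x),
    # default = 0 makes the first difference filled[1] - 0 instead of NA
    diff   = filled - lag(filled, default = 0)
  )

out
```

The gap of 30 between the observed values 30 and 60 is split evenly across the interpolated row and the row after it.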