Calculate difference between values in consecutive rows by group
The data.table package can do this fairly quickly, using the shift function.
require(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
#setDT(df) #if df is already a data frame
df[ , diff := value - shift(value), by = group]
# group value diff
#1: 1 10 NA
#2: 1 20 10
#3: 1 25 5
#4: 2 5 NA
#5: 2 10 5
#6: 2 15 5
setDF(df) #if you want to convert back to old data.frame syntax
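As a small sketch of how shift generalises, the n and type arguments let you compare against a row further back or against the following row. The column names diff_lag2 and diff_next are made up for illustration; the data is the same as above.

```r
library(data.table)

dt <- data.table(group = rep(c(1, 2), each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# Difference to the value two rows back within each group
dt[, diff_lag2 := value - shift(value, n = 2L), by = group]

# Difference to the following row (lead instead of the default lag)
dt[, diff_next := shift(value, type = "lead") - value, by = group]

dt
```

Rows without a partner row in the same group (the first two rows for a lag of 2, the last row for a lead) get NA.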
Or using the lag function in dplyr:
library(dplyr)
df %>%
group_by(group) %>%
mutate(Diff = value - lag(value))
# group value Diff
# <int> <int> <int>
# 1 1 10 NA
# 2 1 20 10
# 3 1 25 5
# 4 2 5 NA
# 5 2 10 5
# 6 2 15 5
R: Calculate difference between values in rows with group reference
Try the code below, which uses ave to apply c(NA, diff(x)) to value within each group:
transform(
df,
Diff = ave(value, group, FUN = function(x) c(NA, diff(x)))
)
which gives
group value Diff
1 1 10 NA
2 1 20 10
3 1 25 5
4 2 5 NA
5 2 10 5
6 2 15 5
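One property of ave worth seeing in action: it returns results in the original row order, so the groups do not need to be contiguous. A minimal sketch with the same values as above, but with the two groups interleaved (the interleaving is an assumption made for illustration):

```r
# Same values as above, but rows of group 1 and group 2 alternate
df_mixed <- data.frame(group = c(1, 2, 1, 2, 1, 2),
                       value = c(10, 5, 20, 10, 25, 15))

# ave() splits value by group, applies FUN to each piece,
# and writes the results back to the original row positions
df_mixed$Diff <- ave(df_mixed$value, df_mixed$group,
                     FUN = function(x) c(NA, diff(x)))

df_mixed
```

Each row is compared with the previous row of its own group, not with the row physically above it.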
Difference between rows in long format for R based on other column variables
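You don't have to use lag; you can use diff instead. This works here because each Variable/ID group has exactly two rows, so diff returns a single value that mutate recycles across the group: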
df %>%
group_by(Variable,ID) %>%
mutate(diff = -diff(Value))
Output:
# A tibble: 8 x 5
# Groups: Variable, ID [4]
ID Condition Variable Value diff
<dbl> <chr> <chr> <dbl> <dbl>
1 1 A X 3 -2
2 1 B X 5 -2
3 2 A X 6 0
4 2 B X 6 0
5 1 A Y 3 -5
6 1 B Y 8 -5
7 2 A Y 3 -3
8 2 B Y 6 -3
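If a group can have more than two rows, diff returns one value fewer than the group size, and mutate will refuse to recycle it. A hedged sketch of the usual workaround, padding with a leading NA, on hypothetical data with three rows per group (the data frame long is not from the question, and the sign here is row minus previous row rather than the negated difference used above):

```r
library(dplyr)

# Hypothetical data: three rows per group
long <- data.frame(ID = rep(c(1, 2), each = 3),
                   Value = c(3, 5, 9, 6, 6, 2))

res <- long %>%
  group_by(ID) %>%
  # diff() has length n - 1, so pad with NA to get back to length n
  mutate(diff = c(NA, diff(Value))) %>%
  ungroup()

res
```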
How to calculate the difference between rows and divide the difference with the value from the previous row in R?
We can use across with lag: loop across all the columns (everything()), apply the formula, and create new columns by modifying .names, i.e. adding the suffix _r to the corresponding column names ({.col}).
library(dplyr)
df1 <- df1 %>%
mutate(across(everything(), ~ (. - lag(.))/lag(.),
.names = "{.col}_r"))
Output:
df1
A B C A_r B_r C_r
1 15 14 12 NA NA NA
2 7 1 6 -0.5333333 -0.9285714 -0.5000000
3 8 22 5 0.1428571 21.0000000 -0.1666667
4 11 5 1 0.3750000 -0.7727273 -0.8000000
5 4 12 4 -0.6363636 1.4000000 3.0000000
Or use base R with diff:
df1[paste0(names(df1), "_r")] <- rbind(NA,
diff(as.matrix(df1)))/rbind(NA, df1[-nrow(df1),])
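To make the arithmetic concrete, here is the same calculation for a single column, using the values of column A from the output above. The name rel_change is only illustrative.

```r
A <- c(15, 7, 8, 11, 4)

# diff(A) is the change from the previous row;
# head(A, -1) is the previous row's value (everything but the last element)
rel_change <- c(NA, diff(A) / head(A, -1))

rel_change
```

For example, the second element is (7 - 15) / 15, which matches the -0.5333333 shown in A_r.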
Calculate difference between values in rows by two grouping variables
You can order the data first and then apply the ave code:
db <- db[with(db, order(Studynr, Fugroup)), ]
db$FUdiff <- ave(db$FU, db$Studynr, FUN=function(x) c(NA,diff(x)))
You can implement the same logic in dplyr and data.table:
#dplyr
library(dplyr)
db %>%
arrange(Studynr, Fugroup) %>%
group_by(Studynr) %>%
mutate(FUdiff = c(NA, diff(FU))) %>%
ungroup -> db
#data.table
library(data.table)
setDT(db)[order(Studynr, Fugroup), FUdiff := c(NA, diff(FU)), Studynr]
Calculate difference between rows in long data
We could also use first and last (with ordering by Time) within the groups:
library(dplyr)
DB |>
group_by(ID) |>
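# Subset Time with the same mask so order_by lines up with the non-NA scores
mutate(diff = last(Score[!is.na(Score)], order_by = Time[!is.na(Score)]) -
         first(Score[!is.na(Score)], order_by = Time[!is.na(Score)])) |>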
ungroup()
Output:
# A tibble: 6 × 4
ID Time Score diff
<dbl> <dbl> <dbl> <dbl>
1 1 1 105 -5
2 1 2 155 -5
3 1 3 100 -5
4 2 1 105 45
5 2 2 150 45
6 2 3 NA 45
Update 2/Aug (thanks to @Sari Katish): In the case where a group has only NAs, we could add an ifelse to the mutate so that it returns NA for those groups.
mutate(diff = ifelse(all(is.na(Score)), NA_real_,
                     last(Score[!is.na(Score)], order_by = Time[!is.na(Score)]) -
                       first(Score[!is.na(Score)], order_by = Time[!is.na(Score)]))) |>
Data:
library(readr)
DB <- read_delim("ID | Time | Score
1 | 1 | 105
1 | 2 | 155
1 | 3 | 100
2 | 1 | 105
2 | 2 | 150
2 | 3 | NA ", delim = "|", trim_ws = TRUE)
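For comparison, a base R sketch of the same idea: sort each group's scores by Time, drop the NAs, and subtract the first remaining score from the last. The helper last_minus_first is a hypothetical name, and the data frame below re-creates the data above so the block runs on its own.

```r
DB <- data.frame(ID    = c(1, 1, 1, 2, 2, 2),
                 Time  = c(1, 2, 3, 1, 2, 3),
                 Score = c(105, 155, 100, 105, 150, NA))

# Hypothetical helper: last non-NA score minus first non-NA score, by time
last_minus_first <- function(score, time) {
  s <- score[order(time)]   # sort scores by time
  s <- s[!is.na(s)]         # drop missing scores
  if (length(s) == 0) return(NA_real_)  # group with only NAs
  s[length(s)] - s[1]
}

# One value per ID, then mapped back onto every row of that ID
per_id <- sapply(split(DB, DB$ID),
                 function(d) last_minus_first(d$Score, d$Time))
DB$diff <- unname(per_id[as.character(DB$ID)])

DB
```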
Calculating the difference between first and last row in each group
(Assuming dplyr.) I am not assuming that date is guaranteed to be in order; if it is, then one could also use first(.)/last(.) for the same results. I tend to prefer not trusting order.
If your discount is always 0/1 and you are looking to group by contiguous runs of the same value, then
dat %>%
group_by(discountgrp = cumsum(discount != lag(discount, default = discount[1]))) %>%
summarize(change = price[which.max(date)] - price[which.min(date)])
# # A tibble: 2 x 2
# discountgrp change
# <int> <dbl>
# 1 0 -0.871
# 2 1 -0.481
If your discount is instead a categorical value and can exceed 1, then
dat %>%
group_by(discount) %>%
summarize(change = price[which.max(date)] - price[which.min(date)])
# # A tibble: 2 x 2
# discount change
# <dbl> <dbl>
# 1 0 -0.871
# 2 1 -0.481
They happen to be the same here, but if the row order were changed such that some of the 1s occurred in the middle of the 0s (for instance), then the groups would be different.
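A base R sketch of that difference, on made-up data where a run of 1s sits between two runs of 0s. The prices and dates are invented for illustration, and change_by is a hypothetical helper that mirrors the which.max/which.min expression above.

```r
# Made-up data: discount goes 0, 0, 1, 1, 0, 0
dat2 <- data.frame(date     = as.Date("2024-01-01") + 0:5,
                   discount = c(0, 0, 1, 1, 0, 0),
                   price    = c(10, 9, 7, 6, 10, 11))

# Run id: increases by one every time discount changes from the previous row
dat2$run_id <- cumsum(c(FALSE, diff(dat2$discount) != 0))

# Price at the latest date minus price at the earliest date
change_by <- function(d) d$price[which.max(d$date)] - d$price[which.min(d$date)]

by_run   <- sapply(split(dat2, dat2$run_id), change_by)    # three contiguous runs
by_value <- sapply(split(dat2, dat2$discount), change_by)  # two value groups

by_run
by_value
```

Grouping by run gives three groups (run ids 0, 1, 2), while grouping by value lumps both runs of 0s together and so gives only two.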
Calculate difference between multiple rows by a group in R
You can use match to get the sbd value at the first row where wek is 1 and at the first row where wek is 2.
library(dplyr)
df %>%
group_by(code, tmp) %>%
summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])
# code tmp diff
# <chr> <chr> <dbl>
#1 abc01 T1 -0.67
#2 abc01 T2 0.34
If you want to add a new column to the data frame while keeping the same rows, use mutate instead of summarise.
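To see why this works, note that match returns the position of the first match only, and NA when there is no match. A minimal sketch on a few values taken from the T1 group of the data:

```r
wek <- c(1L, 1L, 2L, 2L, 1L)
sbd <- c(7.83, 7.83, 8.5, 8.5, 7.83)

pos1 <- match(1, wek)   # position of the first row with wek == 1
pos2 <- match(2, wek)   # position of the first row with wek == 2

week_diff <- sbd[pos1] - sbd[pos2]

# A week that does not occur gives NA, and indexing with NA gives NA
missing_week <- sbd[match(3, wek)]
```

This relies on sbd being constant within each wek value, as it is in the data; if it varied, only the first row per week would be used.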
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1",
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83,
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22,
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)),
class = "data.frame", row.names = c(NA, -18L))