R How to Calculate Difference Between Rows in a Data Frame

Calculate difference between values in consecutive rows by group

The package data.table can do this fairly quickly, using the shift function.

library(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10, 20, 25, 5, 10, 15))
# setDT(df)  # if df is already a data frame

df[, diff := value - shift(value), by = group]
#   group value diff
#1:     1    10   NA
#2:     1    20   10
#3:     1    25    5
#4:     2     5   NA
#5:     2    10    5
#6:     2    15    5
setDF(df)  # if you want to convert back to a plain data.frame

Or use the lag function in dplyr:

library(dplyr)

df %>%
    group_by(group) %>%
    mutate(Diff = value - lag(value))
#   group value  Diff
#   <int> <int> <int>
# 1     1    10    NA
# 2     1    20    10
# 3     1    25     5
# 4     2     5    NA
# 5     2    10     5
# 6     2    15     5
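
Both shift and lag default to moving the values down by one position and padding with NA, which is why the first row of each group has no difference. A minimal base R sketch of that idea on the first group's values (lag1 is a made-up helper name, not part of either package):

```r
# Hypothetical helper: shift a vector down by one position, padding with NA.
lag1 <- function(x) c(NA, head(x, -1))

g1 <- c(10, 20, 25)      # values of group 1 from the example above
lagged <- lag1(g1)       # previous-row values
d1 <- g1 - lagged        # current value minus previous value
print(data.frame(value = g1, lagged = lagged, diff = d1))
```

The same subtraction is what `value - shift(value)` and `value - lag(value)` perform inside each group.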


R: Calculate difference between values in rows with group reference

In base R, you can combine ave with diff to compute the differences within each group:

transform(
    df,
    Diff = ave(value, group, FUN = function(x) c(NA, diff(x)))
)

which gives

  group value Diff
1     1    10   NA
2     1    20   10
3     1    25    5
4     2     5   NA
5     2    10    5
6     2    15    5
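
ave applies FUN to each group's values and returns the results in the original row order. Roughly the same thing can be spelled out with split, lapply and unsplit. This is a sketch to show the idea, using the example data from above, and not a replacement for the one-liner:

```r
df <- data.frame(group = rep(c(1, 2), each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# Split value by group, difference each piece, then reassemble in row order.
pieces <- split(df$value, df$group)
diffs  <- lapply(pieces, function(x) c(NA, diff(x)))
df$Diff <- unsplit(diffs, df$group)
print(df)
```

Prepending NA keeps each piece the same length as its group, since diff returns one element fewer than its input.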

Difference between rows in long format for R based on other column variables

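You don't have to use lag; diff works here because each (Variable, ID) group has exactly two rows, so the single difference is recycled to both rows: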

df %>% 
  group_by(Variable,ID) %>% 
  mutate(diff = -diff(Value))

Output:

# A tibble: 8 x 5
# Groups:   Variable, ID [4]
     ID Condition Variable Value  diff
  <dbl> <chr>     <chr>    <dbl> <dbl>
1     1 A         X            3    -2
2     1 B         X            5    -2
3     2 A         X            6     0
4     2 B         X            6     0
5     1 A         Y            3    -5
6     1 B         Y            8    -5
7     2 A         Y            3    -3
8     2 B         Y            6    -3
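
The recycling can be seen in a base R sketch of the same computation, with the data retyped from the output above. It assumes every group has exactly two rows; groups of another size would need a different approach:

```r
df <- data.frame(
  ID        = c(1, 1, 2, 2, 1, 1, 2, 2),
  Condition = rep(c("A", "B"), times = 4),
  Variable  = rep(c("X", "Y"), each = 4),
  Value     = c(3, 5, 6, 6, 3, 8, 3, 6)
)

# diff() on a two-element vector returns one value (B - A); negating gives A - B.
# ave() assigns that single value to both rows of the group.
df$diff <- ave(df$Value, df$Variable, df$ID, FUN = function(x) -diff(x))
print(df)
```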

How to calculate the difference between rows and divide the difference with the value from the previous row in R?

We can use across with lag: loop over all the columns (everything()), apply the formula, and name the new columns through .names, here by adding the suffix _r to each original column name ({.col}).

library(dplyr)
df1 <- df1 %>%
   mutate(across(everything(),  ~ (. - lag(.))/lag(.),
   .names = "{.col}_r"))

Output:

df1
   A  B  C        A_r        B_r        C_r
1 15 14 12         NA         NA         NA
2  7  1  6 -0.5333333 -0.9285714 -0.5000000
3  8 22  5  0.1428571 21.0000000 -0.1666667
4 11  5  1  0.3750000 -0.7727273 -0.8000000
5  4 12  4 -0.6363636  1.4000000  3.0000000

Or use base R with diff

df1[paste0(names(df1), "_r")] <- rbind(NA, 
       diff(as.matrix(df1)))/rbind(NA, df1[-nrow(df1),])
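
Both versions compute (current - previous) / previous for every row. The arithmetic can be checked column by column with a small helper; rel_change is a made-up name, and the data are retyped from the output above:

```r
# Hypothetical helper: relative change with respect to the previous row.
rel_change <- function(x) {
  prev <- c(NA, head(x, -1))   # value from the previous row, NA for the first row
  (x - prev) / prev
}

df1 <- data.frame(A = c(15, 7, 8, 11, 4),
                  B = c(14, 1, 22, 5, 12),
                  C = c(12, 6, 5, 1, 4))

# Add one "_r" column per original column.
df1[paste0(names(df1), "_r")] <- lapply(df1, rel_change)
print(df1)
```

For example, the second row of A_r is (7 - 15) / 15, which is about -0.533.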

Calculate difference between values in rows by two grouping variables

You can order the data first and then apply the ave code:

db <- db[with(db, order(Studynr, Fugroup)), ]
db$FUdiff <- ave(db$FU, db$Studynr, FUN=function(x) c(NA,diff(x)))

You can implement the same logic in dplyr and data.table:

#dplyr
library(dplyr)

db <- db %>%
  arrange(Studynr, Fugroup) %>%
  group_by(Studynr) %>%
  mutate(FUdiff = c(NA, diff(FU))) %>%
  ungroup()

#data.table
library(data.table)
setDT(db)[order(Studynr, Fugroup), FUdiff := c(NA, diff(FU)), Studynr]
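
Sorting first matters because diff works on the rows in their current order. A small sketch of the base R version on unsorted input; the values of Studynr, Fugroup and FU below are invented for illustration:

```r
# Made-up data, deliberately out of order.
db <- data.frame(
  Studynr = c(2, 1, 1, 2, 1),
  Fugroup = c(2, 3, 1, 1, 2),
  FU      = c(40, 90, 10, 15, 30)
)

# Sort by study and follow-up group, then difference FU within each study.
db <- db[with(db, order(Studynr, Fugroup)), ]
db$FUdiff <- ave(db$FU, db$Studynr, FUN = function(x) c(NA, diff(x)))
print(db)
```

After sorting, study 1 has FU values 10, 30, 90 and study 2 has 15, 40, so each difference is taken between consecutive follow-up groups.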

Calculate difference between rows in long data

We could also use first and last (with ordering by Time) within the groups:

library(dplyr)

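DB |>
  group_by(ID) |>
  # subset Time the same way as Score so both vectors have the same length
  mutate(diff = last(Score[!is.na(Score)], order_by = Time[!is.na(Score)]) -
           first(Score[!is.na(Score)], order_by = Time[!is.na(Score)])) |>
  ungroup()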

Output:

# A tibble: 6 × 4
     ID  Time Score  diff
  <dbl> <dbl> <dbl> <dbl>
1     1     1   105    -5
2     1     2   155    -5
3     1     3   100    -5
4     2     1   105    45
5     2     2   150    45
6     2     3    NA    45

Update 2/aug (thanks to @Sari Katish): if a group contains only NAs, we can add an ifelse to the mutate so that it returns NA for those groups.

mutate(diff = ifelse(all(is.na(Score)), NA_real_,
                     last(Score[!is.na(Score)], order_by = Time[!is.na(Score)]) -
                       first(Score[!is.na(Score)], order_by = Time[!is.na(Score)]))) |>
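
The same guard can be sketched in base R, which makes the three steps explicit: drop missing scores, order by time, subtract first from last. This swaps dplyr's first and last for plain indexing; first_last_diff is a made-up helper name, and ID 3 is an invented group containing only NAs:

```r
# Hypothetical helper: last non-missing score minus first non-missing score, by time.
first_last_diff <- function(score, time) {
  keep <- !is.na(score)
  if (!any(keep)) return(NA_real_)      # group has only NAs
  s <- score[keep][order(time[keep])]   # non-missing scores in time order
  s[length(s)] - s[1]
}

DB <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 3, 3),
  Time  = c(1, 2, 3, 1, 2, 3, 1, 2),
  Score = c(105, 155, 100, 105, 150, NA, NA, NA)
)

# One value per ID.
res <- sapply(split(DB, DB$ID), function(g) first_last_diff(g$Score, g$Time))
print(res)
```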

Data:

library(readr)

DB <- read_delim("ID   | Time   | Score
         1    | 1      | 105   
1    | 2      | 155   
1    | 3      | 100  
2    | 1      | 105  
2    | 2      | 150  
2    | 3      | NA   ", delim = "|", trim_ws = TRUE)

Calculating the difference between first and last row in each group

(Assuming dplyr.) This does not assume that date is guaranteed to be in order; if it is, one could also use first(.)/last(.) for the same results. I tend to prefer not trusting order.

If your discount is always 0/1 and you are looking to group by contiguous same-values, then

dat %>%
  group_by(discountgrp = cumsum(discount != lag(discount, default = discount[1]))) %>%
  summarize(change = price[which.max(date)] - price[which.min(date)])
# # A tibble: 2 x 2
#   discountgrp change
#         <int>  <dbl>
# 1           0 -0.871
# 2           1 -0.481

If your discount is instead a categorical value and can exceed 1, then

dat %>%
  group_by(discount) %>%
  summarize(change = price[which.max(date)] - price[which.min(date)])
# # A tibble: 2 x 2
#   discount change
#      <dbl>  <dbl>
# 1        0 -0.871
# 2        1 -0.481

They happen to be the same here, but if the row order were changed such that some of the 1s occurred in the middle of 0s (for instance), then the groups would be different.
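
The cumsum expression in the first version starts a new group id every time discount differs from the previous row. A base R sketch on a made-up vector, where a run of 1s sits between two runs of 0s, shows how the ids differ from grouping by the raw value:

```r
discount <- c(0, 0, 1, 1, 0, 0)              # invented values for illustration

# Previous value; the first row is compared with itself, like lag(default = discount[1]).
prev <- c(discount[1], head(discount, -1))

# TRUE marks the start of a new run; the running sum turns that into a run id.
discountgrp <- cumsum(discount != prev)
print(discountgrp)
```

Here the run ids form three groups, while grouping by discount itself would give only two.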

Calculate difference between multiple rows by a group in R

You can use match to get the corresponding sbd value at wek 1 and wek 2.

library(dplyr)

df %>%
  group_by(code, tmp) %>%
  summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])

#  code  tmp    diff
#  <chr> <chr> <dbl>
#1 abc01 T1    -0.67
#2 abc01 T2     0.34

If you want to add the result as a new column while keeping all the rows of the data frame, use mutate instead of summarise.
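
match returns the position of the first occurrence only, so the expression picks the first sbd recorded for wek 1 and the first recorded for wek 2 within each group. A base R sketch using the first five T1 rows of the data:

```r
wek <- c(1L, 1L, 2L, 2L, 1L)
sbd <- c(7.83, 7.83, 8.5, 8.5, 7.83)

i1 <- match(1, wek)   # position of the first row with wek == 1
i2 <- match(2, wek)   # position of the first row with wek == 2
d <- sbd[i1] - sbd[i2]
print(c(i1 = i1, i2 = i2, diff = d))
```

If a group has no row for one of the weeks, match returns NA and the difference is NA as well.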

Data:

It is easier to help if you provide data in a reproducible format.

df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1", 
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", 
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83, 
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22, 
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)), 
class = "data.frame", row.names = c(NA, -18L))