How to Find Difference Between Values in Two Rows in an R Dataframe Using Dplyr

How to find difference between values in two rows in an R dataframe using dplyr

In dplyr:

require(dplyr)
df %>%
  group_by(farm) %>%
  mutate(volume = cumVol - lag(cumVol, default = cumVol[1]))

Source: local data frame [8 x 5]
Groups: farm

  period farm cumVol other volume
1      1    A      1     1      0
2      2    A      5     2      4
3      3    A     15     3     10
4      4    A     31     4     16
5      1    B     10     5      0
6      2    B     12     6      2
7      3    B     16     7      4
8      4    B     24     8      8

Perhaps the desired output should actually be as follows?

df %>%
  group_by(farm) %>%
  mutate(volume = cumVol - lag(cumVol, default = 0))

  period farm cumVol other volume
1      1    A      1     1      1
2      2    A      5     2      4
3      3    A     15     3     10
4      4    A     31     4     16
5      1    B     10     5     10
6      2    B     12     6      2
7      3    B     16     7      4
8      4    B     24     8      8

Edit: Following up on your comments I think you are looking for arrange(). It that is not the case it might be best to start a new question.

df1 <- data.frame(period=rep(1:4,4), farm=rep(c(rep('A',4),rep('B',4)),2), crop=(c(rep('apple',8), rep('pear',8))), cumCropVol=c(1,5,15,31,10,12,16,24,11,15,25,31,20,22,26,34), other = rep(1:8,2) ); 
df1 %>% 
  arrange(desc(period), desc(farm)) %>%
  group_by(period, farm) %>% 
  summarise(cumVol=sum(cumCropVol))

Edit: Follow up #2

df1 <- data.frame(period=rep(1:4,4), farm=rep(c(rep('A',4),rep('B',4)),2), crop=(c(rep('apple',8), rep('pear',8))), cumCropVol=c(1,5,15,31,10,12,16,24,11,15,25,31,20,22,26,34), other = rep(1:8,2) ); 
df <- df1 %>% 
  arrange(desc(period), desc(farm)) %>% 
  group_by(period, farm) %>% 
  summarise(cumVol=sum(cumCropVol))

ungroup(df) %>% 
  arrange(farm) %>%
  group_by(farm) %>% 
  mutate(volume = cumVol - lag(cumVol, default = 0))

Source: local data frame [8 x 4]
Groups: farm

  period farm cumVol volume
1      1    A     12     12
2      2    A     20      8
3      3    A     40     20
4      4    A     62     22
5      1    B     30     30
6      2    B     34      4
7      3    B     42      8
8      4    B     58     16

Difference between rows in long format for R based on other column variables

You don't have to use lag, but use diff:

df %>% 
  group_by(Variable,ID) %>% 
  mutate(diff = -diff(Value))

Output:

# A tibble: 8 x 5
# Groups:   Variable, ID [4]
     ID Condition Variable Value  diff
  <dbl> <chr>     <chr>    <dbl> <dbl>
1     1 A         X            3    -2
2     1 B         X            5    -2
3     2 A         X            6     0
4     2 B         X            6     0
5     1 A         Y            3    -5
6     1 B         Y            8    -5
7     2 A         Y            3    -3
8     2 B         Y            6    -3

Calculating the difference between consecutive rows by group using dplyr?

Like this:

dat %>% 
  group_by(id) %>% 
  mutate(time.difference = time - lag(time))

Calculate difference between values in rows by two grouping variables

You can order the data first and apply the ave code :

db <- db[with(db, order(Studynr, Fugroup)), ]
db$FUdiff <- ave(db$FU, db$Studynr, FUN=function(x) c(NA,diff(x)))

You can implement the same logic in dplyr and data.table :

#dplyr
library(dplyr)

db %>%
  arrange(Studynr, Fugroup) %>%
  group_by(Studynr) %>%
  mutate(FUdiff = c(NA, diff(FU))) %>%
  ungroup -> db

#data.table
library(data.table)
setDT(db)[order(Studynr, Fugroup), FUdiff := c(NA, diff(FU)), Studynr]

R: Calculate difference between values in rows with group reference

Try the code below

transform(
    df,
    Diff = ave(value, group, FUN = function(x) c(NA, diff(x)))
)

which gives

  group value Diff
1     1    10   NA
2     1    20   10
3     1    25    5
4     2     5   NA
5     2    10    5
6     2    15    5

Calculate difference between multiple rows by a group in R

You can use match to get the corresponding sbd value at wk 1 and 2.

library(dplyr)

df %>%
  group_by(code, tmp) %>%
  summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])

#  code  tmp    diff
#  <chr> <chr> <dbl>
#1 abc01 T1    -0.67
#2 abc01 T2     0.34

If you want to add a new column in the dataframe keeping the rows same, use mutate instead of summarise.

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1", 
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", 
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83, 
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22, 
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)), 
class = "data.frame", row.names = c(NA, -18L))

Finding maximum difference between columns of same name in R

First of all, it's a bit dangerous (and not allowed in some cases) to have non-unique column names, so the first thing I did was to uniqueify the names using base::make.unique(). From there, I used tidyr::pivot_longer() so that the grouping information contained in the column names could be accessed more easily. Here I use a regex inside names_pattern to discard the differentiating parts of the column names so they will be the same again. Then we use dplyr::group_by() followed by dplyr::summarize() to get the largest difference in each id and grp which corresponds to your rows and similar columns in the original data. Finally we use dplyr::slice_max() to return only the largest difference per group.

library(tidyverse)

d <- structure(list(A = c(2L, -3L), A = c(4L, 0L), B = c(5L, 2L), B = 2:3, B = c(1L, 4L), C = c(0L, 2L)), row.names = c(NA, -2L), class = "data.frame")

# give unique names
names(d) <- make.unique(names(d), sep = "_")

d %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(-id, names_to = "grp", names_pattern = "([A-Z])*") %>% 
  group_by(id, grp) %>% 
  summarise(max_diff = max(value) - min(value)) %>% 
  slice_max(order_by = max_diff, n = 1, with_ties = F)

#> `summarise()` has grouped output by 'id'. You can override using the `.groups` argument.
#> # A tibble: 2 x 3
#> # Groups:   id [2]
#>      id grp   max_diff
#>   <int> <chr>    <int>
#> 1     1 B            4
#> 2     2 A            3

^{Created on 2022-02-14 by the reprex package (v2.0.1)}

How to Find Difference Between Values in Two Rows in an R Dataframe Using Dplyr