How to Find Difference Between Values in Two Rows in an R Dataframe Using Dplyr

How to find difference between values in two rows in an R dataframe using dplyr

In dplyr:

require(dplyr)
df %>%
group_by(farm) %>%
mutate(volume = cumVol - lag(cumVol, default = cumVol[1]))

Source: local data frame [8 x 5]
Groups: farm

period farm cumVol other volume
1 1 A 1 1 0
2 2 A 5 2 4
3 3 A 15 3 10
4 4 A 31 4 16
5 1 B 10 5 0
6 2 B 12 6 2
7 3 B 16 7 4
8 4 B 24 8 8

Perhaps the desired output should actually be as follows?

df %>%
group_by(farm) %>%
mutate(volume = cumVol - lag(cumVol, default = 0))

period farm cumVol other volume
1 1 A 1 1 1
2 2 A 5 2 4
3 3 A 15 3 10
4 4 A 31 4 16
5 1 B 10 5 10
6 2 B 12 6 2
7 3 B 16 7 4
8 4 B 24 8 8

Edit: Following up on your comments I think you are looking for arrange(). It that is not the case it might be best to start a new question.

df1 <- data.frame(period=rep(1:4,4), farm=rep(c(rep('A',4),rep('B',4)),2), crop=(c(rep('apple',8), rep('pear',8))), cumCropVol=c(1,5,15,31,10,12,16,24,11,15,25,31,20,22,26,34), other = rep(1:8,2) ); 
df1 %>%
arrange(desc(period), desc(farm)) %>%
group_by(period, farm) %>%
summarise(cumVol=sum(cumCropVol))

Edit: Follow up #2

df1 <- data.frame(period=rep(1:4,4), farm=rep(c(rep('A',4),rep('B',4)),2), crop=(c(rep('apple',8), rep('pear',8))), cumCropVol=c(1,5,15,31,10,12,16,24,11,15,25,31,20,22,26,34), other = rep(1:8,2) ); 
df <- df1 %>%
arrange(desc(period), desc(farm)) %>%
group_by(period, farm) %>%
summarise(cumVol=sum(cumCropVol))

ungroup(df) %>%
arrange(farm) %>%
group_by(farm) %>%
mutate(volume = cumVol - lag(cumVol, default = 0))

Source: local data frame [8 x 4]
Groups: farm

period farm cumVol volume
1 1 A 12 12
2 2 A 20 8
3 3 A 40 20
4 4 A 62 22
5 1 B 30 30
6 2 B 34 4
7 3 B 42 8
8 4 B 58 16

Difference between rows in long format for R based on other column variables

You don't have to use lag, but use diff:

df %>% 
group_by(Variable,ID) %>%
mutate(diff = -diff(Value))

Output:

# A tibble: 8 x 5
# Groups: Variable, ID [4]
ID Condition Variable Value diff
<dbl> <chr> <chr> <dbl> <dbl>
1 1 A X 3 -2
2 1 B X 5 -2
3 2 A X 6 0
4 2 B X 6 0
5 1 A Y 3 -5
6 1 B Y 8 -5
7 2 A Y 3 -3
8 2 B Y 6 -3

Calculating the difference between consecutive rows by group using dplyr?

Like this:

dat %>% 
group_by(id) %>%
mutate(time.difference = time - lag(time))

Calculate difference between values in rows by two grouping variables

You can order the data first and apply the ave code :

db <- db[with(db, order(Studynr, Fugroup)), ]
db$FUdiff <- ave(db$FU, db$Studynr, FUN=function(x) c(NA,diff(x)))

You can implement the same logic in dplyr and data.table :

#dplyr
library(dplyr)

db %>%
arrange(Studynr, Fugroup) %>%
group_by(Studynr) %>%
mutate(FUdiff = c(NA, diff(FU))) %>%
ungroup -> db

#data.table
library(data.table)
setDT(db)[order(Studynr, Fugroup), FUdiff := c(NA, diff(FU)), Studynr]

R: Calculate difference between values in rows with group reference

Try the code below

transform(
df,
Diff = ave(value, group, FUN = function(x) c(NA, diff(x)))
)

which gives

  group value Diff
1 1 10 NA
2 1 20 10
3 1 25 5
4 2 5 NA
5 2 10 5
6 2 15 5

Calculate difference between multiple rows by a group in R

You can use match to get the corresponding sbd value at wk 1 and 2.

library(dplyr)

df %>%
group_by(code, tmp) %>%
summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])

# code tmp diff
# <chr> <chr> <dbl>
#1 abc01 T1 -0.67
#2 abc01 T2 0.34

If you want to add a new column in the dataframe keeping the rows same, use mutate instead of summarise.

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01", 
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1",
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83,
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22,
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)),
class = "data.frame", row.names = c(NA, -18L))

Finding maximum difference between columns of same name in R

First of all, it's a bit dangerous (and not allowed in some cases) to have non-unique column names, so the first thing I did was to uniqueify the names using base::make.unique(). From there, I used tidyr::pivot_longer() so that the grouping information contained in the column names could be accessed more easily. Here I use a regex inside names_pattern to discard the differentiating parts of the column names so they will be the same again. Then we use dplyr::group_by() followed by dplyr::summarize() to get the largest difference in each id and grp which corresponds to your rows and similar columns in the original data. Finally we use dplyr::slice_max() to return only the largest difference per group.

library(tidyverse)

d <- structure(list(A = c(2L, -3L), A = c(4L, 0L), B = c(5L, 2L), B = 2:3, B = c(1L, 4L), C = c(0L, 2L)), row.names = c(NA, -2L), class = "data.frame")

# give unique names
names(d) <- make.unique(names(d), sep = "_")

d %>%
mutate(id = row_number()) %>%
pivot_longer(-id, names_to = "grp", names_pattern = "([A-Z])*") %>%
group_by(id, grp) %>%
summarise(max_diff = max(value) - min(value)) %>%
slice_max(order_by = max_diff, n = 1, with_ties = F)

#> `summarise()` has grouped output by 'id'. You can override using the `.groups` argument.
#> # A tibble: 2 x 3
#> # Groups: id [2]
#> id grp max_diff
#> <int> <chr> <int>
#> 1 1 B 4
#> 2 2 A 3

Created on 2022-02-14 by the reprex package (v2.0.1)



Related Topics



Leave a reply



Submit