How to find difference between values in two rows in an R dataframe using dplyr
In dplyr:
require(dplyr)
df %>%
group_by(farm) %>%
mutate(volume = cumVol - lag(cumVol, default = cumVol[1]))
Source: local data frame [8 x 5]
Groups: farm
period farm cumVol other volume
1 1 A 1 1 0
2 2 A 5 2 4
3 3 A 15 3 10
4 4 A 31 4 16
5 1 B 10 5 0
6 2 B 12 6 2
7 3 B 16 7 4
8 4 B 24 8 8
Perhaps the desired output should actually be as follows?
df %>%
group_by(farm) %>%
mutate(volume = cumVol - lag(cumVol, default = 0))
period farm cumVol other volume
1 1 A 1 1 1
2 2 A 5 2 4
3 3 A 15 3 10
4 4 A 31 4 16
5 1 B 10 5 10
6 2 B 12 6 2
7 3 B 16 7 4
8 4 B 24 8 8
Edit: Following up on your comments I think you are looking for arrange(). It that is not the case it might be best to start a new question.
df1 <- data.frame(period=rep(1:4,4), farm=rep(c(rep('A',4),rep('B',4)),2), crop=(c(rep('apple',8), rep('pear',8))), cumCropVol=c(1,5,15,31,10,12,16,24,11,15,25,31,20,22,26,34), other = rep(1:8,2) );
df1 %>%
arrange(desc(period), desc(farm)) %>%
group_by(period, farm) %>%
summarise(cumVol=sum(cumCropVol))
Edit: Follow up #2
df1 <- data.frame(period=rep(1:4,4), farm=rep(c(rep('A',4),rep('B',4)),2), crop=(c(rep('apple',8), rep('pear',8))), cumCropVol=c(1,5,15,31,10,12,16,24,11,15,25,31,20,22,26,34), other = rep(1:8,2) );
df <- df1 %>%
arrange(desc(period), desc(farm)) %>%
group_by(period, farm) %>%
summarise(cumVol=sum(cumCropVol))
ungroup(df) %>%
arrange(farm) %>%
group_by(farm) %>%
mutate(volume = cumVol - lag(cumVol, default = 0))
Source: local data frame [8 x 4]
Groups: farm
period farm cumVol volume
1 1 A 12 12
2 2 A 20 8
3 3 A 40 20
4 4 A 62 22
5 1 B 30 30
6 2 B 34 4
7 3 B 42 8
8 4 B 58 16
Difference between rows in long format for R based on other column variables
You don't have to use lag
, but use diff
:
df %>%
group_by(Variable,ID) %>%
mutate(diff = -diff(Value))
Output:
# A tibble: 8 x 5
# Groups: Variable, ID [4]
ID Condition Variable Value diff
<dbl> <chr> <chr> <dbl> <dbl>
1 1 A X 3 -2
2 1 B X 5 -2
3 2 A X 6 0
4 2 B X 6 0
5 1 A Y 3 -5
6 1 B Y 8 -5
7 2 A Y 3 -3
8 2 B Y 6 -3
Calculating the difference between consecutive rows by group using dplyr?
Like this:
dat %>%
group_by(id) %>%
mutate(time.difference = time - lag(time))
Calculate difference between values in rows by two grouping variables
You can order
the data first and apply the ave
code :
db <- db[with(db, order(Studynr, Fugroup)), ]
db$FUdiff <- ave(db$FU, db$Studynr, FUN=function(x) c(NA,diff(x)))
You can implement the same logic in dplyr
and data.table
:
#dplyr
library(dplyr)
db %>%
arrange(Studynr, Fugroup) %>%
group_by(Studynr) %>%
mutate(FUdiff = c(NA, diff(FU))) %>%
ungroup -> db
#data.table
library(data.table)
setDT(db)[order(Studynr, Fugroup), FUdiff := c(NA, diff(FU)), Studynr]
R: Calculate difference between values in rows with group reference
Try the code below
transform(
df,
Diff = ave(value, group, FUN = function(x) c(NA, diff(x)))
)
which gives
group value Diff
1 1 10 NA
2 1 20 10
3 1 25 5
4 2 5 NA
5 2 10 5
6 2 15 5
Calculate difference between multiple rows by a group in R
You can use match
to get the corresponding sbd
value at wk
1 and 2.
library(dplyr)
df %>%
group_by(code, tmp) %>%
summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])
# code tmp diff
# <chr> <chr> <dbl>
#1 abc01 T1 -0.67
#2 abc01 T2 0.34
If you want to add a new column in the dataframe keeping the rows same, use mutate
instead of summarise
.
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1",
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83,
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22,
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)),
class = "data.frame", row.names = c(NA, -18L))
Finding maximum difference between columns of same name in R
First of all, it's a bit dangerous (and not allowed in some cases) to have non-unique column names, so the first thing I did was to uniqueify the names using base::make.unique()
. From there, I used tidyr::pivot_longer()
so that the grouping information contained in the column names could be accessed more easily. Here I use a regex inside names_pattern
to discard the differentiating parts of the column names so they will be the same again. Then we use dplyr::group_by()
followed by dplyr::summarize()
to get the largest difference in each id
and grp
which corresponds to your rows and similar columns in the original data. Finally we use dplyr::slice_max()
to return only the largest difference per group.
library(tidyverse)
d <- structure(list(A = c(2L, -3L), A = c(4L, 0L), B = c(5L, 2L), B = 2:3, B = c(1L, 4L), C = c(0L, 2L)), row.names = c(NA, -2L), class = "data.frame")
# give unique names
names(d) <- make.unique(names(d), sep = "_")
d %>%
mutate(id = row_number()) %>%
pivot_longer(-id, names_to = "grp", names_pattern = "([A-Z])*") %>%
group_by(id, grp) %>%
summarise(max_diff = max(value) - min(value)) %>%
slice_max(order_by = max_diff, n = 1, with_ties = F)
#> `summarise()` has grouped output by 'id'. You can override using the `.groups` argument.
#> # A tibble: 2 x 3
#> # Groups: id [2]
#> id grp max_diff
#> <int> <chr> <int>
#> 1 1 B 4
#> 2 2 A 3
Created on 2022-02-14 by the reprex package (v2.0.1)
Related Topics
Warning in Install.Packages: Unable to Move Temporary Installation
Weird Characters Added to First Column Name After Reading a Toad-Exported CSV File
Geom_Density to Match Geom_Histogram Binwitdh
How to Train a Ml Model in Sparklyr and Predict New Values on Another Dataframe
Number Format, Writing 1E-5 Instead of 0.00001
R Programming: Cache the Inverse of a Matrix
Find All Combinations of Numbers That Sum to a Target
Calculating Sum of Previous 3 Rows in R Data.Table (By Grid-Square)
Find All Positions of All Matches of One Vector of Values in Second Vector
Convert/Export Googleway Output to Data Frame
How to Merge Two Data Frames on Common Columns in R with Sum of Others
What Does the Error "Arguments Imply Differing Number of Rows: X, Y" Mean
Merge Dataframes on Matching A, B and *Closest* C
Increase Space Between Bars in Ggplot
Add Row in Each Group Using Dplyr and Add_Row()
All Possible Combinations of a Set That Sum to a Target Value