Calculate difference between values in consecutive rows by group
The package data.table can do this fairly quickly, using the shift function.
library(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
#setDT(df) #if df is already a data frame
df[ , diff := value - shift(value), by = group]
# group value diff
#1: 1 10 NA
#2: 1 20 10
#3: 1 25 5
#4: 2 5 NA
#5: 2 10 5
#6: 2 15 5
setDF(df) #if you want to convert back to old data.frame syntax
Or using the lag function in dplyr:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(Diff = value - lag(value))
# group value Diff
# <int> <int> <int>
# 1 1 10 NA
# 2 1 20 10
# 3 1 25 5
# 4 2 5 NA
# 5 2 10 5
# 6 2 15 5
For alternatives pre-data.table::shift and pre-dplyr::lag, see edits.
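If neither package is available, the same per-group difference can be sketched in base R with ave(). This is one possible approach and not necessarily the one from the earlier edits; the data frame below simply repeats the sample values used above.

```r
# Base R sketch: difference from the previous row within each group.
df <- data.frame(group = rep(c(1, 2), each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# ave() applies FUN to the values of each group and returns a vector
# aligned with the original rows; c(NA, diff(x)) pads the first row of
# every group with NA because it has no previous row.
df$diff <- ave(df$value, df$group, FUN = function(x) c(NA, diff(x)))
df$diff
# NA 10  5 NA  5  5
```

Like shift and lag, this relies on the rows already being in the desired order within each group.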
R programming: How to find a difference in value for every two consecutive dates, given a specific ID
## Order data.frame by IDs, then by increasing sleep_end_dates (if not already sorted)
df <- df[order(df$ID, df$sleep_end_date),]
## Calculate difference in total_sleep with previous entry
df$diff_hours_of_sleep <- c(NA,abs(diff(df$total_sleep)))
## If previous ID is not equal, replace diff_hours_of_sleep with NA
ind <- c(NA, diff(df$ID))
df$diff_hours_of_sleep[ind != 0] <- NA
## And if previous day wasn't yesterday, replace diff_hours_of_sleep with NA
day_ind <- c(NA, diff(df$sleep_end_date))
df$diff_hours_of_sleep[day_ind != 1] <- NA
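To see what these steps produce, here is a small self-contained sketch. The sample values are made up for illustration; the column names follow the snippet above, and it assumes ID is numeric and sleep_end_date is of class Date. It builds the two conditions as logical vectors first, which is a slight restructuring of the same idea.

```r
# Hypothetical sample data (values invented for illustration only).
df <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  sleep_end_date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-04",
                             "2020-01-01", "2020-01-02")),
  total_sleep = c(7, 8.5, 6, 5, 9)
)
df <- df[order(df$ID, df$sleep_end_date), ]

# TRUE where the previous row belongs to the same ID.
prev_same_id <- c(FALSE, diff(df$ID) == 0)

# TRUE where the previous row is exactly one day earlier.
day_gap <- as.numeric(diff(df$sleep_end_date), units = "days")
prev_was_yesterday <- c(FALSE, day_gap == 1)

# Keep the absolute difference only where both conditions hold.
df$diff_hours_of_sleep <- ifelse(prev_same_id & prev_was_yesterday,
                                 c(NA, abs(diff(df$total_sleep))),
                                 NA)
df$diff_hours_of_sleep
# NA 1.5  NA  NA 4.0
```

The third row is NA because its previous entry is two days earlier, and the fourth row is NA because it starts a new ID.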
compute the difference of two values within 1 column
The solution is quite straightforward iff, as your sample suggests, you always have 2 values for each subject:
library(dplyr)
library(tidyr) # fill() comes from tidyr, not dplyr
df %>%
  group_by(Subject) %>%
  mutate(Diff = lead(Response_time) - Response_time) %>%
  fill(Diff)
# A tibble: 6 × 3
# Groups: Subject [3]
Subject Response_time Diff
<chr> <dbl> <dbl>
1 Jeff 1000 2000
2 Jeff 3000 2000
3 Amy 2000 11000
4 Amy 13000 11000
5 Ed 1500 300
6 Ed 1800 300
Data:
df <- data.frame(
Subject = c("Jeff","Jeff","Amy","Amy","Ed","Ed"),
Response_time = c(1000,3000,2000,13000,1500,1800)
)
Calculating the difference between consecutive rows by group using dplyr?
Like this:
dat %>%
group_by(id) %>%
mutate(time.difference = time - lag(time))
Get the difference between two non consecutive rows
This should do the trick, assuming all columns are numeric. Note that the result has 3 fewer rows than your input: each row is subtracted from the row 3 positions below it, so the last 3 rows have no partner.
diff(as.matrix(your_data_frame), lag = 3)
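A minimal sketch of how the row count shrinks, using a made-up 5-row data frame (the column names a and b are placeholders):

```r
# Hypothetical numeric data frame with 5 rows.
your_data_frame <- data.frame(a = c(1, 2, 4, 7, 11),
                              b = c(10, 20, 30, 40, 50))

# Row i of the result is input row i + 3 minus input row i.
d <- diff(as.matrix(your_data_frame), lag = 3)

dim(d)    # 2 rows, 2 columns
d[, "a"]  # 7 - 1 = 6 and 11 - 2 = 9
d[, "b"]  # 40 - 10 = 30 and 50 - 20 = 30
```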