Use a Value from the Previous Row in an R Data.Table Calculation

With shift(), available since data.table v1.9.6, this is quite straightforward.

DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]

From NEWS:


  1. New function shift() implements fast lead/lag of vector, list, data.frames or data.tables. It takes a type argument which can be either "lag" (default) or "lead". It enables very convenient usage along with := or set(). For example: DT[, (cols) := shift(.SD, 1L), by=id]. Please have a look at ?shift for more info.
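As a minimal sketch of the difference between an ungrouped and a grouped lag, assuming a toy table (the id column and all values here are hypothetical, not from the original question):

```r
library(data.table)

# Hypothetical toy data; the id column and the values are assumptions
DT <- data.table(id = c(1, 1, 1, 2, 2),
                 B  = c(10, 20, 30, 40, 50),
                 C  = 1:5)

# Ungrouped lag: each row sees B from the row above; row 1 gets NA
DT[, D := C + shift(B)]

# Grouped lag: the first row of every id gets NA,
# so a value never leaks from one group into the next
DT[, D_by_id := C + shift(B), by = id]

print(DT)
```

Row 4 is where the two columns differ: D uses B from the last row of id 1, while D_by_id is NA because row 4 is the first row of id 2.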


Fast way to calculate value in cell based on value in previous row (data.table)

Here is a solution using the accumulate() function from the purrr package. In the formula, .y is the current value of var1 being iterated over, and .x is the accumulated value that is computed and stored in the var2 column. The first value of var1 is excluded because the formula is not applied to it; var2[1] serves as the starting value instead.

library(dplyr)
library(purrr)

dt %>%
  mutate(var2 = accumulate(var1[-1], .init = var2[1], ~ .x - .y / constant))


var1 var2
1: -92186.747 -3.120000
2: -19163.504 -3.088501
3: -18178.840 -3.058620
4: -9844.679 -3.042439
5: -16494.780 -3.015326
6: -17088.058 -2.987238
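To make the roles of .x and .y concrete, here is a small sketch with made-up numbers (var1, constant and start below are hypothetical and unrelated to the data above), checked against an explicit loop:

```r
library(purrr)

var1     <- c(10, 20, 30, 40)  # hypothetical inputs
constant <- 10                 # hypothetical constant
start    <- 5                  # hypothetical starting value, playing the role of var2[1]

# .x is the running result, .y is the current element of var1[-1]
var2 <- accumulate(var1[-1], ~ .x - .y / constant, .init = start)

# The same recurrence written as a plain loop
var2_loop <- numeric(length(var1))
var2_loop[1] <- start
for (i in 2:length(var1)) {
  var2_loop[i] <- var2_loop[i - 1] - var1[i] / constant
}

print(var2)  # 5 3 0 -4
```

Because .init supplies the first element, the result has the same length as var1 even though only var1[-1] is iterated over.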

data.table row value depend on previous value in R

You could use Reduce() with accumulate = TRUE:

library(data.table)

x = data.table(a = c(1:5), b = c(1,0,2,3,6), c = NA)
x$a[1] = NA
x$b[1] = NA

x[, c := Reduce(f = function(prev, val) ifelse((val$a < val$b & prev < val$b), val$a, val$b),
                x = split(.SD[-1], seq_len(.N - 1)), init = NA,
                accumulate = TRUE)][]

#> a b c
#> <int> <num> <num>
#> 1: NA NA NA
#> 2: 2 0 0
#> 3: 3 2 2
#> 4: 4 3 3
#> 5: 5 6 5

Reduce passes the result of the previous row's calculation into the calculation for the next row.
accumulate = TRUE returns every intermediate result instead of only the final one.
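A stripped-down illustration of those two points, using a running sum as a stand-in for the real row function (the vector and the step function are illustrative only):

```r
vals <- c(1, 2, 3, 4)

# step() receives the previous result and the current value
step <- function(prev, val) prev + val

total     <- Reduce(step, vals)                                # only the final result
running   <- Reduce(step, vals, accumulate = TRUE)             # every intermediate result
with_init <- Reduce(step, vals, init = 100, accumulate = TRUE) # init is prepended to the output

print(total)      # 10
print(running)    # 1 3 6 10
print(with_init)  # 100 101 103 106 110
```

With init supplied, the output is one element longer than the input, which is why the example above feeds Reduce only rows 2 to N and lets init fill the first row.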

R data.table values depending on previous row

Another option:

cols <- c('a','b','c')
A <- 1; B <- 2; C <- 3
DT[iter==1, (cols) := .(A, B, C)]
DT[iter > 1,
   (cols) := {
     A = A + B
     B = B - A
     C = A / B + C
     .(A, B, C)
   },
   by = iter]

Reference previous value in data.table calculation

Each value of z depends on the previous z, so this needs a running calculation, which accumulate provides:

library(tidyverse)
dt %>%
  mutate(z = tail(accumulate(x, ~ .y * 0.1 + 0.2 * .x, .init = 0), -1))
# x z
#1 1 0.10000
#2 2 0.22000
#3 3 0.34400
#4 5 0.56880
#5 1 0.21376

Or the same logic with Reduce

dt[, z := tail(Reduce(function(x, y) y * 0.1 + 0.2 * x, x,
                      init = 0, accumulate = TRUE), -1)]
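Because this particular recurrence is linear (z[i] = 0.1 * x[i] + 0.2 * z[i-1]), it can also be written as a recursive filter. The sketch below is an alternative, not part of the original answer; it takes the x values from the output shown above and checks the result against Reduce:

```r
x <- c(1, 2, 3, 5, 1)  # the x column from the output above

# Reduce version: prev is the running result, cur is the current x
z_reduce <- tail(Reduce(function(prev, cur) cur * 0.1 + 0.2 * prev, x,
                        init = 0, accumulate = TRUE), -1)

# Recursive filter: y[i] = input[i] + 0.2 * y[i - 1], starting from 0
z_filter <- as.numeric(stats::filter(0.1 * x, 0.2, method = "recursive"))

print(z_filter)  # 0.10000 0.22000 0.34400 0.56880 0.21376
```

The filter approach only applies when the update rule is a fixed linear combination of earlier results; anything with conditions still needs Reduce, accumulate, or a loop.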

Referring to previous row in calculation

1) for loop. Normally one would just use a simple loop for this:

MyData <- data.frame(A = c(5, 10, 15, 20))


MyData$b <- 0
n <- nrow(MyData)
if (n > 1) for(i in 2:n) MyData$b[i] <- ( MyData$A[i] + 13 * MyData$b[i-1] )/ 14
MyData$b[1] <- NA

giving:

> MyData
A b
1 5 NA
2 10 0.7142857
3 15 1.7346939
4 20 3.0393586

2) Reduce. It would also be possible to use Reduce. One first defines a function f that carries out the body of the loop and then has Reduce invoke it repeatedly, like this:

f <- function(b, A) (A + 13 * b) / 14
MyData$b <- Reduce(f, MyData$A[-1], 0, accumulate = TRUE)
MyData$b[1] <- NA

giving the same result.

This gives the appearance of being vectorized but in fact if you look at the source of Reduce it does a for loop itself.

3) filter. Noting that the problem has the form of a recursive filter with coefficient 13/14 operating on A/14 (but with A[1] replaced by 0), we can write the following. Since filter returns a time series, we use c(...) to convert it back to an ordinary vector. This approach actually is vectorized, as the filter operation is performed in C. The call is written as stats::filter because dplyr masks filter when it is loaded.

MyData$b <- c(stats::filter(replace(MyData$A, 1, 0) / 14, 13 / 14, method = "recursive"))
MyData$b[1] <- NA

again giving the same result.

Note: All solutions assume that MyData has at least 1 row.
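As a quick sanity check, the three approaches can be run side by side on the same vector and compared. This is only a sketch on plain vectors (b_loop, b_reduce and b_filter are names introduced here), leaving out the final step that sets the first element to NA:

```r
A <- c(5, 10, 15, 20)
n <- length(A)

# 1) explicit loop
b_loop <- numeric(n)
if (n > 1) for (i in 2:n) b_loop[i] <- (A[i] + 13 * b_loop[i - 1]) / 14

# 2) Reduce, starting from 0 and keeping intermediate results
b_reduce <- Reduce(function(b, a) (a + 13 * b) / 14, A[-1], 0, accumulate = TRUE)

# 3) recursive filter on A/14 with the first element replaced by 0
b_filter <- as.numeric(stats::filter(replace(A, 1, 0) / 14, 13 / 14, method = "recursive"))

stopifnot(isTRUE(all.equal(b_loop, b_reduce)),
          isTRUE(all.equal(b_loop, b_filter)))
print(b_loop)
```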

how to create a new column where the rows are determined by the previous row (calculation)?

You need to group_by City so that the calculation is done within each city and never runs across the boundary between two cities.

# Read in data
inflation <- read.table(text = "CPI Date City
112 2005-01-01 Stockholm
113.5 2005-02-01 Stockholm
115 2005-03-01 Stockholm
115.6 2005-04-01 Stockholm
115.8 2005-05-01 Stockholm
106 2005-01-01 Malmo
107.5 2005-02-01 Malmo
110 2005-03-01 Malmo
113 2005-04-01 Malmo
117 2005-05-01 Malmo", header = TRUE)

# Perform calculation
library(dplyr)

inflation |>
  group_by(City) |>
  mutate(
    cpi_change = lead(CPI) - CPI,
    cpi_change_percent = cpi_change / CPI * 100
  )

Output:

# A tibble: 10 x 5
# Groups: City [2]
# CPI Date City cpi_change cpi_change_percent
# <dbl> <chr> <chr> <dbl> <dbl>
# 1 112 2005-01-01 Stockholm 1.5 1.34
# 2 114. 2005-02-01 Stockholm 1.5 1.32
# 3 115 2005-03-01 Stockholm 0.600 0.522
# 4 116. 2005-04-01 Stockholm 0.200 0.173
# 5 116. 2005-05-01 Stockholm NA NA
# 6 106 2005-01-01 Malmo 1.5 1.42
# 7 108. 2005-02-01 Malmo 2.5 2.33
# 8 110 2005-03-01 Malmo 3 2.73
# 9 113 2005-04-01 Malmo 4 3.54
# 10 117 2005-05-01 Malmo NA NA

You will get NAs for the last month of each city because the following month's CPI is not known. Alternatively, you can use lag instead of lead to work out the change from the previous month, but then you'll get NAs for the first month.
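A sketch of the lag variant, assuming a two-month subset of the figures above and the same column names. Note that the percentage here is taken relative to the previous month's CPI, which is a choice made for this example rather than something the original answer specifies:

```r
library(dplyr)

# Two months per city, taken from the data above
inflation_small <- data.frame(
  CPI  = c(112, 113.5, 106, 107.5),
  Date = c("2005-01-01", "2005-02-01", "2005-01-01", "2005-02-01"),
  City = c("Stockholm", "Stockholm", "Malmo", "Malmo")
)

lagged <- inflation_small |>
  group_by(City) |>
  mutate(
    cpi_change = CPI - lag(CPI),                      # change from the previous month
    cpi_change_percent = cpi_change / lag(CPI) * 100  # relative to the previous month
  ) |>
  ungroup()

print(lagged)
```

The first row of each city is NA in both new columns, since there is no earlier month within the group to compare against.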

Use previous row's calculated value to calculate current

In this case I would use a simple for loop.
You cannot use lag(g) because the g column has not been built yet.

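res <- rep(0, nrow(df))
for (i in seq_len(nrow(df))) {
  row <- df[i, ]
  # [[ ]] extracts a single value; row["b"] would return a one-cell data frame
  if (is.na(row[["c"]]) && is.na(row[["f"]])) {
    res[i] <- row[["b"]]
  } else if (is.na(row[["c"]])) {
    res[i] <- min(row[["b"]], row[["f"]])
  } else if (!is.na(row[["d"]])) {
    # res[i - 1] is the value computed for the previous row
    res[i] <- min(row[["b"]], row[["e"]], row[["f"]] + res[i - 1])
  }
}
df$g <- res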

