Use a value from the previous row in an R data.table calculation
With shift() implemented in v1.9.6, this is quite straightforward.
DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]
From NEWS:
- New function shift() implements fast lead/lag of vector, list, data.frames or data.tables. It takes a type argument which can be either "lag" (default) or "lead". It enables very convenient usage along with := or set(). For example: DT[, (cols) := shift(.SD, 1L), by=id]. Please have a look at ?shift for more info.
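For readers without data.table at hand, here is a minimal base-R sketch of what a one-step lag or lead computes, on hypothetical toy vectors B and C. The helpers lag1 and lead1 are illustrative names, not part of data.table; they mirror shift(v, n, type = "lag") and type = "lead" for a plain vector only, assuming n is smaller than the vector length. shift() itself also handles lists and data.tables and works with by=.

```r
# Hypothetical toy columns standing in for DT$B and DT$C
B <- c(10, 20, 30, 40)
C <- c(1, 2, 3, 4)

# Base-R equivalents of shift(v, n, type = "lag") / type = "lead" for a plain vector
# (assumes n < length(v))
lag1  <- function(v, n = 1L) c(rep(NA, n), head(v, -n))
lead1 <- function(v, n = 1L) c(tail(v, -n), rep(NA, n))

D <- C + lag1(B)
D          # NA 12 23 34
lead1(B)   # 20 30 40 NA
```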
Fast way to calculate value in cell based on value in previous row (data.table)
Here is a solution using the accumulate function from the purrr package, in case you are interested. In this solution .y represents the current value of var1 that we iterate over, and .x represents the accumulated value that we calculate and put in the var2 column. As you might have noticed, I excluded the first value of var1, as we don't apply our formula to it; the first value of var2 is passed as .init instead.
library(dplyr)
library(purrr)
dt %>%
mutate(var2 = accumulate(var1[-1], .init = var2[1], ~ .x - .y /constant))
var1 var2
1: -92186.747 -3.120000
2: -19163.504 -3.088501
3: -18178.840 -3.058620
4: -9844.679 -3.042439
5: -16494.780 -3.015326
6: -17088.058 -2.987238
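The same recurrence can be written without purrr using base R's Reduce(..., accumulate = TRUE). In this sketch the function's first argument (prev) plays the role of .x and the second (cur) the role of .y. The values of var1, start and constant are hypothetical stand-ins for the question's data.

```r
# Hypothetical toy data; `constant` and `start` stand in for the question's values
var1 <- c(-100, -20, -10, -40)
constant <- 10
start <- -3   # plays the role of var2[1]

# var2[i] = var2[i - 1] - var1[i] / constant, as in the accumulate() call
var2 <- Reduce(function(prev, cur) prev - cur / constant,
               var1[-1], init = start, accumulate = TRUE)
var2   # -3 -1  0  4
```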
data.table row value depend on previous value in R
You could use Reduce with the accumulate = TRUE option:
library(data.table)
x = data.table(a = c(1:5), b = c(1,0,2,3,6), c = NA)
x$a[1] = NA
x$b[1] = NA
x[, c := Reduce(f = function(prev, val) ifelse(val$a < val$b & prev < val$b, val$a, val$b),
                x = split(.SD[-1], seq_len(.N - 1)), init = NA,
                accumulate = TRUE)][]
#> a b c
#> <int> <num> <num>
#> 1: NA NA NA
#> 2: 2 0 0
#> 3: 3 2 2
#> 4: 4 3 3
#> 5: 5 6 5
Reduce passes the result of the previous row's calculation (prev) into the calculation for the next row (val). accumulate = TRUE returns all intermediate results instead of only the final one.
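A minimal illustration of the difference accumulate = TRUE makes, using a running sum:

```r
# Without accumulate, Reduce returns only the final folded value
total <- Reduce(`+`, 1:4)
# With accumulate = TRUE it returns every intermediate value
running <- Reduce(`+`, 1:4, accumulate = TRUE)
total     # 10
running   # 1 3 6 10
```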
R data.table values depending on previous row
Another option:
cols <- c('a','b','c')
A <- 1; B <- 2; C <- 3
DT[iter==1, (cols) := .(A, B, C)]
DT[iter>1,
(cols) := {
A = A + B
B = B - A
C = A / B + C
.(A, B, C)
},
by=iter]
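As a reference for checking values, here is a base-R sketch of the same recurrence stepped through with an ordinary loop over iterations. It assumes four iterations (n_iter is hypothetical) and the starting values A = 1, B = 2, C = 3 from the answer. Note that B and C are computed from the already-updated A, exactly as in the braces of the j expression.

```r
n_iter <- 4   # hypothetical number of iterations
A <- 1; B <- 2; C <- 3
out <- data.frame(iter = seq_len(n_iter), a = NA_real_, b = NA_real_, c = NA_real_)
out[1, c("a", "b", "c")] <- c(A, B, C)

for (i in seq_len(n_iter)[-1]) {
  A <- A + B       # new A uses the previous B
  B <- B - A       # new B uses the updated A
  C <- A / B + C   # new C uses both updated values and the previous C
  out[i, c("a", "b", "c")] <- c(A, B, C)
}
out
```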
Reference previous value in data.table calculation
We need a recursive logic with accumulate:
library(tidyverse)
dt %>%
mutate(z = tail(accumulate(x, ~ .y * 0.1 + 0.2 * .x, .init = 0), -1))
# x z
#1 1 0.10000
#2 2 0.22000
#3 3 0.34400
#4 5 0.56880
#5 1 0.21376
Or the same logic with base R's Reduce:
dt[, z := tail(Reduce(function(x, y) y * 0.1 + 0.2 * x, x,
init = 0, accumulate = TRUE), -1)]
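Because this is a first-order linear recurrence, z[i] = 0.1 * x[i] + 0.2 * z[i - 1] with z starting from 0, it can also be expressed as a recursive filter in base R. A sketch, assuming x holds the values shown in the output above. The stats:: prefix matters here because dplyr (loaded by tidyverse) masks filter().

```r
x <- c(1, 2, 3, 5, 1)
# stats::filter's recursive method computes y[i] = input[i] + 0.2 * y[i - 1],
# with y[0] = 0 by default; c() drops the time-series attributes
z <- c(stats::filter(0.1 * x, 0.2, method = "recursive"))
z   # 0.10000 0.22000 0.34400 0.56880 0.21376
```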
Referring to previous row in calculation
1) for loop. Normally one would just use a simple loop for this:
MyData <- data.frame(A = c(5, 10, 15, 20))
MyData$b <- 0
n <- nrow(MyData)
if (n > 1) for(i in 2:n) MyData$b[i] <- ( MyData$A[i] + 13 * MyData$b[i-1] )/ 14
MyData$b[1] <- NA
giving:
> MyData
A b
1 5 NA
2 10 0.7142857
3 15 1.7346939
4 20 3.0393586
2) Reduce. It would also be possible to use Reduce. One first defines a function f that carries out the body of the loop, and then has Reduce invoke it repeatedly like this:
f <- function(b, A) (A + 13 * b) / 14
MyData$b <- Reduce(f, MyData$A[-1], 0, accumulate = TRUE)
MyData$b[1] <- NA
giving the same result.
This gives the appearance of being vectorized, but if you look at the source of Reduce you will see that it runs a for loop itself.
3) filter. Noting that the problem has the form of a recursive filter with coefficient 13/14 operating on A/14 (but with A[1] replaced with 0), we can write the following. Since filter returns a time series, we use c(...) to convert it back to an ordinary vector. This approach is actually vectorized, as the filter operation is performed in C.
# stats:: prefix so that dplyr::filter cannot mask it if dplyr is attached
MyData$b <- c(stats::filter(replace(MyData$A, 1, 0)/14, 13/14, method = "recursive"))
MyData$b[1] <- NA
again giving the same result.
Note: All solutions assume that MyData has at least 1 row.
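To check that the three approaches really agree, here is a quick comparison on the MyData values from above (b_loop, b_reduce and b_filter are illustrative names). The first element is left at 0 rather than set to NA, so that all.equal compares the whole vector.

```r
MyData <- data.frame(A = c(5, 10, 15, 20))

# 1) explicit loop
b_loop <- numeric(nrow(MyData))
for (i in seq_len(nrow(MyData))[-1]) b_loop[i] <- (MyData$A[i] + 13 * b_loop[i - 1]) / 14

# 2) Reduce
f <- function(b, A) (A + 13 * b) / 14
b_reduce <- Reduce(f, MyData$A[-1], 0, accumulate = TRUE)

# 3) recursive filter (stats::filter, not dplyr::filter)
b_filter <- c(stats::filter(replace(MyData$A, 1, 0) / 14, 13 / 14, method = "recursive"))

stopifnot(isTRUE(all.equal(b_loop, b_reduce)), isTRUE(all.equal(b_loop, b_filter)))
```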
how to create a new column where the rows are determined by the previous row (calculation)?
You need to group_by(City) so that lead() only looks at the next row within the same city.
# Read in data
inflation <- read.table(text = "CPI Date City
112 2005-01-01 Stockholm
113.5 2005-02-01 Stockholm
115 2005-03-01 Stockholm
115.6 2005-04-01 Stockholm
115.8 2005-05-01 Stockholm
106 2005-01-01 Malmo
107.5 2005-02-01 Malmo
110 2005-03-01 Malmo
113 2005-04-01 Malmo
117 2005-05-01 Malmo", header = TRUE)
# Perform calculation
library(dplyr)
inflation |>
group_by(City) |>
mutate(
cpi_change = lead(CPI) - CPI,
cpi_change_percent = cpi_change / CPI * 100
)
Output:
# A tibble: 10 x 5
# # Groups: City [2]
# CPI Date City cpi_change cpi_change_percent
# <dbl> <chr> <chr> <dbl> <dbl>
# 1 112 2005-01-01 Stockholm 1.5 1.34
# 2 114. 2005-02-01 Stockholm 1.5 1.32
# 3 115 2005-03-01 Stockholm 0.600 0.522
# 4 116. 2005-04-01 Stockholm 0.200 0.173
# 5 116. 2005-05-01 Stockholm NA NA
# 6 106 2005-01-01 Malmo 1.5 1.42
# 7 108. 2005-02-01 Malmo 2.5 2.33
# 8 110 2005-03-01 Malmo 3 2.73
# 9 113 2005-04-01 Malmo 4 3.54
# 10 117 2005-05-01 Malmo NA NA
You will get NAs for the last month of each city, as there is no following month to compare against. Alternatively, you can use lag instead of lead if you want to work out the change from the previous month, but then you'll get NAs for the first month.
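If you would rather not depend on dplyr, here is a base-R sketch of the per-city change from the previous month using ave(), which applies a function within each group and returns the results in the original row order. It assumes rows are already sorted by Date within City, and uses a subset of the CPI values above.

```r
# Subset of the inflation data above (rows assumed sorted by Date within City)
CPI  <- c(112, 113.5, 115, 106, 107.5, 110)
City <- c("Stockholm", "Stockholm", "Stockholm", "Malmo", "Malmo", "Malmo")

# Change from the previous month, computed separately within each city;
# the first month of each city has no predecessor, hence NA
cpi_change_prev <- ave(CPI, City, FUN = function(v) c(NA, diff(v)))
cpi_change_prev   # NA 1.5 1.5 NA 1.5 2.5
```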
Use previous row's calculated value to calculate current
In this case I would say use a simple for loop. You cannot use lag(g) because you haven't built the g column yet.
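res <- rep(0, nrow(df))
for (i in seq_len(nrow(df))) {
  # use scalars (df$col[i]) rather than one-column data frames (df[i, ]["col"]),
  # so that res stays a plain numeric vector
  if (is.na(df$c[i]) && is.na(df$f[i])) {
    res[i] <- df$b[i]
  } else if (is.na(df$c[i])) {
    res[i] <- min(df$b[i], df$f[i])
  } else if (!is.na(df$d[i])) {
    # uses the previous row's result, so this branch must not be reached on row 1
    res[i] <- min(df$b[i], df$e[i], df$f[i] + res[i - 1])
  }
}
df$g <- res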
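A self-contained sketch of the same branching logic wrapped in a function and run on toy data (fill_g and the data values are hypothetical, not from the question). It assumes the garbled third branch was meant as min(b, e, f + previous g), i.e. that row["f") is a typo for row["f"].

```r
# Hypothetical toy data with the column names used in the loop
df <- data.frame(
  b = c(5, 4, 6),
  c = c(NA, NA, 1),
  d = c(NA, NA, 2),
  e = c(9, 9, 3),
  f = c(NA, 2, 1)
)

# Hypothetical helper: builds g row by row, since each row may need the previous g
fill_g <- function(df) {
  res <- rep(0, nrow(df))
  for (i in seq_len(nrow(df))) {
    if (is.na(df$c[i]) && is.na(df$f[i])) {
      res[i] <- df$b[i]
    } else if (is.na(df$c[i])) {
      res[i] <- min(df$b[i], df$f[i])
    } else if (!is.na(df$d[i])) {
      res[i] <- min(df$b[i], df$e[i], df$f[i] + res[i - 1])
    }
  }
  res
}

df$g <- fill_g(df)
df$g   # 5 2 3
```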