Recursive function using dplyr
We could use accumulate from purrr. With accumulate, compute the recursive sum of the 'x' elements, starting from an initial value of 5 (.init = 5), and remove the first element of the accumulate output ([-1]), which is the initial value itself.
library(purrr)
library(dplyr)
dat %>%
mutate(y = accumulate(x, ~ .x + .y, .init = 5)[-1])
# A tibble: 11 x 3
# day x y
# <int> <int> <dbl>
# 1 200 4 9.00
# 2 201 3 12.0
# 3 202 -4 8.00
# 4 203 -7 1.00
# 5 204 -3 -2.00
# 6 205 1 -1.00
# 7 206 -5 -6.00
# 8 207 -1 -7.00
# 9 208 -4 -11.0
#10 209 -2 -13.0
#11 210 4 -9.00
A similar approach in base R would be
dat$y <- Reduce(function(u, v) u + v , dat$x, init = 5, accumulate = TRUE)[-1]
dat$y
#[1] 9 12 8 1 -2 -1 -6 -7 -11 -13 -9
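Because the step here is a plain addition, the seeded running sum can also be cross-checked against cumsum. The following is a minimal sketch in base R; the vector x is copied from the x column printed above, and the names y_reduce and y_cumsum are made up for illustration.

```r
# x values copied from the printed tibble above
x <- c(4, 3, -4, -7, -3, 1, -5, -1, -4, -2, 4)

# Running sum seeded with 5; [-1] drops the seed itself
y_reduce <- Reduce(`+`, x, init = 5, accumulate = TRUE)[-1]

# Same result without an explicit fold, valid only because the step is "+"
y_cumsum <- 5 + cumsum(x)

print(y_reduce)
print(isTRUE(all.equal(y_reduce, y_cumsum)))
```

The cumsum shortcut only applies when the update rule is a simple sum; for any other rule the accumulate or Reduce form is still needed.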
Application of a recursive function within a dplyr context in R
I think you can probably get what you need here with a mix of tidyr::fill to fill NA values from above, combined with cumprod to get the cumulative effect of multiplying by the coefficient, and ifelse to choose when to use it. There's also a "working" column named V, which is created and destroyed in the process.
library(dplyr)
df %>%
mutate(V = tidyr::fill(df, VALUE)$VALUE) %>%
group_by(ID) %>%
mutate(VALUE = ifelse(is.na(VALUE),
V * cumprod(ifelse(is.na(VALUE), COEFF, 1)),
VALUE)) %>% select(-V)
#> # A tibble: 10 x 3
#> # Groups: ID [2]
#> ID VALUE COEFF
#> <fct> <dbl> <dbl>
#> 1 a 1 1
#> 2 a 3 2
#> 3 a 3 1
#> 4 a 1.5 0.5
#> 5 a 150 100
#> 6 b 2 1
#> 7 b 2 1
#> 8 b 3 1
#> 9 b 3 1
#> 10 b 3 1
Created on 2020-06-30 by the reprex package (v0.3.0)
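To see how the three pieces interact, here is a small base R sketch on a single hypothetical group with one run of missing values. The vectors value and coeff are invented for illustration (the original input data is not shown), and the forward fill is done by hand rather than with tidyr::fill. It assumes the first value is not missing.

```r
# Hypothetical data: one known value followed by a run of NAs
value <- c(10, NA, NA, NA)
coeff <- c(1, 2, 0.5, 3)

# Forward fill: index of the last non-missing value seen so far
last_seen <- cummax(ifelse(is.na(value), 0L, seq_along(value)))
filled <- value[last_seen]

# Coefficients only count on rows where the value is missing
multiplier <- cumprod(ifelse(is.na(value), coeff, 1))

# Keep known values, otherwise use the filled value times the running product
result <- ifelse(is.na(value), filled * multiplier, value)
print(result)
```

With these inputs the multipliers are 1, 2, 1, 3, so the missing rows become 20, 10 and 30.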
Recursion with dplyr
You can use purrr::accumulate() (or base::Reduce() if you prefer):
library(dplyr)
library(purrr)
mtcars %>%
as_tibble() %>%
select(mpg, qsec) %>%
head(5) %>%
mutate(new_col = accumulate(tail(mpg + qsec, -1), .f = ~ .y - .x, .init = 10))
# A tibble: 5 × 3
mpg qsec new_col
<dbl> <dbl> <dbl>
1 21 16.5 10
2 21 17.0 28.0
3 22.8 18.6 13.4
4 21.4 19.4 27.4
5 18.7 17.0 8.27
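The base::Reduce() variant mentioned above is not shown, so here is one way it might look. This is a sketch only; the names d, s, prev and cur are chosen for illustration. In the purrr formula, .x is the accumulated value and .y is the current element, which map to prev and cur here.

```r
# First five rows of the built-in mtcars data, two columns only
d <- head(mtcars[, c("mpg", "qsec")], 5)

# Row sums from the second row onwards, as in tail(mpg + qsec, -1)
s <- tail(d$mpg + d$qsec, -1)

# Each new value is the current sum minus the previous result, seeded with 10
d$new_col <- Reduce(function(prev, cur) cur - prev, s,
                    init = 10, accumulate = TRUE)
print(d)
```

Because the seed is kept as the first element, the result has the same length as the data frame and no [-1] is needed.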
How to add new column and calculate recursive cum using dplyr and shift
To perform such a calculation we can use accumulate from purrr or Reduce in base R. Since you are already using dplyr, we can use accumulate:
library(dplyr)
df %>%
group_by(group) %>%
mutate(y1 = purrr::accumulate(x[-n()], ~.x * 2 + .y, .init = 1))
# group x y y1
# <chr> <dbl> <dbl> <dbl>
#1 a 1 1 1
#2 a 2 3 3
#3 a 3 8 8
#4 a 4 19 19
#5 a 5 42 42
#6 b 6 1 1
#7 b 7 8 8
#8 b 8 23 23
Recursive sum over two variables using dplyr
Perhaps I would have done it in a similar fashion to @27phi9. You may, however, do this without writing any function beforehand. Here are three approaches: (i) base R, (ii) dplyr only, (iii) dplyr + purrr.
df <- structure(list(a = c(0.5, 0.3, 1, 0.2, 0.4, 0.8), b = c(7L, 1L, 9L, 10L, 3L, 2L)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
transform(df, C = {x <- 0; Reduce(function(.x, .y){x <<- .x + x; (cumsum(b)[[.y]] + x) * a[[.y]]},
seq(nrow(df)),
init = 0,
accumulate = TRUE)[-1]})
#> a b C
#> 1 0.5 7 3.5000
#> 2 0.3 1 3.4500
#> 3 1.0 9 23.9500
#> 4 0.2 10 11.5800
#> 5 0.4 3 28.9920
#> 6 0.8 2 82.7776
library(dplyr)
df %>%
mutate(C = {x <- 0; Reduce(function(.x, .y){x <<- .x + x; (cumsum(b)[[.y]] + x) * a[[.y]]},
seq(nrow(df)),
init = 0,
accumulate = TRUE)[-1]})
#> # A tibble: 6 x 3
#> a b C
#> <dbl> <int> <dbl>
#> 1 0.5 7 3.5
#> 2 0.3 1 3.45
#> 3 1 9 24.0
#> 4 0.2 10 11.6
#> 5 0.4 3 29.0
#> 6 0.8 2 82.8
library(purrr)
df %>%
mutate(C = {x <- 0; unlist(accumulate2(cumsum(b), a, .init = 0, ~ {x <<- ..1 + x; (..2 + x) * ..3 }))[-1]})
#> # A tibble: 6 x 3
#> a b C
#> <dbl> <int> <dbl>
#> 1 0.5 7 3.5
#> 2 0.3 1 3.45
#> 3 1 9 24.0
#> 4 0.2 10 11.6
#> 5 0.4 3 29.0
#> 6 0.8 2 82.8
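All three versions rely on a variable modified with <<- inside the fold, which can be hard to follow. As a reading aid, the same recurrence can be spelled out as an explicit loop. This is a sketch in base R, with running_b and running_C as illustrative names: each C value is (cumulative sum of b so far plus the sum of all earlier C values) multiplied by a.

```r
a <- c(0.5, 0.3, 1, 0.2, 0.4, 0.8)
b <- c(7, 1, 9, 10, 3, 2)

C <- numeric(length(a))
running_b <- 0  # cumulative sum of b up to the current row
running_C <- 0  # sum of all C values computed before the current row

for (i in seq_along(a)) {
  running_b <- running_b + b[i]
  C[i] <- (running_b + running_C) * a[i]
  running_C <- running_C + C[i]
}
print(C)
```

The loop makes the two pieces of carried state explicit, which is what the x <<- .x + x trick is doing implicitly.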
Recursive function with condition in dplyr
You just need to redefine the function that you use for accumulate:
library(tidyverse)
set.seed(123)
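dat <- tibble(x = sample(-10:10, size = 11, replace = TRUE))
# add x to the running value, then cap the result to the range [0, 10]
fn <- function(x, y) pmax(pmin(x + y, 10), 0)
# assign the result back so that printing dat shows the new column
dat <- dat %>%
  mutate(y = accumulate(x, fn, .init = 2)[-1])
dat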
# A tibble: 11 × 2
x y
<int> <dbl>
1 -4 0
2 6 6
3 -2 4
4 8 10
5 9 10
6 -10 0
7 1 1
8 8 9
9 1 10
10 -1 9
11 10 10
How it works: pmax takes the element-wise maximum of its arguments and pmin the element-wise minimum, so wrapping the sum x + y in both caps the result between the lower bound (0) and the upper bound (10).
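The capping can be checked by hand on the values printed above. The sketch below uses base R's Reduce instead of accumulate, with x copied from the output table; clamp_step is an illustrative name for the same function as fn.

```r
# One step: add the new value, then cap the result to the range [0, 10]
clamp_step <- function(acc, v) pmax(pmin(acc + v, 10), 0)

# x values copied from the printed tibble above
x <- c(-4, 6, -2, 8, 9, -10, 1, 8, 1, -1, 10)

# Capped running sum seeded with 2; [-1] drops the seed
y <- Reduce(clamp_step, x, init = 2, accumulate = TRUE)[-1]
print(y)

# A single step in isolation: 4 + 8 = 12 is capped to 10
print(clamp_step(4, 8))
```

Note that the cap is applied at every step, so later values build on the capped result rather than on the raw running sum.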
Recursive filtering in R
As Ben suggested, this question does contain an answer. It did not work for me without changes, though, so I am posting a slightly modified version of aichao's code here:
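library(dplyr)  # first(), group_by(), slice()
library(rlang)  # is_empty()
f <- function(d, ind = 1, minDiff = 7) {
  # index of the first date at least minDiff days after d[ind]
  ind.next <- first(which(difftime(d, d[ind], units = "days") >= minDiff))
  # stop when no later date qualifies (first() may return NA for an empty input)
  if (is_empty(ind.next) || is.na(ind.next))
    return(ind)
  else
    return(c(ind, f(d, ind.next, minDiff)))
}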
result <- df %>%
group_by(id) %>%
slice(f(date)) %>%
ungroup()
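The recursion above can also be unrolled into a loop, which may be easier to trace. The following is a base R sketch rather than the original author's code; keep_spaced and the sample dates are hypothetical, and it assumes d is non-empty and sorted in ascending order.

```r
# Return the indices of dates that are each at least min_diff days
# after the previously kept date, always keeping the first one
keep_spaced <- function(d, min_diff = 7) {
  keep <- integer(0)
  i <- 1L
  while (!is.na(i)) {
    keep <- c(keep, i)
    # candidates at least min_diff days after the date just kept
    later <- which(difftime(d, d[i], units = "days") >= min_diff)
    i <- if (length(later) > 0) later[1] else NA_integer_
  }
  keep
}

# Hypothetical dates: day offsets 0, 2, 7, 9, 15, 30 from 2020-01-01
dates <- as.Date("2020-01-01") + c(0, 2, 7, 9, 15, 30)
kept <- keep_spaced(dates)
print(dates[kept])
```

Here the offsets 0, 7, 15 and 30 are kept, because each is at least seven days after the previously kept date. The function returns indices, so it could be passed to slice() in the same way as f(date).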
Recursive function in R with ending point varying by group
I don't think what you are describing is really recursive, in that the calculations don't depend on the results of previous iterations. It is, however, fairly complex, and perhaps the best way to fit it into a dplyr pipeline is to declare a function that takes the necessary variables and returns your answer.
Here is a function that does the trick. It uses the split-lapply-merge paradigm to force the calculations to work properly row-wise. It then uses an sapply to check whether, for each row, the logical conditions are met in any previous row in the group. If so, it overwrites an NA in that row's p201 value with a non-NA value:
multi_condition <- function(id, v1, v2, v3, v4)
{
unlist(lapply(split(data.frame(v1, v2, v3, v4), id), function(x)
{
if(all(is.na(x$v1))) return(x$v1)
ss <- unlist(c(FALSE, sapply(seq_along(x$v2)[-1], function(i)
{
x$v2[i] %in% x$v2[1:(i - 1)] & any(abs(x$v3[i] - x$v3[1:(i - 1)]) <= x$v4[i])
})))
replace(x$v1, ss, x$v1[!is.na(x$v1)][1])
}))
}
So the function itself is complex, but its use is straightforward:
library(dplyr)
df %>%
group_by(id) %>%
mutate(p201 = multi_condition(id, p201, V2007, V2009, ager))
#> # A tibble: 11 x 5
#> # Groups: id [5]
#> id p201 V2009 ager V2007
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 1 <NA> 25 2.3 1
#> 2 1 <NA> 11 2 1
#> 3 1 001 63 8.1 1
#> 4 1 001 75 12.1 1
#> 5 2 <NA> 49 5.1 2
#> 6 2 <NA> 14 2 2
#> 7 3 001 32 2.9 1
#> 8 4 001 31 2.8 2
#> 9 5 001 3 2 1
#> 10 5 <NA> 10 2 1
#> 11 5 001 3 2 1
If you prefer a more dplyr-type solution using group_map, with the logic perhaps a little clearer, you could try:
multi_select <- function(df, ...)
{
rowwise_logic <- function(i)
{
if(i == 1) return(FALSE)
j <- 1:(i - 1)
df$V2007[i] %in% df$V2007[j] &
any(abs(df$V2009[i] - df$V2009[j]) <= df$ager[i])
}
matching_rows <- sapply(seq(nrow(df)), rowwise_logic)
df$p201[matching_rows] <- first(na.exclude(df$p201))
return(df)
}
Which would work like this:
df %>%
group_by(id) %>%
group_map(multi_select, .keep = TRUE) %>%
bind_rows()
Created on 2020-07-15 by the reprex package (v0.3.0)