Recursive Function Using Dplyr

Recursive function using dplyr

We can use accumulate() from purrr. It computes the running (recursive) sum of the elements of 'x', starting from an initial value of 5 (.init = 5). Because the output begins with that initial value, we drop its first element with [-1].

library(purrr)
library(dplyr)
dat %>%
  mutate(y = accumulate(x, ~ .x + .y, .init = 5)[-1])
# A tibble: 11 x 3
#      day     x      y
#    <int> <int>  <dbl>
#  1   200     4   9.00
#  2   201     3  12.0
#  3   202    -4   8.00
#  4   203    -7   1.00
#  5   204    -3  -2.00
#  6   205     1  -1.00
#  7   206    -5  -6.00
#  8   207    -1  -7.00
#  9   208    -4 -11.0
# 10   209    -2 -13.0
# 11   210     4  -9.00

A similar approach in base R uses Reduce() with accumulate = TRUE:

dat$y <- Reduce(function(u, v)  u + v , dat$x, init = 5, accumulate = TRUE)[-1]
dat$y
#[1] 9 12 8 1 -2 -1 -6 -7 -11 -13 -9
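
Because the step function here is plain addition, the result should simply equal the initial value plus cumsum(x). Here is a minimal base R sketch to check that. The answer never defines dat, so the x values below are copied from the printed output and are only assumed to match the original input.

```r
# x values copied from the printed tibble above (assumed to be the original input)
x <- c(4, 3, -4, -7, -3, 1, -5, -1, -4, -2, 4)

# Running sum seeded with 5; [-1] drops the seed itself
y_reduce <- Reduce(function(u, v) u + v, x, init = 5, accumulate = TRUE)[-1]

# Closed form for the same recurrence
y_cumsum <- 5 + cumsum(x)

print(y_reduce)
print(isTRUE(all.equal(y_reduce, y_cumsum)))
```

accumulate() and Reduce() earn their keep when the step is not a simple sum or product and no cum*() shortcut exists.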

Application of a recursive function within a dplyr context in R

I think you can probably get what you need here with a mix of tidyr::fill to fill NA values from above, combined with cumprod to get the cumulative effect of multiplying by the coefficient, and ifelse to choose when to use it. There's also a "working" column named V which is created and destroyed in the process.

library(dplyr)

df %>%
  mutate(V = tidyr::fill(df, VALUE)$VALUE) %>%
  group_by(ID) %>%
  mutate(VALUE = ifelse(is.na(VALUE),
                        V * cumprod(ifelse(is.na(VALUE), COEFF, 1)),
                        VALUE)) %>%
  select(-V)
#> # A tibble: 10 x 3
#> # Groups: ID [2]
#> ID VALUE COEFF
#> <fct> <dbl> <dbl>
#> 1 a 1 1
#> 2 a 3 2
#> 3 a 3 1
#> 4 a 1.5 0.5
#> 5 a 150 100
#> 6 b 2 1
#> 7 b 2 1
#> 8 b 3 1
#> 9 b 3 1
#> 10 b 3 1

Created on 2020-06-30 by the reprex package (v0.3.0)
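
To see what the fill + cumprod + ifelse combination does, here is a base R sketch for a single group. The input vectors are hypothetical: the original df is not shown, so they were chosen only to be consistent with the printed rows for ID "a".

```r
# Hypothetical single-group input, chosen to be consistent with group "a" above
value <- c(1, 3, NA, NA, NA)
coeff <- c(1, 2, 1, 0.5, 100)

# Forward-fill: index of the most recent non-NA entry at each position
# (assumes the first element is not NA)
last_seen <- cummax(seq_along(value) * (!is.na(value)))
filled <- value[last_seen]

# Coefficients only count on the NA rows; elsewhere multiply by 1
multiplier <- cumprod(ifelse(is.na(value), coeff, 1))

# Keep known values, otherwise use the filled value times the running product
result <- ifelse(is.na(value), filled * multiplier, value)
print(result)
```

Note that the multiplier keeps accumulating across every NA row in the group; it does not restart after a later non-NA value.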

Recursion with dplyr

You can use purrr::accumulate() (or base::Reduce() if you prefer):

library(dplyr)
library(purrr)

mtcars %>%
  as_tibble() %>%
  select(mpg, qsec) %>%
  head(5) %>%
  mutate(new_col = accumulate(tail(mpg + qsec, -1), .f = ~ .y - .x, .init = 10))

# A tibble: 5 × 3
mpg qsec new_col
<dbl> <dbl> <dbl>
1 21 16.5 10
2 21 17.0 28.0
3 22.8 18.6 13.4
4 21.4 19.4 27.4
5 18.7 17.0 8.27
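
The same recurrence can be sketched with base::Reduce(), which makes the argument order explicit: the first argument of the step function is the previous result (.x above) and the second is the next input (.y). The names acc and val are illustrative only. Dropping the first input with tail(..., -1) keeps the output the same length as the data, since the initial value adds one element.

```r
# mtcars ships with R; take the same five rows as above
s <- with(head(mtcars, 5), mpg + qsec)

# acc is the previous result (.x), val is the next input (.y)
new_col <- Reduce(function(acc, val) val - acc, tail(s, -1),
                  init = 10, accumulate = TRUE)

print(round(new_col, 2))
```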

How to add new column and calculate recursive cum using dplyr and shift

To perform such a calculation, we can use accumulate() from purrr or Reduce() in base R.

Since you are already using dplyr, we can use accumulate():

library(dplyr)

df %>%
  group_by(group) %>%
  mutate(y1 = purrr::accumulate(x[-n()], ~.x * 2 + .y, .init = 1))

# group x y y1
# <chr> <dbl> <dbl> <dbl>
#1 a 1 1 1
#2 a 2 3 3
#3 a 3 8 8
#4 a 4 19 19
#5 a 5 42 42
#6 b 6 1 1
#7 b 7 8 8
#8 b 8 23 23
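
Written out as an explicit loop, the recurrence is y1[1] = 1 and y1[i] = 2 * y1[i - 1] + x[i - 1] within each group. The base R sketch below applies it per group with ave(). The helper name rec is hypothetical, and the data are rebuilt from the printed output above rather than taken from the original question.

```r
# Data rebuilt from the printed output above (assumed input)
df <- data.frame(group = c(rep("a", 5), rep("b", 3)),
                 x = as.numeric(1:8))

# Hypothetical helper: y[1] = 1, then y[i] = 2 * y[i - 1] + x[i - 1]
rec <- function(x) {
  y <- numeric(length(x))
  y[1] <- 1
  for (i in seq_along(x)[-1]) {
    y[i] <- 2 * y[i - 1] + x[i - 1]
  }
  y
}

# ave() applies rec() separately within each group and keeps row order
df$y1 <- ave(df$x, df$group, FUN = rec)
print(df)
```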

Recursive sum over two variables using dplyr

I would probably have done it in a similar fashion to @27phi9. You can, however, do this without writing any function beforehand. Here are three approaches: (i) base R, (ii) dplyr only, (iii) dplyr + purrr.

df <- structure(list(a = c(0.5, 0.3, 1, 0.2, 0.4, 0.8), b = c(7L, 1L,  9L, 10L, 3L, 2L)), row.names = c(NA, -6L), class = c("tbl_df",  "tbl", "data.frame"))

transform(df, C = {
  x <- 0
  Reduce(function(.x, .y) {
    x <<- .x + x
    (cumsum(b)[[.y]] + x) * a[[.y]]
  }, seq(nrow(df)), init = 0, accumulate = TRUE)[-1]
})
#> a b C
#> 1 0.5 7 3.5000
#> 2 0.3 1 3.4500
#> 3 1.0 9 23.9500
#> 4 0.2 10 11.5800
#> 5 0.4 3 28.9920
#> 6 0.8 2 82.7776


library(dplyr)

df %>%
  mutate(C = {
    x <- 0
    Reduce(function(.x, .y) {
      x <<- .x + x
      (cumsum(b)[[.y]] + x) * a[[.y]]
    }, seq(nrow(df)), init = 0, accumulate = TRUE)[-1]
  })
#> # A tibble: 6 x 3
#> a b C
#> <dbl> <int> <dbl>
#> 1 0.5 7 3.5
#> 2 0.3 1 3.45
#> 3 1 9 24.0
#> 4 0.2 10 11.6
#> 5 0.4 3 29.0
#> 6 0.8 2 82.8


library(purrr)
df %>%
mutate(C = {x <- 0; unlist(accumulate2(cumsum(b), a, .init = 0, ~ {x <<- ..1 + x; (..2 + x) * ..3 }))[-1]})
#> # A tibble: 6 x 3
#> a b C
#> <dbl> <int> <dbl>
#> 1 0.5 7 3.5
#> 2 0.3 1 3.45
#> 3 1 9 24.0
#> 4 0.2 10 11.6
#> 5 0.4 3 29.0
#> 6 0.8 2 82.8
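
All three versions compute the same recurrence: each C value is a[i] times the sum of cumsum(b)[i] and every C value computed so far. A plain loop, shown here only as a readable sketch and not as a replacement for the answers above, makes that explicit. The variable prev_total is an illustrative name that plays the role of x in the code above.

```r
# Same a and b as in the df defined above
a <- c(0.5, 0.3, 1, 0.2, 0.4, 0.8)
b <- c(7, 1, 9, 10, 3, 2)

C <- numeric(length(a))
prev_total <- 0                       # sum of all C values computed so far
for (i in seq_along(a)) {
  C[i] <- (cumsum(b)[i] + prev_total) * a[i]
  prev_total <- prev_total + C[i]
}

print(C)
```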

Recursive function with condition in dplyr

You just need to redefine the function that you use for accumulate:

library(tidyverse)
set.seed(123)
dat <- tibble(x = sample(-10:10, size = 11, replace = TRUE))
# Add x to the running total, then clamp the result to the range [0, 10]
fn <- function(x, y) pmax(pmin(x + y, 10), 0)
dat <- dat %>%
  mutate(y = accumulate(x, fn, .init = 2)[-1])
dat
# A tibble: 11 × 2
x y
<int> <dbl>
1 -4 0
2 6 6
3 -2 4
4 8 10
5 9 10
6 -10 0
7 1 1
8 8 9
9 1 10
10 -1 9
11 10 10

How it works: pmin(x + y, 10) caps the running sum at the upper bound of 10, and pmax(..., 0) then raises it to the lower bound of 0 if it has dropped below. Wrapping the sum x + y in both keeps every intermediate result within the limits you need.
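
The clamping step can be checked without the tidyverse. This sketch hard-codes the x values from the printed output above instead of relying on the seed, and uses base::Reduce() in place of accumulate().

```r
# x values copied from the printed tibble above
x <- c(-4, 6, -2, 8, 9, -10, 1, 8, 1, -1, 10)

# Add the next value, then clamp the running total to [0, 10]
fn <- function(x, y) pmax(pmin(x + y, 10), 0)

# Seed the running total with 2 and drop the seed from the output
y <- Reduce(fn, x, init = 2, accumulate = TRUE)[-1]
print(y)
```

Because the clamp is applied at every step, a total that hits a bound carries on from that bound, which a plain cumsum() followed by clamping would not reproduce.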

Recursive filtering in R

As Ben suggested, this question does contain an answer. It did not work for me unchanged, though, so I am posting a slightly modified version of aichao's code here:

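library(dplyr)

# Return the indices of the dates to keep: the row at ind, then each first
# row that falls at least minDiff days after the previously kept one.
# Assumes d is sorted in ascending order.
f <- function(d, ind = 1, minDiff = 7) {
  # First index at least minDiff days after d[ind]; NA if there is none
  ind.next <- which(difftime(d, d[ind], units = "days") >= minDiff)[1]
  if (is.na(ind.next))
    return(ind)
  else
    return(c(ind, f(d, ind.next, minDiff)))
}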

result <- df %>%
  group_by(id) %>%
  slice(f(date)) %>%
  ungroup()
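
For long date vectors a loop avoids deep recursion. Below is a minimal iterative sketch of the same idea in base R. The function name keep_spaced and the sample dates are hypothetical, and it assumes the dates are already sorted in ascending order.

```r
# Hypothetical helper: indices of dates that are each at least min_diff
# days after the previously kept date (dates assumed sorted ascending)
keep_spaced <- function(d, min_diff = 7) {
  keep <- 1L
  repeat {
    last <- keep[length(keep)]
    # First index at least min_diff days after the last kept date, or NA
    nxt <- which(difftime(d, d[last], units = "days") >= min_diff)[1]
    if (is.na(nxt)) break
    keep <- c(keep, nxt)
  }
  keep
}

# Made-up dates: day offsets 0, 3, 7, 9, 14, 20, 22 from an arbitrary start
dates <- as.Date("2024-01-01") + c(0, 3, 7, 9, 14, 20, 22)
print(keep_spaced(dates))
print(dates[keep_spaced(dates)])
```

Inside a grouped pipeline it would be used the same way as f above, for example slice(keep_spaced(date)).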

Recursive function in R with ending point varying by group

I don't think what you are describing is really recursive, in that the calculations don't depend on the results of previous iterations. It is, however, fairly complex, and perhaps the best way to fit it into a dplyr pipeline is to declare a function that takes the necessary variables and returns your answer.

Here is a function that does the trick. It uses the split-lapply-merge paradigm to force the calculations to work properly row-wise. It then uses sapply to check, for each row, whether the logical conditions are met in any previous row of the group. If so, it overwrites the NA in that row's p201 value with a non-NA value:

multi_condition <- function(id, v1, v2, v3, v4)
{
  unlist(lapply(split(data.frame(v1, v2, v3, v4), id), function(x)
  {
    if(all(is.na(x$v1))) return(x$v1)

    ss <- unlist(c(FALSE, sapply(seq_along(x$v2)[-1], function(i)
    {
      x$v2[i] %in% x$v2[1:(i - 1)] & any(abs(x$v3[i] - x$v3[1:(i - 1)]) <= x$v4[i])
    })))
    replace(x$v1, ss, x$v1[!is.na(x$v1)][1])
  }))
}

So the function itself is complex, but its use is straightforward:

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(p201 = multi_condition(id, p201, V2007, V2009, ager))
#> # A tibble: 11 x 5
#> # Groups: id [5]
#> id p201 V2009 ager V2007
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 1 <NA> 25 2.3 1
#> 2 1 <NA> 11 2 1
#> 3 1 001 63 8.1 1
#> 4 1 001 75 12.1 1
#> 5 2 <NA> 49 5.1 2
#> 6 2 <NA> 14 2 2
#> 7 3 001 32 2.9 1
#> 8 4 001 31 2.8 2
#> 9 5 001 3 2 1
#> 10 5 <NA> 10 2 1
#> 11 5 001 3 2 1
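
The core of the function is the look-back test applied to each row. The sketch below isolates it for one group in base R. The input vectors are hypothetical (the original df is not shown) and were chosen to resemble the rows for id 1 above; the names matches_earlier and filled are illustrative.

```r
# Hypothetical single-group input (not the original data), shaped like id 1
v1 <- c(NA, NA, "001", NA)      # p201
v2 <- c(1, 1, 1, 1)             # V2007
v3 <- c(25, 11, 63, 75)         # V2009
v4 <- c(2.3, 2, 8.1, 12.1)      # ager

# For each row after the first: does any earlier row share v2,
# and is any earlier v3 within v4[i] of this row's v3?
matches_earlier <- vapply(seq_along(v2), function(i) {
  if (i == 1) return(FALSE)
  j <- seq_len(i - 1)
  v2[i] %in% v2[j] && any(abs(v3[i] - v3[j]) <= v4[i])
}, logical(1))

# Overwrite matching rows with the group's first non-NA value
filled <- replace(v1, matches_earlier, v1[!is.na(v1)][1])
print(matches_earlier)
print(filled)
```

Only the fourth row matches here, because 75 is within 12.1 of the earlier value 63.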

If you prefer a more dplyr-style solution using group_map, with the logic perhaps a little clearer, you could try:

multi_select <- function(df, ...)
{
  rowwise_logic <- function(i)
  {
    if(i == 1) return(FALSE)
    j <- 1:(i - 1)
    df$V2007[i] %in% df$V2007[j] &
      any(abs(df$V2009[i] - df$V2009[j]) <= df$ager[i])
  }

  matching_rows <- sapply(seq(nrow(df)), rowwise_logic)
  df$p201[matching_rows] <- first(na.exclude(df$p201))

  return(df)
}

Which would work like this:

df %>%
  group_by(id) %>%
  group_map(multi_select, .keep = TRUE) %>%
  bind_rows()

Created on 2020-07-15 by the reprex package (v0.3.0)


