Vectorize a Product Calculation Which Depends on Previous Elements

Vectorize a product calculation which depends on previous elements?

ifelse is vectorized and there's a bit of a penalty if you're using it on one element at a time in a for-loop. In your example, you can get a pretty good speedup by using if instead of ifelse.

fun1 <- function(z) {
  for(i in 2:NROW(z)) {
    z[i] <- ifelse(z[i-1]==1, 1, 0)
  }
  z
}

fun2 <- function(z) {
  for(i in 2:NROW(z)) {
    z[i] <- if(z[i-1]==1) 1 else 0
  }
  z
}

z <- c(1,1,0,0,0,0)
identical(fun1(z),fun2(z))
# [1] TRUE
system.time(replicate(10000, fun1(z)))
#   user  system elapsed 
#   1.13    0.00    1.32
system.time(replicate(10000, fun2(z)))
#   user  system elapsed 
#   0.27    0.00    0.26

You can get some additional speed gains out of fun2 by compiling it.

library(compiler)
cfun2 <- cmpfun(fun2)
system.time(replicate(10000, cfun2(z)))
#   user  system elapsed 
#   0.11    0.00    0.11

So there's a 10x speedup without vectorization. As others have said (and some have illustrated) there are ways to vectorize your example, but that may not translate to your actual problem. Hopefully this is general enough to be applicable.

The filter function may be useful to you as well if you can figure out how to express your problem in terms of a autoregressive or moving average process.

How to vectorize a function that depends on a previous calculation in R?

Assuming that you are exploring vectorization to speed up the calculations, here is another option to speed up the calculations using Rccp:

library(Rcpp)
cppFunction("IntegerVector zoning(NumericVector idx) {
    int zone = 1, n = idx.size();
    IntegerVector res = IntegerVector(n);
    double x0 = idx[0];

    for (int i = 1; i < n; i++) {
        res[i] = zone;
        if (idx[i]/x0 < 0.98 || idx[i]/x0 > 1.05) {
            if (i+1 < n) {
                x0 = idx[i+1];
            }
            zone++;
        }
    }

    return res;
}")

df[, zone := zoning(c(1, val))[-1L]]

output:

    date           ret       val zone
 1:    1 -0.0042645381 0.9957355    1
 2:    2  0.0038364332 0.9995555    1
 3:    3 -0.0063562861 0.9932021    1
 4:    4  0.0179528080 1.0110328    1
 5:    5  0.0052950777 1.0163863    1
 6:    6 -0.0062046838 1.0100800    1
 7:    7  0.0068742905 1.0170236    1
 8:    8  0.0093832471 1.0265665    1
 9:    9  0.0077578135 1.0345305    1
10:   10 -0.0010538839 1.0334402    1
11:   11  0.0171178117 1.0511304    1
12:   12  0.0058984324 1.0573304    2
13:   13 -0.0042124058 1.0528765    2
14:   14 -0.0201469989 1.0316642    2
15:   15  0.0132493092 1.0453331    3
16:   16  0.0015506639 1.0469540    3
17:   17  0.0018380974 1.0488784    3
18:   18  0.0114383621 1.0608759    3
19:   19  0.0102122120 1.0717098    3
20:   20  0.0079390132 1.0802181    3
21:   21  0.0111897737 1.0923055    3
22:   22  0.0098213630 1.1030334    3
23:   23  0.0027456498 1.1060620    4
24:   24 -0.0178935170 1.0862706    4
25:   25  0.0081982575 1.0951762    4
26:   26  0.0014387126 1.0967518    4
27:   27  0.0004420449 1.0972366    4
28:   28 -0.0127075238 1.0832934    4
29:   29 -0.0027815006 1.0802803    5
30:   30  0.0061794156 1.0869558    5
31:   31  0.0155867955 1.1038979    5
32:   32  0.0009721227 1.1049710    5
33:   33  0.0058767161 1.1114646    5
34:   34  0.0014619496 1.1130896    5
35:   35 -0.0117705956 1.0999878    5
36:   36 -0.0021499456 1.0976229    5
37:   37 -0.0019428995 1.0954903    5
38:   38  0.0014068660 1.0970316    5
39:   39  0.0130002537 1.1112932    5
40:   40  0.0096317575 1.1219969    5
41:   41  0.0003547640 1.1223950    5
42:   42 -0.0005336168 1.1217961    5
43:   43  0.0089696338 1.1318582    5
44:   44  0.0075666320 1.1404225    5
45:   45 -0.0048875569 1.1348486    6
46:   46 -0.0050749516 1.1290893    6
47:   47  0.0056458196 1.1354640    6
48:   48  0.0096853292 1.1464613    6
49:   49  0.0008765379 1.1474662    6
50:   50  0.0108110773 1.1598716    6
    date           ret       val zone

Courtesy of https://rdrr.io/snippets/

Vectorize numpy code with operation depending on previous value

Offload the computation of probabilities as soon as possible:

possible_paths = np.vstack(
    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
    for curr_state in range(n_states)
)

Then you can simply do a lookup to follow your path:

path_trace = [None]*n_frames
for step in range(n_frames):
    path_trace[step] = possible_paths[current_state, step]
    current_state = possible_paths[current_state, step]

Once you have your path, you can compute your trace:

sigma = 0.1
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)

Comparing timings:

Pure python for loop

%%timeit
trace_list = []
current_state = np.random.choice(state_idx)
for _ in range(n_frames):
    trace_list.append(np.random.normal(loc=state_val[current_state], scale=sigma))
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

Results:

30.1 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Vectorized lookup:

%%timeit
current_state = np.random.choice(state_idx)
path_trace = [None]*n_frames
possible_paths = np.vstack(
    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
    for curr_state in range(n_states)
)
for step in range(n_frames):
    path_trace[step] = possible_paths[current_state, step]
    current_state = possible_paths[current_state, step]
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)

Results:

641 µs ± 6.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

A speedup of approximately 50x.

How can I apply a vectorized function to the previous element of a numpy array?

As I wrote in an answer to basically the same question, you can't:

There is no other way (in general) except for an explicit for loop.
This is because there is no way to parallelize this task across the
rows (since every row depends on some other row).

What makes this even harder is that you can easily generate chaotic
behavior, for example with the seemingly innocent looking
logistic map: x_{n+1} = r * x_n * (1 - x_{n-1}).

You can only find a way around this if you manage to find a closed
form, essentially eliminating the recurrence relation. But this has to
be done for each recurrence relation and I am pretty sure you are not
even guaranteed that a closed form exists...

How can I vectorize access to neighbour vector elements in R?

This sort of thing is probably as vectorized as you're going to get:

v[unlist(sapply(which(v),function(x) {x + c(-2,-1,1,2)},simplify = FALSE))] <- TRUE
> v
 [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

But note that you haven't specified what should happen in the TRUE elements are near the ends of your vector. That would require more work. Nor do you specify what happens if there are two TRUE elements that are closer than two positions from each other.

Alternatively:

v[outer(which(v),c(-2,-1,1,2),"+")] <- TRUE
> v
 [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

At a basic level, we're doing the same thing here, but the second option is certainly more compact, although possibly harder to understand.

Perform an operation on a vector using the previous value after an initial value

It seems like you're looking for a way to do recursive calculations in R. Base R has two ways of doing this which differ by the form of the function used to do the recursion. Both methods could be used for your example.

Reduce can be used with recursion equations of the form v[i+1] = function(v[i], x[i]) where v is the calculated vector and x an input vector; i.e. where the i+1 output depends only the i-th values of the calculated and input vectors and the calculation performed by function(v, x) may be nonlinear. For you case, this would be

    value <- 100
    nout <- 10
# v[i+1]  =  function(v[i], x[i])
    v <- Reduce(function(v, x) .9*v  + 9, x=numeric(nout),  init=value, accumulate=TRUE)
    cbind(step = 0:nout, v)

filter is used with recursion equations of the form y[i+1] = x[i] + filter[1]*y[i-1] + ... + filter[p]*y[i-p] where y is the calculated vector and x an input vector; i.e. where the output can depend linearly upon lagged values of the calculated vector as well as the i-th value of the input vector. For your case, this would be:

    value <- 100
     nout <- 10
# y[i+1] = x[i] + filter[1]*y[i-1] + ... + filter[p]*y[i-p]
        y <- c(value, stats::filter(x=rep(9, nout), filter=.9, method="recursive", sides=1, init=value))
     cbind(step = 0:nout, y)

For both functions, the length of the output is given by the length of the input vector x.

Both of these approaches give your result.

Vectorize a Product Calculation Which Depends on Previous Elements