Vectorize a product calculation which depends on previous elements?
ifelse
is vectorized and there's a bit of a penalty if you're using it on one element at a time in a for-loop. In your example, you can get a pretty good speedup by using if
instead of ifelse
.
fun1 <- function(z) {
for(i in 2:NROW(z)) {
z[i] <- ifelse(z[i-1]==1, 1, 0)
}
z
}
fun2 <- function(z) {
for(i in 2:NROW(z)) {
z[i] <- if(z[i-1]==1) 1 else 0
}
z
}
z <- c(1,1,0,0,0,0)
identical(fun1(z),fun2(z))
# [1] TRUE
system.time(replicate(10000, fun1(z)))
# user system elapsed
# 1.13 0.00 1.32
system.time(replicate(10000, fun2(z)))
# user system elapsed
# 0.27 0.00 0.26
You can get some additional speed gains out of fun2
by compiling it.
library(compiler)
cfun2 <- cmpfun(fun2)
system.time(replicate(10000, cfun2(z)))
# user system elapsed
# 0.11 0.00 0.11
So there's a 10x speedup without vectorization. As others have said (and some have illustrated) there are ways to vectorize your example, but that may not translate to your actual problem. Hopefully this is general enough to be applicable.
The filter
function may be useful to you as well if you can figure out how to express your problem in terms of a autoregressive or moving average process.
How to vectorize a function that depends on a previous calculation in R?
Assuming that you are exploring vectorization to speed up the calculations, here is another option to speed up the calculations using Rccp
:
library(Rcpp)
cppFunction("IntegerVector zoning(NumericVector idx) {
int zone = 1, n = idx.size();
IntegerVector res = IntegerVector(n);
double x0 = idx[0];
for (int i = 1; i < n; i++) {
res[i] = zone;
if (idx[i]/x0 < 0.98 || idx[i]/x0 > 1.05) {
if (i+1 < n) {
x0 = idx[i+1];
}
zone++;
}
}
return res;
}")
df[, zone := zoning(c(1, val))[-1L]]
output:
date ret val zone
1: 1 -0.0042645381 0.9957355 1
2: 2 0.0038364332 0.9995555 1
3: 3 -0.0063562861 0.9932021 1
4: 4 0.0179528080 1.0110328 1
5: 5 0.0052950777 1.0163863 1
6: 6 -0.0062046838 1.0100800 1
7: 7 0.0068742905 1.0170236 1
8: 8 0.0093832471 1.0265665 1
9: 9 0.0077578135 1.0345305 1
10: 10 -0.0010538839 1.0334402 1
11: 11 0.0171178117 1.0511304 1
12: 12 0.0058984324 1.0573304 2
13: 13 -0.0042124058 1.0528765 2
14: 14 -0.0201469989 1.0316642 2
15: 15 0.0132493092 1.0453331 3
16: 16 0.0015506639 1.0469540 3
17: 17 0.0018380974 1.0488784 3
18: 18 0.0114383621 1.0608759 3
19: 19 0.0102122120 1.0717098 3
20: 20 0.0079390132 1.0802181 3
21: 21 0.0111897737 1.0923055 3
22: 22 0.0098213630 1.1030334 3
23: 23 0.0027456498 1.1060620 4
24: 24 -0.0178935170 1.0862706 4
25: 25 0.0081982575 1.0951762 4
26: 26 0.0014387126 1.0967518 4
27: 27 0.0004420449 1.0972366 4
28: 28 -0.0127075238 1.0832934 4
29: 29 -0.0027815006 1.0802803 5
30: 30 0.0061794156 1.0869558 5
31: 31 0.0155867955 1.1038979 5
32: 32 0.0009721227 1.1049710 5
33: 33 0.0058767161 1.1114646 5
34: 34 0.0014619496 1.1130896 5
35: 35 -0.0117705956 1.0999878 5
36: 36 -0.0021499456 1.0976229 5
37: 37 -0.0019428995 1.0954903 5
38: 38 0.0014068660 1.0970316 5
39: 39 0.0130002537 1.1112932 5
40: 40 0.0096317575 1.1219969 5
41: 41 0.0003547640 1.1223950 5
42: 42 -0.0005336168 1.1217961 5
43: 43 0.0089696338 1.1318582 5
44: 44 0.0075666320 1.1404225 5
45: 45 -0.0048875569 1.1348486 6
46: 46 -0.0050749516 1.1290893 6
47: 47 0.0056458196 1.1354640 6
48: 48 0.0096853292 1.1464613 6
49: 49 0.0008765379 1.1474662 6
50: 50 0.0108110773 1.1598716 6
date ret val zone
Courtesy of https://rdrr.io/snippets/
Vectorize numpy code with operation depending on previous value
Offload the computation of probabilities as soon as possible:
possible_paths = np.vstack(
np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
for curr_state in range(n_states)
)
Then you can simply do a lookup to follow your path:
path_trace = [None]*n_frames
for step in range(n_frames):
path_trace[step] = possible_paths[current_state, step]
current_state = possible_paths[current_state, step]
Once you have your path, you can compute your trace:
sigma = 0.1
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)
Comparing timings:
Pure python for
loop
%%timeit
trace_list = []
current_state = np.random.choice(state_idx)
for _ in range(n_frames):
trace_list.append(np.random.normal(loc=state_val[current_state], scale=sigma))
current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])
Results:
30.1 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Vectorized lookup:
%%timeit
current_state = np.random.choice(state_idx)
path_trace = [None]*n_frames
possible_paths = np.vstack(
np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
for curr_state in range(n_states)
)
for step in range(n_frames):
path_trace[step] = possible_paths[current_state, step]
current_state = possible_paths[current_state, step]
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)
Results:
641 µs ± 6.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
A speedup of approximately 50x.
How can I apply a vectorized function to the previous element of a numpy array?
As I wrote in an answer to basically the same question, you can't:
There is no other way (in general) except for an explicit
for
loop.
This is because there is no way to parallelize this task across the
rows (since every row depends on some other row).What makes this even harder is that you can easily generate chaotic
behavior, for example with the seemingly innocent looking
logistic map:x_{n+1} = r * x_n * (1 - x_{n-1})
.You can only find a way around this if you manage to find a closed
form, essentially eliminating the recurrence relation. But this has to
be done for each recurrence relation and I am pretty sure you are not
even guaranteed that a closed form exists...
How can I vectorize access to neighbour vector elements in R?
This sort of thing is probably as vectorized as you're going to get:
v[unlist(sapply(which(v),function(x) {x + c(-2,-1,1,2)},simplify = FALSE))] <- TRUE
> v
[1] FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
But note that you haven't specified what should happen in the TRUE elements are near the ends of your vector. That would require more work. Nor do you specify what happens if there are two TRUE elements that are closer than two positions from each other.
Alternatively:
v[outer(which(v),c(-2,-1,1,2),"+")] <- TRUE
> v
[1] FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
At a basic level, we're doing the same thing here, but the second option is certainly more compact, although possibly harder to understand.
Perform an operation on a vector using the previous value after an initial value
It seems like you're looking for a way to do recursive calculations in R. Base R has two ways of doing this which differ by the form of the function used to do the recursion. Both methods could be used for your example.
Reduce
can be used with recursion equations of the form v[i+1] = function(v[i], x[i])
where v
is the calculated vector and x
an input vector; i.e. where the i+1
output depends only the i-th values of the calculated and input vectors and the calculation performed by function(v, x)
may be nonlinear. For you case, this would be
value <- 100
nout <- 10
# v[i+1] = function(v[i], x[i])
v <- Reduce(function(v, x) .9*v + 9, x=numeric(nout), init=value, accumulate=TRUE)
cbind(step = 0:nout, v)
filter
is used with recursion equations of the form y[i+1] = x[i] + filter[1]*y[i-1] + ... + filter[p]*y[i-p]
where y
is the calculated vector and x
an input vector; i.e. where the output can depend linearly upon lagged values of the calculated vector as well as the i-th
value of the input vector. For your case, this would be:
value <- 100
nout <- 10
# y[i+1] = x[i] + filter[1]*y[i-1] + ... + filter[p]*y[i-p]
y <- c(value, stats::filter(x=rep(9, nout), filter=.9, method="recursive", sides=1, init=value))
cbind(step = 0:nout, y)
For both functions, the length of the output is given by the length of the input vector x
.
Both of these approaches give your result.
Related Topics
Using R to "Click" a Download File Button on a Webpage
Arrange N Ggplots into Lower Triangle Matrix Shape
Data.Table Inner/Outer Join with Na in Join Column of Type Double Bug
Installation of Rodbc on Os X Yosemite
Using Trycatch and Rvest to Deal with 404 and Other Crawling Errors
R Install Package Loaded Namespace
How to Do Conditional Grouping of Data in R
Convert Comma Separated String to Integer in R
R: How to Draw a Line with Multiple Arrows in It
Colorize Parts of the Title in a Plot
Programmatically Insert Text, Headers and Lists with R Markdown
Dependency 'Slam' Is Not Available When Installing Tm Package
Setting the Color for an Individual Data Point
Asterisk (*) VS. Colon (:) in R Formulas
Use Ls() or Objects() to Get Objects of Class Data.Frame
How to Add an Inset (Subplot) to "Topright" of an R Plot
How to Compute Roc and Auc Under Roc After Training Using Caret in R