Efficient Calculation of Matrix Cumulative Standard Deviation in R

Efficient calculation of matrix cumulative standard deviation in R

You could use cumsum to compute the necessary running sums, plug them into the direct formula for the variance/sd, and vectorize the whole thing as operations on the matrix:

cumsd_mod <- function(mat) {
  cum_var <- function(x) {
    ind_na <- !is.na(x)
    nn <- cumsum(ind_na)          ## running count of non-NA observations
    x[!ind_na] <- 0               ## zero out NAs so cumsum skips them
    ## sample variance from running sums: (sum(x^2) - sum(x)^2 / n) / (n - 1)
    cumsum(x^2) / (nn - 1) - (cumsum(x))^2 / (nn - 1) / nn
  }
  v <- sqrt(apply(mat, 2, cum_var))
  v[is.na(mat) | is.infinite(v)] <- NA
  v
}
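
In the comparison below, cumsd is the OP's original function, which is not shown in this excerpt. A minimal prefix-by-prefix sketch (an assumption about what cumsd computes, mirroring cumsd_mod's NA handling) would be:

## hypothetical stand-in for the OP's cumsd: sd() over every growing prefix
cumsd <- function(mat) {
  res <- apply(mat, 2, function(x) {
    vapply(seq_along(x), function(i) sd(x[1:i], na.rm = TRUE), numeric(1))
  })
  res[is.na(mat)] <- NA  ## match cumsd_mod: NA output where input is NA
  res
}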

Just for comparison:

set.seed(2765374)
X <- matrix(rnorm(1000),100,10)
X[cbind(1:10,1:10)] <- NA # to have some NA's

all.equal(cumsd(X),cumsd_mod(X))
# [1] TRUE

And about timing:

X <- matrix(rnorm(100000), 1000, 100)
system.time(cumsd(X))
#    user  system elapsed
#    7.94    0.00    7.97
system.time(cumsd_mod(X))
#    user  system elapsed
#    0.03    0.00    0.03

Calculate cumulative standard deviation

Use TTR::runSD with cumulative=TRUE.

library(TTR)
library(xts)  ## for xts(); recent TTR versions import rather than attach xts
x <- xts(test.df[, 2], test.df[, 1])
runSD(x, n = 1, cumulative = TRUE)
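
runSD also accepts a plain numeric vector (it coerces its input), so the xts conversion is only needed if you want to keep the time index. A toy example with made-up data:

set.seed(1)
v <- rnorm(6)
runSD(v, n = 1, cumulative = TRUE)  ## running sd of v[1:i]; the first value is NA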

Surprisingly Slow Standard Deviation in R

You might also try an algorithm that updates the standard deviation (well, actually, the sum of squared differences from the mean) as you go; this is essentially a vectorized form of Welford's online algorithm described at the link below. On my system this reduces the time from ~0.8 seconds to ~0.002 seconds.

n <- length(x)
m <- cumsum(x) / (1:n)                     ## running means
m1 <- c(NA, m[1:(n - 1)])                  ## running means, lagged by one
ssd <- (x - m) * (x - m1)                  ## Welford-style update terms
v <- c(0, cumsum(ssd[-1]) / (1:(n - 1)))   ## running sample variances
z <- sqrt(v)                               ## running standard deviations
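
A quick sanity check on a made-up vector, comparing the running sd z against sd() applied to each prefix:

set.seed(42)
x <- rnorm(100)
n <- length(x)
m <- cumsum(x) / (1:n)
m1 <- c(NA, m[1:(n - 1)])
v <- c(0, cumsum(((x - m) * (x - m1))[-1]) / (1:(n - 1)))
all.equal(sqrt(v)[-1], vapply(2:n, function(i) sd(x[1:i]), numeric(1)))
## should be TRUE (up to floating-point tolerance)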

See http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance for details.

Also see the answers to this question: Efficient calculation of matrix cumulative standard deviation in R

Efficient calculation of var-covar matrix in R

@F. Privé's Rcpp implementation is a good starting point, but we can do better. You will notice in the main algorithm supplied by the OP that there are many replicated, fairly expensive calculations. Observe:

OPalgo <- function(m, p, ind1, n) {
  vcov <- matrix(0, nrow = n + 1L, ncol = n + 1L)
  for (i in 0L:n) {
    for (j in i:n) {
      ## lower and upper range for the first & second multiplicand
      print(paste(c((1L + (j - i)), ":", (p - i), " ",
                    1L, ":", (p - j)), collapse = ""))

      vcov[j + 1L, i + 1L] <-
        sum(m[, (1L + (j - i)):(p - i)] * m[, 1L:(p - j)]) /
        (ind1 * (p - j) - 1)
    }
  }
  vcov
}

OPalgo(mat, periods, ind, n_lags)
[1] "1:70 1:70" ## contains "1:65 1:65"
[1] "2:70 1:69"
[1] "3:70 1:68"
[1] "4:70 1:67"
[1] "5:70 1:66"
[1] "6:70 1:65"
[1] "1:69 1:69" ## contains "1:65 1:65"
[1] "2:69 1:68"
[1] "3:69 1:67"
[1] "4:69 1:66"
[1] "5:69 1:65"
[1] "1:68 1:68" ## contains "1:65 1:65"
[1] "2:68 1:67"
[1] "3:68 1:66"
[1] "4:68 1:65"
[1] "1:67 1:67" ## contains "1:65 1:65"
[1] "2:67 1:66"
[1] "3:67 1:65"
[1] "1:66 1:66" ## contains "1:65 1:65"
[1] "2:66 1:65"
[1] "1:65 1:65"

As you can see, the product mat[,1:65] * mat[,1:65] is performed 6 times above. The only difference between the first occurrence and the last occurrence is that the first occurrence has an additional 5 columns. So instead of computing:

sum(mat[ , 1:70] * mat[ , 1:70])
sum(mat[ , 1:69] * mat[ , 1:69])
sum(mat[ , 1:68] * mat[ , 1:68])
sum(mat[ , 1:67] * mat[ , 1:67])
sum(mat[ , 1:66] * mat[ , 1:66])
sum(mat[ , 1:65] * mat[ , 1:65])

We can compute preCalc[1] <- sum(mat[ , 1:65] * mat[ , 1:65]) one time and use this in the other 5 calculations like so:

preCalc[1] + sum(mat[ , 66:70] * mat[ , 66:70])
preCalc[1] + sum(mat[ , 66:69] * mat[ , 66:69])
preCalc[1] + sum(mat[ , 66:68] * mat[ , 66:68])
preCalc[1] + sum(mat[ , 66:67] * mat[ , 66:67])
preCalc[1] + sum(mat[ , 66:66] * mat[ , 66:66])

In each of the above, we have reduced the number of multiplications by 90000 * 65 = 5,850,000 and the number of additions by 5,850,000 - 1 = 5,849,999, for a total of 11,699,999 arithmetic operations saved. The function below achieves this very thing.

fasterAlgo <- function(m, p, ind1, n) {
  vcov <- matrix(0, nrow = n + 1L, ncol = n + 1L)
  ## precompute the shared product with m[, 1:(p - n - 1)] once for each lag
  preCalcs <- vapply(1:(n + 1L), function(x) {
    sum(m[, x:(p - n + x - 2L)] * m[, 1L:(p - n - 1L)])
  }, 42.42)
  for (i in 0L:n) {
    for (j in i:n) {
      ## add only the few remaining columns to the precomputed sum
      myNum <- preCalcs[1L + j - i] +
        sum(m[, (p - n + j - i):(p - i)] * m[, (p - n):(p - j)])
      vcov[j + 1L, i + 1L] <- myNum / (ind1 * (p - j) - 1)
    }
  }
  vcov
}

## outputs same results
all.equal(OPalgo(mat, periods, ind, n_lags), fasterAlgo(mat, periods, ind, n_lags))
[1] TRUE
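
Note that compute_vcov in the benchmarks below is @F. Privé's original Rcpp function from the linked answer; it is not reproduced in this excerpt. A plausible direct translation of OPalgo's loops (an assumption, included only so the benchmarks can be reproduced) would be:

#include <Rcpp.h>
using namespace Rcpp;

// hypothetical stand-in for the original compute_vcov: a direct
// Rcpp translation of OPalgo's nested loops (0-based column indexing)
// [[Rcpp::export]]
NumericMatrix compute_vcov(const NumericMatrix& mat, int n_lags) {
    int n = mat.nrow(), m = mat.ncol();
    NumericMatrix vcov(n_lags + 1, n_lags + 1);
    for (int i = 0; i <= n_lags; i++) {
        for (int j = i; j <= n_lags; j++) {
            double s = 0;
            // sum(mat[, (1 + j - i):(m - i)] * mat[, 1:(m - j)])
            for (int k1 = j - i, k2 = 0; k2 < m - j; k1++, k2++)
                for (int l = 0; l < n; l++)
                    s += mat(l, k1) * mat(l, k2);
            vcov(i, j) = vcov(j, i) = s / (n * (m - j) - 1.0);
        }
    }
    return vcov;
}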

Benchmarks:

## I commented out the print statements of OPalgo before benchmarking
library(microbenchmark)
microbenchmark(OP = OPalgo(mat, periods, ind, n_lags),
               fasterBase = fasterAlgo(mat, periods, ind, n_lags),
               RcppOrig = compute_vcov(mat, n_lags), times = 5)
Unit: milliseconds
       expr       min        lq      mean   median        uq       max neval cld
         OP 2775.6110 2780.7207 2843.6012 2784.976 2899.7621 2976.9356     5   c
 fasterBase  863.3897  863.9681  865.5576  865.593  866.7962  868.0409     5  b
   RcppOrig  160.1040  161.8922  162.0153  162.235  162.4756  163.3697     5 a

As you can see, with this modification we see at least a 3-fold improvement, but the Rcpp version is still much faster. Let's implement the above concept in Rcpp.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix compute_vcov2(const NumericMatrix& mat, int n_lags) {

    NumericMatrix vcov(n_lags + 1, n_lags + 1);
    std::vector<double> preCalcs;
    preCalcs.reserve(n_lags + 1);
    double myCov;

    int i, j, k1, k2, l;
    int n = mat.nrow();
    int m = mat.ncol();

    // precompute the shared column products once for each lag
    for (i = 0; i <= n_lags; i++) {
        myCov = 0;
        for (k1 = i, k2 = 0; k2 < (m - n_lags - 1); k1++, k2++) {
            for (l = 0; l < n; l++) {
                myCov += mat(l, k1) * mat(l, k2);
            }
        }
        preCalcs.push_back(myCov);
    }

    for (i = 0; i <= n_lags; i++) {
        for (j = i; j <= n_lags; j++) {
            // start from the precomputed sum, add the remaining columns
            myCov = preCalcs[j - i];
            for (k1 = m - n_lags + j - i - 1, k2 = m - n_lags - 1; k2 < (m - j); k1++, k2++) {
                for (l = 0; l < n; l++) {
                    myCov += mat(l, k1) * mat(l, k2);
                }
            }
            myCov /= n * (m - j) - 1;
            vcov(i, j) = vcov(j, i) = myCov;
        }
    }

    return vcov;
}

## gives same results
all.equal(compute_vcov2(mat, n_lags), compute_vcov(mat, n_lags))
[1] TRUE

New benchmarks:

microbenchmark(OP = OPalgo(mat, periods, ind, n_lags),
               fasterBase = fasterAlgo(mat, periods, ind, n_lags),
               RcppOrig = compute_vcov(mat, n_lags),
               RcppModified = compute_vcov2(mat, n_lags), times = 5)
Unit: milliseconds
         expr       min         lq       mean     median         uq        max neval cld
           OP 2785.4789 2786.67683 2811.02528 2789.37719 2809.61270 2883.98073     5   d
   fasterBase  866.5601  868.25555  888.64418  869.31796  870.92308  968.16417     5  c
     RcppOrig  160.3467  161.37992  162.74899  161.73009  164.38653  165.90174     5  b
 RcppModified   51.1641   51.67149   52.87447   52.56067   53.06273   55.91334     5 a

Now the enhanced Rcpp solution is around 3x faster than the original Rcpp solution and around 50x faster than the original algorithm provided by the OP.

Update

We can do even better. We can reverse the ranges of the i/j indices so as to continuously update preCalcs, which allows us to compute the product of only one new column per iteration. This really comes into play as n_lags increases. Observe:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix compute_vcov3(const NumericMatrix& mat, int n_lags) {

    NumericMatrix vcov(n_lags + 1, n_lags + 1);
    std::vector<double> preCalcs;
    preCalcs.reserve(n_lags + 1);

    int i, j, k1, k2, l;
    int n = mat.nrow();
    int m = mat.ncol();

    for (i = 0; i <= n_lags; i++) {
        preCalcs.push_back(0);
        for (k1 = i, k2 = 0; k2 < (m - n_lags); k1++, k2++) {
            for (l = 0; l < n; l++) {
                preCalcs[i] += mat(l, k1) * mat(l, k2);
            }
        }
    }

    for (i = n_lags; i >= 0; i--) {          // reverse range
        for (j = n_lags; j >= i; j--) {      // reverse range
            vcov(i, j) = vcov(j, i) = preCalcs[j - i] / (n * (m - j) - 1);
            if (i > 0 && j > 0) {
                // add exactly one new column product, updating the
                // preCalcs vector for the next (smaller) i/j pair
                for (k1 = m - i, k2 = m - j; k2 <= (m - j); k1++, k2++) {
                    for (l = 0; l < n; l++) {
                        preCalcs[j - i] += mat(l, k1) * mat(l, k2);
                    }
                }
            }
        }
    }

    return vcov;
}

all.equal(compute_vcov(mat, n_lags), compute_vcov3(mat, n_lags))
[1] TRUE

Rcpp benchmarks only:

n_lags <- 50L
microbenchmark(RcppOrig = compute_vcov(mat, n_lags),
               RcppModified = compute_vcov2(mat, n_lags),
               RcppExtreme = compute_vcov3(mat, n_lags), times = 5)
Unit: milliseconds
         expr       min        lq      mean    median       uq       max neval cld
     RcppOrig 7035.7920 7069.7761 7083.4961 7070.3395 7119.028 7122.5446     5   c
 RcppModified 3608.8986 3645.8585 3653.0029 3654.7209 3663.716 3691.8202     5  b
  RcppExtreme  324.8252  330.7381  332.9657  333.5919  335.168  340.5054     5 a

The newest implementation is now over 20x faster than the original Rcpp version and well over 300x faster than the original algorithm when n_lags is large.

R: calculating population standard deviation with NA

We can use na.rm=TRUE in the mean and sum to account for the NA elements.

pop.sd <- function(x) {
  sqrt(sum((x - mean(x, na.rm = TRUE))^2, na.rm = TRUE) / sum(!is.na(x)))
}
apply(mf2, 1, pop.sd)
#[1] 25.152866 13.500000  7.586538 31.443070  0.000000 32.967998

This should also give the same result for 'mf1'

apply(mf1,1,pop.sd)
#[1] 25.152866 12.498889 7.586538 31.443070 22.156012 32.967998

Instead of looping over the rows, we can also use the vectorized rowSums and rowMeans:

sqrt(rowSums((mf1-rowMeans(mf1, na.rm=TRUE))^2, na.rm=TRUE)/ncol(mf1))
#[1] 25.152866 12.498889 7.586538 31.443070 22.156012 32.967998

sqrt(rowSums((mf2-rowMeans(mf2, na.rm=TRUE))^2, na.rm=TRUE)/ncol(mf2))
#[1] 25.152866 11.022704 7.586538 31.443070 0.000000 32.967998

R: using apply.fromstart to calculate returns and standard deviation

Use the very efficient vectorized base R function cumprod for your first desired result; the second can be achieved (less efficiently) with a simple *apply loop.

If you want to keep the zoo class, do

cumprod(hourlyData$Position)
#         1         2         3         4         5         6
# 1.0000000 0.9929392 0.9224669 0.9754125 1.0673348 1.1547867

Otherwise

cumprod(as.numeric(hourlyData$Position))
## [1] 1.0000000 0.9929392 0.9224669 0.9754125 1.0673348 1.1547867

For the running sd (as proposed by @akrun), vapply is used instead of sapply in order to squeeze maximum performance out of it:

vapply(seq_len(nrow(hourlyData)), function(i) sd(hourlyData$Position[1:i]), FUN.VALUE = double(1))
# [1] NA 0.004992723 0.039097989 0.052519398 0.063598345 0.063156702
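
Alternatively, the cumulative = TRUE trick with TTR::runSD shown earlier on this page avoids recomputing each prefix from scratch (assuming hourlyData$Position can be coerced to a plain numeric vector):

library(TTR)
runSD(as.numeric(hourlyData$Position), n = 1, cumulative = TRUE)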

Cumulative sum over matrix diagonals

From the example it seems that every diagonal is either all zeros or a sequence of ones followed by zeros. We assume that is always the case.

First form a function cum which takes a diagonal x and outputs a vector of zeros of the same length, except that position sum(x) is set to sum(x).

Then apply that function across diagonals using ave. Since row(m1) - col(m1) is constant along each diagonal, it can be used for grouping.

cum <- function(x, s = sum(x)) replace(0 * x, s, s)  ## zeros with s at position s
ave(m1, row(m1) - col(m1), FUN = cum)  ## apply cum separately to each diagonal

## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 0 0 0 0
## [2,] 0 0 0 0 0
## [3,] 0 2 0 0 3
## [4,] 0 0 0 0 0
## [5,] 0 0 0 0 0
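
For reference, m1 is defined in the question; a matrix consistent with the output above (a reconstruction, not necessarily the OP's exact input) is:

m1 <- matrix(0, 5, 5)
m1[cbind(c(2, 3), c(1, 2))] <- 1        ## run of two ones on one subdiagonal
m1[cbind(c(1, 2, 3), c(3, 4, 5))] <- 1  ## run of three ones on a superdiagonal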

If the sequence of ones on a diagonal need not start at the beginning of the diagonal, but it is still true that there is at most one sequence of ones on each diagonal, then use this in place of cum above:

cum <- function(x, s = sum(x)) replace(0 * x, s + which.max(x) - 1, s)

If there can be more than one sequence of ones on a diagonal then use this in place of cum above:

library(data.table)
cum <- function(x) {
ave(x, rleid(x), FUN = function(x, s = sum(x)) replace(0 * x, s, s))
}
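
A quick illustration of this multi-run variant on a single made-up diagonal:

cum(c(1, 1, 0, 1, 1, 1, 0))
## expected: 0 2 0 0 0 3 0 -- each run of ones collapses to its length,
## placed at the run's last position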

