Efficient Calculation of Matrix Cumulative Standard Deviation in R

Efficient calculation of matrix cumulative standard deviation in R

You could use cumsum to compute the necessary running sums, plug them into the direct formula for the variance/sd, and vectorize the whole thing as operations on the matrix:

cumsd_mod <- function(mat) {
  cum_var <- function(x) {
    ind_na <- !is.na(x)
    nn <- cumsum(ind_na)          ## running count of non-NA observations
    x[!ind_na] <- 0               ## zero out NAs so cumsum skips them
    ## sample variance from running sums: (sum(x^2) - sum(x)^2 / n) / (n - 1)
    cumsum(x^2) / (nn - 1) - (cumsum(x))^2 / (nn - 1) / nn
  }
  v <- sqrt(apply(mat, 2, cum_var))
  v[is.na(mat) | is.infinite(v)] <- NA
  v
}
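
In the comparison below, cumsd is the OP's original function, which is not shown in this excerpt. A minimal prefix-by-prefix sketch (an assumption about what cumsd computes, mirroring cumsd_mod's NA handling) would be:

## hypothetical stand-in for the OP's cumsd: sd() over every growing prefix
cumsd <- function(mat) {
  res <- apply(mat, 2, function(x) {
    vapply(seq_along(x), function(i) sd(x[1:i], na.rm = TRUE), numeric(1))
  })
  res[is.na(mat)] <- NA  ## match cumsd_mod: NA output where input is NA
  res
}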

Just for comparison:

set.seed(2765374)
X <- matrix(rnorm(1000),100,10)
X[cbind(1:10,1:10)] <- NA # to have some NA's

all.equal(cumsd(X),cumsd_mod(X))
# [1] TRUE

And about timing:

X <- matrix(rnorm(100000), 1000, 100)
system.time(cumsd(X))
#    user  system elapsed
#    7.94    0.00    7.97
system.time(cumsd_mod(X))
#    user  system elapsed
#    0.03    0.00    0.03

Calculate cumulative standard deviation

Use TTR::runSD with cumulative=TRUE.

library(TTR)
library(xts)  ## for xts(); recent TTR versions import rather than attach xts
x <- xts(test.df[, 2], test.df[, 1])
runSD(x, n = 1, cumulative = TRUE)
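
runSD also accepts a plain numeric vector (it coerces its input), so the xts conversion is only needed if you want to keep the time index. A toy example with made-up data:

set.seed(1)
v <- rnorm(6)
runSD(v, n = 1, cumulative = TRUE)  ## running sd of v[1:i]; the first value is NA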

Surprisingly Slow Standard Deviation in R

You might also try an algorithm that updates the standard deviation (well, actually, the sum of squared differences from the mean) as you go; this is essentially a vectorized form of Welford's online algorithm described at the link below. On my system this reduces the time from ~0.8 seconds to ~0.002 seconds.

n <- length(x)
m <- cumsum(x) / (1:n)                     ## running means
m1 <- c(NA, m[1:(n - 1)])                  ## running means, lagged by one
ssd <- (x - m) * (x - m1)                  ## Welford-style update terms
v <- c(0, cumsum(ssd[-1]) / (1:(n - 1)))   ## running sample variances
z <- sqrt(v)                               ## running standard deviations
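
A quick sanity check on a made-up vector, comparing the running sd z against sd() applied to each prefix:

set.seed(42)
x <- rnorm(100)
n <- length(x)
m <- cumsum(x) / (1:n)
m1 <- c(NA, m[1:(n - 1)])
v <- c(0, cumsum(((x - m) * (x - m1))[-1]) / (1:(n - 1)))
all.equal(sqrt(v)[-1], vapply(2:n, function(i) sd(x[1:i]), numeric(1)))
## should be TRUE (up to floating-point tolerance)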

See http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance for details.

Also see the answers to this question: Efficient calculation of matrix cumulative standard deviation in R

Efficient calculation of var-covar matrix in R

@F. Privé's Rcpp implementation is a good starting point, but we can do better. You will notice in the main algorithm supplied by the OP that there are many replicated, fairly expensive calculations. Observe:

OPalgo <- function(m, p, ind1, n) {
  vcov <- matrix(0, nrow = n + 1L, ncol = n + 1L)
  for (i in 0L:n) {
    for (j in i:n) {
      ## lower and upper range for the first & second multiplicand
      print(paste(c((1L + (j - i)), ":", (p - i), " ",
                    1L, ":", (p - j)), collapse = ""))

      vcov[j + 1L, i + 1L] <-
        sum(m[, (1L + (j - i)):(p - i)] * m[, 1L:(p - j)]) /
        (ind1 * (p - j) - 1)
    }
  }
  vcov
}

OPalgo(mat, periods, ind, n_lags)
[1] "1:70 1:70" ## contains "1:65 1:65"
[1] "2:70 1:69"
[1] "3:70 1:68"
[1] "4:70 1:67"
[1] "5:70 1:66"
[1] "6:70 1:65"
[1] "1:69 1:69" ## contains "1:65 1:65"
[1] "2:69 1:68"
[1] "3:69 1:67"
[1] "4:69 1:66"
[1] "5:69 1:65"
[1] "1:68 1:68" ## contains "1:65 1:65"
[1] "2:68 1:67"
[1] "3:68 1:66"
[1] "4:68 1:65"
[1] "1:67 1:67" ## contains "1:65 1:65"
[1] "2:67 1:66"
[1] "3:67 1:65"
[1] "1:66 1:66" ## contains "1:65 1:65"
[1] "2:66 1:65"
[1] "1:65 1:65"

As you can see, the product mat[,1:65] * mat[,1:65] is performed 6 times above. The only difference between the first occurrence and the last occurrence is that the first occurrence has an additional 5 columns. So instead of computing:

sum(mat[ , 1:70] * mat[ , 1:70])
sum(mat[ , 1:69] * mat[ , 1:69])
sum(mat[ , 1:68] * mat[ , 1:68])
sum(mat[ , 1:67] * mat[ , 1:67])
sum(mat[ , 1:66] * mat[ , 1:66])
sum(mat[ , 1:65] * mat[ , 1:65])

We can compute preCalc[1] <- sum(mat[ , 1:65] * mat[ , 1:65]) one time and use this in the other 5 calculations like so:

preCalc[1] + sum(mat[ , 66:70] * mat[ , 66:70])
preCalc[1] + sum(mat[ , 66:69] * mat[ , 66:69])
preCalc[1] + sum(mat[ , 66:68] * mat[ , 66:68])
preCalc[1] + sum(mat[ , 66:67] * mat[ , 66:67])
preCalc[1] + sum(mat[ , 66:66] * mat[ , 66:66])

In each of the above, we have reduced the number of multiplications by 90000 * 65 = 5,850,000 and the number of additions by 5,850,000 - 1 = 5,849,999, for a total of 11,699,999 arithmetic operations saved. The function below achieves this very thing.

fasterAlgo <- function(m, p, ind1, n) {
  vcov <- matrix(0, nrow = n + 1L, ncol = n + 1L)
  ## precompute the shared product with m[, 1:(p - n - 1)] once for each lag
  preCalcs <- vapply(1:(n + 1L), function(x) {
    sum(m[, x:(p - n + x - 2L)] * m[, 1L:(p - n - 1L)])
  }, 42.42)
  for (i in 0L:n) {
    for (j in i:n) {
      ## add only the few remaining columns to the precomputed sum
      myNum <- preCalcs[1L + j - i] +
        sum(m[, (p - n + j - i):(p - i)] * m[, (p - n):(p - j)])
      vcov[j + 1L, i + 1L] <- myNum / (ind1 * (p - j) - 1)
    }
  }
  vcov
}

## outputs same results
all.equal(OPalgo(mat, periods, ind, n_lags), fasterAlgo(mat, periods, ind, n_lags))
[1] TRUE
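
Note that compute_vcov in the benchmarks below is @F. Privé's original Rcpp function from the linked answer; it is not reproduced in this excerpt. A plausible direct translation of OPalgo's loops (an assumption, included only so the benchmarks can be reproduced) would be:

#include <Rcpp.h>
using namespace Rcpp;

// hypothetical stand-in for the original compute_vcov: a direct
// Rcpp translation of OPalgo's nested loops (0-based column indexing)
// [[Rcpp::export]]
NumericMatrix compute_vcov(const NumericMatrix& mat, int n_lags) {
    int n = mat.nrow(), m = mat.ncol();
    NumericMatrix vcov(n_lags + 1, n_lags + 1);
    for (int i = 0; i <= n_lags; i++) {
        for (int j = i; j <= n_lags; j++) {
            double s = 0;
            // sum(mat[, (1 + j - i):(m - i)] * mat[, 1:(m - j)])
            for (int k1 = j - i, k2 = 0; k2 < m - j; k1++, k2++)
                for (int l = 0; l < n; l++)
                    s += mat(l, k1) * mat(l, k2);
            vcov(i, j) = vcov(j, i) = s / (n * (m - j) - 1.0);
        }
    }
    return vcov;
}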

Benchmarks:

## I commented out the print statements of OPalgo before benchmarking
library(microbenchmark)
microbenchmark(OP = OPalgo(mat, periods, ind, n_lags),
               fasterBase = fasterAlgo(mat, periods, ind, n_lags),
               RcppOrig = compute_vcov(mat, n_lags), times = 5)
Unit: milliseconds
       expr       min        lq      mean   median        uq       max neval cld
         OP 2775.6110 2780.7207 2843.6012 2784.976 2899.7621 2976.9356     5   c
 fasterBase  863.3897  863.9681  865.5576  865.593  866.7962  868.0409     5  b
   RcppOrig  160.1040  161.8922  162.0153  162.235  162.4756  163.3697     5 a

As you can see, with this modification we see at least a 3-fold improvement, but the Rcpp version is still much faster. Let's implement the above concept in Rcpp.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix compute_vcov2(const NumericMatrix& mat, int n_lags) {

    NumericMatrix vcov(n_lags + 1, n_lags + 1);
    std::vector<double> preCalcs;
    preCalcs.reserve(n_lags + 1);
    double myCov;

    int i, j, k1, k2, l;
    int n = mat.nrow();
    int m = mat.ncol();

    // precompute the shared column products once for each lag
    for (i = 0; i <= n_lags; i++) {
        myCov = 0;
        for (k1 = i, k2 = 0; k2 < (m - n_lags - 1); k1++, k2++) {
            for (l = 0; l < n; l++) {
                myCov += mat(l, k1) * mat(l, k2);
            }
        }
        preCalcs.push_back(myCov);
    }

    for (i = 0; i <= n_lags; i++) {
        for (j = i; j <= n_lags; j++) {
            // start from the precomputed sum, add the remaining columns
            myCov = preCalcs[j - i];
            for (k1 = m - n_lags + j - i - 1, k2 = m - n_lags - 1; k2 < (m - j); k1++, k2++) {
                for (l = 0; l < n; l++) {
                    myCov += mat(l, k1) * mat(l, k2);
                }
            }
            myCov /= n * (m - j) - 1;
            vcov(i, j) = vcov(j, i) = myCov;
        }
    }

    return vcov;
}

## gives same results
all.equal(compute_vcov2(mat, n_lags), compute_vcov(mat, n_lags))
[1] TRUE

New benchmarks:

microbenchmark(OP = OPalgo(mat, periods, ind, n_lags),
               fasterBase = fasterAlgo(mat, periods, ind, n_lags),
               RcppOrig = compute_vcov(mat, n_lags),
               RcppModified = compute_vcov2(mat, n_lags), times = 5)
Unit: milliseconds
         expr       min         lq       mean     median         uq        max neval cld
           OP 2785.4789 2786.67683 2811.02528 2789.37719 2809.61270 2883.98073     5   d
   fasterBase  866.5601  868.25555  888.64418  869.31796  870.92308  968.16417     5  c
     RcppOrig  160.3467  161.37992  162.74899  161.73009  164.38653  165.90174     5  b
 RcppModified   51.1641   51.67149   52.87447   52.56067   53.06273   55.91334     5 a

Now the enhanced Rcpp solution is around 3x faster than the original Rcpp solution and around 50x faster than the original algorithm provided by the OP.

Update

We can do even better. We can reverse the ranges of the i/j indices so as to continuously update preCalcs, which allows us to compute the product of only one new column per iteration. This really comes into play as n_lags increases. Observe:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix compute_vcov3(const NumericMatrix& mat, int n_lags) {

    NumericMatrix vcov(n_lags + 1, n_lags + 1);
    std::vector<double> preCalcs;
    preCalcs.reserve(n_lags + 1);

    int i, j, k1, k2, l;
    int n = mat.nrow();
    int m = mat.ncol();

    for (i = 0; i <= n_lags; i++) {
        preCalcs.push_back(0);
        for (k1 = i, k2 = 0; k2 < (m - n_lags); k1++, k2++) {
            for (l = 0; l < n; l++) {
                preCalcs[i] += mat(l, k1) * mat(l, k2);
            }
        }
    }

    for (i = n_lags; i >= 0; i--) {          // reverse range
        for (j = n_lags; j >= i; j--) {      // reverse range
            vcov(i, j) = vcov(j, i) = preCalcs[j - i] / (n * (m - j) - 1);
            if (i > 0 && j > 0) {
                // add exactly one new column product, updating the
                // preCalcs vector for the next (smaller) i/j pair
                for (k1 = m - i, k2 = m - j; k2 <= (m - j); k1++, k2++) {
                    for (l = 0; l < n; l++) {
                        preCalcs[j - i] += mat(l, k1) * mat(l, k2);
                    }
                }
            }
        }
    }

    return vcov;
}

all.equal(compute_vcov(mat, n_lags), compute_vcov3(mat, n_lags))
[1] TRUE

Rcpp benchmarks only:

n_lags <- 50L
microbenchmark(RcppOrig = compute_vcov(mat, n_lags),
               RcppModified = compute_vcov2(mat, n_lags),
               RcppExtreme = compute_vcov3(mat, n_lags), times = 5)
Unit: milliseconds
         expr       min        lq      mean    median       uq       max neval cld
     RcppOrig 7035.7920 7069.7761 7083.4961 7070.3395 7119.028 7122.5446     5   c
 RcppModified 3608.8986 3645.8585 3653.0029 3654.7209 3663.716 3691.8202     5  b
  RcppExtreme  324.8252  330.7381  332.9657  333.5919  335.168  340.5054     5 a

The newest implementation is now over 20x faster than the original Rcpp version and well over 300x faster than the original algorithm when n_lags is large.

R: calculating population standard deviation with NA

We can use na.rm=TRUE in the mean and sum to account for the NA elements.

pop.sd <- function(x) {
  sqrt(sum((x - mean(x, na.rm = TRUE))^2, na.rm = TRUE) / sum(!is.na(x)))
}
apply(mf2, 1, pop.sd)
#[1] 25.152866 13.500000  7.586538 31.443070  0.000000 32.967998

This should also give the same result for 'mf1'

apply(mf1,1,pop.sd)
#[1] 25.152866 12.498889 7.586538 31.443070 22.156012 32.967998

Instead of looping over the rows, we can also use the vectorized rowSums and rowMeans:

sqrt(rowSums((mf1-rowMeans(mf1, na.rm=TRUE))^2, na.rm=TRUE)/ncol(mf1))
#[1] 25.152866 12.498889 7.586538 31.443070 22.156012 32.967998

sqrt(rowSums((mf2-rowMeans(mf2, na.rm=TRUE))^2, na.rm=TRUE)/ncol(mf2))
#[1] 25.152866 11.022704 7.586538 31.443070 0.000000 32.967998

R: using apply.fromstart to calculate returns and standard deviation

Use the very efficient vectorized base R function cumprod for your first desired result; the second can be achieved (less efficiently) with a simple *apply loop.

If you want to keep the zoo class, do

cumprod(hourlyData$Position)
#         1         2         3         4         5         6
# 1.0000000 0.9929392 0.9224669 0.9754125 1.0673348 1.1547867

Otherwise

cumprod(as.numeric(hourlyData$Position))
## [1] 1.0000000 0.9929392 0.9224669 0.9754125 1.0673348 1.1547867

For the running sd (as proposed by @akrun), vapply is used instead of sapply in order to squeeze maximum performance out of it:

vapply(seq_len(nrow(hourlyData)), function(i) sd(hourlyData$Position[1:i]), FUN.VALUE = double(1))
# [1] NA 0.004992723 0.039097989 0.052519398 0.063598345 0.063156702
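
Alternatively, the cumulative = TRUE trick with TTR::runSD shown earlier on this page avoids recomputing each prefix from scratch (assuming hourlyData$Position can be coerced to a plain numeric vector):

library(TTR)
runSD(as.numeric(hourlyData$Position), n = 1, cumulative = TRUE)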

Cumulative sum over matrix diagonals

From the example it seems that every diagonal is either all zeros or a sequence of ones followed by zeros. We assume that is always the case.

First form a function cum which takes a diagonal x and outputs a vector of zeros of the same length, except that position sum(x) is set to sum(x).

Then apply that function across diagonals using ave. Since row(m1) - col(m1) is constant along each diagonal, it can be used for grouping.

cum <- function(x, s = sum(x)) replace(0 * x, s, s)  ## zeros with s at position s
ave(m1, row(m1) - col(m1), FUN = cum)  ## apply cum separately to each diagonal

## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 0 0 0 0
## [2,] 0 0 0 0 0
## [3,] 0 2 0 0 3
## [4,] 0 0 0 0 0
## [5,] 0 0 0 0 0
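
For reference, m1 is defined in the question; a matrix consistent with the output above (a reconstruction, not necessarily the OP's exact input) is:

m1 <- matrix(0, 5, 5)
m1[cbind(c(2, 3), c(1, 2))] <- 1        ## run of two ones on one subdiagonal
m1[cbind(c(1, 2, 3), c(3, 4, 5))] <- 1  ## run of three ones on a superdiagonal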

If the sequence of ones on a diagonal need not start at the beginning of the diagonal, but it is still true that there is at most one sequence of ones on each diagonal, then use this in place of cum above:

cum <- function(x, s = sum(x)) replace(0 * x, s + which.max(x) - 1, s)

If there can be more than one sequence of ones on a diagonal then use this in place of cum above:

library(data.table)
cum <- function(x) {
ave(x, rleid(x), FUN = function(x, s = sum(x)) replace(0 * x, s, s))
}
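
A quick illustration of this multi-run variant on a single made-up diagonal:

cum(c(1, 1, 0, 1, 1, 1, 0))
## expected: 0 2 0 0 0 3 0 -- each run of ones collapses to its length,
## placed at the run's last position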

