Why Apply() Returns a Transposed Xts Matrix

Why does apply() return a transposed xts matrix?

That's what apply is documented to do. From ?apply:

Value:

     If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’
     returns an array of dimension ‘c(n, dim(X)[MARGIN])’ if ‘n > 1’.

In your case, 'n'=48 (because you're looping over rows), so apply will return an array of dimension c(48, 7429).

Also note that myxts.2 is not an xts object; it's a regular array. You have a couple of options:

  1. Transpose the result of apply before re-creating your xts object:

    library(xts)
    data(sample_matrix)
    myxts <- as.xts(sample_matrix)
    dim(myxts)   # [1] 180 4
    myxts.2 <- apply(myxts, 1, identity)
    dim(myxts.2) # [1] 4 180
    myxts.2 <- xts(t(apply(myxts, 1, identity)), index(myxts))
    dim(myxts.2) # [1] 180 4
  2. Vectorize your function so it operates on all the rows of an xts
     object and returns an xts object. Then you don't have to worry
     about apply at all (see the sketch after this list).
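
For instance, here is a minimal sketch of option 2, using row-wise scaling as a stand-in for whatever your actual function does (which is not shown in the question):

library(xts)
data(sample_matrix)
myxts <- as.xts(sample_matrix)

# rowSums() works on the whole object at once, and arithmetic between an xts
# object and a numeric vector recycles column-wise, so each row is divided by
# its own sum and the result is still an xts object:
myxts.scaled <- myxts / rowSums(myxts)
class(myxts.scaled) # [1] "xts" "zoo"
dim(myxts.scaled)   # [1] 180 4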

Finally, please start providing reproducible examples. It's not that hard, and it makes it a lot easier for people to help. I've provided one above, and I hope you can use it in your future questions.

Dealing with apply functions of xts object in R

This might not be the most elegant solution, but it works:

# Split xts_object by year
xts_list = split(xts_object, "years")
# cumsum for each year
cumsum_list = lapply(xts_list, FUN = cumsum)
# rbind them together
do.call(rbind, cumsum_list)

# [,1] [,2]
# 2010-01-01 1 48
# 2010-02-01 3 95
# 2010-03-01 6 141
# 2010-04-01 10 186
# 2010-05-01 15 230
# 2010-06-01 21 273
# 2010-07-01 28 315
# 2010-08-01 36 356
# 2010-09-01 45 396
# 2010-10-01 55 435
# 2010-11-01 66 473
# 2010-12-01 78 510
# 2011-01-01 13 36
# 2011-02-01 27 71
# 2011-03-01 42 105
# 2011-04-01 58 138
# 2011-05-01 75 170
# 2011-06-01 93 201
# 2011-07-01 112 231
# 2011-08-01 132 260
# 2011-09-01 153 288
# 2011-10-01 175 315
# 2011-11-01 198 341
# 2011-12-01 222 366
# 2012-01-01 25 24
# 2012-02-01 51 47
# 2012-03-01 78 69
# 2012-04-01 106 90
# 2012-05-01 135 110
# 2012-06-01 165 129
# 2012-07-01 196 147
# 2012-08-01 228 164
# 2012-09-01 261 180
# 2012-10-01 295 195
# 2012-11-01 330 209
# 2012-12-01 366 222
# 2013-01-01 37 12
# 2013-02-01 75 23
# 2013-03-01 114 33
# 2013-04-01 154 42
# 2013-05-01 195 50
# 2013-06-01 237 57
# 2013-07-01 280 63
# 2013-08-01 324 68
# 2013-09-01 369 72
# 2013-10-01 415 75
# 2013-11-01 462 77
# 2013-12-01 510 78

class(do.call(rbind, cumsum_list))
# [1] "xts" "zoo"

The resulting object is still an xts object.
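
For reference, here is a sketch of input data consistent with the output above (an assumption, since the original question's xts_object is not shown):

library(xts)
# 48 monthly observations: one increasing column, one decreasing column
xts_object <- xts(cbind(1:48, 48:1),
                  order.by = seq(as.Date("2010-01-01"), by = "month", length.out = 48))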

apply in R returns weird results

Because R stores matrices column by column, dividing the matrix by its row sums gives the correct result:

> a / rowSums(a)
           [,1]      [,2] [,3]      [,4]      [,5]
[1,] 0.01818182 0.1090909  0.2 0.2909091 0.3818182
[2,] 0.03333333 0.1166667  0.2 0.2833333 0.3666667
[3,] 0.04615385 0.1230769  0.2 0.2769231 0.3538462
[4,] 0.05714286 0.1285714  0.2 0.2714286 0.3428571
[5,] 0.06666667 0.1333333  0.2 0.2666667 0.3333333

This works because the vector returned by rowSums is recycled: since the matrix is stored column-major, element i of the vector lines up with row i in every column.

For the apply() approach to work, you simply need to transpose the result, because the return value of apply is built up by column, not by row.
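
For comparison, a sketch of the apply() route on the same matrix a:

# apply(a, 1, ...) returns one column per input row, so transpose the result
# to recover the original orientation:
t(apply(a, 1, function(r) r / sum(r)))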

R: Sub-selecting through XTS object vs Matrix: Why such a performance hit?

Yes. The short answer is that when you subset an xts object, you extract both the relevant times from the index and the relevant rows from the underlying matrix, which takes more computational time than subsetting the matrix alone. You typically want to keep your data in xts format so that subsetting by time stays easy, but when subsetting by integer indices you can first call coredata (which is faster than as.matrix) to expose the underlying matrix.

Read ?coredata

> class(coredata(theXTS))
[1] "matrix"

# Compare benchmark below against subsetting with an existing matrix
theXTS_matrix <- as.matrix(theXTS)

library(microbenchmark)
microbenchmark(theXTS_matrix[5, 7:10],
               coredata(theXTS),
               coredata(theXTS)[5, 7:10],
               theXTS[5, 7:10],
               as.matrix(theXTS)[5, 7:10])
# Unit: nanoseconds
#                        expr    min       lq      mean   median       uq    max neval
#      theXTS_matrix[5, 7:10]    663   1087.5   1479.39   1254.0   1569.0   9062   100
#            coredata(theXTS)  10456  12090.5  13413.92  13122.0  14269.0  24106   100
#   coredata(theXTS)[5, 7:10]  11703  12959.5  15193.21  14298.5  15499.5  56137   100
#             theXTS[5, 7:10]  27519  30293.5  32669.63  31805.5  33130.5  57130   100
#  as.matrix(theXTS)[5, 7:10] 200927 205187.5 209949.47 206926.0 212582.0 330426   100

That is, coredata adds a small overhead, but subsetting the resulting matrix is then much faster.
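
If you need many integer-index subsets, a practical pattern (sketched here, assuming theXTS as above) is to pay the coredata() cost once and reuse the plain matrix:

m <- coredata(theXTS) # one-time extraction of the underlying matrix
m[5, 7:10]            # subsequent subsets are cheap matrix operations
m[10, 2:3]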

R: how to vapply across rows for xts object?

I would not use vapply() if you want the mean across columns for each row; I would use rowMeans(). Note that you have to convert the result back to xts.

(xmean <- xts(rowMeans(x, na.rm = TRUE), index(x)))
# [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5

And I would use apply for a generic function that doesn't have a specialized implementation. Note that you will need to transpose the result if the function returns more than one value.

(xmin <- as.xts(apply(x, 1, min, na.rm = TRUE), dateFormat = "POSIXct"))
# [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
(xrange <- as.xts(t(apply(x, 1, range, na.rm = TRUE)), dateFormat = "POSIXct"))
# [,1] [,2]
# 2018-02-28 19:15:31 30440.5 30440.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30439.5 30439.5

To address the comment of "why not use vapply()", here are some benchmarks (using the data from the code review Q/A the OP linked to):

set.seed(21)
xz <- xts(replicate(6, sample(c(1:100), 1000, rep = TRUE)),
          order.by = Sys.Date() + 1:1000)
xrowmean <- function(x) {
  xts(rowMeans(x, na.rm = TRUE), index(x))
}
xapply <- function(x) {
  as.xts(apply(x, 1, mean, na.rm = TRUE), dateFormat = "POSIXct")
}
xvapply <- function(x) {
  xts(vapply(seq_len(nrow(x)), function(i) mean(x[i, ], na.rm = TRUE),
             FUN.VALUE = numeric(1)), index(x))
}

library(microbenchmark)
microbenchmark(xrowmean(xz), xapply(xz), xvapply(xz))
# Unit: microseconds
#          expr       min         lq       mean     median         uq       max neval
#  xrowmean(xz)   169.496   188.8505   207.1931   204.2455   219.4945   285.329   100
#    xapply(xz) 33477.542 34203.3260 35698.0503 35076.4655 36821.1320 43910.353   100
#   xvapply(xz) 32709.238 35010.1920 37514.7557 35884.3585 37972.7085 84409.961   100

So, why not use vapply()? It doesn't add much in the way of performance benefit. It's quite a bit more verbose than the apply() version, and it's not clear there's much benefit to the safety of the 'pre-specified return value' if you have control over the type of object and the function being called. That said, you're not going to do any harm by using vapply(). I simply prefer apply() for this case.
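
For completeness, a small illustration (not from the original answer) of what the pre-specified return value buys you: vapply() checks every result against the FUN.VALUE template and stops if it does not match.

vapply(1:3, function(i) i + 0.5, FUN.VALUE = numeric(1))   # fine: each call returns one number
# vapply(1:3, function(i) c(i, i), FUN.VALUE = numeric(1)) # errors: results have length 2, not 1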

matrix multiplication without losing xts properties

A few simple alternatives exist. Obviously you could rewrite the method in Rcpp as suggested, but a simpler alternative is just to overwrite the attributes after performing regular matrix multiplication.

dd_new <- dd %*% c(-1, 1)  # plain matrix multiplication drops the xts attributes
att <- attributes(dd)      # take the attributes of the original object
att$dim <- dim(dd_new)     # but use the dimensions of the new matrix
attributes(dd_new) <- att  # restore them, so dd_new is an xts object again

This is not as fast as pure matrix multiplication, but is about 10 - 13x faster than subsetting the time series itself.

microbenchmark::microbenchmark(xts = dd[, 1] - dd[, 2],
                               matmult = dd %*% c(1, -1),
                               xtsmatmult = xts(dd %*% c(1, -1), index(dd)),
                               "%.%" = dd %.% c(1, -1),
                               "%-%" = dd %-% c(1, -1),
                               times = 1e5)
Unit: milliseconds
       expr    min     lq    mean median     uq    max neval
        xts 0.0396 0.0685 0.11200 0.0998 0.1170  15.40 1e+05
    matmult 0.0008 0.0021 0.00352 0.0028 0.0040   7.71 1e+05
 xtsmatmult 0.0853 0.1380 0.22900 0.2100 0.2300 117.00 1e+05
        %.% 0.0025 0.0055 0.00905 0.0076 0.0099   8.97 1e+05
        %-% 0.0096 0.0183 0.03030 0.0268 0.0318 101.00 1e+05

In the above, %.% is a bare-bones function that only performs the matrix multiplication and overwrites the attributes, while %-% adds some simple input checks to ensure that the dimensions are acceptable and uses S3 dispatch to make generalization easier.

Functions:

Note that compiler::cmpfun has been used to byte-compile the functions (as happens for functions in a package). In this case the effect is insignificant.

`%.%` <- compiler::cmpfun(function(x, z) {
  x2 <- x %*% z
  att <- attributes(x)
  att$dim <- dim(x2)
  attributes(x2) <- att
  x2
})

`%-%` <- function(x, z)
  UseMethod('%-%')

`%-%.xts` <- compiler::cmpfun(function(x, z) {
  if (!is.xts(x))
    stop('x must be an xts object')
  if (!is.numeric(z) || !(n <- length(z)) == ncol(x) || n == 0)
    stop('z must be an index vector')
  x2 <- x %*% z
  att <- attributes(x)
  att$dim <- dim(x2)
  attributes(x2) <- att
  x2
})
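
A usage sketch, assuming dd is an unnamed two-column xts object (the example data below is an illustration; note that the attribute-copying trick does not adjust dimnames, so named columns would need extra handling):

library(xts)
dd <- xts(matrix(rnorm(10), ncol = 2), order.by = Sys.Date() + 1:5)
dd %.% c(1, -1) # column 1 minus column 2, still an xts object
dd %-% c(1, -1) # same result, with input checks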

