Why Apply() Returns a Transposed Xts Matrix

Why does apply() return a transposed xts matrix?

That's what apply is documented to do. From ?apply:

Value:

     If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’
     returns an array of dimension ‘c(n, dim(X)[MARGIN])’ if ‘n > 1’.

In your case, 'n'=48 (because you're looping over rows), so apply will return an array of dimension c(48, 7429).

Also note that myxts.2 is not an xts object; it's a regular array. You have a couple of options:

  1. Transpose the result of apply before re-creating your xts object:

    library(xts)
    data(sample_matrix)
    myxts <- as.xts(sample_matrix)
    dim(myxts)   # [1] 180 4
    myxts.2 <- apply(myxts, 1, identity)
    dim(myxts.2) # [1] 4 180
    myxts.2 <- xts(t(apply(myxts, 1, identity)), index(myxts))
    dim(myxts.2) # [1] 180 4
  2. Vectorize your function so it operates on all the rows of an xts
     object and returns an xts object. Then you don't have to worry
     about apply at all (see the sketch after this list).
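
For instance, here is a minimal sketch of option 2, using row-wise scaling as a stand-in for whatever your actual function does (which is not shown in the question):

library(xts)
data(sample_matrix)
myxts <- as.xts(sample_matrix)

# rowSums() works on the whole object at once, and arithmetic between an xts
# object and a numeric vector recycles column-wise, so each row is divided by
# its own sum and the result is still an xts object:
myxts.scaled <- myxts / rowSums(myxts)
class(myxts.scaled) # [1] "xts" "zoo"
dim(myxts.scaled)   # [1] 180 4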

Finally, please start providing reproducible examples. It's not that hard, and it makes it a lot easier for people to help. I've provided one above, and I hope you can use it in your future questions.

Dealing with apply functions of xts object in R

This might not be the most elegant solution, but it works:

# Split xts_object by year
xts_list = split(xts_object, "years")
# cumsum for each year
cumsum_list = lapply(xts_list, FUN = cumsum)
# rbind them together
do.call(rbind, cumsum_list)

# [,1] [,2]
# 2010-01-01 1 48
# 2010-02-01 3 95
# 2010-03-01 6 141
# 2010-04-01 10 186
# 2010-05-01 15 230
# 2010-06-01 21 273
# 2010-07-01 28 315
# 2010-08-01 36 356
# 2010-09-01 45 396
# 2010-10-01 55 435
# 2010-11-01 66 473
# 2010-12-01 78 510
# 2011-01-01 13 36
# 2011-02-01 27 71
# 2011-03-01 42 105
# 2011-04-01 58 138
# 2011-05-01 75 170
# 2011-06-01 93 201
# 2011-07-01 112 231
# 2011-08-01 132 260
# 2011-09-01 153 288
# 2011-10-01 175 315
# 2011-11-01 198 341
# 2011-12-01 222 366
# 2012-01-01 25 24
# 2012-02-01 51 47
# 2012-03-01 78 69
# 2012-04-01 106 90
# 2012-05-01 135 110
# 2012-06-01 165 129
# 2012-07-01 196 147
# 2012-08-01 228 164
# 2012-09-01 261 180
# 2012-10-01 295 195
# 2012-11-01 330 209
# 2012-12-01 366 222
# 2013-01-01 37 12
# 2013-02-01 75 23
# 2013-03-01 114 33
# 2013-04-01 154 42
# 2013-05-01 195 50
# 2013-06-01 237 57
# 2013-07-01 280 63
# 2013-08-01 324 68
# 2013-09-01 369 72
# 2013-10-01 415 75
# 2013-11-01 462 77
# 2013-12-01 510 78

class(do.call(rbind, cumsum_list))
# [1] "xts" "zoo"

The resulting object is still an xts object.
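
For reference, here is a sketch of input data consistent with the output above (an assumption, since the original question's xts_object is not shown):

library(xts)
# 48 monthly observations: one increasing column, one decreasing column
xts_object <- xts(cbind(1:48, 48:1),
                  order.by = seq(as.Date("2010-01-01"), by = "month", length.out = 48))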

apply in R returns weird results

Because R stores matrices column by column, dividing the matrix by its row sums gives the correct result:

> a / rowSums(a)
           [,1]      [,2] [,3]      [,4]      [,5]
[1,] 0.01818182 0.1090909  0.2 0.2909091 0.3818182
[2,] 0.03333333 0.1166667  0.2 0.2833333 0.3666667
[3,] 0.04615385 0.1230769  0.2 0.2769231 0.3538462
[4,] 0.05714286 0.1285714  0.2 0.2714286 0.3428571
[5,] 0.06666667 0.1333333  0.2 0.2666667 0.3333333

This works because the vector returned by rowSums is recycled: since the matrix is stored column-major, element i of the vector lines up with row i in every column.

For the apply() approach to work, you simply need to transpose the result, because the return value of apply is built up by column, not by row.
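
For comparison, a sketch of the apply() route on the same matrix a:

# apply(a, 1, ...) returns one column per input row, so transpose the result
# to recover the original orientation:
t(apply(a, 1, function(r) r / sum(r)))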

R: Sub-selecting through XTS object vs Matrix: Why such a performance hit?

Yes. The short answer is that when you subset an xts object, you extract both the relevant times from the index and the relevant rows from the underlying matrix, which takes more computational time than subsetting the matrix alone. You typically want to keep your data in xts format so that subsetting by time stays easy, but when subsetting by integer indices you can first call coredata (which is faster than as.matrix) to expose the underlying matrix.

Read ?coredata

> class(coredata(theXTS))
[1] "matrix"

# Compare benchmark below against subsetting with an existing matrix
theXTS_matrix <- as.matrix(theXTS)

library(microbenchmark)
microbenchmark(theXTS_matrix[5, 7:10],
               coredata(theXTS),
               coredata(theXTS)[5, 7:10],
               theXTS[5, 7:10],
               as.matrix(theXTS)[5, 7:10])
# Unit: nanoseconds
#                        expr    min       lq      mean   median       uq    max neval
#      theXTS_matrix[5, 7:10]    663   1087.5   1479.39   1254.0   1569.0   9062   100
#            coredata(theXTS)  10456  12090.5  13413.92  13122.0  14269.0  24106   100
#   coredata(theXTS)[5, 7:10]  11703  12959.5  15193.21  14298.5  15499.5  56137   100
#             theXTS[5, 7:10]  27519  30293.5  32669.63  31805.5  33130.5  57130   100
#  as.matrix(theXTS)[5, 7:10] 200927 205187.5 209949.47 206926.0 212582.0 330426   100

That is, coredata adds a small overhead, but subsetting the resulting matrix is then much faster.
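
If you need many integer-index subsets, a practical pattern (sketched here, assuming theXTS as above) is to pay the coredata() cost once and reuse the plain matrix:

m <- coredata(theXTS) # one-time extraction of the underlying matrix
m[5, 7:10]            # subsequent subsets are cheap matrix operations
m[10, 2:3]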

R: how to vapply across rows for xts object?

I would not use vapply() if you want the mean across columns for each row; I would use rowMeans(). Note that you have to convert the result back to xts.

(xmean <- xts(rowMeans(x, na.rm = TRUE), index(x)))
# [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5

And I would use apply for a generic function that doesn't have a specialized implementation. Note that you will need to transpose the result if the function returns more than one value.

(xmin <- as.xts(apply(x, 1, min, na.rm = TRUE), dateFormat = "POSIXct"))
# [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
(xrange <- as.xts(t(apply(x, 1, range, na.rm = TRUE)), dateFormat = "POSIXct"))
# [,1] [,2]
# 2018-02-28 19:15:31 30440.5 30440.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30439.5 30439.5

To address the comment of "why not use vapply()", here are some benchmarks (using the data from the code review Q/A the OP linked to):

set.seed(21)
xz <- xts(replicate(6, sample(c(1:100), 1000, rep = TRUE)),
          order.by = Sys.Date() + 1:1000)
xrowmean <- function(x) {
  xts(rowMeans(x, na.rm = TRUE), index(x))
}
xapply <- function(x) {
  as.xts(apply(x, 1, mean, na.rm = TRUE), dateFormat = "POSIXct")
}
xvapply <- function(x) {
  xts(vapply(seq_len(nrow(x)), function(i) mean(x[i, ], na.rm = TRUE),
             FUN.VALUE = numeric(1)), index(x))
}

library(microbenchmark)
microbenchmark(xrowmean(xz), xapply(xz), xvapply(xz))
# Unit: microseconds
#          expr       min         lq       mean     median         uq       max neval
#  xrowmean(xz)   169.496   188.8505   207.1931   204.2455   219.4945   285.329   100
#    xapply(xz) 33477.542 34203.3260 35698.0503 35076.4655 36821.1320 43910.353   100
#   xvapply(xz) 32709.238 35010.1920 37514.7557 35884.3585 37972.7085 84409.961   100

So, why not use vapply()? It doesn't add much in the way of performance benefit. It's quite a bit more verbose than the apply() version, and it's not clear there's much benefit to the safety of the 'pre-specified return value' if you have control over the type of object and the function being called. That said, you're not going to do any harm by using vapply(). I simply prefer apply() for this case.
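
For completeness, a small illustration (not from the original answer) of what the pre-specified return value buys you: vapply() checks every result against the FUN.VALUE template and stops if it does not match.

vapply(1:3, function(i) i + 0.5, FUN.VALUE = numeric(1))   # fine: each call returns one number
# vapply(1:3, function(i) c(i, i), FUN.VALUE = numeric(1)) # errors: results have length 2, not 1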

matrix multiplication without losing xts properties

A few simple alternatives exist. Obviously you could rewrite the method in Rcpp as suggested, but a simpler alternative is just to overwrite the attributes after performing regular matrix multiplication.

dd_new <- dd %*% c(-1, 1)  # plain matrix multiplication drops the xts attributes
att <- attributes(dd)      # take the attributes of the original object
att$dim <- dim(dd_new)     # but use the dimensions of the new matrix
attributes(dd_new) <- att  # restore them, so dd_new is an xts object again

This is not as fast as pure matrix multiplication, but is about 10 - 13x faster than subsetting the time series itself.

microbenchmark::microbenchmark(xts = dd[, 1] - dd[, 2],
                               matmult = dd %*% c(1, -1),
                               xtsmatmult = xts(dd %*% c(1, -1), index(dd)),
                               "%.%" = dd %.% c(1, -1),
                               "%-%" = dd %-% c(1, -1),
                               times = 1e5)
Unit: milliseconds
       expr    min     lq    mean median     uq    max neval
        xts 0.0396 0.0685 0.11200 0.0998 0.1170  15.40 1e+05
    matmult 0.0008 0.0021 0.00352 0.0028 0.0040   7.71 1e+05
 xtsmatmult 0.0853 0.1380 0.22900 0.2100 0.2300 117.00 1e+05
        %.% 0.0025 0.0055 0.00905 0.0076 0.0099   8.97 1e+05
        %-% 0.0096 0.0183 0.03030 0.0268 0.0318 101.00 1e+05

In the above, %.% is a bare-bones function that only performs the matrix multiplication and overwrites the attributes, while %-% adds some simple input checks to ensure that the dimensions are acceptable and uses S3 dispatch to make generalization easier.

Functions:

Note that compiler::cmpfun has been used to byte-compile the functions (as happens for functions in a package). In this case the effect is insignificant.

`%.%` <- compiler::cmpfun(function(x, z) {
  x2 <- x %*% z
  att <- attributes(x)
  att$dim <- dim(x2)
  attributes(x2) <- att
  x2
})

`%-%` <- function(x, z)
  UseMethod('%-%')

`%-%.xts` <- compiler::cmpfun(function(x, z) {
  if (!is.xts(x))
    stop('x must be an xts object')
  if (!is.numeric(z) || !(n <- length(z)) == ncol(x) || n == 0)
    stop('z must be an index vector')
  x2 <- x %*% z
  att <- attributes(x)
  att$dim <- dim(x2)
  attributes(x2) <- att
  x2
})
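
A usage sketch, assuming dd is an unnamed two-column xts object (the example data below is an illustration; note that the attribute-copying trick does not adjust dimnames, so named columns would need extra handling):

library(xts)
dd <- xts(matrix(rnorm(10), ncol = 2), order.by = Sys.Date() + 1:5)
dd %.% c(1, -1) # column 1 minus column 2, still an xts object
dd %-% c(1, -1) # same result, with input checks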

