Why does apply() return a transposed xts matrix?
That's what apply is documented to do. From ?apply:

Value:

     If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’
     returns an array of dimension ‘c(n, dim(X)[MARGIN])’ if ‘n > 1’.

In your case, n = 48 (because you're looping over rows), so apply will return an array of dimension c(48, 7429).
Also note that myxts.2 is not an xts object; it's a regular array. You have a couple of options:

1. Transpose the result of apply before re-creating your xts object:

data(sample_matrix)
myxts <- as.xts(sample_matrix)
dim(myxts) # [1] 180 4
myxts.2 <- apply(myxts, 1, identity)
dim(myxts.2) # [1] 4 180
myxts.2 <- xts(t(apply(myxts, 1, identity)), index(myxts))
dim(myxts.2) # [1] 180 4

2. Vectorize your function so it operates on all the rows of an xts object and returns an xts object. Then you don't have to worry about apply at all.
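As a sketch of the second option (assuming the xts package and its sample_matrix data set, as in the example above), a function built from vectorized arithmetic keeps the xts class without any apply() or transpose:

```r
library(xts)
data(sample_matrix)
myxts <- as.xts(sample_matrix)

# Vectorized: arithmetic between columns of an xts object operates
# row-by-row and returns an xts object, so no apply()/t() is needed.
spread <- myxts[, "High"] - myxts[, "Low"]
class(spread)   # "xts" "zoo"
dim(spread)     # 180 1
```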
Finally, please start providing reproducible examples. It's not that hard and it makes it a lot easier for people to help. I've provided an example above and I hope you can use it in your following questions.
Dealing with apply functions of xts object in R
This might not be the most elegant solution, but it works:
# Split xts_object by year
xts_list = split(xts_object, "years")
# cumsum for each year
cumsum_list = lapply(xts_list, FUN = cumsum)
# rbind them together
do.call(rbind, cumsum_list)
# [,1] [,2]
# 2010-01-01 1 48
# 2010-02-01 3 95
# 2010-03-01 6 141
# 2010-04-01 10 186
# 2010-05-01 15 230
# 2010-06-01 21 273
# 2010-07-01 28 315
# 2010-08-01 36 356
# 2010-09-01 45 396
# 2010-10-01 55 435
# 2010-11-01 66 473
# 2010-12-01 78 510
# 2011-01-01 13 36
# 2011-02-01 27 71
# 2011-03-01 42 105
# 2011-04-01 58 138
# 2011-05-01 75 170
# 2011-06-01 93 201
# 2011-07-01 112 231
# 2011-08-01 132 260
# 2011-09-01 153 288
# 2011-10-01 175 315
# 2011-11-01 198 341
# 2011-12-01 222 366
# 2012-01-01 25 24
# 2012-02-01 51 47
# 2012-03-01 78 69
# 2012-04-01 106 90
# 2012-05-01 135 110
# 2012-06-01 165 129
# 2012-07-01 196 147
# 2012-08-01 228 164
# 2012-09-01 261 180
# 2012-10-01 295 195
# 2012-11-01 330 209
# 2012-12-01 366 222
# 2013-01-01 37 12
# 2013-02-01 75 23
# 2013-03-01 114 33
# 2013-04-01 154 42
# 2013-05-01 195 50
# 2013-06-01 237 57
# 2013-07-01 280 63
# 2013-08-01 324 68
# 2013-09-01 369 72
# 2013-10-01 415 75
# 2013-11-01 462 77
# 2013-12-01 510 78
class(do.call(rbind, cumsum_list))
# [1] "xts" "zoo"
The resulting object is still an "xts" object.
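The answer above omits the construction of xts_object. A hypothetical input that reproduces the printed output (inferred from the result: the first column counts up 1..48, the second counts down 48..1, over monthly dates) would be:

```r
library(xts)
# Hypothetical input reconstructed from the printed cumulative sums:
# 48 monthly observations, first column 1..48, second column 48..1.
dates <- seq(as.Date("2010-01-01"), by = "month", length.out = 48)
xts_object <- xts(cbind(1:48, 48:1), order.by = dates)

xts_list <- split(xts_object, "years")
result <- do.call(rbind, lapply(xts_list, FUN = cumsum))
result[1:3, ]   # matches the first three rows printed above
```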
apply in R returns weird results
Because R matrices are stored column-by-column, dividing the matrix by its row sums gives the correct result:
> a / rowSums(a)
[,1] [,2] [,3] [,4] [,5]
[1,] 0.01818182 0.1090909 0.2 0.2909091 0.3818182
[2,] 0.03333333 0.1166667 0.2 0.2833333 0.3666667
[3,] 0.04615385 0.1230769 0.2 0.2769231 0.3538462
[4,] 0.05714286 0.1285714 0.2 0.2714286 0.3428571
[5,] 0.06666667 0.1333333 0.2 0.2666667 0.3333333
This works because the vector returned by rowSums is recycled down each column, so every element is divided by its own row sum.
For apply to work here, you would simply need to transpose the result, since the return value of apply is built up by column, not by row.
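A minimal base-R sketch of both points; the matrix a can be reconstructed from the printed result as matrix(1:25, nrow = 5):

```r
a <- matrix(1:25, nrow = 5)          # filled column-wise
# Direct division works: rowSums(a) (length 5) is recycled down each
# column, so element (i, j) is divided by the sum of row i.
direct <- a / rowSums(a)
# apply() assembles its result column-by-column, so the row-wise
# results come back as columns and must be transposed to match.
via_apply <- t(apply(a, 1, function(r) r / sum(r)))
all.equal(direct, via_apply)         # TRUE
```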
R: Sub-selecting through XTS object vs Matrix: Why such a performance hit?
Yes. The short answer is that when you subset an xts object, you're extracting the relevant times from the index vector and the relevant rows from the matrix, which costs more computational time than extracting from the matrix alone. You typically want to keep your data in xts format so that subsetting by time stays easy, but before subsetting an xts object via integer indices you can call coredata first (which is faster than as.matrix) to expose the underlying matrix of data.

Read ?coredata.
> class(coredata(theXTS))
[1] "matrix"
# Compare benchmark below against subsetting with an existing matrix
theXTS_matrix <- as.matrix(theXTS)
library(microbenchmark)
microbenchmark(theXTS_matrix[5, 7:10], coredata(theXTS),
coredata(theXTS)[5, 7:10],
theXTS[5, 7:10], as.matrix(theXTS)[5, 7:10])
# Unit: nanoseconds
# expr min lq mean median uq max neval
# theXTS_matrix[5, 7:10] 663 1087.5 1479.39 1254.0 1569.0 9062 100
# coredata(theXTS) 10456 12090.5 13413.92 13122.0 14269.0 24106 100
# coredata(theXTS)[5, 7:10] 11703 12959.5 15193.21 14298.5 15499.5 56137 100
# theXTS[5, 7:10] 27519 30293.5 32669.63 31805.5 33130.5 57130 100
# as.matrix(theXTS)[5, 7:10] 200927 205187.5 209949.47 206926.0 212582.0 330426 100
That is, coredata incurs a small one-time overhead, but the subsetting itself is then much faster.
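A sketch of how this is typically used (the object below is a made-up stand-in for theXTS): extract the plain matrix once with coredata(), then do all integer subsetting on the matrix:

```r
library(xts)
# Hypothetical xts object standing in for theXTS
theXTS <- xts(matrix(rnorm(2000), ncol = 10), order.by = Sys.Date() + 1:200)

m <- coredata(theXTS)    # pay the extraction cost once
vals <- m[5, 7:10]       # thereafter, plain (fast) matrix indexing
```

This way the coredata() overhead is paid once, rather than on every subset inside a loop.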
R: how to vapply across rows for xts object?
I would not use vapply if you want the mean of the columns for each row. I would use rowMeans, and note that you have to convert the result back to xts:
(xmean <- xts(rowMeans(x, na.rm = TRUE), index(x)))
# [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
And I would use apply for a generic function that doesn't have a specialized implementation. Note that you will need to transpose the result if the function returns more than one value.
(xmin <- as.xts(apply(x, 1, min, na.rm = TRUE), dateFormat = "POSIXct"))
# [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
(xrange <- as.xts(t(apply(x, 1, range, na.rm = TRUE)), dateFormat = "POSIXct"))
# [,1] [,2]
# 2018-02-28 19:15:31 30440.5 30440.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30439.5 30439.5
To address the comment "why not use vapply()?", here are some benchmarks (using the data from the code-review Q&A the OP linked to):
set.seed(21)
xz <- xts(replicate(6, sample(c(1:100), 1000, rep = TRUE)),
order.by = Sys.Date() + 1:1000)
xrowmean <- function(x) { xts(rowMeans(x, na.rm = TRUE), index(x)) }
xapply <- function(x) { as.xts(apply(x, 1, mean, na.rm = TRUE), dateFormat = "POSIXct") }
xvapply <- function(x) { xts(vapply(seq_len(nrow(x)), function(i) {
mean(x[i,], na.rm = TRUE) }, FUN.VALUE = numeric(1)), index(x)) }
library(microbenchmark)
microbenchmark(xrowmean(xz), xapply(xz), xvapply(xz))
# Unit: microseconds
# expr min lq mean median uq max neval
# xrowmean(xz) 169.496 188.8505 207.1931 204.2455 219.4945 285.329 100
# xapply(xz) 33477.542 34203.3260 35698.0503 35076.4655 36821.1320 43910.353 100
# xvapply(xz) 32709.238 35010.1920 37514.7557 35884.3585 37972.7085 84409.961 100
So, why not use vapply()? It doesn't add much in the way of performance. It's quite a bit more verbose than the apply() version, and it's not clear there's much benefit to the safety of the pre-specified return value when you control both the type of the object and the function being called. That said, you're not going to do any harm by using vapply(); I simply prefer apply() for this case.
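The verbosity point can be seen even without xts. For a plain matrix (a hypothetical 4x3 example), the vapply() version needs an explicit index loop and a FUN.VALUE template to get the same result:

```r
m <- matrix(1:12, nrow = 4)              # hypothetical data
via_apply  <- apply(m, 1, mean)          # one call
via_vapply <- vapply(seq_len(nrow(m)),   # manual row indexing
                     function(i) mean(m[i, ]),
                     FUN.VALUE = numeric(1))
identical(via_apply, via_vapply)         # TRUE
```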
matrix multiplication without losing xts properties
A few simple alternatives exist. Obviously you could rewrite the method in Rcpp as suggested, but a simpler alternative is just to overwrite the attributes after performing regular matrix multiplication.
dd_new <- dd %*% c(-1, 1)
att <- attributes(dd)
att$dim <- dim(dd_new)
attributes(dd_new) <- att
This is not as fast as pure matrix multiplication, but is about 10 - 13x faster than subsetting the time series itself.
microbenchmark::microbenchmark(xts = dd[, 1] - dd[, 2],
matmult = dd %*% c(1, -1),
xtsmatmult = xts(dd %*% c(1, -1), index(dd)),
"%.%" = dd %.% c(1, -1),
"%-%" = dd %-% c(1, -1),
times = 1e5)
Unit: milliseconds
expr min lq mean median uq max neval
xts 0.0396 0.0685 0.11200 0.0998 0.1170 15.40 1e+05
matmult 0.0008 0.0021 0.00352 0.0028 0.0040 7.71 1e+05
xtsmatmult 0.0853 0.1380 0.22900 0.2100 0.2300 117.00 1e+05
%.% 0.0025 0.0055 0.00905 0.0076 0.0099 8.97 1e+05
%-% 0.0096 0.0183 0.03030 0.0268 0.0318 101.00 1e+05
In the above, %.% is a barebones function that only performs the matrix multiplication and overwrites the attributes, while %-% adds some simple input checks to ensure the dimensions are acceptable, and uses S3 dispatch to simplify generalization.
Functions: note that compiler::cmpfun has been used to byte-compile the functions (as happens to package functions). In this case the effect is insignificant.
`%.%` <- compiler::cmpfun(function(x, z){
x2 <- x %*% z
att <- attributes(x)
att$dim <- dim(x2)
attributes(x2) <- att
x2
})
`%-%` <- function(x, z)
UseMethod('%-%')
`%-%.xts` <- compiler::cmpfun(function(x, z){
##
if(!is.xts(x))
stop('x must be an xts object')
if(!is.numeric(z) || !(n <- length(z)) == ncol(x) || n == 0)
stop('z must be an index vector')
x2 <- x %*% z
att <- attributes(x)
att$dim <- dim(x2)
attributes(x2) <- att
x2
})
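A quick usage check of the attribute-overwriting trick on a small hypothetical object (subtracting column 1 from column 2):

```r
library(xts)
# Hypothetical two-column xts object standing in for dd
dd <- xts(matrix(as.numeric(1:10), ncol = 2), order.by = Sys.Date() + 1:5)

dd_new <- dd %*% c(-1, 1)    # plain matrix multiply drops the xts attributes
att <- attributes(dd)
att$dim <- dim(dd_new)       # result has 1 column instead of 2
attributes(dd_new) <- att    # restore index and class

is.xts(dd_new)                                         # TRUE
all(coredata(dd_new) == coredata(dd[, 2] - dd[, 1]))   # TRUE
```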