Time Series and stl() in R: Error "Only Univariate Series Are Allowed"

stl() decomposition won't accept univariate ts object?

I'm not 100% sure what the exact cause of the problem is, but you can fix it by passing dummyData$index to ts() instead of the entire object:

tsData2 <- ts(
  data = dummyData$index,
  start = c(2012, 1),
  end = c(2014, 12),
  frequency = 12)
##
R> stl(tsData2, s.window="periodic")
Call:
stl(x = tsData2, s.window = "periodic")

Components
            seasonal    trend   remainder
Jan 2012 -24.0219753 36.19189   9.8300831
Feb 2012 -20.2516062 37.82808   8.4235219
Mar 2012  -0.4812396 39.46428  -4.9830367
Apr 2012 -10.1034302 41.32047   1.7829612
May 2012   0.6077088 43.17666  -3.7843705
Jun 2012   4.4723800 45.22411 -10.6964877
Jul 2012  -7.6629462 47.27155  -0.6086074
Aug 2012  -1.0551286 49.50673  -3.4516016
Sep 2012   2.2193527 51.74191  -3.9612597
Oct 2012   7.3239448 55.27391  -4.5978509
Nov 2012  18.4285405 58.80591 -13.2344456
Dec 2012  30.5244146 63.70105 -16.2254684

...


I'm guessing that when you pass a data.frame to the data argument of ts(), some extra attributes carry over. That generally doesn't seem to bother functions that take a ts object (univariate or otherwise), but it is apparently a problem for stl().

R>  all.equal(tsData2,tsData)
[1] "Attributes: < Names: 1 string mismatch >"
[2] "Attributes: < Length mismatch: comparison on first 2 components >"
[3] "Attributes: < Component 2: Numeric: lengths (3, 2) differ >"
##
R> str(tsData2)
Time-Series [1:36] from 2012 to 2015: 22 26 34 33 40 39 39 45 50 58 ...
##
R> str(tsData)
'ts' int [1:36, 1] 22 26 34 33 40 39 39 45 50 58 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "index"
- attr(*, "tsp")= num [1:3] 2012 2015 12
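
(For reference, since dummyData itself isn't shown in the question, the two objects compared above can be reproduced with something like the following; the numbers are made up and only the shape matters.)

dummyData <- data.frame(index = sample(20:70, 36, replace = TRUE))  # hypothetical data
tsData  <- ts(dummyData,       start = c(2012, 1), end = c(2014, 12), frequency = 12)
tsData2 <- ts(dummyData$index, start = c(2012, 1), end = c(2014, 12), frequency = 12)
is.matrix(tsData)   # TRUE  -- a one-column matrix, which stl() rejects
is.matrix(tsData2)  # FALSE -- a plain univariate ts, which stl() accepts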

Edit:

Looking into this a little further, I think the problem has to do with the dim (and dimnames) attributes being carried over from dummyData when it is passed as a whole. Note this excerpt from the body of stl:

if (is.matrix(x))
    stop("only univariate series are allowed")

and from the help page for is.matrix:

is.matrix returns TRUE if x is a vector and has a "dim" attribute of
length 2 and FALSE otherwise

so although you think you are passing stl() a univariate time series (the original tsData), as far as the function is concerned a vector with a length-2 dim attribute (i.e. a matrix) is not a univariate series. It seems a little strange to do error handling in this way, but I'm sure the author of the function had a very good reason for it.
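
If you want to keep the original tsData rather than rebuild it from dummyData$index, a quick equivalent fix is to drop the column dimension; column-subsetting a ts object preserves the time attributes, so the result is a plain univariate series (a sketch, using the tsData from above):

tsData_uni <- tsData[, 1]               # drops dim/dimnames, keeps the time attributes
is.matrix(tsData_uni)                   # FALSE
stl(tsData_uni, s.window = "periodic")  # now runs without the error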

Why am I getting this error message even after transforming my data set into a ts object for time series analysis?

The reason you got this error is that you tried to feed a data set with two variables (date + pkgrev) to stl(), which accepts only a univariate time series as its first argument.

To solve this problem, create a univariate ts object without the date variable. In your case, use mydata2$pkgrev (or mydata2[["pkgrev"]] after mydata2 is converted to a data frame; single-bracket indexing like mydata2["pkgrev"] keeps a one-column data frame and would trigger the same error) instead of mydata2 in your call mydata2_ts <- ts(mydata2, start=c(2015,1), freq=12). The resulting ts object already carries the temporal information, since you supply the start date and frequency in the call.
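
Putting that together, the corrected call would look something like this (the column name pkgrev and the January 2015 start come from your question; s.window = "periodic" is just one reasonable choice):

mydata2_ts <- ts(mydata2$pkgrev, start = c(2015, 1), frequency = 12)
fit <- stl(mydata2_ts, s.window = "periodic")  # needs at least two full years of monthly data
plot(fit)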

If you would like a new data frame with both the series and its corresponding date variable, I would suggest something like the following (using zoo's as.yearmon() to convert the decimal-year time index into proper dates):

library(zoo)  # provides as.yearmon() and an as.Date() method for it
mydata3 <- data.frame(date   = as.Date(as.yearmon(time(mydata2_ts))),
                      pkgrev = as.numeric(mydata2_ts))

However, for the STL decomposition itself, the first argument should be the ts object, i.e. mydata2_ts, not this data frame.

What is causing this error related to time series attributes?

You are observing floating-point round-off error, caused by the two series starting at different times before the longer one is windowed down to match the shorter. As a simpler example (but with ranges that match yours), consider the following, which illustrates both the mysteriously different time series attributes that print identically and an easy fix:

x <- ts(rnorm(491), start = c(1950, 2), frequency = 12)
y <- ts(rnorm(528), start = c(1947, 1), frequency = 12)
y <- window(y, c(1950, 2), c(1990, 12))
print(attributes(x)$tsp)  # prints 1950.083 1990.917 12.000
print(attributes(y)$tsp)  # prints 1950.083 1990.917 12.000
# but:
print(attributes(x)$tsp == attributes(y)$tsp)  # prints TRUE FALSE TRUE (!)

# the fix:
y <- ts(y, start = c(1950, 2), frequency = 12)
print(attributes(x)$tsp == attributes(y)$tsp)  # prints TRUE TRUE TRUE

There is some strangeness here that I don't understand. I would have thought that as.vector(time(x)) (the times at which the series is sampled) is essentially the same as seq(a, b, 1/c) (where attributes(x)$tsp is a b c), but when I compare the times of x with the sequence generated by seq() I find strange discrepancies:

> v <- as.vector(time(x))
> w <- seq(attributes(x)$tsp[1],attributes(x)$tsp[2],1/attributes(x)$tsp[3])
> sum(v == w)
[1] 412
> max(abs(v-w))
[1] 2.273737e-13
> which(v != w)
[1] 255 258 261 264 267 270 273 276 279 282 285 288 291 294 297 300 303 306
[19] 309 312 315 318 321 324 327 330 333 336 339 342 345 348 351 354 357 360
[37] 363 366 369 372 375 378 381 384 387 390 393 396 399 402 405 408 411 414
[55] 417 420 423 426 429 432 435 438 441 444 447 450 453 456 459 462 465 468
[73] 471 474 477 480 483 486 489

The strangest thing about the above is the non-contiguous pattern of indices at which the two vectors differ. The underlying problem is that 1/12 is not exactly representable as a floating-point number, so neither v nor w has the property that successive points differ by exactly 1/12. My conjecture is that time series objects adopt an error-reducing strategy that spreads the inevitable rounding error over the time span. Since y and the original x were constructed with different starts, that error was spread out slightly differently, in a way that window() did not fix.

Given the non-contiguous nature of these micro-discrepancies, I suspect that code like yours, which windows a longer series down to the span of a shorter one, will sometimes produce time series attributes that are exactly equal and other times produce attributes where one or two entries differ by something like 2.273737e-13. This can lead to hard-to-track-down bugs where code works on test cases but mysteriously fails when the input changes. I am surprised that the documentation for window() does not mention the danger.
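
If you need to test whether two series are on the same time grid, it is probably safer to compare the tsp attributes with a tolerance (or to re-anchor the windowed series with ts(), as above) rather than relying on exact equality. A small sketch:

# compare tsp attributes up to floating-point tolerance instead of with ==
tsp_equal <- function(a, b) isTRUE(all.equal(tsp(a), tsp(b)))
tsp_equal(x, y)  # TRUE, even when the raw tsp values differ by ~2e-13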


