stl() decomposition won't accept univariate ts object?
I'm not 100% sure what the exact cause of the problem is, but you can fix it by passing dummyData$index to ts instead of the entire object:
tsData2 <- ts(
data=dummyData$index,
start = c(2012,1),
end = c(2014,12),
frequency = 12)
##
R> stl(tsData2, s.window="periodic")
Call:
stl(x = tsData2, s.window = "periodic")
Components
seasonal trend remainder
Jan 2012 -24.0219753 36.19189 9.8300831
Feb 2012 -20.2516062 37.82808 8.4235219
Mar 2012 -0.4812396 39.46428 -4.9830367
Apr 2012 -10.1034302 41.32047 1.7829612
May 2012 0.6077088 43.17666 -3.7843705
Jun 2012 4.4723800 45.22411 -10.6964877
Jul 2012 -7.6629462 47.27155 -0.6086074
Aug 2012 -1.0551286 49.50673 -3.4516016
Sep 2012 2.2193527 51.74191 -3.9612597
Oct 2012 7.3239448 55.27391 -4.5978509
Nov 2012 18.4285405 58.80591 -13.2344456
Dec 2012 30.5244146 63.70105 -16.2254684
...
I'm guessing that when you pass a data.frame to the data argument of ts, some extra attributes carry over. Although this generally doesn't seem to be an issue for many functions that take a ts object (univariate or otherwise), it apparently is for stl.
R> all.equal(tsData2,tsData)
[1] "Attributes: < Names: 1 string mismatch >"
[2] "Attributes: < Length mismatch: comparison on first 2 components >"
[3] "Attributes: < Component 2: Numeric: lengths (3, 2) differ >"
##
R> str(tsData2)
Time-Series [1:36] from 2012 to 2015: 22 26 34 33 40 39 39 45 50 58 ...
##
R> str(tsData)
'ts' int [1:36, 1] 22 26 34 33 40 39 39 45 50 58 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "index"
- attr(*, "tsp")= num [1:3] 2012 2015 12
Edit:
Looking into this a little further, I think the problem is the length-2 dim attribute (the data.frame becomes a one-column matrix) that is carried over from dummyData when it is passed as a whole. Note this excerpt from the body of stl:
if (is.matrix(x))
stop("only univariate series are allowed")
and from the documentation of is.matrix:
is.matrix returns TRUE if x is a vector and has a "dim" attribute of length 2, and FALSE otherwise
so although you are passing stl a univariate time series (the original tsData), as far as the function is concerned a vector with a length-2 dim attribute (i.e. a matrix) is not a univariate series. It seems a little strange to do error handling this way, but I'm sure the author of the function had a good reason for it.
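If you want to keep working from the full data frame rather than rebuild the series from dummyData$index, a minimal sketch of the same fix (dummy below is a hypothetical stand-in for the question's data) is to rebuild the series from a bare vector, which strips the dim/dimnames attributes:

```r
# A one-column data.frame passed whole to ts() keeps a length-2 "dim"
# attribute, so is.matrix() is TRUE and stl() rejects the series.
dummy <- data.frame(index = rnorm(36))  # hypothetical stand-in data
tsBad <- ts(dummy, start = c(2012, 1), frequency = 12)

# Rebuilding from the bare vector drops dim/dimnames:
tsGood <- ts(as.vector(tsBad), start = c(2012, 1), frequency = 12)

is.matrix(tsBad)   # TRUE  -> stl() stops: "only univariate series are allowed"
is.matrix(tsGood)  # FALSE -> stl() accepts it
fit <- stl(tsGood, s.window = "periodic")
```

as.vector() discards all attributes of the underlying matrix, so ts() can then attach a clean tsp attribute with no dim.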
Why am I getting this error message even after transforming my data set into a ts file for time series analysis?
The reason you got this error is that you fed a data set with two variables (date + pkgrev) into stl(), whose first argument only accepts a univariate time series.
To solve this, create a univariate ts object without the date variable. In your case, use mydata2$pkgrev (or mydata2[["pkgrev"]] — double brackets, so you extract a vector rather than a one-column data frame) instead of mydata2 in your code mydata2_ts <- ts(mydata2, start=c(2015,1), freq=12). The ts object already carries the temporal information, since you specified the start date and frequency in the call.
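Put together, the corrected call looks like this (mydata2 below is a hypothetical stand-in with the same shape as the question's data: a date column plus a pkgrev column):

```r
# Hypothetical stand-in for the question's data: 3 years of monthly values
mydata2 <- data.frame(
  date   = seq(as.Date("2015-01-01"), by = "month", length.out = 36),
  pkgrev = rnorm(36, mean = 100)
)

# Build the ts from the numeric column only; start/frequency supply the dates
mydata2_ts <- ts(mydata2$pkgrev, start = c(2015, 1), frequency = 12)

fit <- stl(mydata2_ts, s.window = "periodic")  # now a valid univariate series
```

Note that stl() also requires more than two full periods of data, so a monthly series needs at least 25 observations.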
If you would like a new data frame holding both the series and its corresponding dates, note that cbind() would coerce the dates to plain numbers; building it with data.frame() keeps the Date class (the monthly sequence below reconstructs the dates from the start and frequency you specified):
mydata3 <- data.frame(
  date  = seq(as.Date("2015-01-01"), by = "month", length.out = length(mydata2_ts)),
  value = as.numeric(mydata2_ts))
However, for the purpose of STL decomposition, the input to the first argument should be the ts object itself, i.e., mydata2_ts.
What is causing this error related to time series attributes?
You are observing floating-point round-off error, caused by the two series initially starting at different times before the longer one was windowed down to match the shorter. As a simpler example (with ranges matching yours), consider the following, which illustrates both mysteriously different time series attributes that display the same, and an easy fix:
x <- ts(rnorm(491), start = c(1950, 2), frequency = 12)
y <- ts(rnorm(528), start = c(1947, 1), frequency = 12)
y <- window(y, c(1950, 2), c(1990, 12))
print(attributes(x)$tsp) #prints 1950.083 1990.917 12.000
print(attributes(y)$tsp) #prints 1950.083 1990.917 12.000
#but:
print(attributes(x)$tsp == attributes(y)$tsp) #prints TRUE FALSE TRUE (!)
#the fix:
y <- ts(y,start=c(1950,2), frequency = 12)
print(attributes(x)$tsp == attributes(y)$tsp) #prints TRUE TRUE TRUE
There is some strangeness here that I don't understand. I would have thought that as.vector(time(x)) (the times at which the series is sampled) is essentially the same as seq(a, b, 1/c) (where attributes(x)$tsp is a b c), but when I compare the times of x with the sequence generated by seq I find strange discrepancies:
> v <- as.vector(time(x))
> w <- seq(attributes(x)$tsp[1],attributes(x)$tsp[2],1/attributes(x)$tsp[3])
> sum(v == w)
[1] 412
> max(abs(v-w))
[1] 2.273737e-13
> which(v != w)
[1] 255 258 261 264 267 270 273 276 279 282 285 288 291 294 297 300 303 306
[19] 309 312 315 318 321 324 327 330 333 336 339 342 345 348 351 354 357 360
[37] 363 366 369 372 375 378 381 384 387 390 393 396 399 402 405 408 411 414
[55] 417 420 423 426 429 432 435 438 441 444 447 450 453 456 459 462 465 468
[73] 471 474 477 480 483 486 489
The strangest thing about the above is the non-contiguous nature of the indices where the two vectors differ. The underlying problem is that 1/12 is not exactly representable as a floating-point number, so neither v nor w has the property that successive points differ by exactly 1/12. My conjecture is that time series objects adopt an error-reducing strategy which spreads the inevitable error over the time span. Since y and the original x were constructed with different starts, the way this error was spread out differed slightly, in a way that window did not fix. Given the non-contiguous nature of these micro-discrepancies, I suspect that code like yours, which windows a longer time series down to the span of a shorter one, will sometimes produce time series attributes that are exactly equal, and other times produce ones where one or two of the attributes differ by something like 2.273737e-13. This can lead to hard-to-track-down bugs where code works on test cases but mysteriously fails when the input changes. I am surprised that the documentation for window does not mention the danger.
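Given those micro-discrepancies, a practical defence (sketched below) is to compare tsp attributes with a tolerance via all.equal, or to renormalize the windowed series with ts() before any exact comparison:

```r
x <- ts(rnorm(491), start = c(1950, 2), frequency = 12)
y <- ts(rnorm(528), start = c(1947, 1), frequency = 12)
y <- window(y, c(1950, 2), c(1990, 12))

# Exact comparison can be thrown off by ~1e-13 round-off:
identical(tsp(x), tsp(y))          # may be FALSE

# Tolerant comparison ignores the round-off:
isTRUE(all.equal(tsp(x), tsp(y)))  # TRUE

# Or renormalize the attributes, as in the fix above, so both tsp
# vectors are computed from the same start/frequency arithmetic:
y2 <- ts(y, start = c(1950, 2), frequency = 12)
identical(tsp(x), tsp(y2))         # TRUE
```

tsp() is simply an accessor for the attribute inspected above with attributes(x)$tsp; all.equal's default tolerance (about 1.5e-8) is far coarser than the ~2e-13 discrepancies here, so it treats the two spans as equal.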