stl() decomposition won't accept univariate ts object?
I'm not 100% sure what the exact cause of the problem is, but you can fix it by passing dummyData$index to ts instead of the entire object:
tsData2 <- ts(
data=dummyData$index,
start = c(2012,1),
end = c(2014,12),
frequency = 12)
##
R> stl(tsData2, s.window="periodic")
Call:
stl(x = tsData2, s.window = "periodic")
Components
seasonal trend remainder
Jan 2012 -24.0219753 36.19189 9.8300831
Feb 2012 -20.2516062 37.82808 8.4235219
Mar 2012 -0.4812396 39.46428 -4.9830367
Apr 2012 -10.1034302 41.32047 1.7829612
May 2012 0.6077088 43.17666 -3.7843705
Jun 2012 4.4723800 45.22411 -10.6964877
Jul 2012 -7.6629462 47.27155 -0.6086074
Aug 2012 -1.0551286 49.50673 -3.4516016
Sep 2012 2.2193527 51.74191 -3.9612597
Oct 2012 7.3239448 55.27391 -4.5978509
Nov 2012 18.4285405 58.80591 -13.2344456
Dec 2012 30.5244146 63.70105 -16.2254684
...
I'm guessing that when you pass a data.frame to the data argument of ts, some extra attributes carry over. Although this generally doesn't seem to be an issue for many functions that take a ts object (univariate or otherwise), it apparently is for stl.
R> all.equal(tsData2,tsData)
[1] "Attributes: < Names: 1 string mismatch >"
[2] "Attributes: < Length mismatch: comparison on first 2 components >"
[3] "Attributes: < Component 2: Numeric: lengths (3, 2) differ >"
##
R> str(tsData2)
Time-Series [1:36] from 2012 to 2015: 22 26 34 33 40 39 39 45 50 58 ...
##
R> str(tsData)
'ts' int [1:36, 1] 22 26 34 33 40 39 39 45 50 58 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "index"
- attr(*, "tsp")= num [1:3] 2012 2015 12
Edit:
Looking into this a little further, I think the problem is the length-2 dim attribute (the data.frame becomes a one-column matrix) that is carried over from dummyData when it is passed as a whole. Note this excerpt from the body of stl:
if (is.matrix(x))
stop("only univariate series are allowed")
and from the documentation of is.matrix:
is.matrix returns TRUE if x is a vector and has a "dim" attribute of length 2, and FALSE otherwise
so although you are passing stl a univariate time series (the original tsData), as far as the function is concerned a vector with a length-2 dim attribute (i.e. a matrix) is not a univariate series. It seems a little strange to do error handling this way, but I'm sure the author of the function had a good reason for it.
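If you want to keep working from the full data frame rather than rebuild the series from dummyData$index, a minimal sketch of the same fix (dummy below is a hypothetical stand-in for the question's data) is to rebuild the series from a bare vector, which strips the dim/dimnames attributes:

```r
# A one-column data.frame passed whole to ts() keeps a length-2 "dim"
# attribute, so is.matrix() is TRUE and stl() rejects the series.
dummy <- data.frame(index = rnorm(36))  # hypothetical stand-in data
tsBad <- ts(dummy, start = c(2012, 1), frequency = 12)

# Rebuilding from the bare vector drops dim/dimnames:
tsGood <- ts(as.vector(tsBad), start = c(2012, 1), frequency = 12)

is.matrix(tsBad)   # TRUE  -> stl() stops: "only univariate series are allowed"
is.matrix(tsGood)  # FALSE -> stl() accepts it
fit <- stl(tsGood, s.window = "periodic")
```

as.vector() discards all attributes of the underlying matrix, so ts() can then attach a clean tsp attribute with no dim.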
Why am I getting this error message even after transforming my data set into a ts file for time series analysis?
The reason you got this error is that you fed a data set with two variables (date + pkgrev) into stl(), whose first argument only accepts a univariate time series.
To solve this, create a univariate ts object without the date variable. In your case, use mydata2$pkgrev (or mydata2[["pkgrev"]] — double brackets, so you extract a vector rather than a one-column data frame) instead of mydata2 in your code mydata2_ts <- ts(mydata2, start=c(2015,1), freq=12). The ts object already carries the temporal information, since you specified the start date and frequency in the call.
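Put together, the corrected call looks like this (mydata2 below is a hypothetical stand-in with the same shape as the question's data: a date column plus a pkgrev column):

```r
# Hypothetical stand-in for the question's data: 3 years of monthly values
mydata2 <- data.frame(
  date   = seq(as.Date("2015-01-01"), by = "month", length.out = 36),
  pkgrev = rnorm(36, mean = 100)
)

# Build the ts from the numeric column only; start/frequency supply the dates
mydata2_ts <- ts(mydata2$pkgrev, start = c(2015, 1), frequency = 12)

fit <- stl(mydata2_ts, s.window = "periodic")  # now a valid univariate series
```

Note that stl() also requires more than two full periods of data, so a monthly series needs at least 25 observations.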
If you would like a new data frame holding both the series and its corresponding dates, note that cbind() would coerce the dates to plain numbers; building it with data.frame() keeps the Date class (the monthly sequence below reconstructs the dates from the start and frequency you specified):
mydata3 <- data.frame(
  date  = seq(as.Date("2015-01-01"), by = "month", length.out = length(mydata2_ts)),
  value = as.numeric(mydata2_ts))
However, for the purpose of STL decomposition, the input to the first argument should be the ts object itself, i.e., mydata2_ts.
What is causing this error related to time series attributes?
You are observing floating-point round-off error, caused by the two series initially starting at different times before the longer one was windowed down to match the shorter. As a simpler example (with ranges matching yours), consider the following, which illustrates both mysteriously different time series attributes that display the same, and an easy fix:
x <- ts(rnorm(491), start = c(1950, 2), frequency = 12)
y <- ts(rnorm(528), start = c(1947, 1), frequency = 12)
y <- window(y, c(1950, 2), c(1990, 12))
print(attributes(x)$tsp) #prints 1950.083 1990.917 12.000
print(attributes(y)$tsp) #prints 1950.083 1990.917 12.000
#but:
print(attributes(x)$tsp == attributes(y)$tsp) #prints TRUE FALSE TRUE (!)
#the fix:
y <- ts(y,start=c(1950,2), frequency = 12)
print(attributes(x)$tsp == attributes(y)$tsp) #prints TRUE TRUE TRUE
There is some strangeness here that I don't understand. I would have thought that as.vector(time(x)) (the times at which the series is sampled) is essentially the same as seq(a, b, 1/c) (where attributes(x)$tsp is a b c), but when I compare the times of x with the sequence generated by seq I find strange discrepancies:
> v <- as.vector(time(x))
> w <- seq(attributes(x)$tsp[1],attributes(x)$tsp[2],1/attributes(x)$tsp[3])
> sum(v == w)
[1] 412
> max(abs(v-w))
[1] 2.273737e-13
> which(v != w)
[1] 255 258 261 264 267 270 273 276 279 282 285 288 291 294 297 300 303 306
[19] 309 312 315 318 321 324 327 330 333 336 339 342 345 348 351 354 357 360
[37] 363 366 369 372 375 378 381 384 387 390 393 396 399 402 405 408 411 414
[55] 417 420 423 426 429 432 435 438 441 444 447 450 453 456 459 462 465 468
[73] 471 474 477 480 483 486 489
The strangest thing about the above is the non-contiguous nature of the indices where the two vectors differ. The underlying problem is that 1/12 is not exactly representable as a floating-point number, so neither v nor w has the property that successive points differ by exactly 1/12. My conjecture is that time series objects adopt an error-reducing strategy which spreads the inevitable error over the time span. Since y and the original x were constructed with different starts, the way this error was spread out differed slightly, in a way that window did not fix. Given the non-contiguous nature of these micro-discrepancies, I suspect that code like yours, which windows a longer time series down to the span of a shorter one, will sometimes produce time series attributes that are exactly equal, and other times produce ones where one or two of the attributes differ by something like 2.273737e-13. This can lead to hard-to-track-down bugs where code works on test cases but mysteriously fails when the input changes. I am surprised that the documentation for window does not mention the danger.
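Given those micro-discrepancies, a practical defence (sketched below) is to compare tsp attributes with a tolerance via all.equal, or to renormalize the windowed series with ts() before any exact comparison:

```r
x <- ts(rnorm(491), start = c(1950, 2), frequency = 12)
y <- ts(rnorm(528), start = c(1947, 1), frequency = 12)
y <- window(y, c(1950, 2), c(1990, 12))

# Exact comparison can be thrown off by ~1e-13 round-off:
identical(tsp(x), tsp(y))          # may be FALSE

# Tolerant comparison ignores the round-off:
isTRUE(all.equal(tsp(x), tsp(y)))  # TRUE

# Or renormalize the attributes, as in the fix above, so both tsp
# vectors are computed from the same start/frequency arithmetic:
y2 <- ts(y, start = c(1950, 2), frequency = 12)
identical(tsp(x), tsp(y2))         # TRUE
```

tsp() is simply an accessor for the attribute inspected above with attributes(x)$tsp; all.equal's default tolerance (about 1.5e-8) is far coarser than the ~2e-13 discrepancies here, so it treats the two spans as equal.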