How to Force Seasonality from Auto.Arima

Is there a way to force seasonality from auto.arima

You can set the D parameter, which governs seasonal differencing, to a value greater than zero. (The default NA allows auto.arima() to use or not use seasonality.) For example:

> set.seed(1)
> foo <- ts(rnorm(60),frequency=12)
> auto.arima(foo)
Series: foo 
ARIMA(0,0,0) with zero mean     

sigma^2 estimated as 0.7307:  log likelihood=-75.72
AIC=153.45   AICc=153.52   BIC=155.54
> auto.arima(foo,D=1)
Series: foo 
ARIMA(0,0,0)(1,1,0)[12]                    

Coefficients:
         sar1
      -0.3902
s.e.   0.1478

sigma^2 estimated as 1.139:  log likelihood=-72.23
AIC=148.46   AICc=148.73   BIC=152.21

Seasonality in auto.arima() from forecast package

Convert my_data to a ts object and then try auto.arima. For instance, suppose your data started in Jan 2010 and went through Oct 2030.

> my_data <- c(232,294,320,314,336,189,331,185,161,140,49,7,0,3,4,9,38,169,275,316,366,422,328,283,213,238,220,193,250,308,224,190,188,99,41,17,19,9,1,3,10,108,149,189,168,170,155,101,119,89,142,169,192,242,152,141,105,76,39,20,17,13,5,3,8,54,102,102,155,159,164,200,183,144,204,190,219,158,128,142,130,86,58,13,12,0,6,4,20,302,297,312,345,293,233,275,233,199,279,250,208,161,200,181,133,140,17,14,2,0,2,4,36,183,379,371,356,425,320,282,172,214,226,250,196,239,183,194,135,75,28,11,2,3,5,4,29,212,316,343,375,431,225,248,209,258,262,230,218,162,193,178,126,131,37,7,5,3,0,1,20,149,258,408,316,307,352,247,285,236,254,321,233,175,264,114,104,82,37,49,4,16,2,14,22,169,259,355,379,346,261,256,220,238,227,201,242,185,121,160,114,91,33,9,4,2,0,2,22,62,114,156,190,186,140,155,141,135,140,137,179,128,156,124,98,66,63,32,27,0,21,5,4,39,73,162,175,207,183,121,174,107,160,177,258,170,152,165,117,59,35,69,7,0,3,3,28,98,165,194,200,190,162,160,170,200,189,187,141,224,152,115,111,47,20,15,2,0,0,29,10,59,170,212,164,201,193,182,277,283,376,310,194,247,177,164,140,192,95,49,10,10,2,5,38,52,156,331,480,378,231,172,132,199,245,267,192,223,182,168,152,81,20,14,13,6,14,16,6,21,51,113,94,103,113,93,205,98,118,97,138,112,98,99,79,74,71,38,31,30,31,38,41,48,131,159,212,134,150,145,149,105,142,149,122,137,193,105,68,75,35,33,41,38,33,29,44,54,85,109,118,117,113,107,112,92,112,98,111,81,120,113,66,55,10,20,26,25,3,10,15,30,60,91,97,67,100,99,75,92,98,126,116,103,110,87,124,66,55,30,31,28,28,31,29,49,109,144,152,116,106,88,164,127,121,161,186,104,81,79,103,69,47,35,35,30,28,34,42,56,114,110,149,153,112,151,138,151,141,139,206,225,166,173,185,384,221,100,61,51,35,44,38,83,87,182,205,243,191,144,106,112,167,234,147,136,152,107,156,53)
> myts <- ts(my_data, start=c(2010, 1), end=c(2030, 10), frequency=24)
> auto.arima(myts, seasonal = T)
Series: myts 
ARIMA(1,1,0)(1,0,0)[24]                    

Coefficients:
          ar1   sar1
      -0.1427  0.373
s.e.   0.0532  0.052

sigma^2 estimated as 2502:  log likelihood=-2607.86
AIC=5221.73   AICc=5221.78   BIC=5234.31

Be careful to NOT set approximation and stepwise to False if you want to save time.

In R, auto.arima fails to capture seasonality

Just reporting a good well explained - as usual - blogpost by Rob Hyndman.

https://robjhyndman.com/hyndsight/dailydata/

The relevant part to your question says (blockquoting exactly the page):

When the time series is long enough to take in more than a year, then
it may be necessary to allow for annual seasonality as well as weekly
seasonality. In that case, a multiple seasonal model such as TBATS is
required.

y <- msts(x, seasonal.periods=c(7,365.25))
fit <- tbats(y)
fc <- forecast(fit)
plot(fc)

This should capture the weekly pattern as well as the longer annual
pattern. The period 365.25 is the average length of a year allowing
for leap years. In some countries, alternative or additional year
lengths may be necessary.

I think it does exactly what you want.

I also tried to simply create the time series with msts

y <- msts(x[1:1800], seasonal.periods=c(7,365.25))

(I cut the time series in half to be quicker)

and then run auto.arima() directly on it, forcing a seasonal component with D=1

fc = auto.arima(y,D=1,trace=T,stepwise = F)

it will take a while.. because I set stepwise = FALSE (if you want it to look at all combinations without shortcuts you can set approximation=FALSE as well)

Series: y 
ARIMA(1,0,3)(0,1,0)[365] 

Coefficients:
         ar1      ma1      ma2      ma3
      0.9036  -0.3647  -0.3278  -0.0733
s.e.  0.0500   0.0571   0.0405   0.0310

sigma^2 estimated as 12.63:  log likelihood=-3854.1
AIC=7718.19   AICc=7718.23   BIC=7744.54

and then the forecast

for_fc = forecast(fc)
plot(for_fc)

I am adding a figure with the complete time series (red) on top of the output of
plot(for_fc)
and it seems to work decently - but it was just a quick test.

Sample Image

Why does auto.arima drop my seasonality component when stepwise=FALSE and approximation=FALSE?

auto.arima will return the best model (according to AICc) that it can find given the constraints provided. When stepwise=FALSE, it looks through more models than using the default stepwise procedure. However, the number of models are restricted depending on the number of parameters allowed. In the default settings, no more than five total parameters are allowed (to save time looking at too many models). In your case, it found a non-seasonal model -- the ARIMA(4,0,1) -- that has the minimum AICc amongst all models (seasonal or otherwise) with 5 or fewer parameters. This model mimics the seasonality quite well using cyclic behaviour. Have a look at the forecasts and you will see that the results look seasonal even if they are not.

If you want to find a better truly seasonal model, just increase the number of parameters allowed by setting max.order=10 for example. There is not too much gained using approximation=FALSE. What that does is force it to evaluate the likelihood more accurately for each model, but the approximation is quite good and much faster, so is usually acceptable.

auto_arima(... , seasonal=False) but got SARIMAX 😐

It's not really using a seasonal model. It's just a confusing message.

In the pmdarima library, in version v1.5.1 they changed the statistical model in use from ARIMA to a more flexible and less buggy model called SARIMAX. (It stands for Seasonal Autoregressive Integrated Moving Average Exogenous.)

Despite the name, you can use it in a non-seasonal way by setting the seasonal terms to zero.

You can double-check whether the model is seasonal or not by using the following code:

model = auto_arima(...)
print(model.seasonal_order)

If it shows as (0, 0, 0, 0), then no seasonality adjustment will be done.

ARIMA model producing a straight line prediction

Having gone through similar issue recently, I would recommend the following:

Visualize seasonal decomposition of the data to make sure that the seasonality exists in your data. Please make sure that the dataframe has frequency component in it. You can enforce frequency in pandas dataframe with the following :
dh = df.asfreq('W') #for weekly resampled data and fillnas with appropriate method

Here is a sample code to do seasonal decomposition:

import statsmodels.api as sm

decomposition = sm.tsa.seasonal_decompose(dh['value'], model='additive', 
                            extrapolate_trend='freq') #additive or multiplicative is data specific
fig = decomposition.plot()
plt.show()

The plot will show whether seasonality exists in your data. Please feel free to go through this amazing document regarding seasonal decomposition. Decomposition

If you're sure that the seasonal component of the model is 30, then you should be able to get a good result with pmdarima package. The package is extremely effective in finding optimal pdq values for your model. Here is the link to it: pmdarima
example code pmdarima

If you're unsure about seasonality, please consult with a domain expert about the seasonal effects of your data or try experimenting with different seasonal components in your model and estimate the error.

Please make sure that the stationarity of data is checked by Dickey-Fuller test before training the model. pmdarima supports finding d component with the following:

from pmdarima.arima import ndiffs
kpss_diff = ndiffs(dh['value'].values, alpha=0.05, test='kpss', max_d=12)
adf_diff = ndiffs(dh['value'].values, alpha=0.05, test='adf', max_d=12)
n_diffs = max(adf_diff , kpss_diff )

You may also find d with the help of the document I provided here. If the answer isn't helpful, please provide the data source for exchange rate. I will try to explain the process flow with a sample code.