Which model performs better?
This is a follow-up to my prior article: Time Series Analysis with Prophet: Air Passenger Data
In this example, an ARIMA (Autoregressive Integrated Moving Average) model is built using R to forecast air passenger numbers using the San Francisco International Airport Report on Monthly Passenger Traffic Statistics by Airline.
Specifically, adjusted passenger numbers for the airline KLM (enplaned) are filtered as the time series for analysis from the period May 2005 to March 2016.
The results will be compared to those obtained by Prophet to see how ARIMA performs relative to this model.
The model is trained on 90% of the time series (training set), with the latter 10% of the series being used to compare to the predictions (test set).
An ARIMA model consists of coordinates (p, d, q):
- p stands for the number of autoregressive terms, i.e. the number of observations from past time values used to forecast future values. e.g. if the value of p is 2, then this means that two previous time observations in the series are being used to forecast the future trend.
- d denotes the number of differences needed to make the time series stationary (i.e. one with a constant mean, variance, and autocorrelation). For instance, if d = 1, then it means that a first-difference of the series must be obtained to transform it into a stationary one.
- q represents the moving average of the previous forecast errors in our model, or the lagged values of the error term. As an example, if q has a value of 1, then this means that we have one lagged value of the error term in the model.
When it comes to forecasting, a key assumption for time series data is that of stationarity. This is where the mean, variance, and autocorrelation of a series are constant.
The reason for this assumption is that if the mean, variance and autocorrelation of a series are not constant over time — then it makes it a lot harder to use forecasting models to make predictions.
However, as we have noted above, the d parameter in the ARIMA model denotes the number of differences needed to make the series stationary. When using the auto.arima feature in R, the p, d, and q values are selected automatically.
Stationarity can usually be indicated visually through a visual inspection of the data.
Let’s have a look at the time series again.
We can see that — on the whole — the time series demonstrates a zig-zag pattern with little to no change in the overall trend over time.
However, this can formally be tested for using the Dickey-Fuller test. The null hypothesis is that the series has a unit root present (non-stationarity), with the alternative hypothesis being stationarity or trend stationarity.
The test for a unit root is as follows:
When running the test on the series, a p-value of 0.01 is obtained, meaning that the null hypothesis is rejected at the 5% level of significance — indicating stationarity is present in the series.
> adf.test(passengernumbers)Augmented Dickey-Fuller Testdata: passengernumbers
Dickey-Fuller = -8.061, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
The p parameter in ARIMA accounts for the number of autoregressive terms in the model — which allows the model to handle autocorrelation. Autocorrelation is a situation where correlation exists between the error terms across different observations in the time series, and this can also skew forecasts.
Here is the ACF (autocorrelation function) plot for the time series (maximum lags of 20 are set):
We see that autocorrelations persist for two months after lag 0, with positive and negative correlations persisting at the 6-month and 12-month period. In this instance, the lags are monthly — hence the 6-month period is indicated as 0.5, while the 12-month period is indicated as 1.0.
The partial autocorrelation function seeks to remove indirect correlations that result from inherent linear functions that exist between each observation.
Here is the partial autocorrelation plot for the series:
We see that the partial autocorrelation function still shows correlations (albeit negative) persisting two months after lag 0. Therefore, the ARIMA model will need to be configured in such a way so as to account for the correct number of autoregressive terms.
Here is a general overview of the components for the time series trend:
> components <- decompose(passengernumbers)
When looking at the series, significant seasonality patterns are observed roughly every six to eight months. With that in mind, the ARIMA model will need to account for such seasonality in its forecasts.
Given the visual indication of autocorrelation as well as seasonality, auto.arima is used to automatically generate many ARIMA models with differing p, d, and q coordinates — the best model is then selected based on the one with the lowest BIC (Bayesian Information Criterion).
> # ARIMA
> fitpassengers<-auto.arima(passengernumbers, trace=TRUE, test="kpss", ic="bic")ARIMA(2,0,2)(1,1,1) with drift : 1693.37
ARIMA(0,0,0)(0,1,0) with drift : 1779.685
ARIMA(1,0,0)(1,1,0) with drift : 1682.899
ARIMA(0,0,1)(0,1,1) with drift : 1724.722
ARIMA(0,0,0)(0,1,0) : 1775.932
ARIMA(1,0,0)(0,1,0) with drift : 1702.397
ARIMA(1,0,0)(2,1,0) with drift : 1685.158
ARIMA(1,0,0)(1,1,1) with drift : 1684.594
ARIMA(1,0,0)(0,1,1) with drift : 1680.905
ARIMA(1,0,0)(0,1,2) with drift : 1684.211
ARIMA(1,0,0)(1,1,2) with drift : Inf
ARIMA(0,0,0)(0,1,1) with drift : 1772.299
ARIMA(2,0,0)(0,1,1) with drift : 1684.965
ARIMA(1,0,1)(0,1,1) with drift : 1685.149
ARIMA(2,0,1)(0,1,1) with drift : 1688.428
ARIMA(1,0,0)(0,1,1) : 1676.443
ARIMA(1,0,0)(0,1,0) : 1697.822
ARIMA(1,0,0)(1,1,1) : 1680.163
ARIMA(1,0,0)(0,1,2) : 1679.795
ARIMA(1,0,0)(1,1,0) : 1678.467
ARIMA(1,0,0)(1,1,2) : Inf
ARIMA(0,0,0)(0,1,1) : 1768.607
ARIMA(2,0,0)(0,1,1) : 1680.489
ARIMA(1,0,1)(0,1,1) : 1680.677
ARIMA(0,0,1)(0,1,1) : 1720.732
ARIMA(2,0,1)(0,1,1) : 1683.954Best model: ARIMA(1,0,0)(0,1,1)
The best model configuration is indicated as ARIMA(1,0,0)(0,1,1).
The coefficients for this model are as follows:
s.e. 0.0609 0.0840sigma^2 estimated as 585834: log likelihood=-831.27
AIC=1668.54 AICc=1668.78 BIC=1676.44
The ARIMA model generates the following forecasts:
The RMSE (root mean squared error) and MFE (mean forecast error) are as follows:
- RMSE: 698
- MFE: -115
> # Errors
> rmse(df$`Actual Passenger Numbers`, df$`Forecasted Passenger Numbers`) 698.6527
> ## Forecast Error -115.7381
> forecast_error=(df$`Actual Passenger Numbers`-df$`Forecasted Passenger Numbers`)
 -415.8149 -982.0972 -273.1147 -539.5008 -520.7138 -845.2062 -824.7485
 -212.4401 -454.4602 1387.1299 962.7507 153.2876 257.2238 687.3703
One key difference between ARIMA and Prophet is that the Prophet model accounts for “change points”, or specific shifts in trend in the time series. While it is technically possible to do this with ARIMA in R — it requires use of a separate package called AEDForecasting.
Prophet works through use of an additive model whereby the non-linear trends in the series are fitted with the appropriate seasonality (whether daily, weekly, or yearly).
In this particular dataset, a shift in trend came after 2009 — when we saw a reversal of growth in the overall trend for passenger demand:
A standard Prophet model was fitted to the dataset, without any manual specifications for seasonality:
prophet_basic = Prophet()
Through trial and error, 4 change points were shown to minimise the RMSE and MFE and therefore were defined as the appropriate number of change points in the model:
forecast = pro_change.fit(train_dataset).predict(future)
a = add_changepoints_to_plot(fig.gca(), pro_change, forecast)
The RMSE and MAE are calculated as follows:
>>> from sklearn.metrics import mean_squared_error
>>> from math import sqrt
>>> mse = mean_squared_error(passenger_test, yhat14)
>>> rmse = sqrt(mse)
>>> print('RMSE: %f' % rmse)RMSE: 524.263928>>> forecast_error = (passenger_test-yhat14)
>>> mean_forecast_error = np.mean(forecast_error)
With an RMSE and MFE of 524 and 71 obtained with Prophet respectively, the errors are significantly lower than the errors of 698 and -115 obtained with ARIMA.
From this standpoint, while ARIMA has accounted for patterns such as seasonality and autocorrelation in the time series — Prophet has excelled at correctly identifying the relevant change point in the series — and this appears to have boosted the forecast accuracy for the model.
It may be possible that ARIMA would excel when forecasting data with a more clearly defined trend or one that does not particularly exhibit seasonality — we saw very clear signs of this for the dataset in question. However, other cases where seasonality is not present may not be suitable for use with Prophet.
In this example, we have seen:
- How to configure an ARIMA model
- The relevance of change points in a time series
- Comparison of ARIMA and Prophet in calculating forecast accuracy
Many thanks for reading, and questions or feedback are always welcome.
Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.