## Which model performs better?

This is a follow-up to my prior article: *Time Series Analysis with Prophet: Air Passenger Data*.

In this example, an ARIMA (Autoregressive Integrated Moving Average) model is built using R to forecast air passenger numbers using the **San Francisco International Airport Report on Monthly Passenger Traffic Statistics by Airline.**

Specifically, adjusted passenger numbers for the airline **KLM** (enplaned) are filtered as the time series for analysis from the period **May 2005 to March 2016**.

The results will be compared to those obtained by Prophet to see how ARIMA performs relative to this model.

The model is trained on 90% of the time series (training set), with the latter 10% of the series being used to compare to the predictions (test set).

An ARIMA model is specified by three parameters (p, d, q):

- **p** is the number of autoregressive terms, i.e. the number of past observations used to forecast future values. For example, if p = 2, then the two previous observations in the series are used to forecast the future trend.
- **d** denotes the number of differences needed to make the time series stationary (i.e. one with a constant mean, variance, and autocorrelation). For instance, if d = 1, then a first difference of the series must be taken to transform it into a stationary one.
- **q** represents the moving average order, i.e. the number of lagged forecast errors in the model. As an example, if q = 1, then the model includes one lagged value of the error term.
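To make the *d* parameter concrete, here is a minimal pure-Python sketch of differencing. The passenger figures below are illustrative only, not the KLM data:

```python
def difference(series, d=1):
    """Apply d rounds of first differencing to a list of values."""
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series

# Illustrative upward-trending series (not the actual KLM figures)
passengers = [100, 120, 145, 175, 210]

# One round of differencing removes the level; two remove a quadratic trend
print(difference(passengers, d=1))  # [20, 25, 30, 35]
print(difference(passengers, d=2))  # [5, 5, 5]
```

After one difference the steadily growing values become a series of changes, which is closer to having a constant mean.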

## Stationarity

When it comes to forecasting, a key assumption for time series data is that of stationarity. This is where the mean, variance, and autocorrelation of a series are constant.

The reason for this assumption is that if the mean, variance and autocorrelation of a series are not constant over time — then it makes it a lot harder to use forecasting models to make predictions.

However, as we have noted above, the *d* parameter in the ARIMA model denotes the number of differences needed to make the series stationary. When using the **auto.arima** function in R, the *p*, *d*, and *q* values are selected automatically.

Stationarity can usually be assessed through a visual inspection of the data.

Let’s have a look at the time series again.

We can see that — on the whole — the time series demonstrates a zig-zag pattern with little to no change in the overall trend over time.

However, this can be formally tested using the Augmented Dickey-Fuller (ADF) test. The null hypothesis is that the series has a unit root present (non-stationarity), with the alternative hypothesis being stationarity or trend stationarity.

Running the test on the series yields the following:

```
> adf.test(passengernumbers)

	Augmented Dickey-Fuller Test

data:  passengernumbers
Dickey-Fuller = -8.061, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
```

A p-value of 0.01 is obtained, meaning that the null hypothesis is rejected at the 5% level of significance, indicating that the series is stationary.
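For intuition, the core Dickey-Fuller regression (the augmented version used by `adf.test` adds lagged difference terms) can be sketched in pure Python: regress the first difference on the lagged level and inspect the t-statistic on the slope. This is a simplified illustration on simulated data, not a replacement for R's `adf.test`:

```python
import random

def dickey_fuller_stat(y):
    """t-statistic for beta in: diff(y)_t = alpha + beta * y_{t-1} + e_t.
    Strongly negative values point towards stationarity (no unit root)."""
    x = y[:-1]                              # lagged level y_{t-1}
    dy = [b - a for a, b in zip(y, y[1:])]  # first differences
    n = len(x)
    mx, my = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (w - my) for v, w in zip(x, dy))
    beta = sxy / sxx
    alpha = my - beta * mx
    resid = [w - (alpha + beta * v) for v, w in zip(x, dy)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance
    se_beta = (s2 / sxx) ** 0.5
    return beta / se_beta

random.seed(42)
# Simulated strongly mean-reverting AR(1): expect a large negative statistic
y, val = [], 0.0
for _ in range(200):
    val = 0.2 * val + random.gauss(0, 1)
    y.append(val)
print(dickey_fuller_stat(y))  # well below typical critical values (~ -2.9)
```

A stationary series pulls back towards its mean, so the slope on the lagged level is strongly negative, which is exactly what the large negative statistic reflects.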

## Autocorrelation

The **p** parameter in ARIMA accounts for the number of autoregressive terms in the model — which allows the model to handle autocorrelation. Autocorrelation is a situation where correlation exists between the error terms across different observations in the time series, and this can also skew forecasts.

Here is the ACF (autocorrelation function) plot for the time series (maximum lags of 20 are set):

We see that autocorrelations persist for two months after lag 0, with positive and negative correlations persisting at the 6-month and 12-month period. In this instance, the lags are monthly — hence the 6-month period is indicated as **0.5**, while the 12-month period is indicated as **1.0**.

The partial autocorrelation function seeks to remove the indirect correlations that arise because intermediate observations in the series are themselves correlated with one another.

Here is the partial autocorrelation plot for the series:

We see that the partial autocorrelation function still shows correlations (albeit negative) persisting two months after lag 0. Therefore, the ARIMA model will need to be configured in such a way so as to account for the correct number of autoregressive terms.
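The values behind an ACF plot are straightforward to compute directly. A minimal pure-Python sketch, using a toy 12-month cycle as a stand-in for the real data to show the seasonal signature described above:

```python
import math

def acf(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom

# Toy monthly series with a 12-month cycle (stand-in for the real data)
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]

print(acf(series, 0))   # 1.0 by definition
print(acf(series, 6))   # strongly negative at the half-year lag
print(acf(series, 12))  # strongly positive at the one-year lag
```

The sign flip between the 6-month and 12-month lags is the same pattern the article notes at the 0.5 and 1.0 marks on the ACF plot.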

Here is a general overview of the decomposed components of the time series:

```
> components <- decompose(passengernumbers)
> components
```

When looking at the series, significant seasonality patterns are observed roughly every six to eight months. With that in mind, the ARIMA model will need to account for such seasonality in its forecasts.
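R's `decompose()` performs a classical additive decomposition: trend via a centred 12-month moving average, and seasonal indices via month-wise averages of the detrended series. A simplified pure-Python sketch of the same idea, verified on synthetic data with a known seasonal pattern:

```python
def decompose_additive(y, period=12):
    """Classical additive decomposition for monthly data (simplified)."""
    n, half = len(y), period // 2
    # Centred moving average: half-weights on the two endpoint months
    trend = [None] * n
    for t in range(half, n - half):
        window = (0.5 * y[t - half] + sum(y[t - half + 1:t + half])
                  + 0.5 * y[t + half])
        trend[t] = window / period
    # Month-wise means of the detrended series give the seasonal indices
    detrended = [y[t] - trend[t] for t in range(n) if trend[t] is not None]
    months = [t % period for t in range(n) if trend[t] is not None]
    seasonal = []
    for m in range(period):
        vals = [d for d, mm in zip(detrended, months) if mm == m]
        seasonal.append(sum(vals) / len(vals))
    # Centre the seasonal indices so they sum to zero
    offset = sum(seasonal) / period
    return trend, [s - offset for s in seasonal]

# Synthetic check: linear trend plus a known zero-mean seasonal pattern
pattern = [3, 1, -2, -4, -1, 0, 2, 4, 1, -3, -2, 1]
y = [0.5 * t + pattern[t % 12] for t in range(120)]
trend, seasonal = decompose_additive(y)
print([round(s, 6) for s in seasonal])  # recovers the pattern above
```

Because the centred moving average passes linear trends through exactly and cancels any zero-mean pattern of period 12, the seasonal indices come back exactly in this synthetic case.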

Given the visual indication of autocorrelation as well as seasonality, **auto.arima** is used to automatically generate many ARIMA models with differing p, d, and q values; the best model is then selected as the one with the lowest BIC (Bayesian Information Criterion).

```
> # ARIMA
> fitpassengers <- auto.arima(passengernumbers, trace=TRUE, test="kpss", ic="bic")

 ARIMA(2,0,2)(1,1,1)[12] with drift : 1693.37
 ARIMA(0,0,0)(0,1,0)[12] with drift : 1779.685
 ARIMA(1,0,0)(1,1,0)[12] with drift : 1682.899
 ARIMA(0,0,1)(0,1,1)[12] with drift : 1724.722
 ARIMA(0,0,0)(0,1,0)[12]            : 1775.932
 ARIMA(1,0,0)(0,1,0)[12] with drift : 1702.397
 ARIMA(1,0,0)(2,1,0)[12] with drift : 1685.158
 ARIMA(1,0,0)(1,1,1)[12] with drift : 1684.594
 ARIMA(1,0,0)(0,1,1)[12] with drift : 1680.905
 ARIMA(1,0,0)(0,1,2)[12] with drift : 1684.211
 ARIMA(1,0,0)(1,1,2)[12] with drift : Inf
 ARIMA(0,0,0)(0,1,1)[12] with drift : 1772.299
 ARIMA(2,0,0)(0,1,1)[12] with drift : 1684.965
 ARIMA(1,0,1)(0,1,1)[12] with drift : 1685.149
 ARIMA(2,0,1)(0,1,1)[12] with drift : 1688.428
 ARIMA(1,0,0)(0,1,1)[12]            : 1676.443
 ARIMA(1,0,0)(0,1,0)[12]            : 1697.822
 ARIMA(1,0,0)(1,1,1)[12]            : 1680.163
 ARIMA(1,0,0)(0,1,2)[12]            : 1679.795
 ARIMA(1,0,0)(1,1,0)[12]            : 1678.467
 ARIMA(1,0,0)(1,1,2)[12]            : Inf
 ARIMA(0,0,0)(0,1,1)[12]            : 1768.607
 ARIMA(2,0,0)(0,1,1)[12]            : 1680.489
 ARIMA(1,0,1)(0,1,1)[12]            : 1680.677
 ARIMA(0,0,1)(0,1,1)[12]            : 1720.732
 ARIMA(2,0,1)(0,1,1)[12]            : 1683.954

 Best model: ARIMA(1,0,0)(0,1,1)[12]
```

The best model configuration is indicated as **ARIMA(1,0,0)(0,1,1)[12]**.

The coefficients for this model are as follows:

```
> fitpassengers
Series: passengernumbers
ARIMA(1,0,0)(0,1,1)[12]

Coefficients:
         ar1     sma1
      0.7794  -0.5001
s.e.  0.0609   0.0840

sigma^2 estimated as 585834:  log likelihood=-831.27
AIC=1668.54   AICc=1668.78   BIC=1676.44
```

The ARIMA model generates the following forecasts:

The RMSE (root mean squared error) and MFE (mean forecast error) are as follows:

- **RMSE:** 698
- **MFE:** -115

```
> # Errors
> library(Metrics)
> rmse(df$`Actual Passenger Numbers`, df$`Forecasted Passenger Numbers`)
[1] 698.6527

> # Forecast Error
> forecast_error = (df$`Actual Passenger Numbers` - df$`Forecasted Passenger Numbers`)
> forecast_error
 [1] -415.8149 -982.0972 -273.1147 -539.5008 -520.7138 -845.2062 -824.7485
 [8] -212.4401 -454.4602 1387.1299  962.7507  153.2876  257.2238  687.3703
> mean(forecast_error)
[1] -115.7381
```
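Both metrics are simple to reproduce from the forecast errors listed above; a quick pure-Python check:

```python
from math import sqrt

# Forecast errors (actual - forecast) from the ARIMA test-set comparison above
errors = [-415.8149, -982.0972, -273.1147, -539.5008, -520.7138,
          -845.2062, -824.7485, -212.4401, -454.4602, 1387.1299,
          962.7507, 153.2876, 257.2238, 687.3703]

# MFE: plain average of the errors (the sign shows the direction of bias)
mfe = sum(errors) / len(errors)

# RMSE: square root of the mean of the squared errors
rmse = sqrt(sum(e * e for e in errors) / len(errors))

print(mfe, rmse)  # compare with the R output above
```

The negative MFE indicates the ARIMA forecasts overshoot actual passenger numbers on average across the test period.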

One key difference between ARIMA and Prophet is that Prophet accounts for “change points”, or specific shifts in trend in the time series. While this is technically possible with ARIMA in R, it requires a separate package called **AEDForecasting**.

Prophet works through use of an additive model whereby the non-linear trends in the series are fitted with the appropriate seasonality (whether daily, weekly, or yearly).

In this particular dataset, a shift in trend came after 2009 — when we saw a reversal of growth in the overall trend for passenger demand:

A standard Prophet model was fitted to the dataset, without any manual specifications for seasonality:

```python
prophet_basic = Prophet()
prophet_basic.fit(train_dataset)
```

Through trial and error, **4** change points were shown to minimise the RMSE and MFE and therefore were defined as the appropriate number of change points in the model:

```python
pro_change = Prophet(n_changepoints=4)
forecast = pro_change.fit(train_dataset).predict(future)
fig = pro_change.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), pro_change, forecast)
```
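The trial-and-error search over change points can be organised as a small grid search. The sketch below is a skeleton only: `fit_and_score` is a hypothetical stand-in for fitting a Prophet model with the given `n_changepoints` and returning its test-set RMSE, and the stub's scores are illustrative placeholders (only the value for 4 change points comes from the actual run reported below):

```python
def select_n_changepoints(candidates, fit_and_score):
    """Return the candidate with the lowest test-set RMSE."""
    scores = {n: fit_and_score(n) for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores[best]

# Hypothetical stand-in for: Prophet(n_changepoints=n).fit(train).predict(test)
# -> RMSE. Only the score for n=4 reflects the actual run; the rest are
# illustrative placeholders.
def fit_and_score(n):
    illustrative_scores = {2: 612.4, 3: 575.9, 4: 524.26, 5: 559.1, 6: 588.7}
    return illustrative_scores[n]

best_n, best_rmse = select_n_changepoints([2, 3, 4, 5, 6], fit_and_score)
print(best_n, best_rmse)  # 4 524.26
```

In practice each candidate would require a full fit-and-predict cycle, so the candidate list is best kept small.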

The RMSE and MFE are calculated as follows:

```
>>> import numpy as np
>>> from sklearn.metrics import mean_squared_error
>>> from math import sqrt
>>> mse = mean_squared_error(passenger_test, yhat14)
>>> rmse = sqrt(mse)
>>> print('RMSE: %f' % rmse)
RMSE: 524.263928
>>> forecast_error = (passenger_test - yhat14)
>>> mean_forecast_error = np.mean(forecast_error)
>>> mean_forecast_error
71.58326743881493
```

With an RMSE and MFE of **524** and **71** obtained with Prophet respectively, the errors are significantly lower than the errors of **698** and **-115** obtained with ARIMA.

From this standpoint, while ARIMA has accounted for patterns such as seasonality and autocorrelation in the time series — Prophet has excelled at correctly identifying the relevant change point in the series — and this appears to have boosted the forecast accuracy for the model.

It may be that ARIMA would excel when forecasting a series with a more clearly defined trend, or one that does not particularly exhibit seasonality. The dataset in question showed strong seasonality and a distinct shift in trend, which played to Prophet's strengths; conversely, series without such seasonality may be less well suited to Prophet.

In this example, we have seen:

- How to configure an ARIMA model
- The relevance of change points in a time series
- Comparison of ARIMA and Prophet in calculating forecast accuracy

Many thanks for reading, and questions or feedback are always welcome.

*Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.*