Time Series Forecasting using ARIMA

September 21, 2024 by Mostafa Abdelrahman

ARIMA is a well-known model for time series forecasting. In this post we will go over the building blocks of this powerful model: AR(p) models, MA(q) models and ARMA(p, q) models.

An ARIMA(p, d, q) model is a combination of:

  • I(d): integration (reverse of differencing) component
  • AR(p): auto regressive component
  • MA(q): moving average component

Integration: I(d)

The moving average (MA) and auto regressive (AR) models require the time series to be stationary. A time series is tested for stationarity using the Augmented Dickey–Fuller (ADF) test, and a non-stationary series is made stationary by differencing. We are interested in the order of integration $d$, which is the minimum number of times a series must be differenced to become stationary. The differenced time series $y^{(d)}$ is then used in subsequent models.
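To make differencing concrete, here is a minimal sketch in plain Python. The helper name `difference` is hypothetical and not part of any library. It applies $y'_t = y_t - y_{t-1}$ a total of $d$ times. The example uses a purely quadratic trend, which needs two rounds of differencing before it becomes constant.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'_t = y_t - y_{t-1}.

    Hypothetical helper for illustration; each pass shortens the series by one.
    """
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


quadratic = [t * t for t in range(1, 7)]  # 1, 4, 9, 16, 25, 36
print(difference(quadratic, d=1))  # [3, 5, 7, 9, 11]  (still trending)
print(difference(quadratic, d=2))  # [2, 2, 2, 2]      (constant)
```

On real data we would not judge stationarity by eye; the ADF test mentioned above is the usual check after each round of differencing.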

Auto Regressive: AR(p)

The auto regressive (AR) model is a regression of a variable against its own past, i.e., the current value is a linear combination of the past values. The current value is represented as:

$$ y_t = C + \epsilon_t + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} $$

Where:

  • $C$ is a constant term,
  • $\epsilon_t$ is the error term at the current time $t$,
  • $\phi_i$ is the coefficient of the series value at lag $i$, i.e. of $y_{t-i}$,
  • $p$ is the order of the AR model.
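As a rough illustration of the AR(p) equation, the sketch below computes a single value $y_t$ from the last $p$ observations. The function name `ar_step` and the example coefficients are made up for this post; in practice the coefficients are estimated from data.

```python
def ar_step(history, phi, c=0.0, eps=0.0):
    """Compute y_t = c + eps + phi_1*y_{t-1} + ... + phi_p*y_{t-p}.

    history: past values ordered oldest to newest.
    phi:     AR coefficients [phi_1, ..., phi_p] (illustrative values).
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p past values")
    lagged = history[::-1][:p]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return c + eps + sum(coef * y for coef, y in zip(phi, lagged))


# AR(2) with C = 1, phi_1 = 0.5, phi_2 = 0.25 and no noise:
# 1 + 0.5 * 12 + 0.25 * 10 = 9.5
y_next = ar_step([10.0, 12.0], phi=[0.5, 0.25], c=1.0)
print(y_next)  # 9.5
```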

Moving Average: MA(q)

The moving average (MA) model states that the current value is a linear combination of the current and past error terms. The current value is represented as:

$$ y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} $$

Where:

  • $\mu$ is the series average,
  • $\epsilon_i$ are the error terms at time $i$,
  • $\theta_i$ is the coefficient of the error term at lag $i$, i.e. of $\epsilon_{t-i}$,
  • $q$ is the order of the MA model.
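The MA(q) equation can be sketched the same way. Here the inputs are past error terms rather than past series values. The name `ma_step` and the numbers are again only illustrative.

```python
def ma_step(past_errors, theta, mu=0.0, eps=0.0):
    """Compute y_t = mu + eps + theta_1*e_{t-1} + ... + theta_q*e_{t-q}.

    past_errors: past error terms ordered oldest to newest.
    theta:       MA coefficients [theta_1, ..., theta_q] (illustrative values).
    """
    q = len(theta)
    if len(past_errors) < q:
        raise ValueError("need at least q past error terms")
    lagged = past_errors[::-1][:q]  # e_{t-1}, e_{t-2}, ..., e_{t-q}
    return mu + eps + sum(coef * e for coef, e in zip(theta, lagged))


# MA(2) with mu = 10, theta_1 = 0.5, theta_2 = 0.25, current error 2:
# 10 + 2 + 0.5 * (-1.0) + 0.25 * 0.5 = 11.625
y_t = ma_step([0.5, -1.0], theta=[0.5, 0.25], mu=10.0, eps=2.0)
print(y_t)  # 11.625
```

One practical difference from the AR case: the error terms are not observed directly, so a fitted model has to estimate them from its own past residuals.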

Auto Regressive Moving Average: ARMA(p, q)

The auto regressive moving average (ARMA) model is a combination of the moving average (MA) model and the auto regressive (AR) model.

$$ y_t = C + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \mu + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t $$

Where:

  • $C$ is a constant term,
  • $\mu$ is the series average,
  • $\epsilon_i$ are the error terms at time $i$,
  • $q$ is the order of the MA model.
  • $\theta_i$ is the coefficient of the error term at lag $i$, i.e. of $\epsilon_{t-i}$,
  • $p$ is the order of the AR model,
  • $\phi_i$ is the coefficient of the series value at lag $i$, i.e. of $y_{t-i}$.
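To see the two parts working together, here is a small simulation sketch of an ARMA(p, q) process using only the standard library. A few assumptions to flag: `simulate_arma` is a hypothetical helper, the errors are drawn as Gaussian white noise, the two constants $C$ and $\mu$ are folded into a single `const` (only their sum affects the recursion), and lags that fall before the start of the series are simply skipped.

```python
import random


def simulate_arma(n, phi, theta, const=0.0, sigma=1.0, seed=0):
    """Simulate n steps of an ARMA(p, q) process (illustrative sketch).

    const stands in for C + mu; errors are Gaussian with std sigma.
    Returns the simulated series and the error terms that produced it.
    """
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    y, eps = [], []
    for t in range(n):
        e_t = rng.gauss(0.0, sigma)
        # AR part: phi_1*y_{t-1} + ... + phi_p*y_{t-p}
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # MA part: theta_1*e_{t-1} + ... + theta_q*e_{t-q}
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y.append(const + ar + ma + e_t)
        eps.append(e_t)
    return y, eps


series, noise = simulate_arma(200, phi=[0.6], theta=[0.3], const=1.0)
print(series[:5])
```

Setting `theta=[]` gives a pure AR(p) simulation and `phi=[]` a pure MA(q) one, which is a handy way to compare how the three model types behave.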

Putting it all together

The auto regressive integrated moving average (ARIMA) model is the same as the ARMA model along with the integration component, i.e., it is a combination of the moving average (MA) model and the auto regressive (AR) model applied to the differenced series.

$$ y_{t}^{(d)} = C + \phi_1 y_{t-1}^{(d)} + \dots + \phi_p y_{t-p}^{(d)} + \mu + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t $$
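Because the model works on the differenced series, its forecasts come out on the differenced scale and have to be integrated back, which is the "I" in ARIMA. Below is a minimal sketch of that reverse step. The helper `undifference` is hypothetical, and it assumes we kept the last observed value at each differencing level.

```python
from itertools import accumulate


def undifference(diffs, last_values):
    """Turn forecasts of the d-times differenced series into original-scale values.

    diffs:       forecasts of the d-times differenced series.
    last_values: last observed value at each level, [level 0, ..., level d-1],
                 where level 0 is the original series.
    """
    out = list(diffs)
    # Undo one round of differencing per level, starting from the deepest one.
    for anchor in reversed(last_values):
        out = list(accumulate(out, initial=anchor))[1:]
    return out


# Original series 1, 4, 9, 16 has first differences 3, 5, 7
# and second differences 2, 2. Forecasting two more second
# differences of 2 should continue the squares.
print(undifference([2, 2], last_values=[16, 7]))  # [25, 36]
```

Libraries that fit ARIMA models typically perform this step internally, so it is shown here only to clarify what the integration component means.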

Fortunately, in practice we don’t have to implement these models from scratch, as an ARIMA implementation is available in statsmodels. It also provides implementations of more advanced models such as SARIMA and SARIMAX, which can handle seasonality and external features. I recommend the book Time Series Forecasting in Python to dive deeper into the fascinating world of time series forecasting.