ARIMA I

BUS 323 Forecasting and Risk Analysis

Unit root tests

  • Test whether a series has a unit root.
    • A series has a unit root if it is integrated of order 1.
    • That is, if it needs to be differenced to be stationary.
  • The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test and the Augmented Dickey-Fuller (ADF) test are the most common.
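
One way to see why a unit root calls for differencing is the random walk, the simplest series with a unit root. A short sketch, writing \(\epsilon_{t}\) for white noise and \(y'_{t}\) for the first difference (notation assumed here for illustration):

    \[ y_{t} = y_{t-1} + \epsilon_{t} \quad \Rightarrow \quad y'_{t} = y_{t} - y_{t-1} = \epsilon_{t} \]

The level \(y_{t}\) accumulates every past shock, so it is non-stationary, while the differenced series is white noise and therefore stationary.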

Implementation

  • Use unitroot_kpss():
library(fpp3)
google_2015 <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) == 2015)
google_2015 |>
  features(Close, unitroot_kpss)
# A tibble: 1 × 3
  Symbol kpss_stat kpss_pvalue
  <chr>      <dbl>       <dbl>
1 GOOG        3.56        0.01

Interpretation

  • unitroot_kpss() reports the p-value associated with the KPSS test.
  • \(H_{0}\) of the KPSS test is that the data are stationary.
    • The p-value reported above is 0.01, which means 0.01 or less: unitroot_kpss() only reports p-values within the range 0.01 to 0.1.
    • \(\rightarrow\) reject \(H_{0}\)
    • \(\rightarrow\) data are non-stationary.
    • Difference the data and re-apply test.
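
The decision rule above can be checked on series whose properties are known in advance. The following is a minimal sketch, assuming the feasts package (loaded by fpp3) and its urca dependency are installed. The simulated series, the seed, and the variable names are illustrative and do not come from the lecture data.

```r
library(feasts)  # provides unitroot_kpss(); also loaded by library(fpp3)

set.seed(323)    # arbitrary seed, used only for reproducibility
n <- 500

white_noise <- rnorm(n)          # stationary by construction
random_walk <- cumsum(rnorm(n))  # has a unit root by construction

# unitroot_kpss() also accepts a plain numeric vector and returns a
# named vector with elements kpss_stat and kpss_pvalue.
kpss_wn <- unitroot_kpss(white_noise)
kpss_rw <- unitroot_kpss(random_walk)

# Side-by-side comparison: a small p-value means "reject stationarity".
rbind(white_noise = kpss_wn, random_walk = kpss_rw)
```

The random walk should produce the larger test statistic and the smaller p-value, which is the pattern seen for the Google closing price.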

Trying again…

  • Apply the unit root test to the differenced series, Close':
google_2015 |>
  mutate(diff_close = difference(Close)) |>
  features(diff_close, unitroot_kpss)
# A tibble: 1 × 3
  Symbol kpss_stat kpss_pvalue
  <chr>      <dbl>       <dbl>
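1 GOOG      0.0989         0.1
  • The p-value is now reported as 0.1 (that is, 0.1 or more), so we fail to reject \(H_{0}\): the differenced data appear stationary.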

How many differences?

  • Use unitroot_ndiffs() to find how many differences are needed before the test no longer detects a unit root.
    • Use unitroot_nsdiffs() to find how many seasonal differences are needed for seasonal data.
  • Try it on monthly Australian retail turnover (from aus_retail).
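
Before moving to real data, unitroot_ndiffs() can be tried on a series with a known order of integration. This is a minimal sketch, assuming feasts and urca are installed. The simulated random walk and the seed are illustrative assumptions, not part of the lecture data.

```r
library(feasts)  # provides unitroot_ndiffs(); also loaded by library(fpp3)

set.seed(323)                     # arbitrary seed for reproducibility
random_walk <- cumsum(rnorm(500)) # integrated of order 1 by construction

# unitroot_ndiffs() applies the KPSS test repeatedly (by default at the 5%
# level) and returns a named vector with a single element, ndiffs.
nd_level <- unitroot_ndiffs(random_walk)

# After one manual difference, fewer (ideally zero) differences remain.
nd_diffed <- unitroot_ndiffs(diff(random_walk))

c(level = nd_level[["ndiffs"]], differenced = nd_diffed[["ndiffs"]])
```

Because the function relies on a hypothesis test, it can occasionally suggest a different number of differences than the true order of integration.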

Example: Australian retail turnover

aus_total_retail <- aus_retail |>
  summarise(Turnover = sum(Turnover)) 
aus_total_retail |>
  autoplot(Turnover)

Example: Australian retail turnover

  • Let’s try to log it:
aus_total_retail |>
  mutate(log_turnover = log(Turnover)) |>
  autoplot(log_turnover)

Example: Australian retail turnover

  • Try using unitroot_nsdiffs():
aus_total_retail |>
  mutate(log_turnover = log(Turnover)) |>
  features(log_turnover, unitroot_nsdiffs)
# A tibble: 1 × 1
  nsdiffs
    <int>
1       1

Example: Australian retail turnover

  • Apply seasonal difference, then perform unit root test again:
aus_total_retail |>
  mutate(sdiff_log_turnover = difference(log(Turnover), 12)) |>
  features(sdiff_log_turnover, unitroot_ndiffs)
# A tibble: 1 × 1
  ndiffs
   <int>
1      1

Example: Australian retail turnover

  • The test suggests one more (first) difference is needed.

Example: Australian retail turnover

  • Difference the seasonally differenced data, then test again:
aus_total_retail |>
  mutate(diff_log_turnover = difference(difference(log(Turnover),12),1)) |>
  features(diff_log_turnover, unitroot_ndiffs)
# A tibble: 1 × 1
  ndiffs
   <int>
1      0
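
The double differencing used above can be written out explicitly. With \(y_{t}\) standing for log(Turnover), the final series is \((y_{t} - y_{t-12}) - (y_{t-1} - y_{t-13})\). The sketch below checks this with base R's diff() on a simulated series. The data and variable names are illustrative assumptions. Note that diff() drops the lost observations, whereas difference() pads them with NA.

```r
set.seed(323)             # arbitrary seed for reproducibility
y <- cumsum(rnorm(60))    # stand-in for a logged monthly series

# Seasonal difference (lag 12) followed by a first difference (lag 1).
seasonal_then_first <- diff(diff(y, lag = 12), lag = 1)

# The same two operations applied in the opposite order.
first_then_seasonal <- diff(diff(y, lag = 1), lag = 12)

# The explicit formula, which is defined from t = 14 onwards.
explicit <- sapply(14:60, function(t) (y[t] - y[t - 12]) - (y[t - 1] - y[t - 13]))

all.equal(seasonal_then_first, first_then_seasonal)  # order does not matter
all.equal(seasonal_then_first, explicit)             # matches the formula
length(y) - length(seasonal_then_first)              # observations lost
```

The two differences together cost 13 observations at the start of the series, which is why the differenced turnover series begins with missing values.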

Autoregressive models

  • Forecast \(y_{t}\) based on past values of \(y\).
  • AR(\(p\)):

    \[ y_{t} = c + \phi_{1} y_{t-1} + \phi_{2} y_{t-2} + \dots + \phi_{p} y_{t-p} + \epsilon_{t} \]

Autoregressive models

  • For an AR(1) model:
    • When \(\phi_{1}=0\) and \(c=0\), \(y_{t}\) is white noise.
    • When \(\phi_{1}=1\) and \(c=0\), \(y_{t}\) is a random walk.
    • When \(\phi_{1}=1\) and \(c \neq 0\), \(y_{t}\) is a random walk with drift.
    • When \(\phi_{1} < 0\), \(y_{t}\) tends to oscillate around its mean.
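
The special cases above can be reproduced by simulating the AR(1) recursion directly. This is a minimal base R sketch. The helper simulate_ar1(), its arguments, and the parameter values are hypothetical names and choices made for illustration.

```r
# Hypothetical helper: builds y_t = const + phi * y_{t-1} + eps_t
# from a given vector of shocks, starting from y_0 = y0.
simulate_ar1 <- function(eps, phi, const = 0, y0 = 0) {
  y <- numeric(length(eps))
  prev <- y0
  for (t in seq_along(eps)) {
    y[t] <- const + phi * prev + eps[t]
    prev <- y[t]
  }
  y
}

set.seed(323)       # arbitrary seed for reproducibility
eps <- rnorm(200)   # the same shocks are reused in every case

white_noise <- simulate_ar1(eps, phi = 0)                # phi = 0, c = 0
random_walk <- simulate_ar1(eps, phi = 1)                # phi = 1, c = 0
rw_drift    <- simulate_ar1(eps, phi = 1, const = 0.5)   # phi = 1, c != 0
oscillating <- simulate_ar1(eps, phi = -0.8)             # phi < 0

# One panel per case, so the different behaviours can be compared.
plot.ts(cbind(white_noise, random_walk, rw_drift, oscillating),
        main = "AR(1) special cases")
```

Reusing the same shocks isolates the effect of \(\phi_{1}\) and \(c\): any difference between the panels comes from the parameters alone.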

Autoregressive models

  • AR models perform best on stationary data.
  • Stationarity requires constraints on the parameters:
    • For an AR(1): \(-1 < \phi_{1} < 1\).
    • For an AR(2): \(-1 < \phi_{2} < 1\); \(\phi_{1} + \phi_{2} < 1\); \(\phi_{2} - \phi_{1} < 1\).
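
The AR(2) constraints are equivalent to requiring that both roots of \(1 - \phi_{1} z - \phi_{2} z^{2}\) lie outside the unit circle. The base R sketch below checks a few parameter pairs both ways. The function names and the example values are hypothetical and chosen for illustration.

```r
# Hypothetical helper: the three inequality constraints for an AR(2).
ar2_is_stationary <- function(phi1, phi2) {
  abs(phi2) < 1 && (phi1 + phi2) < 1 && (phi2 - phi1) < 1
}

# Hypothetical helper: the equivalent check on the roots of
# 1 - phi1 * z - phi2 * z^2 (coefficients given in increasing order).
ar2_roots_outside_unit_circle <- function(phi1, phi2) {
  roots <- polyroot(c(1, -phi1, -phi2))
  all(Mod(roots) > 1)
}

# Illustrative parameter pairs (phi2 is kept non-zero in each case).
params <- data.frame(phi1 = c(0.5, 0.5, 1.2, -0.5),
                     phi2 = c(0.3, 0.6, -0.5, 0.6))

params$by_constraints <- mapply(ar2_is_stationary, params$phi1, params$phi2)
params$by_roots <- mapply(ar2_roots_outside_unit_circle, params$phi1, params$phi2)
params
```

The two columns should agree for every row, which is a useful sanity check before fitting a model with hand-picked coefficients.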

Implementation

  • Use ARIMA() (more later). Specify \(p\) as the first argument of pdq() in the model formula.
  • e.g. let’s estimate an AR(2) based on the stationary series we found above:
ar_fit <- aus_total_retail |>
  mutate(diff_log_turnover = difference(difference(log(Turnover),12),1)) |>
  model(ar2 = ARIMA(diff_log_turnover ~ pdq(2,0,0)))
glance(ar_fit)
# A tibble: 1 × 8
  .model   sigma2 log_lik    AIC   AICc    BIC ar_roots   ma_roots 
  <chr>     <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <list>     <list>   
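1 ar2    0.000350   1089. -2168. -2168. -2148. <cpl [26]> <cpl [0]>
  • Note that ar_roots has 26 entries rather than 2. Because PDQ() is not specified, ARIMA() is free to select seasonal terms as well; 26 roots is consistent with two seasonal AR terms at lag 12 (2 + 2 × 12 = 26). Add PDQ(0,0,0) to the formula to force a pure AR(2).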
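
To keep the fit to a pure AR(2), the seasonal part of the model can be pinned to zero. The following is a minimal sketch, assuming fpp3 is installed and reusing the series constructed earlier. The object name ar2_only is illustrative, and the output is not shown here because it has not been run against the lecture's session.

```r
library(fpp3)

# Total monthly retail turnover, as constructed earlier in the slides.
aus_total_retail <- aus_retail |>
  summarise(Turnover = sum(Turnover))

# pdq(2, 0, 0) fixes the non-seasonal part to AR(2);
# PDQ(0, 0, 0) rules out any seasonal AR, differencing, or MA terms.
ar2_only <- aus_total_retail |>
  mutate(diff_log_turnover = difference(difference(log(Turnover), 12), 1)) |>
  model(ar2 = ARIMA(diff_log_turnover ~ pdq(2, 0, 0) + PDQ(0, 0, 0)))

tidy(ar2_only)    # estimated coefficients, one row per term
glance(ar2_only)  # fit statistics; ar_roots should now hold two roots
```

Comparing the AICc from glance() across candidate models fitted to the same series is one way to judge which specification fits better.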

Exercise

  • Apply an AR(1) and AR(2) model to the stationary series derived last time. Which fits better? Try to plot the forecast and the data.