Distributional forecasts and prediction intervals

BUS 323 Forecasting and Risk Analysis

Forecast distributions

  • Forecasts have associated probability distribution
    • Typically Normal
    • Mean: point forecast

Prediction intervals

  • 95% interval for a Normally distributed \(h\)-step forecast:

    \[ \widehat{y}_{T+h|T} \pm 1.96 \widehat{\sigma}_{h} \]

    where \(\widehat{\sigma}_{h}\) is the standard error of the \(h\)-step forecast.

Forecast standard error

  • Where do we get \(\widehat{\sigma}_{h}\) from?
    • Need residual standard error first
    • Next, we can find the distribution standard error.

Forecast standard error: demonstration

  • e.g. for a mean forecast:

    \[ \widehat{y}_{T+h|T} = \bar{y} \]

    The residual is \(e_{T+h} = y_{T+h} - \bar{y}\).

  • \(y_{T+h} \sim N(0,\sigma)\).
  • \(\bar{y} \sim N(0,\sqrt{\frac{\sigma^{2}}{T}})\).
  • Thus \(e_{T+H} \sim N(0, \sqrt{\sigma^{2} + \frac{\sigma^{2}}{T}})\).

Forecast standard error: demonstration

  • Constant forecast with this method

    \(\rightarrow\) Constant forecast standard error of \(\sigma \sqrt{1 + \frac{1}{t}}\).

Forecast standard errors: other methods

  • Mean: \(\widehat{\sigma}_{h} = \widehat{\sigma} \sqrt{1 + \frac{1}{T}}\)
  • Naive: \(\widehat{\sigma}_{h} = \widehat{\sigma} \sqrt{h}\)
  • Seasonal naive: \(\widehat{\sigma}_{h} = \widehat{\sigma} \sqrt{k+1}\)
    • where \(k\) represents \(\frac{h-1}{m}\) and \(m\) represents the seasonal period
  • Drift: \(\widehat{\sigma}_{h} = \widehat{\sigma} \sqrt{h(1 + \frac{h}{T-1})}\)

Computing prediction intervals with R

  • Using the pakcage fable (within fpp3) and function hilo():
preds <- google_2015 |>
  model(NAIVE(Close)) |>
  forecast(h = 10) |>
  hilo()
kable(preds)
Symbol .model day Close .mean 80% 95%
GOOG NAIVE(Close) 253 N(759, 125) 758.88 [744.5400, 773.2200]80 [736.9488, 780.8112]95
GOOG NAIVE(Close) 254 N(759, 250) 758.88 [738.6001, 779.1599]80 [727.8646, 789.8954]95
GOOG NAIVE(Close) 255 N(759, 376) 758.88 [734.0423, 783.7177]80 [720.8941, 796.8659]95
GOOG NAIVE(Close) 256 N(759, 501) 758.88 [730.1999, 787.5601]80 [715.0176, 802.7424]95
GOOG NAIVE(Close) 257 N(759, 626) 758.88 [726.8147, 790.9453]80 [709.8404, 807.9196]95
GOOG NAIVE(Close) 258 N(759, 751) 758.88 [723.7543, 794.0058]80 [705.1598, 812.6002]95
GOOG NAIVE(Close) 259 N(759, 876) 758.88 [720.9399, 796.8202]80 [700.8556, 816.9045]95
GOOG NAIVE(Close) 260 N(759, 1002) 758.88 [718.3203, 799.4397]80 [696.8493, 820.9108]95
GOOG NAIVE(Close) 261 N(759, 1127) 758.88 [715.8599, 801.9001]80 [693.0865, 824.6735]95
GOOG NAIVE(Close) 262 N(759, 1252) 758.88 [713.5329, 804.2272]80 [689.5275, 828.2325]95

Computing prediction intervals with R

  • Use the level option for different levels of confidence.

Prediction intervals from bootstrapped residuals

  • Normal distribution not always a reasonable assumption.
    • Bootstrapping: non-parametric method of generating prediction intervals
  • Suppose a naive forecast: \(y_{t} = y_{t-1} + e_{t}\).
    • Bootstrapping assumes future residuals will be similar to past residuals.
    • For \(t>T\), use \(e_{t|t \leq T}\) to model \(e_{t}\).

prediction intervals from bootstrapped residuals

  • Thus we can simulate: \(y_{T+1}^{*} = y_{T} + e_{T+1}^{*}\).
    • Where \(e_{T+1}^{*}\) is a randomly sampled observed residual. \(y_{T+1}^{*}\) is the forecast under that particular residual value.
  • Repeat: \(y_{T+2}^{*} = y_{T+1}^{*} + e_{T+2}^{*}\).
    • We can obtain many possible futures by repeating this process.

Bootstrap prediction intervals demonstration

  • To see how this works, use the generate() function:
fit <- google_2015 |>
  model(NAIVE(Close))
sim <- fit |> generate(h = 30, times = 5, bootstrap = TRUE)
sim
# A tsibble: 150 x 5 [1]
# Key:       Symbol, .model, .rep [5]
   Symbol .model         day .rep   .sim
   <chr>  <chr>        <dbl> <chr> <dbl>
 1 GOOG   NAIVE(Close)   253 1      761.
 2 GOOG   NAIVE(Close)   254 1      743.
 3 GOOG   NAIVE(Close)   255 1      731.
 4 GOOG   NAIVE(Close)   256 1      730.
 5 GOOG   NAIVE(Close)   257 1      728.
 6 GOOG   NAIVE(Close)   258 1      730.
 7 GOOG   NAIVE(Close)   259 1      719.
 8 GOOG   NAIVE(Close)   260 1      712.
 9 GOOG   NAIVE(Close)   261 1      706.
10 GOOG   NAIVE(Close)   262 1      717.
# ℹ 140 more rows

Bootstrap prediction intervals demonstration

  • Here are our five simulated futures plotted:
google_2015 |>
  ggplot(aes(x = day)) +
  geom_line(aes(y = Close)) +
  geom_line(aes(y = .sim, colour = as.factor(.rep)),
    data = sim) +
  labs(title="Google daily closing stock price", y="$US" ) +
  guides(colour = "none")

Bootstrap prediction intervals demonstration

  • Here are our five simulated futures plotted:

Bootstrap prediction intervals: bootstrapped distribution

  • With these we can manufacture a “probability distribution”.
  • Luckily there is a bootstrap option built into forecast():
fc <- fit |> forecast(h=30, bootstrap=TRUE)
fc
# A fable: 30 x 5 [1]
# Key:     Symbol, .model [1]
   Symbol .model         day        Close .mean
   <chr>  <chr>        <dbl>       <dist> <dbl>
 1 GOOG   NAIVE(Close)   253 sample[5000]  759.
 2 GOOG   NAIVE(Close)   254 sample[5000]  759.
 3 GOOG   NAIVE(Close)   255 sample[5000]  759.
 4 GOOG   NAIVE(Close)   256 sample[5000]  759.
 5 GOOG   NAIVE(Close)   257 sample[5000]  759.
 6 GOOG   NAIVE(Close)   258 sample[5000]  759.
 7 GOOG   NAIVE(Close)   259 sample[5000]  759.
 8 GOOG   NAIVE(Close)   260 sample[5000]  759.
 9 GOOG   NAIVE(Close)   261 sample[5000]  759.
10 GOOG   NAIVE(Close)   262 sample[5000]  759.
# ℹ 20 more rows

Bootstrap prediction intervals: bootstrapped distribution

  • Note that the .mean column is the mean of the bootstraps.
autoplot(fc, google_2015) +
  labs(title="Google daily closing stock price", y="$US" )

Bootstrapping in R

  • Number of samples is controlled with the times option:
bootstraps <- google_2015 |>
  model(NAIVE(Close)) |>
  forecast(h = 10, bootstrap = TRUE, times = 1000) |>
  hilo()
kable(bootstraps)
Symbol .model day Close .mean 80% 95%
GOOG NAIVE(Close) 253 sample[1000] 759.1838 [747.9560, 771.8960]80 [743.7060, 781.6654]95
GOOG NAIVE(Close) 254 sample[1000] 759.2800 [742.6378, 775.2679]80 [733.9835, 796.0598]95
GOOG NAIVE(Close) 255 sample[1000] 759.3544 [739.1602, 780.3466]80 [730.2132, 804.6933]95
GOOG NAIVE(Close) 256 sample[1000] 759.5816 [735.9287, 783.4072]80 [725.6406, 818.4894]95
GOOG NAIVE(Close) 257 sample[1000] 759.9708 [733.4634, 786.7182]80 [721.3592, 829.4516]95
GOOG NAIVE(Close) 258 sample[1000] 759.1356 [729.9575, 789.9693]80 [715.2863, 833.4845]95
GOOG NAIVE(Close) 259 sample[1000] 758.8663 [728.2751, 793.5958]80 [711.3703, 835.3340]95
GOOG NAIVE(Close) 260 sample[1000] 759.1014 [725.5190, 797.5592]80 [707.3571, 837.5594]95
GOOG NAIVE(Close) 261 sample[1000] 759.0371 [722.4588, 799.1424]80 [704.7527, 841.6002]95
GOOG NAIVE(Close) 262 sample[1000] 758.8098 [720.8937, 798.4185]80 [703.2162, 843.4535]95

Exercise

  • Create a 90% and 99% prediction interval for the forecast you produced last time.
  • Create a bootstrapped forecast for the forecast you produced last time (try a bunch of different number of bootstraps)