Time series cross-validation

BUS 323 Forecasting and Risk Analysis

Cross-validation

Uses many training sets, each followed by its own test set; forecast accuracy is averaged across the test sets.
Can be used for one-step or multi-step forecasts.

`stretch_tsibble()`

stretch_tsibble() can be used to create many training sets.
- .init defines length of initial training set.
- .step defines by how much each successive training set increases in length.
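To see what .init and .step do, here is a minimal sketch on a hypothetical six-observation tsibble. The series toy and its values are made up for illustration and are not part of the course data.

```r
library(tsibble)
library(dplyr)

# Hypothetical toy series with 6 observations (illustration only)
toy <- tsibble(t = 1:6, y = c(10, 12, 11, 13, 15, 14), index = t)

# First training set has 3 observations; each later one adds 1 more
toy_tr <- toy |> stretch_tsibble(.init = 3, .step = 1)

# Count rows per training set: folds 1 to 4 contain 3, 4, 5 and 6 rows
toy_tr |> as_tibble() |> count(.id)
```

Each training set is labelled by the new key variable .id, so later steps such as model() are applied to every training set separately.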

Example: Google stock data

Using our 2015 Google stock price data again:

library(fpp3)
# Re-index by trading day so the series is regularly spaced
google_stock <- gafa_stock |>
    filter(Symbol == "GOOG", year(Date) >= 2015) |>
    mutate(day = row_number()) |>
    update_tsibble(index = day, regular = TRUE)
# Keep 2015 only; cross-validation is run on this data
google_2015 <- google_stock |> filter(year(Date) == 2015)

Example: Google stock data

Use stretch_tsibble() with .init = 3 and .step = 1.
Use relocate() to move Date, Symbol and .id to the front:

google_2015_tr <- google_2015 |>
  stretch_tsibble(.init = 3, .step = 1) |>
  relocate(Date, Symbol, .id)
google_2015_tr

# A tsibble: 31,875 x 10 [1]
# Key:       Symbol, .id [250]
   Date       Symbol   .id  Open  High   Low Close Adj_Close  Volume   day
   <date>     <chr>  <int> <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl> <int>
 1 2015-01-02 GOOG       1  526.  528.  521.  522.      522. 1447600     1
 2 2015-01-05 GOOG       1  520.  521.  510.  511.      511. 2059800     2
 3 2015-01-06 GOOG       1  512.  513.  498.  499.      499. 2899900     3
 4 2015-01-02 GOOG       2  526.  528.  521.  522.      522. 1447600     1
 5 2015-01-05 GOOG       2  520.  521.  510.  511.      511. 2059800     2
 6 2015-01-06 GOOG       2  512.  513.  498.  499.      499. 2899900     3
 7 2015-01-07 GOOG       2  504.  504.  497.  498.      498. 2065100     4
 8 2015-01-02 GOOG       3  526.  528.  521.  522.      522. 1447600     1
 9 2015-01-05 GOOG       3  520.  521.  510.  511.      511. 2059800     2
10 2015-01-06 GOOG       3  512.  513.  498.  499.      499. 2899900     3
# ℹ 31,865 more rows
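The dimensions in this output can be checked by hand. The sketch below uses only the settings .init = 3 and .step = 1 and the 250 training sets reported in the Key line; the variable names are illustrative.

```r
init <- 3
step <- 1
n_folds <- 250  # number of .id values reported in the output above

# Length of each successive training set: 3, 4, 5, ...
fold_sizes <- init + step * (seq_len(n_folds) - 1)

max(fold_sizes)  # 252: the largest training set
sum(fold_sizes)  # 31875: total rows in the stretched tsibble
```

The total matches the 31,875 rows shown, since the stretched tsibble stacks all 250 training sets on top of each other.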

Example: Google stock data

Fit a drift model to each training set, forecast one step ahead, and use accuracy() to summarise the forecast errors across all training sets:

google_2015_tr |>
  model(RW(Close ~ drift())) |>
  forecast(h = 1) |>
  accuracy(google_2015)

# A tibble: 1 × 11
  .model           Symbol .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
  <chr>            <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 RW(Close ~ drif… GOOG   Test  0.726  11.3  7.26 0.112  1.19  1.02  1.01 0.0985
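Conceptually, the pipeline above repeats the same one-step forecast from each training set and then summarises the errors. The base R sketch below does this by hand for the drift method on a hypothetical toy series. The vector y and the helper drift_fc() are made up for illustration, and this is not how fable implements it internally.

```r
# Hypothetical toy series (not the Google data)
y <- c(10, 12, 11, 13, 15, 14, 16, 18)
init <- 3

# Drift forecast h steps ahead from the first n observations:
# last value plus h times the average change over the training set
drift_fc <- function(y, n, h = 1) y[n] + h * (y[n] - y[1]) / (n - 1)

# Training sets end at observations 3, 4, ..., 7, so each one has
# a following observation to test against
origins <- init:(length(y) - 1)

# One-step forecast error for each training set
errors <- sapply(origins, function(n) y[n + 1] - drift_fc(y, n))

# Cross-validated RMSE: errors are pooled across all training sets
rmse <- sqrt(mean(errors^2))
rmse
```

The pooled RMSE here plays the same role as the RMSE column in the accuracy() output.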

Example: Google stock data

We can also evaluate the accuracy of multi-step forecasts. Start by forecasting 8 steps ahead from each training set:

fc <- google_2015_tr |>
  model(RW(Close ~ drift())) |>
  forecast(h = 8)

Example: Google stock data

Add the forecast horizon h by numbering the forecasts within each training set (.id):

fc <- google_2015_tr |>
  model(RW(Close ~ drift())) |>
  forecast(h = 8) |>
  group_by(.id) |>
  mutate(h = row_number())

Example: Google stock data

Ungroup and convert the result back to a fable with as_fable() so that accuracy() can be applied:

fc <- google_2015_tr |>
  model(RW(Close ~ drift())) |>
  forecast(h = 8) |>
  group_by(.id) |>
  mutate(h = row_number()) |>
  ungroup() |>
  as_fable(response = "Close", distribution = Close)
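The group_by() and row_number() step is what attaches a horizon to each forecast. A minimal sketch on a hypothetical tibble shows the effect; toy_fc and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical forecasts: 2 training sets, 3 forecasts from each
toy_fc <- tibble(
  .id   = rep(1:2, each = 3),
  .mean = c(11.5, 12.0, 12.5, 14.0, 15.0, 16.0)
)

# Number the rows within each training set, so h restarts at 1 for
# every .id and records how many steps ahead each forecast is
toy_fc_h <- toy_fc |>
  group_by(.id) |>
  mutate(h = row_number()) |>
  ungroup()

toy_fc_h$h  # 1 2 3 1 2 3
```

Without the grouping, row_number() would simply count 1 to 6 across the whole table and h would no longer represent the forecast horizon.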

Example: Google stock data

Compute the accuracy measures separately for each forecast horizon h:

fc |>
  accuracy(google_2015, by = c("h", ".model"))

# A tibble: 8 × 11
      h .model            .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
  <int> <chr>             <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1     1 RW(Close ~ drift… Test  0.726  11.3  7.26 0.112  1.19  1.02  1.01 0.0985
2     2 RW(Close ~ drift… Test  1.52   16.9 11.0  0.228  1.82  1.54  1.51 0.509 
3     3 RW(Close ~ drift… Test  2.34   20.7 14.0  0.350  2.32  1.96  1.85 0.668 
4     4 RW(Close ~ drift… Test  3.15   23.7 16.0  0.474  2.65  2.24  2.12 0.749 
5     5 RW(Close ~ drift… Test  3.93   26.4 17.7  0.596  2.94  2.48  2.36 0.791 
6     6 RW(Close ~ drift… Test  4.71   28.7 19.3  0.718  3.21  2.71  2.57 0.827 
7     7 RW(Close ~ drift… Test  5.49   31.2 21.3  0.837  3.53  2.99  2.79 0.833 
8     8 RW(Close ~ drift… Test  6.27   33.3 22.6  0.958  3.75  3.17  2.98 0.848

Example: Google stock data

Plot RMSE against the forecast horizon:

fc |>
  accuracy(google_2015, by = "h") |>
  ggplot(aes(x = h, y = RMSE)) +
  geom_point()

Example: Google stock data

The resulting plot shows RMSE increasing with the forecast horizon, from 11.3 at h = 1 to 33.3 at h = 8.