Time series regression in R

BUS 323 Forecasting and Risk Analysis

Getting set up

  • We’ll work with the us_change dataset today.
head(us_change)
# A tsibble: 6 x 6 [1Q]
  Quarter Consumption Income Production Savings Unemployment
    <qtr>       <dbl>  <dbl>      <dbl>   <dbl>        <dbl>
1 1970 Q1       0.619  1.04      -2.45     5.30        0.9  
2 1970 Q2       0.452  1.23      -0.551    7.79        0.5  
3 1970 Q3       0.873  1.59      -0.359    7.40        0.5  
4 1970 Q4      -0.272 -0.240     -2.19     1.17        0.700
5 1971 Q1       1.90   1.98       1.91     3.54       -0.100
6 1971 Q2       0.915  1.45       0.902    5.87       -0.100

Plotting multiple series on a time plot

  • autoplot plots a single series across time.
  • We want to plot consumption and income change on one plot.
  • To plot two, we’ll have to transform the data such that both series correspond to one variable. We can use pivot_longer() to do so.
us_change |>
  pivot_longer(c(Consumption, Income), names_to="Series")

Plotting multiple series on a time plot

  • Great, now we can autoplot value
us_change |>
  pivot_longer(c(Consumption, Income), names_to="Series") |>
    autoplot(value) +
      labs(y = "% change")

Plotting a line of best fit

  • Adding a regression line to a scatterplot is simple with geom_smooth and the method = "lm" option:
us_change |>
  ggplot(aes(x = Income, y = Consumption)) +
  labs(y = "Consumption (quarterly % change)",
        x = "Income (quarterly % change)") +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Plotting a line of best fit

Estimating a regression function

  • Suppose we want to estimate the following regression fucntion: \(y_{t} = \beta_{0} + \beta_{1} x_{t} + \epsilon_{t}\).
  • Use TSLM() to do so:
us_change |>
  model(TSLM(Consumption ~ Income)) |>
  report()
  • Use TSLM() instead of lm() because it has time series-friendly options built in.

Regression results

Series: Consumption 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-2.58236 -0.27777  0.01862  0.32330  1.42229 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.54454    0.05403  10.079  < 2e-16 ***
Income       0.27183    0.04673   5.817  2.4e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5905 on 196 degrees of freedom
Multiple R-squared: 0.1472, Adjusted R-squared: 0.1429
F-statistic: 33.84 on 1 and 196 DF, p-value: 2.4022e-08

Multiple linear regression

  • Regression with more than one regressor is called multiple regression: \(y_{t} = \beta_{0} + \beta_{1} x_{1,t} + \beta_{2} x_{2,t} + ... + \beta_{k} x_{k,t} + \epsilon_{t}\)
  • Coefficients reflect conditional marginal effects

Multiple regression

  • Pick a few variables in us_change to include in a multiple regression against Consumption.

Consumption/production

Consumption/savings

Consumption/unemployment

Multiple regression model

mr_c <- us_change |>
  model(TSLM(Consumption ~ Income + Production + Unemployment))
report(mr_c)

Multiple regression results

Series: Consumption 
Model: TSLM 

Residuals:
       Min         1Q     Median         3Q        Max 
-1.5973206 -0.3317848 -0.0004972  0.2948522  1.6881414 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.55973    0.04915  11.389  < 2e-16 ***
Income        0.18044    0.04175   4.322 2.47e-05 ***
Production    0.10227    0.03756   2.722  0.00707 ** 
Unemployment -0.49026    0.15376  -3.188  0.00167 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.508 on 194 degrees of freedom
Multiple R-squared: 0.3755, Adjusted R-squared: 0.3658
F-statistic: 38.88 on 3 and 194 DF, p-value: < 2.22e-16

Fitted values

  • Estimating the regression allows us to obtain fitted (predicted) values for \(y\): \(\hat{y}_{t} = \hat{\beta}_{0} + \hat{\beta}_{1} x_{1,t} + \hat{\beta}_{2} x_{2,t} + ... + \hat{\beta}_{k} x_{k,t} + \epsilon_{t}\)
  • Note these are predictions for the actual observed values, not forecasts.

Accessing fitted values

  • With a tslm object, use augment() to access fitted values:
augment(mr_c)
# A tsibble: 198 x 6 [1Q]
# Key:       .model [1]
   .model                            Quarter Consumption .fitted  .resid  .innov
   <chr>                               <qtr>       <dbl>   <dbl>   <dbl>   <dbl>
 1 TSLM(Consumption ~ Income + Prod… 1970 Q1       0.619  0.0562  0.562   0.562 
 2 TSLM(Consumption ~ Income + Prod… 1970 Q2       0.452  0.479  -0.0274 -0.0274
 3 TSLM(Consumption ~ Income + Prod… 1970 Q3       0.873  0.564   0.309   0.309 
 4 TSLM(Consumption ~ Income + Prod… 1970 Q4      -0.272 -0.0502 -0.222  -0.222 
 5 TSLM(Consumption ~ Income + Prod… 1971 Q1       1.90   1.16    0.741   0.741 
 6 TSLM(Consumption ~ Income + Prod… 1971 Q2       0.915  0.962  -0.0470 -0.0470
 7 TSLM(Consumption ~ Income + Prod… 1971 Q3       0.794  0.636   0.158   0.158 
 8 TSLM(Consumption ~ Income + Prod… 1971 Q4       1.65   1.00    0.642   0.642 
 9 TSLM(Consumption ~ Income + Prod… 1972 Q1       1.31   1.17    0.146   0.146 
10 TSLM(Consumption ~ Income + Prod… 1972 Q2       1.89   0.988   0.897   0.897 
# ℹ 188 more rows

Plotting fitted values

  • Let’s see how well our predicted values line up with reality.
augment(mr_c) |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Consumption, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  labs(y = NULL,
  title = "Percent change in consumption expenditure") +
  scale_colour_manual(values=c(Data="black",Fitted="red")) +
  guides(colour = guide_legend(title = NULL))

Plotting fitted values

Plotting fitted values: scatter

  • We could scatter \(y\) against \(\hat{y}\):
augment(mr_c) |>
  ggplot(aes(x = Consumption, y = .fitted)) +
  geom_point() +
  labs(x = "Data",
  y = "Fitted values",
  title = "% change in consumption expenditure") +
  geom_abline(intercept = 0, slope = 1)

Plotting fitted values: scatter

Goodness-of-fit

  • The coefficient of determination, \(R^{2}\) is a good summary measure of how well your model fits the data:

\(R^{2} = \frac{\sum(\hat{y}_{t} - \bar{y})^{2}}{\sum(y_{t} - \bar{y})^{2}}\)

  • It gives the proportion of variation in \(y\) explained by \(\hat{y}\).
  • Note \(R^{2}\) never decreases when adding predictors. Not necessarily a good metric for evaluating a forecast model.
  • \(R^{2}\) of above model is 0.3755.