Time series regression in R

BUS 323 Forecasting and Risk Analysis

Getting set up

We’ll work with the us_change dataset today.
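The slides do not show which packages are loaded. A minimal setup sketch, assuming the fpp3 meta-package is installed (it attaches tsibble, fable, ggplot2, tidyr and dplyr, and provides us_change through tsibbledata):

```r
# Assumption: fpp3 is installed, e.g. via install.packages("fpp3")
library(fpp3)

# us_change is a quarterly tsibble of percentage changes in US economic series
class(us_change)
names(us_change)
```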

head(us_change)

# A tsibble: 6 x 6 [1Q]
  Quarter Consumption Income Production Savings Unemployment
    <qtr>       <dbl>  <dbl>      <dbl>   <dbl>        <dbl>
1 1970 Q1       0.619  1.04      -2.45     5.30        0.9  
2 1970 Q2       0.452  1.23      -0.551    7.79        0.5  
3 1970 Q3       0.873  1.59      -0.359    7.40        0.5  
4 1970 Q4      -0.272 -0.240     -2.19     1.17        0.700
5 1971 Q1       1.90   1.98       1.91     3.54       -0.100
6 1971 Q2       0.915  1.45       0.902    5.87       -0.100

Plotting multiple series on a time plot

autoplot plots a single series across time.
We want to plot consumption and income change on one plot.
To plot two, we’ll have to transform the data such that both series correspond to one variable. We can use pivot_longer() to do so.

us_change |>
  pivot_longer(c(Consumption, Income), names_to="Series")
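To see what the reshaping does, here is a small sketch on a two-row toy data frame. The numbers are the first two quarters shown in head(us_change); the object names wide and long are only illustrative.

```r
library(tidyr)

# Toy "wide" data: one column per series
wide <- data.frame(
  Quarter     = c("1970 Q1", "1970 Q2"),
  Consumption = c(0.619, 0.452),
  Income      = c(1.04, 1.23)
)

# Stack the two columns: names go to "Series",
# values go to the default column name "value"
long <- pivot_longer(wide, c(Consumption, Income), names_to = "Series")
long
```

Each quarter now appears twice, once per series, which is why the measured variable to plot is called value.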

Plotting multiple series on a time plot

Now both series live in the value column, so we can pass value to autoplot() and get one line per series:

us_change |>
  pivot_longer(c(Consumption, Income), names_to="Series") |>
  autoplot(value) +
  labs(y = "% change")

Plotting a line of best fit

Adding a regression line to a scatterplot is simple with geom_smooth and the method = "lm" option:

us_change |>
  ggplot(aes(x = Income, y = Consumption)) +
  labs(y = "Consumption (quarterly % change)",
        x = "Income (quarterly % change)") +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Estimating a regression function

Suppose we want to estimate the following regression function: \(y_{t} = \beta_{0} + \beta_{1} x_{t} + \epsilon_{t}\).
Use TSLM() to do so:

us_change |>
  model(TSLM(Consumption ~ Income)) |>
  report()

Use TSLM() instead of lm() because it has time series-friendly options built in.
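The slide does not say which options it has in mind. One plausible reading is the trend() and season() specials that TSLM() accepts in its formula. A hedged sketch, assuming fpp3 is installed; the object name fit_ts is illustrative:

```r
library(fpp3)

# trend() adds a linear time trend and season() adds quarterly dummy
# variables, both built from the tsibble's index with no manual setup
fit_ts <- us_change |>
  model(TSLM(Consumption ~ Income + trend() + season()))

# One row per estimated coefficient
tidy(fit_ts)
```

With lm() you would have to construct the trend and seasonal dummy columns yourself before fitting.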

Regression results

Series: Consumption 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-2.58236 -0.27777  0.01862  0.32330  1.42229 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.54454    0.05403  10.079  < 2e-16 ***
Income       0.27183    0.04673   5.817  2.4e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5905 on 196 degrees of freedom
Multiple R-squared: 0.1472, Adjusted R-squared: 0.1429
F-statistic: 33.84 on 1 and 196 DF, p-value: 2.4022e-08
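To read these estimates, plug a value of income growth into the fitted line. A small sketch using the rounded coefficients printed above; the 1% income growth is an arbitrary illustrative input, not a value from the data.

```r
# Coefficients copied from the regression output above
b0 <- 0.54454  # intercept
b1 <- 0.27183  # slope on Income

income_growth <- 1  # hypothetical: income grows 1% in a quarter

# Predicted quarterly % change in consumption
predicted_consumption <- b0 + b1 * income_growth
predicted_consumption
```

So a quarter with 1% income growth is predicted to have consumption growth of about 0.82%, and each additional percentage point of income growth adds about 0.27 points.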

Multiple linear regression

Regression with more than one regressor is called multiple regression: \(y_{t} = \beta_{0} + \beta_{1} x_{1,t} + \beta_{2} x_{2,t} + ... + \beta_{k} x_{k,t} + \epsilon_{t}\)
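Coefficients reflect conditional marginal effects: each \(\beta_{j}\) is the change in \(y\) associated with a one-unit change in \(x_{j}\), holding the other predictors fixed.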
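A simulated illustration of why the "holding the others fixed" part matters. This sketch uses made-up data and base R's lm() for brevity; the true coefficients (1 on x1, 2 on x2) and the correlation between predictors are assumptions chosen for the demonstration.

```r
set.seed(323)
n  <- 2000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)          # x2 is correlated with x1
y  <- 1 + 1 * x1 + 2 * x2 + rnorm(n, sd = 0.5)

# Conditional effect of x1: estimated with x2 in the model
b_full  <- unname(coef(lm(y ~ x1 + x2))["x1"])

# Without x2, the x1 coefficient also absorbs part of x2's effect
b_short <- unname(coef(lm(y ~ x1))["x1"])

c(full = b_full, short = b_short)
```

The full model recovers a slope near 1, while the short model's slope is near 1 + 2 × 0.8 = 2.6, so the same predictor can have very different coefficients depending on what else is in the regression.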

Multiple regression

Pick a few variables in us_change to include in a multiple regression against Consumption.

[Figures: Consumption/production, Consumption/savings, Consumption/unemployment]

Multiple regression model

mr_c <- us_change |>
  model(TSLM(Consumption ~ Income + Production + Unemployment))
report(mr_c)

Multiple regression results

Series: Consumption 
Model: TSLM 

Residuals:
       Min         1Q     Median         3Q        Max 
-1.5973206 -0.3317848 -0.0004972  0.2948522  1.6881414 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.55973    0.04915  11.389  < 2e-16 ***
Income        0.18044    0.04175   4.322 2.47e-05 ***
Production    0.10227    0.03756   2.722  0.00707 ** 
Unemployment -0.49026    0.15376  -3.188  0.00167 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.508 on 194 degrees of freedom
Multiple R-squared: 0.3755, Adjusted R-squared: 0.3658
F-statistic: 38.88 on 3 and 194 DF, p-value: < 2.22e-16

Fitted values

Estimating the regression allows us to obtain fitted (predicted) values for \(y\): \(\hat{y}_{t} = \hat{\beta}_{0} + \hat{\beta}_{1} x_{1,t} + \hat{\beta}_{2} x_{2,t} + ... + \hat{\beta}_{k} x_{k,t}\). There is no error term here: the gap between \(y_{t}\) and \(\hat{y}_{t}\) is the residual.
Note these are predictions of the observations used to estimate the model, not forecasts of future values.

Accessing fitted values

With a fitted TSLM() model, use augment() to access fitted values and residuals:

augment(mr_c)

# A tsibble: 198 x 6 [1Q]
# Key:       .model [1]
   .model                            Quarter Consumption .fitted  .resid  .innov
   <chr>                               <qtr>       <dbl>   <dbl>   <dbl>   <dbl>
 1 TSLM(Consumption ~ Income + Prod… 1970 Q1       0.619  0.0562  0.562   0.562 
 2 TSLM(Consumption ~ Income + Prod… 1970 Q2       0.452  0.479  -0.0274 -0.0274
 3 TSLM(Consumption ~ Income + Prod… 1970 Q3       0.873  0.564   0.309   0.309 
 4 TSLM(Consumption ~ Income + Prod… 1970 Q4      -0.272 -0.0502 -0.222  -0.222 
 5 TSLM(Consumption ~ Income + Prod… 1971 Q1       1.90   1.16    0.741   0.741 
 6 TSLM(Consumption ~ Income + Prod… 1971 Q2       0.915  0.962  -0.0470 -0.0470
 7 TSLM(Consumption ~ Income + Prod… 1971 Q3       0.794  0.636   0.158   0.158 
 8 TSLM(Consumption ~ Income + Prod… 1971 Q4       1.65   1.00    0.642   0.642 
 9 TSLM(Consumption ~ Income + Prod… 1972 Q1       1.31   1.17    0.146   0.146 
10 TSLM(Consumption ~ Income + Prod… 1972 Q2       1.89   0.988   0.897   0.897 
# ℹ 188 more rows
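The first fitted value can be checked by hand. A sketch that plugs the 1970 Q1 predictor values from head(us_change) into the rounded coefficients from the multiple regression output; because both are rounded for display, the result only matches .fitted approximately.

```r
# Rounded coefficients from the multiple regression output
b <- c(intercept = 0.55973, Income = 0.18044,
       Production = 0.10227, Unemployment = -0.49026)

# 1970 Q1 predictor values as displayed by head(us_change)
x <- c(Income = 1.04, Production = -2.45, Unemployment = 0.9)

# Fitted value = intercept + sum of coefficient * predictor
fitted_q1 <- unname(b["intercept"] + sum(b[names(x)] * x))
fitted_q1

# Residual = observed Consumption minus fitted value
resid_q1 <- 0.619 - fitted_q1
resid_q1
```

This gives a fitted value of roughly 0.056 and a residual of roughly 0.56, in line with the first row of the augment() output.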

Plotting fitted values

Let’s see how well our predicted values line up with reality.

augment(mr_c) |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Consumption, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  labs(y = NULL,
       title = "Percent change in consumption expenditure") +
  scale_colour_manual(values=c(Data="black",Fitted="red")) +
  guides(colour = guide_legend(title = NULL))

Plotting fitted values: scatter

We could also scatter \(\hat{y}\) against \(y\). Points close to the 45-degree line indicate observations the model fits well:

augment(mr_c) |>
  ggplot(aes(x = Consumption, y = .fitted)) +
  geom_point() +
  labs(x = "Data",
       y = "Fitted values",
       title = "% change in consumption expenditure") +
  geom_abline(intercept = 0, slope = 1)

Goodness-of-fit

The coefficient of determination, \(R^{2}\), is a common summary measure of how well a model fits the data:

\(R^{2} = \frac{\sum(\hat{y}_{t} - \bar{y})^{2}}{\sum(y_{t} - \bar{y})^{2}}\)

It gives the proportion of variation in \(y\) explained by the fitted values \(\hat{y}\).
Note that \(R^{2}\) never decreases when predictors are added, even irrelevant ones, so it is not necessarily a good metric for evaluating a forecasting model.
The \(R^{2}\) of the multiple regression model above is 0.3755.
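The formula can be verified directly from the fitted values. A sketch assuming fpp3 is installed; it refits the same model so the block runs on its own, and the names aug, r2_manual and gof are illustrative.

```r
library(fpp3)

mr_c <- us_change |>
  model(TSLM(Consumption ~ Income + Production + Unemployment))

# Pull observed and fitted values out of augment()
aug   <- augment(mr_c)
y     <- aug$Consumption
y_hat <- aug$.fitted

# R^2 = explained variation / total variation
r2_manual <- sum((y_hat - mean(y))^2) / sum((y - mean(y))^2)
r2_manual

# glance() reports R^2 alongside measures that penalize extra predictors
gof <- glance(mr_c) |>
  select(r_squared, adj_r_squared, AICc, CV)
gof
```

Because \(R^{2}\) cannot fall when predictors are added, measures such as adjusted \(R^{2}\), AICc or CV from glance() are a more cautious basis for comparing candidate models.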