Transformations

BUS 323 Forecasting and Risk Analysis

Three components for decomposition

Trend-cycle component
Seasonal component
Remainder component

Why decompose?

Improve understanding
Improve forecast accuracy
Often, we need to transform the series first

Calendar adjustments

Some variation is due to calendar effects. We may want to remove that variation to make the patterns more predictable.
e.g. Some months have 30 days, some have 31. February has 28, or 29 in a leap year!
- If we computed total sales per month, some months would have “bonus” days, making them look more productive.
- One possible adjustment: average sales per day.
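
The per-day adjustment can be sketched in base R. The sales figures below are made up for illustration: a hypothetical shop that sells exactly 100 units every day of 2023, so any month-to-month difference in the totals is purely a calendar effect.

```r
# First day of each month in 2023, plus 1 Jan 2024 to close the last interval
month_start <- seq(as.Date("2023-01-01"), by = "month", length.out = 13)

# Number of days in each month: 31, 28, 31, ...
days_in_month <- as.numeric(diff(month_start), units = "days")

# Hypothetical monthly totals: 100 units sold every day
monthly_sales <- 100 * days_in_month

# Calendar-adjusted series: average sales per day
sales_per_day <- monthly_sales / days_in_month

data.frame(
  month = format(month_start[1:12], "%b"),
  monthly_sales,
  sales_per_day
)
```

The monthly totals differ (January looks better than February), but the per-day series is flat, which is the pattern we actually care about.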

Population adjustments

Some data series are affected by population changes. We can divide them by population to get per-capita data.
e.g. GDP. What would GDP per capita reflect that GDP on its own does not?

Use global_economy to make a time plot of US GDP and US per-capita GDP.

Use global_economy to make a time plot of Australia GDP and per-capita GDP.

Population adjustments: GDP time plot

library(fpp3)
global_economy |>
  filter(Country == "Australia") |>
  autoplot(GDP) +
  labs(title = "GDP", y = "$US")

Population adjustments: GDP per capita time plot

library(fpp3)
global_economy |>
  filter(Country == "Australia") |>
  autoplot(GDP/Population) +
  labs(title = "GDP per capita", y = "$US")

Inflation adjustments

Data that are affected by the value of money should be adjusted before modeling.
- e.g. a $0.99 Arizona Iced Tea today is not the same as a $0.99 Arizona Iced Tea in 2010.

Inflation adjustments

To adjust, use a price index.
- Let $z_{t}$ represent the price index in year $t$ and $y_{t}$ represent the price of Arizona Iced Tea in year $t$.
- Then the price of Arizona Iced Tea in 2000 dollars would be: $x_{t} = \frac{y_{t}}{z_{t}} \times z_{2000}$
The Consumer Price Index (CPI) is commonly used for this. The CPI is equal to 100 in the base year.
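
As a quick check of the formula, here is a base R sketch. The index values are hypothetical numbers chosen for easy arithmetic, not real CPI figures, and the shelf price is held at $0.99 in every year.

```r
# Hypothetical price index z_t (NOT real CPI data), equal to 100 in 2010
price_index <- c("2000" = 80, "2010" = 100, "2020" = 120)

# Shelf price y_t: the same $0.99 in every year
nominal_price <- c("2000" = 0.99, "2010" = 0.99, "2020" = 0.99)

# x_t = y_t / z_t * z_2000: each year's price expressed in 2000 dollars
z_2000 <- unname(price_index["2000"])
price_2000_dollars <- nominal_price / price_index * z_2000

price_2000_dollars
```

Under these made-up index values, the unchanged $0.99 sticker price falls when measured in 2000 dollars, because the index rises while the nominal price stays put.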

Inflation adjustments: Australia retail

The aus_retail dataset gives retail turnover in $Million AUD. Let’s inflation-adjust that series.
In particular, let’s inflation-adjust total retail turnover across states by year for a certain industry. I’ll look at “Newspaper and book retailing”. Use whatever industry you like. The first step is to get total retail turnover by year for that industry:

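print_retail <- aus_retail |>
  filter(Industry == "Newspaper and book retailing") |>
  # Keep Industry as the key so the states are summed together
  group_by(Industry) |>
  # Re-index from months to years, then total the turnover within each year
  index_by(Year = year(Month)) |>
  summarise(Turnover = sum(Turnover))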

Inflation adjustments: Australia retail

Next we’ll need to get our price index. We can grab the CPI from global_economy:

aus_economy <- global_economy |>
  filter(Code == "AUS")

Inflation adjustments: Australia retail

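Next we’ll use the CPI from aus_economy to inflation-adjust the values of Turnover from print_retail. The CPI in global_economy equals 100 in 2010, so dividing by CPI and multiplying by 100 expresses turnover in 2010 dollars: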

print_retail |>
  left_join(aus_economy, by = "Year") |>
  mutate(Adjusted_turnover = (Turnover / CPI) * 100)

# A tsibble: 37 x 12 [1Y]
# Key:       Industry [1]
   Industry     Year Turnover Country Code      GDP Growth   CPI Imports Exports
   <chr>       <dbl>    <dbl> <fct>   <fct>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>
 1 Newspaper …  1982    1263  Austra… AUS   1.94e11  3.33   33.4    16.8    13.5
 2 Newspaper …  1983    1800. Austra… AUS   1.77e11 -2.22   36.8    15.6    13.6
 3 Newspaper …  1984    2011. Austra… AUS   1.93e11  4.58   38.2    15.0    13.6
 4 Newspaper …  1985    2109. Austra… AUS   1.80e11  5.25   40.8    17.3    15.3
 5 Newspaper …  1986    2263. Austra… AUS   1.82e11  4.06   44.5    18.1    15.0
 6 Newspaper …  1987    2534. Austra… AUS   1.89e11  2.51   48.3    17.1    15.5
 7 Newspaper …  1988    2557. Austra… AUS   2.36e11  5.68   51.8    16.6    16.0
 8 Newspaper …  1989    2859. Austra… AUS   2.99e11  3.87   55.7    17.0    15.1
 9 Newspaper …  1990    2891. Austra… AUS   3.11e11  3.56   59.8    17.1    15.1
10 Newspaper …  1991    2843. Austra… AUS   3.25e11 -0.388  61.7    16.2    16.1
# ℹ 27 more rows
# ℹ 2 more variables: Population <dbl>, Adjusted_turnover <dbl>

Inflation adjustments: Australia retail

For plotting, we’ll need to pivot the data to long form so that each row is labeled as either unadjusted or inflation-adjusted turnover:

print_retail |>
  left_join(aus_economy, by = "Year") |>
  mutate(Adjusted_turnover = (Turnover / CPI) * 100) |>
  pivot_longer(c(Turnover, Adjusted_turnover), values_to = "Turnover")

# A tsibble: 74 x 12 [1Y]
# Key:       Industry, name [2]
   Industry   Year Country Code      GDP Growth   CPI Imports Exports Population
   <chr>     <dbl> <fct>   <fct>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
 1 Newspape…  1982 Austra… AUS   1.94e11   3.33  33.4    16.8    13.5   15178000
 2 Newspape…  1982 Austra… AUS   1.94e11   3.33  33.4    16.8    13.5   15178000
 3 Newspape…  1983 Austra… AUS   1.77e11  -2.22  36.8    15.6    13.6   15369000
 4 Newspape…  1983 Austra… AUS   1.77e11  -2.22  36.8    15.6    13.6   15369000
 5 Newspape…  1984 Austra… AUS   1.93e11   4.58  38.2    15.0    13.6   15544000
 6 Newspape…  1984 Austra… AUS   1.93e11   4.58  38.2    15.0    13.6   15544000
 7 Newspape…  1985 Austra… AUS   1.80e11   5.25  40.8    17.3    15.3   15758000
 8 Newspape…  1985 Austra… AUS   1.80e11   5.25  40.8    17.3    15.3   15758000
 9 Newspape…  1986 Austra… AUS   1.82e11   4.06  44.5    18.1    15.0   16018400
10 Newspape…  1986 Austra… AUS   1.82e11   4.06  44.5    18.1    15.0   16018400
# ℹ 64 more rows
# ℹ 2 more variables: name <chr>, Turnover <dbl>

Inflation adjustments: Australia retail

Then we’ll need to convert the new name variable to a factor so the two series are plotted in the order we choose (unadjusted first).

print_retail |>
  left_join(aus_economy, by = "Year") |>
  mutate(Adjusted_turnover = (Turnover / CPI) * 100) |>
  pivot_longer(c(Turnover, Adjusted_turnover), values_to = "Turnover") |>
  mutate(name = factor(name, levels = c("Turnover", "Adjusted_turnover")))

# A tsibble: 74 x 12 [1Y]
# Key:       Industry, name [2]
   Industry   Year Country Code      GDP Growth   CPI Imports Exports Population
   <chr>     <dbl> <fct>   <fct>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
 1 Newspape…  1982 Austra… AUS   1.94e11   3.33  33.4    16.8    13.5   15178000
 2 Newspape…  1982 Austra… AUS   1.94e11   3.33  33.4    16.8    13.5   15178000
 3 Newspape…  1983 Austra… AUS   1.77e11  -2.22  36.8    15.6    13.6   15369000
 4 Newspape…  1983 Austra… AUS   1.77e11  -2.22  36.8    15.6    13.6   15369000
 5 Newspape…  1984 Austra… AUS   1.93e11   4.58  38.2    15.0    13.6   15544000
 6 Newspape…  1984 Austra… AUS   1.93e11   4.58  38.2    15.0    13.6   15544000
 7 Newspape…  1985 Austra… AUS   1.80e11   5.25  40.8    17.3    15.3   15758000
 8 Newspape…  1985 Austra… AUS   1.80e11   5.25  40.8    17.3    15.3   15758000
 9 Newspape…  1986 Austra… AUS   1.82e11   4.06  44.5    18.1    15.0   16018400
10 Newspape…  1986 Austra… AUS   1.82e11   4.06  44.5    18.1    15.0   16018400
# ℹ 64 more rows
# ℹ 2 more variables: name <fct>, Turnover <dbl>

Inflation adjustments: Australia retail

Finally, we can draw both time plots in one plot area by faceting with facet_grid:

print_retail |>
  left_join(aus_economy, by = "Year") |>
  mutate(Adjusted_turnover = (Turnover / CPI) * 100) |>
  pivot_longer(c(Turnover, Adjusted_turnover), values_to = "Turnover") |>
  mutate(name = factor(name, levels = c("Turnover", "Adjusted_turnover"))) |>
  ggplot(aes(x = Year, y = Turnover)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y") +
  labs(title = "Turnover: Australian print industry", y = "$Million AUD")

Mathematical transformations

Logarithms: $w_{t} = \log(y_{t})$.
- Changes in a log value are relative changes on the original scale.
- e.g. if $\log_{10}$ is used, a 1-unit increase on the log scale corresponds to a 10x increase on the original scale.
- Note that if there are any zeroes or negative values in the original variable, a log transformation is impossible.
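
To see the 10x relationship concretely, here is a small base R sketch with made-up values that grow tenfold at each step:

```r
# Made-up series that is multiplied by 10 at every step
y <- c(1, 10, 100, 1000)

# Base-10 log transformation
w <- log10(y)

# Step-to-step changes on each scale
diff(y)  # unequal steps on the original scale
diff(w)  # equal steps of 1 on the log scale
```

Equal steps on the log scale correspond to equal ratios, not equal differences, on the original scale.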

Mathematical transformations

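Box-Cox transformation: \[ w_{t} = \begin{cases} \frac{y_{t}^{\lambda} - 1}{\lambda}, & \text{if } \lambda \ne 0 \\ \log(y_{t}), & \text{if } \lambda = 0 \end{cases} \]
The modified form in the fpp3 textbook, which replaces $y_{t}^{\lambda}$ with $\operatorname{sign}(y_{t})|y_{t}|^{\lambda}$, allows for negative $y_{t}$ if $\lambda > 0$.
Box-Cox transformations use a natural logarithm.
If $\lambda = 1$, $w_{t} = y_{t} - 1$: the series is shifted down by 1 but its shape is unchanged.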
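
The two-case formula can be written out as a short base R sketch. `box_cox_sketch` is a hypothetical helper written for illustration only; it assumes positive `y` and a single `lambda` value, and it is not the `box_cox()` function used elsewhere in these slides.

```r
# Plain two-case Box-Cox formula (illustrative helper, positive y only)
box_cox_sketch <- function(y, lambda) {
  if (lambda == 0) {
    log(y)                      # natural log when lambda = 0
  } else {
    (y^lambda - 1) / lambda     # power form otherwise
  }
}

y <- c(1, 4, 9)

box_cox_sketch(y, 1)    # same as y - 1
box_cox_sketch(y, 0)    # same as log(y)
box_cox_sketch(y, 0.5)  # same as 2 * (sqrt(y) - 1)
```

Values of lambda between 0 and 1 sit between "no change in shape" and "full log transformation".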

Mathematical transformations

Power transformations: $w_{t} = y_{t}^{p}$
Square roots, cube roots, squares, etc.

Mathematical transformations

The goal of a mathematical transformation is to even out the variation in the series so that it is about the same size throughout.
In particular, these transformations can really help with series that are non-stationary in variance.
So in the case of a Box-Cox transformation, $\lambda$ should be chosen to make the size of the seasonal variation relatively constant across the whole series.
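
A synthetic example may help here. The series below is invented for illustration (an exponential trend multiplied by a seasonal pattern), and the sketch measures the seasonal swing in each year before and after a log transformation (Box-Cox with lambda = 0).

```r
# Four years of invented monthly data: exponential trend x seasonal factor
time_index <- 1:48
year_id    <- rep(1:4, each = 12)
y <- exp(0.05 * time_index) * (1 + 0.3 * sin(2 * pi * time_index / 12))

# Size of the swing (max minus min) within each year
swing_by_year <- function(x) {
  as.numeric(tapply(x, year_id, function(v) max(v) - min(v)))
}

swing_by_year(y)       # grows from year to year on the original scale
swing_by_year(log(y))  # stays the same size after taking logs
```

Because the seasonal factor multiplies the trend in this invented series, taking logs turns it into an additive term of fixed size, which is why the swing stops growing.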

Mathematical transformations: Australian gas production

Recall the time series of Australian gas production showed seasonality:

aus_production |>
  autoplot(Gas)

Mathematical transformations: Australian gas production

Note that the size of the seasonal variation increases over time.
Try to even it out with a Box-Cox transformation. You can plot the transformed series directly with autoplot(box_cox(Gas, lambda)), supplying a value for lambda.

Australian gas production: lambda = 0

aus_production |>
  autoplot(box_cox(Gas, 0))

Australian gas production: lambda = 1

aus_production |>
  autoplot(box_cox(Gas, 1))

Australian gas production: lambda = 0.1

aus_production |>
  autoplot(box_cox(Gas, 0.1))
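
Rather than trying values of lambda by eye, a value can also be estimated from the data. The sketch below assumes the `guerrero` feature that fpp3 makes available through `features()`; the estimated value is not quoted here because it depends on the data and package versions.

```r
library(fpp3)

# Estimate a Box-Cox lambda for Gas using the Guerrero method
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

# Plot the series transformed with the estimated lambda
aus_production |>
  autoplot(box_cox(Gas, lambda))
```

Comparing this plot with the lambda = 0, 1, and 0.1 plots above is a reasonable way to check whether the estimated value evens out the seasonal variation.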