Transformations
BUS 323 Forecasting and Risk Analysis
Three components for decomposition
- Trend-cycle component
- Seasonal component
- Remainder component
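As a rough sketch of how the three components fit together, assuming an additive decomposition (the symbols \(S_{t}\), \(T_{t}\) and \(R_{t}\) are notation chosen here for illustration, not taken from these slides), the series \(y_{t}\) could be written as the sum of a seasonal, a trend-cycle and a remainder component:

```latex
\[
y_{t} = S_{t} + T_{t} + R_{t}
\]
```

If the seasonal swings grow with the level of the series, a multiplicative form \(y_{t} = S_{t} \times T_{t} \times R_{t}\) may describe the data better, which is one reason a transformation is often applied first.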
Why decompose?
- Improve understanding
- Improve forecast accuracy
- Often, we need to transform the series first
Calendar adjustments
- Some variation is due to calendar effects. We may want to remove that variation to make the patterns more predictable.
- e.g. Some months have 30 days, some have 31. Some have 28. Some have 29!
- If we computed total sales per month, some months would have “bonus” days, making them look more productive.
- One possible adjustment: average sales per day.
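The per-day adjustment can be sketched in base R. This is a minimal illustration with made-up numbers: the data frame sales and its columns are hypothetical, and every month is assumed to sell exactly 100 units per day, so any difference between monthly totals comes only from the number of days in the month.

```r
# Hypothetical monthly sales for 2024 (a leap year); all values are made up.
month_start <- seq(as.Date("2024-01-01"), by = "month", length.out = 12)
next_month  <- seq(as.Date("2024-02-01"), by = "month", length.out = 12)

# Number of days in each month: gap between consecutive month starts.
days <- as.integer(next_month - month_start)

# Assume exactly 100 units are sold every day of the year.
sales <- data.frame(
  month       = month_start,
  days        = days,
  total_sales = 100 * days
)

# Calendar adjustment: divide each monthly total by the days in that month.
sales$sales_per_day <- sales$total_sales / sales$days

print(sales)
```

The monthly totals differ (February is lowest, the 31-day months are highest), while the adjusted series is flat, which is what we would want when the underlying daily rate has not changed.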
Population adjustments
- Some data series are affected by population changes. We can convert them to per-capita data.
- e.g. GDP. What would GDP per capita reflect that GDP on its own does not?
- Use global_economy to make a time plot of US GDP and US per-capita GDP.
- Use global_economy to make a time plot of Australian GDP and per-capita GDP.
Population adjustments: GDP time plot
library(fpp3)
# Time plot of Australia's total GDP
global_economy |>
  filter(Country == "Australia") |>
  autoplot(GDP) +
  labs(title = "GDP", y = "$")
Population adjustments: GDP per capita time plot
library(fpp3)
# Time plot of Australia's GDP divided by its population
global_economy |>
  filter(Country == "Australia") |>
  autoplot(GDP / Population) +
  labs(title = "GDP per capita", y = "$")
Inflation adjustments
- Data which are affected by the value of money should be adjusted before modeling.
- e.g. a $0.99 Arizona Iced Tea today is not the same as a $0.99 Arizona Iced Tea in 2010.
Inflation adjustments
- To adjust, use a price index.
- Let \(z_{t}\) represent the price index in year \(t\) and \(y_{t}\) represent the price of Arizona Iced Tea in year \(t\).
- Then the price of Arizona Iced Tea in 2000 dollars would be: \(x_{t} = \frac{y_{t}}{z_{t}} \times z_{2000}\)
- The Consumer Price Index (CPI) is commonly used for this. The CPI is equal to 100 in the base year.
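The formula above can be sketched in base R. This is a minimal illustration only: the helper deflate() is a hypothetical name, and the prices and index values below are made up rather than real CPI figures.

```r
# Hypothetical helper: express nominal values y in base-period dollars,
# given the price index z for each period and the index value z_base
# in the base period (x_t = y_t / z_t * z_base).
deflate <- function(y, z, z_base) {
  y / z * z_base
}

# Made-up data: the sticker price stays at $0.99 while the index rises.
year  <- c(2000, 2010, 2020)
price <- c(0.99, 0.99, 0.99)  # y_t: nominal price in year t
index <- c(80, 100, 125)      # z_t: price index in year t (made up)

# Price in 2000 dollars, using the index value for 2000 as the base.
price_2000 <- deflate(price, index, z_base = index[year == 2000])

print(data.frame(year, price, index, price_2000))
```

Under these made-up index values, the unchanged $0.99 sticker price falls in 2000 dollars over time, which is the sense in which the two prices are "not the same".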
Inflation adjustments: Australia retail
- The aus_retail dataset gives retail turnover in $Million AUD. Let’s inflation-adjust that series.
- In particular, let’s inflation-adjust total retail turnover across states by year for a certain industry. I’ll look at “Newspaper and book retailing”; use whatever industry you like. The first step is to get total retail turnover by year for that industry:
# Sum monthly turnover across all states to get one yearly total
print_retail <- aus_retail |>
  filter(Industry == "Newspaper and book retailing") |>
  group_by(Industry) |>
  index_by(Year = year(Month)) |>
  summarise(Turnover = sum(Turnover))
Inflation adjustments: Australia retail
- Next we’ll need to get our price index. We can grab the CPI from global_economy:
aus_economy <- global_economy |>
  filter(Code == "AUS")
Inflation adjustments: Australia retail
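- Next we’ll use the CPI from aus_economy to inflation-adjust the values of Turnover from print_retail. Because the CPI equals 100 in the base year, dividing by CPI and multiplying by 100 expresses turnover in base-year dollars: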
print_retail |>
  left_join(aus_economy, by = "Year") |>
  mutate(Adjusted_turnover = (Turnover / CPI) * 100)
# A tsibble: 37 x 12 [1Y]
# Key: Industry [1]
Industry Year Turnover Country Code GDP Growth CPI Imports Exports
<chr> <dbl> <dbl> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Newspaper … 1982 1263 Austra… AUS 1.94e11 3.33 33.4 16.8 13.5
2 Newspaper … 1983 1800. Austra… AUS 1.77e11 -2.22 36.8 15.6 13.6
3 Newspaper … 1984 2011. Austra… AUS 1.93e11 4.58 38.2 15.0 13.6
4 Newspaper … 1985 2109. Austra… AUS 1.80e11 5.25 40.8 17.3 15.3
5 Newspaper … 1986 2263. Austra… AUS 1.82e11 4.06 44.5 18.1 15.0
6 Newspaper … 1987 2534. Austra… AUS 1.89e11 2.51 48.3 17.1 15.5
7 Newspaper … 1988 2557. Austra… AUS 2.36e11 5.68 51.8 16.6 16.0
8 Newspaper … 1989 2859. Austra… AUS 2.99e11 3.87 55.7 17.0 15.1
9 Newspaper … 1990 2891. Austra… AUS 3.11e11 3.56 59.8 17.1 15.1
10 Newspaper … 1991 2843. Austra… AUS 3.25e11 -0.388 61.7 16.2 16.1
# ℹ 27 more rows
# ℹ 2 more variables: Population <dbl>, Adjusted_turnover <dbl>
Inflation adjustments: Australia retail
- For plotting, we’ll need to pivot the two turnover columns into long format, so that a new name column records whether each value is inflation-adjusted or not:
print_retail |>
  left_join(aus_economy, by = "Year") |>
  mutate(Adjusted_turnover = (Turnover / CPI) * 100) |>
  pivot_longer(c(Turnover, Adjusted_turnover), values_to = "Turnover")
# A tsibble: 74 x 12 [1Y]
# Key: Industry, name [2]
Industry Year Country Code GDP Growth CPI Imports Exports Population
<chr> <dbl> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Newspape… 1982 Austra… AUS 1.94e11 3.33 33.4 16.8 13.5 15178000
2 Newspape… 1982 Austra… AUS 1.94e11 3.33 33.4 16.8 13.5 15178000
3 Newspape… 1983 Austra… AUS 1.77e11 -2.22 36.8 15.6 13.6 15369000
4 Newspape… 1983 Austra… AUS 1.77e11 -2.22 36.8 15.6 13.6 15369000
5 Newspape… 1984 Austra… AUS 1.93e11 4.58 38.2 15.0 13.6 15544000
6 Newspape… 1984 Austra… AUS 1.93e11 4.58 38.2 15.0 13.6 15544000
7 Newspape… 1985 Austra… AUS 1.80e11 5.25 40.8 17.3 15.3 15758000
8 Newspape… 1985 Austra… AUS 1.80e11 5.25 40.8 17.3 15.3 15758000
9 Newspape… 1986 Austra… AUS 1.82e11 4.06 44.5 18.1 15.0 16018400
10 Newspape… 1986 Austra… AUS 1.82e11 4.06 44.5 18.1 15.0 16018400
# ℹ 64 more rows
# ℹ 2 more variables: name <chr>, Turnover <dbl>
Inflation adjustments: Australia retail
- Then we’ll convert the new name variable to a factor, so that the panels appear in the order we choose when plotting:
print_retail |>
  left_join(aus_economy, by = "Year") |>
  mutate(Adjusted_turnover = (Turnover / CPI) * 100) |>
  pivot_longer(c(Turnover, Adjusted_turnover), values_to = "Turnover") |>
  mutate(name = factor(name, levels = c("Turnover", "Adjusted_turnover")))
# A tsibble: 74 x 12 [1Y]
# Key: Industry, name [2]
Industry Year Country Code GDP Growth CPI Imports Exports Population
<chr> <dbl> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Newspape… 1982 Austra… AUS 1.94e11 3.33 33.4 16.8 13.5 15178000
2 Newspape… 1982 Austra… AUS 1.94e11 3.33 33.4 16.8 13.5 15178000
3 Newspape… 1983 Austra… AUS 1.77e11 -2.22 36.8 15.6 13.6 15369000
4 Newspape… 1983 Austra… AUS 1.77e11 -2.22 36.8 15.6 13.6 15369000
5 Newspape… 1984 Austra… AUS 1.93e11 4.58 38.2 15.0 13.6 15544000
6 Newspape… 1984 Austra… AUS 1.93e11 4.58 38.2 15.0 13.6 15544000
7 Newspape… 1985 Austra… AUS 1.80e11 5.25 40.8 17.3 15.3 15758000
8 Newspape… 1985 Austra… AUS 1.80e11 5.25 40.8 17.3 15.3 15758000
9 Newspape… 1986 Austra… AUS 1.82e11 4.06 44.5 18.1 15.0 16018400
10 Newspape… 1986 Austra… AUS 1.82e11 4.06 44.5 18.1 15.0 16018400
# ℹ 64 more rows
# ℹ 2 more variables: name <fct>, Turnover <dbl>
Inflation adjustments: Australia retail
- Finally, we can draw both time plots in one plot area, using facet_grid to give each series its own panel:
print_retail |>
  left_join(aus_economy, by = "Year") |>
  mutate(Adjusted_turnover = (Turnover / CPI) * 100) |>
  pivot_longer(c(Turnover, Adjusted_turnover), values_to = "Turnover") |>
  mutate(name = factor(name, levels = c("Turnover", "Adjusted_turnover"))) |>
  ggplot(aes(x = Year, y = Turnover)) +
  geom_line() +
  # One panel per series, each with its own y-axis scale
  facet_grid(name ~ ., scales = "free_y") +
  labs(title = "Turnover: Australian print industry", y = "$AUD")
Australian gas production: lambda = 0
aus_production |>
  autoplot(box_cox(Gas, 0))
Australian gas production: lambda = 1
aus_production |>
  autoplot(box_cox(Gas, 1))
Australian gas production: lambda = 0.1
aus_production |>
  autoplot(box_cox(Gas, 0.1))
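The three plots above differ only in the value of lambda passed to box_cox(). To see what that parameter does, here is a minimal base R sketch of the Box-Cox transformation as it is usually defined for forecasting: \(w_{t} = \log(y_{t})\) when \(\lambda = 0\), and \(w_{t} = (\text{sign}(y_{t})|y_{t}|^{\lambda} - 1)/\lambda\) otherwise. The function box_cox_sketch() is a hypothetical stand-in written for illustration; it is not the package's own source code and handles only a single lambda value.

```r
# Hypothetical re-implementation of a Box-Cox transformation for one lambda.
# Assumes the usual definition: log(y) when lambda is 0, and
# (sign(y) * |y|^lambda - 1) / lambda otherwise.
box_cox_sketch <- function(y, lambda) {
  if (lambda == 0) {
    log(y)
  } else {
    (sign(y) * abs(y)^lambda - 1) / lambda
  }
}

# Made-up positive values spanning several orders of magnitude.
y <- c(1, 10, 100, 1000)

w_log  <- box_cox_sketch(y, 0)    # natural log
w_same <- box_cox_sketch(y, 1)    # y - 1: same shape, shifted down by 1
w_mid  <- box_cox_sketch(y, 0.1)  # close to the log, slightly less compression

print(data.frame(y, w_log, w_mid, w_same))
```

With lambda = 1 the series keeps its original shape, with lambda = 0 large values are compressed the most, and small positive values such as 0.1 fall in between. That is why the choice of lambda controls how strongly the growing seasonal variation in Gas is evened out.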