Discrete Time Series

Lecture 07

Dr. Colin Rundel

Random variable review

Mean and variance of RVs

  • Expected Value

\[ E(X) = \begin{cases} \sum_x x \cdot P(X = x) & \text{$X$ is discrete}\\ \int_{-\infty}^{\infty} x \cdot f(x) \; dx & \text{$X$ is continuous} \end{cases} \]

  • Variance

\[ \begin{aligned} Var(X) &= E\Big(\big(X-E(X)\big)^2\Big) = E(X^2)-E(X)^2 \\ &= \begin{cases} \sum_x \big(x - E(X)\big)^2 \cdot P(X = x) & \text{$X$ is discrete}\\ \int_{-\infty}^{\infty} \big(x-E(X)\big)^2 \cdot f(x) \; dx & \text{$X$ is continuous} \end{cases} \end{aligned} \]

Covariance of RVs

\[ \begin{aligned} Cov(X,Y) &= E\Big(\big(X-E(X)\big)\big(Y-E(Y)\big)\Big) = E(XY)-E(X)E(Y) \\ &= \begin{cases} \sum_x \sum_y \big(x - E(X)\big)\big(y - E(Y)\big) \cdot P(X = x, Y=y) & \text{$X$, $Y$ are discrete}\\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big(x-E(X)\big)\big(y-E(Y)\big) \cdot f(x,y) \; dx \; dy & \text{$X$, $Y$ are continuous} \end{cases} \\ \\ Corr(X,Y) &= \frac{Cov(X,Y)}{\sqrt{Var(X)\,Var(Y)}} \end{aligned} \]

Properties of Expected Value

  • Constant

    \(E(c) = c\) if \(c\) is constant

  • Constant Multiplication

    \(E(cX) = cE(X)\)

  • Constant Addition

    \(E(X+c) = E(X)+c\)

  • Addition

    \(E(X+Y) = E(X)+E(Y)\)

  • Subtraction

    \(E(X-Y) = E(X)-E(Y)\)

  • Multiplication

    \(E(XY) = E(X)\,E(Y)\)

    if \(X\) and \(Y\) are independent
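
These properties are easy to check by simulation; below is a minimal R sketch (the sample size and the particular distributions are arbitrary choices, not from the slides):

set.seed(1)
x = rnorm(1e6, mean = 2)
y = rnorm(1e6, mean = -1)
mean(3 * x + 5)  # ~ 3 E(X) + 5 = 11
mean(x + y)      # ~ E(X) + E(Y) = 1
mean(x * y)      # ~ E(X) E(Y) = -2, valid since x and y are independent here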

Properties of Variance

  • Constant

    \(Var(c) = 0\) if \(c\) is constant

  • Constant Multiplication

    \(Var(cX) = c^2~Var(X)\)

  • Constant Addition

    \(Var(X+c) = Var(X)\)

  • Addition

    \(Var(X+Y) = Var(X)+Var(Y)\)

    if \(X\) and \(Y\) are independent.

  • Subtraction

    \(Var(X-Y) = Var(X)+Var(Y)\)

    if \(X\) and \(Y\) are independent.
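
A similar simulation check for the variance properties (again, arbitrary distributions):

set.seed(1)
x = rnorm(1e6, sd = 2)
y = rnorm(1e6, sd = 3)
var(3 * x)   # ~ 3^2 Var(X) = 36
var(x + 10)  # ~ Var(X) = 4
var(x - y)   # ~ Var(X) + Var(Y) = 13, since x and y are independent here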

Properties of Covariance

  • Constant

    \(Cov(X,c) = 0\) if \(c\) is constant

  • Identity

    \(Cov(X,X) = Var(X)\)

  • Symmetric

    \(Cov(X,Y) = Cov(Y,X)\)

  • Constant Multiplication

    \(Cov(aX, bY) = ab ~ Cov(X,Y)\)

  • Constant Addition

    \(Cov(X+a, Y+b) = Cov(X,Y)\)

  • Distribution

    \(Cov(aX+bY,cV+dW) = ac~Cov(X,V) + ad~Cov(X,W)+bc~Cov(Y,V)+bd~Cov(Y,W)\)
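
And a check for the covariance properties, using a correlated pair constructed for illustration:

set.seed(1)
x = rnorm(1e6)
y = x + rnorm(1e6)  # Cov(X, Y) = Var(X) = 1 by construction
cov(2 * x, 3 * y)   # ~ 2 * 3 * Cov(X, Y) = 6
cov(x + 1, y - 2)   # ~ Cov(X, Y) = 1
cov(x, x)           # ~ Var(X) = 1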

Discrete Time Series

Stationary Processes

A stochastic process (i.e. a time series) is considered to be strictly stationary if the properties of the process are not changed by a shift in origin.

In the time series context this means that the joint distribution of \(\{y_{t_1}, \ldots, y_{t_n}\}\) must be identical to the distribution of \(\{y_{t_1+k}, \ldots, y_{t_n+k}\}\) for any value of \(n\) and \(k\).

Weakly Stationary

Strict stationarity is unnecessarily strong / restrictive for many applications, so instead we often opt for weak stationarity, which requires the following,

  1. The process must have finite variance / second moment \[E(y_t^2) < \infty \text{ for all $t$}\]

  2. The mean of the process must be constant \[E(y_t) = \mu \text{ for all $t$}\]

  3. The cross moment (covariance) may only depend on the lag (i.e. \(t-s\) for \(y_t\) and \(y_s\)) \[Cov(y_t,y_s) = Cov(y_{t+k},y_{s+k}) \text{ for all $t,s,k$}\]

When we say stationary in class we will almost always mean weakly stationary.
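
As a quick example, white noise \(w_t \sim N(0, \sigma^2)\) (iid) satisfies all three conditions, \[ E(w_t^2) = \sigma^2 < \infty, \qquad E(w_t) = 0, \qquad Cov(w_t, w_s) = \begin{cases} \sigma^2 & t = s \\ 0 & t \ne s \end{cases} \] none of which depend on \(t\), so white noise is weakly stationary.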

Autocorrelation

For a stationary time series, where \(E(y_t)=\mu\) and \(\text{Var}(y_t)=\sigma^2\) for all \(t\), we define the autocorrelation at lag \(k\) as

\[ \begin{aligned} \rho_k &= Corr(y_t, \, y_{t+k}) \\ &= \frac{Cov(y_t, y_{t+k})}{\sqrt{Var(y_t)Var(y_{t+k})}} \\ &= \frac{E\left( (y_t-\mu)(y_{t+k}-\mu) \right)}{\sigma^2} \end{aligned} \]

This is also sometimes written in terms of the autocovariance function (\(\gamma_k\)) as \[ \begin{aligned} \gamma_k &= \gamma(t,t+k) = Cov(y_t, y_{t+k}) \\ \rho_k &= \frac{\gamma(t,t+k)}{\sqrt{\gamma(t,t) \gamma(t+k,t+k)}} = \frac{\gamma(k)}{\gamma(0)} \end{aligned} \]
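
In R the sample version of \(\rho_k\) is computed by acf(); a minimal sketch on simulated data (the AR(1) model here is an arbitrary example):

set.seed(1)
y = arima.sim(model = list(ar = 0.7), n = 500)
acf(y)                    # plot of the sample autocorrelations
acf(y, plot = FALSE)$acf  # numeric values, i.e. gamma_hat(k) / gamma_hat(0)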

Covariance Structure

Our definition of a (weakly) stationary process implies a covariance with the following structure,

\[ \boldsymbol{\Sigma} = \left( \begin{matrix} \gamma(0) & \gamma(1) & \gamma(2) & \gamma(3) & \cdots & \gamma(n-1) &\gamma(n) \\ \gamma(1) & \gamma(0) & \gamma(1) & \gamma(2) & \cdots & \gamma(n-2) &\gamma(n-1) \\ \gamma(2) & \gamma(1) & \gamma(0) & \gamma(1) & \cdots & \gamma(n-3) &\gamma(n-2) \\ \gamma(3) & \gamma(2) & \gamma(1) & \gamma(0) & \cdots & \gamma(n-4) &\gamma(n-3) \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \gamma(n-1) & \gamma(n-2) & \gamma(n-3) & \gamma(n-4) & \cdots & \gamma(0) & \gamma(1) \\ \gamma(n) & \gamma(n-1) & \gamma(n-2) & \gamma(n-3) & \cdots & \gamma(1) & \gamma(0) \\ \end{matrix} \right) \]
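
This constant-along-diagonals structure is exactly what toeplitz() constructs in R; a sketch using made-up, geometrically decaying autocovariances for illustration:

gamma_k = 0.7^(0:5)        # stand-in values for gamma(0), ..., gamma(5)
Sigma = toeplitz(gamma_k)  # symmetric, gamma(|i - j|) in position (i, j)
Sigma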

Example - Random walk

Let \(y_t = y_{t-1} + w_t\) with \(y_0=0\) and \(w_t \sim N(0,1)\).

ACF + PACF
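
A sketch that would reproduce plots like these (seed and series length are arbitrary choices):

set.seed(1)
y = cumsum(rnorm(500))  # y_t = y_{t-1} + w_t with y_0 = 0
par(mfrow = c(1, 2))
acf(y)
pacf(y)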

Stationary?

Is \(y_t\) stationary?
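
One way to see the answer: \(E(y_t) = 0\) for all \(t\), but \[ Cov(y_t, y_s) = Cov\left( \sum_{i=1}^t w_i, \; \sum_{j=1}^s w_j \right) = \min(t, s) \] which depends on \(t\) and \(s\) directly, not just on the lag (in particular \(Var(y_t) = t\) grows over time), so the process is not stationary.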

Partial Autocorrelation - pACF

Given these types of patterns in the autocorrelation, we often want to examine the relationship between \(y_t\) and \(y_{t+k}\) with the (linear) dependence of both on the intermediate values \(y_{t+1}\) through \(y_{t+k-1}\) removed.

This is done through the calculation of a partial autocorrelation (\(\alpha(k)\)), which is defined as follows:

\[ \begin{aligned} \alpha(0) &= 1 \\ \alpha(1) &= \rho(1) = Corr(y_t,y_{t+1})\\ &~~\vdots \\ \alpha(k) &= Corr(y_t - P_{t,k}(y_t),~ y_{t+k} - P_{t,k}(y_{t+k})) \end{aligned} \]

where \(P_{t,k}(y)\) is the projection of \(y\) onto the space spanned by \(y_{t+1},\ldots,y_{t+k-1}\).

pACF - Calculation

Let \(\rho(k)\) be the autocorrelation for the process at lag \(k\); then the partial autocorrelation at lag \(k\) will be \(\phi(k,k)\), given by the Durbin-Levinson algorithm,

\[ \phi(k,k) = \frac{ \rho(k) - \sum_{t=1}^{k-1} \phi(k-1, t) \, \rho(k-t) }{ 1 - \sum_{t=1}^{k-1} \phi(k-1, t) \, \rho(t) } \] where \[ \phi(k,t) = \phi(k-1,t) - \phi(k,k) \, \phi(k-1, k-t) \]

Starting with \(\phi(1,1) = \rho(1)\) we can solve iteratively for \(\phi(2,2), \ldots, \phi(k,k)\).
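
A minimal R implementation of this recursion (function and argument names are my own):

durbin_levinson = function(rho) {
  # rho = c(rho(1), ..., rho(K)); returns alpha(1), ..., alpha(K)
  K = length(rho)
  phi = matrix(0, K, K)
  phi[1, 1] = rho[1]
  for (k in seq_len(K)[-1]) {
    t = 1:(k - 1)
    phi[k, k] = (rho[k] - sum(phi[k - 1, t] * rho[k - t])) /
                (1 - sum(phi[k - 1, t] * rho[t]))
    phi[k, t] = phi[k - 1, t] - phi[k, k] * phi[k - 1, k - t]
  }
  diag(phi)
}

This can be checked against R's built-in pacf() on a simulated series,

set.seed(1)
y = arima.sim(model = list(ar = c(0.5, 0.3)), n = 2000)
rho = acf(y, lag.max = 5, plot = FALSE)$acf[-1]  # sample rho(1), ..., rho(5)
cbind(durbin_levinson(rho), as.vector(pacf(y, lag.max = 5, plot = FALSE)$acf))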

Example - Random walk with drift

Let \(y_t = \delta + y_{t-1} + w_t\) with \(y_0=0\) and \(w_t \sim N(0,1)\).

ACF + PACF
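
A sketch that would reproduce plots like these (delta, seed, and length are arbitrary choices):

set.seed(1)
delta = 0.1
y = cumsum(delta + rnorm(500))  # y_t = delta + y_{t-1} + w_t with y_0 = 0
par(mfrow = c(1, 2))
acf(y)
pacf(y)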

Stationary?

Is \(y_t\) stationary?
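
Checking the conditions directly, \[ E(y_t) = E\left( \sum_{i=1}^t (\delta + w_i) \right) = \delta \, t \] so the mean is not constant in \(t\) (and the covariance issue from the plain random walk remains); the process is not stationary.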

Example - Moving Average

Let \(w_t \sim N(0,1)\) and \(y_t = w_{t-1}+w_t\).

ACF + PACF
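
A sketch that would reproduce plots like these (seed and length are arbitrary choices):

set.seed(1)
w = rnorm(501)
y = w[-length(w)] + w[-1]  # y_t = w_{t-1} + w_t
par(mfrow = c(1, 2))
acf(y)
pacf(y)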

Stationary?

Is \(y_t\) stationary?
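
Here the conditions can be checked directly, \[ E(y_t) = 0, \qquad Cov(y_t, y_{t+k}) = \begin{cases} 2 & k = 0 \\ 1 & |k| = 1 \\ 0 & |k| > 1 \end{cases} \] all free of \(t\), so this process is stationary.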

Autoregressive

Let \(w_t \sim N(0,1)\) and \(y_t = y_{t-1} - 0.9 y_{t-2} + w_t\) with \(y_t = 0\) for \(t < 1\).

ACF + PACF
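
A sketch that would reproduce plots like these (seed and length are arbitrary; the two leading zeros implement \(y_t = 0\) for \(t < 1\)):

set.seed(1)
n = 500
w = rnorm(n)
y = rep(0, n + 2)
for (t in 3:(n + 2)) y[t] = y[t - 1] - 0.9 * y[t - 2] + w[t - 2]
y = y[-(1:2)]
par(mfrow = c(1, 2))
acf(y)
pacf(y)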

Example - Australian Wine Sales

Australian total wine sales by wine makers in bottles <= 1 litre. Jan 1980 – Aug 1994.

aus_wine = readRDS("data/aus_wine.rds")
aus_wine
# A tibble: 176 × 2
    date sales
   <dbl> <dbl>
 1 1980  15136
 2 1980. 16733
 3 1980. 20016
 4 1980. 17708
 5 1980. 18019
 6 1980. 19227
 7 1980. 22893
 8 1981. 23739
 9 1981. 21133
10 1981. 22591
# … with 166 more rows

Time series

Basic Model Fit
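
The basic fit here is a quadratic trend in time, matching the model written down at the end of this section; a sketch (the object name quad_model is my own):

quad_model = lm(sales ~ date + I(date^2), data = aus_wine)
summary(quad_model)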

Residuals

Autocorrelation Plot

Autoregressive errors
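
A sketch of how the data for the following fit might have been constructed (quad_resid, lag_12, and d_ar match the names in the output below, but the construction itself is an assumption):

library(dplyr)
d_ar = tibble(quad_resid = residuals(quad_model)) |>
  mutate(lag_12 = lag(quad_resid, n = 12))

lm(quad_resid ~ lag_12, data = d_ar)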


Call:
lm(formula = quad_resid ~ lag_12, data = d_ar)

Residuals:
     Min       1Q   Median       3Q      Max 
-12286.5  -1380.5     73.4   1505.2   7188.1 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  83.65080  201.58416   0.415    0.679    
lag_12        0.89024    0.04045  22.006   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2581 on 162 degrees of freedom
  (12 observations deleted due to missingness)
Multiple R-squared:  0.7493,    Adjusted R-squared:  0.7478 
F-statistic: 484.3 on 1 and 162 DF,  p-value: < 2.2e-16

Residual residuals

Residual residuals - acf

Writing down the model?

So, is our EDA suggesting that we fit the following model?

\[ \text{sales}_t = \beta_0 + \beta_1 \, t + \beta_2 \, t^2 + \beta_3 \, \text{sales}_{t-12} + \epsilon_t \]

The model we actually fit is,

\[ \text{sales}_{t} = \beta_0 + \beta_1 \, t + \beta_2 \, t^2 + w_t \] where

\[ w_t = \delta \, w_{t-12} + \epsilon_t \]
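
One way to fit this combined model in a single step is arima() with a seasonal AR(1) term at period 12 and the trend terms as exogenous regressors; this is a sketch, not necessarily how the model was fit in lecture:

X = model.matrix(~ date + I(date^2), data = aus_wine)[, -1]
arima(aus_wine$sales, order = c(0, 0, 0),
      seasonal = list(order = c(1, 0, 0), period = 12),
      xreg = X)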