Fitting ARIMA Models

Lecture 12

Dr. Colin Rundel

Model Fitting

Fitting ARIMA

For an \(ARIMA(p,d,q)\) model,

  • Assume that the data are stationary after differencing

  • Handling \(d\) is straightforward: just difference the original data \(d\) times, leaving \(n-d\) observations (see the sketch below) \[ y'_t = \Delta^d \, y_t \]

  • After differencing, fit an \(ARMA(p,q)\) model to \(y'_t\).

  • To keep things simple we’ll assume \(w_t \overset{iid}{\sim} \mathcal{N}(0,\sigma^2_w)\)
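For example, a minimal sketch of the differencing step done by hand, assuming y is a numeric series and \(d = 2\) (the values are illustrative):

# difference d times by hand; each pass drops one observation
y_prime = diff(y, differences = 2)   # d = 2, leaves length(y) - 2 values

# or let the fitting function difference internally
# (with p and q set to the chosen orders):
# forecast::Arima(y, order = c(p, 2, q))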

MLE - Stationarity & iid normal errors

If both of these assumptions are met, then the time series \(y_t\) will also be normal.

In general, the vector \(\boldsymbol{y} = (y_1, y_2, \ldots, y_t)'\) will have a multivariate normal distribution with mean \(\{\boldsymbol\mu\}_i = E(y_i) = E(y_t)\) (constant, by stationarity) and covariance \(\boldsymbol\Sigma\) where \(\{\boldsymbol{\Sigma}\}_{ij} = \gamma(i-j)\).

The joint density of \(\boldsymbol y\) is given by

\[ f_{\boldsymbol y}(\boldsymbol y) = \frac{1}{(2\pi)^{t/2}\,\det(\boldsymbol\Sigma)^{1/2}} \exp\left( -\frac{1}{2}(\boldsymbol y - \boldsymbol \mu)' \, \boldsymbol\Sigma^{-1} \, (\boldsymbol y - \boldsymbol \mu) \right) \]

AR

Fitting \(AR(1)\)

\[ y_t = \delta + \phi \, y_{t-1} + w_t \]

We need to estimate three parameters: \(\delta\), \(\phi\), and \(\sigma_w^2\). We know that

\[ \begin{aligned} E(y_t) &= \frac{\delta}{1-\phi} \qquad& Var(y_t) &= \frac{\sigma_w^2}{1-\phi^2} \\ \gamma(h) &= \frac{\sigma_w^2}{1-\phi^2} \phi^{|h|} \end{aligned} \]

Using these properties it is possible to write the distribution of \(\boldsymbol{y}\) as a MVN, but that does not make it easy to write down a simplified closed form for the MLEs of \(\delta\), \(\phi\), and \(\sigma_w^2\) (though the likelihood can still be evaluated numerically, as sketched below).
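A minimal sketch of that numerical evaluation, assuming the mvtnorm package is available and y holds the series (the function name is illustrative):

library(mvtnorm)

# exact AR(1) log likelihood via the joint MVN density
ar1_mvn_ll = function(y, delta, phi, sigma2_w) {
  n = length(y)
  mu = rep(delta / (1 - phi), n)                                  # E(y_t)
  Sigma = sigma2_w / (1 - phi^2) * phi^abs(outer(1:n, 1:n, `-`))  # {Sigma}_ij = gamma(i - j)
  dmvnorm(y, mean = mu, sigma = Sigma, log = TRUE)                # O(n^3), fine for modest n
}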

Conditional Density

We can also rewrite the joint density via conditioning (the last step below uses the Markov property of the AR(1)),

\[ \begin{aligned} f(\boldsymbol y) &= f(y_t,\,y_{t-1},\,\ldots,\,y_2,\,y_1) \\ &= f(y_t|\,y_{t-1},\,\ldots,\,y_2,\,y_1) f(y_{t-1}|y_{t-2},\,\ldots,\,y_2,\,y_1) \cdots f(y_2|y_1) f(y_1) \\ &= f(y_t|\,y_{t-1}) f(y_{t-1}|y_{t-2}) \cdots f(y_2|y_1) f(y_1) \end{aligned} \]

where,

\[ \begin{aligned} y_1 &\sim \mathcal{N}\left(\frac{\delta}{1-\phi}, \, \frac{\sigma^2_w}{1-\phi^2} \right) \\ y_{t}|y_{t-1} &\sim \mathcal{N}\left(\delta+\phi\, y_{t-1}, \, \sigma^2_w \right) \\ f(y_{t}|y_{t-1}) &= \frac{1}{\sqrt{2\pi \, \sigma^2_w}} \exp \left( -\frac{1}{2}\frac{(y_t -\delta-\phi\, y_{t-1})^2 }{\sigma^2_w} \right) \end{aligned} \]

Log likelihood of AR(1)

\[ \log f(y_{t} | y_{t-1}) = -\frac{1}{2}\left( \log 2\pi + \log \sigma^2_w + \frac{1}{\sigma_w^2} (y_t -\delta-\phi\, y_{t-1})^2 \right) \]

\[ \begin{aligned} \ell(\delta, \phi, \sigma^2_w) &= \log f(\boldsymbol{y}) = \log f(y_1) + \sum_{i=2}^n \log f(y_{i}|y_{i-1}) \\ &= - \frac{1}{2} \bigg(\log 2\pi + \log \sigma_w^2 - \log (1-\phi^2) + \frac{(1-\phi^2)}{\sigma_w^2 }\Big(y_1-\tfrac{\delta}{1-\phi}\Big)^2 \bigg) \\ & ~~~~ - \frac{1}{2} \bigg( (n-1) \log 2\pi + (n-1) \log \sigma_w^2 + \frac{1}{\sigma_w^2} \sum_{i=2}^n (y_i -\delta-\phi\, y_{i-1})^2 \bigg) \\ &= - \frac{1}{2} \bigg( n \log 2\pi + n \log \sigma_w^2 - \log (1-\phi^2) \\ &~~~~~~~~~~~~~~~+ \frac{1}{\sigma_w^2} \bigg( (1-\phi^2)\Big(y_1-\tfrac{\delta}{1-\phi}\Big)^2 + \sum_{i=2}^n (y_i -\delta-\phi\, y_{i-1})^2 \bigg) \bigg) \end{aligned} \]
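This decomposition also makes exact MLE easy to do numerically; a sketch using optim(), assuming y holds the series (names, transforms, and starting values are illustrative):

# exact AR(1) log likelihood; phi is mapped through tanh() and sigma^2_w
# through exp() so the optimizer works on an unconstrained scale
ar1_ll = function(par, y) {
  delta = par[1]; phi = tanh(par[2]); sigma2_w = exp(par[3])
  n = length(y)
  dnorm(y[1], mean = delta / (1 - phi),
        sd = sqrt(sigma2_w / (1 - phi^2)), log = TRUE) +
    sum(dnorm(y[-1], mean = delta + phi * y[-n], sd = sqrt(sigma2_w), log = TRUE))
}

mle = optim(c(0, 0, 0), ar1_ll, y = y, control = list(fnscale = -1))  # maximize
c(delta = mle$par[1], phi = tanh(mle$par[2]), sigma2_w = exp(mle$par[3]))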

AR(1) Example

with \(\phi = 0.75\), \(\delta=0.5\), and \(\sigma_w^2=1\).
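A sketch of how the ar1 series used below might have been simulated (the seed and length are illustrative):

set.seed(1234)   # illustrative seed
# zero-mean AR(1) with phi = 0.75 and sigma^2_w = 1, shifted by
# E(y_t) = delta / (1 - phi) = 2, which corresponds to delta = 0.5
ar1 = arima.sim(n = 500, model = list(ar = 0.75)) + 0.5 / (1 - 0.75)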

ARIMA

( ar1_arima = forecast::Arima(ar1, order = c(1,0,0)) )
Series: ar1 
ARIMA(1,0,0) with non-zero mean 

Coefficients:
         ar1    mean
      0.7601  2.2178
s.e.  0.0290  0.1890

sigma^2 = 1.045:  log likelihood = -719.84
AIC=1445.67   AICc=1445.72   BIC=1458.32

mean vs \(\delta\)?

The reported mean value from the ARIMA model is \(E(y_t)\) and not \(\delta\); for an ARIMA(1,0,0), \[ E(y_t) = \frac{\delta}{1-\phi} ~~\Rightarrow~~ \delta = E(y_t) \times (1-\phi) \]

True \(E(y_t)\):

0.5 / (1-0.75)
[1] 2

Estimated \(\delta\):

ar1_arima$coef[2] * 
  (1 - ar1_arima$model$phi)
intercept 
0.5319962 

lm

d = tsibble::as_tsibble(ar1) %>%
  as_tibble() %>%
  rename(y = value)
summary({ ar1_lm = lm(y~lag(y), data=d) })

Call:
lm(formula = y ~ lag(y), data = d)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.7194 -0.6991 -0.0139  0.6323  3.3518 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.53138    0.07898   6.728 4.74e-11 ***
lag(y)       0.76141    0.02918  26.090  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.023 on 497 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.578, Adjusted R-squared:  0.5771 
F-statistic: 680.7 on 1 and 497 DF,  p-value: < 2.2e-16

Bayesian AR(1) Model

library(brms) # must be loaded for arma to work
( ar1_brms = brm(y ~ arma(p = 1, q = 0), data=d, refresh=0) )
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ arma(p = 1, q = 0) 
   Data: d (Number of observations: 500) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Correlation Structures:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
ar[1]     0.76      0.03     0.71     0.82 1.00     3820     2893

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     2.20      0.19     1.82     2.58 1.00     3728     2811

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.02      0.03     0.96     1.09 1.00     3889     2940

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Chains

plot(ar1_brms)

PP Checks

pp_check(ar1_brms, ndraws=100)

Posteriors

Predictions

Forecasting

ar1_brms_fc = ar1_brms %>%
  predicted_draws_fix(
    newdata = tibble(index=501:550, y=NA)
  ) %>%
  filter(.chain == 1)


Fitting AR(p)

Lagged Regression

As with the AR(1), we can rewrite the density using conditioning, \[ \begin{aligned} f(\boldsymbol y) &= f(y_n, \,y_{n-1}, \,\ldots, \,y_{2}, \,y_{1}) \\ &= f(y_{n}|y_{n-1},\ldots,y_{n-p}) \cdots f(y_{p+1}|y_p,\ldots,y_1) \, f(y_p, \,\ldots, y_1) \end{aligned} \]

Regressing \(y_t\) on \(y_{t-1}, \ldots, y_{t-p}\) gets us an approximate solution, but it ignores the \(f(y_1, \, y_2, \,\ldots, y_p)\) part of the likelihood.

How much does this matter (vs. using the full likelihood)?

  • If \(p\) is close to \(n\), probably a lot

  • If \(p \ll n\), probably not much

Method of Moments

Recall for an AR(p) process,

\[ \begin{aligned} \gamma(0) &= \sigma^2_w + \phi_1 \gamma(1) + \phi_2 \gamma(2) + \ldots + \phi_p \gamma(p) \\ \gamma(h) &= \phi_1 \gamma(h-1) + \phi_2 \gamma(h-2) + \ldots + \phi_p \gamma(h-p) \end{aligned} \] We can rewrite the first equation in terms of \(\sigma^2_w\), \[ \sigma^2_w = \gamma(0) - \phi_1 \gamma(1) - \phi_2 \gamma(2) - \ldots - \phi_p \gamma(p) \] together these are called the Yule-Walker equations.

Yule-Walker

These equations can be rewritten in matrix notation as follows

\[ \underset{p \times p}{\boldsymbol\Gamma_p} \underset{p \times 1}{\boldsymbol\phi} = \underset{p \times 1}{\boldsymbol\gamma_p} \qquad\qquad \underset{1 \times 1}{\sigma^2_w} = \underset{1 \times 1}{\gamma(0)} - \underset{1 \times p}{\boldsymbol{\phi'}}\underset{p \times 1}{\boldsymbol{\gamma_p}} \] where

\[ \begin{aligned} \underset{p \times p}{\boldsymbol{\Gamma_p}} &= \{\gamma(j-k)\}_{j,k} \\ \underset{p \times 1}{\boldsymbol\phi} &= (\phi_1, \phi_2, \ldots, \phi_p)' \\ \underset{p \times 1}{\boldsymbol\gamma_p} &= (\gamma(1), \gamma(2), \ldots, \gamma(p))' \end{aligned} \]

If we estimate the covariance structure from the data we obtain \(\hat{\boldsymbol\gamma}_p\) and \(\hat{\boldsymbol\Gamma}_p\), which we can plug in and solve for \(\boldsymbol{\phi}\) and \(\sigma^2_w\), \[ \hat{\boldsymbol\phi} = \hat{\boldsymbol{\Gamma}}_p^{-1}\hat{\boldsymbol{\gamma}}_p \qquad\qquad \hat{\sigma}^2_w = \hat\gamma(0) - \hat{\boldsymbol{\gamma}}_p' \, \hat{\boldsymbol{\Gamma}}_p^{-1} \, \hat{\boldsymbol{\gamma}}_p \]
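A sketch of the plug-in computation using sample autocovariances from acf() (stats::ar.yw() implements the same idea with more care; yule_walker() here is an illustrative name):

yule_walker = function(y, p) {
  # sample autocovariances gamma(0), ..., gamma(p)
  g = acf(y, lag.max = p, type = "covariance", plot = FALSE)$acf[, 1, 1]
  Gamma_p = toeplitz(g[1:p])            # {gamma(j - k)}_{j,k}
  gamma_p = g[2:(p + 1)]                # (gamma(1), ..., gamma(p))'
  phi_hat = solve(Gamma_p, gamma_p)
  list(phi = phi_hat, sigma2_w = g[1] - sum(phi_hat * gamma_p))
}

yule_walker(ar1, p = 1)   # compare with ar.yw(ar1, order.max = 1, aic = FALSE)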

ARMA

Fitting \(ARMA(2,2)\)

\[ y_t = \delta + \phi_1 \, y_{t-1} + \phi_2 \, y_{t-2} + \theta_1 w_{t-1} + \theta_2 w_{t-2} + w_t \]

We now need to estimate six parameters: \(\delta\), \(\phi_1\), \(\phi_2\), \(\theta_1\), \(\theta_2\) and \(\sigma_w^2\).


We could figure out \(E(y_t)\), \(Var(y_t)\), and \(Cov(y_t, y_{t+h})\), but the latter two are going to be pretty nasty and the full MVN likelihood is similarly going to be unpleasant to work with.


Like the AR(1) and AR(p) processes, we want to use conditioning to simplify things. \[ \begin{aligned} y_t | &y_{t-1}, y_{t-2}, w_{t-1}, w_{t-2} \\ &\sim \mathcal{N}(\delta + \phi_1 \, y_{t-1} + \phi_2 \, y_{t-2} + \theta_1 w_{t-1} + \theta_2 w_{t-2},~\sigma_w^2) \end{aligned} \] The complication is that \(w_{t-1}\) and \(w_{t-2}\) are not observed and must themselves be estimated.
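A sketch of this conditional approach: set the pre-sample \(w\) values to zero, build up \(\hat{w}_t\) recursively, and sum the resulting normal log densities (a conditional sum of squares style approximation; names and starting values are illustrative):

arma22_ll = function(par, y) {
  delta = par[1]; phi = par[2:3]; theta = par[4:5]; sigma2_w = exp(par[6])
  n = length(y); w = numeric(n)    # conditional residuals, w[1] = w[2] = 0
  ll = 0
  for (t in 3:n) {
    mu_t = delta + phi[1] * y[t - 1] + phi[2] * y[t - 2] +
           theta[1] * w[t - 1] + theta[2] * w[t - 2]
    w[t] = y[t] - mu_t
    ll = ll + dnorm(y[t], mean = mu_t, sd = sqrt(sigma2_w), log = TRUE)
  }
  ll
}

optim(rep(0, 6), arma22_ll, y = y, control = list(fnscale = -1, maxit = 5000))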

ARMA(2,2) Example

with \(\phi = (0.75,-0.5)\), \(\theta = (0.5,0.2)\), \(\delta=0\), and \(\sigma_w^2=1\), using the same approaches as before.
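A sketch of how the series y used below might have been simulated (the seed is illustrative):

set.seed(1234)   # illustrative seed
y = arima.sim(n = 500, model = list(ar = c(0.75, -0.5), ma = c(0.5, 0.2)))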

ARIMA

forecast::Arima(y, order = c(2,0,2), include.mean = FALSE) %>% summary()
Series: y 
ARIMA(2,0,2) with zero mean 

Coefficients:
         ar1      ar2     ma1     ma2
      0.7290  -0.4967  0.4896  0.2543
s.e.  0.0868   0.0586  0.0940  0.0727

sigma^2 = 1.082:  log likelihood = -728.13
AIC=1466.26   AICc=1466.38   BIC=1487.33

Training set error measures:
                     ME     RMSE       MAE       MPE     MAPE      MASE
Training set 0.02160055 1.036121 0.8309352 -7.423015 270.5659 0.6674512
                     ACF1
Training set -0.001909506

AR only lm

lm(y ~ lag(y,1) + lag(y,2)) %>% summary()

Call:
lm(formula = y ~ lag(y, 1) + lag(y, 2))

Residuals:
     Min       1Q   Median       3Q      Max 
-2.95562 -0.69955  0.00587  0.77063  3.13283 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.02892    0.04802   0.602    0.547    
lag(y, 1)    1.07883    0.03430  31.455   <2e-16 ***
lag(y, 2)   -0.64708    0.03438 -18.820   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.071 on 495 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.6677,    Adjusted R-squared:  0.6664 
F-statistic: 497.4 on 2 and 495 DF,  p-value: < 2.2e-16

Hannan-Rissanen Algorithm

  1. Estimate a high order AR (remember AR \(\Leftrightarrow\) MA when stationary + invertible)

  2. Use the AR fit's residuals as estimates of the unobserved \(w_t\)

  3. Regress \(y_t\) onto \(y_{t-1}, \ldots, y_{t-p}, \hat{w}_{t-1}, \ldots, \hat{w}_{t-q}\)

  4. Update \(\hat{w}_{t-1}, \ldots, \hat{w}_{t-q}\) based on the current model

  5. Go to step 3; repeat until convergence
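The next few slides walk through these steps by hand; a compact sketch of the whole loop, assuming dplyr is loaded for lag() and y is a plain numeric vector (a fixed number of passes stands in for a convergence check):

# Steps 1 & 2: high order AR fit; its residuals approximate the w_t
w_hat = as.numeric(forecast::Arima(y, order = c(10, 0, 0))$residuals)

# Steps 3-5: iterate the lagged regression, refreshing w_hat each pass
for (i in 1:5) {
  fit = lm(y ~ lag(y, 1) + lag(y, 2) + lag(w_hat, 1) + lag(w_hat, 2))
  # pad with NAs so w_hat stays aligned with y (each pass loses obs to the lags)
  w_hat = c(rep(NA, length(y) - length(residuals(fit))), residuals(fit))
}
coef(fit)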

Hannan-Rissanen - Step 1 & 2

(ar = ar.mle(y, order.max = 10))

Call:
ar.mle(x = y, order.max = 10)

Coefficients:
      1        2        3        4        5  
 1.2135  -0.8334   0.0921   0.1544  -0.0989  

Order selected 5  sigma^2 estimated as  1.072
(ar = forecast::Arima(y, order = c(10,0,0)))
Series: y 
ARIMA(10,0,0) with non-zero mean 

Coefficients:
         ar1      ar2     ar3     ar4      ar5
      1.2162  -0.8378  0.0927  0.1693  -0.1274
s.e.  0.0446   0.0702  0.0796  0.0800   0.0807
         ar6     ar7      ar8     ar9     ar10
      0.0198  0.0274  -0.0841  0.1001  -0.0736
s.e.  0.0809  0.0809   0.0808  0.0713   0.0453
        mean
      0.0501
s.e.  0.0930

sigma^2 = 1.09:  log likelihood = -726.47
AIC=1476.93   AICc=1477.57   BIC=1527.51

Residuals

forecast::ggtsdisplay(ar$resid)

Hannan-Rissanen - Step 3

d = tibble(
  y = y %>% strip_attr(),
  index = seq_along(y),
  w_hat1 = ar$resid %>% strip_attr()
)

(lm1 = lm(y ~ lag(y,1) + lag(y,2) + lag(w_hat1,1) + lag(w_hat1,2), data=d)) %>%
  summary()

Call:
lm(formula = y ~ lag(y, 1) + lag(y, 2) + lag(w_hat1, 1) + lag(w_hat1, 
    2), data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.95099 -0.67750 -0.06171  0.71850  2.76012 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.03851    0.04682   0.822  0.41120    
lag(y, 1)       0.75235    0.07460  10.085  < 2e-16 ***
lag(y, 2)      -0.50700    0.04260 -11.901  < 2e-16 ***
lag(w_hat1, 1)  0.46543    0.08735   5.328 1.51e-07 ***
lag(w_hat1, 2)  0.23369    0.08193   2.852  0.00452 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.043 on 493 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.6861,    Adjusted R-squared:  0.6836 
F-statistic: 269.4 on 4 and 493 DF,  p-value: < 2.2e-16

Hannan-Rissanen - Step 4

d = modelr::add_residuals(d,lm1,"w_hat2")

(lm2 = lm(y ~ lag(y,1) + lag(y,2) + lag(w_hat2,1) + lag(w_hat2,2), data=d)) %>%
  summary()

Call:
lm(formula = y ~ lag(y, 1) + lag(y, 2) + lag(w_hat2, 1) + lag(w_hat2, 
    2), data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.94460 -0.66888 -0.05112  0.74813  2.82163 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.03373    0.04685   0.720  0.47187    
lag(y, 1)       0.75142    0.07501  10.017  < 2e-16 ***
lag(y, 2)      -0.50683    0.04271 -11.866  < 2e-16 ***
lag(w_hat2, 1)  0.46357    0.08737   5.306  1.7e-07 ***
lag(w_hat2, 2)  0.23568    0.08183   2.880  0.00415 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.042 on 491 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.6877,    Adjusted R-squared:  0.6851 
F-statistic: 270.3 on 4 and 491 DF,  p-value: < 2.2e-16

Hannan-Rissanen - Step 3.2 + 4.2

d = modelr::add_residuals(d,lm2,"w_hat3")

(lm3 = lm(y ~ lag(y,1) + lag(y,2) + lag(w_hat3,1) + lag(w_hat3,2), data=d)) %>%
  summary()

Call:
lm(formula = y ~ lag(y, 1) + lag(y, 2) + lag(w_hat3, 1) + lag(w_hat3, 
    2), data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.97513 -0.67010 -0.04563  0.76228  2.78317 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.02734    0.04700   0.582  0.56097    
lag(y, 1)       0.75611    0.07540  10.028  < 2e-16 ***
lag(y, 2)      -0.50692    0.04287 -11.826  < 2e-16 ***
lag(w_hat3, 1)  0.45652    0.08791   5.193 3.05e-07 ***
lag(w_hat3, 2)  0.22498    0.08228   2.734  0.00648 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.044 on 489 degrees of freedom
  (6 observations deleted due to missingness)
Multiple R-squared:  0.6845,    Adjusted R-squared:  0.6819 
F-statistic: 265.2 on 4 and 489 DF,  p-value: < 2.2e-16

Hannan-Rissanen - Step 3.3 + 4.3

d = modelr::add_residuals(d,lm3,"w_hat4")

(lm4 = lm(y ~ lag(y,1) + lag(y,2) + lag(w_hat4,1) + lag(w_hat4,2), data=d)) %>%
  summary()

Call:
lm(formula = y ~ lag(y, 1) + lag(y, 2) + lag(w_hat4, 1) + lag(w_hat4, 
    2), data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.95836 -0.66815 -0.03775  0.74089  2.77938 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.02909    0.04699   0.619  0.53614    
lag(y, 1)       0.73928    0.07567   9.769  < 2e-16 ***
lag(y, 2)      -0.50025    0.04291 -11.657  < 2e-16 ***
lag(w_hat4, 1)  0.47626    0.08791   5.417 9.52e-08 ***
lag(w_hat4, 2)  0.24343    0.08248   2.951  0.00332 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.042 on 487 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared:  0.6859,    Adjusted R-squared:  0.6833 
F-statistic: 265.8 on 4 and 487 DF,  p-value: < 2.2e-16

Hannan-Rissanen - Step 3.4 + 4.4

d = modelr::add_residuals(d,lm4,"w_hat5")

(lm5 = lm(y ~ lag(y,1) + lag(y,2) + lag(w_hat5,1) + lag(w_hat5,2), data=d)) %>%
  summary()

Call:
lm(formula = y ~ lag(y, 1) + lag(y, 2) + lag(w_hat5, 1) + lag(w_hat5, 
    2), data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.96227 -0.68428 -0.04699  0.75613  2.77697 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.02844    0.04721   0.602  0.54724    
lag(y, 1)       0.75159    0.07681   9.785  < 2e-16 ***
lag(y, 2)      -0.50072    0.04335 -11.551  < 2e-16 ***
lag(w_hat5, 1)  0.46345    0.08938   5.185 3.18e-07 ***
lag(w_hat5, 2)  0.22949    0.08355   2.747  0.00624 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.044 on 485 degrees of freedom
  (10 observations deleted due to missingness)
Multiple R-squared:  0.685, Adjusted R-squared:  0.6824 
F-statistic: 263.6 on 4 and 485 DF,  p-value: < 2.2e-16

BRMS

( arma22_brms = brm(
    y~arma(p=2,q=2)-1, data=d, 
    chains=2, refresh=0, iter = 5000, cores = 4
) )
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ arma(p = 2, q = 2) - 1 
   Data: d (Number of observations: 500) 
  Draws: 2 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 5000

Correlation Structures:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
ar[1]     0.73      0.09     0.54     0.90 1.00      789      967
ar[2]    -0.49      0.06    -0.60    -0.36 1.00     1237     1288
ma[1]     0.49      0.10     0.29     0.68 1.00      795     1054
ma[2]     0.25      0.08     0.09     0.39 1.00      873     1280

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.05      0.03     0.98     1.11 1.00     6406     3711

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Chains

plot(arma22_brms)

Comparison

Predictions

Forecasting

arma22_brms_fc = arma22_brms %>%
  predicted_draws_fix(
    newdata = tibble(index=501:550, y=NA)
  ) %>%
  filter(.chain == 1)

Stan Code

arma22_brms %>% stancode()
// generated with brms 2.18.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  // data needed for ARMA correlations
  int<lower=0> Kar;  // AR order
  int<lower=0> Kma;  // MA order
  // number of lags per observation
  int<lower=0> J_lag[N];
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int max_lag = max(Kar, Kma);
}
parameters {
  vector[Kar] ar;  // autoregressive coefficients
  vector[Kma] ma;  // moving-average coefficients
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // matrix storing lagged residuals
    matrix[N, max_lag] Err = rep_matrix(0, N, max_lag);
    vector[N] err;  // actual residuals
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    // include ARMA terms
    for (n in 1:N) {
      mu[n] += Err[n, 1:Kma] * ma;
      err[n] = Y[n] - mu[n];
      for (i in 1:J_lag[n]) {
        Err[n + 1, i] = err[n + 1 - i];
      }
      mu[n] += Err[n, 1:Kar] * ar;
    }
    target += normal_lpdf(Y | mu, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
}