Models for areal data

Lecture 19

Dr. Colin Rundel

areal / lattice data

Example - NC SIDS

Adjacency Matrix

1*st_touches(nc[1:12,], sparse=FALSE)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    1    0    0    0    0    0    0    0     0     0     0
 [2,]    1    0    1    0    0    0    0    0    0     0     0     0
 [3,]    0    1    0    0    0    0    0    0    0     1     0     0
 [4,]    0    0    0    0    0    0    1    0    0     0     0     0
 [5,]    0    0    0    0    0    1    0    0    1     0     0     0
 [6,]    0    0    0    0    1    0    0    1    0     0     0     0
 [7,]    0    0    0    1    0    0    0    1    0     0     0     0
 [8,]    0    0    0    0    0    1    1    0    0     0     0     0
 [9,]    0    0    0    0    1    0    0    0    0     0     0     0
[10,]    0    0    1    0    0    0    0    0    0     0     0     1
[11,]    0    0    0    0    0    0    0    0    0     0     0     1
[12,]    0    0    0    0    0    0    0    0    0     1     1     0

Normalized spatial weight matrix

normalize_weights = function(w) {
  w = 1*w
  diag(w) = 0
  rs = rowSums(w)
  rs[rs == 0] = 1
  w/rs
}
normalize_weights( st_touches(nc[1:12,], sparse=FALSE) )
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
 [2,]  0.5  0.0  0.5  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
 [3,]  0.0  0.5  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.5   0.0   0.0
 [4,]  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0   0.0   0.0   0.0
 [5,]  0.0  0.0  0.0  0.0  0.0  0.5  0.0  0.0  0.5   0.0   0.0   0.0
 [6,]  0.0  0.0  0.0  0.0  0.5  0.0  0.0  0.5  0.0   0.0   0.0   0.0
 [7,]  0.0  0.0  0.0  0.5  0.0  0.0  0.0  0.5  0.0   0.0   0.0   0.0
 [8,]  0.0  0.0  0.0  0.0  0.0  0.5  0.5  0.0  0.0   0.0   0.0   0.0
 [9,]  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
[10,]  0.0  0.0  0.5  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.5
[11,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   1.0
[12,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.5   0.5   0.0

EDA - Moran’s I

If we have observations at \(n\) spatial locations \((s_1, \ldots s_n)\)

\[ I = \frac{n}{\sum_{i=1}^n \sum_{j=1}^n w_{ij}} \frac{\sum_{i=1}^n \sum_{j=1}^n w_{ij} \big(y(s_i)-\bar{y}\big)\big(y(s_j)-\bar{y}\big)}{\sum_{i=1}^n \big(y(s_i) - \bar{y}\big)^2} \] where \(\boldsymbol{w}\) is a spatial weights matrix.

Some properties of Moran’s I when there is no spatial autocorrelation / dependence:

  • \(E(I) = -1 / (n-1)\)

  • \(Var(I) = (\text{Something ugly but closed form})- E(I)^2\)

  • \(\underset{n\to\infty}{\lim} \frac{I - E(I)}{\sqrt{Var(I)}} \sim N(0,1)\) (via the CLT)

NC SIDS & Moran’s I

Lets start by using a normalized spatial weight matrix for \(\boldsymbol{w}\) (basedd on shared county borders).

morans_I = function(y, w) {
  w = normalize_weights(w)
  n = length(y)
  num = sum(w * (y-mean(y)) %*% t(y-mean(y)))  
  denom = sum( (y-mean(y))^2 )
  (n/sum(w)) * (num/denom)
}

w = st_touches(nc, sparse=FALSE)
morans_I(y = nc$SID74, w)
[1] 0.1477405

ape::Moran.I(nc$SID74, weight = w) %>% str()
List of 4
 $ observed: num 0.148
 $ expected: num -0.0101
 $ sd      : num 0.0627
 $ p.value : num 0.0118

EDA - Geary’s C

Like Moran’s I, if we have observations at \(n\) spatial locations \((s_1, \ldots s_n)\)

\[ C = \frac{n-1}{2\sum_{i=1}^n \sum_{j=1}^n w_{ij}} \frac{\sum_{i=1}^n \sum_{j=1}^n w_{ij} \big(y(s_i)-y(s_j)\big)^2}{\sum_{i=1}^n \big(y(s_i) - \bar{y}\big)^2} \] where \(\boldsymbol{w}\) is a spatial weights matrix.

Some properties of Geary’s C:

  • \(0 < C < 2\)
    • If \(C \approx 1\) then no spatial autocorrelation
    • If \(C > 1\) then negative spatial autocorrelation
    • If \(C < 1\) then positive spatial autocorrelation
  • Geary’s C is inversely related to Moran’s I

NC SIDS & Geary’s C

Again using an normalized adjacency matrix for \(\boldsymbol{w}\) (shared county borders).

gearys_C = function(y, w) {
  w = normalize_weights(w)
  
  n = length(y)
  y_i = y %*% t(rep(1,n))
  y_j = t(y_i)
  ((n-1)/(2*sum(w))) * (sum(w * (y_i-y_j)^2) / sum( (y - mean(y))^2 ))
}

w = 1*st_touches(nc, sparse=FALSE)
gearys_C(y = nc$SID74, w = w)
[1] 0.8438767

Spatial Correlogram

nc_pt = st_centroid(nc)
plot(nc_pt[,"SID74"], pch=16)

Here we are defining adjacency based on the centroid being within a given distance,

Autoregressive Models

AR Models - Time

Lets return to the simplest case, an \(AR(1)\) process

\[ y_t = \delta + \phi \, y_{t-1} + w_t \]

where \(w_t \sim N(0,\sigma^2_w)\) and \(|\phi| < 1\), then

\[ \begin{aligned} E(y_t) &= \frac{\delta}{1-\phi} \\ Var(y_t) &= \frac{\sigma^2}{1-\phi} \\ \rho(h) &= \phi^{h} \\ \gamma(h) &= \phi^h \frac{\sigma^2}{1-\phi} \end{aligned} \]

AR Models - Time - Joint Distribution

Previously we saw that an \(AR(1)\) model can be represented using a multivariate normal distribution

\[ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \sim N \begin{pmatrix} \frac{\delta}{1-\phi} \begin{pmatrix}1\\ 1\\ \vdots\\ 1\end{pmatrix},~ \frac{\sigma^2}{1-\phi} \begin{pmatrix} 1 & \phi & \cdots & \phi^{n-1} \\ \phi & 1 & \cdots & \phi^{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \phi^{n-1} & \phi^{n-2} & \cdots & 1 \\ \end{pmatrix} \end{pmatrix} \]

In writing down the likelihood we also saw that an \(AR(1)\) is 1st order Markovian,

\[ \begin{aligned} f(y_1, \ldots, y_n) &= f(y_1) \, f(y_2 \mid y_1) \, f(y_3 \mid y_2,y_1) \,\cdots\, f(y_n|y_{n-1},y_{n-2},\ldots,y_1) \\ &= f(y_1) \, f(y_2 \mid y_1) \, f(y_3 \mid y_2) \,\cdots\, f(y_n \mid y_{n-1}) \end{aligned} \]

Alternative Definitions for \(y_t\)

\[ y_t = \delta + \phi \, y_{t-1} + w_t \]

vs.

\[ y_t\mid y_{t-1} \sim N(\delta + \phi \, y_{t-1},~\sigma^2) \]

In the case of time, both of these definitions result in the same multivariate distribution for \(\boldsymbol{y}\) given on the previous slide.

AR in Space

Even in the simplest spatial case there is no clear / unique ordering,

\[ \begin{aligned} f\big(y(s_1), \ldots, y(s_{10})\big) &= f\big(y(s_1)\big) \, f\big(y(s_2) \mid y(s_1)\big) \, \cdots \, f\big(y(s_{10}\mid y(s_{9}),y(s_{8}),\ldots,y(s_1)\big) \\ &= f\big(y(s_{10})\big) \, f\big(y(s_9) \mid y(s_{10})\big) \, \cdots \, f\big(y(s_{1}\mid y(s_{2}),y(s_{3}),\ldots,y(s_{10})\big) \\ &= ~? \end{aligned} \]

Instead we need to think about things in terms of their neighbors / neighborhoods. We define \(N(s_i)\) to be the set of neighbors of location \(s_i\).

  • If we define the neighborhood based on “touching” then \(N(s_3) = \{s_2, s_4\}\)

  • If we use distance within 2 units then \(N(s_3) = \{s_1,s_2,s_3,s_4\}\)

Defining the Spatial AR model

Here we will consider a simple average of neighboring observations, just like with the temporal AR model we have two options in terms of defining the autoregressive process,

  • Simultaneous Autogressve (SAR) \[ y(s) = \delta + \phi \frac{1}{|N(s)|}\sum_{s' \in N(s)} y(s') + N(0,\sigma^2) \]
  • Conditional Autoregressive (CAR) \[ y(s) \mid y(-s) \sim N\left(\delta + \phi \frac{1}{|N(s)|}\sum_{s' \in N(s)} y(s'),~ \sigma^2 \right) \]

Simultaneous Autogressve (SAR)

Using \[ y(s) = \phi \frac{1}{|N(s)|}\sum_{s' \in N(s)} y(s') + N(0,\sigma^2) \] we want to find the distribution of \(\boldsymbol{y} = \Big(y(s_1),\, y(s_2),\,\ldots,\,y(s_n)\Big)^t\).

First we can define a weight matrix \(\boldsymbol{W}\) where \[ \{\boldsymbol{W}\}_{ij} = \begin{cases} 1/|N(s_i)| & \text{if $j \in N(s_i)$} \\ 0 & \text{otherwise} \end{cases} \] then we can write \(\boldsymbol{y}\) as follows, \[ \boldsymbol{y} = \phi \, \boldsymbol{W} \, \boldsymbol{y} + \boldsymbol{\epsilon} \] where \[ \boldsymbol{\epsilon} \sim N(0,\sigma^2 \, \boldsymbol{I}) \]

A toy example

Back to SAR

\[ \boldsymbol{y} = \phi \, \boldsymbol{W} \, \boldsymbol{y} + \boldsymbol{\epsilon} \]

Conditional Autogressve (CAR)

This is a bit trickier, in the case of the temporal AR process we actually went from joint distribution \(\to\) conditional distributions (which we were then able to simplify).

Since we don’t have a natural ordering we can’t get away with this (at least not easily).

Going the other way, conditional distributions \(\to\) joint distribution is difficult because it is possible to specify conditional distributions that lead to an improper joint distribution.

Brooks’ Lemma

For sets of observations \(\boldsymbol{x}\) and \(\boldsymbol{y}\) where \(p(x) > 0~~\forall ~ x\in\boldsymbol{x}\) and \(p(y) > 0~~\forall ~ y\in\boldsymbol{y}\) then

\[ \begin{aligned} \frac{p(\boldsymbol{y})}{p(\boldsymbol{x})} &= \prod_{i=1}^n \frac{p(y_i \mid y_1,\ldots,y_{i-1},x_{i+1},\ldots,x_n)}{p(x_i \mid y_1,\ldots,y_{i-1},x_{i+1},\ldots,x_n)} \\ &= \prod_{i=1}^n \frac{p(y_i \mid x_1,\ldots,x_{i-1},y_{i+1},\ldots,y_n)}{p(x_i \mid x_1,\ldots,x_{i-1},y_{i+1},\ldots,y_n)} \\ \end{aligned} \]

A simplified example

Let \(\boldsymbol{y} = (y_1,y_2)\) and \(\boldsymbol{x} = (x_1,x_2)\) then we can derive Brook’s Lemma for this case,

\[ \begin{aligned} p (y_1,y_2) &= p(y_1 \mid y_2) p(y_2) \\ &= p(y_1 \mid y_2) \frac{p(y_2 \mid x_1)}{p(x_1 \mid y_2)} p(x_1) \\ & = \frac{p(y_1 \mid y_2)}{p(x_1 \mid y_2)} p(y_2 \mid x_1) \, p(x_1) \\ & = \frac{p(y_1 \mid y_2)}{p(x_1 \mid y_2)} p(y_2 \mid x_1) \, p(x_1) \left(\frac{p(x_2 \mid x_1)}{p(x_2 \mid x_1)}\right) \\ & = \frac{p(y_1 \mid y_2)}{p(x_1 \mid y_2)} \frac{p(y_2 \mid x_1)}{p(x_2 \mid x_1)} \, p(x_1,x_2) \\ \end{aligned} \]


This is just a derivation of Brook’s lemma for \(\boldsymbol{y} = (y_1,y_2)\) and \(\boldsymbol{x} = (x_1,x_2)\).

\[ \begin{aligned} p (y_1,y_2) & = \frac{p(y_1 \mid y_2)}{p(x_1 \mid y_2)} \frac{p(y_2 \mid x_1)}{p(x_2 \mid x_1)} \, p(x_1,x_2) \\ \\ \frac{p (y_1,y_2) }{p(x_1,x_2)} & = \frac{p(y_1 \mid y_2)}{p(x_1 \mid y_2)} \frac{p(y_2 \mid x_1)}{p(x_2 \mid x_1)} \end{aligned} \]


From which we can generalize for \(n=3\),

\[ \begin{aligned} \frac{p(y_1,y_2,y_3)}{p(x_1,x_2,x_3)} = \frac{p(y_1 \mid y_2,y_3)}{p(x_1 \mid y_2,y_3)} \frac{p(y_2 \mid x_1,y_3)}{p(x_2 \mid x_1,y_3} \frac{p(y_3 \mid x_1,x_2)}{p(x_3 \mid x_1,x_2)} \end{aligned} \] and so on.

Utility?

Lets repeat that last example but consider the case where \(\boldsymbol{y} = (y_1,y_2)\) but now we let \(\boldsymbol{x} = (y_1=0,y_2=0)\)

\[ \begin{aligned} \frac{p (y_1,y_2) }{p(x_1,x_2)} &= \frac{p (y_1,y_2) }{p(y_1=0,y_2=0)} \end{aligned} \]

\[ \begin{aligned} p(y_1,y_2) &= \frac{p(y_1 \mid y_2)}{p(y_1=0 \mid y_2)} \frac{p(y_2 \mid y_1=0)}{p(y_2=0 \mid y_1=0)} ~ p(y_1=0,y_2=0) \end{aligned} \]

\[ \begin{aligned} p(y_1,y_2) &\propto \frac{p(y_1 \mid y_2) ~ p(y_2 \mid y_1=0) }{ p(y_1=0 \mid y_2)} \\ &\propto \frac{p(y_2 \mid y_1) ~ p(y_1 \mid y_2=0) }{ p(y_2=0 \mid y_1)} \end{aligned} \]

As applied to a simple CAR

\[ \begin{aligned} y(s_1)\mid y(s_2) \sim N(\phi W_{12}\, y(s_2),\, \sigma^2) \\ y(s_2)\mid y(s_1) \sim N(\phi W_{21}\, y(s_1),\, \sigma^2) \end{aligned} \]

\[ \begin{aligned} p\big(y(s_1),y(s_2)\big) &\propto \frac{p\big(y(s_1) \mid y(s_2)\big) ~ p\big(y(s_2) \mid y(s_1)=0\big)}{p\big(y(s_1)=0 \mid y(s_2)\big)}\\ &\propto \frac{ \exp\left(-\frac{1}{2\sigma^2}\left(y(s_1)-\phi \, W_{12} \, y(s_2)\right)^2\right) \exp\left(-\frac{1}{2\sigma^2}\left(y(s_2)-\phi \, W_{21} \, 0\right)^2\right) }{ \exp\left(-\frac{1}{2\sigma^2}\left(0-\phi W_{12} y(s_2)\right)^2 \right) }\\ &\propto \exp\left(-\frac{1}{2\sigma^2}\left(\left(y(s_1)-\phi \, W_{12} \, y(s_2)\right)^2 + y(s_2)^2- (\phi W_{21} y(s_2))^2\right)\right) \\ &\propto \exp\left(-\frac{1}{2\sigma^2}\left(y(s_1)^2-\phi \, W_{12} \, y(s_1)\,y(s_2) -\phi \, W_{21} \, y(s_1)\,y(s_2) + y(s_2)^2\right)\right) \\ &\propto \exp\left(-\frac{1}{2\sigma^2} (\boldsymbol{y}-0) \begin{pmatrix} 1 & -\phi W_{12} \\ -\phi W_{21} & 1 \end{pmatrix} (\boldsymbol{y}-0)^{t} \right) \end{aligned} \]

Implications for \(\boldsymbol{y}\)

\[ \boldsymbol{\mu} = 0 \]

\[ \begin{aligned} \boldsymbol{\Sigma}^{-1} &= \frac{1}{\sigma^2} \begin{pmatrix} 1 & -\phi W_{12} \\ -\phi W_{21} & 1 \end{pmatrix} \\ &= \frac{1}{\sigma^2}(\boldsymbol{I} - \phi \, \boldsymbol{W}) \end{aligned} \]

\[ \boldsymbol{\Sigma} = \sigma^2(\boldsymbol{I} - \phi \, \boldsymbol{W})^{-1} \]

we can then conclude that for \(\boldsymbol{y} = (y(s_1),~y(s_2))^t\),

\[ \boldsymbol{y} \sim N\left( \boldsymbol{0}, ~ \sigma^2 (\boldsymbol{I} - \phi \, \boldsymbol{W})^{-1} \right) \]

which generalizes for all mean 0 CAR models.

General Proof

Let \(\boldsymbol{y} = (y(s_1),\ldots,y(s_n))\) and \(\boldsymbol{0} = (y(s_1) = 0, \ldots, y(s_n)=0)\) then by Brook’s lemma,

\[ \begin{aligned} \frac{p(\boldsymbol{y})}{p(\boldsymbol{0})} &= \prod_{i=1}^n \frac{p(y_i \mid y_1,\ldots,y_{i-1},0_{i+1},\ldots,0_{n})}{p(0_i \mid y_1,\ldots,y_{i-1},0_{i+1},\ldots,0_{n})} = \prod_{i=1}^n \frac{ \exp\left(-\frac{1}{2\sigma^2} \left(y_i - \phi \sum_{j<i} W_{ij} \, y_j - \phi \sum_{j>i} 0_j \right)^2 \right) }{ \exp\left(-\frac{1}{2\sigma^2} \left(0_i - \phi \sum_{j<i} W_{ij} \, y_j - \phi \sum_{j>i} 0_j \right)^2 \right) } \\ &= \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n \left(y_i - \phi \sum_{j<i} W_{ij} \, y_j\right)^2 + \frac{1}{2\sigma^2} \sum_{i=1}^n \left( \phi \sum_{j<i} W_{ij} \, y_j \right)^2 \right) \\ &= \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n y_i^2 - 2 \phi y_i \,\sum_{j<i} W_{ij} \, y_j \right) \\ &= \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n y_i^2 - \phi \sum_{i=1}^n \sum_{j=1}^n y_i \, W_{ij} \, y_j \right) \quad \mathit{\big(\text{if } W_{ij} = W_{ji}\big)} \\ &= \exp\left(-\frac{1}{2\sigma^2} (\boldsymbol{y}-0)^t (\boldsymbol{I} - \phi \boldsymbol{W}) (\boldsymbol{y}-0) \right) \end{aligned} \]

CAR vs SAR

  • Simultaneous Autogressve (SAR)

\[ y(s) = \phi \sum_{s'} W_{s\;s'} \, y(s') + \epsilon \]

\[ \boldsymbol{y} \sim N(0,~\sigma^2 \, ((\boldsymbol{I}-\phi \boldsymbol{W})^{-1}) ((\boldsymbol{I}-\phi \boldsymbol{W})^{-1})^t )\]

  • Conditional Autoregressive (CAR)

\[ y(s) \mid y(-s) \sim N\left(\sum_{s'} W_{s\;s'} \, y(s'),~ \sigma^2 \right) \]

\[ \boldsymbol{y} \sim N(0,~\sigma^2 \, (\boldsymbol{I}-\phi \boldsymbol{W})^{-1})\]

Generalizations

  • Adopting different weight matrices \((\boldsymbol{W})\)

    • Between SAR and CAR model we move to a generic weight matrix definition (beyond average of nearest neighbors)

    • In time we varied \(p\) in the \(AR(p)\) model, in space we adjust the weight matrix.

    • In general having a symmetric W is helpful, but not required

  • More complex Variance (beyond \(\sigma^2 \, I\))

    • \(\sigma^2\) can be a vector (differences between areal locations)

    • i.e. since areal data tends to be aggregated - adjust variance based on sample size

    • i.e. scale based on the number of neighbors

Some specific generalizations

Generally speaking we want to work with a scaled / normalized version of the weight matrix, \[ W_{ij}/W_{i\boldsymbol{\cdot}} \]

When \(W\) is derived from an adjacency matrix, \(\boldsymbol{A}\), we can express this as \[ \boldsymbol{W} = \boldsymbol{D}^{-1} \boldsymbol{A} \] where \(\boldsymbol{D}^{-1} = \text{diag}(1/|N(s_i)|)\).

We can also allow \(\sigma^2\) to vary between locations, we can define this as \(\boldsymbol{D}_{\sigma^2} = \text{diag}(\sigma^2_i)\) and most often we use \[ \boldsymbol{D}_{\sigma^2}^{-1} = \text{diag}\left(\frac{\sigma^2}{|N(s_i)|}\right) = \sigma^2 \boldsymbol{D}^{-1}. \]

Revised SAR Model

  • Formula Model \[ y(s_i) = X_{i\cdot}\beta + \phi \sum_{j=1}^n D^{-1}_{jj} \, A_{ij} \, \big(y(s_j) - X_{j\cdot}\beta\big) + \epsilon_i \] \[ \boldsymbol{\epsilon} \sim N(\boldsymbol{0},\,\boldsymbol{D}_{\sigma^2}^{-1}) = N(\boldsymbol{0},\, \sigma^2 \boldsymbol{D}^{-1}) \]
  • Joint Model \[ \boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \phi \boldsymbol{D}^{-1} \boldsymbol{A} ~\big(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\big) + \boldsymbol{\epsilon} \]

\[ \boldsymbol{y} \sim N\left(\boldsymbol{X}\boldsymbol{\beta}, (\boldsymbol{I} - \phi \boldsymbol{D}^{-1} \boldsymbol{A})^{-1} \sigma^2 \boldsymbol{D}^{-1} \big((\boldsymbol{I} - \phi \boldsymbol{D}^{-1} \boldsymbol{A})^{-1}\big)^t \right) \]

Revised CAR Model

  • Conditional Model \[ y(s_i) \mid y_{-s_i} \sim N\left(X_{i\cdot}\beta + \phi\sum_{j=1}^n \frac{W_{ij}}{D_{ii}} ~ \big(y(s_j)-X_{j\cdot}\beta\big),~ \sigma^2 D^{-1}_{ii} \right) \]
  • Joint Model

\[ \boldsymbol{y} \sim N(\boldsymbol{X}\boldsymbol{\beta},~\Sigma_{CAR}) \]

\[ \begin{aligned} \Sigma_{CAR} &= (\boldsymbol{D}_{\sigma} \, (\boldsymbol{I}-\phi \boldsymbol{D}^{-1}\boldsymbol{A}))^{-1} \\ &= (1/\sigma^2 \boldsymbol{D} \, (\boldsymbol{I}-\phi \boldsymbol{D}^{-1}\boldsymbol{A}))^{-1} \\ &= (1/\sigma^2 (\boldsymbol{D}-\phi \boldsymbol{A}))^{-1} \\ &= \sigma^2(\boldsymbol{D}-\phi \boldsymbol{A})^{-1} \end{aligned} \]

Toy CAR Example

\[ \boldsymbol{A} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \qquad\qquad \boldsymbol{D} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

\[ \boldsymbol{\Sigma} = \sigma^2 \, (\boldsymbol{D} - \phi \, \boldsymbol{W}) = \sigma^2~\begin{pmatrix} 1 & -\phi & 0 \\ -\phi & 2 & -\phi \\ 0 & -\phi & 1 \end{pmatrix}^{-1} \]

When does \(\Sigma\) exist?

check_sigma = function(phi) {
  Sigma_inv = matrix(c(1,-phi,0,-phi,2,-phi,0,-phi,1), ncol=3, byrow=TRUE) 
  solve(Sigma_inv)
}

check_sigma(phi=0)
     [,1] [,2] [,3]
[1,]    1  0.0    0
[2,]    0  0.5    0
[3,]    0  0.0    1
check_sigma(phi=0.5)
          [,1]      [,2]      [,3]
[1,] 1.1666667 0.3333333 0.1666667
[2,] 0.3333333 0.6666667 0.3333333
[3,] 0.1666667 0.3333333 1.1666667
check_sigma(phi=-0.6)
         [,1]     [,2]     [,3]
[1,]  1.28125 -0.46875  0.28125
[2,] -0.46875  0.78125 -0.46875
[3,]  0.28125 -0.46875  1.28125

check_sigma(phi=1)
Error in solve.default(Sigma_inv): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
check_sigma(phi=-1)
Error in solve.default(Sigma_inv): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
check_sigma(phi=1.2)
           [,1]      [,2]       [,3]
[1,] -0.6363636 -1.363636 -1.6363636
[2,] -1.3636364 -1.136364 -1.3636364
[3,] -1.6363636 -1.363636 -0.6363636
check_sigma(phi=-1.2)
           [,1]      [,2]       [,3]
[1,] -0.6363636  1.363636 -1.6363636
[2,]  1.3636364 -1.136364  1.3636364
[3,] -1.6363636  1.363636 -0.6363636

When is \(\Sigma\) positive semidefinite?

check_sigma_pd = function(phi) {
  Sigma_inv = matrix(c(1,-phi,0,-phi,2,-phi,0,-phi,1), ncol=3, byrow=TRUE) 
  chol(solve(Sigma_inv))
}

check_sigma_pd(phi=0)
     [,1]      [,2] [,3]
[1,]    1 0.0000000    0
[2,]    0 0.7071068    0
[3,]    0 0.0000000    1
check_sigma_pd(phi=0.5)
         [,1]      [,2]      [,3]
[1,] 1.080123 0.3086067 0.1543033
[2,] 0.000000 0.7559289 0.3779645
[3,] 0.000000 0.0000000 1.0000000
check_sigma_pd(phi=-0.6)
         [,1]       [,2]       [,3]
[1,] 1.131923 -0.4141182  0.2484709
[2,] 0.000000  0.7808688 -0.4685213
[3,] 0.000000  0.0000000  1.0000000

check_sigma_pd(phi=1)
Error in solve.default(Sigma_inv): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
check_sigma_pd(phi=-1)
Error in solve.default(Sigma_inv): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
check_sigma_pd(phi=1.2)
Error in chol.default(solve(Sigma_inv)): the leading minor of order 1 is not positive definite
check_sigma_pd(phi=-1.2)
Error in chol.default(solve(Sigma_inv)): the leading minor of order 1 is not positive definite

Conclusions

Generally speaking just like the AR(1) model for time series we require that \(|\phi| < 1\) for the CAR model to be proper.

These results for \(\phi\) also apply in the context where \(\sigma^2_i\) is constant across locations, i.e.  \[ \boldsymbol{\Sigma} = \big(\sigma^2 \, (\boldsymbol{I}-\phi \boldsymbol{D}^{-1}\boldsymbol{A})\big)^{-1} \]

As a side note, the special case where \(\phi=1\) is known as an intrinsic autoregressive (IAR) model and they are popular as an improper prior for spatial random effects. An additional sum constraint is necessary for identifiability \[\sum_{i=1}^n y(s_i) = 0\]