Instead we need to think about things in terms of their neighbors / neighborhoods. We define \(N(s_i)\) to be the set of neighbors of location \(s_i\).
If we define the neighborhood based on “touching” then \(N(s_3) = \{s_2, s_4\}\)
If we use distance within 2 units then \(N(s_3) = \{s_1,s_2,s_3,s_4\}\)
Defining the Spatial AR model
Here we will consider a simple average of neighboring observations, just like with the temporal AR model we have two options in terms of defining the autoregressive process,
Using \[ y(s) = \phi \frac{1}{|N(s)|}\sum_{s' \in N(s)} y(s') + N(0,\sigma^2) \] we want to find the distribution of \(\boldsymbol{y} = \Big(y(s_1),\, y(s_2),\,\ldots,\,y(s_n)\Big)^t\).
First we can define a weight matrix \(\boldsymbol{W}\) where \[
\{\boldsymbol{W}\}_{ij} = \begin{cases}
1/|N(s_i)| & \text{if $j \in N(s_i)$} \\
0 & \text{otherwise}
\end{cases}
\] then we can write \(\boldsymbol{y}\) as follows, \[ \boldsymbol{y} = \phi \, \boldsymbol{W} \, \boldsymbol{y} + \boldsymbol{\epsilon} \] where \[ \boldsymbol{\epsilon} \sim N(0,\sigma^2 \, \boldsymbol{I}) \]
This is a bit trickier, in the case of the temporal AR process we actually went from joint distribution \(\to\) conditional distributions (which we were then able to simplify).
Since we don’t have a natural ordering we can’t get away with this (at least not easily).
Going the other way, conditional distributions \(\to\) joint distribution is difficult because it is possible to specify conditional distributions that lead to an improper joint distribution.
Brooks’ Lemma
For sets of observations \(\boldsymbol{x}\) and \(\boldsymbol{y}\) where \(p(x) > 0~~\forall ~ x\in\boldsymbol{x}\) and \(p(y) > 0~~\forall ~ y\in\boldsymbol{y}\) then
Adopting different weight matrices \((\boldsymbol{W})\)
Between SAR and CAR model we move to a generic weight matrix definition (beyond average of nearest neighbors)
In time we varied \(p\) in the \(AR(p)\) model, in space we adjust the weight matrix.
In general having a symmetric W is helpful, but not required
More complex Variance (beyond \(\sigma^2 \, I\))
\(\sigma^2\) can be a vector (differences between areal locations)
i.e. since areal data tends to be aggregated - adjust variance based on sample size
i.e. scale based on the number of neighbors
Some specific generalizations
Generally speaking we want to work with a scaled / normalized version of the weight matrix, \[ W_{ij}/W_{i\boldsymbol{\cdot}} \]
When \(W\) is derived from an adjacency matrix, \(\boldsymbol{A}\), we can express this as \[ \boldsymbol{W} = \boldsymbol{D}^{-1} \boldsymbol{A} \] where \(\boldsymbol{D}^{-1} = \text{diag}(1/|N(s_i)|)\).
We can also allow \(\sigma^2\) to vary between locations, we can define this as \(\boldsymbol{D}_{\sigma^2} = \text{diag}(\sigma^2_i)\) and most often we use \[ \boldsymbol{D}_{\sigma^2}^{-1} = \text{diag}\left(\frac{\sigma^2}{|N(s_i)|}\right) = \sigma^2 \boldsymbol{D}^{-1}. \]
Error in solve.default(Sigma_inv): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
check_sigma_pd(phi=-1)
Error in solve.default(Sigma_inv): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
check_sigma_pd(phi=1.2)
Error in chol.default(solve(Sigma_inv)): the leading minor of order 1 is not positive definite
check_sigma_pd(phi=-1.2)
Error in chol.default(solve(Sigma_inv)): the leading minor of order 1 is not positive definite
Conclusions
Generally speaking just like the AR(1) model for time series we require that \(|\phi| < 1\) for the CAR model to be proper.
These results for \(\phi\) also apply in the context where \(\sigma^2_i\) is constant across locations, i.e. \[
\boldsymbol{\Sigma} = \big(\sigma^2 \, (\boldsymbol{I}-\phi \boldsymbol{D}^{-1}\boldsymbol{A})\big)^{-1}
\]
As a side note, the special case where \(\phi=1\) is known as an intrinsic autoregressive (IAR) model and they are popular as an improper prior for spatial random effects. An additional sum constraint is necessary for identifiability \[\sum_{i=1}^n y(s_i) = 0\]