# Iterative INLA method

## The INLA method for linear predictors

The INLA method computes fast approximate posterior distributions for Bayesian generalised additive models. The hierarchical structure of such a model, with latent Gaussian components $$\boldsymbol{u}$$, covariance parameters $$\boldsymbol{\theta}$$, and measured response variables $$\boldsymbol{y}$$, can be written as $\begin{aligned} \boldsymbol{\theta} &\sim p(\boldsymbol{\theta}) \\ \boldsymbol{u}|\boldsymbol{\theta} &\sim \mathcal{N}\!\left(\boldsymbol{\mu}_u, \boldsymbol{Q}(\boldsymbol{\theta})^{-1}\right) \\ \boldsymbol{\eta}(\boldsymbol{u}) &= \boldsymbol{A}\boldsymbol{u} \\ \boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta} & \sim p(\boldsymbol{y}|\boldsymbol{\eta}(\boldsymbol{u}),\boldsymbol{\theta}) \end{aligned}$ where typically each linear predictor element, $$\eta_i(\boldsymbol{u})$$, is linked to a location parameter of the distribution for observation $$y_i$$, for each $$i$$, via a (non-linear) link function $$g^{-1}(\cdot)$$. In the R-INLA implementation, the observations are assumed to be conditionally independent given $$\boldsymbol{\eta}$$ and $$\boldsymbol{\theta}$$.
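The generative structure above can be illustrated with a small simulation sketch. This is not the INLA method itself, only a draw from the hierarchical model; the dimensions, the log-precision prior for $$\boldsymbol{\theta}$$, the choice $$\boldsymbol{Q}(\boldsymbol{\theta})=e^{\theta}\boldsymbol{I}$$, and the Poisson likelihood with log link are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and model components, chosen for illustration.
n_u, n_obs = 3, 5
A = rng.normal(size=(n_obs, n_u))  # fixed design/projection matrix
mu_u = np.zeros(n_u)               # prior mean of the latent field

# theta ~ p(theta): here a single log-precision with a N(0, 1) prior.
theta = rng.normal()

# u | theta ~ N(mu_u, Q(theta)^{-1}); Q(theta) = exp(theta) * I for simplicity.
Q = np.exp(theta) * np.eye(n_u)
u = rng.multivariate_normal(mu_u, np.linalg.inv(Q))

# Linear predictor eta(u) = A u, mapped through the inverse link g^{-1}
# (log link, so g^{-1} is the exponential).
eta = A @ u

# y | u, theta: conditionally independent observations, here Poisson.
y = rng.poisson(np.exp(eta))
```

Each $$y_i$$ depends on $$\boldsymbol{u}$$ only through $$\eta_i(\boldsymbol{u})$$, which is the conditional-independence structure R-INLA exploits.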

## Approximate INLA for non-linear predictors

The premise of the inlabru method for non-linear predictors is to build on the existing R-INLA implementation for linear predictors, adding only a linearisation step. The properties of the resulting approximation will depend on the nature of the non-linearity.

Let $$\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})$$ be a non-linear predictor, and let $$\overline{\boldsymbol{\eta}}(\boldsymbol{u})$$ be its first-order Taylor approximation at $$\boldsymbol{u}_0$$, $\overline{\boldsymbol{\eta}}(\boldsymbol{u}) = \widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0) + \boldsymbol{B}(\boldsymbol{u} - \boldsymbol{u}_0) = \left[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_0) - \boldsymbol{B}\boldsymbol{u}_0\right] + \boldsymbol{B}\boldsymbol{u} ,$ where $$\boldsymbol{B}$$ is the derivative matrix of the non-linear predictor, evaluated at $$\boldsymbol{u}_0$$.
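The linearisation can be sketched numerically. The predictor `eta_tilde` below is a hypothetical non-linear example, and `jacobian` approximates $$\boldsymbol{B}$$ by finite differences (the actual implementation may compute derivatives differently):

```python
import numpy as np

# Hypothetical non-linear predictor eta~(u); the quadratic and exponential
# terms make it non-linear in u.
def eta_tilde(u):
    return np.array([u[0] + u[1] ** 2, np.exp(u[0]) * u[1]])

def jacobian(f, u0, h=1e-6):
    """Finite-difference approximation of B = d eta~/du, evaluated at u0."""
    f0 = f(u0)
    B = np.empty((f0.size, u0.size))
    for j in range(u0.size):
        step = np.zeros_like(u0)
        step[j] = h
        B[:, j] = (f(u0 + step) - f0) / h
    return B

u0 = np.array([0.5, 1.0])
B = jacobian(eta_tilde, u0)

def eta_bar(u):
    # eta_bar(u) = eta~(u0) + B (u - u0)
    return eta_tilde(u0) + B @ (u - u0)
```

The approximation is exact at $$\boldsymbol{u}_0$$ and degrades with the distance from $$\boldsymbol{u}_0$$, which is why the choice of linearisation point matters below.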

The non-linear observation model $$p(\boldsymbol{y}|g^{-1}[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta})$$ is approximated by replacing the non-linear predictor with its linearisation, so that the linearised model is defined by $\overline{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta}) = p(\boldsymbol{y}|\overline{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta}) = p(\boldsymbol{y}|g^{-1}[\overline{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta}) \approx p(\boldsymbol{y}|g^{-1}[\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})],\boldsymbol{\theta}) = p(\boldsymbol{y}|\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}),\boldsymbol{\theta}) = \widetilde{p}(\boldsymbol{y}|\boldsymbol{u},\boldsymbol{\theta}) .$ The non-linear model posterior is factorised as $\widetilde{p}(\boldsymbol{\theta},\boldsymbol{u}|\boldsymbol{y}) = \widetilde{p}(\boldsymbol{\theta}|\boldsymbol{y})\widetilde{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta}),$ and the linear model approximation is factorised as $\overline{p}(\boldsymbol{\theta},\boldsymbol{u}|\boldsymbol{y}) = \overline{p}(\boldsymbol{\theta}|\boldsymbol{y})\overline{p}(\boldsymbol{u}|\boldsymbol{y},\boldsymbol{\theta}) .$

### Fixed point iteration

The remaining step of the approximation is how to choose the linearisation point $$\boldsymbol{u}_0$$. We start by introducing a functional $$f(\overline{p}_{\boldsymbol{v}})$$ of the posterior distribution linearised at $$\boldsymbol{v}$$, that generates a latent field configuration. We then seek a fixed point of the functional, so that $$\boldsymbol{u}_0=f(\overline{p}_{\boldsymbol{u}_0})$$. Potential choices for $$f(\cdot)$$ include the posterior expectation $$\overline{E}(\boldsymbol{u}|\boldsymbol{y})$$ and the joint conditional mode, $f(\overline{p}_{\boldsymbol{v}})=\mathop{\mathrm{arg\,max}}_{\boldsymbol{u}} \overline{p}_{\boldsymbol{v}}(\boldsymbol{u}|\boldsymbol{y},\widehat{\boldsymbol{\theta}}),$ where $$\widehat{\boldsymbol{\theta}}=\mathop{\mathrm{arg\,max}}_{\boldsymbol{\theta}} \overline{p}_{\boldsymbol{v}}(\boldsymbol{\theta}|\boldsymbol{y})$$ (used in inlabru from version 2.2.3). One key to the fixed point iteration is that the observation model is linked to $$\boldsymbol{u}$$ only through the non-linear predictor $$\widetilde{\boldsymbol{\eta}}(\boldsymbol{u})$$.

1. Let $$\boldsymbol{u}_0$$ be an initial linearisation point for the latent variables.
2. Compute the predictor linearisation at $$\boldsymbol{u}_0$$.
3. Compute the linearised INLA posterior $$\overline{p}(\boldsymbol{\theta}|\boldsymbol{y})$$.
4. Let $$\boldsymbol{u}_1=f(\overline{p}_{\boldsymbol{u}_0})$$ be the initial candidate for the new linearisation point.
5. Let $$\boldsymbol{u}_\alpha=(1-\alpha)\boldsymbol{u}_0+\alpha\boldsymbol{u}_1$$, and find the value $$\alpha$$ that minimises $$\|\widetilde{\boldsymbol{\eta}}(\boldsymbol{u}_\alpha)-\overline{\boldsymbol{\eta}}(\boldsymbol{u}_1)\|$$.
6. Set the linearisation point equal to $$\boldsymbol{u}_\alpha$$ and repeat from step 2, unless the iteration has converged to a given tolerance.

1. Note: In inlabru version 2.1.15, $f(\overline{p}_{\boldsymbol{v}})=\left\{\mathop{\mathrm{arg\,max}}_{u_i} \overline{p}_{\boldsymbol{v}}(u_i|\boldsymbol{y}),\,i=1,\dots,n\right\},$ was used, which caused problems for some non-linear models.