The `synlik`

package provides Synthetic Likelihood methods for intractable likelihoods. The package is meant to be
as general purpose as possible: as long as you are able to simulate data from your model you should be able to fit it.

A `synlik`

object is mainly composed of the `simulator`

, which is the function that simulates data from the model of interest.
In addition, it is possible to specify a function `summaries`

, which transforms the data generated by `simulator`

into
summary statistics. The `simulator`

can generate any kind of output, as long as `summaries`

is able to transform it into a matrix
where each row is a simulated vector of statistics. If `summaries`

is not specified, then `simulator`

has to output such a matrix.

Here we set-up a `synlik`

object representing the Ricker map. The observations are given by \(Y_t \sim Pois(\phi N_t)\), where the hidden state has the following dynamics: \(N_t = r N_{t-1} exp( -N_{t-1} + e_t )\). This is how we create the object:

```
library(synlik)
```

```
## Loading required package: Rcpp
```

```
ricker_sl <- synlik(simulator = rickerSimul,
summaries = rickerStats,
param = c( logR = 3.8, logSigma = log(0.3), logPhi = log(10) ),
extraArgs = list("nObs" = 50, "nBurn" = 50)
)
```

Here:

`rickerSimul`

and`rickerStats`

are functions provided by`synlik`

.`param`

is a named vector that contains the log-parameters that will be used by`rickerSimul(param, nsim, extraArgs, ...)`

.`extraArgs`

contains additional arguments required by`rickerSimul`

, see`?rickerSimul`

for details.

Now we are ready to simulate data from the object:

```
ricker_sl@data <- simulate(ricker_sl, nsim = 1, seed = 54)
```

Here `ricker_sl@data`

is just a vector, but **synlik** allows the simulator to simulate any kind of object, so it is often
necessary encapsulate an adequate plotting function into the object:

```
ricker_sl@plotFun <- function(input, ...) plot(drop(input), type = 'l', ylab = "Pop", xlab = "Time", ...)
plot(ricker_sl)
```

If we want to simulate several datasets we simply do:

```
tmp <- simulate(ricker_sl, nsim = 10)
dim(tmp)
```

```
## [1] 10 50
```

So far we have just simulated data, not summary statistics. In this particular example `rickerStats`

needs to be passed the reference data in `ricker_sl@data`

in order to be able to calculate the statistics. We can do that by using the slot `extraArgs`

:

```
ricker_sl@extraArgs$obsData <- ricker_sl@data
```

Now we are ready to simulate summary statistics:

```
tmp <- simulate(ricker_sl, nsim = 2, stats = TRUE)
tmp
```

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.8016316 0.0005366660 2.427694e-05 0.03743647 -0.2248205 36.90 17
## [2,] 0.9022046 0.0003317784 1.959014e-05 0.05564859 -0.2073172 37.38 21
## [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] 3180.570 -903.5390 -370.0129 -526.6177 46.91435 -203.1856
## [2,] 3555.596 -863.9911 -930.2048 547.7001 -842.12995 225.4435
```

and to check their approximate normality:

```
checkNorm(ricker_sl)
```

If we want to estimate the value of the synthetic likelihood at a certain location in the parameter space, we can do it by using the function `slik`

as follows:

```
slik(ricker_sl,
param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)),
nsim = 1e3)
```

```
## [1] -20.69029
```

We can also look at slices of this function wrt each parameter:

```
slice(object = ricker_sl,
ranges = list("logR" = seq(3.5, 3.9, by = 0.01),
"logPhi" = seq(2, 2.6, by = 0.01),
"logSigma" = seq(-2, -0.5, by = 0.02)),
param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)),
nsim = 1000)
```

Finally we can have 2D slices with respect to pairs of parameters:

```
slice(object = ricker_sl,
ranges = list("logR" = seq(3.5, 3.9, by = 0.02),
"logPhi" = seq(2, 2.6, by = 0.02)),
pairs = TRUE,
param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)),
nsim = 1000,
multicore = TRUE,
ncores = 2)
```

Notice that here we have used the `multicore`

option, which distributes the computation among different cores or cluster nodes.
Also `slik`

provides this option, but for such a simple model the time needed to set up the cluster is longer than the simulation time.

The unknown model parameters can be estimated by MCMC, using the `smcmc`

function.
Here is an example (you might want to increase `niter`

):

```
ricker_sl <- smcmc(ricker_sl,
initPar = c(3.2, -1, 2.6),
niter = 10,
burn = 3,
priorFun = function(input, ...) sum(input),
propCov = diag(c(0.1, 0.1, 0.1))^2,
nsim = 500)
```

Notice that `priorFun`

returns the log-density of the prior. If we have not reached convergence we can do some more MCMC iterations by using the `continue`

generic:

```
ricker_sl <- continue(ricker_sl, niter = 10)
```

Finally we can plot the MCMC output (here we plot a pre-computed object):

```
data(ricker_smcmc)
addline1 <- function(parNam, ...) abline(h = ricker_smcmc@param[parNam], lwd = 2, lty = 2, col = 3)
addline2 <- function(parNam, ...) abline(v = ricker_smcmc@param[parNam], lwd = 2, lty = 2, col = 3)
plot(ricker_smcmc, addPlot1 = "addline1", addPlot2 = "addline2")
```

```
## [1] "Plotting the MCMC chains"
```

```
## Press <Enter> to see the next plot...
## [1] "The posterior densities"
```

```
## Press <Enter> to see the next plot...
## [1] "Plotting the log-likelihood chain"
```

```
## Press <Enter> to see the next plot...
## [1] "Plotting correlation structure of the posterior sample"
```

```
## Press <Enter> to see the next plot...
```

were we have added the green dotted lines, indicating the position of the true parameters.

As a more challenging example, let us consider the Blowfly model proposed by Wood (2010):
\[
N_{t} = R_t + S_t,
\]
where
\[
R_t \sim \text{Pois}(PN_{t-\tau}e^{-\frac{N_{t-\tau}}{N_0}}e_t),
\]
represents delayed recruitment process, while:
\[
S_t \sim \text{binom}(e^{-\delta\epsilon_t}, N_{t-1}),
\]
denotes the adult survival process. Finally \(e_t\) and \(\epsilon_t\) are independent gamma distributed random variables, with unit means and variances equal to \(\sigma_p^2\) and \(\sigma_d^2\) respectively. We start by creating a `synlik`

object:

```
blow_sl <- synlik(simulator = blowSimul,
summaries = blowStats,
param = log( c( "delta" = 0.16, "P" = 6.5, "N0" = 400,
"var.p" = 0.1, "tau" = 14, "var.d" = 0.1) ),
extraArgs = list("nObs" = 200, "nBurn" = 200, "steps" = 1),
plotFun = function(input, ...){
plot(drop(input), type = 'l', ylab = "Pop", xlab = "Time", ...)
}
)
```

for more details see `?blow_sl`

. As before we simulate data and we store it back into the object:

```
blow_sl@data <- simulate(blow_sl, seed = 84)
blow_sl@extraArgs$obsData <- blow_sl@data
```

As for the Ricker example, `blowStats`

needs the observed data to calculate the statistics, hence we store it
into `extraArgs$obsData`

. Notice that this is just a peculiarity of the chosen `summaries`

function.
Then, we can estimate the parameters by adaptive MCMC (you might want to increase `niter`

):

```
blow_sl <- smcmc(blow_sl,
initPar = log( c( "delta" = 0.1, "P" = 8, "N0" = 600,
"sig.p" = 0.2, "tau" = 17, "sig.d" = 0.3) ),
niter = 2,
burn = 0,
propCov = diag(rep(0.001, 6)),
nsim = 500,
prior = function(input, ...){
sum(input) +
dunif(input[4], log(0.01), log(1), log = TRUE) +
dunif(input[6], log(0.01), log(1), log = TRUE)
},
targetRate = 0.15,
multicore = FALSE
)
```

We plot the results on the original scale (here we plot a pre-computed object):

```
data(blow_smcmc)
tmpTrans <- rep("exp", 6)
names(tmpTrans) <- names(blow_smcmc@param)
plot(blow_smcmc, trans = tmpTrans)
```

```
## [1] "Plotting the MCMC chains"
```

```
## Press <Enter> to see the next plot...
## [1] "The posterior densities"
```

```
## Press <Enter> to see the next plot...
## [1] "Plotting the log-likelihood chain"
```

```
## Press <Enter> to see the next plot...
## [1] "Plotting correlation structure of the posterior sample"
```

```
## Press <Enter> to see the next plot...
```

The package includes some of Nicholson (1954) experimental datasets, which can be loaded into the object:

```
data(bf1)
blow_sl@data <- bf1$pop
blow_sl@extraArgs$obsData <- blow_sl@data
```

Then it is possible to fit the model using the experimental data by MCMC.

Let us consider a model that does not produce time series data: the alpha-stable distribution, as
described in Nolan (2012). For a quick reference do `library("stableDist")`

and `?rstable`

.
As a first step we need to wrap the function `rstable`

, which simulates random variables from this distribution,
into a function that fits the `synlik`

framework:

```
stableSimul <- function(param, nsim, extraArgs, ...)
{
if( !is.loaded("stabledist") ) library(stabledist)
# Some sanity check
if( !( c("nObs") %in% names(extraArgs) ) ) stop("extraArgs should contain nObs")
nObs <- extraArgs$nObs
stopifnot( length(param) == 4 )
param[c(1, 3)] <- exp(param[c(1, 3)])
if(abs(param[1] - 1) < 0.01) stop("alpha == 1 is not allowed")
# Actual simulation
output <- rstable(nObs * nObs, alpha = param[1], beta = param[2], gamma = param[3], delta = param[4])
return( matrix(output, nsim, nObs) )
}
```

We need also to set up a function that calculates the summary statistics, which in our case are 10 quantiles:

```
stableStats <- function(x, extraArgs, ...){
if (!is.matrix(x)) x <- matrix(x, 1, length(x))
X0 <- t( apply(x, 1, quantile, probs = seq(0.1, 0.9, length.out = 10)) )
unname(X0)
}
```

We then create a `synlik`

object:

```
stable_sl <- synlik( simulator = stableSimul,
summaries = stableStats,
param = c(alpha = log(1.5), beta = 0.1, gamma = log(1), delta = 2),
extraArgs = list("nObs" = 1000),
plotFun = function(input, ...) hist(input, xlab = "x", main = "The data", ...)
)
stable_sl@data <- simulate(stable_sl, seed = 67)
plot(stable_sl)
```

As we see from the histogram, the distribution is quite fat-tailed.
As before, can then estimate the parameters by MCMC (you might want to increase `niter`

):

```
stable_sl <- smcmc(stable_sl,
initPar = c(alpha = log(1.7), beta = -0.1, gamma = log(1.3), delta = 1.5),
niter = 2,
burn = 0,
priorFun = function(input, ...) {
dunif(input[1], log(1), log(2), log = TRUE) +
dunif(input[2], -1, 1, log = TRUE)
},
propCov = diag(c(0.1, 0.1, 0.1, 0.1))^2,
targetRate = 0.25,
nsim = 200)
# plot(stable_sl, trans = c("alpha" = "exp", "gamma" = "exp"))
```

By slicing the likelihood we check whether the model is well identified:

```
slice(object = stable_sl,
ranges = list("alpha" = log(seq(1.2, 1.9, by = 0.05)),
"beta" = seq(-0.5, 0.5, by = 0.05),
"gamma" = log(seq(0.5, 1.9, by = 0.05)),
"alpha" = seq(1.1, 2.2, by = 0.05)),
param = stable_sl@param,
trans = list("alpha" = "exp", "gamma" = "exp"),
nsim = 1000,
multicore = TRUE,
ncores = 2)
```

We probably could do better by putting some more effort in choosing the summary statistics.

Simon N Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310):1102–1104, 2010.

Alexander J Nicholson. An outline of the dynamics of animal populations. Australian Journal of Zoology, 2(1):9–65, 1954.

Diethelm Wuertz, Martin Maechler and Rmetrics core team members. (2013). stabledist: Stable Distribution Functions. R package version 0.6–6.

h