foieGras-basics

2019-10-07

Disclaimer

This vignette is an extended set of examples to highlight the foieGras package’s functionality. Please, do NOT interpret these examples as instructions for conducting analysis of animal movement data. Numerous essential steps in a proper analysis have been left out of this document. It is your job to understand your data, ensure you are asking the right questions of your data, and that the analyses you undertake appropriately reflect those questions. We can not do this for you!

foieGras models

This vignette provides a (very) brief overview of how to use foieGras to filter animal track locations obtained via the Argos satellite system or via processed light-level geolocation (GLS). foieGras provides two state-space models (SSM’s) for filtering (ie. estimating “true” locations and associated movement model parameters, while accounting for error-prone observations):

Both models are continuous-time models, that is, they account for time intervals between successive observations, thereby naturally accounting for the irregularly-timed nature of most Argos data. We won’t dwell on the details of the models here, those will come in a future paper, except to say there may be advantages to choosing one over the other in certain circumstances. The Random Walk model tends not to deal well with small to moderate gaps (relative to a specified time step) in observed locations and can over-fit to particularly noisy data. The Correlated Random Walk model can often deal better with these small to moderate data gaps and smooth through noisy data but tends to estimate nonsensical movement through larger data gaps.

Additionally, foieGras provides fast models (mpm, jmpm) for estimating a behavioural index along animals’ tracks (see Jonsen et al. 2019 Ecology 100:e02566 for details). The mpm is fit to individual tracks, whereas the jmpm is fit to multiple individual track simultaneously with a variance parameter that is estimated jointly across the tracks. This latter model can often better resolve subtle changes in movement behaviour along tracks that lack much contrast in movements.

input data

foieGras expects data to be provided in one of several possible formats.

  1. a data.frame or tibble that looks like this
#> # A tibble: 6 x 8
#>   id    date                lc      lon   lat  smaj  smin   eor
#>   <chr> <dttm>              <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 54591 2012-03-05 05:09:33 1      111. -66.4  2442   416    42
#> 2 54591 2012-03-06 04:55:14 0      110. -66.4 49660   391    90
#> 3 54591 2012-03-07 04:23:10 A      110. -66.5  5032   472    93
#> 4 54591 2012-03-07 21:23:06 A      110. -66.4  4007   286   116
#> 5 54591 2012-03-09 04:27:49 B      110. -66.5 13063   956    82
#> 6 54591 2012-03-10 00:10:41 A      111. -66.4  5099   478    79

where the Argos data are provided via CLS Argos’ Kalman filter model (KF) and include error ellipse information for each observed location.

  1. a data.frame or tibble that looks like this
#> # A tibble: 6 x 5
#>   id        date                lc      lon   lat
#>   <chr>     <dttm>              <fct> <dbl> <dbl>
#> 1 ct36-F-09 2009-02-10 19:42:44 A      70.6 -49.7
#> 2 ct36-F-09 2009-02-11 07:56:36 A      70.2 -50.2
#> 3 ct36-F-09 2009-02-12 01:53:07 A      70.1 -51.1
#> 4 ct36-F-09 2009-02-12 19:06:55 B      69.5 -52.0
#> 5 ct36-F-09 2009-02-13 12:13:19 B      71.0 -53.1
#> 6 ct36-F-09 2009-02-14 01:10:58 B      70.1 -53.4

where the Argos data are provided via CLS Argos’ Least-Squares model (LS) and do not include error ellipse information.

  1. a data.frame or tibble that includes observations with missing KF error ellipse information
#> # A tibble: 6 x 8
#>   id    date                lc      lon   lat  smaj  smin   eor
#>   <chr> <dttm>              <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 54591 2012-03-05 05:09:33 1      111. -66.4  2442   416    42
#> 2 54591 2012-03-06 04:55:14 0      110. -66.4 49660   391    90
#> 3 54591 2012-03-07 04:23:10 A      110. -66.5    NA    NA    NA
#> 4 54591 2012-03-07 21:23:06 A      110. -66.4    NA    NA    NA
#> 5 54591 2012-03-09 04:27:49 B      110. -66.5    NA    NA    NA
#> 6 54591 2012-03-10 00:10:41 A      111. -66.4  5099   478    79

in this situation, foieGras treats observations with missing error ellipse information as though they are LS-based observations.

  1. an sf object where observations have any of the previous 3 structures and also include CRS information
#> Simple feature collection with 6 features and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 2426.138 ymin: -912.3109 xmax: 2437.96 ymax: -903.7763
#> epsg (SRID):    NA
#> proj4string:    +proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=km +no_defs
#> # A tibble: 6 x 7
#>   id    date                lc     smaj  smin   eor             geometry
#>   <chr> <dttm>              <chr> <dbl> <dbl> <dbl>         <POINT [km]>
#> 1 54591 2012-03-05 05:09:33 1      2442   416    42 (2430.943 -912.3109)
#> 2 54591 2012-03-06 04:55:14 0     49660   391    90  (2437.96 -903.7763)
#> 3 54591 2012-03-07 04:23:10 A      5032   472    93 (2429.753 -907.3739)
#> 4 54591 2012-03-07 21:23:06 A      4007   286   116 (2437.367 -905.2335)
#> 5 54591 2012-03-09 04:27:49 B     13063   956    82 (2426.138 -905.8045)
#> 6 54591 2012-03-10 00:10:41 A      5099   478    79 (2431.234 -909.0692)
  1. a data.frame, tibble or sf object where processed GLS data are provided and include longitude and latitude error SD’s (in degrees)
#> # A tibble: 5 x 7
#>      id date                lc      lon   lat lonerr laterr
#>   <dbl> <dttm>              <chr> <dbl> <dbl>  <dbl>  <dbl>
#> 1 54632 2019-10-07 12:37:03 G      100    -55 0.191   0.179
#> 2 54632 2019-10-08 00:37:03 G      100.   -54 0.731   1.78 
#> 3 54632 2019-10-08 12:37:03 G      101    -53 0.328   4.09 
#> 4 54632 2019-10-09 00:37:03 G      102.   -52 0.793   0.359
#> 5 54632 2019-10-09 12:37:03 G      102    -51 0.0566  2.94

fitting a foieGras model

model fitting for quality control of locations is comprised of 2 steps: a prefilter step where a number of checks are made on the input data (see ?foieGras::prefilter for details), including applying the argsofilter::sdafilter to identify extreme outlier observations. Additionally, if the input data are not supplied as an sf object, prefilter guesses at an appropriate projection (typically world mercator, EPSG 3395) to apply to the data. The SSM is then fit to this projected version of the data. Users invoke this process via the fit_ssm function:

## load foieGras example data
data(ellie)
## prefilter and fit Random Walk SSM using a 24 h time step
fit <- fit_ssm(ellie, model = "rw", time.step = 24, verbose = 0)
#> Warning in sqrt(as.numeric(object$diag.cov.random)): NaNs produced

these are the minimum arguments required: the input data, the model (“rw” or “crw”) and the time.step (in h) to which locations are predicted. Additional control can be exerted over the prefiltering step, via the vmax, ang, distlim, spdf and min.dt arguments. see ?foieGras::fit_ssm for details, the defaults for these arguments are quite conservative, usually leading to relative few observations being flagged to be ignored by the SSM. Additional control over the SSM fitting step can also be exerted but these should rarely need to be accessed by users and will not be dealt with here.

accessing and visualizing model fit objects

Simple summary information about the foieGras fit can be obtained by calling the fit object:

fit$ssm[[1]]
#> Process model: rw 
#> Time interval: 24 hours 
#> number of observations: 64 
#> number of regularised state estimates: 113 
#> 
#> parameter estimates
#> -------------------
#>       Estimate Std. Error
#> rho_p   -0.848      0.037
#> sigma  103.777      9.283
#> sigma   89.944      8.026
#> rho_o    0.000      0.000
#> tau      1.000      0.000
#> tau      1.000      0.000
#> psi      1.007        NaN
#> -------------------
#> negative log-likelihood: 732.1398 
#> convergence: CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH

and a summary plot method allows a quick visual of the SSM fit to the data:

# plot time-series of the predicted values
plot(fit, what = "predicted", type = 1)

plot(fit, what = "fitted", type = 2)

The predicted values (red) are the state estimates predicted at regular time intervals, specified by time.step (here every 24 h). These estimates are plotted on top of the observations that passed the prefilter stage (blue points and blue rug at bottom). Fitted values are the state estimates corresponding to the time of each observation; their time series are plotted by default - plot(fit). A 2-D time series plot of the track is invoked by the argument type = 2.

As SSMs are latent variable models, evaluating their goodness of fit is less straightforward than simpler statistical models without latent variables. We can use One-Step-Ahead (prediction) residuals via foieGras::osar. Here we use osar to compare SSM fits of the rw and crw model to the same example southern elephant seal data.


## fit crw SSM
fitc <- fit_ssm(ellie, model = "crw", time.step = 24, verbose = 0)

## calculate OSA resids for both models
fit_res <- osar(fit)
fitc_res <- osar(fitc)

## plot residuals
plot(fit_res)

plot(fitc_res)

The crw model appears provide a better fit than the rw model, with standardized OSA residuals conforming more closely to a theoretical Normal distribution. One note of caution when calculating OSA residuals, the underlying TMB::oneStepPredict method is currently experimental and can require considerable computation time, especially when calculating across multiple individual fits.

Estimated tracks can be mapped using the fmap function, which uses the foieGras-applied projection (Global Mercator). Projections can be changed easily via the crs argument in the form of a proj4string (as in the example, below).

## map ssm-predicted values without observations
fmap(fitc, what = "predicted", obs = FALSE)


## change projection to Antarctic Polar Stereographic centred on 
##  the approximate mid-point of the track
fmap(fitc, what = "predicted", 
     crs = "+proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=85 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=km +no_defs")

The estimated locations can be accessed for further analysis, custom mapping, etc… by using the grab function. They can be returned as a projected sf tibble or as a simple unprojected tibble. Note, that for all foieGras outputs the x, y, x.se and y.se units are in km.

## grab fitted locations from fit object as a projected sf object 
plocs_sf <- grab(fitc, what = "f")

## grab predicted locations in unprojected form, returning as a tibble
plocs <- grab(fitc, "p", as_sf = FALSE)

## unprojected form looks like this
plocs
#> # A tibble: 113 x 12
#>    id    date                  lon   lat      x      y     x.se     y.se
#>    <chr> <dttm>              <dbl> <dbl>  <dbl>  <dbl>    <dbl>    <dbl>
#>  1 54591 2012-03-05 05:00:00  111. -66.4 12309. -9956. 10.00e-6 10.00e-6
#>  2 54591 2012-03-06 05:00:00  111. -66.4 12303. -9947.  1.31e+1  7.57e-3
#>  3 54591 2012-03-07 05:00:00  110. -66.5 12297. -9963.  3.46e+0  2.34e-1
#>  4 54591 2012-03-08 05:00:00  110. -66.4 12286. -9946.  6.06e+0  5.28e+0
#>  5 54591 2012-03-09 05:00:00  110. -66.5 12296. -9973.  7.52e+0  1.07e+0
#>  6 54591 2012-03-10 05:00:00  111. -66.4 12305. -9956.  4.70e+0  2.83e+0
#>  7 54591 2012-03-11 05:00:00  111. -66.5 12328. -9970.  3.62e+0  3.07e+0
#>  8 54591 2012-03-12 05:00:00  111. -66.5 12306. -9967.  1.54e+1  3.65e+0
#>  9 54591 2012-03-13 05:00:00  110. -66.4 12291. -9953.  5.97e+0  4.87e+0
#> 10 54591 2012-03-14 05:00:00  110. -66.4 12290. -9939.  4.76e+0  4.40e+0
#> # … with 103 more rows, and 4 more variables: u <dbl>, v <dbl>,
#> #   u.se <dbl>, v.se <dbl>

fit_ssm can be applied to single tracks as shown, it can also fit to multiple individual tracks in a single input tibble opr data.frame. The SSM is fit to each individual separately. The resulting output is a compound tibble with rows corresponding to each individual foieGras fit object. The converged column indicates whether each model fit converged successfully.

# load 2 southern elephant seal example data
data(ellies)

fit2 <- fit_ssm(ellies, vmax = 10, model = "crw", time.step = 48, verbose = 0)

# list fit outcomes for both seals
fit2
#> # A tibble: 2 x 3
#>   id         ssm    converged
#>   <chr>      <list> <lgl>    
#> 1 ct36-F-09  <ssm>  TRUE     
#> 2 ct96-16-13 <ssm>  TRUE

individual id is displayed in the 1st column, all fit output (ssm) in the 2nd column, and convergence status of each model fit is displayed in the 3rd column

fmap automatically handles ssm fit objects with multiple individuals, plotting all on a single map

## map predicted values and observations
fmap(fit2, "p", obs = TRUE)

A behavioural index can be estimated from locations provided they occur regularly in time and they either have minimal location error (i.e. GPS data) or they have been ssm filtered. We can fit the mpm to foieGras ssm-predicted locations. Here we use the ssm fits to the s. elephant seal data

## fit mpm separately to each individual track
fmp <- fit2 %>% 
  grab(., "p", as_sf = FALSE) %>%
  select(id, date, lon, lat) %>%
  fit_mpm(., model = "mpm")
#> 
#> fitting mpm...

fmp
#> # A tibble: 2 x 3
#>   id         mpm    converged
#>   <chr>      <list> <lgl>    
#> 1 ct36-F-09  <mpm>  TRUE     
#> 2 ct96-16-13 <mpm>  TRUE

We can visualize the estimated behavioural index (move persistence) as a time series for each seal. The move persistence parameter \(g_t\) ranges continuously from 0 (little persistence, indicative of area-restricted movements) to 1 (high persistence, indicative of directed movements).

## plot mpm estimates by individual seal
grab(fmp, "fitted") %>% 
  ggplot() +
  geom_point(aes(date, g, colour = g)) +
  scale_colour_viridis_c(limits = c(0,1)) +
  ylim(0,1) +
  facet_wrap(~ id, scales = "free_x", ncol = 1)

A joint move persistence model jmpm is also available for fitting to multiple individuals. This model fits to all individuals simultaneously, estimating a joint random walk variance parameter that can often better resolve subtle variations in \(g_t\).

We can explore the spatio-temporal variation in movement behaviour by plotting the \(g_t\) values along each seal’s track, but first we have to merge the ssm-predicted locations with the move persistence estimates using foieGras::join()


## join ssm predicted locations and move persistence values together
fmp_locs <- join(fit2, fmp, as_sf = FALSE)

ggplot(fmp_locs) +
  geom_point(aes(lon, lat, colour = g)) +
  scale_colour_viridis_c(limits = c(0,1))