Introduction

xtractomatic is an R package developed to subset and extract satellite and other oceanographic related data from a remote server. The program can extract data for a moving point in time along a user-supplied set of longitude, latitude and time points; in a 3D bounding box; or within a polygon (through time). The xtractomatic functions were originally developed for the marine biology tagging community, to match up environmental data available from satellites (sea-surface temperature, sea-surface chlorophyll, sea-surface height, sea-surface salinity, vector winds) to track data from various tagged animals or shiptracks (xtracto). The package has since been extended to include the routines that extract data a 3D bounding box (xtracto_3D) or within a polygon (xtractogon). The xtractomatic package accesses data that are served through the ERDDAP server at the NOAA/SWFSC Environmental Research Division in Santa Cruz, California. In order that the user not be overwhelmed when searching for datasets, only a selected number of the datasets on that ERDDAP server, those viewed as most useful, are supported. The ERDDAP server can also be directly accessed at http://coastwatch.pfeg.noaa.gov/erddap. ERDDAP is a simple to use yet powerful web data service developed by Bob Simons.

This is the third version of xtractomatic. The original version was created by Dave Foley for Matlab and was modified for R by Cindy Bessey and Dave Foley. That version used a precurser to ERDDAP called BobDAP. BobDAP was less flexible and provided access to fewer datasets than does ERDDAP. The second version of xtractomatic was written by Roy Mendelssohn to use ERDDAP rather than BobDAP. This version, also written by Roy Mendelssohn, has many improvements including access to many more datasets, improved speed, better error checking, crude search capabilites, and plotting routines, and is the first version as a true R package.

This version also includes the latest version of the cmocean colormaps, designed by Kristen Thyng (see http://matplotlib.org/cmocean/ and https://github.com/matplotlib/cmocean). These colormaps were initally developed for Python, and a version of the colormaps is used in the oce package by Dan Kelley and Clark Richards, but the colormaps used here are the version as of late 2017. The colormaps are loaded automatically as colors, see str(colors). Several of the examples make use of the cmocean colormaps.

IMPORTANT - If you have used xtractomatic before

The code has undergone a major refactoring, in particular there is no longer separate code to handle bathymetry and other datasets without time. In order to have a cleaner design to go with these changes, the order of the arguments to the functions has changed, “tpos” no longer has to be passed for a dataset without time, and “xlen” and “ylen” have default values (see next section). Also in xtracto() the return now includes the parameter name, so instead of one element being just “mean” it will be “mean sst” for example. The changes in the function arguments makes them closer to that in rerddapXtracto, see next section.

IMPORTANT - Likely last version of xtractomatic

This will likely be the last version of xtractomatic, except for bug fixes or changes needed to stay compatible with new versions of R, or changes in ERDDAP and the applications it uses. Development effort will be put into improving the rerddapXtracto package, with the intent that users will eventually migrate to rerddapXtracto. As mentioned above, the changes in the function arguments will help to simplify this migration.

The Main xtractomatic functions

There are three main data extraction functions in the xtractomatic package:

  • xtracto <- function(dtype, xpos, ypos, tpos = NA, xlen = 0., ylen = 0., verbose=FALSE)

  • xtracto_3D <- function(dtype, xpos, ypos, tpos = NA, verbose=FALSE)

  • xtractogon <- function(dtype, xpos, ypos, tpos = NA, verbose = FALSE)

There are two information functions in the xtractomatic package:

  • `searchData <- function(searchList= “varname:chl”)

  • getInfo <- function(dtype)

The dtype parameter in the data extraction routines specifies a combination of which dataset on the ERDDAP server to access, and as well as which parameter from that dataset. The first sections demonstrate how to use these functions in realistic settings. The Selecting a dataset section shows how to find the dtype or other parameters of interest from the available datasets.

Setting up

xtractomatic uses the httr, ncdf4, readr and sp packages, and these packages must be installed first or xtractomatic will fail to install.

install.packages("httr", dependencies = TRUE)
install.packages("ncdf4",dependencies = TRUE) 
install.packages("readr", dependencies = TRUE)
install.packages("sp", dependencies = TRUE)

The xtractomatic package is available from CRAN and can be installed by:

install.packages("xtractomatic")

or the development version is available from [Github](https://github.com/rmendels/xtractomatic and can be installed from Github,

install.packages("devtools")
devtools::install_github("rmendels/xtractomatic")

Once installed, to use xtractomatic do

library("xtractomatic")

If the other libraries (httr, ncdf4, readr and sp) have been installed they will be found and do not need to be explicitly loaded.

Using the R code examples

Besides the xtractomatic packages, the examples below depend on DT, ggplot2, lubridate, mapdata, and reshape2. These can be loaded beforehand (assuming they have been installed):

library("DT")
library("ggplot2")
library("lubridate")
library("mapdata")
library("reshape2")

In order that the code snippets can be more stand-alone, the needed libraries are always required in the examples.

It should be emphasized that these other packages are used to manipulate and plot the data in R, other packages could be used as well. The use of xtractomatic does not depend on these other packages.

There are also several R functions defined within the document that are used in other code examples. These include chlaAvg, upwell, and plotUpwell.

Getting Started

An xtracto example

In this section we extract data along a trackline found in the Marlintag38606 dataset, which is the track of a tagged marlin in the Pacific Ocean (courtesy of Dr. Mike Musyl of the Pelagic Research Group LLC), and show some simple plots of the extracted data.

The Marlintag38606 dataset is loaded when you load the xtractomatic library. The structure of this dataframe is shown by:

require(xtractomatic)
str(Marlintag38606)
#> 'data.frame':    152 obs. of  7 variables:
#>  $ date  : Date, format: "2003-04-23" "2003-04-24" ...
#>  $ lon   : num  204 204 204 204 204 ...
#>  $ lat   : num  19.7 19.8 20.4 20.3 20.3 ...
#>  $ lowLon: num  204 204 204 204 204 ...
#>  $ higLon: num  204 204 204 204 204 ...
#>  $ lowLat: num  19.7 18.8 18.8 18.9 18.9 ...
#>  $ higLat: num  19.7 20.9 21.9 21.7 21.7 ...

The Marlintag38606 dataset consists of three variables, the date, longitude and latitude of the tagged fish, which are used as the xpos, ypos and tpos arguments in xtracto. The parameters xlen and ylen specify the spatial “radius” (actually a box, not a circle) around the point to search in for existing data. These can also be input as a time-dependent vector of values, but here a set value of 0.2 degrees is used. The example extracts SeaWiFS chlorophyll data, using a dataset that is an 8 day composite. This dataset is specified by the dtype of "swchla8day" in the xtracto function. In a later section it is explained how to access different satellite datasets other than the one shown here.

require(xtractomatic)

# First we will copy the Marlintag38606 data into a variable 
# called tagData  so that subsequent code will be more generic.  

tagData <- Marlintag38606
xpos <- tagData$lon
ypos <- tagData$lat
tpos <- tagData$date
swchl <- xtracto("swchla8day", xpos, ypos, tpos = tpos, , xlen = .2, ylen = .2)

This command can take a while to run depending on your internet speed and how busy is the server. With a fairly decent internet connection it took ~25 seconds. The default is to run in quiet mode, if you would prefer to see output, as confirmation that the script is running, use verbose=TRUE

When the extraction has completed the data frame swchl will contain as many columns as data points in the input file (in this case 152) and will have 11 columns:

str(swchl)
#> 'data.frame':    152 obs. of  11 variables:
#>  $ mean chlorophyll  : num  0.073 NaN 0.074 0.0653 0.0403 ...
#>  $ stdev chlorophyll : num  NA NA 0.00709 0.00768 0.02278 ...
#>  $ n                 : int  1 0 16 4 7 9 4 3 0 6 ...
#>  $ satellite date    : chr  "2003-04-19" "2003-04-27" "2003-04-27" "2003-04-27" ...
#>  $ requested lon min : num  204 204 204 204 204 ...
#>  $ requested lon max : num  204 204 204 204 204 ...
#>  $ requested lat min : num  19.6 19.7 20.3 20.2 20.2 ...
#>  $ requested lat max : num  19.8 19.9 20.5 20.4 20.4 ...
#>  $ requested date    : num  12165 12166 12172 12173 12174 ...
#>  $ median chlorophyll: num  0.073 NA 0.073 0.0645 0.031 ...
#>  $ mad chlorophyll   : num  0 NA 0.00297 0.00741 0.0089 ...

Plotting the results

We plot the track line with the locations colored according to the mean of the satellite chlorophyll around that point. Positions where there was a tag location but no chlorophyll values are also shown.

require(ggplot2)
#> Loading required package: ggplot2
require(mapdata)
#> Loading required package: mapdata
#> Loading required package: maps
# First combine the two dataframes (the input and the output) into one, 
# so it will be easy to take into account the locations that didn’t 
# retrieve a value.

alldata <- cbind(swchl, tagData)

# adjust the longitudes to be (-180,180)
alldata$lon <- alldata$lon - 360
# Create a variable that shows if chla is missing
alldata$missing <- is.na(alldata$mean) * 1
colnames(alldata)[1] <- 'mean'
# set limits of the map
ylim <- c(15, 30)
xlim <- c(-160, -105)
# get outline data for map
w <- map_data("worldHires", ylim = ylim, xlim = xlim)
# plot using ggplot
myColor <- colors$algae
z <- ggplot(alldata,aes(x = lon, y = lat)) + 
   geom_point(aes(colour = mean, shape = factor(missing)), size = 2.) + 
   scale_shape_manual(values = c(19, 1))
z + geom_polygon(data = w, aes(x = long, y = lat, group = group), fill = "grey80") + 
  theme_bw() + 
  scale_colour_gradientn(colours = myColor, limits = c(0., 0.32), "Chla") + 
  coord_fixed(1.3, xlim = xlim, ylim = ylim) + ggtitle("Mean chla values at marlin tag locations")

Comparing with the median

The xtracto routine extracts the environment data in a box (defined by xlen and ylen) around the given location, and calculates statistics of the data in that box. The median chla concentration may make more sense in this situation

require(ggplot2)
colnames(alldata)[10] <- 'median'
# plot using ggplot
myColor <- colors$algae
z <- ggplot(alldata,aes(x = lon,y = lat)) + 
   geom_point(aes(colour = median,shape = factor(missing)), size = 2.) + 
  scale_shape_manual(values = c(19, 1))
z + geom_polygon(data = w, aes(x = long, y = lat, group = group), fill = "grey80") + 
  theme_bw() + 
  scale_colour_gradientn(colours = myColor, limits = c(0., 0.32), "Chla") + 
  coord_fixed(1.3, xlim = xlim, ylim = ylim) + ggtitle(" Median chla values at marlin tag locations")