# Blink and Artifact Cleanup (Detailed)

#### 2020-03-09

This vignette contains detailed information regarding the automatic and manual cleanup of blinks and artifacts in the pupil size data. The package contains functions which are meant to provide cleaning that is reproducible and that can be done across multiple sessions.

## Automatic cleanup

There are two functions (clean_blink and clean_artifact) for detecting and removing artifacts related to blinks and other eye movement. The functions, which can be used independently or sequentially, implement differential and distributional detection algorithms (explained in more detail below). The detected data points are then marked with NA. In both cases, the algorithm can be adjusted using parameters which control the scope and sensitivity of the detection. Note that the two functions can either be used in conjunction with one another sequentially, or completely independently.
The example below shows how to use them sequentially

Here will we will focus on Event 16892.8, seen in the figure below.
There is one marked blink (the first) and one un-marked artifact (the second).

### Verifying the cleanup

Because clean_blink is an automatic cleaning function, users may want to verify the effect of the cleaning. This may be helpful in determining if cleaning was effective, or to selective revert events for which cleaning was overly aggressive. The function verify_cleanup_app opens and interactive app and loads the desired log file. The user can quickly scan through the events, and if desired, can click the Revert Event Cleanup button to revert the event back to its most recent previous state (prior to running the cleanup function). Upon clicking the revert button, the status of the event in the log file will be changed and the file will be rewritten into the working directory.

verify_cleanup_app(datblink, LogFile = paste0(tempdir(),"/BlinkCleanupLog.rds"))

Importantly, this app only modifies the entry in the log file, not the pupil size data in the data frame. This is to provide both a record of changes as well as control over the processing. In order to carry out any modifications to the cleanup, it is necessary to use the function apply_cleanup_change. As the changes are based solely on the log file, the specific filename must be provided.

datblink <- apply_cleanup_change(datblink, LogFile = paste0(tempdir(),"/BlinkCleanupLog.rds"))
## Loading file /tmp/RtmpBoEzMG/BlinkCleanupLog.rds
## Changing cleanup based on  /tmp/RtmpBoEzMG/BlinkCleanupLog.rds

### Unmarked artifacts

Not all artifacts related to blinks are automatically detected by the SR eyetracking software, see figure above. To detect these unmarked blinks and other artifacts, a second function, clean_artifact, is provided.
It implements a distributional method (described in more detail below) to detect potentially extreme data points.

This algorithm first divides the times series into windows, the size of which is specified in milliseconds using MADWindow. Within each window the median absolute deviation (MAD) of the pupil size data is calculated. This is used to detect which windows contain extreme variability (potentially containing outliers). This is determined based on the value provided in MADConstant, which controls the sensitivity threshold. The higher the constant the more extreme value is needed to trigger cleaning.

Next the identified extreme windows have padding added around them using MADPadding (again in milliseconds). Within this padded window, a multidimensional distributional distance (specifically Mahalanobis distance) is calculated. This distance can be calculated using one of two methods: Basic or Robust.

The Basic method uses the standard Mahalanobis distance and the Robust uses a robust version of the Mahalanobis distance. The latter is based on Minimum Covariance Determinant (as implemented in the package robustbase), which uses a sampling method for determining multivariate location and scatter. Both the basic and robust calculations are based on multiple variables covarying with pupil size. By default, the calcuation uses the following columns: Pupil, Velocity_Y, and Acceleration_Y. However, the parameter XandY can be set to TRUE in which case the calculation will additionally include the X-axis: Velocity_X and Acceleration_X. This is intended to capture potential horizontal eye movement affecting pupil size. N.B., missing values are automatically excluded from the distance calculation.

The function will inform the user if a particular window is skipped as there are safeguards built in which will skip a given window if: 1) there are not enough data points or 2) there are not enough columns with non-zero data to estimate covariance. To determine whether a given pupil size is extreme, the argument MahaConstant is used to set the sensitivity. The default value of the parameter is 2 (standard deviations). The higher the constant, the more extreme value of the parameter is needed to trigger cleaning.

Lastly, this function can optionally perform a second pass (setting Second to TRUE), which is identical to the second pass in clean_blink. This attempts to identify remaining data points or islands of data points (small runs surrounded by NAs) which remain.
The arguments MaxValueRun and NAsAroundRun are identical in function and meaning. As with clean_blink, the argument LogFile specifies the path and name of the file into which information about the cleaning status is written. Again, please note that for the purposes of this vignette we are saving the file into the vignette’s temporary folder; however, users will likely want to save the file into their working directory by simply specifying LogFile = "ArtifactCleanupLog.rds".

datart <- clean_artifact(datblink, MADWindow = 100, MADConstant = 2,
MADPadding = c(200, 200), MahaConstant = 2,
Method = "Robust", XandY = TRUE, Second = T,
MaxValueRun = 5, NAsAroundRun = c(2,2),
LogFile = paste0(tempdir(),"/ArtifactCleanupLog.rds"))
##
## Running cleanup based on MAD (median absolute deviation) and robust Mahalanobis distance.
##
## Writing cleanup info to /tmp/RtmpBoEzMG/ArtifactCleanupLog.rds.

Looking, yet again, at Event 16892.8, we can see that the function clean_artifact successfully detected and partially cleaned the un-marked artifact, using the default values. Below, we will describe in detail how to manually clean the remainder of the artifact using the functionality provided in the package.

### Verifying the cleanup, part II

Again, because clean_artifact is an automatic cleaning function, users may want to verify the effect of the cleaning. It is possible that this automated cleaning procedure, depending on the parameters specified, can remove more or less data points that appear to be “good”. This is part and parcel of automatic detection and cleaning. However, the algorithm is designed to detect extreme values based on the data in as targeted a way as possible. Based on our experience and testing, we have set default values which perform well in most scenarios.

The function verify_cleanup_app can be used to scan through the events, and if desired, revert an event back to its most recent previous state (in this case to the state of the data in data frame dat4a, i.e., prior to having run clean_artifact, but after having run clean_blink).

verify_cleanup_app(datart, LogFile = paste0(tempdir(),"/ArtifactCleanupLog.rds"))

Again, any changes made using verify_cleanup_app must be applied to the data using apply_cleanup_change and specifying “ArtifactCleanupLog” as the log file.

datart <- apply_cleanup_change(datart, LogFile = paste0(tempdir(),"/ArtifactCleanupLog.rds"))
## Loading file /tmp/RtmpBoEzMG/ArtifactCleanupLog.rds
## Changing cleanup based on  /tmp/RtmpBoEzMG/ArtifactCleanupLog.rds

### Further evaluation of the cleanup

In order to help evaluate the results of the cleanup, we provide a data visualization tool that explicitly displays the difference in the pupil data before and after carrying out the automatic cleanup. The function plot_compare_app opens an interactive Shiny app for viewing the results of the cleanup. It plots each event and shows which data points are now different. Additionally, it states how much of the data (as a percentage) is missing, i.e., was removed by the cleaning procedure.

plot_compare_app(datart)

Alternatively, the function compare_summary produces a summary output of the comparison by Event. The data can be returned by setting the argument ReturnData = TRUE.

compare_summary(datart)
## There are 4 events with differences between Pupil and Pupil_Previous.
## Set ReturnData to TRUE to output full information.

## Manual cleanup

Automatic cleanup may not capture and clean all artifacts. Thus a manual cleanup function (user_cleanup_app) is provided. This function opens an interactive Shiny app for viewing Events and specifying which data points to remove. Data can be removed either by specifying a point in time (i.e., removing one specific data point) or a range of time (i.e., removing a sequence of data points), or any combination of these. For example, type in 1550 to remove a data point at time 1550 ms; type in 1600:1700 to remove all data points between 1600 ms and 1700 ms (inclusive); or type in 1550, 1600:1700 to remove the data point at 1550 ms and all data points between 1600 ms and 1700 ms (inclusive).

The user-specified data points are saved into a log file. The path and filename are specified in LogFile. Again, please note that for the purposes of this vignette we are saving the file into the vignette’s temporary folder; however, users will likely want to save the file into their working directory by simply specifying LogFile = "UserCleanupLog.rds".

This allows the user to clean part of the data in one session, and return to cleaning at later point. The function will read the log file from the working directory the next time the app is opened. Additionally, this log file ensures that the manual preprocessing step can be repeated if necessary as long as the log file exists. In the example below we will finish cleaning Event 16892.8.

user_cleanup_app(datart, LogFile = paste0(tempdir(),"/UserCleanupLog.rds"))

Here is a brief example of how the Shiny app works.

1. Open the app specifying the data set you would like to clean.
2. Navigate to a specific event using the dropdown menu or the “Previous”/“Next”" buttons.
3. Once, the event is selected you will see it plotted in left-most panel (“Original”).
4. In the Time points box input the data points that require cleaning. In this example, we would like to remove all data points between 1835 ms and 1995 ms, so we will enter a range of values 1835:1995. The selected data points will be highlighted in red in the middle panel (“Preview”).
5. When you are satified with the result, press the “Commit Current Event Cleanup” button to save the cleanup information to the log file.
6. If you make a mistake or would like to re-do a particular event cleanup, you can press the “Reset Current Event Cleanup” button.
7. When cleaning is complete, press the “Exit and Save” button.

Note that, while the example above uses the cleanup app to further clean the data already processed with the automatic cleaning functions, the app can be used completely independently (i.e., without first doing automatic cleanup) if the user wishes to manually clean all events.

Once manual cleanup is done, the user must apply the cleanup to the data based on the contents of the log file. This is done with the function apply_user_cleanup. Note that it is also possible to visualize the results of the applied manual cleanup using plot_compare_app.

datclean <- apply_user_cleanup(datart, LogFile = paste0(tempdir(),"/UserCleanupLog.rds"))
## Loading file /tmp/RtmpBoEzMG/UserCleanupLog.rds
## Applying cleanup based on  /tmp/RtmpBoEzMG/UserCleanupLog.rds

Looking, one last time, at Event 16892.8, we can see that both the marked blink and the un-marked artifact are now both fully cleaned.

## Proceed with preprocessing

At this point it is possible to proceed with preprocessing as usual. Please refer back to the Basic Preprocessing vignette and continue by removing events with sparse data.