The compicc package is intended for practitioners in a wide range of fields, most notably psychology, medicine, and sports science. It allows users to compare the reliability of two measurement systems, or of one system at two different time points. Specifically, the functions calculate a 100(1 - \(\alpha\))% confidence interval (CI) for the difference between two intraclass correlation coefficients (ICCs). These methods were first proposed by Ramasundarahettige et al. (2009). For example, one could compare the reliability of two different medical practitioners’ measurements of patients’ shoulder mobility in degrees of rotation (de Vet et al. 2011).

There are two functions in compicc, `dep_ci()` and `indep_ci()`. The `dep_ci()` function calculates the confidence interval for the difference of ICCs in the dependent case, when the two samples consist of the same subjects. The `indep_ci()` function calculates the confidence interval for the difference of ICCs in the independent case, when the two samples consist of different subjects.

In addition, the package contains two sets of two dataframes (four total dataframes) to be used as demonstrations of the functions’ capabilities. These dataframes are titled `dep_df1` and `dep_df2`, as well as `indep_df1` and `indep_df2`. In Section 3, this document will call the package’s functions with these sets of dataframes and interpret the results for instructional purposes.

To start, the package is loaded with the code:

`library(compicc)`

The compicc package includes two different functions: `dep_ci()` and `indep_ci()`. Determining which function to use depends on whether the same set of subjects or a different set of subjects was tested in each dataset being compared.

Dependent data refers to the scenario in which the two dataframes consist of the same set of subjects. For example, observations in row 1 of the first dataset are taken from the same subject as observations in row 1 of the second dataset. This must hold true for every row of data, so the observations between datasets are “matched.”

The `dep_ci()` function is called with three arguments: `data1`, `data2`, and `conf_level`. The arguments `data1` and `data2` refer to the two different datasets. The argument `conf_level` refers to the confidence level of the confidence interval. This value defaults to 0.95 when not defined by the user, representing a 95% confidence interval.

The function returns three values:

- `data1ICC`: ICC of data1
- `data2ICC`: ICC of data2
- `confidenceIntervalDifference`: dataframe with the lower bound and upper bound of the confidence interval for the difference of the ICC of data1 and data2
  - `confidenceIntervalDifference$lowerBound`: lower bound of confidence interval
  - `confidenceIntervalDifference$upperBound`: upper bound of confidence interval

The confidence interval represents the interval for the difference *ICC(data1) - ICC(data2)*.

Independent data refers to the scenario in which the two dataframes consist of entirely different sets of subjects. This means no subject has scores in both the first dataframe and the second dataframe.

The `indep_ci()` function is called with three arguments: `data1`, `data2`, and `conf_level`. The arguments `data1` and `data2` refer to the two different datasets. The argument `conf_level` refers to the confidence level of the confidence interval. This value defaults to 0.95 when not defined by the user, representing a 95% confidence interval.

The function returns three values:

- `data1ICC`: ICC of data1
- `data2ICC`: ICC of data2
- `confidenceIntervalDifference`: dataframe with the lower bound and upper bound of the confidence interval for the difference between the ICC of data1 and data2
  - `confidenceIntervalDifference$lowerBound`: lower bound of confidence interval
  - `confidenceIntervalDifference$upperBound`: upper bound of confidence interval

The confidence interval represents the interval for the difference *ICC(data1) - ICC(data2)*.

The compicc package contains four datasets so that the user may work through examples of the functions within the package. `dep_df1` and `dep_df2` are the dataframes for the dependent case, and `indep_df1` and `indep_df2` are the dataframes for the independent case.

First, consider the two dependent dataframes, `dep_df1` and `dep_df2`. Both consist of simulated data of four trials of measurements for 100 subjects. The data represents a hypothetical score assigned to the subjects, where the overall mean score is zero. Such an example could be the measurements of subjects by a sensor at two times, and the user is looking to quantify how the sensor’s reliability has changed over time. Each dataframe contains the same 100 subjects (paired observations), meaning the dataframes are dependent. The dataframes are in wide format, meaning each row represents the measurements of one subject across four trials split column-wise. This format is required for all functions in the compicc package to run. Displayed below are the first few rows of `dep_df1` to show the proper formatting of the datasets.

```
## Trial 1 Trial 2 Trial 3 Trial 4
## 1 2.7908151 2.9914997 3.1671994 3.7316656
## 2 -0.3696103 -0.9891107 -1.7069014 -1.3998243
## 3 0.1631003 -1.0879259 -1.6040934 -1.1937224
## 4 0.2611442 0.6936286 1.1886768 2.3882749
## 5 -1.1405719 -1.5913244 -1.4706451 -0.8625495
## 6 -0.1340875 -0.5238328 0.2003782 -0.6403733
```

In contrast, `indep_df1` and `indep_df2` consist of simulated data of four trials of measurements for 100 and 80 subjects, respectively. In this case, consider the situation where the 100 subjects tested in the first dataframe are different from the 80 subjects tested in the second dataframe (i.e., the two samples are independent or non-overlapping). An example of this application is each dataframe containing measurements from one rater, where the user is interested in comparing the reliability of scores from the rater across dataframes. Like the dependent case, both dataframes are in wide format with rows representing the subjects and columns representing the trials.

**Note**: The number of subjects is equal in both dataframes in the dependent case. This is required, since every subject of each dataframe must be found in the other dataframe. This is not the case in the independent dataframes: since the two independent dataframes have completely different sets of subjects, they are allowed to have a different number of subjects/rows.
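Because both functions expect wide format, measurements recorded in long format (one row per subject and trial) need to be reshaped first. The following is a minimal base R sketch of one way to do this. The column names `subject`, `trial`, and `score` and the values are hypothetical, and dropping the subject identifier assumes the functions expect only trial columns, as in the printed rows of `dep_df1`.

```r
# Hypothetical long-format data: 3 subjects, 4 trials each (values are made up)
long <- data.frame(
  subject = rep(1:3, each = 4),
  trial   = rep(1:4, times = 3),
  score   = c( 2.8,  3.0,  3.2,  3.7,
              -0.4, -1.0, -1.7, -1.4,
               0.2, -1.1, -1.6, -1.2)
)

# One row per subject, one column per trial
wide <- reshape(long, idvar = "subject", timevar = "trial", direction = "wide")

# Keep only the trial columns and give them readable names
wide <- wide[, setdiff(names(wide), "subject")]
names(wide) <- paste("Trial", 1:4)
rownames(wide) <- NULL

wide
```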

The datasets included in the package are used here to demonstrate the functions’ outputs and their interpretation. Both the dependent and independent cases are used as examples.

This section provides an example of the usage of the `dep_ci()` function with the package’s provided datasets.

The arguments of the `dep_ci()` function are:

- `data1`: the first dataframe (must be in wide format)
- `data2`: the second dataframe (must be in wide format)
- `conf_level`: confidence level of the confidence interval (defaults to 0.95 when not specified)

For example, the following code computes a 95% confidence interval for the difference between the ICC of `dep_df1` and `dep_df2`, storing the output in a variable called *result*:

`result <- dep_ci(dep_df1, dep_df2)`

The function yields the following output:

- ICC of dep_df1 = `result$data1ICC` = 0.795187
- ICC of dep_df2 = `result$data2ICC` = 0.6377282
- Confidence interval for the difference between dep_df1 ICC and dep_df2 ICC = `result$confidenceIntervalDifference`
  - Lower bound of confidence interval = `result$confidenceIntervalDifference$lowerBound` = 0.0848324
  - Upper bound of confidence interval = `result$confidenceIntervalDifference$upperBound` = 0.2329263

In this case, we are 95% confident that the true difference between the ICC of dep_df1 and the ICC of dep_df2 lies in the interval (0.0848324, 0.2329263). Since the interval lies entirely above zero, we have enough evidence to conclude that the true ICCs of the sensor at the initial and final times are not equal. This means the reliability of the measurements of the sensor has changed over time.

This section provides an example of the usage of the `indep_ci()` function with the package’s provided datasets.

The inputs of the `indep_ci()` function are:

- `data1`: the first dataframe (must be in wide format)
- `data2`: the second dataframe (must be in wide format)
- `conf_level`: confidence level of the confidence interval (defaults to 0.95 when not specified)

For example, the following code computes the 90% confidence interval for the difference between the ICC of `indep_df1` and `indep_df2`, storing the output in a variable called *result2*:

`result2 <- indep_ci(indep_df1, indep_df2, conf_level = 0.9)`

The function yields the following output:

- ICC of indep_df1 = `result2$data1ICC` = 0.6624369
- ICC of indep_df2 = `result2$data2ICC` = 0.6913065
- Confidence interval for the difference between indep_df1 ICC and indep_df2 ICC = `result2$confidenceIntervalDifference`
  - Lower bound of confidence interval = `result2$confidenceIntervalDifference$lowerBound` = -0.1268077
  - Upper bound of confidence interval = `result2$confidenceIntervalDifference$upperBound` = 0.0702803

In this case, we are 90% confident that the true difference between the ICC of `indep_df1` and the ICC of `indep_df2` lies in the interval (-0.1268077, 0.0702803). Since the interval includes zero, there is not enough evidence to conclude that the true ICCs of the two dataframes differ.
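The interpretation rule used in both examples, checking whether the interval contains zero, can be written as a small check. This is only an illustrative sketch: `excludes_zero()` is a hypothetical helper, not a function in compicc, and the bounds are copied from the outputs reported above.

```r
# Bounds reported in the two examples above
dep_bounds   <- c(lower = 0.0848324,  upper = 0.2329263)  # 95% CI, dependent case
indep_bounds <- c(lower = -0.1268077, upper = 0.0702803)  # 90% CI, independent case

# Hypothetical helper: TRUE when the interval lies entirely on one side of zero
excludes_zero <- function(bounds) {
  unname(bounds["lower"] > 0 || bounds["upper"] < 0)
}

excludes_zero(dep_bounds)    # interval above zero: evidence that the ICCs differ
excludes_zero(indep_bounds)  # interval contains zero: no evidence of a difference
```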

To demonstrate the possible errors encountered when using the functions in the compicc package, examples of dataframes that lead to certain errors are described below.

To compute the ICC of a dataframe, every subject must go through the same number of trials during testing. In other words, each row in the dataframe must have the same number of columns. Similarly, each dataframe being compared must have the same number of trials/columns. This means if dataframe 1 consists of four trials per subject, dataframe 2 must consist of exactly four trials per subject. An example of dataframes violating this condition is:

```
# data1: three trials per subject
d1_trial1 <- c(34, 33, 36)
d1_trial2 <- c(41, 38, 40)
d1_trial3 <- c(37, 36, 37)
data1 <- data.frame(d1_trial1, d1_trial2, d1_trial3)

# data2: four trials per subject
d2_trial1 <- c(33, 33, 35)
d2_trial2 <- c(43, 41, 42)
d2_trial3 <- c(36, 36, 38)
d2_trial4 <- c(29, 30, 29)
data2 <- data.frame(d2_trial1, d2_trial2, d2_trial3, d2_trial4)

indep_ci(data1, data2)
```

`## Error in indep_ci(data1, data2): number of columns in data1 must equal that of data2`

The dataframe `data1` consists of three trials per subject, but `data2` holds four trials per subject. The error message informs the user that the number of trials/columns must be equal across dataframes. This error applies to both the `dep_ci()` and `indep_ci()` functions.

In order for the two dataframes to be dependent, the subjects/rows must be matched across dataframes. This means row 1 of dataframe1 represents the same subject as row 1 of dataframe2, and so on for each additional row of data. Therefore, if there is an unequal number of subjects/rows between the two dataframes, the function `dep_ci()` will return an error message. The dataframes cannot be dependent when they have an unequal number of rows of data. Below is an example:

```
# data1: three subjects
d1_trial1 <- c(34, 33, 36)
d1_trial2 <- c(41, 38, 40)
d1_trial3 <- c(37, 36, 37)
data1 <- data.frame(d1_trial1, d1_trial2, d1_trial3)

# data2: four subjects
d2_trial1 <- c(33, 33, 35, 32)
d2_trial2 <- c(43, 41, 42, 43)
d2_trial3 <- c(36, 36, 38, 38)
data2 <- data.frame(d2_trial1, d2_trial2, d2_trial3)

dep_ci(data1, data2)
```

`## Error in dep_ci(data1, data2): number of rows in data1 must equal that of data2`

The error message tells the user that the number of rows must be equal across dataframes. When receiving this message, the user should adjust the dataframes to make sure that the subjects tested in each dataframe match each other and are placed in the same row in both dataframes.

Note: This is not an issue for the independent case. The `indep_ci()` function accepts dataframes with unequal numbers of observations, since the subjects should not match across dataframes.

The functions embedded in the compicc functions do not work with dataframes that contain missing values. Therefore, if the user tries to call either the `dep_ci()` or `indep_ci()` function with a dataframe that has one or more NA or NaN values, the function will stop running and return an error message. An example of this is shown below:

```
# data1: subject 2 has a missing value in trial 3
d1_trial1 <- c(34, 33, 36)
d1_trial2 <- c(41, 38, 40)
d1_trial3 <- c(37, NA, 37)
data1 <- data.frame(d1_trial1, d1_trial2, d1_trial3)

d2_trial1 <- c(33, 33, 35)
d2_trial2 <- c(43, 41, 42)
d2_trial3 <- c(36, 36, 38)
data2 <- data.frame(d2_trial1, d2_trial2, d2_trial3)

dep_ci(data1, data2)
```

`## Error in dep_ci(data1, data2): cannot have NA values in dataframe`

The error message states that there cannot be a missing value in either dataframe. The NA value in data1 (d1_trial3) must either be replaced with an imputed value, or subject 2’s results must be discarded in order to use either function.
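The three conditions described in this section can be checked before calling either function. The sketch below is a hypothetical helper, `check_inputs()`, written in base R for illustration. It is not part of compicc, and its messages are not the package's own error messages.

```r
# Hypothetical pre-flight check mirroring the three conditions described above.
# Returns a character vector of problems (empty when none are found).
check_inputs <- function(data1, data2, dependent = FALSE) {
  problems <- character(0)
  if (ncol(data1) != ncol(data2)) {
    problems <- c(problems, "unequal number of trials (columns)")
  }
  # Row counts only need to match when subjects are paired
  if (dependent && nrow(data1) != nrow(data2)) {
    problems <- c(problems, "unequal number of subjects (rows)")
  }
  # is.na() is TRUE for both NA and NaN
  if (any(is.na(data1)) || any(is.na(data2))) {
    problems <- c(problems, "missing values present")
  }
  problems
}

# Made-up dataframes: d1 has a missing value, d2 has an extra subject
d1 <- data.frame(t1 = c(34, 33, 36), t2 = c(41, 38, 40), t3 = c(37, NA, 37))
d2 <- data.frame(t1 = c(33, 33, 35, 32), t2 = c(43, 41, 42, 43), t3 = c(36, 36, 38, 38))

found_dep   <- check_inputs(d1, d2, dependent = TRUE)
found_indep <- check_inputs(d1, d2, dependent = FALSE)
```

With `dependent = FALSE` the row-count check is skipped, which matches the note above that `indep_ci()` accepts dataframes with different numbers of subjects.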

As mentioned above, the functions in the compicc package include intensive calculations derived by Ramasundarahettige et al. (2009).

The approach to estimating the difference between two ICCs begins with the simple case of one ICC. In this case, the formula of the confidence interval is derived from the central limit theorem and Slutsky’s Theorem:

\(L, U = \widehat{\rho} \pm z_{\alpha/2}\sqrt{\widehat{var}(\widehat{\rho})}\)

where \(L\) and \(U\) are the lower and upper bounds of the confidence interval, \(\widehat{\rho}\) is the point estimate of the ICC, and \(z_{\alpha/2}\) is the upper \(\alpha/2\) quantile of the standard normal distribution.
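As a small numerical illustration of this single-ICC interval, the sketch below plugs made-up values into the formula. The point estimate and variance are assumptions chosen for the example, not values from the package or its datasets.

```r
# Illustrative inputs (not taken from the package or its datasets)
rho_hat    <- 0.70    # point estimate of the ICC
var_hat    <- 0.0025  # estimated variance of rho_hat
conf_level <- 0.95

alpha <- 1 - conf_level
z     <- qnorm(1 - alpha / 2)  # upper alpha/2 quantile of the standard normal

lower <- rho_hat - z * sqrt(var_hat)
upper <- rho_hat + z * sqrt(var_hat)
c(lower = lower, upper = upper)
```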

Extending this to the difference of two ICCs, the formula becomes:

**Lower bound:** \(L = \widehat{\rho_1}-\widehat{\rho_2}-z_{\alpha/2}\sqrt{\widehat{var}(\widehat{\rho_1})+\widehat{var}(\widehat{\rho_2})}\)

**Upper bound:** \(U = \widehat{\rho_1}-\widehat{\rho_2}+z_{\alpha/2}\sqrt{\widehat{var}(\widehat{\rho_1})+\widehat{var}(\widehat{\rho_2})}\)
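The step from these variance-based bounds to the limit-based equations that follow can be sketched as below. This is one reading of the variance-recovery argument, stated under the assumption that each ICC has its own confidence limits \((l_1, u_1)\) and \((l_2, u_2)\) at the same confidence level and that \(z_{\alpha/2}\) is the standard normal critical value. For the lower bound, the variances are recovered from the limits closest to the smallest plausible difference:

\(\widehat{var}(\widehat{\rho_1}) \approx \frac{(\widehat{\rho_1}-l_1)^2}{z_{\alpha/2}^2}, \qquad \widehat{var}(\widehat{\rho_2}) \approx \frac{(u_2-\widehat{\rho_2})^2}{z_{\alpha/2}^2}\)

Substituting these into the Wald-type bound \(\widehat{\rho_1}-\widehat{\rho_2}-z_{\alpha/2}\sqrt{\widehat{var}(\widehat{\rho_1})+\widehat{var}(\widehat{\rho_2})}\) cancels the critical value:

\(L = \widehat{\rho_1}-\widehat{\rho_2}-\sqrt{(\widehat{\rho_1}-l_1)^2+(u_2-\widehat{\rho_2})^2}\)

The upper bound follows in the same way, using \(u_1\) and \(l_2\) instead.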

Following Ramasundarahettige et al. (2009), these bounds lead to the following equations:

For the independent case, the confidence interval is calculated by:

**Lower bound:** \(L = \widehat{\rho_1}-\widehat{\rho_2}-\sqrt{(\widehat{\rho_1}-l_1)^2+(u_2-\widehat{\rho_2})^2}\)

**Upper bound:** \(U = \widehat{\rho_1}-\widehat{\rho_2}+\sqrt{(u_1-\widehat{\rho_1})^2+(\widehat{\rho_2}-l_2)^2}\)
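The two independent-case formulas translate directly into a few lines of R. The sketch below is a hypothetical helper, `indep_diff_ci()`, written only to illustrate the arithmetic. It is not the implementation of `indep_ci()`, and the input values are made up.

```r
# Hypothetical helper implementing the two independent-case formulas above.
# rho1, rho2: ICC point estimates; l1, u1, l2, u2: limits of each ICC's own CI.
indep_diff_ci <- function(rho1, l1, u1, rho2, l2, u2) {
  diff  <- rho1 - rho2
  lower <- diff - sqrt((rho1 - l1)^2 + (u2 - rho2)^2)
  upper <- diff + sqrt((u1 - rho1)^2 + (rho2 - l2)^2)
  c(lower = lower, upper = upper)
}

# Illustrative values only
ci_indep <- indep_diff_ci(rho1 = 0.80, l1 = 0.73, u1 = 0.85,
                          rho2 = 0.64, l2 = 0.55, u2 = 0.72)
ci_indep
```

Note that the interval need not be symmetric around the point estimate of the difference, because the lower and upper bounds use different limits of the individual intervals.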

For the dependent case, the confidence interval is calculated by:

**Lower bound:** \(L = \widehat{\rho_1}-\widehat{\rho_2}-\sqrt{(\widehat{\rho_1}-l_1)^2-2\,\widehat{corr}(\widehat{\rho_1},\widehat{\rho_2})\,(\widehat{\rho_1}-l_1)(u_2-\widehat{\rho_2})+(u_2-\widehat{\rho_2})^2}\)

**Upper bound:** \(U = \widehat{\rho_1}-\widehat{\rho_2}+\sqrt{(u_1-\widehat{\rho_1})^2-2\,\widehat{corr}(\widehat{\rho_1},\widehat{\rho_2})\,(u_1-\widehat{\rho_1})(\widehat{\rho_2}-l_2)+(\widehat{\rho_2}-l_2)^2}\)

Where:

\(\widehat{corr}(\widehat{\rho_1},\widehat{\rho_2}) = \widehat{\rho}_{12}^2\,\frac{\sqrt{k_1 k_2 (k_1-1)(k_2-1)}}{(1+(k_1-1)\widehat{\rho_1})(1+(k_2-1)\widehat{\rho_2})}\)

\(\widehat{\rho_1}\) and \(\widehat{\rho_2}\) are the point estimates of the two dataframes’ ICCs, \(l_1\) and \(u_1\) are the lower and upper bounds of the CI of the ICC of dataframe 1, \(l_2\) and \(u_2\) are the lower and upper bounds of the CI of the ICC of dataframe 2, \(\widehat{\rho}_{12}\) is the estimated correlation between measurements taken on the same subject in the two dataframes, and \(k_1\) and \(k_2\) are the numbers of trials for dataframe 1 and dataframe 2.
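The dependent-case formulas can be sketched in R in the same spirit. Both `icc_corr()` and `dep_diff_ci()` are hypothetical helpers that transcribe the equations above. They are not the implementation of `dep_ci()`, and the inputs, including the value used for \(\widehat{\rho}_{12}\), are made up.

```r
# Hypothetical helper: estimated correlation between the two ICC estimates,
# transcribed from the formula above
icc_corr <- function(rho12, rho1, rho2, k1, k2) {
  rho12^2 * sqrt(k1 * k2 * (k1 - 1) * (k2 - 1)) /
    ((1 + (k1 - 1) * rho1) * (1 + (k2 - 1) * rho2))
}

# Hypothetical helper: dependent-case bounds for rho1 - rho2
dep_diff_ci <- function(rho1, l1, u1, rho2, l2, u2, corr) {
  diff  <- rho1 - rho2
  lower <- diff - sqrt((rho1 - l1)^2 -
                         2 * corr * (rho1 - l1) * (u2 - rho2) +
                         (u2 - rho2)^2)
  upper <- diff + sqrt((u1 - rho1)^2 -
                         2 * corr * (u1 - rho1) * (rho2 - l2) +
                         (rho2 - l2)^2)
  c(lower = lower, upper = upper)
}

# Illustrative values only
r_hat  <- icc_corr(rho12 = 0.5, rho1 = 0.80, rho2 = 0.64, k1 = 4, k2 = 4)
ci_dep <- dep_diff_ci(rho1 = 0.80, l1 = 0.73, u1 = 0.85,
                      rho2 = 0.64, l2 = 0.55, u2 = 0.72, corr = r_hat)
ci_dep
```

Setting `corr = 0` removes the cross terms and gives the same bounds as the independent-case formulas. A positive correlation shrinks the terms under the square roots, so the interval becomes narrower.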

For further reading into the calculations used in this package, refer to Ramasundarahettige et al. (2009).

de Vet, H. C., Terwee, C. B., Mokkink, L. B., & Knol, D. L. (2011). Measurement in medicine: A practical guide. New York, NY: Cambridge University Press.

Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2019). irr: Various Coefficients of Interrater Reliability and Agreement. R package version 0.84.1. https://CRAN.R-project.org/package=irr

Ramasundarahettige, C. F., Donner, A., & Zou, G. Y. (2009). Confidence Interval Construction for a Difference Between Two Dependent Intraclass Correlation Coefficients. *Statistics in Medicine*, 28(7), 1041–1053. https://doi.org/10.1002/sim.3523