cvcrand is an R package for the design and analysis of cluster randomized trials (CRTs).

Given the baseline values of some cluster-level covariates, users can perform a constrained randomization on the clusters into two arms, with an optional input of user-defined weights on the covariates.

At the end of the study, individual outcomes are collected. The `cvcrand` package also performs a clustered permutation test on either a continuous or a binary outcome, adjusted for individual-level covariates, and produces a p-value for the intervention effect.

In the design of two-arm CRTs, users can call the `cvrall()` function to perform covariate-constrained randomization, or the `cvrcov()` function to perform covariate-by-covariate constrained randomization. For the analysis, users can call the `cptest()` function to perform the clustered permutation test.

A cluster is the unit of randomization in a cluster randomized trial. When the number of clusters is small, simple randomization can therefore leave some baseline imbalance between the arms. Constrained randomization restricts the randomization space to schemes with smaller covariate differences between the two arms.

The balance score for constrained randomization in the program is developed from (Raab and Butcher 2001). Suppose \(n\), \(n_T\), and \(n_C\) are the total number of clusters, the number of clusters in the treatment arm and the control arm respectively. Suppose also that there are \(K\) cluster-level variables including the continuous covariates as well as the dummy variables created from the categorical covariates. \(x_{ik}\) is the \(k\)th covariate (\(k=1,\ldots,K\)) of cluster \(i\). \(\bar{x}_{Tk}=\sum_{i=1}^{n_T}x_{ik}/n_T\) and \(\bar{x}_{Ck}=\sum_{i=n_T+1}^{n}x_{ik}/n_C\) are the means of the \(k\)th cluster-level variable in the treatment arm and the control arm, respectively, and \(\omega_{k}\) is a pre-determined weight for the \(k\)th variable. We choose \(\omega_{k}\) to be the inverse of the variance of the \(k\)th variable across all clusters following (Raab and Butcher 2001) and (Fan Li et al. 2016), namely \(\omega_k={1}/{s_k^2}=\frac{n-1}{\sum_{i=1}^n(x_{ik}-\bar{x}_k)^2}\) with \(\bar{x}_k=\sum_{i=1}^nx_{ik}/n\).

There are two choices of metric for the balance score. With the `"l2"` metric, the balance score is defined as \(B_{(l2)}=\sum_{k=1}^{K}\omega_k(\bar{x}_{Tk}-\bar{x}_{Ck})^2\). If the `"l1"` metric from (F. Li et al. 2017) is specified, the balance score is defined as \(B_{(l1)}=\sum_{k=1}^{K}\tilde{\omega}_k\left|\bar{x}_{Tk}-\bar{x}_{Ck}\right|\), where \(\tilde{\omega}_k\) is chosen to be the inverse of the standard deviation of the \(k\)th variable, \(1/s_k\).

To reflect the relative importance of different baseline covariates, one may include user-defined weights in the `"l1"` and `"l2"` balance metrics. The `"l2"` balance metric then becomes \(B_{(l2)}=\sum_{k=1}^{K}d_k\omega_k(\bar{x}_{Tk}-\bar{x}_{Ck})^2\), where \(d_k\) is the user-defined weight for the \(k\)th variable. By default, \(d_k=1\) for all variables. A large user-defined weight \(d_k>1\) can be assigned to a variable that is considered important when assessing the balance scores. Similarly, the `"l1"` balance metric with user-defined weights is \(B_{(l1)}=\sum_{k=1}^{K}d_k\tilde{\omega}_k\left|\bar{x}_{Tk}-\bar{x}_{Ck}\right|\).
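The two weighted balance metrics can be sketched in base R as follows. This is a minimal illustration of the formulas above, not the package's internal code: the helper `balance_score()` and its arguments are hypothetical, and the toy covariate matrix is made up.

```r
# Hypothetical helper (not part of cvcrand): balance score of one allocation.
# x:   numeric matrix, one row per cluster, one column per (dummy-coded) covariate
# arm: 0/1 vector, 1 = treatment arm
# d:   user-defined weights d_k (default 1 for every covariate)
balance_score <- function(x, arm, metric = c("l2", "l1"), d = rep(1, ncol(x))) {
  metric <- match.arg(metric)
  x <- as.matrix(x)
  # Difference in arm means for every covariate
  mean_diff <- colMeans(x[arm == 1, , drop = FALSE]) -
    colMeans(x[arm == 0, , drop = FALSE])
  s <- apply(x, 2, sd)                 # s_k across all clusters (divisor n - 1)
  if (metric == "l2") {
    sum(d * mean_diff^2 / s^2)         # omega_k = 1 / s_k^2
  } else {
    sum(d * abs(mean_diff) / s)        # omega-tilde_k = 1 / s_k
  }
}

# Toy data (made up): 6 clusters, 2 covariates
x <- cbind(size = c(10, 12, 9, 15, 11, 14),
           rate = c(0.2, 0.5, 0.4, 0.3, 0.6, 0.1))
arm <- c(1, 0, 1, 0, 1, 0)

b_l2 <- balance_score(x, arm, "l2")
b_l1 <- balance_score(x, arm, "l1")
b_l2_weighted <- balance_score(x, arm, "l2", d = c(5, 1))  # up-weight "size"
```

Because every term of the sum is non-negative, raising \(d_k\) above 1 for one covariate can only increase the score of schemes that are imbalanced on that covariate, which is what pushes them out of the constrained space.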

Given the baseline values of the specified cluster-level covariates in a cluster randomized trial, the `cvrall()` function in the `cvcrand` package performs the covariate-constrained randomization.

Each categorical variable is transformed into dummy variables to calculate the balance score. When transforming a categorical variable to dummy variables, the reference level will be dropped if the categorical variable is specified as a factor. Otherwise, the first level in the alphanumerical order will be dropped. Users can also specify a certain reference level for each categorical variable by manually coding dummy variables before running the `cvrall()` function.

In addition to constraining the randomization space via a scalar summary score, we developed the `cvrcov()` function to implement constrained randomization with baseline balance defined directly through each covariate.

We followed the routine developed by (Greene 2017) to define covariate-by-covariate constraints based on the difference in arm means or arm totals. Users specify one constraint for each covariate considered in the constrained randomization. The constraint `"any"` means that the covariate is unconstrained. Otherwise, the first character selects the metric: `"m"` denotes the absolute mean difference and `"s"` denotes the absolute sum difference. If the second character is `"f"`, the number that follows is read as a fraction, and the metric is constrained to be less than or equal to that fraction of the overall mean (for `"m"`) or of the mean arm total (for `"s"`). If the second character is not `"f"`, the metric is constrained to be less than or equal to the number that follows. For example, `"mf0.4"` requires the absolute mean difference to be at most 0.4 times the overall mean, and `"s5"` requires the absolute sum difference to be at most 5.
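The constraint syntax can be read mechanically, as the following base R sketch shows. The helper `check_constraint()` is hypothetical and is not how `cvrcov()` is implemented. In particular, taking the "mean arm total" to be half of the overall total is an assumption. The income values are the 16 county values from the Dickinson example in this document, and `arm` is the allocation reported in its `cvrcov()` output.

```r
# Hypothetical helper (not part of cvcrand): does one covariate meet its constraint?
check_constraint <- function(x, arm, constraint) {
  if (constraint == "any") return(TRUE)
  type <- substr(constraint, 1, 1)                  # "m" = mean, "s" = sum
  stopifnot(type %in% c("m", "s"))
  is_fraction <- substr(constraint, 2, 2) == "f"
  bound <- as.numeric(substring(constraint, if (is_fraction) 3 else 2))
  if (type == "m") {
    stat <- abs(mean(x[arm == 1]) - mean(x[arm == 0]))
    ref <- mean(x)                                  # overall mean
  } else {
    stat <- abs(sum(x[arm == 1]) - sum(x[arm == 0]))
    ref <- sum(x) / 2                               # mean arm total (assumed reading)
  }
  if (is_fraction) stat <= bound * ref else stat <= bound
}

income <- c(35988, 67565, 35879, 63617, 59118, 57179, 29738, 37350,
            52923, 58302, 93819, 54839, 63857, 53502, 39570, 52457)
rural <- rep(c(1, 0), each = 8)                     # counties 1-8 are rural
arm <- c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1)

ok_income <- check_constraint(income, arm, "mf0.4")   # mean difference vs 0.4 * overall mean
ok_rural  <- check_constraint(rural, arm, "s5")       # difference in rural counts vs 5
too_tight <- check_constraint(income, arm, "m1000")   # absolute bound of 1000 on the mean difference
```

Here the arm means of income differ by about 7,109, so the fractional constraint `"mf0.4"` is met while the much tighter absolute bound `"m1000"` is not.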

To check the validity of the randomization (Bailey and Rowley 1987), the `check_validity` argument of both `cvrall()` and `cvrcov()` can be set to `TRUE`. The functions then report summary statistics on how often pairs of clusters appear in the same arm or in different arms across the constrained space. Pairs that always or never appear together in the same arm suggest that the randomization may not be valid.
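The idea behind this check can be sketched in base R on a toy randomization space. This is an assumed illustration of the pairwise counting, not the package's code, and the object names (`schemes`, `same`, `samefrac`) are chosen for this example only.

```r
# Toy randomization space: all ways to treat 2 of 4 clusters
trt <- combn(4, 2)                                   # one column per scheme
schemes <- t(apply(trt, 2, function(idx) as.integer(seq_len(4) %in% idx)))

# same[i, j] = number of schemes that place clusters i and j in the same arm
# (both treated, plus both control)
same <- crossprod(schemes) + crossprod(1 - schemes)
pair_counts <- same[upper.tri(same)]                 # one entry per cluster pair
samefrac <- pair_counts / nrow(schemes)

# Pairs that are always or never together would be a warning sign
always_together <- sum(pair_counts == nrow(schemes))
never_together  <- sum(pair_counts == 0)
```

In this unconstrained toy space every pair shares an arm in the same number of schemes, so no pair is flagged. A tightly constrained space can lose that symmetry, which is what the summary statistics are meant to reveal.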

`cvrall()` example for covariate-constrained randomization

A study presented by (Dickinson et al. 2015) compares two approaches (interventions) for increasing the “up-to-date” immunization rate in 19- to 35-month-old children. The investigators planned to randomize 16 counties in Colorado 1:1 to either a population-based approach or a practice-based approach. Several county-level variables are available, and the program randomizes on a subset of them. The continuous variable of average income is categorized to illustrate the use of `cvrall()` on multi-category variables. The percentage in Colorado Immunization Information System (CIIS) variable is truncated at 100%.

county | location | inciis | numberofchildrenages1935months | uptodateonimmunizations | africanamerican |
---|---|---|---|---|---|
1 | Rural | 94 | 366 | 37 | 2 |
2 | Rural | 85 | 1274 | 39 | 0 |
3 | Rural | 85 | 614 | 42 | 5 |
4 | Rural | 93 | 1720 | 39 | 1 |
5 | Rural | 82 | 242 | 31 | 1 |
6 | Rural | 80 | 350 | 27 | 3 |
7 | Rural | 94 | 401 | 49 | 1 |
8 | Rural | 100 | 234 | 37 | 1 |
9 | Urban | 93 | 3779 | 51 | 4 |
10 | Urban | 89 | 11807 | 51 | 10 |
11 | Urban | 83 | 9453 | 54 | 2 |
12 | Urban | 70 | 12354 | 29 | 8 |
13 | Urban | 93 | 10008 | 50 | 2 |
14 | Urban | 85 | 5343 | 36 | 2 |
15 | Urban | 82 | 3143 | 38 | 3 |
16 | Urban | 84 | 6056 | 43 | 1 |

county | hispanic | pediatricpracticetofamilymedicin | communityhealthcenters | incomecat | income |
---|---|---|---|---|---|
1 | 44 | 1.00 | 1 | Low | 35988 |
2 | 23 | 0.08 | 0 | High | 67565 |
3 | 12 | 0.33 | 3 | Low | 35879 |
4 | 18 | 0.33 | 6 | High | 63617 |
5 | 6 | 0.20 | 0 | High | 59118 |
6 | 15 | 0.00 | 3 | Med | 57179 |
7 | 38 | 0.20 | 3 | Low | 29738 |
8 | 39 | 0.00 | 1 | Low | 37350 |
9 | 35 | 0.15 | 11 | Med | 52923 |
10 | 17 | 0.45 | 6 | Med | 58302 |
11 | 7 | 0.61 | 1 | High | 93819 |
12 | 13 | 0.26 | 10 | Med | 54839 |
13 | 13 | 0.34 | 3 | High | 63857 |
14 | 10 | 0.18 | 7 | Med | 53502 |
15 | 39 | 0.27 | 7 | Low | 39570 |
16 | 28 | 0.10 | 8 | Med | 52457 |

For the covariate-constrained randomization, we used the `cvrall()` function to randomize 8 of the 16 counties to the practice-based arm. To define the whole randomization space, we enumerate all schemes if the total number of possible schemes is smaller than `50,000`. Otherwise, we simulate `50,000` schemes and take the unique schemes among them as the whole randomization space. We calculate the balance scores with the `"l2"` metric on three continuous covariates as well as two categorical covariates, location and income category. Location has the levels `"Rural"` and `"Urban"`; the level `"Rural"` was dropped in `cvrall()`. Income category has the three levels `"Low"`, `"Med"`, and `"High"`; the level `"High"`, which comes first in alphanumerical order, was dropped to create the dummy variables. We then constrained the randomization space to the schemes with `"l2"` balance scores below the `0.1` quantile of the scores in the whole randomization space. Finally, a randomization scheme is sampled from the constrained space.
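The enumerate, score, cut and sample procedure described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the `cvrall()` implementation: it uses only the three continuous covariates from the county table (no dummy variables), applies the \(B_{(l2)}\) formula exactly as written in this document, and keeps schemes at or below the quantile. Its scores are therefore not expected to match the package output reported in this section.

```r
# Three continuous county-level covariates, copied from the county table
x <- cbind(
  inciis   = c(94, 85, 85, 93, 82, 80, 94, 100, 93, 89, 83, 70, 93, 85, 82, 84),
  uptodate = c(37, 39, 42, 39, 31, 27, 49, 37, 51, 51, 54, 29, 50, 36, 38, 43),
  hispanic = c(44, 23, 12, 18, 6, 15, 38, 39, 35, 17, 7, 13, 13, 10, 39, 28)
)
n <- nrow(x)
n_trt <- 8

# Enumerate every way to pick 8 treated clusters out of 16 (12870 schemes)
trt <- combn(n, n_trt)
schemes <- matrix(0, nrow = ncol(trt), ncol = n)
schemes[cbind(rep(seq_len(ncol(trt)), each = n_trt), as.vector(trt))] <- 1

# l2 balance score of every scheme, with weights omega_k = 1 / s_k^2
mean_trt <- (schemes %*% x) / n_trt
mean_ctl <- ((1 - schemes) %*% x) / (n - n_trt)
omega <- 1 / apply(x, 2, var)
scores <- as.vector((mean_trt - mean_ctl)^2 %*% omega)

# Constrain to the best-balanced 10% of schemes, then sample one of them
cutoff <- quantile(scores, 0.1)
constrained <- schemes[scores <= cutoff, , drop = FALSE]
set.seed(12345)
chosen <- constrained[sample(nrow(constrained), 1), ]
```

Since `choose(16, 8)` is 12870, which is below the enumeration threshold, the whole space is enumerated rather than simulated, in line with the scheme message in the output.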

We saved the constrained randomization space in the CSV file `"dickinson_constrained.csv"`, the first column of which indicates whether a scheme is the finally selected one (`1`) or not (`0`). We also saved the balance scores of the whole randomization space in the CSV file `"dickinson_bscores.csv"`, and output a histogram of all balance scores with a red line marking the selected cutoff (the `0.1` quantile).

```
library(cvcrand)

Design_result <- cvrall(clustername = Dickinson_design$county,
                        balancemetric = "l2",
                        x = data.frame(Dickinson_design[ , c("location", "inciis",
                                                             "uptodateonimmunizations",
                                                             "hispanic", "incomecat")]),
                        ntotal_cluster = 16,
                        ntrt_cluster = 8,
                        categorical = c("location", "incomecat"),
                        ###### Option to save the constrained space #####
                        # savedata = "dickinson_constrained.csv",
                        bhist = TRUE,
                        cutoff = 0.1,
                        seed = 12345)
```

We then obtained the following output:

`## [1] "l2"`

```
## clustername allocation
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 0
## 5 5 0
## 6 6 0
## 7 7 0
## 8 8 1
## 9 9 0
## 10 10 1
## 11 11 1
## 12 12 1
## 13 13 0
## 14 14 1
## 15 15 0
## 16 16 0
```

```
##
## 1 score (selected scheme) 2.684
## 2 cutoff score 7.638
## 3 Mean 24.000
## 4 SD 15.775
## 5 Min 1.161
## 6 5% 5.826
## 7 10% 7.638
## 8 20% 10.849
## 9 25% 12.221
## 10 30% 13.840
## 11 50% 20.578
## 12 75% 31.621
## 13 95% 55.486
## 14 Max 116.656
```

```
# the statement about how many clusters to be randomized to the intervention and the control arms respectively
Design_result$assignment_message
```

`## [1] "You have indicated that you want to assign 8 clusters to treatment and 8 to control"`

```
# the statement about how to get the whole randomization space to use in constrained randomization
Design_result$scheme_message
```

`## [1] "Enumerating all the 12870 schemes for 8 clusters in the treatment arm out of 16 clusters in total"`

`## [1] "The quantile cutoff value is 0.1 based on the l2 balance metric, the cutoff balance score is 7.638"`

```
# the statement about the selected scheme from constrained randomization
Design_result$choice_message
```

`## [1] "Balance score of selected scheme by l2 is 2.684"`

```
# the data frame containing the allocation scheme, the clustername as well as the original data frame of covariates
Design_result$data_CR
```

```
## arm clustername location inciis uptodateonimmunizations hispanic incomecat
## 1 1 1 Rural 94 37 44 Low
## 2 1 2 Rural 85 39 23 High
## 3 1 3 Rural 85 42 12 Low
## 4 0 4 Rural 93 39 18 High
## 5 0 5 Rural 82 31 6 High
## 6 0 6 Rural 80 27 15 Med
## 7 0 7 Rural 94 49 38 Low
## 8 1 8 Rural 100 37 39 Low
## 9 0 9 Urban 93 51 35 Med
## 10 1 10 Urban 89 51 17 Med
## 11 1 11 Urban 83 54 7 High
## 12 1 12 Urban 70 29 13 Med
## 13 0 13 Urban 93 50 13 High
## 14 1 14 Urban 85 36 10 Med
## 15 0 15 Urban 82 38 39 Low
## 16 0 16 Urban 84 43 28 Med
```

```
# the descriptive statistics for all the variables by the two arms from the selected scheme
Design_result$baseline_table
```

```
## arm = 0 arm = 1
## n 8 8
## location = Urban (%) 4 (50.0) 4 (50.0)
## inciis (mean (SD)) 87.62 (6.12) 86.38 (8.75)
## uptodateonimmunizations (mean (SD)) 41.00 (8.93) 40.62 (8.23)
## hispanic (mean (SD)) 24.00 (12.65) 20.62 (13.80)
## incomecat (%)
## High 3 (37.5) 2 (25.0)
## Low 2 (25.0) 3 (37.5)
## Med 3 (37.5) 3 (37.5)
```

```
# the cluster pair descriptive, which is useful for valid randomization check
Design_result$cluster_coin_des
```

`## NULL`

```
## overall allocations checked allocations accepted allocations overall % acceptable
## 1 12870 12870 1287 10%
```

From the output of `Design_result$baseline_table`, the selected scheme properly balances the baseline values of the covariates. The selected scheme is shown in `Design_result$allocation`.

`cvrall()` example for stratified constrained randomization

User-defined weights can be used to induce stratification on one or more categorical variables. In the study presented by (Dickinson et al. 2015), there are 8 `"Urban"` and 8 `"Rural"` counties. A user-defined weight of `1,000` is assigned to the covariate `location`, while the weights for the other covariates are all `1`. Intuitively, a large weight assigned to a covariate sharply penalizes any imbalance in that covariate, so that only schemes that are optimally balanced with respect to it enter the constrained randomization space. In practice, the resulting constrained space approximates the stratified randomization space on that covariate. In our illustrative data example, since half of the counties are located in rural areas, perfect balance on location is achieved by constrained randomization with the large weight for the `location` variable. Alternatively, the `stratify` option performs the equivalent stratification on the specified stratifying variables.

```
# Stratification on location, with constrained randomization on other specified covariates
Design_stratified_result1 <- cvrall(clustername = Dickinson_design$county,
                                    balancemetric = "l2",
                                    x = data.frame(Dickinson_design[ , c("location", "inciis",
                                                                         "uptodateonimmunizations",
                                                                         "hispanic", "incomecat")]),
                                    ntotal_cluster = 16,
                                    ntrt_cluster = 8,
                                    categorical = c("location", "incomecat"),
                                    weights = c(1000, 1, 1, 1, 1),
                                    cutoff = 0.1,
                                    seed = 12345)
```

```
## arm = 0 arm = 1
## n 8 8
## location = Urban (%) 4 (50.0) 4 (50.0)
## inciis (mean (SD)) 87.62 (6.12) 86.38 (8.75)
## uptodateonimmunizations (mean (SD)) 41.00 (8.93) 40.62 (8.23)
## hispanic (mean (SD)) 24.00 (12.65) 20.62 (13.80)
## incomecat (%)
## High 3 (37.5) 2 (25.0)
## Low 2 (25.0) 3 (37.5)
## Med 3 (37.5) 3 (37.5)
```

```
# An alternative and equivalent way to stratify on location
Design_stratified_result2 <- cvrall(clustername = Dickinson_design$county,
                                    balancemetric = "l2",
                                    x = data.frame(Dickinson_design[ , c("location", "inciis",
                                                                         "uptodateonimmunizations",
                                                                         "hispanic", "incomecat")]),
                                    ntotal_cluster = 16,
                                    ntrt_cluster = 8,
                                    categorical = c("location", "incomecat"),
                                    stratify = "location",
                                    cutoff = 0.1,
                                    seed = 12345,
                                    check_validity = TRUE)
```

```
## arm = 0 arm = 1
## n 8 8
## location = Urban (%) 4 (50.0) 4 (50.0)
## inciis (mean (SD)) 87.62 (6.12) 86.38 (8.75)
## uptodateonimmunizations (mean (SD)) 41.00 (8.93) 40.62 (8.23)
## hispanic (mean (SD)) 24.00 (12.65) 20.62 (13.80)
## incomecat (%)
## High 3 (37.5) 2 (25.0)
## Low 2 (25.0) 3 (37.5)
## Med 3 (37.5) 3 (37.5)
```

The results from `Design_stratified_result1$baseline_table` and `Design_stratified_result2$baseline_table` are the same. The final selected scheme from `cvrall()` now has 4 `"Urban"` counties in each arm. The randomization has been stratified on the `location` covariate through either the `weights` or the `stratify` argument of the `cvrall()` function.
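The claim that a large weight on `location` reproduces stratification can be checked by brute force in base R. The sketch below is an assumed illustration rather than the `cvrall()` code: it uses the location dummy and the three continuous covariates from the county table, omits income category, and applies the weighted \(B_{(l2)}\) formula as written in this document.

```r
x <- cbind(
  urban    = rep(c(0, 1), each = 8),    # counties 1-8 rural, 9-16 urban
  inciis   = c(94, 85, 85, 93, 82, 80, 94, 100, 93, 89, 83, 70, 93, 85, 82, 84),
  uptodate = c(37, 39, 42, 39, 31, 27, 49, 37, 51, 51, 54, 29, 50, 36, 38, 43),
  hispanic = c(44, 23, 12, 18, 6, 15, 38, 39, 35, 17, 7, 13, 13, 10, 39, 28)
)
d <- c(1000, 1, 1, 1)                   # user-defined weights d_k, large for location

# Enumerate all schemes with 8 of 16 clusters treated
trt <- combn(16, 8)
schemes <- matrix(0, nrow = ncol(trt), ncol = 16)
schemes[cbind(rep(seq_len(ncol(trt)), each = 8), as.vector(trt))] <- 1

# Weighted l2 score: sum_k d_k * omega_k * (difference in arm means)^2
mean_diff <- (schemes %*% x - (1 - schemes) %*% x) / 8
scores <- as.vector(mean_diff^2 %*% (d / apply(x, 2, var)))

# Constrained space at the 0.1 quantile, and its urban counts in the treatment arm
constrained <- schemes[scores <= quantile(scores, 0.1), , drop = FALSE]
urban_treated <- as.vector(constrained %*% x[, "urban"])
table(urban_treated)
```

Every scheme that survives the cutoff places exactly 4 urban counties in the treatment arm. Any imbalance on location adds a penalty far larger than the combined contribution of the other covariates, and more than 10% of all schemes are perfectly balanced on location.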

`cvrcov()` example for covariate-by-covariate constrained randomization

For the covariate-by-covariate constrained randomization, we used the `cvrcov()` function to randomize 8 of the 16 counties to the practice-based arm. To define the whole randomization space, we enumerate all schemes if the total number of possible schemes is smaller than `100,000`. Otherwise, we simulate `100,000` unique schemes. Location has the levels `"Rural"` and `"Urban"`; for `cvrcov()`, `"Rural"` was coded as 1 and `"Urban"` as 0. We then constrained the randomization space to the schemes in which the absolute total difference of location is less than or equal to `5`, the absolute mean difference of the percentage of children ages 19-35 months in the CIIS is less than or equal to a `0.5` fraction of the overall mean, and the absolute mean difference of income is less than or equal to a `0.4` fraction of the overall mean. Finally, a randomization scheme is sampled from the constrained space.

We saved the constrained randomization space in the CSV file `"dickinson_cov_constrained.csv"`, the first column of which indicates whether a scheme is the finally selected one (`1`) or not (`0`).

```
# change the categorical variable of interest to have numeric representation
Dickinson_design_numeric <- Dickinson_design
Dickinson_design_numeric$location <- (Dickinson_design$location == "Rural") * 1

Design_cov_result <- cvrcov(clustername = Dickinson_design_numeric$county,
                            x = data.frame(Dickinson_design_numeric[ , c("location", "inciis",
                                                                         "uptodateonimmunizations",
                                                                         "hispanic", "income")]),
                            ntotal_cluster = 16,
                            ntrt_cluster = 8,
                            constraints = c("s5", "mf.5", "any", "any", "mf0.4"),
                            categorical = c("location"),
                            ###### Option to save the constrained space #####
                            # savedata = "dickinson_cov_constrained.csv",
                            seed = 12345,
                            check_validity = TRUE)
```

We then obtained the following output:

```
## id allocation
## [1,] 1 0
## [2,] 2 1
## [3,] 3 0
## [4,] 4 1
## [5,] 5 1
## [6,] 6 1
## [7,] 7 0
## [8,] 8 0
## [9,] 9 1
## [10,] 10 0
## [11,] 11 0
## [12,] 12 0
## [13,] 13 1
## [14,] 14 0
## [15,] 15 1
## [16,] 16 1
```

```
# the statement about how many clusters to be randomized to the intervention and the control arms respectively
Design_cov_result$assignment_message
```

`## [1] "You have indicated that you want to assign 8 clusters to treatment and 8 to control"`

```
# the statement about how to get the whole randomization space to use in constrained randomization
Design_cov_result$scheme_message
```

`## [1] "Enumerating all the 12870 schemes for 8 clusters in the treatment arm out of 16 clusters in total"`

```
# the data frame containing the allocation scheme, the clustername as well as the original data frame of covariates
Design_cov_result$data_CR
```

```
## arm id location inciis uptodateonimmunizations hispanic income
## 1 0 1 1 94 37 44 35988
## 2 1 2 1 85 39 23 67565
## 3 0 3 1 85 42 12 35879
## 4 1 4 1 93 39 18 63617
## 5 1 5 1 82 31 6 59118
## 6 1 6 1 80 27 15 57179
## 7 0 7 1 94 49 38 29738
## 8 0 8 1 100 37 39 37350
## 9 1 9 0 93 51 35 52923
## 10 0 10 0 89 51 17 58302
## 11 0 11 0 83 54 7 93819
## 12 0 12 0 70 29 13 54839
## 13 1 13 0 93 50 13 63857
## 14 0 14 0 85 36 10 53502
## 15 1 15 0 82 38 39 39570
## 16 1 16 0 84 43 28 52457
```

```
# the descriptive statistics for all the variables by the two arms from the selected scheme
Design_cov_result$baseline_table
```

```
## arm = 0 arm = 1
## n 8 8
## location = 1 (%) 4 (50.0) 4 (50.0)
## inciis (mean (SD)) 87.50 (9.12) 86.50 (5.58)
## uptodateonimmunizations (mean (SD)) 41.88 (8.69) 39.75 (8.33)
## hispanic (mean (SD)) 22.50 (15.13) 22.12 (11.32)
## income (mean (SD)) 49927.12 (20670.81) 57035.75 (8847.89)
```

```
# the cluster pair descriptive, which is useful for valid randomization check
Design_cov_result$cluster_coin_des
```

```
## Mean Std Dev Minimum 25th Pctl Median 75th Pctl Maximum
## samecount 5937.867 35.142 5892.000 5902.000 5962.000 5972.000 5978.000
## samefrac 0.467 0.003 0.463 0.464 0.469 0.469 0.470
## diffcount 6786.133 35.142 6746.000 6752.000 6762.000 6822.000 6832.000
## difffrac 0.533 0.003 0.530 0.531 0.531 0.536 0.537
```

```
## overall allocations checked allocations accepted allocations overall % acceptable
## 1 12870 12870 12724 98.87%
```

From the output of `Design_cov_result$baseline_table`, the selected scheme properly balances the baseline values of the covariates. The selected scheme is shown in `Design_cov_result$allocation`.

At the end of a cluster randomized trial, individual outcomes are collected. A permutation test based on (Gail et al. 1996) and (Fan Li et al. 2016) is then applied to the continuous or binary outcome, adjusting for individual-level covariates.

The permutation test is implemented in a two-step procedure. In the first step, an outcome regression model is fitted for the response \(Y_{ij}\) of individual \(j\) in cluster \(i\) with covariates \(\textbf{z}_{ij}\). This is done by fitting a linear regression model for continuous responses and a logistic regression model for binary responses (Gail et al. 1996), ignoring the clustering of responses. The individual residual \(r_{ij}=Y_{ij}-\hat{Y}_{ij}\) is then calculated from the predicted response \(\hat{Y}_{ij}\) of each individual. In the second step, cluster-specific residual means are obtained as \(\bar{r}_{i\cdot}=\sum_{j=1}^{m_i}r_{ij}/m_i\), where \(m_i\) is the number of individuals in cluster \(i\). The observed test statistic is then computed as \(U=\frac{1}{n_T}\sum_{i=1}^nW_i\bar{r}_{i\cdot}- \frac{1}{n_C}\sum_{i=1}^n(1-W_i)\bar{r}_{i\cdot}\), where \(W_i=1\) if the \(i\)th cluster is assigned to the treatment arm and \(W_i=0\) otherwise, and \(n_T=\sum_{i=1}^nW_i\), \(n_C=\sum_{i=1}^n(1-W_i)\) are the number of treated and control clusters.

Suppose there are \(S\) randomization schemes in the constrained randomization space. To obtain the permutation distribution of the test statistic, we permute the labels of the treatment indicator according to the constrained randomization space, and compute a value of \(U_s\) (\(s=1,\ldots,S\)). The collection of these values \(\{U_s:s=1,\ldots,S\}\) forms the null distribution of the permutation test statistic. The p-value is then computed by \(\text{p-value}=\frac{1}{S}\sum_{s=1}^S \mathbb{I}(|U_s|\geq |U|)\).
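The two-step procedure and the p-value formula can be sketched in base R on simulated data. This is a minimal illustration under stated assumptions, not the `cptest()` implementation: the data are made up, the outcome is continuous, and the randomization space is taken to be all schemes with 4 of 8 clusters treated rather than a constrained space read from a file.

```r
set.seed(1)
n <- 8                                   # clusters (toy size)
m <- 20                                  # individuals per cluster (toy size)
cluster <- rep(seq_len(n), each = m)
w_obs <- c(1, 0, 1, 0, 0, 1, 1, 0)       # observed allocation W_i
z <- rnorm(n * m)                        # one individual-level covariate
y <- 0.5 * z + 0.3 * w_obs[cluster] +
  rep(rnorm(n, sd = 0.2), each = m) + rnorm(n * m)

# Step 1: outcome regression on the covariate only, ignoring clustering
r <- residuals(lm(y ~ z))

# Step 2: cluster-specific residual means and the test statistic U
r_bar <- tapply(r, cluster, mean)
u_stat <- function(w) mean(r_bar[w == 1]) - mean(r_bar[w == 0])
u_obs <- u_stat(w_obs)

# Randomization space (assumed unconstrained here): 70 schemes
trt <- combn(n, 4)
space <- t(apply(trt, 2, function(idx) as.integer(seq_len(n) %in% idx)))
u_perm <- apply(space, 1, u_stat)        # U_s for s = 1, ..., S

# Two-sided permutation p-value
p_value <- mean(abs(u_perm) >= abs(u_obs))
```

Because the observed scheme and its mirror image both belong to the space, the p-value can never fall below \(2/S\) in this setup, which shows how the size of the constrained space limits the attainable significance level.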

The `cptest()` function in the `cvcrand` package performs the permutation test for the intervention effect in cluster randomized trials.

Each categorical variable is transformed into dummy variables to fit the linear model or logistic regression for the permutation test. When transforming a categorical variable to dummy variables, the reference level will be dropped if the categorical variable is specified as a factor. Otherwise, the first level in the alphanumerical order will be dropped. Users can also specify a certain reference level for each categorical variable by manually coding dummy variables before running the `cptest()` function.

`cptest()` example

Suppose that the researchers were able to assess 300 children in each cluster in the study presented by (Dickinson et al. 2015), and that the cluster randomized trial was conducted with the randomization scheme selected in the `cvrall()` example above. We expanded the cluster-level covariates to the individual level, so that each child carries the covariate values of the county he or she belongs to. The correlated individual outcome, up-to-date on immunizations (`1`) or not (`0`), was then simulated from a generalized linear mixed model (GLMM) with a logistic link, where a county-level random effect induces the correlation. The intracluster correlation (ICC) was set to 0.01, using the latent response definition provided in (Eldridge, Ukoumunne, and Carlin 2009). This is a reasonable value for population health studies (Hannan et al. 1994). We simulated one data set, with the outcome depending on the county-level covariates used in the constrained randomization design and a positive treatment effect, so that the practice-based intervention increases up-to-date immunization rates more than the population-based intervention. For each individual child, the outcome is equal to `1` if he or she is up-to-date on immunizations and `0` otherwise.

county | location | inciis | uptodateonimmunizations | hispanic | incomecat | outcome |
---|---|---|---|---|---|---|
1 | Rural | 94 | 37 | 44 | 0 | 1 |
1 | Rural | 94 | 37 | 44 | 0 | 1 |
1 | Rural | 94 | 37 | 44 | 0 | 1 |
1 | Rural | 94 | 37 | 44 | 0 | 1 |
1 | Rural | 94 | 37 | 44 | 0 | 0 |
1 | Rural | 94 | 37 | 44 | 0 | 0 |
1 | Rural | 94 | 37 | 44 | 0 | 1 |
1 | Rural | 94 | 37 | 44 | 0 | 1 |
1 | Rural | 94 | 37 | 44 | 0 | 1 |
1 | Rural | 94 | 37 | 44 | 0 | 1 |
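A data set of this kind can be simulated along the following lines. The sketch is only an assumed illustration of the random-intercept logistic model and the latent response ICC, not the code that produced the data above: the intercept and treatment effect are made-up values, and the dependence on county-level covariates is omitted for brevity.

```r
set.seed(2023)
icc <- 0.01
# Latent response definition for a logistic random-intercept model:
# icc = sigma_b^2 / (sigma_b^2 + pi^2 / 3), solved for sigma_b^2
sigma_b2 <- icc * (pi^2 / 3) / (1 - icc)

n_cluster <- 16
m <- 300                                            # children per county
county <- rep(seq_len(n_cluster), each = m)
# Allocation reported in the cvrall() example (1 = practice-based arm)
arm <- c(1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0)
b <- rnorm(n_cluster, sd = sqrt(sigma_b2))          # county random intercepts

beta0 <- -0.4                                       # assumed intercept (illustration only)
beta_trt <- 0.3                                     # assumed positive treatment effect
p <- plogis(beta0 + beta_trt * arm[county] + b[county])
outcome <- rbinom(n_cluster * m, size = 1, prob = p)

sim_data <- data.frame(county = county, arm = arm[county], outcome = outcome)
```

With an ICC of 0.01 the random-intercept variance is only about 0.033 on the logit scale, so counties differ modestly in their baseline probability of being up-to-date.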

We used the `cptest()` function to perform the clustered permutation test on the binary outcome, the status of up-to-date on immunizations. We input the file containing the constrained space, whose first column indicates the final scheme. The permutation test adjusts for the continuous covariates `"inciis"`, `"uptodateonimmunizations"`, and `"hispanic"`, as well as the categorical variables `"location"` and `"incomecat"`. Location has the levels `"Rural"` and `"Urban"`; the level `"Rural"` was dropped in `cptest()`. Income category has the three levels `"Low"`, `"Med"`, and `"High"`; the level `"High"`, which comes first in alphanumerical order, was dropped to create the dummy variables.

```
Analysis_result <- cptest(outcome = Dickinson_outcome$outcome,
                          clustername = Dickinson_outcome$county,
                          z = data.frame(Dickinson_outcome[ , c("location", "inciis",
                                                                "uptodateonimmunizations",
                                                                "hispanic", "incomecat")]),
                          cspacedatname = system.file("dickinson_constrained.csv", package = "cvcrand"),
                          outcometype = "binary",
                          categorical = c("location", "incomecat"))
```

The result of `cptest()` includes the final scheme for the cluster randomized trial, the p-value from the permutation test, and a statement about that p-value.

```
## $FinalScheme
## Cluster_ID Intervention
## 1 1 0
## 2 2 0
## 3 3 0
## 4 4 1
## 5 5 1
## 6 6 0
## 7 7 1
## 8 8 0
## 9 9 1
## 10 10 1
## 11 11 0
## 12 12 1
## 13 13 1
## 14 14 0
## 15 15 1
## 16 16 0
##
## $pvalue
## [1] 0.042
##
## $pvalue_statement
## [1] "Clustered permutation test p-value = 0.042"
```

From the p-value of `0.042` in `Analysis_result`, the probability of being up-to-date on immunizations under the practice-based approach (`1`) is significantly different from that under the population-based approach (`0`) at the 0.05 level.

```
## R version 4.2.2 (2022-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.0
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cvcrand_0.1.1
##
## loaded via a namespace (and not attached):
## [1] highr_0.10 bslib_0.4.2 compiler_4.2.2 pillar_1.8.1 jquerylib_0.1.4
## [6] class_7.3-20 forcats_1.0.0 tools_4.2.2 digest_0.6.31 jsonlite_1.8.4
## [11] evaluate_0.20 lifecycle_1.0.3 tibble_3.1.8 lattice_0.20-45 pkgconfig_2.0.3
## [16] rlang_1.0.6 Matrix_1.5-1 DBI_1.1.3 cli_3.6.0 rstudioapi_0.14
## [21] yaml_2.3.7 haven_2.5.3 xfun_0.37 fastmap_1.1.1 e1071_1.7-13
## [26] dplyr_1.1.0 knitr_1.42 hms_1.1.3 generics_0.1.3 sass_0.4.5
## [31] vctrs_0.5.2 mitools_2.4 tidyselect_1.2.0 grid_4.2.2 glue_1.6.2
## [36] R6_2.5.1 fansi_1.0.4 survival_3.4-0 rmarkdown_2.20 magrittr_2.0.3
## [41] MASS_7.3-58.1 htmltools_0.5.4 splines_4.2.2 labelled_2.12.0 tableone_0.13.2
## [46] utf8_1.2.3 proxy_0.4-27 survey_4.2-1 cachem_1.0.7 zoo_1.8-12
```

Bailey, RA, and CA Rowley. 1987. “Valid Randomization.”
*Proceedings of the Royal Society of London. A. Mathematical and
Physical Sciences* 410 (1838): 105–24.

Dickinson, L Miriam, Brenda Beaty, Chet Fox, Wilson Pace, W Perry
Dickinson, Caroline Emsermann, and Allison Kempe. 2015. “Pragmatic
Cluster Randomized Trials Using Covariate Constrained Randomization: A
Method for Practice-Based Research Networks (PBRNs).” *The
Journal of the American Board of Family Medicine* 28 (5): 663–72.

Eldridge, Sandra M, Obioha C Ukoumunne, and John B Carlin. 2009.
“The Intra-Cluster Correlation Coefficient in Cluster Randomized
Trials: A Review of Definitions.” *International Statistical
Review* 77 (3): 378–94.

Gail, Mitchell H, Steven D Mark, Raymond J Carroll, Sylvan B Green, and
David Pee. 1996. “On Design Considerations and Randomization-Based
Inference for Community Intervention Trials.” *Statistics in
Medicine* 15 (11): 1069–92.

Greene, Erich J. 2017. “A SAS Macro for Covariate-Constrained
Randomization of General Cluster-Randomized and Unstratified
Designs.” *Journal of Statistical Software* 77 (CS1).

Hannan, Peter J, David M Murray, David R Jacobs Jr, and Paul G McGovern.
1994. “Parameters to Aid in the Design and Analysis of Community
Trials: Intraclass Correlations from the Minnesota Heart Health
Program.” *Epidemiology*, 88–95.

Li, Fan, Yuliya Lokhnygina, David M Murray, Patrick J Heagerty, and
Elizabeth R DeLong. 2016. “An Evaluation of Constrained
Randomization for the Design and Analysis of Group-Randomized
Trials.” *Statistics in Medicine* 35 (10): 1565–79.

Li, F, EL Turner, PJ Heagerty, DM Murray, WM Vollmer, and ER DeLong.
2017. “An Evaluation of Constrained Randomization for the Design
and Analysis of Group-Randomized Trials with Binary Outcomes.”
*Statistics in Medicine* 36 (24): 3791.

Raab, Gillian M, and Izzy Butcher. 2001. “Balance in Cluster
Randomized Trials.” *Statistics in Medicine* 20 (3):
351–65.