The BEBOP phase II trial methodology incorporates predictive baseline information to study co-primary efficacy and toxicity outcomes. BEBOP stands for Bayesian Evaluation of Bivariate Binary Outcomes and Predictive Information. It was developed for the PePS2 trial, which investigates pembrolizumab in non-small-cell lung cancer (NSCLC) patients with performance status 2 (PS2). A major factor in the PePS2 trial is the data on PS0/1 NSCLC patients published by Garon et al. (2015), showing that patients with greater PD-L1 tumour proportion scores are more likely to achieve an objective response. Garon et al. introduced this predictive biomarker and validated a three-level categorisation of the PD-L1 score: Low, Medium and High. There was also a suggestion (albeit not shown to be *statistically significant*) that previously untreated patients are more likely to have a response. These same variables will likely be predictive of response in the PS2 population. Brock et al. (publication in submission) developed BEBOP to make efficient use of this information.

BEBOP re-uses the probability model at the core of the EffTox design (Thall and Cook 2004; Thall et al. 2014). Let \(x\) represent a vector of predictive baseline variables and \(\boldsymbol{\theta}\) a vector of parameters. The marginal probabilities of efficacy and toxicity are estimated using general functions \(\text{logit } \pi_E(x, \boldsymbol{\theta})\) and \(\text{logit } \pi_T(x, \boldsymbol{\theta})\) to be specified by the user.

In PePS2, we have \(x_{i1} = 1\) if a patient has been previously treated and \(x_{i1} = 0\) otherwise. To convey PD-L1 expression, we use \(x_{i2} = 1\) and \(x_{i3} = 0\) if a patient has a Low PD-L1 score; \(x_{i2} = 0\) and \(x_{i3} = 1\) if a patient has a Medium PD-L1 score; and \(x_{i2} = 0\) and \(x_{i3} = 0\) if a patient has a High PD-L1 score. Thus, we have \(x_i = (x_{i1}, x_{i2}, x_{i3})\). The marginal efficacy and toxicity functions in PePS2 take the form

\(\text{logit } \pi_E(x_i, \boldsymbol{\theta}) = \alpha + \beta x_{i1} + \gamma x_{i2} + \zeta x_{i3}\)

\(\text{logit } \pi_T(x_i, \boldsymbol{\theta}) = \lambda\)
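
As a minimal sketch of how these linear predictors map to probabilities, the base R function below evaluates the efficacy probability for a given coding of the predictive variables. The function name `pi_eff` and the parameter values are invented purely for illustration; they are not estimates from PePS2 and not part of `trialr`.

```r
# Inverse-logit of the PePS2 efficacy linear predictor.
# x is c(x1, x2, x3); theta is a named list of parameters.
pi_eff <- function(x, theta) {
  eta <- theta$alpha + theta$beta * x[1] + theta$gamma * x[2] + theta$zeta * x[3]
  plogis(eta)  # plogis() is the inverse of the logit function
}

# Hypothetical parameter values, chosen only for illustration
theta <- list(alpha = 0, beta = -0.5, gamma = -1.5, zeta = -1, lambda = -2)

pi_eff(c(0, 0, 0), theta)  # previously untreated, High PD-L1: plogis(alpha)
pi_eff(c(1, 1, 0), theta)  # previously treated, Low PD-L1
plogis(theta$lambda)       # toxicity probability, the same for every patient
```

Because High PD-L1 and no prior treatment form the reference category, \(\alpha\) alone determines the efficacy probability in that group, and the other coefficients are shifts relative to it on the logit scale.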

As with EffTox, let \((Y_j, Z_j)\) be random variables each taking values in \(\{0, 1\}\), representing the presence of efficacy and toxicity in patient \(j\). The efficacy and toxicity events are associated by the joint probability function

\(Pr(Y = a, Z = b) = \pi_{a,b}(\pi_E, \pi_T) = (\pi_E)^a (1-\pi_E)^{1-a} (\pi_T)^b (1-\pi_T)^{1-b} + (-1)^{a+b} (\pi_E) (1-\pi_E) (\pi_T) (1-\pi_T) \frac{e^\psi-1}{e^\psi+1}\).
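
The joint probability function can be checked numerically. The sketch below is a direct base R transcription of the formula above; the function name `joint_prob` and the example values of \(\pi_E\), \(\pi_T\) and \(\psi\) are assumptions made for illustration only.

```r
# Joint probability of efficacy outcome a and toxicity outcome b (each 0 or 1)
joint_prob <- function(a, b, pi_e, pi_t, psi) {
  # Product of the two marginal Bernoulli terms
  indep <- pi_e^a * (1 - pi_e)^(1 - a) * pi_t^b * (1 - pi_t)^(1 - b)
  # Association term, which vanishes when psi = 0
  assoc <- (-1)^(a + b) * pi_e * (1 - pi_e) * pi_t * (1 - pi_t) *
    (exp(psi) - 1) / (exp(psi) + 1)
  indep + assoc
}

# Evaluate all four outcome combinations for illustrative values
grid <- expand.grid(a = 0:1, b = 0:1)
grid$prob <- with(grid, joint_prob(a, b, pi_e = 0.3, pi_t = 0.1, psi = 1))
grid

sum(grid$prob)                         # the four cells sum to 1
sum(grid$prob[grid$a == 1])            # marginal efficacy probability, pi_e
joint_prob(1, 1, 0.3, 0.1, psi = 0)    # psi = 0 gives independence: pi_e * pi_t
```

The association term enters with opposite signs in concordant and discordant cells, so it cancels when summing over either outcome. This is why \(\pi_E\) and \(\pi_T\) remain the marginal probabilities whatever the value of \(\psi\).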

The complete vector of parameters is \(\boldsymbol{\theta} = (\alpha, \beta, \gamma, \zeta, \lambda, \psi)\). Normal priors are specified for the elements of \(\boldsymbol{\theta}\).

The treatment is acceptable for patients with predictive vector \(x\) if

\(\text{Pr}\left\{ \pi_E(x, \boldsymbol{\theta}) > \underline{\pi}_E | \mathcal{D} \right\} > p_E\)

and

\(\text{Pr}\left\{ \pi_T(x, \boldsymbol{\theta}) < \overline{\pi}_T | \mathcal{D} \right\} > p_T\)

where \(\underline{\pi}_E, \overline{\pi}_T, p_E, p_T\) are chosen for clinical relevance.
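
In practice these posterior probabilities are estimated as proportions of posterior draws. The following base R sketch shows the idea for a single cohort. The draws are simulated from Beta distributions as a stand-in for model output, the function name `is_acceptable` is hypothetical, and the certainty levels `p_eff = 0.7` and `p_tox = 0.9` are assumed values chosen for illustration.

```r
set.seed(1)

# Stand-in posterior draws for one cohort; NOT output from the BEBOP model
eff_draws <- rbeta(4000, 6, 14)
tox_draws <- rbeta(4000, 2, 18)

# Apply the two acceptance criteria to a set of posterior draws
is_acceptable <- function(eff_draws, tox_draws, min_eff, max_tox, p_eff, p_tox) {
  prob_acc_eff <- mean(eff_draws > min_eff)  # estimates Pr(pi_E > min_eff | D)
  prob_acc_tox <- mean(tox_draws < max_tox)  # estimates Pr(pi_T < max_tox | D)
  list(
    prob_acc_eff = prob_acc_eff,
    prob_acc_tox = prob_acc_tox,
    accept = prob_acc_eff > p_eff && prob_acc_tox > p_tox
  )
}

res <- is_acceptable(eff_draws, tox_draws,
                     min_eff = 0.1, max_tox = 0.3,
                     p_eff = 0.7, p_tox = 0.9)  # assumed certainty levels
res
```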

PePS2 is an all-comers trial: patients are admitted regardless of their PD-L1 score or pre-treatment status. This is motivated by the dearth of treatment options for PS2 NSCLC patients who cannot use chemotherapy. The design allows the predictive information to effectively stratify the analysis without stratifying recruitment. The statistical design uses the common Bayesian tool of borrowing strength across groups to improve the performance of the analysis.

`trialr`

The cohorts in the PePS2 trial are

i | Pretreated | PDL1 | x1 | x2 | x3 |
---|---|---|---|---|---|
1 | FALSE | Low | 0 | 1 | 0 |
2 | FALSE | Medium | 0 | 0 | 1 |
3 | FALSE | High | 0 | 0 | 0 |
4 | TRUE | Low | 1 | 1 | 0 |
5 | TRUE | Medium | 1 | 0 | 1 |
6 | TRUE | High | 1 | 0 | 0 |
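
The indicator coding in this table can be reproduced in a few lines of base R. This is an illustrative sketch only; the data frame name `cohort_tab` is hypothetical and is not an object provided by `trialr`.

```r
# Enumerate the six cohorts; expand.grid varies the first factor fastest,
# which gives the same ordering as the table above
cohort_tab <- expand.grid(PDL1 = c("Low", "Medium", "High"),
                          Pretreated = c(FALSE, TRUE),
                          stringsAsFactors = FALSE)
cohort_tab$i <- seq_len(nrow(cohort_tab))

# Indicator variables: High PD-L1 and previously untreated are the reference levels
cohort_tab$x1 <- as.integer(cohort_tab$Pretreated)
cohort_tab$x2 <- as.integer(cohort_tab$PDL1 == "Low")
cohort_tab$x3 <- as.integer(cohort_tab$PDL1 == "Medium")

cohort_tab[, c("i", "Pretreated", "PDL1", "x1", "x2", "x3")]
```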

The trial uses a sample size of 60. Let us simulate a set of outcomes with the following efficacy and toxicity rates

```
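library(trialr)

# Scenario: simulate 60 patients with cohort-specific efficacy rates,
# 10% toxicity in every cohort, and independent efficacy and toxicity
peps2_sc <- function() peps2_get_data(
  num_patients = 60,
  prob_eff = c(0.167, 0.192, 0.5, 0.091, 0.156, 0.439),
  prob_tox = rep(0.1, 6),
  eff_tox_or = rep(1, 6)
)

set.seed(123)
dat <- peps2_sc()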
```

In this example, we have used efficacy rates that increase with PD-L1 score and are slightly higher in previously untreated patients. We use a uniform toxicity rate of 10% across all cohorts, and no association between efficacy and toxicity events, represented by odds ratios equal to 1.

The `dat` object contains, for example, the prior parameters

`## [1] -2.2 2.0`

and simulated predictive variables and efficacy and toxicity outcomes

eff | tox | x1 | x2 | x3 |
---|---|---|---|---|
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
1 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |

We fit the data to the BEBOP model and obtain samples from the posterior distribution using `rstan`. The `BebopInPeps2` model is provided by `trialr` and compiled when the package is installed.

It is informative to view plots of posterior beliefs. Posterior samples of parameter values are available but they are less meaningful to us than the modelled efficacy rates, for example. View posterior distributions using code like
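
A minimal sketch of such a view, assuming the posterior draws of the cohort efficacy rates have been collected into a matrix with one column per cohort. The `draws` matrix below is simulated from Beta distributions purely for illustration and is not output from the BEBOP model; the dashed line marks the 10% efficacy threshold used in PePS2.

```r
set.seed(1)
n_draws <- 2000

# Simulated stand-in for posterior draws of efficacy rates, one column per cohort
shape1 <- c(2, 6, 10, 1, 4, 7)  # hypothetical Beta shape parameters
draws <- sapply(shape1, function(a) rbeta(n_draws, a, 24 - a))
colnames(draws) <- paste("Cohort", 1:6)

# Side-by-side view of the six posterior distributions
boxplot(draws, ylab = "Prob(efficacy)", main = "Posterior efficacy rates by cohort")
abline(h = 0.1, lty = 2)  # 10% efficacy threshold

# Numerical companion to the plot: posterior medians and 90% intervals
t(apply(draws, 2, quantile, probs = c(0.05, 0.5, 0.95)))
```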

We see that the modelled rates of efficacy are highest in cohorts 3 and 6, but likely to be greater than the critical threshold of 10% in cohort 2 and perhaps cohort 5 as well. In contrast, there is not much evidence of clinical benefit in cohorts 1 and 4. This is confirmed by invoking the formally described analysis and associated decision rules.

```
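# Fit the model using the same call as in run_sims below,
# then apply the decision rules to the posterior sample
fit <- stan_peps2(dat$eff, dat$tox, dat$cohorts)
decision <- peps2_process(fit)
knitr::kable(
  with(decision, data.frame(ProbEff, ProbAccEff, ProbTox, ProbAccTox, Accept)),
  digits = 3
)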
```

ProbEff | ProbAccEff | ProbTox | ProbAccTox | Accept |
---|---|---|---|---|
0.076 | 0.268 | 0.068 | 1 | FALSE |
0.253 | 0.984 | 0.068 | 1 | TRUE |
0.429 | 0.993 | 0.068 | 1 | TRUE |
0.041 | 0.102 | 0.068 | 1 | FALSE |
0.151 | 0.750 | 0.068 | 1 | TRUE |
0.286 | 0.951 | 0.068 | 1 | TRUE |

ProbAccEff is the posterior probability that the efficacy rate in a cohort is greater than the 10% threshold. ProbAccTox is the probability that toxicity is less than 30%. The treatment is acceptable in a cohort if it is sufficiently efficacious and non-toxic. We see that in this simulated iteration, the treatment would be approved in all cohorts except 1 and 4.
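
The `Accept` column can be reproduced from the tabulated probabilities. The sketch below assumes certainty thresholds of 0.7 for efficacy and 0.9 for toxicity; these values are not stated in the text above and are used here only because they are consistent with the table. The name `decision_tab` is hypothetical.

```r
# Posterior probabilities copied from the table above
decision_tab <- data.frame(
  ProbAccEff = c(0.268, 0.984, 0.993, 0.102, 0.750, 0.951),
  ProbAccTox = rep(1, 6)
)

# Assumed certainty thresholds (not stated in the text)
p_eff <- 0.7
p_tox <- 0.9

# A cohort is accepted only if both criteria are met
decision_tab$Accept <- with(decision_tab, ProbAccEff > p_eff & ProbAccTox > p_tox)
decision_tab

which(!decision_tab$Accept)  # cohorts in which the treatment is not accepted
```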

We can perform the simulations on a greater number of iterations to learn about the operating characteristics of the design.

```
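set.seed(123)

# Repeatedly simulate a trial, fit the model and summarise the posterior.
# sample_data_func: called without arguments, returns simulated trial data
# summarise_func:   called with the fitted model, returns a summary object
# ...:              further arguments passed on to stan_peps2
run_sims <- function(num_sims = 10, sample_data_func = peps2_sc,
                     summarise_func = peps2_process, ...) {
  sims <- list()
  for (i in seq_len(num_sims)) {
    print(i)  # progress indicator
    dat <- sample_data_func()
    fit <- stan_peps2(dat$eff, dat$tox, dat$cohorts, ...)
    sims[[i]] <- summarise_func(fit)
  }
  sims
}

sims <- run_sims(num_sims = 10, sample_data_func = peps2_sc,
                 summarise_func = peps2_process)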
```

In `run_sims`, the second and third arguments are delegates that simulate trial outcomes and post-process the `rstan` fit, respectively. The outcome sampling delegate is called without arguments. The post-process delegate is called with a single argument, the fitted model returned by `stan_peps2` (`fit` in the code above). The objects returned by the post-process delegate form the items of the `sims` list that `run_sims` returns to the user.

Thus, the probability of approving the treatment using the statistical design in this scenario can be calculated using
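
A minimal sketch of that calculation, assuming each element of the simulation list exposes a logical `Accept` vector with one entry per cohort, as `decision$Accept` does above. The `mock_sims` list is a hand-written stand-in so that the snippet runs without fitting any models; with real output, `sims` would take its place.

```r
# Hand-written stand-in for the list returned by run_sims (four mock iterations)
mock_sims <- list(
  list(Accept = c(FALSE, TRUE,  TRUE, FALSE, TRUE,  TRUE)),
  list(Accept = c(FALSE, TRUE,  TRUE, FALSE, FALSE, TRUE)),
  list(Accept = c(TRUE,  TRUE,  TRUE, FALSE, TRUE,  TRUE)),
  list(Accept = c(FALSE, FALSE, TRUE, FALSE, TRUE,  TRUE))
)

# One column per simulated trial, one row per cohort
accept_mat <- sapply(mock_sims, function(s) s$Accept)

# Proportion of simulated trials in which each cohort was approved
prob_approve <- rowMeans(accept_mat)
prob_approve
```

With only ten iterations, as in the call to `run_sims` above, these proportions are coarse estimates; many more iterations are needed for stable operating characteristics.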

`trialr` is available at https://github.com/brockk/trialr and https://CRAN.R-project.org/package=trialr

Garon, Edward B, Naiyer A Rizvi, Rina Hui, Natasha Leighl, Ani S Balmanoukian, Joseph Paul Eder, Amita Patnaik, et al. 2015. “Pembrolizumab for the Treatment of Non-Small-Cell Lung Cancer.” *The New England Journal of Medicine* 372 (21): 2018–28. https://doi.org/10.1056/NEJMoa1501824.

Thall, PF, and JD Cook. 2004. “Dose-Finding Based on Efficacy-Toxicity Trade-Offs.” *Biometrics* 60 (3): 684–93.

Thall, PF, RC Herrick, HQ Nguyen, JJ Venier, and JC Norris. 2014. “Effective Sample Size for Computing Prior Hyperparameters in Bayesian Phase I-II Dose-Finding.” *Clinical Trials* 11 (6): 657–66. https://doi.org/10.1177/1740774514547397.