Medicine and Health

The design and evaluation of hybrid controlled trials that leverage external data and randomization

S. Ventz, S. Khozin, et al.

This study by Steffen Ventz, Sean Khozin, and colleagues presents a hybrid clinical trial design that combines external control datasets with randomization. The goal is to improve treatment effect inference while accounting for differences in patient characteristics between trial and external populations. Simulations and in silico trials built from ES-SCLC and GBM studies show that the approach controls false positives better than externally controlled trials and gains efficiency over conventional randomized trials when the external data are suitable.

Introduction

Randomized controlled trials (RCTs) are essential to demonstrate efficacy. With the increasing availability of data from past trials, the prospective use of external control (EC) data in the design, conduct, and analysis of clinical trials has the potential to reduce the cost and time of evaluating new treatments. In this work, the authors introduce and examine a hybrid trial (HT) design that combines EC data and randomization to test experimental therapeutics. They evaluate power, control of false positive rates (type I error), average sample size, and study duration using simulations and in silico trials generated from extensive-stage small cell lung cancer (ES-SCLC) and glioblastoma (GBM) datasets. They compare HT to single-arm externally controlled trials (ECTs) and conventional RCTs under scenarios including measured and unmeasured confounding and differences in conditional outcome distributions between studies.

Literature Review

The authors performed a literature review to identify pre-treatment characteristics associated with overall survival (OS) in ES-SCLC, which informed variable selection for the analyses. Of the identified prognostic factors, only sex, age, and ECOG performance status were available in the ES-SCLC datasets used (CALGB-9732, GALES, and Pirker et al.). They also review prior methods for integrating external control data into single-arm trials (e.g., marginal structural models, matching, inverse-probability weighting) and earlier uses of external controls in oncology, highlighting the assumptions and limitations reported in the literature.

Methodology

The study proposes a two-stage hybrid trial (HT) design that integrates external control (EC) data with internal randomization.

Stage 1 enrolls n1 patients randomized 1:1 to the internal control (IC) and experimental arms. At an interim analysis (IA), a dissimilarity index W1 compares the conditional outcome distributions Pr(Y|X, A=0, S) of IC patients (S=0) and EC patients (S=1). If W1 exceeds a prespecified threshold w1, EC data are excluded from the futility IA and the study keeps 1:1 randomization in stage 2. If W1 ≤ w1, EC data are used in the futility IA and the stage 2 randomization ratio is updated to r2,c:r2,e (the values considered were 1:1, 1:2, and 0:1). At trial completion, a second dissimilarity index W2 is computed, and EC data are included in the final analysis only if W2 ≤ w2. Externally controlled trials (ECTs) are the special case without randomization; they assume that the conditional outcome distributions of the trial and EC populations are identical.

The treatment effect is defined as TE = Σx {E[Y|X=x, A=1] − E[Y|X=x, A=0]} Pr(X=x) and is estimated with adjustment methods such as matching, inverse-probability weighting (IPW), or marginal structural models (MSMs); MSMs were used in the primary analyses.

Hypothesis testing: For ECTs and for HTs that include EC data at the final analysis (W2 ≤ w2), MSMs estimate TE and test H0: TE ≤ 0. For RCTs and for HTs with W2 > w2, the analysis uses trial data only: the difference in empirical response rates, tested with a two-sample z-test for proportions. An alternative permutation test for HTs that include EC data is also described; it maintains the nominal type I error even when standard adjustment assumptions are violated.

Evaluation: Operating characteristics (type I error, power, early stopping for futility, average sample size, and study duration) were assessed with (1) model-based simulations and (2) a leave-one-study-out resampling algorithm applied to real datasets.

Model-based simulations: The maximum sample size was 120, the IA took place after 60 patients, α = 0.05, and the EC dataset contained 1,000 patients. RCTs used 1:1 randomization for all 120 patients, and ECTs assigned all 120 to the experimental arm. Three binary pre-treatment variables X1, X2, X3 were simulated with specified prevalences in the EC and HT populations, and outcomes followed the logistic model p(Y=1|X,A,S) = F(θA + Xθ), with F the logistic function. Scenarios allowed for unmeasured confounding (e.g., X1 unavailable for adjustment) and for differing conditional outcome distributions between the EC and HT populations (Tables 1–2). Two treatment effects were considered: TE = 0 and TE > 0 (log-odds 0.8).

Resampling (in silico) trials: A leave-one-study-out approach resampled patient profiles and outcomes from the SOC arms of the ES-SCLC and GBM studies to form the trial's IC and experimental data, and used the remaining studies as EC. HT interim decisions and final analyses used the W1/W2 thresholds. Power scenarios were created by probabilistically relabeling some experimental-arm non-responders as responders (π = 0.4 for ES-SCLC, π = 0.5 for GBM). Time-to-event outcomes (OS, OS-9, OS-12) were also considered (details in the Supplementary Information).

Datasets: The ES-SCLC data came from Project Data Sphere: CALGB-9732 (N=283), Pirker et al. (N=232; 80% subsample available), and GALES (N=455), with cisplatin or carboplatin plus etoposide as SOC. The GBM datasets included Chinot et al. and institutional cohorts from DFCI and UCLA treated with SOC temozolomide plus radiotherapy. The available prognostic variables differed by disease area, with GBM having more complete covariates than ES-SCLC. Cox models with study-specific random effects were used to assess heterogeneity across studies and to inform the suitability of the EC data.
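The interim and final decision rules and the TE estimand described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, W1 and W2 are treated as precomputed numbers (their construction is not reproduced here), the MSM is swapped for a plug-in stratified standardization estimator of the same TE for discrete covariate patterns, Pr(X=x) is taken from the trial population (an assumption), and the variance and test for the EC-augmented branch are omitted.

```python
from collections import defaultdict
from statistics import NormalDist

# Hypothetical helpers; W1/W2 are assumed to be precomputed dissimilarity indices.

def use_external_controls(w_observed, w_threshold):
    """EC data are used only when the dissimilarity index does not exceed its threshold."""
    return w_observed <= w_threshold


def stage2_ratio(w1, w1_threshold, ratio_if_similar=(1, 2)):
    """Stage-2 control:experimental ratio; keep 1:1 when EC data look dissimilar."""
    return ratio_if_similar if use_external_controls(w1, w1_threshold) else (1, 1)


def standardized_te(treated, controls, target):
    """Plug-in estimate of TE = sum_x {E[Y|x, A=1] - E[Y|x, A=0]} Pr(X=x).

    Each argument is a list of (x, y) pairs: x is a hashable covariate pattern,
    y a binary outcome. Pr(X=x) is estimated from `target`; every pattern in
    `target` must also occur in `treated` and `controls` (else KeyError).
    """
    def stratum_means(rows):
        sums, counts = defaultdict(float), defaultdict(int)
        for x, y in rows:
            sums[x] += y
            counts[x] += 1
        return {x: sums[x] / counts[x] for x in counts}

    m1, m0 = stratum_means(treated), stratum_means(controls)
    pattern_counts = defaultdict(int)
    for x, _ in target:
        pattern_counts[x] += 1
    n = len(target)
    return sum((m1[x] - m0[x]) * k / n for x, k in pattern_counts.items())


def one_sided_z_test(y_ctrl, y_exp):
    """Pooled two-sample z-test of H0: p_exp <= p_ctrl; returns (difference, z, p-value)."""
    n0, n1 = len(y_ctrl), len(y_exp)
    p0, p1 = sum(y_ctrl) / n0, sum(y_exp) / n1
    pooled = (sum(y_ctrl) + sum(y_exp)) / (n0 + n1)
    se = (pooled * (1 - pooled) * (1 / n0 + 1 / n1)) ** 0.5
    if se == 0:
        raise ValueError("all outcomes identical; z-test undefined")
    z = (p1 - p0) / se
    return p1 - p0, z, 1 - NormalDist().cdf(z)


def final_analysis(w2, w2_threshold, ic, exp, ec):
    """Choose the final-analysis branch; ic, exp, ec are lists of (x, y) pairs."""
    if use_external_controls(w2, w2_threshold):
        # Pool IC and EC controls, standardized to the trial population (assumption).
        return "trial+EC", standardized_te(exp, ic + ec, target=ic + exp)
    diff, _, _ = one_sided_z_test([y for _, y in ic], [y for _, y in exp])
    return "trial only", diff


# Toy data with one binary covariate (pattern 0 or 1).
ic = [(0, 0), (0, 1), (1, 0), (1, 0)]
exp = [(0, 1), (0, 1), (1, 1), (1, 0)]
ec = [(0, 0), (0, 0), (1, 0), (1, 1)]
print(stage2_ratio(w1=0.1, w1_threshold=0.2))  # (1, 2)
print(final_analysis(0.1, 0.2, ic, exp, ec))   # ('trial+EC', 0.5)
print(final_analysis(0.3, 0.2, ic, exp, ec))   # ('trial only', 0.5)
```

Standardization is used here only because it maps term by term onto the TE formula; in the study itself, MSMs (or matching or IPW) play this role, and a permutation test is offered as an alternative for the EC-augmented branch.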

Key Findings

Model-based simulations (Tables 1–2):
- Scenario 1 (no unmeasured confounding; EC superior to IC under control): All designs controlled type I error near 5%. Power: ECT 93%, HT 70–73% (depending on r2), RCT 67%. Early futility stops (TE=0): ECT 44%, HT 15–20%, RCT 7%. Average sample size / study duration: ECT 93/18, HT 108–112/21–22, RCT 115/23.
- Scenarios 2–5 (unmeasured confounding and/or differing conditional distributions): ECT performance degraded. ECT type I error was inflated to 71% (Scenario 2), >99% (Scenario 4), and 15% (Scenario 5), compared with 5–9% for HT and ~5% for RCT. In Scenario 3, ECT power dropped to 42%, vs ~54% for HT and 53% for RCT. Across the confounded scenarios, HT kept type I error near nominal with power comparable to RCT.
- When the experimental treatment is inferior to SOC (TE<0), HT reduced type I error relative to ECT under confounding and terminated early for futility more often than RCT.

ES-SCLC in silico (Fig. 2):
- Ideal setting (study labels permuted to remove heterogeneity): ECT was the most powerful design, with 94–97% power across studies, vs 65–80% for HT and 43–62% for RCT; type I error was near 5% for all designs.
- Actual leave-one-study-out resampling (heterogeneity present): ECT type I error was inflated up to 59% (GALES). HT type I error remained low (5%, 8%, and 5% for CALGB-9732, GALES, and Pirker, respectively), indicating that the dissimilarity analyses effectively limited EC use when it was inappropriate.
- For OS outcomes, ECT type I error deviated from 5% (<1% for CALGB-9732 and Pirker, 14% for GALES), while HT stayed at ~5% (5.0–5.2%).

GBM in silico (Table 3):
- With TE=0, all designs had type I error near 5%. Early futility stopping rates were 42–50% for ECT, 24–27% for HT, and 6–7% for RCT. Average sample sizes were 96 (RCT), 75–79 (ECT), and 86–88 (HT), and average study duration was shorter for ECT/HT (15–17 months) than for RCT (19 months).
- With TE>0, power was 85–92% for ECT; 73–77% (r2 1:1), 78–82% (r2 1:2), and 74–78% (r2 0:1) for HT; and 58–63% for RCT. Average sample size and study duration were 100 patients and 20 months for all designs in the positive-TE scenarios.
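As a rough consistency check on the Scenario 1 numbers: if early stopping can happen only at the futility IA, the average sample size is n_IA·P(stop) + n_max·(1 − P(stop)). The sketch below assumes that the reported averages refer to the TE = 0 setting and uses n_IA = 60 and n_max = 120 from the model-based setup; the function name is hypothetical. The reported stopping rates then give about 93.6 patients for ECT (44%), 108 to 111 for HT (15–20%), and 115.8 for RCT (7%), close to the reported 93, 108–112, and 115. The small gaps are plausibly due to rounding of the reported percentages.

```python
def expected_sample_size(p_stop, n_interim, n_max):
    """Average enrollment when the trial stops at the interim with probability p_stop
    and otherwise enrolls the maximum sample size (no other early stopping assumed)."""
    return p_stop * n_interim + (1 - p_stop) * n_max


# Early futility-stop rates under TE = 0 from Scenario 1 (model-based simulations).
for design, p_stop in [("ECT", 0.44), ("HT", 0.15), ("HT", 0.20), ("RCT", 0.07)]:
    print(design, p_stop, round(expected_sample_size(p_stop, n_interim=60, n_max=120), 1))
```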
Additional findings:
- HT’s dissimilarity-guided adaptation curbed the risks of ECTs under confounding while preserving efficiency when EC data were suitable.
- Strategies to mitigate the loss of conditional power when EC data are excluded at the final analysis include extending the sample size after the IA or preselecting randomization ratios that bound the reduction in conditional power.
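To see why the stage 2 randomization ratio matters when EC data end up excluded, the conditional power of the trial-only analysis can be estimated by simulation. The procedure fixes the observed stage 1 counts, simulates stage 2 outcomes under assumed true response rates, and records how often the final one-sided pooled z-test rejects. The function names, stage 1 counts (9/30 IC and 15/30 experimental responders), and response rates (0.3 vs 0.5) below are hypothetical illustrations, not values from the study. In this toy setting, 0:1 allocation leaves only the 30 stage 1 IC patients as controls, so it gives lower conditional power than 1:1 if EC data are dropped at the final analysis.

```python
import random
from statistics import NormalDist


def pooled_z(x0, n0, x1, n1):
    """Pooled two-sample z statistic for p1 - p0 (0.0 if the pooled rate is 0 or 1)."""
    pooled = (x0 + x1) / (n0 + n1)
    se = (pooled * (1 - pooled) * (1 / n0 + 1 / n1)) ** 0.5
    return 0.0 if se == 0 else (x1 / n1 - x0 / n0) / se


def conditional_power(x0, n0, x1, n1, n2c, n2e, p_ctrl, p_exp,
                      alpha=0.05, n_sims=10_000, seed=1):
    """Monte Carlo estimate of P(final trial-only z-test rejects | stage-1 data).

    x0/n0 and x1/n1: stage-1 responders/patients in the IC and experimental arms.
    n2c, n2e: stage-2 patients per arm; p_ctrl, p_exp: assumed true response rates.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_sims):
        # Simulate stage-2 responders in each arm under the assumed rates.
        y0 = sum(rng.random() < p_ctrl for _ in range(n2c))
        y1 = sum(rng.random() < p_exp for _ in range(n2e))
        if pooled_z(x0 + y0, n0 + n2c, x1 + y1, n1 + n2e) > z_crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical stage 1: 30 patients per arm; stage 2: 60 more patients.
stage1 = dict(x0=9, n0=30, x1=15, n1=30)
for label, (n2c, n2e) in {"1:1": (30, 30), "1:2": (20, 40), "0:1": (0, 60)}.items():
    cp = conditional_power(**stage1, n2c=n2c, n2e=n2e, p_ctrl=0.3, p_exp=0.5)
    print(label, round(cp, 3))
```

The same function can also explore the other mitigation strategy: increasing n2c and n2e mimics a sample size extension after the IA.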

Discussion

The findings show that externally controlled trials (ECTs) can be highly efficient and powerful under ideal conditions (complete, comparable covariate information and identical conditional outcome distributions). However, their operating characteristics can degrade severely when these assumptions are violated, leading to biased treatment effect estimates and inflated type I error. The assumptions (no unmeasured confounding, consistent measurements and definitions, identical conditional distributions) are difficult to verify prospectively, and single-arm ECTs provide no internal data with which to detect violations.

The proposed hybrid trial (HT) design addresses these challenges by combining randomization with EC data and prospectively conducting dissimilarity analyses (W1/W2) to decide whether and how to leverage EC information at the interim and final analyses. This preserves robustness (type I error stays near nominal, and bias is avoided when confounding or distributional differences exist) while enabling efficiency gains (higher power, smaller sample size, shorter duration) when EC data are reliable. In ES-SCLC, where heterogeneity and limited covariates undermine EC suitability, HTs maintained type I error control whereas ECTs did not. In GBM, where EC data were more complete and consistent, HTs achieved meaningful efficiency gains and higher power than RCTs with appropriate control of false positives. HTs also improved interim decision-making, stopping early for futility more often than RCTs when true effects were absent.

Design choices (e.g., r2,c:r2,e) and safeguards (e.g., sample size extension plans) can be tuned via simulation to balance efficiency against a guaranteed level of conditional power if EC data are excluded at the final analysis. Overall, HTs offer a pragmatic compromise between ECTs and RCTs, leveraging EC data when appropriate and defaulting to randomized comparisons when not.

Conclusion

This work introduces a two-stage hybrid trial design that integrates external control data with internal randomization and prospective dissimilarity analyses to guide EC usage at the interim and final analyses. Through model-based simulations and in silico evaluations using ES-SCLC and GBM datasets, the study shows that HTs can (i) substantially mitigate the risks of bias and inflated type I error that affect ECTs under confounding or heterogeneity, and (ii) deliver efficiency gains over RCTs when EC data are suitable. The methodology offers flexible testing options (MSMs or permutation tests) and design tuning to maintain conditional power when EC data are ultimately excluded. Future work includes applying HTs in additional indications, expanding meta-analytic and resampling approaches to scrutinize EC suitability, incorporating richer biomarker data to enable subgroup evaluations when accrual is low, and extending the framework to other objectives such as non-inferiority. Ensuring access to contemporaneous, high-quality patient-level EC datasets remains critical and is supported by ongoing data sharing initiatives.

Limitations

Key limitations include the relatively small number of ES-SCLC and GBM datasets, which may limit the generalizability of the operating characteristic estimates; the incomplete availability of known prognostic variables in ES-SCLC, which hampers covariate adjustment; heterogeneity across the ES-SCLC studies (e.g., open-label design, a partially randomized study, differing eligibility criteria); and the presence of two SOC regimens (cisplatin- vs carboplatin-based) that may differ subtly. Under these constraints, ECT type I error was as high as 59% in ES-SCLC. These factors underscore the need for careful assessment of EC data quality and comparability.
