Introduction
The development of disease-modifying therapies for multiple sclerosis (MS) has largely focused on the focal inflammatory aspects of relapsing-remitting MS (RRMS), relying on short, small phase 2 trials with MRI endpoints. This approach has been successful in establishing proof of concept and optimal dosing before proceeding to larger, more expensive phase 3 trials. However, the absence of analogous MRI endpoints for disability progression independent of relapses has hindered the development of drugs targeting this crucial aspect of the disease, particularly in progressive forms of MS, where it plays a dominant role. Brain atrophy has been explored as a biomarker, but its ability to predict disability progression in phase 3 trials remains uncertain. Proceeding directly to large phase 3 trials is costly and risky, and many programs have failed to demonstrate efficacy. Predictive enrichment, a technique that selects for trial inclusion a subgroup of patients likely to respond to treatment, offers a solution. Enriching trials with predicted responders increases the power of smaller trials, prevents the effect of an efficacious medication from being diluted by heterogeneity, and reduces risk for patients unlikely to benefit. This has been demonstrated successfully in RRMS with Cox Proportional Hazards (CPH) models. Deep learning, with its capacity to uncover complex relationships between patient characteristics and treatment responsiveness, is a promising tool for predictive enrichment. However, an individual's treatment effect is not directly observable, because each patient receives either the treatment or the placebo but never both, so traditional machine learning frameworks must be adapted using causal inference techniques.
This study presents a new deep learning framework to estimate individual treatment effects on disability progression in MS using readily available clinical and MRI data, aiming to improve the feasibility and efficiency of clinical trials for progressive MS.
Literature Review
Previous research has explored using Cox Proportional Hazards (CPH) models to predict responsive subgroups in MS clinical trials. For instance, Bovis et al. (2019) used CPH models to identify a subgroup of RRMS patients who were more responsive to laquinimod. The application of machine learning, specifically deep learning, to this problem is relatively novel. Whereas traditional machine learning methods usually predict outcomes directly, treatment effect estimation requires reasoning about counterfactual outcomes. Tree-based methods and meta-learning approaches are among the most popular in the literature. Durso-Finley et al. (2022) used a meta-learning approach to estimate treatment effect (measured by the suppression of new/enlarging T2 lesions) in RRMS. Other studies have investigated biomarkers for predicting treatment response in MS, with varying degrees of success. Brain atrophy has been commonly used as a surrogate marker, but its correlation with clinical disability progression remains uncertain. Studies examining the relationship between lesion burden (T2 and gadolinium-enhancing lesions) and disability progression have yielded mixed results, hindering the development of reliable prediction models.
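As an illustration of why counterfactual reasoning changes the modelling setup, the following minimal Python sketch shows a "T-learner", one common meta-learning strategy: one outcome model is fitted per trial arm, and the estimated effect is the difference between their predictions. It is not claimed to be the method of any study cited above. The single covariate, the helper names (`fit_line`, `t_learner`), and the toy data are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def t_learner(x, is_treated, y):
    """Fit one outcome model per arm and return a function estimating the effect.

    Each patient contributes only to the model of the arm they were actually
    assigned to; the other (counterfactual) outcome is never observed.
    """
    treated = [(xi, yi) for xi, ti, yi in zip(x, is_treated, y) if ti]
    placebo = [(xi, yi) for xi, ti, yi in zip(x, is_treated, y) if not ti]
    slope_t, icpt_t = fit_line(*zip(*treated))
    slope_p, icpt_p = fit_line(*zip(*placebo))

    def estimated_effect(x_new):
        # Assumed sign convention: treated prediction minus placebo prediction.
        return (slope_t * x_new + icpt_t) - (slope_p * x_new + icpt_p)

    return estimated_effect


# Hypothetical toy data: one baseline covariate, outcome = progression rate.
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
is_treated = [True, True, True, True, False, False, False, False]
y = [0.1, 0.2, 0.3, 0.4, 0.4, 0.7, 1.0, 1.3]
effect = t_learner(x, is_treated, y)
```

Because the toy outcome is a progression rate, a more negative estimated effect corresponds to a larger predicted benefit under this sign convention.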
Methodology
This study uses data pooled from six randomized clinical trials (OPERA I, OPERA II, BRAVO, ORATORIO, OLYMPUS, and ARPEGGIO), encompassing both RRMS and primary progressive MS (PPMS) patients (n=3830). The data are divided into an RRMS set for pre-training (n=2520), a PPMS set for fine-tuning (n=695), and two held-out PPMS test sets, one for anti-CD20 antibodies (n=297) and one for laquinimod (n=318). A multi-headed multilayer perceptron (MLP) is employed to estimate the conditional average treatment effect (CATE). The model has a shared trunk and two output heads that predict the outcome under treatment and under placebo, and the difference between the two predictions is the CATE estimate. Following a transfer learning approach, the model is pre-trained on the RRMS data and then fine-tuned on the PPMS training set. The outcome is the rate of change in Expanded Disability Status Scale (EDSS) score over time, obtained as the slope of a linear regression fit to each individual's EDSS values. The model's performance is evaluated using the average difference curve (AD(c)) and the ADwabc metric, which assess its ability to rank individuals by treatment responsiveness. Kaplan-Meier curves are used to visualize the survival probabilities of subgroups (responders and non-responders) defined by various prediction thresholds. The model is compared with baseline models, including ridge regression and CPH models, and with a prognostic model that predicts response based solely on predicted progression on placebo. Finally, the study simulates phase 2 clinical trials with predictive enrichment to assess the impact on sample size.
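The outcome definition and the two-headed architecture can be sketched in a few lines of dependency-free Python. This is a minimal sketch under stated assumptions, not the study's implementation: the layer sizes, random initialisation, class and function names (`edss_slope`, `TwoHeadMLP`, `factual_squared_error`), and the sign convention (treated minus placebo) are all hypothetical, and training, pre-training, and fine-tuning are omitted.

```python
import random


def edss_slope(times_years, edss_scores):
    """Least-squares slope of EDSS against time (EDSS points per year)."""
    n = len(times_years)
    mean_t = sum(times_years) / n
    mean_y = sum(edss_scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    sxy = sum((t - mean_t) * (y - mean_y)
              for t, y in zip(times_years, edss_scores))
    return sxy / sxx


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


class TwoHeadMLP:
    """Shared trunk with one output head per trial arm (illustrative sizes)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)

        def init(n_out, n_in):
            return [[rng.gauss(0.0, 0.1) for _ in range(n_in)]
                    for _ in range(n_out)]

        self.trunk_w = init(n_hidden, n_features)
        self.trunk_b = [0.0] * n_hidden
        self.head_w = {"treated": init(1, n_hidden),
                       "placebo": init(1, n_hidden)}
        self.head_b = {"treated": [0.0], "placebo": [0.0]}

    def predict_outcome(self, x, arm):
        """Predicted EDSS slope for baseline features `x` under the given arm."""
        hidden = relu(dense(x, self.trunk_w, self.trunk_b))
        return dense(hidden, self.head_w[arm], self.head_b[arm])[0]

    def cate(self, x):
        """Assumed convention: treated prediction minus placebo prediction."""
        return (self.predict_outcome(x, "treated")
                - self.predict_outcome(x, "placebo"))


def factual_squared_error(model, x, arm, observed_slope):
    """Loss on the observed arm only; the other head gets no signal here."""
    return (model.predict_outcome(x, arm) - observed_slope) ** 2


# Hypothetical patient: four visits over 1.5 years, three baseline features.
slope = edss_slope([0.0, 0.5, 1.0, 1.5], [3.0, 3.5, 4.0, 4.5])
model = TwoHeadMLP(n_features=3, n_hidden=4, seed=1)
features = [0.2, -1.0, 0.5]
estimated_cate = model.cate(features)
```

With an EDSS slope as the outcome, a more negative value of `estimated_cate` would indicate a larger predicted slowing of progression under this convention. Each patient contributes a loss term only through the head matching their assigned arm, which is how a single network can be trained on randomized data without ever observing both outcomes for one person.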
Key Findings
The fully trained MLP model effectively ranked patients by their responsiveness to anti-CD20 antibodies, as indicated by a positive and relatively large ADwabc (0.0565). Predictive enrichment, using the top 50% and 30% of predicted responders, yielded lower hazard ratios (HRs) for time-to-24-week confirmed disability progression (CDP24) than the entire test set (HR 0.492, p=0.0218 and HR 0.361, p=0.008, respectively). The model generalized to the laquinimod test set (ADwabc=0.0211), where enrichment also reduced the HR (HR 0.492, p=0.0803 and HR 0.338, p=0.0186 for the top 50% and 30% of predicted responders, respectively), although only the top 30% result falls below the conventional 0.05 significance threshold. Subgroup analysis revealed better model performance in women, older patients (≥51 years), those with longer disease duration (≥5 years), and those with lower baseline EDSS scores (<4.5). Predicted responders were characterized by younger age, shorter disease duration, higher baseline disability scores, and greater lesion activity (particularly T2 lesion volume). The MLP outperformed the baseline models (ridge regression, CPH) on the ADwabc metric, although some baselines performed comparably on specific datasets. Simulation of phase 2 trials with predictive enrichment demonstrated substantial reductions in the sample size needed to detect a significant effect (up to a sixfold reduction for a two-year trial using the top 50% of predicted responders). A traditional phase 2 approach with brain atrophy as the primary outcome did not reveal a significant treatment effect for anti-CD20 antibodies.
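To illustrate how enrichment by predicted response can be evaluated, the following minimal Python sketch computes the observed placebo-minus-treated difference in outcome within the top fraction of predicted responders, in the spirit of the AD(c) curve. It is a simplified sketch, not the study's evaluation code: the function names, the convention that a larger score means more predicted benefit, the use of a progression slope as the outcome, and the toy data are assumptions, and the ADwabc summary is not reproduced.

```python
def average_difference(predicted_benefit, is_treated, outcome, top_fraction):
    """Observed placebo-minus-treated mean outcome among top predicted responders.

    Assumes `outcome` is a progression rate (lower is better), so a positive
    return value means the treated patients in the subgroup progressed less.
    """
    n = len(outcome)
    order = sorted(range(n), key=lambda i: predicted_benefit[i], reverse=True)
    keep = order[:max(1, round(top_fraction * n))]
    treated = [outcome[i] for i in keep if is_treated[i]]
    placebo = [outcome[i] for i in keep if not is_treated[i]]
    if not treated or not placebo:
        raise ValueError("subgroup must contain patients from both arms")
    return sum(placebo) / len(placebo) - sum(treated) / len(treated)


def ad_curve(predicted_benefit, is_treated, outcome, fractions):
    """Average difference at each enrichment level, keyed by fraction kept."""
    return {f: average_difference(predicted_benefit, is_treated, outcome, f)
            for f in fractions}


# Hypothetical toy trial: the benefit is concentrated in the top-ranked half.
predicted_benefit = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
is_treated = [True, False, True, False, True, False, True, False]
outcome = [0.0, 0.4, 0.0, 0.4, 0.2, 0.2, 0.2, 0.2]
curve = ad_curve(predicted_benefit, is_treated, outcome, fractions=(1.0, 0.5))
```

A model that ranks patients well should produce a curve that rises as the kept fraction shrinks, as it does in this toy example where the difference in the top half is larger than in the whole group.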
Discussion
This study addresses the critical challenge of identifying predictive biomarkers for disability progression in MS, the lack of which has hampered the development of effective therapies. In the simulations reported here, the deep learning approach substantially improved the efficiency of early clinical trials. The model's ability to identify responders to both anti-CD20 antibodies and laquinimod, drugs with different mechanisms of action, suggests the presence of drug-agnostic predictors of response. The enrichment of predicted responders in several baseline features provides insight into the patient characteristics associated with greater treatment benefit. The superior performance of the non-linear model compared to linear models points to a complex relationship between baseline features and treatment effect. The simulation results support the use of predictive enrichment for designing efficient phase 2 trials. The findings underscore the potential of deep learning in personalized medicine and suggest the feasibility of smaller, shorter, and more cost-effective clinical trials for progressive MS.
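The link between a larger expected effect in an enriched population and a smaller trial can be made concrete with the standard normal-approximation formula for comparing two means, n per arm = 2 (z_{1-α/2} + z_{power})² σ² / δ². The Python sketch below is only an illustration of that textbook relationship; it is not the simulation procedure used in the study, and the function name and the example effect sizes are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect, sd, alpha=0.05, power=0.8):
    """Patients per arm for a two-sided, two-sample comparison of means.

    Uses the normal approximation; `effect` is the assumed mean difference
    between arms and `sd` the assumed common standard deviation.
    """
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / effect) ** 2)


# Hypothetical effect sizes: enrichment is assumed to double the mean effect.
unenriched = n_per_arm(effect=0.5, sd=1.0)
enriched = n_per_arm(effect=1.0, sd=1.0)
```

Because the required sample size scales with the inverse square of the effect, doubling the expected effect cuts the number of patients per arm roughly fourfold in this approximation. In practice, enrichment also shrinks the pool of eligible patients, which this sketch does not account for.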
Conclusion
This study demonstrates the potential of a deep learning model for predictive enrichment in MS clinical trials. The model identified responders to two different treatments and, in simulation, substantially reduced the sample size required for a phase 2 trial. Future research should focus on validating these findings in independent datasets, exploring additional biomarkers (e.g., voxel-level MRI data), and investigating the long-term effects of treatment in patients with varying predicted responses. Applying this method could accelerate the development of effective treatments for progressive MS.
Limitations
The interpretability of the deep learning model remains a limitation. Although it outperforms linear baselines, the model is a black box, and further analysis is needed to elucidate how it reaches its predictions. The study's reliance on existing clinical trial data may limit generalizability to other populations or settings. The pre-training strategy, while successful, might be further optimized. Further research is needed to determine whether predictors of response differ by disease stage and sex. Finally, the study did not examine whether patients with a minimal predicted effect could benefit from a longer duration of treatment.