A multimodal deep learning approach for the prediction of cognitive decline and its effectiveness in clinical trials for Alzheimer’s disease

Medicine and Health


C. Wang, H. Tachimori, et al.

This study, conducted by Caihua Wang, Hisateru Tachimori, Hiroyuki Yamaguchi, Atsushi Sekiguchi, Yuanzhong Li, and Yuichi Yamashita, presents an AI-driven method to enhance the randomization process in Alzheimer's disease clinical trials, substantially reducing participant allocation bias and the required trial sample size. Discover how they used a multimodal deep learning model to predict cognitive decline accurately!

Introduction
The study addresses persistent failures of Alzheimer’s disease (AD) clinical trials to demonstrate efficacy on cognitive endpoints, particularly CDR-SB change in patients with MCI/prodromal AD. A central challenge is the large inter-individual variability in rates of cognitive decline, which leads to two issues: inclusion of many slow/non-decliners diminishes observable treatment effects, and randomization can inadvertently allocate different proportions of fast/slow decliners to treatment vs placebo, introducing allocation bias that over- or underestimates treatment effects. Existing stratification on individual risk factors (e.g., ApoE ε4, Aβ42, pTau) offers limited predictive power for cognitive decline. The authors hypothesize that using AI-predicted cognitive decline as a stratification index in randomization can reduce allocation bias, improve power, yield more reliable effect estimation, and ultimately increase trial efficiency.
Literature Review
The paper situates its contribution within several strands of work: (1) high failure rates of AD drug development and challenges in demonstrating effects on cognitive decline; (2) prior attempts to enrich or select participants (e.g., focusing on high-risk groups) and to predict disease progression using machine learning and multimodal data (MRI, biomarkers, cognitive scores); (3) limited use of single biomarkers (e.g., ApoE ε4) for stratified randomization in recent trials, which only modestly correlates with cognitive decline; and (4) recent recognition of allocation bias in cognitive decline as a major source of over-/underestimation of treatment effects, with calls to leverage AI for bias reduction. However, embedding AI-predicted decline directly as a stratification index and quantifying its impact on allocation bias and trial efficiency had not been systematically investigated.
Methodology
Data: 1194 longitudinal samples from 506 participants were extracted from the NA-ADNI (ADNI-1, GO, 2) using inclusion criteria aligned with recent trials. Baseline multimodal data included T1-weighted MRI, demographics, cognitive scores (CDR-SB, MMSE, ADAS-cog), and biomarkers (Aβ42, pTau, ApoE ε4). Tenfold cross-validation with held-out test folds produced participant-level baseline predictions for use in the simulations.

Image preprocessing: 3D T1 MRI were registered to MNI152 NLin 2009a space using a landmark-based initial linear alignment (landmarks detected with a Faster R-CNN-like detector) followed by mutual-information-based registration. Example-based local intensity normalization was applied to mitigate site/scanner intensity biases, and skull stripping used a 3D V-Net.

Region extraction: Normalized, skull-stripped images were used to extract fixed-size (64×64×64) segments for the left/right hippocampus and left/right anterior temporal lobe at atlas-defined coordinates.

DNN feature extraction: Two DenseNet3D backbones (one for the hippocampus pair, one for the anterior temporal lobe pair) extracted imaging features. Training employed a multitask loss combining (1) regression (MAE) for CDR-SB change prediction, (2) binary classification (decliner vs non-decliner) with cross-entropy, and (3) an image-recovery loss. Left/right homologous regions were paired to exploit symmetry, and global average pooled features from each network were taken as imaging features.

Final predictor: Because of the limited sample size, a hybrid approach was used. PCA reduced each segment's features to one dimension; the reduced imaging features were concatenated with baseline non-imaging features (age, CDR-SB, MMSE, ADAS-cog, ApoE ε4 genotype-coded risk, Aβ42, pTau), quartile-normalized, and fed to a linear SVR that outputs the continuous predicted 2-year CDR-SB change.

Randomization simulations: Two-arm trials (treatment vs placebo) with block randomization were simulated. Methods compared: (a) non-stratified randomization; (b) stratified randomization using one of age, ApoE ε4, Aβ42, pTau, baseline CDR-SB, MMSE, or ADAS-cog; (c) stratified randomization using AI-predicted CDR-SB change; and (d) an oracle reference using actual (ground-truth, GT) CDR-SB change for stratification. Participants (n=500 per simulated trial) were allocated using equal-sized blocks; for stratification, participants were binned into strata based on the chosen index and randomized within strata to maintain 1:1 balance (a minimal simulation sketch appears below). For some analyses, distributions were generated from 10,000 simulation runs; for SAE estimation, 1,000 repetitions were also used where specified. Allocation bias was defined as the between-arm difference in mean actual CDR-SB change.

Standard allocation error (SAE) and PES: For each method, the SAE was defined as the SD of the allocation-bias distribution, and the 95% range of the possible effect size (PES) under a null treatment was approximated as ±1.96×SAE.

Sample size requirement: For each method, simulations across sample sizes (500–3000) estimated SAE as a function of N to find the minimal N needed to keep the 95% PES within a given bound (e.g., ±0.3 in CDR-SB change).

Treatment-effect detection: Trials with true treatment effects (10–50% proportional reduction of GT CDR-SB change in the treatment arm) were simulated (n=500) with 10,000 repetitions per effect size and method. Two-sample t-tests (two-tailed unless otherwise specified) yielded P-value distributions and detection rates at α=0.05.
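To make the simulation procedure concrete, the sketch below implements stratified block randomization and SAE/PES estimation in Python. It is a minimal illustration under stated assumptions, not the authors' code: the number of strata, the quantile-based binning, and the handling of odd-sized strata are choices made for this example, and the data at the end are synthetic toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_arms(index, n_strata=4):
    """Block-randomize 1:1 within strata defined by quantile bins of a
    stratification index (e.g., AI-predicted 2-year CDR-SB change).
    Returns a boolean array: True = treatment, False = placebo."""
    n = len(index)
    arms = np.zeros(n, dtype=bool)
    edges = np.quantile(index, np.linspace(0, 1, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, index, side="right") - 1, 0, n_strata - 1)
    for s in range(n_strata):
        members = rng.permutation(np.flatnonzero(strata == s))
        arms[members[: len(members) // 2]] = True  # half of each stratum to treatment
    return arms

def simple_arms(n):
    """Non-stratified 1:1 allocation."""
    arms = np.zeros(n, dtype=bool)
    arms[rng.permutation(n)[: n // 2]] = True
    return arms

def estimate_sae(actual_change, index=None, n_sims=10_000):
    """Allocation bias = between-arm difference in mean actual CDR-SB change
    under a null treatment; SAE = SD of that bias over repeated randomizations."""
    biases = np.empty(n_sims)
    for i in range(n_sims):
        arms = simple_arms(len(actual_change)) if index is None else stratified_arms(index)
        biases[i] = actual_change[arms].mean() - actual_change[~arms].mean()
    sae = biases.std()
    return sae, 1.96 * sae  # SAE and half-width of the 95% PES range

# Toy usage with synthetic data (N=500): predictions correlated with actual decline
actual = rng.gamma(shape=2.0, scale=1.0, size=500)    # stand-in for actual 2-year CDR-SB change
predicted = actual + rng.normal(scale=1.0, size=500)  # stand-in for an imperfect AI prediction
print("non-stratified SAE, 95% PES half-width:", estimate_sae(actual))
print("AI-stratified  SAE, 95% PES half-width:", estimate_sae(actual, index=predicted))
```

The stratified variant reduces SAE only to the extent that the stratification index correlates with actual decline, which is why the oracle GT condition bounds the achievable gain.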
Multi-phase trial simulation: A proof-of-concept (POC) phase (N=500, one-tailed α=0.1) was followed by an effect-confirmation phase (α=0.05) whose sample size was estimated from the effect size observed in the POC phase; the confirmatory trial was planned against a stringent significance target (P=0.01) and then tested at 0.05. Success required significance in both phases. Simulations compared non-stratified vs AI-based stratified randomization across true effect sizes (a sketch of this two-phase logic follows).
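The sketch below outlines this two-phase logic. The one-tailed POC test at α=0.1 and the dual significance requirement follow the description above, while the confirmatory-phase sizing uses a standard two-sample normal-approximation formula with an assumed power level; the paper states only that planning targeted P=0.01, so the exact power calculation used there may differ.

```python
import numpy as np
from scipy import stats

def poc_passes(treat_change, placebo_change, alpha=0.10):
    """Proof-of-concept phase: one-tailed two-sample t-test at alpha = 0.1,
    asking whether the treatment arm worsens less on CDR-SB than placebo."""
    t, p_two = stats.ttest_ind(placebo_change, treat_change)
    p_one = p_two / 2 if t > 0 else 1 - p_two / 2  # convert two-tailed p to one-tailed
    return p_one < alpha

def confirmatory_n_per_arm(observed_delta, sd, alpha=0.01, power=0.90):
    """Size the confirmatory phase from the POC effect estimate using the
    standard two-sample normal-approximation formula; alpha=0.01 reflects the
    stringent planning target, and the 90% power level is an assumption."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return int(np.ceil(2 * ((z_a + z_b) * sd / observed_delta) ** 2))

# Example: a POC trial observing a 0.5-point mean CDR-SB benefit with SD ~ 2.0
print(confirmatory_n_per_arm(observed_delta=0.5, sd=2.0))  # -> 477 per arm under these numbers
# Overall "success" then requires the POC test (one-tailed, alpha = 0.1) and the
# confirmatory test (alpha = 0.05, on the newly sized trial) to both be significant.
```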
Key Findings
- Prediction performance: For the 506 baseline samples used in simulations, the AI model achieved MAE ≈ 1.07 and a correlation of ≈ 0.58–0.60 with actual 2-year CDR-SB change. The inner-sample subset (n=481) showed MAE=0.892 and r=0.620; the outer, extreme subset (n=25) showed MAE=4.567 and r≈−0.036, indicating larger errors for the most rapidly progressing cases.
- Allocation bias reduction (N=500): The standard allocation error (SAE) decreased from 0.1704 (non-stratified) to 0.1322 with AI-based stratified randomization (≈22.4% reduction). Among conventional indices, ADAS-cog achieved ≈10.5% SAE reduction vs non-stratified; oracle GT-based stratification reduced SAE by ≈73.9% (the upper bound achievable with perfect prediction).
- Possible effect size (PES) narrowing: Because the 95% PES range under a null treatment is ≈ ±1.96×SAE, AI-based stratification substantially narrowed the 95% PES range compared to non-stratified randomization, improving the interpretability and reliability of trial outcomes.
- Sample size savings: To constrain the 95% PES to approximately ±0.3 CDR-SB change, non-stratified randomization required roughly 651 participants, whereas AI-based stratification required about 391, a ≈37% reduction (see the rough check after this list). Age, CDR-SB, ApoE ε4, Aβ42, pTau, and MMSE each yielded modest savings (~2–8%), while ADAS-cog yielded ~15–19%. Oracle GT stratification enabled an ≈84% reduction.
- Treatment-effect detection: AI-based stratification both reduced over-detection when true effects were small and improved detection when true effects were sufficiently large. Example (N=500, α=0.05): at a true 38% effect, detection improved from 52.6% (non-stratified) to 64.9% (AI), and to 98.5% with oracle GT; at a true 20% effect, detection was 49.4% (non-stratified), suppressed to 37.3% (AI) and 1% (GT), limiting spurious positives around the detection threshold.
- Multi-phase trials: With a proof-of-concept phase followed by a confirmatory phase sized from observed effects, AI-based stratification improved overall success when true effects exceeded ~18%. At a 30% true effect, overall success rose from 52.1% (non-stratified) to 60.8% (AI).
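As a rough consistency check on the reported sample sizes (a back-of-envelope calculation, not the paper's simulation-based procedure), one can assume the allocation error shrinks roughly as 1/sqrt(N) and scale from the SAE values measured at N=500:

```python
# Assumes SAE scales roughly as 1/sqrt(N); the paper instead estimates the
# required N directly from simulations across sample sizes (500-3000).
for label, sae_at_500 in [("non-stratified", 0.1704), ("AI-stratified", 0.1322)]:
    n_required = 500 * (1.96 * sae_at_500 / 0.3) ** 2  # keep 95% PES within ±0.3
    print(f"{label}: ~{n_required:.0f} participants")
# Prints ~620 and ~373, the same ballpark as the simulation-derived 651 and 391.
```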
Discussion
Using AI-predicted cognitive decline as a stratification index directly targets the principal source of randomization-induced bias in AD trials: unequal allocation of fast vs slow decliners across arms. This approach meaningfully reduced allocation bias (SAE) compared to non-stratified randomization and outperformed stratification based on individual risk factors or baseline cognitive scores. Narrowing the null-treatment PES range improves the interpretability of trial outcomes and reduces the chance that early-phase overestimation of treatment effects leads to underpowered later-phase trials. Conversely, suppressing detections when true effects are small can prevent advancing weak candidates, improving resource allocation across phases. Simulations further suggest that while conventional enrichment by risk factors can modestly reduce bias, AI-based stratification more directly and effectively balances expected decline, improving power and reducing required sample sizes. The oracle GT condition highlights the headroom for further gains if predictive models of cognitive decline improve. Overall, embedding AI prediction into randomization procedures appears to be a practical route to more efficient, reliable AD trials using cognitive endpoints.
Conclusion
The study introduces and evaluates an AI-based stratified randomization framework that uses predicted CDR-SB change at baseline to balance expected cognitive decline across trial arms. On ADNI-based simulations, the method reduced allocation bias by ~22%, narrowed the null-effect PES range, and enabled ~37% sample size reduction for a representative PES target, with improved detection of moderate-to-large true treatment effects and reduced spurious detection of small effects. These findings indicate that AI-driven stratification can materially improve the efficiency and reliability of AD clinical trials using cognitive decline endpoints. Future work should (1) validate the approach on prospective and real trial data; (2) extend to other primary endpoints (e.g., biomarker changes), multi-arm and multi-index stratification, and adaptive designs; and (3) improve predictive accuracy (e.g., robust losses, larger diverse datasets, and advanced architectures) to approach the oracle upper bound.
Limitations
- Generalizability: All analyses used NA-ADNI data, which primarily include participants of European ancestry; results may not generalize to broader populations or other cohorts.
- External validation: Effectiveness was assessed via simulation; validation on actual clinical trial randomizations and outcomes is needed.
- Endpoint scope: Many phase II trials use non-cognitive endpoints; new predictive models are required for those outcomes.
- Design scope: Simulations focused on two-arm trials and single-index stratification; performance in multi-arm, multi-index, or adaptive randomization requires further study.
- Model performance: Prediction errors were larger for extreme decliners, the targets are noisy, and limited training data constrained performance. More robust loss functions and larger, more diverse datasets (and potentially newer architectures) are needed.