Dual Semi-Supervised Learning for Classification of Alzheimer's Disease and Mild Cognitive Impairment Based on Neuropsychological Data

Computer Science

F. D. Lorenzo, A. Antonioni, et al.

Francesco Di Lorenzo and colleagues propose a dual semi-supervised learning method that classifies Alzheimer's disease, mild cognitive impairment, and normal controls from neuropsychological test scores alone, reaching up to 88.40% accuracy with limited labels and offering a low-cost aid to clinical diagnosis.

Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disorder, and patients with mild cognitive impairment (MCI), its prodromal stage, frequently convert to AD. While imaging (MRI, PET) and CSF biomarkers are useful, their cost and invasiveness limit widespread clinical screening. There is a need for non-invasive, reliable, and accessible diagnostic biomarkers. Neuropsychological tests are routinely used, low-cost, and may have screening potential comparable to imaging and CSF biomarkers. Deep learning (DL) methods have advanced AD diagnosis but are hindered by limited labeled data, motivating semi-supervised learning (SSL) approaches that can leverage large unlabeled datasets. This study addresses the challenge of scarce labels by proposing a dual semi-supervised learning (DSSL) framework that uses only neuropsychological test scores to classify AD, MCI, and normal controls (NC). The method performs feature selection via Pearson's correlation coefficient (PCC), learns two complementary feature representations via dual encoders, and combines consistency regularization with pseudo-labeling. The work aims to build an accurate, label-efficient, and clinically practical tool for cognitive impairment tri-classification.
Literature Review
The paper reviews DL-based AD diagnosis, highlighting strong performance of CNNs and GCNs on neuroimaging but noting label scarcity and costs. It surveys SSL principles and techniques:
- Consistency regularization: models should output similar predictions under perturbations; implemented via sample perturbations (augmentations such as mixup and RandAugment/CTAugment) and model perturbations (dropout, temporal ensembling, Mean Teacher).
- Pseudo-labeling: converts confident predictions on unlabeled data into hard labels using a confidence threshold to enforce low-entropy predictions.
- Label propagation: graph-based SSL that propagates labels based on sample similarity; can generate pseudo-labels or pairwise losses (e.g., SimPLE).
- Contrastive/self-supervised learning: learns representations that cluster same-class samples and separate different classes, integrated into SSL (e.g., CCSSL, LaSSL).
Prior work applied ML to neuropsychological tests (e.g., SVM) and multimodal imaging with GCNs, but neural-network SSL using only neuropsychological data remains underexplored. This motivates an SSL method tailored to tabular neuropsychological features without image-specific augmentations.
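Mixup, mentioned above as a sample-perturbation augmentation, forms a convex combination of two samples and their one-hot labels. A minimal sketch for tabular features (the α value and function name are illustrative, not taken from the paper):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.75):
    """Mix two tabular samples and their one-hot labels with a
    Beta-distributed coefficient, as in mixup-style augmentation."""
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1 - lam)  # keep the mixed sample closer to x1, as is common in SSL
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y
```

Because the labels are mixed with the same coefficient as the features, the resulting soft label still sums to one, which keeps it usable as a cross-entropy target.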
Methodology
Dataset: ADNI-1 baseline neuropsychological data from 819 subjects: 188 AD, 402 MCI, 229 NC.

Features: 64 itemized scores from seven tests: ADAS-Cog (15), MMSE (31), CDR (1), RAVLT (4), FAQ (11), NPIQ (1), GDS (1). Each itemized score is a feature.

Feature selection: Pearson's correlation coefficient (PCC) computed between each feature and the diagnostic label; features ranked by absolute PCC; top 15 selected for modeling. High-correlation features include CDR-SB, MMSE total, ADAS totals/subscores, FAQ total and key items, and RAVLT immediate recall and percent forgetting.

Model: Dual Semi-Supervised Learning (DSSL).
- Architecture: two distinct encoders (Encoder1, Encoder2) process the same input to produce feature vectors f1 and f2. The encoders differ primarily in their pooling operations (e.g., max vs. average pooling). Each encoder is followed by an MLP classifier yielding predictions q1 and q2.
- Difference regularization (RD): encourages f1 and f2 to capture complementary features by widening the distance between the normalized feature vectors, measured with the Frobenius norm: RD = ||Norm(f1) − Norm(f2)||_F.
- Consistency regularization with pseudo-labeling: the hard label from one branch serves as the pseudo-label for the other when its maximum probability exceeds a confidence threshold τ; cross-entropy is computed in both directions (q2 supervises q1 and vice versa).

Loss: total objective lT = lx1 + lx2 + λ(lu1 + lu2) + βRD, where lx1 and lx2 are supervised cross-entropy losses on labeled data for the two branches, lu1 and lu2 are unsupervised consistency losses using pseudo-labels, and λ and β are weights (λ set to 1).

Training protocol:
- Labeled set sizes: two regimes with 60 or 120 labeled subjects; the remaining training samples are treated as unlabeled.
- 5-fold cross-validation: random split into five folds, one for testing and four for training; results averaged.
- Optimizer: Adam, with an exponential moving average (EMA) of parameters (decay 0.999) to stabilize convergence.
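The objective above can be sketched in NumPy. This is an illustrative reconstruction, not the authors' code; in particular, the sign of the difference-regularization term is our assumption (we negate the Frobenius distance so that minimizing the total loss widens the gap between the two normalized feature representations, matching the stated goal of complementary features):

```python
import numpy as np

def dssl_loss(f1, f2, q1, q2, y, labeled_mask, tau=0.95, lam=1.0, beta=2.0):
    """Sketch of the DSSL objective. q1, q2: (N, C) softmax outputs of the
    two branches; f1, f2: (N, D) encoder features; y: (N,) integer labels,
    ignored where labeled_mask is False."""
    eps = 1e-12

    # Supervised cross-entropy on labeled samples, one term per branch (lx1, lx2).
    def ce(q, idx, targets):
        return -np.log(q[idx, targets] + eps).mean() if len(idx) else 0.0

    lab_idx = np.where(labeled_mask)[0]
    lx1 = ce(q1, lab_idx, y[lab_idx])
    lx2 = ce(q2, lab_idx, y[lab_idx])

    # Cross-branch pseudo-labeling (lu1, lu2): each branch's confident hard
    # label supervises the other branch on the unlabeled samples.
    unl_idx = np.where(~labeled_mask)[0]

    def pseudo_ce(q_teacher, q_student):
        conf = q_teacher[unl_idx].max(axis=1)
        keep = unl_idx[conf >= tau]          # only confident predictions pass
        if len(keep) == 0:
            return 0.0
        hard = q_teacher[keep].argmax(axis=1)
        return -np.log(q_student[keep, hard] + eps).mean()

    lu1 = pseudo_ce(q2, q1)  # branch 2 supervises branch 1
    lu2 = pseudo_ce(q1, q2)  # branch 1 supervises branch 2

    # Difference regularization (RD): ASSUMED sign convention — negated
    # Frobenius distance so that minimizing the loss enlarges the gap
    # between the normalized feature representations.
    def norm(f):
        return f / (np.linalg.norm(f, axis=1, keepdims=True) + eps)

    rd = -np.linalg.norm(norm(f1) - norm(f2))

    return lx1 + lx2 + lam * (lu1 + lu2) + beta * rd
```

In a real training loop this value would be backpropagated through both encoders; here the function only shows how the four loss terms and the regularizer combine.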
- Confidence threshold τ: explored over [0, 0.99] to study the trade-off between passing rate (fraction of unlabeled samples exceeding τ) and impurity rate (error rate among the passed samples); the best τ identified empirically.
- Difference regularization weight β: tuned; the model is robust across values, with best performance around β = 2 in the 60-label regime.
- Data augmentation: for fair comparison on tabular data, mixup is used where baselines require augmentation; image-specific strong augmentations are not used.
- Hardware: PC with a 2.0 GHz 8-core CPU, 8 GB RAM, Windows 10; end-to-end training takes under 3 minutes per experiment.

Evaluation metrics: accuracy (ACC), sensitivity (SEN), specificity (SPE), recall (REC), and F1-score. Stability assessed via variance across 100 random selections of the labeled sets.
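The passing-rate/impurity-rate trade-off described above is straightforward to compute given softmax outputs and held-out labels; a minimal sketch (function name ours):

```python
import numpy as np

def threshold_tradeoff(probs, true_labels, taus):
    """For each confidence threshold tau, report the passing rate
    (fraction of samples whose max probability reaches tau) and the
    impurity rate (error rate among the passed samples)."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    out = []
    for tau in taus:
        passed = conf >= tau
        passing_rate = passed.mean()
        if passed.any():
            impurity_rate = (pred[passed] != true_labels[passed]).mean()
        else:
            impurity_rate = 0.0  # nothing passed, so no impure pseudo-labels
        out.append((tau, passing_rate, impurity_rate))
    return out
```

Raising τ filters out more samples (lower passing rate) but the survivors are more reliable (lower impurity rate), which is exactly the trade-off the paper sweeps over [0, 0.99].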
Key Findings
- Feature selection: the 15 top features (by absolute PCC) correlate strongly with diagnosis; global scores (e.g., CDR-SB 0.828, MMSE total 0.767, ADAS totals ~0.744/0.691, FAQ total 0.692) outperform most sub-scores; FAQ and RAVLT metrics are also important.
- Main performance (5-fold CV):
  - With 60 labeled subjects: DSSL ACC 85.47%, SEN 83.77%, SPE 84.14%, REC 91.82%, F1 81.92%.
  - With 120 labeled subjects: DSSL ACC 88.40%, SEN 86.99%, SPE 87.07%, REC 93.20%, F1 85.53%.
- Comparison to baselines (60 labels): DSSL (85.47% ACC) outperformed MixMatch (77.29%), FixMatch (81.44%), SimPLE (80.34%), CCSSL (81.07%), and LaSSL (79.47%). Similar superiority was observed with 120 labels (DSSL 88.40% vs. best baseline 85.10%).
- Stability: over 100 random label selections, DSSL had the lowest variance among methods (60 labels: 2.91; 120 labels: 2.30), indicating higher robustness; variance generally decreased with more labeled data.
- Dual-encoder design: using different pooling types in Encoder1 and Encoder2 improved performance; the max + average pooling pair achieved the best results (ACC 85.47% at 60 labels; 88.40% at 120 labels), outperforming same-structure pairs.
- Confidence threshold τ: accuracy peaked near τ ≈ 0.90–0.97 in the 60-label regime; a higher passing rate increased the impurity rate, reflecting the expected trade-off.
- Data size effect: performance improved with more training data and plateaued once the training set exceeded ~500 samples.
- Practicality: training time remained under ~3 minutes on modest CPU hardware, supporting clinical usability without imaging data.
- Visualization and interpretability: t-SNE showed predicted classes aligning with true labels; SHAP analysis indicated the two encoders learned different contributing features, with stronger impacts for AD and NC than for MCI.
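The PCC-based feature ranking underlying the first finding can be reproduced in a few lines of NumPy; the data used here are synthetic placeholders, not ADNI scores:

```python
import numpy as np

def rank_features_by_pcc(X, y, k=15):
    """Rank the columns of X by the absolute Pearson correlation with the
    numerically coded diagnostic label y, returning the top-k column
    indices and the full vector of correlations."""
    Xc = X - X.mean(axis=0)              # center each feature
    yc = y - y.mean()                    # center the label
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    pcc = (Xc * yc[:, None]).sum(axis=0) / denom
    order = np.argsort(-np.abs(pcc))     # descending by |PCC|
    return order[:k], pcc
```

With the label coded on an ordinal scale (e.g., NC = 0, MCI = 1, AD = 2), a feature that tracks disease severity, such as CDR-SB in the paper, receives a |PCC| near the top of the ranking.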
Discussion
The proposed DSSL framework effectively addresses label scarcity by leveraging unlabeled neuropsychological data through dual encoders and complementary regularization. Learning two distinct feature representations enhances model perturbation and supports stronger consistency constraints via cross-branch pseudo-labeling, improving accuracy and stability in AD/MCI/NC tri-classification. The approach demonstrates that neuropsychological tests alone, when paired with SSL, can rival more invasive or costly biomarker-based methods for screening. Feature selection via PCC identified clinically meaningful tests (CDR, MMSE, ADAS, FAQ, RAVLT) that strongly correlate with cognitive impairment severity, potentially guiding clinical assessment. Compared to state-of-the-art SSL baselines adapted to tabular data, DSSL yields superior performance across multiple metrics and exhibits lower variance across random label selections, indicating robustness. Computational efficiency and lack of reliance on imaging further support applicability in routine clinical settings. Nonetheless, the reliance on a fixed confidence threshold affects pseudo-label quality and may vary across data splits; automating threshold adaptation could further improve consistency. SHAP analyses confirm the dual encoders capture different facets of the data, though medical interpretability of learned representations remains limited and might benefit from expert-informed constraints.
Conclusion
This work introduces DSSL, a dual semi-supervised framework for tri-classifying AD, MCI, and NC using only neuropsychological test scores and limited labels. By selecting 15 highly correlated features and combining difference regularization with consistency regularization via cross-branch pseudo-labeling, DSSL achieves strong and stable performance (ACC up to 88.40% with 120 labels), outperforming multiple SSL baselines while training quickly on modest hardware. The approach offers a practical, non-invasive tool to aid clinical screening. Future work includes applying DSSL to multimodal biomarkers (MRI, PET), exploring alternative encoder architectures, automating confidence threshold selection, and enhancing medical interpretability by integrating expert knowledge to refine learned representations.
Limitations
- Dependence on a fixed confidence threshold (τ) for pseudo-labeling; the optimal τ varies across splits and requires tuning, affecting pseudo-label quality and performance.
- Limited medical interpretability of the learned feature representations; SHAP indicates differing contributions, but alignment with disease pathology is not fully established.
- Use of only neuropsychological test data; while practical, it may miss complementary information available from imaging and CSF.
- The dual-encoder design increases model complexity and training time compared with single-branch methods (though training still takes under ~3 minutes on CPU).