Brain markers predicting response to cognitive-behavioral therapy for social anxiety disorder: an independent replication of Whitfield-Gabrieli et al. 2015
Y. K. Ashar, J. Clark, et al.
This independent replication by Yoni K. Ashar and colleagues tests whether amygdala connectivity predicts treatment response to cognitive-behavioral therapy in social anxiety disorder. The original model showed promise, but in the replication the marker explained far less variance, indicating that clinical applications would require markers with greater explanatory strength.
Introduction
The study addresses the replicability of predictive neuroimaging biomarkers, focusing on a previously reported resting-state fMRI amygdala connectivity marker that predicted response to cognitive-behavioral therapy (CBT) in social anxiety disorder (SAD) (Whitfield-Gabrieli et al., 2015). Despite hundreds of proposed brain-based predictive markers, few have been validated on independent datasets, raising concerns about their generalizability and clinical utility. SAD has repeatedly been associated with alterations in amygdala function and connectivity, and several reports suggest amygdala features predict or change with CBT. The primary aim here was to perform an independent replication of the precisely specified original model predicting CBT response from baseline amygdala-seeded resting-state connectivity, and secondarily to explore a more flexible model to understand amygdala connectivity’s contribution to treatment response in a new dataset. The broader motivation is to advance replicable, generalizable predictive models suitable for scientific and clinical applications.
Literature Review
Prior work indicates that only about 10% of roughly 450 published predictive brain markers have been tested on independent data, with very few undergoing broader tests of generalizability. SAD is consistently characterized by heightened amygdala reactivity to social threat and altered resting connectivity, and CBT has been associated with amygdala-related functional and structural changes correlating with clinical improvement. Parallel to Whitfield-Gabrieli et al., at least two other studies reported amygdala-based predictors of CBT response in SAD; however, none had undergone independent replication. This literature highlights amygdala relevance but leaves the replicability and generalizability of specific predictive models uncertain.
Methodology
Design and datasets: An independent replication was conducted using an existing SAD dataset (NCT00380731) that included baseline resting-state fMRI and subsequent CBT. The dataset was similar to the original in key respects but differed in some sample and treatment characteristics (e.g., individual vs group CBT, scan parameters, and recruitment region). The original study randomized to D-cycloserine versus placebo adjuncts, with no differences between drug conditions; no medication was administered in the replication study.
Participants: Recruited 2007–2010 via referrals and web listings. Inclusion: principal SAD diagnosis (ADIS-IV), greater than moderate fear in five or more social situations, LSAS ≥ 60, ADIS clinical severity rating ≥ 4, unmedicated ≥ 1 year, right-handed, MRI safe. Exclusion: current/past CBT, neurological disorders, most current psychiatric conditions except GAD, agoraphobia without panic attacks, specific phobia, panic disorder, or dysthymia. Final analysis sample n = 42 (n = 25 CBT-immediate; n = 17 CBT post-waitlist), including those completing ≥ 12/16 CBT sessions with last observation carried forward. For completeness, analyses were also performed in the CBT-immediate subset alone (n = 25).
Procedures and treatment: After baseline assessment including fMRI, participants were randomized (biased coin) to immediate individual CBT (n = 38 initially) or to a waitlist (n = 37); waitlisted participants later received the same CBT. CBT consisted of 16 weekly individual sessions over ~4 months using the Heimberg manual (“Managing Social Anxiety: A Cognitive-Behavioral Therapy Approach”), delivered by four trained therapists with verified adherence.
Outcome measure: Primary clinical outcome was LSAS self-report; treatment response defined as pre-to-post change in LSAS (ΔLSAS). Internal consistency in this study: Cronbach’s alpha = 0.91.
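For readers unfamiliar with the internal-consistency statistic reported above, the following minimal Python sketch computes Cronbach's alpha from item-level scores using the standard formula. The function name and the toy data are illustrative assumptions; the study's item-level LSAS responses are not reproduced here.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of per-item score lists, each with one score per respondent.
    (Hypothetical helper for illustration, not the study's code.)
    """
    k = len(items)
    # Sum of the individual item variances (sample variance).
    item_var_sum = sum(statistics.variance(col) for col in items)
    # Variance of each respondent's total score across items.
    totals = [sum(scores) for scores in zip(*items)]
    total_var = statistics.variance(totals)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Toy data: two perfectly consistent items from four made-up respondents.
toy_items = [[1, 2, 3, 4], [1, 2, 3, 4]]
alpha = cronbach_alpha(toy_items)
print(round(alpha, 3))
```

Values near 1, such as the 0.91 reported here, indicate that the items vary together closely.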
Imaging acquisition (replication dataset): GE 3T Signa; custom quadrature head coil. Resting-state run: 5 minutes fixation, 200 volumes, 22 axial slices; TR = 1.5 s, TE = 30 ms, flip angle = 60°, FOV = 22 cm, matrix 64×64, single-shot, voxel size ≈ 3.438 × 3.438 × 4.5 mm. High-resolution anatomical scans via fast spin-echo spoiled gradient recall. Original dataset: Siemens Trio Tim, 6-minute resting-state run, TR = 6 s, 2 mm isotropic voxels.
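The replication acquisition parameters are internally consistent, which a short arithmetic check makes visible. This sketch uses only the values reported above; the variable names are illustrative.

```python
# Replication-dataset acquisition values as reported in the text.
fov_mm = 220.0    # FOV = 22 cm
matrix = 64       # 64 x 64 in-plane matrix
n_volumes = 200
tr_s = 1.5

# In-plane voxel edge length is the field of view divided by the matrix size.
in_plane_mm = fov_mm / matrix
# Total run length is the number of volumes times the repetition time.
run_seconds = n_volumes * tr_s

print(f"in-plane voxel edge: {in_plane_mm:.4f} mm")   # 3.4375 mm
print(f"run length: {run_seconds / 60:.1f} min")      # 5.0 min
```

The computed edge of 3.4375 mm rounds to the reported 3.438 mm, and 200 volumes at TR = 1.5 s give the stated 5-minute run.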
Preprocessing: Conn toolbox (wrapping SPM12). Steps: slice-timing correction, motion estimation/realignment, normalization to MNI space, 8 mm FWHM spatial smoothing. Nuisance regression: six motion parameters + derivatives, first three components of white matter and CSF, and ART spike regressors (outlier defined as displacement ≥ 0.5 mm or global intensity ≥ 3 SD). Residual BOLD band-pass filtered (0.01–0.10 Hz). Motion was acceptable: mean FD = 0.13 mm (median = 0.06, SD = 0.13).
Predictive model specification: Based on personal communications with the original authors, an amygdala-seeded connectivity term was constructed by averaging mean connectivity between the amygdala and: (1) one positive connectivity cluster (subgenual cingulate/caudate/putamen), (2) two bilateral central sulcus clusters (negative), and (3) one right temporal-occipital cluster (negative). The combined term was Fisher-transformed and Z-scored across subjects. The original final model, which also included an intercept (not shown here), was: ΔLSAS = 0.6194 × baseline LSAS + 8.6290 × amygdala connectivity term. For replication, the same model terms and coefficients were used, but the intercept was dropped and all variables were mean-centered to avoid intercept-driven differences across studies. The dependence between baseline LSAS and ΔLSAS was also evaluated: the OLS coefficient for baseline LSAS in the replication data (β ≈ 0.66) was very close to the original (0.62), so the original coefficient was retained for the primary replication test.
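As a rough illustration of how a fixed-coefficient model of this kind can be applied to new data, the sketch below takes a per-subject combined connectivity value, applies the Fisher transform and z-scoring, mean-centers baseline LSAS, and forms predictions with the two published coefficients and no intercept. The function names, the use of the sample standard deviation, and the example values are assumptions made for illustration. How the positive and negative clusters are combined into one value is not specified here, so the sketch starts from an already combined term.

```python
import math
import statistics

B_LSAS = 0.6194   # original coefficient for baseline LSAS (from the text)
B_AMYG = 8.6290   # original coefficient for the amygdala connectivity term

def center(values):
    """Subtract the sample mean from every value."""
    mu = statistics.fmean(values)
    return [v - mu for v in values]

def zscore(values):
    """Z-score across subjects (sample SD; an assumption of this sketch)."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def predict_delta_lsas(baseline_lsas, combined_r):
    """Predict mean-centered change in LSAS with fixed coefficients, no intercept."""
    # Fisher r-to-z transform, then z-score across subjects.
    conn = zscore([math.atanh(r) for r in combined_r])
    lsas_c = center(baseline_lsas)
    return [B_LSAS * l + B_AMYG * c for l, c in zip(lsas_c, conn)]

# Made-up values for four hypothetical subjects.
baseline = [72.0, 85.0, 64.0, 99.0]
combined = [0.10, -0.05, 0.22, 0.03]
pred = predict_delta_lsas(baseline, combined)
print([round(p, 2) for p in pred])
```

Because both predictors are centered and there is no intercept, the predictions average to zero and should be compared against mean-centered observed ΔLSAS.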
Alternative models not applied: Original MVPA-based fMRI predictor could not be tested due to incomplete inferior cerebellar coverage. DTI predictors could not be tested because DTI was not acquired in the replication dataset.
Model assessment and statistics: Performance was compared between the full model (baseline LSAS + amygdala connectivity term) and a compact model (baseline LSAS only). Metrics: (1) prediction R² = 1 − NMSE, where NMSE = MSE(full)/MSE(compact); and (2) model-based R², the squared Pearson correlation between observed ΔLSAS and the full-model predictions (and likewise for the compact model). Statistical significance was assessed by permuting the amygdala connectivity term across subjects 10,000 times to form a null distribution, with the observed value compared against the 95th percentile of that distribution. Analyses used the MATLAB CanlabCore toolbox; data and code are publicly available at the provided GitHub repository.
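The prediction R² and the permutation test described above can be sketched in a few lines of Python. This is not the authors' MATLAB code: the function names are hypothetical, the inputs are assumed to be already mean-centered (and the connectivity term z-scored), and the `+ 1` correction in the p-value is a common convention chosen here rather than something stated in the study.

```python
import random
import statistics

def mse(y, yhat):
    """Mean squared error between observed and predicted values."""
    return statistics.fmean((a - b) ** 2 for a, b in zip(y, yhat))

def prediction_r2(y, lsas_c, conn, b_lsas=0.6194, b_amyg=8.6290):
    """1 - MSE(full)/MSE(compact), using fixed coefficients and no intercept."""
    full = [b_lsas * l + b_amyg * c for l, c in zip(lsas_c, conn)]
    compact = [b_lsas * l for l in lsas_c]
    return 1.0 - mse(y, full) / mse(y, compact)

def permutation_p(y, lsas_c, conn, n_perm=10_000, seed=0):
    """One-sided permutation p-value for the connectivity term's contribution."""
    rng = random.Random(seed)
    observed = prediction_r2(y, lsas_c, conn)
    shuffled = list(conn)
    hits = 0
    for _ in range(n_perm):
        # Break the subject-wise link between connectivity and outcome.
        rng.shuffle(shuffled)
        if prediction_r2(y, lsas_c, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive (assumed convention).
    return observed, (hits + 1) / (n_perm + 1)
```

Comparing the observed value against the 95th percentile of the null distribution corresponds to a one-sided test at p < 0.05, which is why the sketch counts permutations that meet or exceed the observed prediction R².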
Additional flexible replication: A GLM was also estimated in the replication data to obtain new coefficients for the same predictors, testing whether the amygdala connectivity term significantly predicts ΔLSAS controlling for baseline LSAS.
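A minimal sketch of such a re-estimation, assuming ordinary least squares on mean-centered variables with two predictors and no separate intercept, is given below. The function name is hypothetical, and the degrees-of-freedom choice (n − 2) is an assumption: it matches the t(40) reported for n = 42, but how the original analysis handled the intercept is not stated here.

```python
import math
import statistics

def center(v):
    """Subtract the sample mean from every value."""
    mu = statistics.fmean(v)
    return [x - mu for x in v]

def ols_two_predictors(y, x1, x2):
    """OLS of centered y on centered x1 and x2; returns betas, t-statistics, df."""
    y, x1, x2 = center(y), center(x1), center(x2)
    # Entries of X'X and X'y for the 2x2 normal equations.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    # Residual variance and coefficient standard errors.
    resid = [c - b1 * a - b2 * b for a, b, c in zip(x1, x2, y)]
    df = len(y) - 2   # assumed; see the note above
    sigma2 = sum(r * r for r in resid) / df
    se1 = math.sqrt(sigma2 * s22 / det)
    se2 = math.sqrt(sigma2 * s11 / det)
    return (b1, b2), (b1 / se1, b2 / se2), df
```

With `x1` as baseline LSAS and `x2` as the connectivity term, the second t-statistic is the test of whether connectivity predicts ΔLSAS after controlling for baseline severity.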
Key Findings
- Baseline symptom prediction: In the original report, baseline LSAS alone explained 12% of variance in ΔLSAS. In the replication dataset, baseline LSAS explained 20% of variance. The replication OLS coefficient for baseline LSAS was β = 0.66 (vs original 0.62), yielding negligible improvement over using the original coefficient (<1%, prediction R² = 0.0012), so the original β was used for replication tests.
- Primary replication of amygdala connectivity term: In the original report, adding the amygdala connectivity term increased variance explained to 33% (an additional 21%). In the replication, adding the same amygdala connectivity term provided a small improvement of about 2% (prediction R² = 0.016; model-based R² = 0.020). Neither metric reached significance at the 0.05 level, with both effects only marginal: model-based p = 0.097; prediction R² p = 0.101. Distributions of model terms were approximately normal; results were not driven by outliers.
- Flexible GLM in replication data: Controlling for baseline LSAS, the amygdala connectivity term was not a significant predictor of treatment response: β_amyg_conn = 4.78, t(40) = 1.31, p = 0.20.
- CBT-immediate subset (n = 25): The model did not predict treatment response in this subset. Baseline LSAS coefficient was similar to the original (β ≈ 0.65), but adding the amygdala connectivity term did not improve prediction.
- Overall effect sizes: The replication effect size was substantially smaller than the original (≈2% vs 21% additional variance explained).
Discussion
The independent replication yielded a positive but marginal and small prediction effect for the amygdala connectivity marker when added to baseline LSAS, in stark contrast to the substantial effect reported originally. Baseline symptom severity robustly predicted treatment-related change, consistent with regression-to-the-mean dynamics and prior findings. The amygdala connectivity term contributed minimally and non-significantly when coefficients were re-estimated in the replication dataset, suggesting that the originally reported effect may not generalize strongly across datasets and study conditions.
Several factors could underlie the reduced effect: differences in CBT format (individual vs group), scanning hardware and acquisition parameters (GE vs Siemens, TR 1.5 s vs 6 s, voxel sizes), participant demographics and recruitment regions, and the presence of adjunctive D-cycloserine/placebo in the original trial (though randomization occurred after baseline imaging). Additionally, the replication could not test the original MVPA and DTI predictors due to data limitations. Nonetheless, converging literature implicates amygdala features in SAD and treatment response, supporting the relevance of amygdala circuitry. However, for clinical translation, predictive models must demonstrate robust, replicable performance with substantial variance explained in independent samples.
Taken together, the findings underscore the importance of preregistered, precisely specified models tested on independent datasets, transparent sharing of code and masks, and collaborative replication efforts to build cumulative, clinically meaningful predictive neuroscience.
Conclusion
This study provides an independent replication of a resting-state amygdala connectivity marker for predicting CBT response in SAD. While the model showed a marginally positive effect in the replication dataset, the magnitude was small (~2% additional variance explained) compared to the original report (~21%), and was non-significant under a flexible GLM re-estimation and in the CBT-immediate subset. The work highlights both the promise and the current limitations of brain-based predictive markers, emphasizing the need for robust, generalizable models validated across independent datasets. Future research should: (1) conduct further multi-site replications with harmonized acquisition and treatment protocols; (2) explore model refinements and multimodal predictors (e.g., combining resting-state, task-fMRI, and structural/DTI where available); (3) evaluate clinical utility using prospective validation and decision-analytic metrics; and (4) continue transparent data/mask/code sharing to facilitate cumulative science.
Limitations
- Differences between original and replication datasets: group vs individual CBT; adjunctive D-cycloserine/placebo in the original (none in replication); recruitment regions (Boston vs San Francisco Bay Area); demographics and sample characteristics; medication washout duration (≥2 weeks vs ≥1 year).
- Imaging hardware and acquisition differences: Siemens Trio vs GE Signa; TR 6 s vs 1.5 s; resolution differences; head coils; scan duration (6 vs 5 minutes).
- Model constraints: Replication omitted the intercept to avoid study-specific intercept effects; dependence between baseline LSAS and ΔLSAS complicates interpretation.
- Data limitations: Inability to replicate MVPA and DTI predictors (cerebellar coverage incomplete; no DTI collected), potentially missing stronger predictive signals reported originally.
- Sample size and composition: Modest N (n = 42) and inclusion of post-waitlist CBT participants may introduce heterogeneity in non-specific effects; last observation carried forward could affect variance and estimates.
- Generalizability: Findings may not generalize beyond the specific clinical protocol, imaging parameters, and populations studied.