A recent clinical trial comparing psilocybin therapy (PT) and escitalopram treatment (ET) for major depressive disorder (MDD) showed that 14 out of 16 efficacy outcome measures favored PT with >95% confidence. However, the Quick Inventory of Depressive Symptomatology, Self-Report, 16 items (QIDS-SR 16), which was the pre-registered primary outcome measure, did not show a significant difference between the two treatments. This discrepancy raises questions about the QIDS-SR 16's ability to detect true between-condition differences and its suitability for measuring treatment response in depression trials. The high societal burden of depression emphasizes the need for valid assessment tools to improve the accuracy and effectiveness of research and treatment. Many current depression rating scales rely on sum-scoring all items, treating them as a single dimension, which can be problematic if the items lack internal consistency and specificity to the core symptoms of depression that are most strongly related to psychosocial impairment. The QIDS-SR 16, while convenient and widely used, may be subject to such limitations. This study aims to investigate the reasons for the discrepant findings between the QIDS-SR 16 and other measures in the original trial and to provide a more comprehensive analysis of the treatment response using alternative approaches.
Literature Review
The QIDS-SR 16, a shortened version of the Inventory of Depressive Symptomatology (IDS-SR), was developed to reduce patient burden and align more closely with DSM criteria for MDD. However, some researchers argue that the DSM definition of depression may not fully capture a core, causally central depression factor strongly linked to psychosocial impairment. The IDS, and subsequently the QIDS-SR 16, may therefore miss an opportunity to focus on core dimensions of depression. While the QIDS-SR 16 has demonstrated good validity in some domains, previous research has raised concerns about its test-retest reliability and its high number of compound items, which can increase variability in scores. Sum-scoring that ignores the multidimensional nature of depression can mask improvement in some symptoms when other, less clinically relevant domains improve poorly. Current models of depression support a more granular approach that acknowledges the heterogeneity and complexity of symptoms and their interactions.
Methodology
This study uses data from the Carhart-Harris et al. (2021) clinical trial, which randomized 59 MDD patients to either PT (n=30) or ET (n=29). Patients completed self-report questionnaires and clinician-rated interviews at baseline, 5 weeks, and 6 weeks post-treatment. The study employed two main analytical approaches. The first examined the psychometric functioning of the QIDS-SR 16 relative to other depression scales (MADRS, HRS, BDI-1A) and an anhedonia scale (SHAPS). Linear mixed effects (LME) models were used to assess between-condition differences in item score changes over time. The second approach was more granular: it used Ballard et al.’s (2018) factor structure to examine depression facets and derived a single core depression factor using exploratory factor analysis (EFA). LMEs were again used to examine differential treatment responses across these facets and the core factor. Treatment response expectancies were also measured. To ensure comparability, item scores across scales were standardized.
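The standardization step can be illustrated with a minimal sketch. It assumes z-scoring of each item against its pooled mean and standard deviation across both timepoints, which the summary does not specify, and it compares mean change between conditions as a simplified stand-in for the LME models rather than a reproduction of them. The function names, record layout, and toy scores are hypothetical and are not trial data.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Z-score raw item scores against their own mean and SD (assumed method)."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("item has no variance and cannot be standardized")
    return [(s - mu) / sd for s in scores]

def mean_change_difference(records):
    """Mean standardized change (week 6 minus baseline) in PT minus that in ET.

    records: list of dicts with keys 'arm' ('PT' or 'ET'), 'baseline', 'week6'.
    A negative value means scores on this item fell more under PT.
    """
    n = len(records)
    # Pool both timepoints so the item is placed on one common z-scale.
    pooled = [r["baseline"] for r in records] + [r["week6"] for r in records]
    z = standardize(pooled)
    changes = {"PT": [], "ET": []}
    for i, r in enumerate(records):
        changes[r["arm"]].append(z[n + i] - z[i])
    return mean(changes["PT"]) - mean(changes["ET"])

# Hypothetical item scored 0-3 (illustrative values only).
item_0_to_3 = [
    {"arm": "PT", "baseline": 3, "week6": 0},
    {"arm": "PT", "baseline": 2, "week6": 1},
    {"arm": "ET", "baseline": 3, "week6": 2},
    {"arm": "ET", "baseline": 2, "week6": 2},
]
# The same responses expressed on a 0-6 response scale.
item_0_to_6 = [
    {"arm": r["arm"], "baseline": 2 * r["baseline"], "week6": 2 * r["week6"]}
    for r in item_0_to_3
]

diff_a = mean_change_difference(item_0_to_3)
diff_b = mean_change_difference(item_0_to_6)
```

Because z-scores are unchanged by rescaling, `diff_a` and `diff_b` agree, which is the property that lets items with different response ranges be compared across scales.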
Key Findings
The analysis revealed several potential reasons for the discrepancy between the QIDS-SR 16 and other scales. The QIDS-SR 16 exhibited higher variance and standard error than other scales, suggesting possible measurement imprecision. A rational analysis of item content showed that some QIDS-SR 16 items were less sensitive to between-condition changes than comparable items from other scales. This insensitivity might stem from imprecise phrasing or compound items that combine multiple symptoms, obscuring differential responses. The QIDS-SR 16 also lacked items related to specific symptoms (e.g., anhedonia, guilt, sexual dysfunction) that were differentially responsive to PT. Examination of compound criteria in the QIDS-SR 16 revealed inconsistencies in the highest-scored item across timepoints in a substantial portion of patients. Item-level analysis identified symptoms strongly favoring PT, particularly those related to energy level, amotivation (including libido), and anhedonia. Facet-level analysis showed significant differential treatment response favoring PT in depressed mood and anhedonia, but not in other facets. Analysis of a core depression factor, derived from a factor analysis across four scales, also significantly favored PT. Even when controlling for relative expectancy, the facet-level differential response favoring PT in depressed mood and anhedonia remained significant.
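The compound-item issue can be made concrete with a short sketch of QIDS-SR 16 scoring, in which the sleep, appetite/weight, and psychomotor domains each take the highest score among several items before the nine domain scores are summed. The inconsistency rule used here (no item is a top scorer at both timepoints) is an assumption for illustration, since the summary does not state the exact criterion, and the responses are invented.

```python
# Item numbers (1-indexed) feeding each of the nine QIDS-SR 16 domains.
DOMAINS = {
    "sleep": [1, 2, 3, 4],
    "sad_mood": [5],
    "appetite_weight": [6, 7, 8, 9],
    "concentration": [10],
    "self_view": [11],
    "suicidal_ideation": [12],
    "interest": [13],
    "energy": [14],
    "psychomotor": [15, 16],
}

def domain_score(items, domain):
    """Highest item score in the domain; unanswered items count as 0 here."""
    return max(items.get(i, 0) for i in DOMAINS[domain])

def total_score(items):
    """Sum of the nine domain scores (possible range 0-27)."""
    return sum(domain_score(items, d) for d in DOMAINS)

def top_items(items, domain):
    """Set of items that carry the domain score (ties return every tied item)."""
    top = domain_score(items, domain)
    return {i for i in DOMAINS[domain] if items.get(i, 0) == top}

def driver_changed(before, after, domain):
    """True if no single item is a top scorer at both timepoints (assumed rule)."""
    return top_items(before, domain).isdisjoint(top_items(after, domain))

# Invented responses: {item number: score 0-3}, omitted items are 0.
baseline = {1: 3, 5: 2, 15: 1, 16: 2}   # sleep driven by sleep-onset insomnia
week6 = {4: 2, 5: 1, 15: 1}             # sleep now driven by hypersomnia

print(total_score(baseline), total_score(week6))
print(driver_changed(baseline, week6, "sleep"))
```

In this toy case the sleep domain score falls from 3 to 2, yet the symptom behind the score has switched from insomnia to hypersomnia, so the change in the sum score does not track any single symptom.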
Discussion
The findings highlight several limitations of the QIDS-SR 16, particularly its reliance on sum scores and on imprecisely worded or compound items, which may mask clinically important treatment differences. The study's granular analysis gave a clearer picture of differential treatment response, identifying specific symptoms and facets of depression that are more sensitive to PT. PT's superior efficacy in reducing depressed mood and anhedonia, together with the significant difference in libido, suggests its potential as a valuable treatment for core aspects of depression. These findings emphasize the need for more comprehensive and granular approaches to depression measurement that move beyond single scales and sum scores, capture the multidimensional nature of the condition, and identify more precisely where treatments differ in efficacy.
Conclusion
This study demonstrates the limitations of the QIDS-SR 16 in detecting differential treatment response in depression trials. The higher variance, imprecise compound items, and lack of focus on core depression factors contribute to its insensitivity. Granular analyses at the item, facet, and factor levels provide a more nuanced understanding of treatment effects, highlighting PT's potential advantages in targeting core depressive symptoms such as depressed mood and anhedonia. Future research should replicate these findings, explore their clinical implications, develop more comprehensive scales that incorporate these insights, and analyze treatment response data at a more granular level.
Limitations
The study's limitations include the reliance on a single trial's dataset and the possibility of type I error due to post-hoc analyses. The small sample size may affect the generalizability of findings. The facet-level analysis was based on an EFA-derived factor structure not yet fully validated. Finally, the study did not account for clinician expectancy or other rater biases.