Association between sleep duration, depression and breast cancer in the United States: a national health and nutrition examination survey analysis 2009–2018

Y. Cai, Y. Zhaoxiong, et al.

This study by Yufan Cai, Yizhou Zhaoxiong, Wei Zhu, and Haiyu Wang examines the relationship between sleep duration, depression, and breast cancer using NHANES data from 2009 to 2018. Depression was associated with higher odds of breast cancer, whereas sleep duration showed no significant association. Among the machine-learning models tested for predicting breast cancer, AdaBoost performed best.

Introduction
Breast cancer remains a leading cause of cancer incidence and mortality worldwide, imposing a substantial burden of illness and disability. Multiple lifestyle, reproductive, and healthcare-access factors contribute to risk. Prior epidemiologic studies report mixed findings on sleep duration and breast cancer risk: some link night-shift work to higher risk, meta-analyses have found no association with sleep duration, and Mendelian randomization (MR) studies suggest potential adverse effects of longer sleep. Depression is prevalent worldwide and has been implicated in breast cancer prognosis and possibly incidence; MR studies suggest a causal link between depression and increased breast cancer risk. Given this conflicting evidence, the study aims to clarify the associations between sleep duration, depression, and breast cancer using nationally representative data from the National Health and Nutrition Examination Survey (NHANES), and to develop machine-learning models that predict breast cancer from accessible clinical and demographic variables.
Literature Review
Prior evidence, synthesized in the paper's introduction, is mixed. Night-shift work has been associated with higher breast cancer risk; a 2014 meta-analysis reported no association between sleep duration and breast cancer; and a Mendelian randomization (MR) study indicated increased risk per additional hour of sleep. Depression prevalence has risen substantially and contributes significantly to the global disease burden. MR studies suggest that genetically predicted depression is associated with higher breast cancer risk (OR ≈ 1.09), and meta-analyses identify depression and anxiety as independent predictors of recurrence and survival in breast cancer. Large prospective cohorts, including the Million Women Study, found no association between sleep duration and incident breast cancer, indicating ongoing controversy and the need for further evaluation in representative datasets.
Methodology
Study design and data source: A cross-sectional analysis of five NHANES cycles (2009–2010, 2011–2012, 2013–2014, 2015–2016, 2017–2018) conducted by the US Centers for Disease Control and Prevention (CDC). NHANES collects demographic, socioeconomic, health, and examination data through household interviews and Mobile Examination Center visits. Ethical approval was granted by the NCHS ethics review board (ERB), and written informed consent was obtained.
Sample: Of 55,018 participants across the five cycles, 1,789 remained after applying the inclusion and exclusion criteria and handling missing data; 263 of them reported breast cancer. Inclusion criteria were participation in a 2009–2018 cycle with comparable data, completion of the relevant examinations, and a reported breast cancer outcome. Exclusion criteria were male sex, participation outside 2009–2018, and missing relevant examinations.
Outcome: Breast cancer was identified from the self-reported cancer type (“What kind of cancer?”).
Exposures: Depression was assessed with the nine-item Patient Health Questionnaire (PHQ-9; total score 0–27) and defined as a PHQ-9 score ≥10. Sleep duration was obtained from the question “How much sleep do you usually get at night on weekdays or workdays?” and categorized as short (<7 h), normal (7–9 h, reference), or long (>9 h).
Covariates: Age (continuous), sex, race/ethnicity (Mexican American, Other Hispanic, Non-Hispanic White, Non-Hispanic Black, Other), education (five levels), marital status (seven categories), family income-to-poverty ratio (PIR, continuous), BMI (kg/m²; <25, 25–30, ≥30), smoking (yes/no), alcohol drinking (yes/no), hypertension (SBP ≥140 and/or DBP ≥90 mmHg, with blood pressure taken as the mean of three readings), and diabetes (yes, no, borderline).
Statistical analysis: Descriptive statistics are reported as means ± SD, medians (IQR), or proportions. Groups were compared with Student’s t-test for continuous variables and chi-square tests for categorical variables.
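The exposure and covariate definitions above reduce to simple threshold rules. The sketch below expresses them in Python, assuming each participant's PHQ-9 item scores, weekday sleep hours, BMI, and three blood-pressure readings are already available as plain numbers, with refused or don't-know responses removed. The function names are hypothetical; they do not correspond to NHANES variable names or to the authors' R code, and the treatment of the 25–30 BMI band as 25 ≤ BMI < 30 is an assumption.

```python
from statistics import mean

# Hypothetical helper names; thresholds follow the definitions in the text.
PHQ9_ITEM_COUNT = 9
DEPRESSION_CUTOFF = 10  # depression: PHQ-9 total >= 10


def phq9_total(item_scores):
    """Sum nine PHQ-9 items, each scored 0-3 (total range 0-27)."""
    if len(item_scores) != PHQ9_ITEM_COUNT:
        raise ValueError("PHQ-9 requires exactly 9 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each PHQ-9 item must be scored 0-3")
    return sum(item_scores)


def is_depressed(item_scores):
    """Binary depression exposure: PHQ-9 total of 10 or more."""
    return phq9_total(item_scores) >= DEPRESSION_CUTOFF


def sleep_category(hours):
    """Short (<7 h), normal (7-9 h, the reference), or long (>9 h)."""
    if hours < 7:
        return "short"
    if hours <= 9:
        return "normal"
    return "long"


def bmi_category(bmi):
    """BMI bands in kg/m^2; assumes the 25-30 band means 25 <= BMI < 30."""
    if bmi < 25:
        return "<25"
    if bmi < 30:
        return "25-30"
    return ">=30"


def has_hypertension(systolic_readings, diastolic_readings):
    """Average the readings, then apply SBP >= 140 and/or DBP >= 90 mmHg."""
    return mean(systolic_readings) >= 140 or mean(diastolic_readings) >= 90


# Example with made-up values (not study data):
print(is_depressed([2, 2, 1, 1, 1, 1, 1, 1, 0]))        # total 10 -> True
print(sleep_category(6.5), sleep_category(9))            # short normal
print(has_hypertension([138, 142, 144], [80, 82, 84]))   # mean SBP ~141.3 -> True
```

Note that the boundaries matter: exactly 7 h and exactly 9 h fall in the normal (reference) group, and a mean systolic pressure just over 140 mmHg qualifies as hypertension even if one individual reading is below it.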
Multivariable logistic regression estimated odds ratios (ORs) and 95% CIs for the associations of depression and sleep duration with breast cancer across four models: Model 1, unadjusted; Model 2, depression and sleep duration mutually adjusted; Model 3, additionally adjusted for age, sex, PIR, education, race, marital status, BMI, hypertension, and diabetes; Model 4, further adjusted for smoking and alcohol.
Machine learning: To obtain complete predictor variables for modeling, random forest-based imputation was applied to the variables with severe missingness (hypertension and alcohol consumption); density plots suggested that the imputed values closely matched the observed distributions. Six algorithms were trained: AdaBoost, Random Forest, Boost tree, Artificial Neural Network (ANN), Extreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM). Models were compared on prediction accuracy and Cohen’s kappa, and performance was evaluated with ROC and calibration curves, reporting AUCs with 95% CIs. Analyses were conducted in R 4.1.2, with p<0.05 considered significant.
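For the logistic regression models described above, an OR and its 95% CI come from exponentiating a log-odds coefficient and its interval. The Python sketch below shows that conversion using a Wald interval, plus the special case of an unadjusted model with one binary exposure, where the estimate can be computed directly from a 2×2 table. The function names and counts are hypothetical illustrations, not the study's data, and the paper's exact interval method is not stated here.

```python
import math

# Two-sided 95% standard normal quantile.
Z_95 = 1.959963984540054


def or_with_ci(beta, se, z=Z_95):
    """Turn a logistic-regression coefficient (log-odds scale) and its
    standard error into an odds ratio with a Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def crude_or_2x2(a, b, c, d, z=Z_95):
    """Unadjusted OR from a 2x2 table (Woolf interval).

    a: exposed cases      b: exposed non-cases
    c: unexposed cases    d: unexposed non-cases

    For a single binary exposure this equals the estimate and Wald interval
    of an unadjusted logistic regression (like Model 1).
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the log odds ratio undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_with_ci(log_or, se, z)


# Hypothetical counts, not study data: OR = (20 * 360) / (80 * 40) = 2.25
odds_ratio, lower, upper = crude_or_2x2(20, 80, 40, 360)
print(f"OR = {odds_ratio:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

Because the interval is built on the log scale, it is symmetric around ln(OR) rather than around the OR itself. Adjusted ORs for Models 2–4 follow the same exponentiation, applied to the coefficient and standard error from the multivariable fit instead of a 2×2 table.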
Key Findings
Sample: 1,789 participants, of whom 263 (14.7%) reported breast cancer; the mean age in the breast cancer group was 67.0 years.
Between-group differences: BMI distribution, race/ethnicity, and smoking differed significantly between groups (p<0.05).
Associations (fully adjusted Model 4): Depression was associated with higher odds of breast cancer (OR = 1.99; 95% CI: 1.55–3.51). Sleep duration was not significantly associated with breast cancer: short sleep (<7 h) OR = 1.25 (95% CI: 0.85–1.37) and long sleep (>9 h) OR = 1.05 (95% CI: 0.95–1.15), both compared with 7–9 h.
Machine learning performance: AUCs were AdaBoost 0.84 (95% CI: 0.81–0.87), Random Forest 0.84, Boost tree 0.844, ANN 0.83, XGBoost 0.82, and SVM 0.80–0.82. AdaBoost performed best overall, with high sensitivity (0.97; 95% CI: 0.94–0.98), specificity of about 0.64, and good calibration.
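The performance figures above (AUC, sensitivity, specificity), together with the accuracy and Cohen's kappa used to compare models, are standard functions of a model's predicted scores and, except for AUC, a classification threshold. The Python sketch below computes them from scratch on made-up data; the 0.5 threshold, the function names, and all numbers are assumptions for illustration, not the study's settings or results, and the authors' R workflow is not reproduced.

```python
def predict_labels(scores, threshold=0.5):
    """Classify as a case (1) when the predicted probability reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and Cohen's kappa from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    accuracy = (tp + tn) / n
    # Agreement expected by chance from the predicted and observed totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (accuracy - expected) / (1 - expected),
    }


def auc_by_ranks(y_true, scores):
    """AUC as the probability that a random case scores above a random
    non-case (ties count one half), i.e. the area under the ROC curve."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy labels and scores (not study data).
y = [1, 1, 1, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
print(classification_metrics(y, predict_labels(p)))
print(auc_by_ranks(y, p))  # 14 of 15 case-control pairs ranked correctly
```

Kappa is informative here because of class imbalance: with about 14.7% of participants reporting breast cancer, a model that labels everyone as a non-case would reach roughly 85% accuracy but a kappa of 0. Moving the threshold trades sensitivity against specificity (as in AdaBoost's 0.97 versus about 0.64), whereas AUC summarizes ranking performance across all thresholds.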
Discussion
The findings address the research questions by showing a significant association between depression and breast cancer in a nationally representative sample. This aligns with prior Mendelian randomization and epidemiologic evidence suggesting that depression may influence breast cancer risk through inflammatory and stress-related biological pathways. By contrast, sleep duration was not significantly associated with breast cancer after multivariable adjustment, consistent with large cohort studies and meta-analyses reporting null findings, although some studies have reported increased risk with short or long sleep. The predictive modeling shows that accessible clinical and demographic data, analyzed with machine-learning algorithms (particularly AdaBoost), can help identify individuals at higher risk, potentially aiding risk stratification and prevention. The results underscore the importance of assessing mental health in breast cancer risk evaluation and support further causal investigation.
Conclusion
Depression is associated with higher odds of breast cancer, whereas sleep duration shows no significant association in this NHANES analysis. Machine learning, especially the AdaBoost algorithm, distinguished breast cancer cases well (AUC 0.84) using readily available clinical data. Future work should use large prospective cohorts to clarify causal relationships among sleep, depression, and breast cancer, incorporate objective sleep measures and broader sleep-quality metrics, and refine predictive models using comprehensive electronic health records.
Limitations
The cross-sectional design precludes causal inference and leaves open reverse causation or bidirectional effects. Sleep duration was self-reported rather than objectively measured, and other sleep characteristics (e.g., disturbances, quality) were unavailable. Although multiple covariates were controlled, residual confounding may remain. Missing data for some variables were handled with random forest imputation, which may introduce bias. The findings may generalize only to populations comparable to NHANES participants, and the large reduction from 55,018 to 1,789 participants after exclusions may further limit representativeness.