Pre-deployment risk factors for PTSD in active-duty personnel deployed to Afghanistan: a machine-learning approach for analyzing multivariate predictors

K. Schultebraucks, M. Qian, et al.

This study applies machine learning to pre-deployment data from 473 active-duty Army personnel to identify predictors of PTSD after deployment. On an internal test set, the models reached AUCs between 0.78 and 0.88, and the findings may help inform deployment readiness assessment and pre-deployment interventions. The authors include Katharina Schultebraucks and Amit Etkin.

Introduction
Active-duty soldiers face repeated life-threatening exposures and combat-related stressors that elevate PTSD risk relative to civilians. Mitigating deployment-related PTSD requires identifying modifiable pre-deployment factors. Prior studies suggest biological, cognitive, and symptom-based markers may be informative, yet comprehensive multivariate prediction prior to exposure remains underexplored. This study asks whether a broad panel of pre-deployment variables—including multi-omic blood biomarkers, neurocognitive tests, and self-reported symptoms—can predict post-deployment PTSD symptom trajectories over 90–180 days after return from a 10-month Afghanistan deployment, and whether these same variables can predict screening positive for a provisional PTSD diagnosis in that window. The goal is to support deployment readiness assessment and guide risk-mitigation interventions by developing accurate machine-learning models using pre-deployment data.
Literature Review
Recent work indicates pre-deployment risk factors for PTSD are identifiable and potentially modifiable. Associations have been reported for inflammatory and metabolomic alterations, epigenetically altered networks, and a PTSD polygenic risk score. Neurocognitive dysfunction and pre-deployment symptom reports (e.g., nightmares, mental health status) have been linked to later PTSD. Machine-learning studies in military populations show that nonlinear, interacting combinations of heterogeneous factors can be predictive of psychiatric outcomes, including PTSD, and that modeling PTSD symptom development as trajectories captures heterogeneity in temporal courses. These findings motivate a multivariate, data-driven approach combining biological, cognitive, and psychometric predictors to forecast deployment-related PTSD.
Methodology
Design and participants: Prospective, naturalistic longitudinal cohort study of N=473 active-duty Army personnel from the 101st Airborne (Fort Campbell, Kentucky) deployed to Afghanistan in February 2014. Participants were assessed at three phases: Phase 1 (pre-deployment), Phase 2 (approximately 3 days post-return), and Phase 3 (90–180 days post-deployment). Inclusion/exclusion criteria and flow are in supplementary materials. Ethical approvals were obtained from NYU Grossman School of Medicine IRB, the U.S. Army Human Research Protection Office, and Army Command at Fort Campbell; informed consent obtained.
Measures: Demographics and military service history; multi-omic blood biomarkers (whole blood, plasma, serum, buffy coat) including routine clinical labs (CBC, lipid panel, liver function), inflammatory markers, metabolomics, DNA methylation, and a PTSD polygenic risk score (PRS) derived from GWAS data available on 1600 participants; computerized neurocognitive assessment (WebNeuro) capturing sustained attention, inhibitory control, cognitive flexibility, processing speed; self-report instruments: PTSD Checklist for DSM-5 (PCL-5), Patient Health Questionnaire (PHQ-8), Generalized Anxiety Disorder scale (GAD-7), Alcohol Use Disorders Identification Test (AUDIT), Pittsburgh Sleep Quality Index (PSQI), Ohio Traumatic Brain Injury Assessment (TBI status), Concussion Symptoms Inventory (current and lifetime), and Deployment Risk and Resilience Inventory-2 (DRRI-2; warzone exposure).
Outcomes: (1) Membership in latent PTSD symptom trajectories over Phases 1–3 based on PCL-5 scores, identified via latent growth mixture modeling (LGMM); (2) Provisional PTSD diagnosis at Phase 3 defined by PCL-5 total score ≥31.
Latent trajectory modeling: LGMM (Mplus v7) was applied to all participants with PCL-5 at Phases 1 and 3 (using Phase 2 when available). Model selection followed recommended fit criteria; a two-class linear solution with fixed slope was best (entropy 0.98), yielding an “increasing” trajectory (N=43, 9.1%) and a “resilient” trajectory (N=430, 90.9%).
Machine learning: Two classification tasks were built using random forest (RF; ranger in R) and support vector machine (SVM; kernlab): (a) predict LGMM class membership; (b) predict PCL-5 ≥31 at Phase 3. Data preprocessing in R 3.5.1/RStudio 1.1.456 included dummy-coding categorical variables; handling 15% missingness via bagged CART tree imputation with training and test sets imputed separately to avoid leakage; removing 12 near-zero-variance variables and 6 variables with >45% missing. The dataset was split with stratified random sampling into 75% training and 25% internal test sets; the test set size was powered to detect AUC > 0.78 at alpha 0.05 with 90% power. Models were tuned via bootstrap resampling with 25 repeats. RF used 1000 trees, minimum node size = 1, and tuned number of variables per split and splitting rule over 100 random combinations. SVM hyperparameters (sigma, cost) were tuned by random search over 100 combinations. Performance metrics included AUC with 95% CI, sensitivity, specificity, confusion matrices; precision-recall and benchmark comparisons are in supplements. Predictor importance was estimated via permutation-based ranking with p values.
Statistical tests: One-sided DeLong tests compared model AUCs against non-informative classifiers. Group differences in DRRI-2 combat exposure across LGMM classes were tested (t-tests).
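To make the stratified 75/25 split and the AUC metric concrete, here is a minimal sketch in standard-library Python. It is not the authors' code (the study used R with ranger and kernlab); the function names, the random seed, the rounding rule, and the toy scores are all assumptions made for illustration.

```python
import random

def stratified_split(labels, train_frac=0.75, seed=0):
    """Split indices into train/test while preserving each class's share."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = int(train_frac * len(idx) + 0.5)  # round half up (assumed rule)
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(test_idx)

def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each positive/negative pair counts 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Class sizes from the LGMM solution: 43 "increasing" (1), 430 "resilient" (0).
labels = [1] * 43 + [0] * 430
train_idx, test_idx = stratified_split(labels)
test_positives = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_positives)  # 355 118 11

# Toy scores (made up) to show the AUC calculation.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Under this rounding rule only about 11 "increasing" cases land in the held-out set, which helps explain why the reported test-set confidence intervals are wide. The exact counts in the study's own split may differ.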
Key Findings
- LGMM identified two PTSD symptom trajectories over Phases 1–3: “increasing” (N=43; 9.1%) and “resilient” (N=430; 90.9%); entropy = 0.98.
- Prediction of trajectory membership from pre-deployment data:
  - Random Forest: Training out-of-bag AUC = 0.79 (SD = 0.07); Internal test AUC = 0.85 (95% CI 0.75–0.96), sensitivity = 0.80, specificity = 0.69.
  - SVM: Internal test AUC = 0.87 (95% CI 0.79–0.96), sensitivity = 0.80, specificity = 0.85.
- Prediction of provisional PTSD diagnosis (PCL-5 ≥31) at Phase 3 from pre-deployment data:
  - Base rate: 7.6% screened positive (36/473); 92.4% had no/few PTSD symptoms at Phase 3.
  - Random Forest: Training out-of-bag AUC = 0.78 (SD = 0.08); Internal test AUC = 0.78 (95% CI 0.67–0.89), sensitivity = 0.78, specificity = 0.71.
  - SVM: Internal test AUC = 0.88 (95% CI 0.78–0.98), sensitivity = 0.89, specificity = 0.79.
- Model significance versus non-informative classifiers (DeLong tests):
  - Trajectory outcome: Z = 6.6476, p = 1.489e-11.
  - Provisional PTSD outcome: Z = 4.9214, p = 4.297e-07.
- Top-ranked pre-deployment predictors included poor sleep quality (PSQI), higher anxiety (GAD-7), higher depression (PHQ-8), and neurocognitive measures of sustained attention and cognitive flexibility. Blood-based biomarkers (metabolites, epigenomic, immune/inflammatory, and liver function markers) complemented these predictors.
- Warzone exposure during the index deployment (DRRI-2, Combat Experiences) differed by trajectory: the “increasing” group reported higher exposure (mean 28.89±10.84) than the “resilient” group (mean 23.90±7.01), t(248) = 2.85, p = 0.005.
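The DeLong p values reported above appear consistent with one-sided upper-tail probabilities of a standard normal Z statistic. The short sketch below checks that arithmetic using only the Python standard library. It does not implement the DeLong test itself, which also requires estimating the variance of the AUC from the data, and the function name is an assumption for illustration.

```python
import math

def one_sided_p(z):
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Z statistics as reported for the two outcomes.
p_trajectory = one_sided_p(6.6476)
p_provisional = one_sided_p(4.9214)
print(f"{p_trajectory:.3e}", f"{p_provisional:.3e}")
```

Using `math.erfc` rather than `1 - cdf` avoids the loss of precision that occurs when subtracting a value very close to 1.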
Discussion
Findings demonstrate that a comprehensive pre-deployment panel spanning symptoms, cognition, and multi-omic blood markers can meaningfully predict both longitudinal PTSD symptom trajectories and screening-level PTSD status 90–180 days after deployment. The ability to forecast who will follow an increasing trajectory or cross a clinical cutoff supports the feasibility of pre-deployment risk stratification. The prominence of sleep quality, anxiety, depression, and cognitive control measures suggests modifiable psychological and neurocognitive targets for early intervention. Multi-omic biomarkers added complementary predictive signal, consistent with a systems-biology view of stress susceptibility. Significant differences in combat exposure between trajectory groups underscore that both pre-deployment vulnerability and deployment experiences shape outcomes; nevertheless, predictive signal was present prior to deployment. Collectively, these results support using multivariate machine learning to integrate heterogeneous risk factors for practical deployment readiness assessment and targeted prevention.
Conclusion
This study shows that pre-deployment multidomain data can accurately predict post-deployment PTSD symptom trajectories and provisional PTSD screening status in active-duty soldiers. Sleep disturbance, anxiety, depression, and cognitive control deficits emerged as leading predictors, augmented by metabolomic, epigenomic, immune/inflammatory, and liver function markers. These models could inform deployment readiness and guide pre-deployment interventions (e.g., sleep optimization, anxiety/depression management, cognitive training) to mitigate PTSD risk. Future work should externally validate models across units and theaters, extend follow-up windows, refine interpretable biomarker panels, integrate detailed deployment exposures, and evaluate whether targeted pre-deployment interventions based on risk profiles reduce incident PTSD.
Limitations
- Generalizability: Single cohort from the 101st Airborne with a specific deployment context; no external validation cohort reported.
- Outcome assessment: Provisional PTSD defined by PCL-5 cutoff rather than diagnostic interview; trajectories derived from three time points limit modeling of nonlinear symptom courses.
- Class imbalance and small positive classes: Only 9.1% in the increasing trajectory (N=43) and 7.6% screening positive (N=36), which can affect model stability despite stratification and bootstrap safeguards.
- Missing data and feature filtering: Approximately 15% missingness handled via imputation; variables with near-zero variance and those with >45% missing were removed, which may bias variable availability and importance.
- Potential overfitting: Although bootstrapping and an internal test set were used, absence of external validation limits certainty about transportability.
- Multi-omics breadth: While diverse biomarkers were included, platform- and batch-related effects and biological interpretability of feature importance are not fully addressed in the excerpt.
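The class-imbalance limitation can be illustrated with a back-of-envelope positive predictive value (PPV) calculation. This sketch is not from the study. It assumes, for illustration only, that the SVM's reported test-set sensitivity (0.89) and specificity (0.79) for the provisional PTSD outcome would hold at the full-cohort base rate of 36/473; the function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: share of positive predictions that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

prevalence = 36 / 473  # reported base rate of screening positive at Phase 3
ppv = positive_predictive_value(0.89, 0.79, prevalence)
print(round(ppv, 2))  # 0.26
```

Under these assumptions, roughly one in four soldiers flagged by the model would go on to screen positive. A high AUC at a low base rate can therefore still produce many false alarms, which matters when choosing a decision threshold for screening.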